Intel 80386 User Manual


Introduction to the 80386

Including the 80386 Data Sheet

April 1986

Intel Corporation makes no warranty for the use of its products and assumes no responsibility for any errors which may appear in this document nor does it make a commitment to update the information contained herein.

Intel retains the right to make changes to these specifications at any time, without notice.

Contact your local sales office to obtain the latest specifications before placing your order.

The following are trademarks of Intel Corporation and may only be used to identify Intel Products:

Above, BITBUS, COMMputer, CREDIT, Data Pipeline, FASTPATH, GENIUS, i, ICE, iCEL, iCS, iDBP, iDIS, I²ICE, iLBX, im, iMDDX, iMMX, Insite, Intel, intel, intelBOS, Intelevision, inteligent Identifier, inteligent Programming, Intellec, Intellink, iOSP, iPDS, iPSC, iRMX, iSBC, iSBX, iSDM, iSXM, KEPROM, Library Manager, MAP-NET, MCS, Megachassis, MICROMAINFRAME, MULTIBUS, MULTICHANNEL, MULTIMODULE, ONCE, OpenNET, OTP, PC-BUBBLE, Plug-A-Bubble, PROMPT, Promware, QUEST, QueX, Quick-Pulse Programming, Ripplemode, RMX/80, RUPI, Seamless, SLD, UPI, and VLSiCEL, and the combination of ICE, iCS, iRMX, iSBC, iSBX, MCS, or UPI and a numerical suffix, 4-SITE.

MDS is an ordering code only and is not used as a product name or trademark. MDS® is a registered trademark of Mohawk Data Sciences Corporation.

MULTIBUS is a patented Intel bus.

Additional copies of this manual or other Intel literature may be obtained from:

Intel Corporation

Literature Distribution

Mail Stop SC6-59

3065 Bowers Avenue

Santa Clara, CA 95051

©INTEL CORPORATION 1986

3/86

TABLE OF CONTENTS

BOOK I

CHAPTER 1
HIGHLIGHTS
1.1 32-bit Architecture
1.2 High-performance Implementation
1.3 Virtual Memory Support
1.4 Configurable Protection
1.5 Extended Debugging Support
1.6 Object Code Compatibility
1.7 Summary

CHAPTER 2
APPLICATION ARCHITECTURE
2.1 Registers
2.1.1 General Registers
2.1.2 Flags and Instruction Pointer
2.1.3 Numeric Coprocessor Registers
2.2 Memory and Logical Addressing
2.2.1 Segments
2.2.2 Logical Addresses
2.2.3 Segment and Descriptor Registers
2.2.4 Addressing Modes
2.3 Data Types and Instructions
2.3.1 Principal Data Types
2.3.2 Numeric Coprocessor Data Types
2.3.3 Other Instructions
2.3.3.1 Stack Instructions
2.3.3.2 Control Transfer Instructions
2.3.3.3 Miscellaneous Instructions

CHAPTER 3
SYSTEM ARCHITECTURE
3.1 System Registers
3.2 Multitasking
3.2.1 Task State Segment
3.2.2 Task Switching
3.3 Addressing
3.3.1 Address Translation Overview
3.3.2 Segments
3.3.3 Pages
3.3.4 Virtual Memory
3.4 Protection
3.4.1 Privilege
3.4.2 Privileged Instructions
3.4.3 Segment Protection
3.4.4 Page Protection
3.5 System Calls
3.6 Interrupts and Exceptions
3.6.1 Interrupt Descriptor Table
3.6.2 Debug Exceptions and Registers
3.7 Input/Output

CHAPTER 4
ARCHITECTURAL COMPATIBILITY
4.1 80286 Compatibility
4.2 Real and Virtual 86 Modes

CHAPTER 5
HARDWARE IMPLEMENTATION
5.1 Internal Design
5.2 External Interface
5.2.1 Clock
5.2.2 Data and Address Buses
5.2.3 Bus Cycle Definition
5.2.4 Bus Cycle Control
5.2.5 Dynamic Bus Sizing
5.2.6 Processor Status and Control
5.2.7 Coprocessor Control

BOOK II

80386 High Performance Microprocessor with Integrated Memory Management


CHAPTER 1 HIGHLIGHTS

The 80386 is a high-performance 32-bit microprocessor designed to drive the most advanced computer-based applications of today and tomorrow. CAE/CAD workstations, high-resolution graphics, publishing, and office and factory automation are representative of today's applications that are well served by the 80386. Tomorrow's applications may be more constrained by the imagination of system designers than by the power and versatility of the 80386.

The 80386 offers the system designer many new and powerful capabilities, including unprecedented performance of 3 to 4 million instructions per second, a complete 32-bit architecture, a 4-gigabyte (2^32 bytes) physical address space, and on-chip support for paged virtual memory. While embodying the latest in microprocessor technology, the 80386 retains object code compatibility with the wealth of software written for its predecessors, the 8086 and 80286. Of special interest is the 80386's virtual machine capability, which enables the 80386 to switch between programs running under different operating systems, such as Unix* and MS-DOS*. This facility enables OEMs to incorporate standard 16-bit application software directly into new 32-bit designs.

Combining the power and performance of a superminicomputer with the low cost and design versatility of a microprocessor, the 80386 can open new markets to microprocessor-based systems. Applications that have not been feasible with slower microprocessors or cost-effective with superminicomputers are now practical with the 80386. Emerging applications such as machine vision, speech recognition, advanced robots, and expert systems, which have been largely experimental, can now be brought to market.

Tackling the application challenges of tomorrow effectively requires more than 32-bit registers, instructions, and buses. These fundamental facilities are only the starting point for the 80386. The following sections summarize the 80386's 32-bit architecture along with its more innovative features:

• High-performance Implementation
• Virtual Memory Support
• Configurable Protection
• Extended Debugging Support
• Object Code Compatibility

1.1 32-bit Architecture

The 80386's 32-bit architecture provides the programming resources required to directly support "large" applications, those characterized by large integers, large data structures, large programs (or large numbers of programs), and so on. The 80386's physical address space is 2^32 bytes, or 4 gigabytes; its logical address space is 2^46 bytes, or 64 terabytes. The 80386's eight 32-bit general registers can be used interchangeably, both as instruction operands and as addressing-mode variables. Data types include 8-, 16-, and 32-bit integers and ordinals, packed and unpacked decimals, pointers, and strings of bits, bytes, words, and doublewords. The 80386 has a complete set of instructions for manipulating these types, as well as for controlling execution. The 80386 addressing modes support efficient access to the elements of the standard data structures: arrays, records, arrays of records, and records containing arrays.
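Both address-space sizes are powers of two, and the arithmetic can be checked in a few lines. This Python sketch relies on one detail not stated in this section (it comes from the fuller 80386 documentation): of a selector's 16 bits, 14 (a 13-bit descriptor index plus a table-indicator bit) choose a segment, while the remaining 2 bits carry a requested privilege level. Fourteen selector bits plus a 32-bit offset give 46 bits of logical address.

```python
# Address-space sizes from Section 1.1, written as powers of two.
PHYSICAL_ADDRESS_BITS = 32   # width of the physical address
OFFSET_BITS = 32             # width of an offset within one segment
SEGMENT_SELECT_BITS = 14     # assumption: 13-bit index + 1 table-indicator bit

GIGABYTE = 2 ** 30
TERABYTE = 2 ** 40

physical_bytes = 2 ** PHYSICAL_ADDRESS_BITS
logical_bytes = 2 ** (SEGMENT_SELECT_BITS + OFFSET_BITS)

print(physical_bytes // GIGABYTE, "gigabytes of physical address space")  # 4
print(logical_bytes // TERABYTE, "terabytes of logical address space")    # 64
```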

1.2 High-performance Implementation

A 32-bit architecture does not guarantee high performance. Delivering the architecture's potential requires leading-edge semiconductor technology, careful partitioning of functions, and attention to off-chip operations, particularly the interaction of processor and memory. Incorporating all of these, the 80386 delivers the highest performance of any currently available microprocessor.

The 80386 is implemented in Intel's CHMOS III, a semiconductor process that combines the high frequency of HMOS with the modest power requirements of CMOS. Using 1.5-micron geometries and two metal layers, the 80386 packs over 275,000 transistors into a single chip. Both 12-MHz and 16-MHz versions of the 80386 are initially available; running without wait states, the 16-MHz part can achieve sustained execution rates of 3-4 million instructions per second.

Internally, the 80386 is partitioned into six units that operate autonomously and in parallel with each other, synchronizing as necessary. All the internal buses that connect these units are 32 bits wide. By pipelining its functional units, the 80386 can overlap the execution of different stages of one instruction and can process multiple instructions simultaneously. Thus, while one instruction is executed, another is decoded, and a third is fetched from memory.
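The overlap described above can be pictured with an idealized three-stage model. This Python sketch is a simplification, not a model of the 80386's six actual units: it assumes every stage takes exactly one clock and that nothing ever stalls or branches. Under those assumptions, n instructions finish in n + 2 cycles rather than 3n.

```python
STAGES = ("fetch", "decode", "execute")

def pipeline_schedule(num_instructions, stages=STAGES):
    """For each clock cycle, map each busy stage to the instruction it holds.

    Idealized: one cycle per stage, no stalls, no branches.
    """
    depth = len(stages)
    schedule = []
    for cycle in range(num_instructions + depth - 1):
        busy = {}
        for position, stage in enumerate(stages):
            instruction = cycle - position   # instruction i reaches stage k at cycle i + k
            if 0 <= instruction < num_instructions:
                busy[stage] = instruction
        schedule.append(busy)
    return schedule

for cycle, busy in enumerate(pipeline_schedule(4)):
    print(cycle, busy)
```

From cycle 2 onward, instruction 0 executes while instruction 1 is decoded and instruction 2 is fetched, which is the overlap the paragraph describes.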

In addition to pipelining all instructions, the 80386 applies dedicated hardware to important operations. The 80386's multiply/divide unit can perform a 32-bit multiplication in 9-41 clocks, depending on the number of significant digits; it can divide 32-bit operands in 38 clocks (unsigned) or 43 clocks (signed). The 80386's barrel shifter can shift 1-64 bits in a single clock.

Many 32-bit applications, such as reprogrammable multiuser computers, need the logical-to-physical address translation and protection provided by a memory management unit, or MMU. Other applications, for example embedded real-time control systems, do not. Most 32-bit microprocessor architectures respond to this dichotomy by implementing the memory management unit in an optional chip. The 80386 MMU, by contrast, is incorporated on the processor chip as two of the processor's pipelined functional units. The operating system controls the operation of the MMU, allowing a real-time system, for example, to forgo page translation. Implementing memory management on-chip produces better performance for applications that use the MMU and no performance penalty for those that do not. This is made possible by shorter signal propagation delays, use of the half-clock cycles that are available on-chip, and parallel operation.

Another facility that is crucial to some applications and irrelevant to others is "number crunching," particularly single- and double-precision floating-point arithmetic. Floating-point operands are large, and the useful set of operations on them is quite complex; many thousands of transistors are required to implement a standard set of floating-point operations such as those defined by IEEE standard 754. Consequently, the 80386 provides hardware support for numerics in a separate numeric coprocessor chip. In fact, either of two chips, the 80287 Numeric Coprocessor or the higher-performance 80387, can be connected to the 80386. The numeric coprocessors are invisible to application software; they effectively extend the 80386 architecture with IEEE 754-compatible registers, data types, and instructions. The combination of an 80386 and an 80387 can execute 1.8 million Whetstones per second.

A 32-bit processor running at 16 MHz can outrun all but the fastest memories, making memory access time a potential performance bottleneck. The 80386 bus has been designed to make the best use of both very fast static RAMs and less expensive dynamic RAMs. For accesses to fast memory, such as caches, the 80386 provides a two-clock address-to-data bus cycle. (80386 caches can be any size from a minimum useful capacity of 4 kilobytes to the entire physical address space.) Accesses to slower memories (or I/O devices) can use the 80386's address pipelining facility to extend the effective address-to-data time to three clocks while maintaining two-clock throughput to the processor. Because of its internal pipelining of address translation with instruction execution, the 80386 generally computes the address and definition of the next bus cycle during the current bus cycle. Address pipelining exposes this advance information to the memory subsystem, allowing one memory bank to decode the next bus cycle while another bank is responding to the current cycle.

1.3 Virtual Memory Support

Virtual memory enables the maximum size of a program, or a mix of programs, to be governed by available disk space rather than by the size of physical (RAM) memory, which is presently on the order of 400 times more expensive per byte than disk. The resulting flexibility benefits manufacturers (who can supply multiple performance levels of a product that differ only in memory configurations), programmers (who can leave storage management to the operating system, rather than writing overlays), and end users (who can run more and larger applications without worrying about running out of memory).

Virtual memory is implemented by an operating system with support from the hardware. The 80386 supports virtual memory systems based on segments or pages. Segment-based virtual memory is appropriate for smaller 16-bit systems whose segments are at most 64 kilobytes in length. The 80386, however, supports segments as large as 4 gigabytes; therefore most large-scale 80386-based systems will base their virtual memory systems on the 80386's demand paging facilities. For each page, the 80386 supplies the Present, Dirty, and Accessed bits required to implement demand-paged virtual memory efficiently. The 80386 automatically traps to the operating system when an instruction refers to a not-present page; when the operating system has swapped the missing page in from disk, the 80386 automatically re-executes the instruction. To ensure high virtual memory performance, the 80386 provides an associative on-chip cache for paging information. The cache (called a translation lookaside buffer, or TLB) contains the mapping information for the 32 most recently used pages. 80386 pages are 4 kilobytes long; by mapping 128 kilobytes of memory at once, the TLB enables the 80386 to translate most addresses on-chip without consulting a memory-based page table. In typical systems, 98-99% of address references will "hit" a TLB entry.
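The benefit of the TLB can be illustrated with a small simulation. This Python sketch is hedged in two ways: it treats the 32-entry TLB as fully associative with least-recently-used replacement (a simplifying assumption; this section does not describe the replacement policy), and it uses a plain dictionary as a hypothetical stand-in for the memory-based page table, so a missing page raises `KeyError` where the real processor would trap to the operating system.

```python
from collections import OrderedDict

PAGE_SIZE = 4096   # 80386 pages are 4 kilobytes
TLB_ENTRIES = 32   # the TLB holds the 32 most recently used page mappings

class SimpleTLB:
    """Fully associative TLB with LRU replacement (a simplifying assumption)."""

    def __init__(self, page_table, entries=TLB_ENTRIES):
        self.page_table = page_table   # hypothetical: virtual page -> physical frame
        self.entries = entries
        self.cache = OrderedDict()     # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def translate(self, linear_address):
        page, offset = divmod(linear_address, PAGE_SIZE)
        if page in self.cache:
            self.hits += 1
            self.cache.move_to_end(page)            # now the most recently used
        else:
            self.misses += 1
            frame = self.page_table[page]           # KeyError stands in for a page fault
            self.cache[page] = frame
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)      # evict the least recently used
        return self.cache[page] * PAGE_SIZE + offset

print(TLB_ENTRIES * PAGE_SIZE // 1024, "kilobytes mapped by the TLB")  # 128

tlb = SimpleTLB({page: page + 100 for page in range(64)})
for address in range(0, 16 * PAGE_SIZE, 256):   # sweep 16 pages, 16 accesses each
    tlb.translate(address)
print(tlb.hits, tlb.misses)                      # 240 16
```

Only the first touch of each page misses. That locality effect is what makes high hit rates possible, although the rate in a real system depends on its own reference pattern.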

1.4 Configurable Protection

Executing 3-4 million instructions per second, the 80386 has the "horsepower" to support extremely sophisticated applications consisting of hundreds or thousands of program modules. In such applications, the question is not whether there will be bugs, but how quickly they can be found and eliminated, and how tightly their damage can be confined. These applications can be debugged faster and made more robust in production if the processor verifies each instruction for conformance to protection criteria. The degree and style of protection that should be applied, however, is inherently application-specific. Indeed, simple embedded real-time applications may work best with no protection. A range of protection needs is best satisfied by a range of protection facilities that can be employed selectively, such as those provided by the 80386:

• Separation of task address spaces;
• From zero to four privilege levels;
• Privileged instructions (for example, Halt);
• Typed segments (for example, code or data);
• Access rights for segments and pages (for example, read-only or execute-only);
• Segment limit checking.

All 80386 protection checks are performed in the on-chip pipeline to maximize performance.

1.5 Extended Debugging Support

The 80386's four on-chip debug registers can also significantly reduce program debugging time. These registers operate independently of the protection system and can therefore be used by all applications, including those that will run in production without protection. More importantly, they provide the ability to set data breakpoints in addition to the more familiar instruction breakpoints. The 80386 monitors all four current breakpoint addresses simultaneously without slowing execution.

Instruction breakpoints trap (typically to a debugger) when an instruction is executed; most processors provide this capability with a special instruction that the debugger writes over the instruction of interest. By specifying instruction breakpoint addresses in registers, the 80386 eliminates the contortions required to write breakpoint instructions into protected or shared code. Data breakpoints, an unusual capability for a microprocessor, are a particularly useful debugging tool. A data breakpoint can trap the instant that an address is written, or is either read or written. Using data breakpoints, a programmer can, for example, immediately locate the instruction responsible for erroneously overwriting a data structure.
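As an illustration of how a debugger might arm a data breakpoint, the Python sketch below computes a value for DR7, the 80386 debug control register. DR7 and its layout are not described in this chapter; the field positions used here come from the 80386 programmer's documentation (local-enable bit Ln at bit 2n, RWn at bits 16+4n to 17+4n, LENn at bits 18+4n to 19+4n). The function name is hypothetical, and actually loading DR0-DR3 and DR7 requires privileged MOV instructions, so this only builds the bit pattern. The hardware also expects a 2- or 4-byte breakpoint address to be aligned to its length, which the sketch does not check.

```python
# DR7 field encodings (80386 programmer's documentation).
RW_EXECUTE = 0b00      # instruction breakpoint
RW_WRITE = 0b01        # trap on data writes only
RW_READ_WRITE = 0b11   # trap on data reads or writes
LEN_CODES = {1: 0b00, 2: 0b01, 4: 0b11}   # breakpoint length in bytes

def dr7_enable(dr7, slot, rw, length):
    """Return DR7 with breakpoint `slot` (0-3) locally enabled."""
    if not 0 <= slot <= 3:
        raise ValueError("the 80386 has four breakpoint registers, DR0-DR3")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("instruction breakpoints must use length 1")
    dr7 |= 1 << (2 * slot)                              # Ln: local enable
    shift = 16 + 4 * slot                               # RWn starts at bit 16 + 4n
    dr7 &= ~(0b1111 << shift)                           # clear old RWn and LENn
    dr7 |= (rw | (LEN_CODES[length] << 2)) << shift     # LENn sits just above RWn
    return dr7

# Trap whenever the 4-byte variable whose address is in DR1 is written.
print(hex(dr7_enable(0, slot=1, rw=RW_WRITE, length=4)))   # 0xd00004
```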

In addition to the breakpoint registers, the 80386 provides the more conventional debugging facilities of a breakpoint instruction and single stepping.

1.6 Object Code Compatibility

Two generations of 86 family processors have preceded the 80386, the 80286 and the 8086, and the 80386 is compatible at the binary level with both of them. This compatibility preserves software investments, allows rapid market entry, and can provide access to the vast library of software written for computers based on the 86 family.

Of course the 80386 can run 8086 programs; it can also run 80286 and 80386 programs concurrently. But the 80386's most innovative compatibility feature is its Virtual 86 capability, which establishes a protected 8086 environment within the 80386 multitasking framework. Complementing the Virtual 86 facility, 80386 paging can be used to give each Virtual 86 task a one-megabyte address space anywhere in the 80386 physical address space. Moreover, if the 80386 operating system supports virtual memory, Virtual 86 tasks can be swapped like other tasks without special attention. In short, the 80386's Virtual 86 facility permits three generations of 86 family software to run at the same time.
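The one-megabyte figure comes from 8086 addressing, in which a 16-bit segment value is shifted left four bits and added to a 16-bit offset (the standard 8086 rule, not spelled out in this chapter). The Python sketch below combines that rule with a per-task page map to show how two Virtual 86 tasks can use identical 8086 addresses yet reach different physical memory. The page maps and frame numbers are hypothetical.

```python
PAGE_SIZE = 4096   # 80386 page size

def v86_linear(segment, offset):
    """8086-style address formation: segment * 16 + offset."""
    # Sums just above 1 MB (e.g. 0xFFFF:0xFFFF) are not treated specially here.
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)

def v86_physical(segment, offset, page_map):
    """Translate an 8086 address through one task's (hypothetical) page map."""
    page, within_page = divmod(v86_linear(segment, offset), PAGE_SIZE)
    return page_map[page] * PAGE_SIZE + within_page

# 256 pages of 4 KB cover the 1 MB space; each task's pages sit elsewhere in memory.
task_a = {page: 0x1000 + page for page in range(256)}
task_b = {page: 0x2000 + page for page in range(256)}

print(hex(v86_physical(0xB800, 0x0010, task_a)))   # 0x10b8010
print(hex(v86_physical(0xB800, 0x0010, task_b)))   # 0x20b8010
```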

1.7 Summary

The 80386 provides the raw performance required to implement high-end microprocessor-based systems. The 80386 architecture is flexible: rather than being bound to one view of the machine, system designers can choose the options that best match the needs of the application. Complete memory management facilities, including support for segmentation, paging, and virtual memory, are available on-chip. Up to four levels of protection can be used to build "firewalls" between software components, or protection can be forgone altogether. Virtual 86 tasks can enrich 32-bit systems with the extraordinary array of standard software already developed for business and other 86 family machines.

The power and versatility of the 80386 can be augmented by other Intel chips that can help maximize system performance. These include local area network controllers, advanced DMA controllers, disk controllers, and graphics coprocessors.

Design time and cost can be reduced with the aid of Intel development tools and boards. Development tools include compilers, linking and loading utilities, operating systems, and an in-circuit emulator (ICPM 386). Hundreds of industry-standard MULTIBUS® I boards are available to perform standard functions without incurring design and test costs; the array of high-performance MULTIBUS II boards is growing rapidly. Finally, Intel's experienced staff of application engineers and specialists can provide design assistance worldwide.


CHAPTER 2

APPLICATION ARCHITECTURE

The 80386 provides the assembly language application programmer or compiler writer with an extensive set of 32-bit resources. This chapter describes these resources in three sections: 1) registers, 2) memory and logical addressing, and 3) data types and instructions.

2.1 Registers

Computers, including the 80386, provide registers that programmers can use for very fast local storage. Register-resident data can be accessed without running bus cycles, thereby improving instruction execution time and leaving more bus bandwidth for other processors, such as direct memory access controllers. The 80386 provides programmers and compilers with eight general registers; another eight registers can be supplied by an optional 80287 or 80387 Numeric Coprocessor. Two other 80386 registers, which are oriented toward processor control and status, rather than data storage, are also important to programmers; these are the Flags register and the Instruction Pointer.

2.1.1 General Registers

As Figure 2-1 shows, the 80386 general registers are 32 bits wide; the processor's internal data paths, data bus, and address bus are also 32 bits wide. By any usual definition, the 80386 is a 32-bit word machine. However, following the practice of a number of processors whose ancestors are 16-bit machines, an 80386 word is 16 bits, while a 32-bit quantity is called a doubleword, or dword.

As Figure 2-1 shows, all of the general registers can be used as 16- or 32-bit registers, and four of them can also be used as eight 8-bit registers. In nearly all instructions, any general register can be specified as an operand. For example, any two registers can be multiplied together. Similarly, any register can be used as a base or index register in an address computation (discussed later in the chapter). Because every useful program needs a stack, the ESP general register is implicitly defined as the top-of-stack pointer.
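The overlapping register names can be modeled by masking and shifting a single 32-bit value. In this Python sketch the class and its attribute names (`e`, `x`, `h`, `l`) are hypothetical stand-ins for EAX, AX, AH, and AL. Only EAX, EBX, ECX, and EDX have the two 8-bit views; ESI, EDI, EBP, and ESP have only the 16-bit one.

```python
class GeneralRegister:
    """A 32-bit register with 16-bit and 8-bit views, e.g. EAX/AX/AH/AL."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF            # full 32-bit register (EAX)

    @property
    def x(self):                                # bits 15-0 (AX)
        return self.e & 0xFFFF

    @x.setter
    def x(self, v):
        self.e = (self.e & 0xFFFF0000) | (v & 0xFFFF)

    @property
    def h(self):                                # bits 15-8 (AH)
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, v):
        self.e = (self.e & 0xFFFF00FF) | ((v & 0xFF) << 8)

    @property
    def l(self):                                # bits 7-0 (AL)
        return self.e & 0xFF

    @l.setter
    def l(self, v):
        self.e = (self.e & 0xFFFFFF00) | (v & 0xFF)

eax = GeneralRegister(0x12345678)
print(hex(eax.x), hex(eax.h), hex(eax.l))   # 0x5678 0x56 0x78
eax.l = 0xFF                                 # writing AL leaves bits 31-8 intact
print(hex(eax.e))                            # 0x123456ff
```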

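Figure 2-1. General Registers

EAX   bits 31-0; low 16 bits = AX, split into AH (bits 15-8) and AL (bits 7-0)
EBX   bits 31-0; low 16 bits = BX, split into BH and BL
ECX   bits 31-0; low 16 bits = CX, split into CH and CL
EDX   bits 31-0; low 16 bits = DX, split into DH and DL
ESI   bits 31-0; low 16 bits = SI
EDI   bits 31-0; low 16 bits = DI
EBP   bits 31-0; low 16 bits = BP
ESP   bits 31-0; low 16 bits = SP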

2.1.2 Flags and Instruction Pointer

Figure 2-2 shows the format of the 80386 Flags register. The flags can be considered in three classes: status, control, and system. The processor sets the status flags after many instructions to reflect the outcome of the operation. For example, when two operands compare equal, the processor sets the Zero flag. Other instructions, notably the conditional Jump instructions, test a status flag and behave differently depending on the flag's value. Programmers can set control flags to modify the semantics of some instructions. For example, the Scan String instruction looks toward higher or lower addresses depending on the value of the Direction flag. The system flags are provided for operating system use and can be ignored by application programmers. (The system flags are discussed in Chapter 3.) In fact, the 80386 protection system can be used to prevent application programs from inadvertently altering the system flags.
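Because each flag occupies a fixed bit of the Flags register, software that receives a saved copy of the register (a debugger, for example) can decode it with shifts and masks. This Python sketch uses the bit positions shown in Figure 2-2; the function names are hypothetical.

```python
# Single-bit flags and their bit positions in the Flags register (Figure 2-2).
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8,
    "IF": 9, "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17,
}

def decode_flags(eflags):
    """Return the names of the set single-bit flags and the 2-bit IOPL field."""
    names = [name for name, bit in FLAG_BITS.items() if (eflags >> bit) & 1]
    iopl = (eflags >> 12) & 0b11   # I/O privilege level occupies bits 13-12
    return names, iopl

def zero_flag_set(eflags):
    """The condition an 'equal' conditional jump tests after a compare."""
    return bool((eflags >> FLAG_BITS["ZF"]) & 1)

print(decode_flags(0x00000246))   # (['PF', 'ZF', 'IF'], 0)
```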

The 80386 Instruction Pointer, called EIP, is 32 bits wide. The Instruction Pointer controls instruction fetching (including prefetching) and the processor automatically increments it after executing an instruction. Interrupts, exceptions, and control transfer instructions, such as jumps and calls, alter the Instruction Pointer.

2.1.3 Numeric Coprocessor Registers

The numeric coprocessor registers shown in Figure 2-3 improve the performance of numeric applications. Connecting an 80287 or 80387 Numeric Coprocessor to an 80386 effectively adds these registers to the 80386. While a numeric coprocessor recognizes integer, packed decimal, and floating-point formats of various lengths, internally it holds all values in an eight-deep, 80-bit-wide floating-point register stack. Numeric instructions may implicitly refer to the top element(s) of the stack, or explicitly to other registers. The Status Register maintains the top-of-stack pointer, flags that identify exceptions (for example, overflow), and condition codes that reflect the result of the last instruction. The Control Register contains option and mask bits that the programmer can set to select the rounding algorithm, how infinity is to be modeled, and whether exceptions are to be handled by the coprocessor or by software.
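The register-stack behavior can be sketched in a few lines of Python. The sketch assumes the usual coprocessor convention, which this section does not spell out: ST(i) names the physical register at (top + i) mod 8, and a load (push) decrements the top-of-stack pointer before writing. Python floats are 64-bit rather than the coprocessor's 80-bit format, and the Python exceptions raised here merely stand in for the coprocessor's own exception reporting.

```python
class RegisterStack:
    """Eight-deep register stack addressed relative to a top-of-stack pointer."""

    DEPTH = 8

    def __init__(self):
        self.regs = [None] * self.DEPTH   # None marks an empty register
        self.top = 0                       # held in the Status Register on the real chip

    def st(self, i):
        """Return ST(i), the register i places below the current top."""
        return self.regs[(self.top + i) % self.DEPTH]

    def push(self, value):
        new_top = (self.top - 1) % self.DEPTH
        if self.regs[new_top] is not None:
            raise OverflowError("register stack overflow")
        self.top = new_top
        self.regs[new_top] = float(value)

    def pop(self):
        if self.regs[self.top] is None:
            raise IndexError("register stack underflow")
        value = self.regs[self.top]
        self.regs[self.top] = None
        self.top = (self.top + 1) % self.DEPTH
        return value

s = RegisterStack()
s.push(1.5)
s.push(2.5)
print(s.st(0), s.st(1), s.top)   # 2.5 1.5 6
```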

Figure 2-2. Flags Register

Bit 17      VM    Virtual 86 Mode (system)
Bit 16      RF    Resume (system)
Bit 14      NT    Nested Task (system)
Bits 13-12  IOPL  I/O Privilege Level (system)
Bit 11      OF    Overflow (status)
Bit 10      DF    Direction (control)
Bit 9       IF    Interrupts Enabled (system)
Bit 8       TF    Trap (system)
Bit 7       SF    Sign (status)
Bit 6       ZF    Zero (status)
Bit 4       AF    Auxiliary Carry (status)
Bit 2       PF    Parity (status)
Bit 0       CF    Carry (status)


Figure 2-3. Numeric Coprocessor Registers

Register stack: eight 80-bit registers, each holding a sign (bit 79), an exponent (bits 78-64), and a significand (bits 63-0)
Control Register
Status Register

2.2 Memory and Logical Addressing

80386 application programs use logical addresses to specify the locations of operands in a 4-gigabyte physical address space. The processor automatically translates these logical addresses to the physical addresses that it emits on the system bus. As discussed more fully in Chapter 3, an 80386 operating system can tailor an application program's view of its logical address space. For example, an operating system can define the logical address space as it is defined by many architectures, as a simple array of 2^32 bytes. Alternatively, an 80386 operating system can organize the logical address space as a collection of variable-length segments. An operating system can define many segments or just a few, as appropriate to its view of logical memory; the 80386 does not dictate the use of segments, but rather allows them to be used as they support application needs. When reading the following sections, bear in mind that the extent to which an application program actively uses segments depends on the framework established by the operating system.

2.2.1 Segments

As just mentioned, an operating system can define the 80386 logical address space as one or more segments. Segments are logical units that map well to programming structures, which are inherently variable in length. For example, a 1516-byte procedure fits exactly into a 1516-byte segment, just as an 8-megabyte array (for example, a 1024x1024x8 display buffer) fits exactly into a segment of the same size. By providing architectural support for segments (for example, segments can be individually protected, and can be shared selectively between tasks), the 80386 improves the performance of systems that choose segments as a structuring mechanism. (Pages, which are described in Chapter 3, are fixed-size; they do not map well to programming constructs but, on the other hand, are better suited to operating system functions such as swapping.)

An 80386 segment can be any size from 1 byte to 4 gigabytes. For every segment, the operating system maintains an architecture-defined descriptor that specifies the attributes of the segment. Segment attributes include a 32-bit base address and limit (length), and protection information that can guard a segment against incorrect use. Because descriptors are maintained by operating systems, fuller coverage of them is deferred to Chapter 3. Application programs deal only indirectly with descriptors, referring to segments by means of logical addresses.

2.2.2 Logical Addresses

Because a program may potentially refer to multiple segments, an 80386 logical address must identify a segment. Therefore, an 80386 logical address consists of two parts, a 16-bit segment selector and a 32-bit offset into the selected segment (see Figure 2-4). The selector part of a logical address names a segment's descriptor. Conceptually, the processor determines a segment's address by using the selector as an index into a descriptor table maintained by the operating system. Adding the offset part of the logical address to the base address obtained from the segment's descriptor produces the operand address.
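The translation just described can be sketched as a table lookup followed by a limit check and an addition. This Python model makes stated simplifications: the selector is used directly as a dictionary key (a real selector also carries a table indicator and privilege bits), each `Descriptor` keeps only a base and a size in bytes (what this chapter calls the limit, or length), and `SegmentLimitError` is a hypothetical stand-in for the protection fault the processor would raise.

```python
from dataclasses import dataclass

@dataclass
class Descriptor:
    base: int   # 32-bit segment base address
    size: int   # segment length in bytes (the chapter's "limit")

class SegmentLimitError(Exception):
    """Hypothetical stand-in for the processor's protection fault."""

def translate(selector, offset, descriptor_table):
    """Translate selector:offset into an address, following Figure 2-4."""
    descriptor = descriptor_table[selector]   # the selector names a descriptor
    if not 0 <= offset < descriptor.size:
        raise SegmentLimitError(f"offset {offset:#x} is outside the segment")
    return (descriptor.base + offset) & 0xFFFFFFFF   # 32-bit address arithmetic

# Hypothetical selectors: a 1516-byte procedure and a "flat" 4-gigabyte segment.
table = {0x08: Descriptor(base=0x00100000, size=1516),
         0x10: Descriptor(base=0, size=2 ** 32)}
print(hex(translate(0x08, 0x10, table)))         # 0x100010
print(hex(translate(0x10, 0xDEADBEEF, table)))   # 0xdeadbeef
```

With a base of 0 and a 4-gigabyte size, the translated address equals the offset; this is the case, discussed in Section 2.2.3, in which segments effectively disappear.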

2.2.3 Segment and Descriptor Registers

To make logical addressing efficient, the 80386 provides six segment and descriptor registers (see Figure 2-5). In effect, these registers act as a programmer-controlled cache that eliminates selectors from most instructions and permits most logical addresses to be translated on-chip without consulting a descriptor table.

The address references of most programs cluster in a few small address ranges (this is the "locality ofreference"principle that makes virtual memory practical). For example, if a procedure is stored in a segment, many instructions are likely to be fetched from the segment before control passes

 

LOGICAL ADDRESS

47

32 31

I SELECTOR

OFFSET

 

l

 

DESCRIPTOR TABLE

to another procedure in another segment. The 80386, under program control, exploits this locality of reference by keeping recently used selectors and descriptors in its on-chip registers. The on-chip descriptors enable the great majority of logical addresses to be translated without time-consuming memory references.

Figure 2-4. Logical Address Translation (the logical address holds a selector in bits 47-32 and an offset in bits 31-0; the selector names a descriptor in the descriptor table, and the descriptor's base address plus the offset locates the operand within a segment of up to 2^32 bytes)

At any instant, up to six segments are addressable: the code segment, the stack segment, and up to four data segments. The CS, SS, DS, ES, FS, and GS segment registers contain the selectors for these segments. The corresponding descriptor registers contain the matching descriptors. If necessary, a program can make a new segment addressable by loading the new segment's selector into a segment register. The processor maintains the descriptor registers automatically, loading the proper descriptor whenever a program changes a segment register. (In fact, descriptor registers can only be loaded by the processor; they are inaccessible to programs.) Note that the Instruction Pointer contains the offset of the current instruction in the current code segment (defined by the CS register), and that register ESP contains the offset of the stack top in the current stack segment (defined by the SS register).
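The register caching described above can be pictured with a small model. The following is a minimal Python sketch, not Intel's implementation: `Descriptor`, `SegmentUnit`, `load_segment`, and `translate` are hypothetical names, a descriptor is reduced to a base and a limit, and the descriptor table is a plain list indexed directly by the selector (the real selector format is covered in Chapter 3). It shows that the table is consulted only when a segment register is loaded; later translations through that register use the cached descriptor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptor:
    base: int   # linear base address of the segment
    limit: int  # largest valid offset in the segment


class SegmentUnit:
    """Hypothetical model of the six segment/descriptor register pairs."""

    NAMES = ("CS", "SS", "DS", "ES", "FS", "GS")

    def __init__(self, descriptor_table):
        self.table = descriptor_table  # maintained by the operating system
        self.selector = dict.fromkeys(self.NAMES)
        self.descriptor = dict.fromkeys(self.NAMES)
        self.table_lookups = 0  # counts descriptor-table references

    def load_segment(self, reg, selector):
        # Loading a segment register is the only step that reads the table;
        # the matching descriptor register is filled automatically.
        self.table_lookups += 1
        self.selector[reg] = selector
        self.descriptor[reg] = self.table[selector]

    def translate(self, reg, offset):
        # Uses only the cached descriptor: no descriptor-table reference.
        d = self.descriptor[reg]
        if d is None:
            raise ValueError(f"{reg} has not been loaded")
        if not 0 <= offset <= d.limit:
            raise ValueError(f"offset {offset:#x} is outside the {reg} segment")
        return (d.base + offset) & 0xFFFF_FFFF


table = [Descriptor(base=0x0001_0000, limit=0xFFFF),
         Descriptor(base=0x0040_0000, limit=0x0FFF)]
seg = SegmentUnit(table)
seg.load_segment("DS", 1)
addresses = [seg.translate("DS", off) for off in range(0, 0x1000, 0x400)]
print([hex(a) for a in addresses], seg.table_lookups)
# ['0x400000', '0x400400', '0x400800', '0x400c00'] 1
```

Four translations cost one table reference, which is the effect the on-chip descriptor registers are meant to achieve.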

To improve instruction encoding efficiency, most instructions do not name segment registers. Instead, the 80386 automatically selects a segment register based on the instruction being executed. For example, a Jump instruction implicitly refers to the CS register and a Push instruction uses the SS register. If necessary, a programmer can explicitly direct the 80386 to use a particular segment in an instruction by preceding the instruction with a one-byte segment override prefix. The prefix directs the processor to use a particular segment register to translate the address in the following instruction.
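The implicit choice of segment register can be sketched as follows. Beyond the Jump and Push examples in the text, the sketch assumes the usual 80386 defaults for 32-bit addressing (operands based on EBP or ESP use SS, string destinations use ES, other data uses DS) and the override prefix bytes 2EH, 36H, 3EH, 26H, 64H, and 65H; the function names and the `kind` labels are hypothetical.

```python
# Segment override prefix bytes defined by the 80386 instruction encoding.
OVERRIDE_PREFIX = {0x2E: "CS", 0x36: "SS", 0x3E: "DS",
                   0x26: "ES", 0x64: "FS", 0x65: "GS"}


def default_segment(kind, base_reg=None):
    """Segment register chosen implicitly for a memory reference.

    kind (hypothetical labels):
      "fetch"       instruction fetch, including the target of a Jump
      "stack"       implicit stack reference by Push, Pop, Call, Return
      "string_dest" destination of a string instruction (addressed by EDI)
      "data"        any other memory operand
    """
    if kind == "fetch":
        return "CS"
    if kind == "stack":
        return "SS"
    if kind == "string_dest":
        return "ES"
    if kind == "data":
        # Operands based on EBP or ESP are assumed to live on the stack.
        return "SS" if base_reg in ("EBP", "ESP") else "DS"
    raise ValueError(f"unknown reference kind: {kind}")


def effective_segment(kind, base_reg=None, prefix=None):
    """Apply an optional one-byte override prefix to a data reference."""
    segment = default_segment(kind, base_reg)
    # Only ordinary data operands honor the prefix; fetches, implicit stack
    # references, and string destinations keep their default segment.
    if prefix is not None and kind == "data":
        segment = OVERRIDE_PREFIX[prefix]
    return segment


print(effective_segment("data", "EBX"), effective_segment("data", "EBX", 0x64))
```

The prefix costs one byte only in the instructions that need it, which is the encoding saving the paragraph describes.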

Note that a segment whose base address is 0 and whose limit is 4 gigabytes defines a 4-gigabyte logical address space. Because the processor selects segment registers automatically, an instruction can name an operand anywhere in this 4-gigabyte space with a simple 32-bit offset. If, as illustrated in Figure 2-6, all the descriptor registers are loaded with base addresses of 0 and limits of 4 gigabytes, the segments effectively disappear. Every byte in the logical address space, whether an instruction, a variable, or an item on the stack, is addressable with a simple 32-bit offset. Thus, the segment registers give the 80386 six instantaneously addressable logical address spaces of up to 4 gigabytes each. When these segments coincide, a program sees a single 4-gigabyte logical address space identical to that provided by less-flexible 32-bit architectures.
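As a quick check of the flat arrangement, the sketch below loads all six registers with base 0 and a 4-gigabyte limit and confirms that the same offset yields the same linear address through any segment register. The dictionary layout and the `linear` helper are illustrative assumptions, not processor structures.

```python
FOUR_GB = 1 << 32
REGS = ("CS", "SS", "DS", "ES", "FS", "GS")

# Every segment has base 0 and spans the whole 4-gigabyte space.
flat = {reg: {"base": 0, "limit": FOUR_GB - 1} for reg in REGS}


def linear(regs, reg, offset):
    """Linear address = segment base + offset, after the limit check."""
    seg = regs[reg]
    if not 0 <= offset <= seg["limit"]:
        raise ValueError("offset beyond segment limit")
    return (seg["base"] + offset) % FOUR_GB


# Code, stack, and data references to one offset reach the same byte.
offset = 0x0012_3456
print({linear(flat, r, offset) for r in REGS} == {offset})  # True
```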

2.2.4 Addressing Modes

The 80386 provides register and immediate addressing modes for operands that are located in registers or in instructions, respectively. More importantly, the 80386 provides the addressing modes needed to refer efficiently to elements in memory-based data structures such as arrays, records (structures), arrays of records, and records containing arrays. A program specifies the offset part of a logical address using one of the 80386 memory addressing modes. The 80386 computes the offset part of a logical address by the following formula:

offset := base + (index * scale) + displacement

Figure 2-5. Segment and Descriptor Registers (each of CS (code), SS (stack), and DS, ES, FS, and GS (data) pairs a 16-bit selector with a 64-bit descriptor holding the segment's base address, limit, etc.; additional descriptor fields are described in Chapter 3)

Any or all of the base, index, and displacement variables can be used to compute an offset. The base and index variables are the values of general registers, while the displacement value is contained in the instruction. Any general register can serve as a base register, and any general register except ESP can serve as an index register. The value in the index register can be scaled (multiplied) by 1, 2, 4, or 8, providing a direct way to refer to array or record elements of these lengths. A displacement value can be 8 or 32 bits long and is interpreted by the processor as a signed 2's-complement value.
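The offset formula and the rules above translate directly into a few lines of Python. This is an illustrative sketch: `effective_offset` and `to_signed` are hypothetical helpers, unused components are passed as 0, and the sum is assumed to wrap modulo 2^32 as a 32-bit offset does.

```python
MASK32 = 0xFFFF_FFFF


def to_signed(value, bits):
    """Interpret a raw displacement field as a 2's-complement value."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def effective_offset(base=0, index=0, scale=1, disp=0, disp_bits=32):
    """offset := base + (index * scale) + displacement, modulo 2**32.

    base and index are 32-bit register values (0 if unused); disp is the raw
    8- or 32-bit displacement field taken from the instruction.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    if disp_bits not in (8, 32):
        raise ValueError("displacement is 8 or 32 bits")
    return (base + index * scale + to_signed(disp, disp_bits)) & MASK32


# Element 3 of a dword array whose address is in a base register.
print(hex(effective_offset(base=0x1000, index=3, scale=4)))  # 0x100c
# Field at +4 in record 2 of an array of 8-byte records.
print(hex(effective_offset(base=0x2000, index=2, scale=8, disp=4, disp_bits=8)))  # 0x2014
```

An 8-bit displacement of 0FCH therefore subtracts 4 rather than adding 252, which is what lets short displacements reach fields on either side of a base address.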

The most meaningful combinations of base, index, and displacement yield the following 80386 memory addressing modes:

Direct: displacement only.

Register Indirect: base only.

Based: base + displacement.

Indexed: index (scaled).

Indexed with Displacement: index (scaled) + displacement.

Based Indexed: base + index (scaled).

Based Indexed with Displacement: base + index (scaled) + displacement.

2.3 Data Types and Instructions

This section describes the instructions that application programmers use most frequently. Since the majority of instructions operate on specific data types (for example, integers), types and instructions are described together. Privileged instructions, including those for performing I/O and handling interrupts, are covered in the next chapter.

2.3.1 Principal Data Types

Table 2-1 shows the data types and instructions provided by the 80386. Only the most frequently used instructions are shown in Table 2-1, and variants of instructions, such as (in the case of Rotate) Rotate Left, Rotate Right, and Rotate Through Carry Flag, are omitted.

Figure 2-6. A 4-gigabyte Logical Address Space (the CS, SS, DS, ES, FS, and GS segment/descriptor registers all hold a base of 0 and a limit of 4 gigabytes, so the six segments coincide)

Table 2-1. Principal Data Types and Instructions

Type | Size | Instructions
---- | ---- | ------------
Integer, Ordinal | 8, 16, 32 bits | Move, Exchange, Translate, Test, Compare, Convert, Shift, Double Shift, Rotate, Not, Negate, And, Or, Exclusive Or, Add, Subtract, Multiply, Divide, Increment, Decrement, Convert (Move with sign/zero extension)
Unpacked Decimal | 1 digit | Adjust for: Add, Subtract, Multiply, Divide
Packed Decimal | 2 digits | Adjust for: Add, Subtract
String (byte, word, dword) | 0-4G bytes, words, dwords | Move, Load, Store, Compare, Scan, Repeat
Bit String | 1-4G bits | Test, Test and Set, Test and Reset, Test and Complement, Scan, Insert, Extract
Near Pointer (1) | 32 bits | (Same as Ordinal)
Far Pointer | 48 bits | Load

(1) A near pointer is a 32-bit offset into a segment defined by one of the segment/descriptor register pairs. A far pointer is a full logical address, that is, a selector and an offset.

Figure 2-7 shows how examples of the basic data types are stored in memory. Multibyte items can be located at any byte address. Depending on the bus design, additional bus cycles may be required to access an operand located at an address that is not a multiple of its size. Therefore, for best performance independent of bus design, most programs align word operands on word boundaries, dword operands on doubleword boundaries, and so on.
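The storage order can be illustrated with Python's `struct` module, whose `<` byte-order prefix packs values least-significant byte first, the order the 80386 uses in memory. The helpers are hypothetical; the far-pointer layout (32-bit offset at the lower address, 16-bit selector in the next two bytes) is assumed from the bit positions of a far pointer, selector in bits 47-32 and offset in bits 31-0.

```python
import struct


def store_dword(memory, address, value):
    """Store a 32-bit value least-significant byte first (little-endian)."""
    memory[address:address + 4] = struct.pack("<I", value & 0xFFFF_FFFF)


def store_far_pointer(memory, address, selector, offset):
    """Store a 48-bit far pointer: 32-bit offset first, then 16-bit selector."""
    memory[address:address + 6] = struct.pack("<IH", offset, selector)


def is_aligned(address, size):
    """True when an operand of `size` bytes starts on a multiple of its size."""
    return address % size == 0


mem = bytearray(16)
store_dword(mem, 1, 0x12345678)            # legal, but not dword-aligned
store_far_pointer(mem, 8, 0x0010, 0x00402000)
print(mem[1:5].hex(), mem[8:14].hex(), is_aligned(1, 4), is_aligned(8, 4))
# 78563412 002040001000 False True
```

The dword at address 1 is stored correctly; the misalignment affects only how many bus cycles the access may take.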

2.3.2 Numeric Coprocessor Data Types

An 80287 or 80387 Numeric Coprocessor supplements the 80386 with the data types and instructions shown in Table 2-2. Most numeric applications store input values and output results in the integer, real, or packed decimal types and reserve the temporary real type for intermediate values, where its extended range and precision minimize rounding, underflow, and overflow problems in complex computations. In accordance with this model, a numeric coprocessor performs most computations on temporary real values stored in its registers. Loading any type into the register stack automatically converts the type to temporary real. A temporary real value in a register can be converted to any other type by a Store instruction.

Table 2-2. Principal Numeric Coprocessor Data Types and Instructions

Type | Size | Instructions
---- | ---- | ------------
Integer | 16, 32, 64 bits | Load, Store, Compare, Add, Subtract, Multiply, Divide
Packed Decimal | 18 digits | Load, Store
Real | 32, 64 bits | Load, Store, Compare, Add, Subtract, Multiply, Divide
Temporary Real | 80 bits | Add, Subtract, Multiply, Divide, Square Root, Scale, Remainder, Integer Part, Change Sign, Absolute Value, Extract Exponent and Significand, Compare, Examine, Test, Exchange, Tangent, Arctangent, 2^X - 1, Y * Log2(X + 1), Y * Log2(X), Load Constant (0.0, pi, etc.) (80387 adds Sine, Cosine, Sine and Cosine, Unordered Compare)

Figure 2-8 shows how numeric data types are stored in memory.
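The fields of the short and long real formats can be pulled apart with Python's `struct` module. The sketch assumes, as holds for the 80387, that short real and long real are the IEEE 754 single and double formats (exponent biases 127 and 1023); `real_fields` is a hypothetical helper, and the 80-bit temporary real is left out because `struct` has no format for it.

```python
import struct


def real_fields(x, fmt):
    """Split a short real ('f', 32-bit) or long real ('d', 64-bit) into fields.

    Returns (sign, biased_exponent, magnitude_bits): the sign occupies the
    top bit, followed by the biased exponent and then the magnitude.
    """
    if fmt == "f":
        (bits,) = struct.unpack("<I", struct.pack("<f", x))
        return bits >> 31, (bits >> 23) & 0xFF, bits & ((1 << 23) - 1)
    if fmt == "d":
        (bits,) = struct.unpack("<Q", struct.pack("<d", x))
        return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)
    raise ValueError("fmt must be 'f' or 'd'")


print(real_fields(1.5, "f"), real_fields(-2.0, "d"))  # (0, 127, 4194304) (1, 1024, 0)
```

In both formats the leading 1 of a normalized magnitude is implicit, which is why 1.5 shows only the single fraction bit for 0.5.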

2.3.3 Other Instructions

Not all 80386 instructions are associated with data types. The following paragraphs survey the untyped instructions.

2.3.3.1 Stack Instructions

An 80386 stack is a stack of dwords whose base and top are defined by the SS and ESP registers, respectively. The Push instruction pushes a dword onto the stack and the Pop instruction pops the top dword from the stack into a register or to memory. Push All pushes the general registers onto the stack and Pop All does the reverse.

The Enter and Leave instructions are provided for block-structured high-level languages. The Enter instruction builds the stack frame and display that compilers use to link procedure calls. The Leave instruction removes the display and stack frame from the stack in preparation for returning to the calling procedure.
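What Enter and Leave do to the stack can be traced with a small simulation. The sketch follows the 80386 definition of the two instructions with a 32-bit operand size, as we understand it: Enter pushes EBP, copies the caller's display pointers for nesting levels above 1, pushes the new frame pointer, sets EBP, and reserves local space; Leave reverses this. The `Stack` class and its dictionary memory are hypothetical simplifications.

```python
class Stack:
    """Hypothetical model of an 80386 dword stack: memory plus ESP and EBP."""

    def __init__(self, esp):
        self.mem = {}  # address -> dword
        self.esp = esp
        self.ebp = 0

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value & 0xFFFF_FFFF

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def enter(self, size, level):
        level %= 32
        self.push(self.ebp)              # save the caller's frame pointer
        frame = self.esp
        if level > 0:
            ebp = self.ebp
            for _ in range(level - 1):   # copy the caller's display entries
                ebp -= 4
                self.push(self.mem[ebp])
            self.push(frame)             # add this procedure's frame pointer
        self.ebp = frame
        self.esp -= size                 # space for local variables

    def leave(self):
        self.esp = self.ebp              # discard locals and display
        self.ebp = self.pop()            # restore the caller's frame pointer


s = Stack(esp=0x1000)
s.enter(8, 1)   # outer procedure: display holds its own frame pointer
s.enter(0, 2)   # nested procedure: copies the outer pointer, adds its own
print(hex(s.ebp), hex(s.mem[s.ebp - 4]), hex(s.mem[s.ebp - 8]))  # 0xfec 0xffc 0xfec
s.leave()       # back in the outer procedure's frame
```

The copied display entry is what lets a nested procedure reach the variables of the enclosing procedure without walking the chain of saved frame pointers.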

2.3.3.2 Control Transfer Instructions

The Jump instruction transfers control to another instruction by changing the value of the Instruction Pointer. The target instruction may be in the same code segment (up to 2^32 bytes away) or in a different one. The operand of an intrasegment Jump is a near pointer, that is, the offset of the target instruction in the current code segment; thus, a Jump can be directed to any location in the largest possible segment. The operand of an intersegment Jump is a far pointer, allowing control to be transferred to any point in a segment. (The selector part of the far pointer replaces the value in the CS register while the offset part replaces the value in EIP.) A full set of conditional Jump instructions, which branch based on the value of a status flag, is also available; these instructions can also transfer to locations up to 2^32 bytes away.

Figure 2-7. Data Type Storage (examples: byte ordinal, dword integer with sign and magnitude, unpacked decimal, packed decimal, byte string, near pointer, and far pointer with the selector in bits 47-32 and the offset in bits 31-0)

Procedures and functions (subroutines) can be invoked with the Call instruction, and a called routine can return with the Return instruction. As with Jumps, Calls within a code segment have near pointer operands which specify a new value for the Instruction Pointer, while Calls to a different code segment have far pointer operands that change the CS register in addition to EIP. Call instructions push the address of the following instruction onto the stack and then load the Instruction Pointer (and the CS register, if the transfer is to a different segment). The Return instruction pops the saved value(s) from the stack into EIP, and into CS if applicable. Calls can be nested indefinitely and can be recursive, subject only to the size of the stack.

Figure 2-8. Numeric Coprocessor Data Type Storage Examples (packed decimal, short real, long real, and temporary real; each real format holds a sign, a biased exponent, and a magnitude)

For controlling loops, the 80386 provides the Loop and conditional Loop instructions in addition to conditional Jumps. The Loop instructions use the ECX register as a repetition counter; they decrement ECX and terminate the loop when the register's value becomes zero. The conditional Loop instructions terminate a loop prematurely when a flag takes a specified value. While the Loop instructions are designed for "bottom of loop" testing, adding a Jump If ECX Zero instruction implements a "top of loop" test that allows the loop to be executed zero times.
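The difference between bottom-of-loop and top-of-loop testing shows up when ECX starts at zero. In the sketch below (a hypothetical helper, not processor code), Loop decrements ECX before testing it, so an initial 0 wraps to 0FFFFFFFFH and the body would run 2^32 times; a preceding Jump If ECX Zero skips the body instead. The `limit` parameter only caps the simulation.

```python
MASK32 = 0xFFFF_FFFF


def loop_iterations(ecx, top_test=False, limit=10):
    """Count how many times a Loop-terminated body runs for an initial ECX.

    top_test=True models a Jump If ECX Zero placed before the loop body.
    limit stops the simulation early and is not part of the processor's
    behavior.
    """
    if top_test and ecx == 0:
        return 0                      # Jump If ECX Zero bypasses the body
    runs = 0
    while True:
        runs += 1                     # loop body executes
        ecx = (ecx - 1) & MASK32      # Loop: decrement ECX first ...
        if ecx == 0 or runs == limit:
            return runs               # ... then fall through at zero


print(loop_iterations(3), loop_iterations(0, top_test=True), loop_iterations(0))
```

The last call stops only because of `limit`; on the processor it would run 2^32 times.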

2.3.3.3 Miscellaneous Instructions

The 80386 Bound instruction can be used to verify that an array subscript is within the bounds of the array. There are instructions for setting and clearing flags, and for loading and storing the status byte of the Flags register. The 80287 and 80387 supply the instructions that an operating system needs to initialize the coprocessor, handle coprocessor exceptions, and save and restore the coprocessor's state. Finally, of course, the 80386 has a No Operation instruction.
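The check Bound performs can be written as a signed comparison. In this sketch the exception class and helper names are hypothetical; it assumes the 80386 behavior of comparing the subscript against a lower and an upper signed bound, both inclusive, and raising interrupt 5 when the subscript falls outside them.

```python
MASK32 = 0xFFFF_FFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a signed (2's-complement) integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x8000_0000 else value


def in_bounds(index, lower, upper):
    """The comparison Bound performs: lower <= index <= upper, signed."""
    return to_signed32(lower) <= to_signed32(index) <= to_signed32(upper)


class BoundRangeExceeded(Exception):
    """Stands in for the exception (interrupt 5) the processor raises."""


def bound(index, lower, upper):
    if not in_bounds(index, lower, upper):
        raise BoundRangeExceeded(f"subscript {to_signed32(index)} out of range")


bound(5, 0, 9)  # within the array: no exception
```

Because the comparison is signed, a subscript register holding 0FFFFFFFFH is treated as -1 and is rejected against bounds 0..9.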


CHAPTER 3

SYSTEM ARCHITECTURE

The purpose of a system architecture is to support operating systems, but operating systems are quite diverse in their needs. In response, the 80386 provides an array of resources that operating system designers and implementors can selectively employ. In effect, the 80386 system architecture can be configured to fit the needs of the operating system under development.

3.1 System Registers

In addition to the registers described in the preceding chapter, an operating system sometimes uses the 80386 registers shown in Figure 3-1. (Later sections of this chapter sometimes refer to these registers, so they are shown here for reference.) In the main, it is the 80386 that uses the system registers; the operating system initializes the system registers and then ignores them during normal operation. The operating system may, however, use a system register to handle an exception. For example, when a page fault occurs, the processor loads the faulting address into CR2; the operating system's page fault handler uses the address to find the associated page table entry. The system registers are normally inaccessible to application programs, since only privileged instructions can operate on them. (Exceptions, page faults, and privileged instructions are explained later in this chapter.)

3.2 Multitasking

Many of the 80386's system architecture facilities directly support multitasking operating systems, though, of course, the 80386 can be used in demanding single-task applications. Multitasking is a technique for managing a computer system's work when that work consists of multiple activities; three such activities might be editing one file, compiling another, and transmitting a third to another computer. In a multitasking system, each activity that can proceed in parallel with other activities is represented by a task. (In this introduction, the term "task" is considered equivalent to the term "process.") Each task executes a program consisting of instructions and initial data values. More than one task can execute the same program; for example, in a timesharing multitasking system several tasks (each corresponding to a user) commonly execute the same compiler or editor. Programs and tasks are related in somewhat the same way that sheet music and musical performances are related: a program is a text that describes an algorithm, and a task is one execution (performance) of that algorithm.

Figure 3-1. System Registers (system address registers: GDTR, global descriptor table, and IDTR, interrupt descriptor table, each holding a base address and a limit; LDTR, local descriptor table, and TR, running task's TSS, each holding a selector; system control registers: CR0, options such as paging; CR1, reserved by Intel; CR2, page fault linear address; CR3, page directory base address)

The programs that tasks execute are designed as though they were to run on dedicated processors sharing a common memory; that is, except for occasional pauses to communicate or synchronize with other tasks, a task theoretically runs continuously in parallel with all other tasks. In fact, however, the tasks run one at a time in short bursts on a single processor.

The multitasking operating system simulates multiple processors by providing each task with a "virtual processor." At any instant, the operating system assigns the real processor to one of the virtual processors, thereby running the associated task. To maintain the illusion of one processor per task, the operating system frequently switches the real processor to a different virtual processor. The 80386 system architecture supports this critical task switch operation with Task State Segments and instructions that switch tasks.

3.2.1 Task State Segment

A Task State Segment (TSS) is one of several data structures defined by the 80386 system architecture. In effect, these data structures are "data types" for operating systems. A TSS (see Figure 3-2) corresponds to what some operating systems call a task control block; it holds the state of a task's virtual processor. Each 80386 task is represented by a TSS, which is divided into two parts. The lower part of the TSS is defined by the 80386 architecture and contains processor register values. The upper part of the TSS can be defined by the operating system to hold task-related data such as scheduling priority, file descriptors, and so on. To create a new task, the operating system creates a TSS and initializes it to the values the task should have when it begins execution. The 80386 then maintains the lower part of the TSS, while the upper part is the responsibility of the operating system.

3.2.2 Task Switching

The operating system interleaves the execution of tasks on the processor according to a scheduling policy. The scheduling policy sets the order in which tasks run. Because task scheduling policies are so diverse, the 80386 leaves them to the operating system. Once the operating system has decided to run a new task, however, it can direct the processor to perform the core of the task switch, sometimes called the context switch.

Figure 3-2. Principal Task State Segment Fields (from the top: an area defined by the operating system, then the local descriptor table address, page table directory address, general registers, flags register, instruction pointer, segment registers, and privileged stack pointers)

The 80386 keeps a selector and a descriptor for the running task's TSS in its Task Register (TR). To switch tasks, the operating system issues a Jump instruction whose operand is a selector for the TSS of the new task. The processor executes the Jump TSS instruction by first storing its registers in the current TSS and then loading TR with the selector (and its associated descriptor) specified in the instruction. Having obtained the address of the new TSS, the processor loads its registers with the values in the new TSS. Execution continues with the instruction pointed to by the new task's Instruction Pointer. To later resume execution of the old task, the operating system issues a Jump TSS to the old task's TSS; execution of the old task then continues with the instruction following the Jump TSS that suspended the task. The task switch described here takes 17 microseconds (16 MHz, no wait states).
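The core of the Jump TSS sequence can be modeled in a few lines. The sketch is a simplification under stated assumptions: `TSS`, `CPU`, and `jump_tss` are hypothetical names, a dictionary stands in for the descriptor table that locates each TSS, and the caller is assumed to have already advanced EIP past the Jump TSS so that the saved value is where the old task resumes.

```python
from dataclasses import dataclass, field


@dataclass
class TSS:
    """Hypothetical TSS: processor-defined register image plus OS-defined data."""
    registers: dict                              # EIP, EFLAGS, general and segment registers
    os_data: dict = field(default_factory=dict)  # e.g. scheduling priority


class CPU:
    def __init__(self, tss_by_selector, selector):
        self.tss_by_selector = tss_by_selector  # stands in for the descriptor table
        self.tr = selector                      # Task Register: running task's TSS
        self.regs = dict(tss_by_selector[selector].registers)

    def jump_tss(self, selector):
        # 1. Store the processor registers in the current task's TSS.
        self.tss_by_selector[self.tr].registers = dict(self.regs)
        # 2. Load TR with the new task's selector.
        self.tr = selector
        # 3. Load the registers from the new TSS; execution continues at its EIP.
        self.regs = dict(self.tss_by_selector[selector].registers)


tasks = {0x20: TSS({"EIP": 0x100, "EAX": 1}), 0x28: TSS({"EIP": 0x500, "EAX": 2})}
cpu = CPU(tasks, 0x20)
cpu.regs.update(EIP=0x108, EAX=42)  # task A runs up to just past its Jump TSS
cpu.jump_tss(0x28)                  # task B starts at EIP 500H
cpu.jump_tss(0x20)                  # task A resumes with its saved state
print(hex(cpu.regs["EIP"]), cpu.regs["EAX"])  # 0x108 42
```

The `os_data` field is never touched by `jump_tss`, mirroring the split between the processor-maintained and operating-system-maintained parts of the TSS.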

3.3 Addressing

The physical address space of most computers is organized as a simple array of bytes. With the development of memory management units (MMUs), computer architectures began to distinguish between the physical address space implemented by the memory hardware and the logical address space seen by a programmer. The MMU translates the logical addresses presented by programs into the physical addresses that go out on the bus. Most architectures view a task's logical address space as consisting of a collection of one of the following:

Bytes: The logical address space consists of an array of bytes with no other structure (this is sometimes called a "flat" or "linear" address space). No MMU translation is required because a logical address is exactly equivalent to a physical address.

Segments: The logical address space consists of a few or many segments, each of which is composed of a variable number of bytes. A logical address is given in two parts, a segment number and an offset into the segment. The MMU translates a logical address into a physical address.

Pages: The logical address space consists of many pages, each of which is composed of a fixed number of bytes. A logical address is a page number plus an offset within the page. The MMU translates a logical address into a physical address.

Paged Segments: The logical address space consists of segments which themselves consist of pages. A logical address is a segment number and an offset. The MMU translates the logical address into a page number and an offset and then translates these into a physical address.

Each of these views matches some classes of system well and others less well. For example, the "flat" view is appropriate for simple embedded systems, while systems that separately manage and protect individual program structures fit better with the segmented view of memory. Technically, the 80386 views memory as a collection of segments that are optionally paged. In practice, the 80386 architecture supports operating systems that use any of the four views of memory described above.

3.3.1 Address Translation Overview

Figure 3-3 shows the fundamentals of 80386 logical-to-physical address translation. The sequence of operations shown in Figure 3-3 is central to both addressing and protection. It is described here in skeleton form to clearly establish its overall outline before considering such features as virtual memory and protection. Subsequent sections elaborate on the translation stages and show how they can be tailored to fit the needs of a particular system.

As described in the previous chapter, the 80386 memory addressing modes yield the 32-bit offset of the target operand. Combined with a segment selector, this offset forms a two-part logical address: the selector identifies the target segment and the offset locates the operand in the segment. In the vast majority of instructions, the selector is specified implicitly as the content of a segment register.


A selector is an index into a segment descriptor table; that is, it is a segment number. Each entry in a segment descriptor table contains the base address of a segment. The processor adds the offset to the segment's base address to produce a 32-bit linear address. If paging is not enabled, the processor considers the linear address to be the physical address and emits it on the address pins.
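Finding the descriptor from a selector can be sketched as below. The selector layout used here is the 80386 format as we understand it: a 13-bit index in bits 15-3 (hence the limit of 8,192 descriptors per table), bit 2 choosing the global or local table, and bits 1-0 holding a requested privilege level used by protection checks. The lists of segment bases are hypothetical stand-ins for full descriptor tables.

```python
def decode_selector(selector):
    """Split a 16-bit selector into (index, table, requested privilege level)."""
    index = (selector >> 3) & 0x1FFF              # bits 15-3: up to 8,192 entries
    table = "LDT" if selector & 0b100 else "GDT"  # bit 2: table indicator
    rpl = selector & 0b11                         # bits 1-0
    return index, table, rpl


def linear_address(selector, offset, gdt_bases, ldt_bases):
    """Base from the selected descriptor plus offset, truncated to 32 bits.

    gdt_bases and ldt_bases are hypothetical lists of segment base addresses
    standing in for the descriptor tables.
    """
    index, table, _ = decode_selector(selector)
    base = (ldt_bases if table == "LDT" else gdt_bases)[index]
    # With paging disabled, this linear address is also the physical address.
    return (base + offset) & 0xFFFF_FFFF


print(decode_selector(0x000F))  # (1, 'LDT', 3)
```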

If paging is enabled, the 80386 translates the linear address into a physical address. It does this with the aid of page tables. A page table is conceptually similar to a descriptor table except that each page table entry contains the physical base address of a 4 kilobyte page.
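The two-level page table lookup can be sketched as follows. The field widths are the 80386 format as we understand it: the linear address splits into a 10-bit directory index, a 10-bit table index, and a 12-bit offset into the 4-kilobyte page; CR3 supplies the page directory address; the upper 20 bits of each entry locate the next table or the page; and bit 0 marks an entry present. `translate`, `PageFault`, and the `read_dword` callback are hypothetical names, and access-rights bits are ignored.

```python
PRESENT = 0x1             # bit 0 of a directory or page table entry
FRAME_MASK = 0xFFFF_F000  # upper 20 bits: physical address of a 4 KB frame


class PageFault(Exception):
    """Carries the faulting linear address (the 80386 places it in CR2)."""


def translate(linear, cr3, read_dword):
    """Translate a 32-bit linear address through the page directory and table."""
    dir_index = (linear >> 22) & 0x3FF    # bits 31-22
    table_index = (linear >> 12) & 0x3FF  # bits 21-12
    offset = linear & 0xFFF               # bits 11-0

    pde = read_dword((cr3 & FRAME_MASK) + dir_index * 4)
    if not pde & PRESENT:
        raise PageFault(hex(linear))
    pte = read_dword((pde & FRAME_MASK) + table_index * 4)
    if not pte & PRESENT:
        raise PageFault(hex(linear))
    return (pte & FRAME_MASK) + offset


# Hypothetical physical memory: directory at 1000H, one page table at 2000H.
phys = {0x1004: 0x2001, 0x200C: 0x7001}
print(hex(translate(0x0040_3025, 0x1000, lambda a: phys.get(a, 0))))  # 0x7025
```

Only the upper 20 bits of the address change during translation; the 12-bit offset passes through unchanged, which is why page frames must be 4-kilobyte aligned.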

Because it embraces both traditional address space structuring units (segments and, optionally, pages), and because segments can be very large (up to 4 gigabytes), the 80386's addressing technique is very flexible. An operating system can provide a task with a single flat address space, a flat address space that is paged, a segmented address space, or a segmented address space that is paged.

With all its flexibility, the 80386's multistage address translation facility is nevertheless quite fast. The 80386 typically computes an offset and translates the resulting logical address to a physical address in 1.5 clocks. Moreover, address translation time is not visible to the application because the 80386's on-chip MMU translates addresses in parallel with other processor activities (except when a Jump or Call instruction temporarily interrupts pipelining).

3.3.2 Segments

The segment is the unit the 80386 provides for defining a task's logical address space; that is, a task's logical address space consists of one or more segments. Operating systems differ substantially in the way in which they define a task's logical address space. For example, an embedded real-time system may define a task's logical address space to be a single entity shared by all tasks and the operating system itself; in other words, a single segment is shared system-wide. At the other extreme, a system might map every data structure and procedure into a different segment, making a task's logical address space consist of dozens or hundreds of address spaces, each corresponding to a procedure or a data structure. Between these extremes might fall a general-purpose timesharing system in which tasks run in separate logical address spaces, and in which a task's code is separated from its data, and application code and data are separated from operating system code and data. The 80386 segmentation facility is versatile enough to support each of these examples, and others as well.

Figure 3-3. Address Translation Overview (a logical address, selector plus offset, is translated through the descriptor tables into a 32-bit linear address; if paging is enabled, the page tables translate the linear address into a physical address)

As described in Chapter 2, an instruction refers to a memory operand by a two-part logical address consisting of a segment selector and an offset into the segment. In principle, the 80386 translates the logical address to a linear address by using the selector to look up the segment's descriptor in a segment descriptor table. The descriptor contains the segment's base address in the linear address space; adding the offset produces the operand's linear address. In practice, the logical-to-linear address translation is optimized by implicit selectors and register-based descriptors. As a result, the descriptor table lookup occurs only for instructions that load new selectors into segment registers (for example, a Call to a procedure in a different segment changes the selector in the CS register).

Although it rarely occurs in practice, it is nevertheless convenient to think of the processor translating logical addresses by looking up descriptors in segment descriptor tables because it follows that the descriptors in a task's segment descriptor tables define the task's logical address space. Without a descriptor a task has no way to generate a linear address.

A segment descriptor table is an array of descriptors; Figure 3-4 shows the logical format of a descriptor. The base address field has already been explained. The limit field specifies the length of the segment; the 80386 uses the limit field to verify that the offset part of a logical address is valid, that is, that it actually falls within the segment. The segment attributes mainly relate to protection and are described later in the chapter.

Each task can have a system-wide and a private logical address space; these are represented by the Global Descriptor Table (GDT) and the Local Descriptor Table (LDT), respectively. (A selector contains a bit associating it with one table or the other.) These descriptor tables can contain up to 8,192 descriptors each, and together they define a task's logical address space. That is, to make a new segment addressable by a task, the

Figure 3-4. Principal Descriptor Fields (base address; limit; attributes: type, access rights, privilege, present, and accessed)
