2.5 Test Load Circuit
2.6 Absolute Maximum Ratings
2.7 DC Characteristics
2.8 AC Specifications
2.8.1 AC Specification Tables
3.0 MECHANICAL DATA
3.3 Package Thermal Specification
5.0 REVISION HISTORY
Figure 3. Multiple Register Sets Are Stored On-Chip
Figure 4. Connection Recommendations for Low Current Drive Network
Figure 5. Connection Recommendations for High Current Drive Network
Figure 6. Typical Supply Current vs. Case Temperature
Figure 7. Typical Current vs. Frequency (Room Temp)
Figure 8. Typical Current vs. Frequency (Hot Temp)
Figure 9. Worst-Case Voltage vs. Output Current on Open-Drain Pins
Figure 10. Capacitive Derating Curve
Figure 11. Test Load Circuit for Three-State Output Pins
Figure 12. Test Load Circuit for Open-Drain Output Pins
Figure 13. Drive Levels and Timing Relationships for 80960KA Signals
Table 1. 80960KA Instruction Set
Table 14. 80960KA PQFP Package Thermal Characteristics
The 80960KB is a member of Intel’s i960® 32-bit
processor family, which is designed especially for
embedded applications. It includes a 512-byte
instruction cache, an integrated floating-point unit
and a built-in interrupt controller. The 80960KB has
a large register set, multiple parallel execution units
and a high-bandwidth burst bus. Using advanced
RISC technology, this high performance processor is
capable of execution rates in excess of 9.4 million
instructions per second*. The 80960KB is well-suited
for a wide range of applications including non-impact
printers, I/O control and specialty instrumentation.
The embedded market includes applications as
diverse as industrial automation, avionics, image
processing, graphics and networking. These types of
applications require high integration, low power
consumption, quick interrupt response times and
high performance. Since time to market is critical,
embedded microprocessors need to be easy to use
in both hardware and software designs.
* Relative to Digital Equipment Corporation’s VAX-11/780
at 1 MIPS (VAX-11™ is a trademark of Digital Equipment
Corporation).
All members of the i960 processor family share a
common core architecture which utilizes RISC
technology so that, except for special functions, the
family members are object-code compatible. Each
new processor in the family adds its own special set
of functions to the core to satisfy the needs of a
specific application or range of applications in the
embedded market.
Software written for the 80960KB will run without
modification on any other member of the 80960
Family. The 80960KB is also pin-compatible with the 80960KA
and the 80960MC, which is a military-grade version
that supports multitasking, memory management,
multiprocessing and fault tolerance.
Figure 2. 80960KB Programming Environment (figure not reproduced; the address space runs from 0000 0000H to FFFF FFFFH)
1.1 Key Performance Features
The 80960 architecture is based on the most recent
advances in microprocessor technology and is
grounded in Intel’s long experience in the design and
manufacture of embedded microprocessors. Many
features contribute to the 80960KB’s exceptional
performance:
1. Large Register Set. Having a large number of
registers reduces the number of times that a
processor needs to access memory. Modern
compilers can take advantage of this feature to
optimize execution speed. For maximum flexibility, the 80960KB provides thirty-two 32-bit
registers and four 80-bit floating point registers.
(See Figure 2.)
2. Fast Instruction Execution. Simple functions
make up the bulk of instructions in most
programs so that execution speed can be
improved by ensuring that these core instructions are executed as quickly as possible. The
most frequently executed instructions such as
register-register moves, add/subtract, logical
operations and shifts execute in one to two
cycles. (Table 1 contains a list of instructions.)
3. Load/Store Architecture. One way to improve
execution speed is to reduce the number of
times that the processor must access memory
to perform an operation. As with other
processors based on RISC technology, the
80960KB has a Load/Store architecture. As
such, only the LOAD and STORE instructions
reference memory; all other instructions
operate on registers. This type of architecture
simplifies instruction decoding and is used in
combination with other techniques to increase
parallelism.
4. Simple Instruction Formats. All instructions
in the 80960KB are 32 bits long and must be
aligned on word boundaries. This alignment
makes it possible to eliminate the instruction
alignment stage in the pipeline. To simplify the
instruction decoder, there are only five
instruction formats; each instruction uses only
one format. (See Figure 3.)
5. Overlapped Instruction Execution. Load
operations allow execution of subsequent
instructions to continue before the data has
been returned from memory, so that these
instructions can overlap the load. The
80960KB manages this process transparently
to software through the use of a register scoreboard. Conditional instructions also make use
of a scoreboard so that subsequent unrelated
instructions may be executed while the conditional instruction is pending.
6. Integer Execution Optimization. When the
result of an arithmetic execution is used as an
operand in a subsequent calculation, the value
is sent immediately to its destination register.
Yet at the same time, the value is put on a
bypass path to the ALU, thereby saving the
time that otherwise would be required to
retrieve the value for the next operation.
7. Bandwidth Optimizations. The 80960KB gets
optimal use of its memory bus bandwidth
because the bus is tuned for use with the
on-chip instruction cache: instruction cache
line size matches the maximum burst size for
instruction fetches. The 80960KB automatically
fetches four words in a burst and stores them
directly in the cache. Due to the size of the
cache and the fact that it is continually filled in
anticipation of needed instructions in the
program flow, the 80960KB is relatively insensitive to memory wait states. The benefit is that
the 80960KB delivers outstanding performance
even with a low cost memory system.
8. Cache Bypass. If a cache miss occurs, the
processor fetches the needed instruction then
sends it on to the instruction decoder at the
same time it updates the cache. Thus, no extra
time is spent to load and read the cache.
Table 1. 80960KB Instruction Set
Data Movement    Arithmetic    Logical    Bit and Bit Field
Load Address
Remainder
Not And
And Not
Exclusive Or
Not Or
Or Not
Exclusive Nor
Set Bit
Clear Bit
Not Bit
Check Bit
Alter Bit
Scan For Bit
Scan Over Bit
Modify
Comparison    Branch    Call/Return    Fault
Conditional Compare
Compare and Increment
Compare and Decrement
Unconditional Branch
Conditional Branch
Compare and Branch
Call Extended
Call System
Conditional Fault
Synchronize Faults
Branch and Link
Debug    Miscellaneous    Decimal    Floating Point
Modify Trace Controls
Force Mark
Atomic Add
Atomic Modify
Flush Local Registers
Modify Arithmetic Controls
Scan Byte for Equal
Test Condition Code
Modify Process Controls
Decimal Move
Decimal Add with Carry
Decimal Subtract with Carry
Move Real
Square Root
Log Binary
Log Natural
Copy Real Extended
Synchronous Load
Synchronous Move
Convert Real to Integer
Convert Integer to Real
Figure 3. Instruction Formats (figure not reproduced; field labels: Opcode, Reg, Reg/Lit, Modes, Ext’d Op)
1.1.1 Memory Space And Addressing Modes
The 80960KB offers a linear programming
environment so that all programs running on the
processor are contained in a single address space.
Maximum address space size is 4 Gigabytes (2^32 bytes).
For ease of use the 80960KB has a small number of
addressing modes, but includes all those necessary
to ensure efficient execution of high-level languages
such as C. Table 2 lists the modes.
Table 2. Memory Addressing Modes
• 12-Bit Offset
• 32-Bit Offset
• Register-Indirect
• Register + 12-Bit Offset
• Register + 32-Bit Offset
• Register + (Index-Register x Scale-Factor)
• Register x Scale Factor + 32-Bit Displacement
• Register + (Index-Register x Scale-Factor) +
32-Bit Displacement
• Scale-Factor is 1, 2, 4, 8 or 16
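The modes in Table 2 are all special cases of one sum: base register plus scaled index plus offset. The Python sketch below illustrates that arithmetic only. The function name effective_address and its parameter names are hypothetical, and wrapping the result to 32 bits is an assumption based on the 4-Gigabyte address space rather than a documented behavior.

```python
MASK32 = 0xFFFFFFFF              # assumed: addresses wrap within the 4-Gigabyte space
VALID_SCALES = (1, 2, 4, 8, 16)  # scale factors listed in Table 2


def effective_address(base=0, index=0, scale=1, displacement=0):
    """Return base + index * scale + displacement, truncated to 32 bits.

    Leaving an argument at its default drops that term, which yields the
    simpler modes (for example, register-indirect is base only).
    """
    if scale not in VALID_SCALES:
        raise ValueError(f"scale factor must be one of {VALID_SCALES}")
    return (base + index * scale + displacement) & MASK32


# Register + (Index-Register x Scale-Factor) + 32-Bit Displacement
print(hex(effective_address(base=0x1000, index=3, scale=4, displacement=0x20)))

# Register + 12-Bit Offset
print(hex(effective_address(base=0x2000, displacement=0x7FF)))
```

A scaled index of this kind is what lets a compiler address element i of an array of 4-byte words with a single instruction.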
1.1.2 Data Types
The 80960KB recognizes the following data types:
• 8-, 16-, 32- and 64-bit ordinals
• 8-, 16-, 32- and 64-bit integers
• 32-, 64- and 80-bit real numbers
• Bit Field
• Triple Word (96 bits)
• Quad-Word (128 bits)
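As a quick illustration of the ordinal and integer widths listed above, the following Python sketch prints the value range of each size. It assumes that ordinals are unsigned and that integers are two's-complement signed values; the helper names are hypothetical.

```python
def ordinal_range(bits):
    """Range of an unsigned value of the given width (assumed meaning of 'ordinal')."""
    return 0, (1 << bits) - 1


def integer_range(bits):
    """Range of a two's-complement signed value of the given width (assumed)."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


for bits in (8, 16, 32, 64):
    print(f"{bits:2d}-bit ordinal: {ordinal_range(bits)}  integer: {integer_range(bits)}")

# Triple and quad words are whole multiples of the 32-bit word.
WORD_BITS = 32
print(96 // WORD_BITS, 128 // WORD_BITS)
```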
1.1.3 Large Register Set
The 80960KB programming environment includes a
large number of registers. In fact, 32 registers are
available at any time. The availability of this many
registers greatly reduces the number of memory
accesses required to perform algorithms, which
leads to greater instruction processing speed.
There are two types of general-purpose registers:
local and global. The 20 global registers consist of
sixteen 32-bit registers (G0 through G15) and four
80-bit registers (FP0 through FP3). These registers
perform the same function as the general-purpose
registers provided in other popular microprocessors.
The term global refers to the fact that these registers
retain their contents across procedure calls.
The local registers, on the other hand, are procedure
specific. For each procedure call, the 80960KB
allocates 16 local registers (R0 through R15). Each
local register is 32 bits wide. Any register can also be
used for single or double-precision floating-point
operations; the 80-bit floating-point registers are
provided for extended precision.
1.1.4 Multiple Register Sets
To further increase the efficiency of the register set,
multiple sets of local registers are stored on-chip
(See Figure 4). This cache holds up to four local
register frames, which means that up to three
procedure calls can be made without having to
access the procedure stack resident in memory.
Although programs may have procedure calls nested
many calls deep, a program typically oscillates back
and forth between only two to three levels. As a
result, with four stack frames in the cache, the
probability of having a free frame available on the
cache when a call is made is very high. In fact, runs
of representative C-language programs show that
80% of the calls are handled without needing to
access memory.
If four or more procedures are active and a new
procedure is called, the 80960KB moves the oldest
local register set in the stack-frame cache to a
procedure stack in memory to make room for a new
set of registers. Global register G15 is the frame
pointer (FP) to the procedure stack.
Global and floating point registers are not
exchanged on a procedure call, but retain their
contents, making them available to all procedures for
fast parameter passing.
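The behavior of the four-frame register cache can be modeled in a few lines of Python. This is a minimal sketch, not a description of the hardware: the class name FrameCache is hypothetical, each frame is represented only by a procedure name, and the rule that a spilled frame is reloaded only when a return finds no caller frame on-chip is an assumption (the text describes the spill on a call but not the reload).

```python
from collections import deque


class FrameCache:
    """Toy model of an on-chip cache of local register frames."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.cached = deque()      # frames held on-chip, oldest on the left
        self.spilled = []          # frames moved to the procedure stack in memory
        self.memory_transfers = 0  # frame moves between chip and memory

    def call(self, name):
        # With all frames in use, the oldest frame is moved out to memory.
        if len(self.cached) == self.capacity:
            self.spilled.append(self.cached.popleft())
            self.memory_transfers += 1
        self.cached.append(name)

    def ret(self):
        if not self.cached:
            raise RuntimeError("return with no active procedure")
        self.cached.pop()
        # Assumed reload rule: fetch the caller's frame only when needed.
        if not self.cached and self.spilled:
            self.cached.append(self.spilled.pop())
            self.memory_transfers += 1


cache = FrameCache()
cache.call("main")
for proc in ("a", "b", "c"):       # three nested calls fit on-chip
    cache.call(proc)
print(cache.memory_transfers)      # no memory traffic so far
cache.call("d")                    # a fifth active procedure forces a spill
print(cache.memory_transfers, list(cache.cached), cache.spilled)
```

A program that moves back and forth between two or three call levels never triggers the spill path in this model, which is the effect the text describes.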
1.1.5 Instruction Cache
To further reduce memory accesses, the 80960KB
includes a 512-byte on-chip instruction cache. The
instruction cache is based on the concept of locality
of reference; most programs are not usually
executed in a steady stream but consist of many
branches, loops and procedure calls that lead to
jumping back and forth in the same small section of
code. Thus, by maintaining a block of instructions in
cache, the number of memory references required to
read instructions into the processor is greatly
reduced.
To load the instruction cache, instructions are
fetched in 16-byte blocks; up to four instructions can
be fetched at one time. An efficient prefetch
algorithm increases the probability that an instruction
will already be in the cache when it is needed.
Code for small loops often fits entirely within the
cache, leading to a great increase in processing
speed since further memory references might not be
necessary until the program exits the loop. Similarly,
when calling short procedures, the code for the
calling procedure is likely to remain in the cache so it
will be there on the procedure’s return.
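The cache geometry follows directly from the figures given here: 512 bytes, 16-byte blocks and 32-bit instructions. The Python sketch below works out the capacity and estimates whether a loop could fit. It is a capacity check only; the mapping and replacement policy of the cache are not described in this section, so the helper names and the "may fit" test are assumptions for illustration.

```python
CACHE_BYTES = 512   # on-chip instruction cache size
BLOCK_BYTES = 16    # instructions are fetched in 16-byte blocks
INSTR_BYTES = 4     # every instruction is one 32-bit word

BLOCKS = CACHE_BYTES // BLOCK_BYTES            # 32 blocks
INSTRS_PER_BLOCK = BLOCK_BYTES // INSTR_BYTES  # 4 instructions per block
CAPACITY_INSTRS = CACHE_BYTES // INSTR_BYTES   # 128 instructions in total


def blocks_touched(start_addr, n_instrs):
    """Number of 16-byte blocks spanned by n_instrs (>= 1) instructions."""
    first = start_addr // BLOCK_BYTES
    last = (start_addr + n_instrs * INSTR_BYTES - 1) // BLOCK_BYTES
    return last - first + 1


def loop_may_fit(start_addr, n_instrs):
    """Capacity check only; ignores how blocks are mapped or replaced."""
    return blocks_touched(start_addr, n_instrs) <= BLOCKS


print(BLOCKS, INSTRS_PER_BLOCK, CAPACITY_INSTRS)
print(blocks_touched(0x100, 4), blocks_touched(0x104, 4))
print(loop_may_fit(0x0, 128), loop_may_fit(0x4, 128))
```

The second line of output shows why alignment matters: four instructions that start on a block boundary occupy one block, while the same four instructions starting one word later straddle two.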
1.1.6 Register Scoreboarding
The instruction decoder is optimized in several ways.
One optimization method is the ability to overlap
instructions by using register scoreboarding.
Register scoreboarding occurs when a LOAD moves
a variable from memory into a register. When the
instruction initiates, a scoreboard bit on the target
register is set. Once the register is loaded, the bit is
reset. In between, any reference to the register
contents is accompanied by a test of the scoreboard
bit to ensure that the load has completed before
processing continues. Since the processor does not
need to wait for the LOAD to complete, it can
execute additional instructions placed between the
LOAD and the instruction that uses the register
contents, as shown in the following example:
In essence, the two unrelated instructions between
LOAD and ADD are executed “for free” (i.e., take no
apparent time to execute) because they are
executed while the register is being loaded. Up to
three load instructions can be pending at one time
with three corresponding scoreboard bits set. By
exploiting this feature, system programmers and
compiler writers have a useful tool for optimizing
execution speed.
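The overlap described in this section can be illustrated with a small Python model. It is a sketch under stated assumptions, not a description of the 80960KB pipeline: the four-cycle load latency, the issue rate of one instruction per cycle and the name issue_cycles are hypothetical. Only the scoreboard bit per target register and the limit of three pending loads come from the text.

```python
def issue_cycles(program, load_latency=4, max_pending_loads=3):
    """Return the number of cycles needed to issue every instruction.

    Each instruction is a tuple (op, dest, sources). A LOAD sets a scoreboard
    bit on its destination register; the bit clears load_latency cycles later.
    """
    busy_until = {}   # register -> first cycle at which its scoreboard bit is clear
    cycle = 0         # next cycle in which an instruction may issue
    for op, dest, sources in program:
        issue = cycle
        # A set scoreboard bit on any register the instruction uses stalls it.
        for reg in (*sources, dest):
            issue = max(issue, busy_until.get(reg, 0))
        if op == "load":
            pending = sorted(t for t in busy_until.values() if t > issue)
            if len(pending) >= max_pending_loads:
                # Wait until enough outstanding loads have completed.
                issue = pending[len(pending) - max_pending_loads]
            busy_until[dest] = issue + load_latency
        cycle = issue + 1
    return cycle


# ADD immediately after the LOAD: the ADD waits for the scoreboard bit on r4.
back_to_back = [
    ("load", "r4", ()),
    ("add", "r6", ("r4", "r5")),
]

# Two unrelated instructions placed between the LOAD and the ADD.
overlapped = [
    ("load", "r4", ()),
    ("mov", "r8", ("r9",)),
    ("mov", "r10", ("r11",)),
    ("add", "r6", ("r4", "r5")),
]

print(issue_cycles(back_to_back), issue_cycles(overlapped))
```

In this model both sequences take the same number of cycles, so the two unrelated instructions add no apparent time.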
Figure 4. Multiple Register Sets Are Stored On-Chip
1.1.7 Floating-Point Arithmetic
In the 80960KB, floating-point arithmetic has been
made an integral part of the architecture. Having the
floating-point unit integrated on-chip provides two
advantages. First, it improves the performance of the
chip for floating-point applications, since no
additional bus overhead is associated with
floating-point calculations, thereby leaving more time
for other bus operations such as I/O. Second, the
cost of using floating-point operations is reduced
because a separate coprocessor chip is not
required.
The 80960KB floating-point (real-number) data types
include single-precision (32-bit), double-precision
(64-bit) and extended precision (80-bit) floating-point
numbers. Any registers may be used to execute
floating-point operations.
The processor provides hardware support for both
mandatory and recommended portions of IEEE
Standard 754 for floating-point arithmetic, including
all arithmetic, exponential, logarithmic and other
transcendental functions. Table 3 shows execution
times for some representative instructions.
Table 3. Sample Floating-Point Execution Times
(µs) at 25 MHz
Square Root    3.7    3.9
Arctangent    10.1    13.1
1.1.8 High Bandwidth Local Bus
The 80960KB CPU resides on a high-bandwidth
address/data bus known as the local bus (L-Bus).
The L-Bus provides a direct communication path
between the processor and the memory and I/O
subsystem interfaces. The processor uses the L-Bus
to fetch instructions, manipulate memory and
respond to interrupts. L-Bus features include:
• 32-bit multiplexed address/data path
• Four-word burst capability which allows transfers
from 1 to 16 bytes at a time
• High bandwidth reads and writes with
66.7 MBytes/s burst (at 25 MHz)
Table 4 defines L-Bus signal names and functions;
Table 5 defines other component-support signals
such as interrupt lines.
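The quoted burst rate can be cross-checked with simple arithmetic. The Python sketch below assumes that a four-word burst occupies six clock cycles; that cycle count is not stated in this section and is chosen only because it reproduces the 66.7 MBytes/s figure at 25 MHz. The function name is hypothetical.

```python
def burst_bandwidth_mbytes(clock_mhz, words=4, bytes_per_word=4, cycles_per_burst=6):
    """Bytes moved per burst divided by the time the burst takes, in MBytes/s.

    cycles_per_burst=6 is an assumption, not a figure taken from the text.
    """
    return words * bytes_per_word * clock_mhz / cycles_per_burst


print(round(burst_bandwidth_mbytes(25), 1))   # 66.7

# The same assumption applied to the other speed grades.
for mhz in (16, 20):
    print(mhz, round(burst_bandwidth_mbytes(mhz), 1))
```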
1.1.9 Interrupt Handling
The 80960KB can be interrupted in two ways: by the
activation of one of four interrupt pins or by sending
a message on the processor’s data bus.
The 80960KB is unusual in that it automatically
handles interrupts on a priority basis and can keep
track of pending interrupts through its on-chip
interrupt controller. Two of the interrupt pins can be
configured to provide 8259A-style handshaking for
expansion beyond four interrupt lines.
1.1.10 Debug Features
The 80960KB has built-in debug capabilities. There
are two types of breakpoints and six trace modes.
Debug features are controlled by two internal 32-bit
registers: the Process-Controls Word and the
Trace-Controls Word. By setting bits in these control
words, a software debug monitor can closely control
how the processor responds during program
execution.
The 80960KB provides two hardware breakpoint
registers on-chip which, by using a special
command, can be set to any value. When the
instruction pointer matches either breakpoint register
value, the breakpoint handling routine is automatically called.
The 80960KB also provides software breakpoints
through the use of two instructions: MARK and
FMARK. These can be placed at any point in a
program and cause the processor to halt execution
at that point and call the breakpoint handling routine.
The breakpoint mechanism is easy to use and
provides a powerful debugging tool.
Tracing is available for instructions (single step
execution), calls and returns and branching. Each
trace type may be enabled separately by a special
debug instruction. In each case, the 80960KB
executes the instruction first and then calls a trace
handling routine (usually part of a software debug
monitor). Further program execution is halted until
the routine completes, at which time execution
resumes at the next instruction. The 80960KB’s
tracing mechanisms, implemented completely in
hardware, greatly simplify the task of software test
and debug.
1.1.11 Fault Detection
The 80960KB has an automatic mechanism to
handle faults. Fault types include floating point, trace
and arithmetic faults. When the processor detects a
fault, it automatically calls the appropriate fault
handling routine and saves the current instruction
pointer and necessary state information to make
efficient recovery possible. Like interrupt handling
routines, fault handling routines are usually written to
meet the needs of specific applications and are often
included as part of the operating system or kernel.
For each of the fault types, there are numerous
subtypes that provide specific information about a
fault. For example, a floating point fault may have
the subtype set to an Overflow or Zero-Divide fault.
The fault handler can use this specific information to
respond correctly to the fault.
1.1.12 Built-in Testability
Upon reset, the 80960KB automatically conducts an
exhaustive internal test of its major blocks of logic.
Then, before executing its first instruction, it does a
zero check sum on the first eight words in memory to
ensure that the memory image was programmed
correctly. If a problem is discovered at any point
during the self-test, the 80960KB asserts its
pin and will not begin program execution.
Self test takes approximately 47,000 cycles to
complete.
System manufacturers can use the 80960KB’s
self-test feature during incoming parts inspection. No
special diagnostic programs need to be written. The
test is both thorough and fast. The self-test capability
helps ensure that defective parts are discovered
before systems are shipped and, once in the field,
the self-test makes it easier to distinguish between
problems caused by processor failure and problems
resulting from other causes.
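The zero check sum over the first eight words in memory can be pictured with the following Python sketch. The text says only that the check sum must be zero; treating it as a plain 32-bit modular sum with carries discarded is an assumption, as are the function names and the sample values. It is meant to show how a build tool might choose a check word, not how the processor computes the sum.

```python
MASK32 = 0xFFFFFFFF


def checksum_ok(words):
    """True if the eight words sum to zero modulo 2**32 (assumed rule)."""
    if len(words) != 8:
        raise ValueError("the check covers exactly eight words")
    return (sum(words) & MASK32) == 0


def check_word(other_words):
    """Value that makes the total sum zero, given the seven other words."""
    return (-sum(other_words)) & MASK32


# Seven arbitrary example words (hypothetical values, not a real memory image).
image = [0x00000180, 0x00000400, 0x00000000, 0x00008000, 0, 0, 0]
image.append(check_word(image))

print(hex(image[-1]), checksum_ok(image))
```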
1.1.13 CHMOS
The 80960KB is fabricated using Intel’s CHMOS IV
(Complementary High Speed Metal Oxide Semiconductor) process. The 80960KB is currently available
in 16, 20 and 25 MHz versions.
Table 4. 80960KB Pin Description: L-Bus Signals (Sheet 1 of 2)
CLK2    I    SYSTEM CLOCK provides the fundamental timing for 80960KB systems. It is
divided by two inside the 80960KB to generate the internal processor clock.
LOCAL ADDRESS / DATA BUS carries 32-bit physical addresses and data to
and from memory. During an address (Ta) cycle, bits 2-31 contain a physical word
address (bits 0-1 indicate SIZE; see below). During a data (Td) cycle, bits 0-31
contain read or write data. These pins float to a high impedance state when not
Bits 0-1 comprise SIZE during a Ta cycle. SIZE specifies burst transfer size in
words:
00    1 Word
01    2 Words
10    3 Words
11    4 Words
READY    I    READY indicates that data on LAD lines can be sampled or removed. If READY
is not asserted during a Td cycle, the Td cycle is extended to the next cycle by
inserting a wait state (Tw) and ADS is not asserted in the next cycle.
ADDRESS LATCH ENABLE indicates the transfer of a physical address. ALE is
asserted during a Ta cycle and deasserted before the beginning of the Td state. It
is active LOW and floats to a high impedance state during a hold cycle (Th).
ADDRESS/DATA STATUS indicates an address state. ADS is asserted every Ta
state and deasserted during the following Td state. For a burst transaction, ADS is
asserted again every Td state where READY was asserted in the previous cycle.
WRITE/READ specifies, during a Ta cycle, whether the operation is a write or
read. It is latched on-chip and remains valid during Td cycles.
DATA TRANSMIT / RECEIVE indicates the direction of data transfer to and from
the L-Bus. It is low during Ta and Td cycles for a read or interrupt
acknowledgment; it is high during Ta and Td cycles for a write. DT/R never changes state
when DEN is asserted.
DATA ENABLE (active low) enables data transceivers. The processor asserts
DEN# during all Td and Tw states. The DEN# line is an open-drain output of the
processor.
BUS LOCK prevents bus masters from gaining control of the L-Bus during
Read/Modify/Write (RMW) cycles. The processor or any bus agent may assert
LOCK.
At the start of a RMW operation, the processor examines the LOCK pin. If the pin
is already asserted, the processor waits until it is not asserted. If the pin is not
asserted, the processor asserts LOCK during the Ta cycle of the read transaction.
The processor deasserts LOCK in the Ta cycle of the write transaction. During the
time LOCK is asserted, a bus agent can perform a normal read or write but not a
RMW operation.
The processor also asserts LOCK during interrupt-acknowledge transactions.
Do not leave LOCK unconnected. It must be pulled high for the processor to
function properly.
I/O = Input/Output, O = Output, I = Input, O.D. = Open Drain, T.S. = Three-state
ERRATA - 6/13/97
pin description omitted.
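The address-cycle encoding in Table 4 (bits 2-31 carry the word address, bits 0-1 carry SIZE, with SIZE values 00 through 11 selecting 1 to 4 words) can be decoded as sketched below in Python. The function name decode_address_cycle is hypothetical, and returning the address as a byte address with the two low bits cleared is a presentation choice, not something the table specifies.

```python
# SIZE field (bits 0-1 during a Ta cycle) -> number of words in the burst
BURST_WORDS = {0b00: 1, 0b01: 2, 0b10: 3, 0b11: 4}


def decode_address_cycle(lad):
    """Split a 32-bit value seen on the LAD lines during Ta into (address, words)."""
    if not 0 <= lad <= 0xFFFFFFFF:
        raise ValueError("LAD value must fit in 32 bits")
    size = lad & 0b11                  # bits 0-1: SIZE
    word_address = lad & 0xFFFFFFFC    # bits 2-31: word address, low bits cleared
    return word_address, BURST_WORDS[size]


address, words = decode_address_cycle(0x00001003)
print(hex(address), words, words * 4)   # address, words in burst, bytes in burst
```

A four-word request therefore moves 16 bytes, which matches the burst capability listed for the L-Bus.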
