Hynix offices in Korea and the distributors and representatives listed in the address
directory can provide additional information on this manual.
Hynix reserves the right to make changes to any information herein at any time without
notice.
The information, diagrams, and other data in this manual are correct and reliable;
however, Hynix is in no way responsible for any violations of patents or other rights of
third parties arising from the use of this manual.
Specifications and information in this document are subject to change without notice and do
not represent a commitment on the part of Hynix. Hynix reserves the right to make changes
to improve functioning. Although the information in this document has been carefully
reviewed, Hynix does not assume any liability arising out of the use of the product or circuit
described herein.
Hynix does not authorize the use of the Hynix microprocessor in life support applications
wherein a failure or malfunction of the microprocessor may directly threaten life or cause
injury. The user of the Hynix microprocessor in life support applications assumes all risks of
such use and indemnifies Hynix against all damages.
For further information please contact:
SEOUL OFFICE : Hynix YOUNG DONG Bldg.
891, Daechi-dong, Kangnam-gu,
Seoul, Korea.
PHONE : (02) 3459-3662~3
FAX : (02) 3459-3942
SYSTEM IC : 1, Hyangjeong-dong, Hungduk-gu,
Cheongju, 361-725, Korea.
PHONE : (0431) 270-4030~47
FAX : (0431) 270-4075
Copyright 2001 Hynix Semiconductor Inc.
Revision Jun. 29, 2001.
The HME GMS30C2232 and GMS30C2216 RISC/DSP are improved versions of
HME’s existing GMS30C2132 and GMS30C2116 RISC/DSP. Using a 0.35 µm CMOS
technology, the performance of the RISC/DSP could be further improved. Being pin-compatible
with their predecessors, these new RISC/DSP can be used as direct replacements
in existing customer designs.
The GMS30C2216 and GMS30C2232 RISC/DSP are based on the hyperstone architecture.
• On-chip DRAM controller: FPM (Fast-Page-Mode) and EDO (Extended-Data-Out) DRAMs.
• 5.0V Tolerant Input
• Control CLKOUT pin Function
This combination of a high-performance RISC microprocessor with an additional powerful
DSP instruction set and on-chip microcontroller functions offers a high throughput. The
speed is obtained by an optimized combination of the following features:
• Pipelined memory access allows overlapping of memory accesses with execution.
• 8KByte on-chip memory.
• On-chip instruction cache omits instruction fetch in inner loops and provides prefetch.
• Variable-length instructions of 16, 32 or 48 bits provide a large, powerful instruction set,
thereby reducing the number of instructions to be executed.
• Primarily used 16-bit instructions halve the memory bandwidth required for instruction
fetch in comparison to conventional RISC architectures with fixed-length 32-bit
instructions, also yielding even better code economy than conventional CISC
architectures.
• Fast Call and Return by parameter passing via registers.
• An instruction pipeline depth of only two stages — decode/execute — provides
branching without insertion of wait cycles in combination with Delayed Branch
instructions.
• Range and pointer checks are performed without speed penalty, thus, these checks need
no longer be turned off, thereby providing higher runtime reliability.
• Separate address and data buses provide a throughput of one 32-bit word each cycle.
The features noted above help reduce the number of idle wait cycles to a bare
minimum. The processor is designed to sustain its execution rate with a standard DRAM
memory.
The low power consumption is an advantage for mobile (portable) applications or in
temperature-sensitive environments.
Most of the transistors are used for the on-chip memory, the instruction cache, the register
stack and the multiplier, whereas only a small number is required for the control logic.
Due to their low system cost, the GMS30C2216 and GMS30C2232 RISC/DSP are very
well suited for embedded-systems applications requiring high performance and lowest cost.
To simplify board design as well as to reduce system costs, the GMS30C2216 and
GMS30C2232 already come with integrated periphery, such as a timer and memory and
bus control logic. Therefore, complete systems with the HME’s microprocessor can be
implemented with a minimum of external components. To connect any kind of memory or
I/O, no glue logic is necessary. It is even suitable for systems where up to now
microprocessors with 16-bit architecture have been used for cost reasons. Its improved
performance compared to conventional microcontrollers can be used to software-substitute
many external peripherals like graphics controllers or DSPs.
The software development tools include an optimizing C compiler, assembler, source-level
debugger with profiler as well as a real-time kernel with an extremely fast response time.
Using this real-time kernel, up to 31 tasks, each with its own virtual timer, can be
developed independently of each other. The synchronization of these tasks is effected
almost automatically by the real-time kernel. To the developer, it seems as if he has up to
31 HME’s microprocessors to which he can allocate his programs accordingly. Real-time
debugging of multiple tasks is assisted in an optimized way.
The following description gives a brief architectural overview:
Compatibility:
• Pin compatible to HME GMS30C2116/32, and hyperstone E1-16/32
• Pin and Function Compatible to hyperstone E1-16/32X
PLL (Phase-Locked Loop):
• An internal phase-locked loop circuit (PLL) provides clock rate multiplication by a
factor of four; only an external crystal of 27MHz is required to achieve an internal clock
rate of 108MHz.
Registers:
• 32 global and 64 local registers of 32 bits each
• 16 global and up to 16 local registers are addressable directly
Flags:
• Zero(Z), negative(N), carry(C) and overflow(V) flag
• Compare bits, Compare bits immediate, Compare any byte zero
• Test number of leading zeros
• Set Conditional, save conditions in a register
• Branch unconditional and conditional (12 conditions)
• Delayed Branch unconditional and conditional (12 conditions)
• Call subprogram, unconditional and on overflow
• Trap to supervisor subprogram, unconditional and conditional (11 conditions)
• Frame, structure a new stack frame, include parameters in frame addressing, set frame
length, restore reserve frame length and check for upper stack bound
• Return from subprogram, restore program counter, status register and return-frame
• Software instruction, call an associated subprogram and pass a source operand and the
address of a destination operand to it
• DSP Multiply instructions:
signed and/or unsigned multiplication ⇒ single and double word product
• DSP Multiply-Accumulate instructions:
signed multiply-add and multiply-subtract ⇒ single and double word product sum and
difference
• DSP Halfword Multiply-Accumulate instructions:
signed multiply-add operating on four halfword operands ⇒ single and double word
product sum
• DSP Complex Halfword Multiply instruction:
signed complex halfword multiplication ⇒ real and imaginary single word product
• DSP Complex Halfword Multiply-Accumulate instruction:
signed complex halfword multiply-add ⇒ real and imaginary single word product sum
• DSP Add and Subtract instructions:
signed halfword add and subtract with and without fixed-point adjustment ⇒ single
word sum and difference
• Floating-point instructions are architecturally fully integrated; they are executed as
Software instructions by the present version. Floating-point Add, Subtract, Multiply,
Divide, Compare and Compare unordered for single and double-precision, and Convert
single ⇔ double are provided.
Exceptions:
• Pointer, Privilege, Frame and Range Error, Extended Overflow, Parity Error, Interrupt
and Trace mode exception
• Watchdog function
• Error-causing instructions can be identified by backtracking, thus allowing a very
detailed error analysis
Timer:
• Two multifunctional timers
Bus Interface:
• Separate address bus of 26 (GMS30C2232) or 22 (GMS30C2216) bits and data bus of
up to 32 (GMS30C2232) or 16 bits (GMS30C2216) provide a throughput of four or
two bytes at each clock cycle
• Data bus width of 32, 16 or 8 bits, individually selectable for each external memory area.
• 8-bit, 16-bit, and 32-bit boot width selectable via two external pins.
• 5V tolerant input
• Configurable I/O pins
• Internal generation of all memory and I/O control signals
• Wait pin function for I/O accesses to peripheral devices.
• Wait pin function for memory accesses to address space MEM2.
• On-chip DRAM controller supporting Fast-Page-Mode DRAMs and EDO DRAMs.
• Up to seven vectored interrupts
• Control function for CLKOUT pin.
Power Management:
• Operating voltage : 3.3V ± 0.3V.
• Lower power supply current in power-down mode.
• Clock-Off function to further reduce power dissipation (Sleep Mode)
0.2 Block Diagram
[Block diagram showing: Bus Interface Control Unit with Data Bus Parity and Bus Pipeline Control; 8 kByte RAM; Instruction Cache with Instruction Cache Control and Instruction Prefetch Control Unit; Instruction Load Decode and Instruction Decode; Register Set (local and global registers) with X-Decode and Y-Decode; Execution Unit with ALU, Barrel Shifter, Hardware Multiplier and DSP Instruction Control Unit; Memory Address Pipeline and Store Data Pipeline; Internal Timer; Interrupt Control; Watchdog; Power-Down and Reset Control. External connections: address bus, 26 (22) bits; data bus, 32 (16) bits; control bus.]
Figure 0.1: Block Diagram
0.3 Pin Configuration
0.3.1 GMS30C2232, 160-Pin MQFP-Package - View from Top Side
[Figure: pin-out diagram of the 160-pin MQFP package, top view.]
0.3.3 Pin Function
Type            Name          State  Use
Power           VCC           I      Power. Connected to the power supply. It can be 3.3V power supply.
                GND           I      Ground. Connected to the system ground. All GND pins must be connected to the system ground.
Clock           XTAL1         I      Input for Quartz Clock. When the clock is generated by an external clock generator, XTAL1 is used as clock input.
                XTAL2         O      Output for Quartz Clock.
                CLKOUT        O      Clock Signal Output. It can be used to supply a clock signal to peripheral devices.
Address Bus     A25..A0       O/Z    Address Bus. With the GMS30C2232, only A22..A0 are connected to the address bus pins.
Data Bus        D31..D0       I/O    Data Bus. 32-bit bidirectional data bus.
                DP0..DP3      I/O    Data Parity Signal. Bidirectional parity signals.
Bus Control     RAS#          O/Z    Row Address Strobe. RAS# is activated when the processor accesses a DRAM or performs a refresh cycle. When an SRAM is placed in MEM0, RAS# is used as the chip select signal.
                CAS0#..CAS3#  O/Z    Column Address Strobe. They are only used by a DRAM for column access cycles and for "CAS before RAS" refresh.
                WE#           O/Z    Write Enable. Active low indicates a write access, active high indicates a read access.
                CS1#..CS3#    O/Z    Chip Select. Active low of CS1#..CS3# indicates chip select for the memory areas MEM1..MEM3.
                WE0#..WE3#    O/Z    SRAM Write Enable. Active low indicates write enable for the corresponding byte.
                OE#           O/Z    Output Enable for SRAMs and EPROMs.
                IORD#         O/Z    I/O Read Strobe, optionally I/O Data Strobe. The use of IORD# is specified in the I/O address bit 10.
                IOWR#         O/Z    I/O Write Strobe.
Bus Control     RQST          O      RQST signals the request for a memory or I/O access.
                GRANT#        I      Bus Grant. GRANT# is signaled low by a bus arbiter to grant access to the bus for memory and I/O cycles.
                ACT           O      Active as bus master. ACT is signaled high when GRANT# is low and it is kept high during a current bus access.
Interrupt       INT1..INT4    I      Interrupt Request. A signal on the INT1..INT4 interrupt request pins causes an interrupt exception when the interrupt lock flag L is clear and the corresponding INTxMask bit in the FCR is not set.
I/O Port        IO1..IO3      I/O    General Input-Output Port. IO1..IO3 can be individually configured via the IOxDirection bits in the FCR as either input or output pins (port).
System Control  RESET#        I      Reset Processor. RESET# low resets the processor to the initial state and halts all activity. RESET# must be low for at least two cycles.
1. Architecture
1.1 Introduction
1.1.1 RISC Architecture
In the early days of computer history, most computer families started with an instruction
set which was rather simple. The main reason for being simple then was the high cost for
hardware. The hardware cost has dropped and the software cost has gone up steadily in the
past three decades.
The net result is that more and more functions have been built into the hardware, making
the instruction set very large and very complex. The growth of instruction sets was also
encouraged by the popularity of microprogrammed control in the 1960s and 1970s. Even
user-defined instruction sets were implemented using microcodes in some processors for
special-purpose applications.
The evolution of computer architectures has been dominated by families of increasingly
complex processors. Under market pressures to preserve existing software, Complex Instruction Set Computer (CISC) architectures evolved by the gradual addition of
microcode and increasingly elaborate operations. The intent was to supply more support
for high-level languages and operating systems, as semiconductor advances made it
possible to fabricate more complex integrated circuits. It seemed self-evident that
architectures should become more complex as these technological advances made it
possible to hold more complexity on VLSI devices.
In recent years, however, Reduced Instruction Set Computer (RISC) architectures have
implemented a much more sophisticated handling of the complex interaction between
hardware, firmware and software. RISC concepts emerged from statistical analysis of how
software actually uses the resources of a processor. Dynamic measurements of system
kernels and object modules generated by optimizing compilers show an overwhelming
predominance of the simplest instructions, even in the code for CISC machines. Complex
instructions are often ignored because a single way of performing a complex operation
rarely fits the needs of high-level language and system environments. RISC designs eliminate the
microcoded routines and turn the low-level control of the machine over to software.
This approach is not new, but its application has become more universal in recent years thanks to the
prevalence of high-level languages, the development of compilers that can optimize at the
microcode level, and dramatic advances in semiconductor memory and packaging. It is
now feasible to replace machine microcode ROM with faster RAM, organized as an
instruction cache. Machine control then resides in the instruction cache and is, in fact,
customized on the fly. The instruction stream generated by system- and compiler-generated
code provides a precise fit between the requirements of high-level software and the
capabilities of the hardware. Compilers therefore play a vital role in RISC performance.
The advantages of the RISC architecture are as follows:
• Simplicity makes VLSI implementation possible and thus permits higher clock rates.
• Hardwired control and separate data and program caches lower the average CPI
(Cycles per Instruction) significantly.
• The dynamic instruction count of a RISC program increases only slightly (by a factor of
less than 2) for an ordinary program.
• The MIPS (Million Instructions per Second) rate of a typical RISC
microprocessor has increased by a factor of 5/(2*0.1) = 25 over that of a typical
CISC microprocessor.
• The clock rate has increased from 10 MHz on a CISC processor to 50 MHz on a CMOS/RISC
microprocessor.
• The instruction count of a typical RISC program is less than 2 times that of
a typical CISC program.
• The average CPI of a RISC microprocessor has decreased to 1.2 (instead of 12 as in a
typical CISC processor).
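The factor of 25 quoted in the list combines the three ratios given there (clock rate, instruction count and CPI). The following Python lines are only a sketch of that arithmetic; the function name `speedup` is illustrative and does not come from this manual.

```python
def speedup(clock_ratio, instruction_ratio, cpi_ratio):
    """Relative gain, from: time = instruction count * CPI / clock rate."""
    return clock_ratio / (instruction_ratio * cpi_ratio)

clock_ratio = 50e6 / 10e6   # 50 MHz RISC versus 10 MHz CISC
instruction_ratio = 2.0     # RISC program executes at most about 2x the instructions
cpi_ratio = 1.2 / 12        # average CPI of 1.2 versus 12

factor = speedup(clock_ratio, instruction_ratio, cpi_ratio)
print(round(factor, 3))     # approximately 25
```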
1.1.2 Techniques to reduce CPI (Cycles per Instruction)
If the work each instruction performs is simple and straightforward, the time required to
execute each instruction can be shortened and the number of cycles reduced. The goal of
RISC designs has been to achieve an execution rate of one instruction per machine cycle
(multiple-instruction-issue designs now seek to increase this rate to more than one
instruction per cycle). Techniques that help achieve this goal include:
• Instruction pipelines
• Load and store (load/store) architecture
• Delayed load instructions
• Delayed branch instructions
(1) Instruction Pipelines
One way to reduce the number of cycles required to execute an instruction is to overlap the
execution of multiple instructions. Instruction pipelines divide the execution of each
instruction into several discrete portions and then execute multiple instructions
simultaneously. The instruction pipeline technique can be likened to an assembly line: the instruction progresses from one specialized stage to the next until it is complete (or
issued), just as an automobile moves along an assembly line. (This is in contrast to the
nonpipeline, microcode approach, where all the work is done by one general unit and is
less capable at each individual task.) For example, the execution of an instruction might be
subdivided into four portions, or clock cycles, as shown in Figure 1.1:
Cycle #1               Cycle #2            Cycle #3            Cycle #4
Fetch Instruction (F)  ALU Operation (A)   Access Memory (M)   Write Results (W)
Figure 1.1: Functional Division of a Hypothetical Pipeline
An instruction pipeline can potentially reduce the number of cycles/instruction by a factor
equal to the depth of the pipeline (the depth of the pipeline = the number of resources). For
example, in Figure 1.2 each instruction still requires a total of four clock cycles to execute.
However, if the four-level instruction pipeline is used, a new instruction can be initiated at
each clock cycle and the effective execution rate is one cycle per instruction.
Clock Cycle         1  2  3  4  5  6  7
Instruction #1      F  A  M  W
Instruction #2         F  A  M  W
Instruction #3            F  A  M  W
Instruction #4               F  A  M  W
Figure 1.2: Multiple Instructions in a Hypothetical Pipeline
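The overlap of the hypothetical four-stage pipeline can be sketched in a few lines of Python. This is a minimal model under the assumption that every stage takes exactly one cycle and nothing stalls; the names `pipeline_chart` and `total_cycles` are illustrative only.

```python
STAGES = ["F", "A", "M", "W"]  # Fetch, ALU operation, Access memory, Write results

def total_cycles(n_instructions, depth=len(STAGES), pipelined=True):
    """Cycles needed to finish n instructions with or without overlap."""
    if pipelined:
        return depth + n_instructions - 1
    return depth * n_instructions

def pipeline_chart(n_instructions, stages=STAGES):
    """One row per instruction; row[c] is the stage active in clock cycle c + 1."""
    width = total_cycles(n_instructions, len(stages))
    rows = []
    for i in range(n_instructions):
        row = ["."] * width
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i enters the pipeline in cycle i + 1
        rows.append(row)
    return rows

for row in pipeline_chart(4):
    print(" ".join(row))
print(total_cycles(4), "cycles pipelined,", total_cycles(4, pipelined=False), "without")
```

For a long instruction stream, the pipelined cycle count divided by the number of instructions approaches one, which is the effective rate described in the text.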
(2) Load/Store Architecture
The discussion of the instruction pipeline illustrates how each instruction can be
subdivided into several discrete parts that permit the processor to execute multiple
instructions in parallel. For this technique to work efficiently, the time required to execute
each instruction subpart should be approximately equal. If one part requires an excessive
length of time, there is an unpleasant choice: either halting the pipeline (inserting wait or
idle cycles), or making all cycles longer to accommodate this lengthier portion of the
instruction.
Instructions that perform operations on operands in memory tend to increase either the
cycle time or the number of cycles/instruction. Such instructions require additional time for
execution to calculate the addresses of the operands, read the required operands from
memory, calculate the result, and store the results of the operation back to memory. To
eliminate the negative impact of such instructions, RISC designs implement a load and store
(load/store) architecture in which the processor has many registers, all operations are
performed on operands held in processor registers, and main memory is accessed only by
load and store instructions.
This approach produces several benefits:
• Reducing the number of memory accesses eases memory bandwidth requirements
• Limiting all operations to registers helps simplify the instruction set
• Eliminating memory operations makes it easier for compilers to optimize register
allocation - this further reduces memory accesses and also reduces the instructions/task
factor
All of these factors help RISC designs approach their goal of one
cycle/instruction. However, two classes of instructions hinder achievement of this goal: load instructions and branch instructions. The following sections discuss how RISC
designs overcome obstacles raised by these classes of instructions.
(3) Delayed Load Instructions
Load instructions read operands from memory into processor registers for subsequent
operation by other instructions. Because memory typically operates at much slower speeds
than processor clock rates, the loaded operand is not immediately available to subsequent
instructions in an instruction pipeline. The data dependency is illustrated in Figure 1.3.
Instruction 1 (Load)   F  A  M  W
Instruction 2             F  A  M  W
Instruction 3                F  A  M  W    <= Data from Load available as operand
Instruction 4                   F  A  M  W
Figure 1.3: Data Dependency Resulting From a Load Instruction
In this illustration, the operand loaded by instruction 1 is not available for use in the A
cycle (ALU, or Arithmetic/Logic Unit operation) of instruction 2. One way to handle this
dependency is to delay the pipeline by inserting additional clock cycles into the execution
of instruction 2 until the loaded data becomes available. This approach obviously
introduces delays that would increase the cycles/instruction factor.
In many RISC designs, the technique used to handle this data dependency is to recognize
and make visible to compilers the fact that all load instructions have an inherent latency or
load delay. Figure 1.3 illustrates a load delay or latency of one instruction. The instruction
that immediately follows the load is in the load delay slot. If the instruction in this slot does
not require the data from the load, then no pipeline delay is required.
If this load delay is made visible to software, a compiler can arrange instructions to ensure
that there is no data dependency between a load instruction and the instruction in the load delay slot.
The simplest way of ensuring that there is no data dependency is to insert a No Operation
(NOP) instruction to fill the slot, as follows:
Load R1, A
Load R2, B
NOP <= This instruction fills the delay slot
ADD R3, R1, R2
Although filling the delay slot with NOP instructions eliminates the need for hardware-controlled
pipeline stalls in this case, it still is not a very efficient use of the pipeline stream
since these additional NOP instructions increase code size and perform no useful work. (In
practice, however, this technique need not have much negative impact on performance.)
A more effective solution to handling the data dependency is to fill the load delay slot with
a useful instruction. Good optimizing compilers can usually accomplish this, especially if
the load delay is only one instruction. The example program below illustrates how a compiler
might rearrange instructions to handle a potential data dependency.
# Consider the code for C := A+B; F := D
Load R1, A
Load R2, B
Add R3, R1, R2 <= This instruction stalls because R2 data is not available
Load R4, D
..... ....
# An alternative code sequence (where delay length = 1)
Load R1, A
Load R2, B
Load R4, D
Add R3, R1, R2 <= No stall since R2 data is available
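The effect of the reordering can be checked with a small Python model. It is only a sketch: it assumes a load delay of one instruction as in the example, treats every other instruction as having no latency, and the tuple format and the name `count_stalls` are illustrative rather than part of any tool described in this manual.

```python
LOAD_DELAY = 1  # number of delay slots after a load, as in the example

def count_stalls(program, load_delay=LOAD_DELAY):
    """Count stall cycles caused by using a loaded register too early.

    program: list of (opcode, destination register, source registers).
    """
    stalls = 0
    ready_at = {}  # register -> first issue slot in which its loaded value is usable
    slot = 0       # issue slot of the current instruction
    for opcode, dest, sources in program:
        earliest = max([ready_at.get(r, 0) for r in sources], default=0)
        if earliest > slot:            # a source is still being loaded: stall
            stalls += earliest - slot
            slot = earliest
        if opcode == "Load":
            ready_at[dest] = slot + 1 + load_delay
        slot += 1
    return stalls

naive = [
    ("Load", "R1", []),            # Load R1, A
    ("Load", "R2", []),            # Load R2, B
    ("Add",  "R3", ["R1", "R2"]),  # Add R3, R1, R2 (R2 not yet available)
    ("Load", "R4", []),            # Load R4, D
]
reordered = [naive[0], naive[1], naive[3], naive[2]]

print(count_stalls(naive), count_stalls(reordered))
```

In this model the original order costs one stall cycle, while the rearranged order fills the load delay slot with the independent load of D and needs none.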
(4) Delayed Branch Instructions
Branch instructions usually delay the instruction pipeline because the processor must
calculate the effective destination of the branch and fetch that instruction. When a cache
access requires an entire cycle, and the fetched branch instruction specifies the target
address, it is impossible to perform this fetch (of the destination instruction) without
delaying the pipeline for at least one pipe stage (one cycle). Conditional branches can
cause further delays because they require the calculation of a condition, as well as the
target address.
Instead of stalling the instruction pipeline to wait for the instruction at the target address,
RISC designs typically use an approach similar to that used with Load instructions: Branch
instructions are delayed and do not take effect until after one or more instructions
immediately following the Branch instruction have been executed. The instruction or
instructions immediately following the Branch instruction are called delay instructions.
Branch and delayed branch instructions are illustrated in Figure 1.4.
[Two flowcharts. Branch Instruction: Condition? YES -> Branch Target; NO -> Next Instruction. Delayed Branch Instruction: Condition? YES -> Delay Instruction -> Branch Target; NO -> Next Instruction.]
Figure 1.4: Block Diagram of Branch/Delayed Branch Instruction
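The difference between a plain branch and a delayed branch can be illustrated with a toy interpreter. This is a hedged sketch, not a model of the GMS30C2232: it handles only an unconditional, taken branch with a single delay instruction, and the program format and the function `run` are assumptions made for the example.

```python
def run(program, delayed):
    """Execute a toy program and return the labels of the executed instructions.

    program: list of (label, op, arg); op is "op" (plain instruction),
    "br" (branch to index arg) or "halt".
    delayed: if True, a branch takes effect only after the following
    instruction (the delay instruction) has been executed.
    """
    trace, pc, pending = [], 0, None
    while pc < len(program):
        label, op, arg = program[pc]
        trace.append(label)
        next_pc = pc + 1
        if pending is not None:   # the delay instruction has just executed
            next_pc, pending = pending, None
        if op == "halt":
            break
        if op == "br":
            if delayed:
                pending = arg     # branch takes effect one instruction later
            else:
                next_pc = arg     # branch takes effect immediately
        pc = next_pc
    return trace

PROGRAM = [
    ("branch", "br", 3),
    ("delay", "op", None),
    ("skipped", "op", None),
    ("target", "op", None),
    ("end", "halt", None),
]

print(run(PROGRAM, delayed=False))
print(run(PROGRAM, delayed=True))
```

With `delayed=True` the instruction following the branch is executed before control reaches the branch target, so the slot after the branch does useful work instead of being lost to a pipeline stall.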
1.1.3 The pipeline structure of GMS30C2232
The GMS30C2232 has a two-stage pipeline structure and each stage is composed of two phases
(TM and TV). Although the basic structure is a two-stage pipeline, it is
lengthened when an instruction requires it. For example, a standard ALU
instruction uses 5 phases (the 2-stage pipeline (4 phases) plus 1 additional phase). This additional
phase does not use the datapath needed by the next instruction, so the next instruction
need not wait until the previous ALU instruction has ended. A DSP instruction takes more than the 2-stage
pipeline to execute and requires the same datapath resources as the next
DSP instruction, so the next DSP instruction is delayed.
The pipeline structure of the GMS30C2232 and the actions of the datapath are described in Table 1.1.
Stage          Phase      Datapath Action
Fetch/Decode   TM (Low)   1. The instruction is read from the instruction cache according to the address of the instruction.
               TV (High)  2. The control signals of Rd (destination operand) and Rs (source operand) are activated according to the instruction that was loaded in the TM phase.
                          2.1 The control signals of IR (immediate register (operand)) and IL (instruction length) are activated.
                          2.2 The address of the next instruction is calculated and saved in the PC.
Execute/Write  TM (Low)   1. The next instruction is read from the instruction cache.
                          1.1 The addresses of Rs and Rd are determined.
                          1.2 The immediate operand is determined.
                          1.3 The operands are read from the register stack using the addresses of Rs and Rd.
                          1.4 The operands XR, YR and QR are controlled.
               TV (High)  2. The input data of the ALU is attained.
                          2.1 The control of the ALU datapath is made and the instruction is executed in the ALU.
                          2.2 The result of the ALU operation is saved in the register file.
Additional     Next TM    The additional ALU operation is continued and its result is saved in the register file.
Insertion
Table 1.1: The pipeline structure of GMS30C2232 and the action of datapath.
1.2 Global Register Set
The architecture provides 32 global registers of 32 bits each. These are:
G0 Program Counter PC
G1 Status Register SR
G2 Floating-point Exception Register FER
G3..G15 General purpose registers
G16..G17 Reserved
G18 Stack Pointer SP
G19 Upper stack Bound UB
G20 Bus Control Register BCR (see section 6. Bus Interface)
G21 Timer Prescaler Register TPR (see section 5. Timer and CPU Clock Modes)
G22 Timer Compare Register TCR (see section 5. Timer and CPU Clock Modes)
G23 Timer Register TR (see section 5. Timer and CPU Clock Modes)
G24 Watchdog Compare Register WCR (see section 6. Bus Interface)
G25 Input Status Register ISR (see section 6. Bus Interface)
G26 Function Control Register FCR (see section 6. Bus Interface)
G27 Memory Control Register MCR (see section 6. Bus Interface)
G28..G31 Reserved
Registers G0..G15 can be addressed directly by the register code (0..15) of an instruction.
Registers G18..G27 can be addressed only by a MOV or MOVI instruction with the high
global flag H set to 1.
(Example)
MOVI G2, 0x20 ; G2 := 0x20 (set H flag)
MOV G3, G19 ; G3 := G19 (G19 (UB) is copied to G3)
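The register selection that the H flag controls can be summarised in a short Python sketch. It relies on the rule given in the description of the H flag in the status register (with H set, denoting G0..G15 addresses G16..G31); the function and table names are illustrative and not part of the instruction set.

```python
def global_register_index(register_code, h_flag):
    """Map a 4-bit register code to a global register number.

    With H clear, codes 0..15 select G0..G15; with H set they select G16..G31.
    """
    if not 0 <= register_code <= 15:
        raise ValueError("register code must be in 0..15")
    return register_code + (16 if h_flag else 0)

# Names of the high global registers, taken from the register list above.
HIGH_GLOBAL_NAMES = {18: "SP", 19: "UB", 20: "BCR", 21: "TPR", 22: "TCR",
                     23: "TR", 24: "WCR", 25: "ISR", 26: "FCR", 27: "MCR"}

for code in range(2, 12):
    number = global_register_index(code, h_flag=True)
    print(f"code G{code} with H set -> G{number} ({HIGH_GLOBAL_NAMES[number]})")
```

Under this mapping, register code 3 with H set selects G19, the upper stack bound UB.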
[Register map of the 32 global registers G0..G31, each 32 bits wide (bits 31..0), labelled as in the list above.]
Figure 1.5: Global Register Set
1.2.1 Program Counter PC, G0
G0 is the program counter PC. It is updated to the address of the next instruction through
instruction execution. Besides this implicit updating, the PC can also be addressed like a
regular source or destination register. When the PC is referenced as an operand, the
supplied value is the address of the first byte after the instruction which references it (the
address of the next instruction), except when referenced by a delay instruction with a
preceding delayed branch taken. For a Delayed Branch instruction, when the branch condition is
met, the branch address PC + rel (relative to the address of the first byte after the
Delayed Branch instruction) is placed in the PC (see section 1.26. Delayed Branch Instructions).
Placing a result in the PC has the effect of a branch taken. When a branch is taken, the target
address of the branch is placed in the PC.
Bit zero of the PC is always zero, regardless of any value placed in the PC.
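Two of the rules above lend themselves to a brief Python illustration: the value supplied when the PC is read as an operand, and the forcing of bit zero when a value is placed in the PC. The sketch ignores the delayed-branch exception mentioned in the text, and the helper names are hypothetical.

```python
def pc_operand(instruction_address, instruction_length_bits):
    """Value supplied when the PC is referenced as an operand:
    the address of the first byte after the referencing instruction."""
    if instruction_length_bits not in (16, 32, 48):
        raise ValueError("instructions are 16, 32 or 48 bits long")
    return instruction_address + instruction_length_bits // 8

def write_pc(value):
    """Value held by the PC after 'value' is placed in it (bit zero is always zero)."""
    return value & 0xFFFFFFFE

print(hex(pc_operand(0x1000, 48)), hex(write_pc(0x1003)))
```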
1.2.2 Status Register SR, G1
G1 is the status register SR. Its content is updated by instruction execution. Besides this
implicit updating, the SR can also be addressed like a regular register (when H flag is set).
When addressed as source or destination operand, all 32 bits are used as an operand.
However, only bits 15..0 of a result can be placed in bits 15..0 of the SR; bits 31..16 of the
result are discarded and bits 31..16 of the SR remain unchanged. When the SR is addressed as a
source operand, it represents the value 0x0. The full content of the SR is replaced only by the
Return Instruction. A result placed in the SR overrules any setting or clearing of the
condition flags as a result of an instruction.
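The rule that only the lower half of the SR accepts a result can be written as a simple mask operation. The Python function below is a sketch of that rule only; it does not model the Return instruction, which replaces the full content, and the name `write_sr` is illustrative.

```python
def write_sr(old_sr, result):
    """New SR value after 'result' is placed in the SR by a normal instruction.

    Bits 15..0 come from the result; bits 31..16 of the SR remain unchanged.
    """
    return (old_sr & 0xFFFF0000) | (result & 0x0000FFFF)

print(hex(write_sr(0x12340000, 0xFFFF0021)))
```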
[Bit layout of SR bits 31..16. Fields shown: FP (Frame Pointer), FL (Frame Length), ILC (Instruction-Length Code), S (Supervisor State Flag), P (Trace Pending Flag), T (Trace-Mode Flag).]
Figure 1.6: Status Register SR (bits 31..16)
[Bit layout of SR bits 15..0. Fields shown: L (Interrupt-Lock Flag, bit 15), FRM (Floating-Point Rounding Mode, bits 14..13), FTE (Floating-Point Trap Enable, bits 12..8), I (Interrupt-Mode Flag, bit 7), Reserved (bit 6), H (High Global Flag, bit 5), M (Cache-Mode Flag, bit 4), V (Overflow Flag, bit 3), N (Negative Flag, bit 2), Z (Zero Flag, bit 1), C (Carry Flag, bit 0).]
Figure 1.7: Status Register SR (bits 15..0)
The status register SR contains the following status information:
C Carry Flag. Bit zero is the carry condition flag C. In general, when set it
indicates that the unsigned integer range is exceeded (overflow). At add
operations, it indicates a carry out of bit 31 of the result. At subtract operations,
it indicates a borrow (inverse carry) into bit 31 of the result.
Z Zero Flag. Bit one is the zero condition flag Z. When set, it indicates that all 32
or 64 result bits are equal to zero regardless of any carry, borrow or overflow.
N Negative Flag. Bit two is the negative condition flag N. On compare
instructions, it indicates the arithmetically correct (true) sign of the result
regardless of an overflow. On all other instructions, it is derived from result bit
31, which is the true sign bit when no overflow occurs. In the case of overflow,
result bit 31 and N reflect the inverted sign bit.
V Overflow Flag. Bit three is the overflow condition flag V. In general, when set
it indicates a signed overflow. At the Move instructions, it indicates a floating-point
NaN (Not a Number).
M Cache-Mode Flag. Bit four is the cache-mode flag M. Besides being set or
cleared under program control, it is also automatically cleared by a Frame
instruction and by any branch taken except a delayed branch. See section
1.8. Instruction Cache for details.
H High Global Flag. Bit five is the high global flag H. When H is set, denoting
G0..G15 addresses G16..G31 instead. Thus, the registers G18..G27 may be
addressed by denoting G2..G11 respectively.
The H flag is effective only in the first cycle of the next instruction after it was
set; then it is cleared automatically.
Only a MOV or MOVI instruction issued as the next instruction may be used to copy
the content of a local register or an immediate value to one of the high global
registers. The MOV instruction may also be used to copy the content of a high
global register (except the BCR, TPR, FCR and MCR registers, which are
write-only) to a local register. With all other instructions, the result may be
invalid.
If one of the high global registers is addressed as the destination register in user
state (S = 0), the condition flags are undefined, the destination register remains
unchanged and a trap to Privilege Error occurs.
Reserved Bit six is reserved for future use. It must always be zero.
I Interrupt-Mode Flag. Bit seven is the interrupt-mode flag I. It is set
automatically on interrupt entry and reset to its old value by a Return
instruction. The I flag is used by the operating system; it must never be
changed by any user program.
FTE Floating-Point Trap Enable Flag. Bits 12..8 are the floating-point trap enable
flags. They determine the exception type and the trap execution flow (see section
3.33.2. Floating-Point Instructions).
FRM Floating-Point Rounding Mode. Bits 14..13 are the floating-point rounding
modes (see section 3.33.2. Floating-Point Instructions).
L Interrupt-Lock Flag. Bit 15 is the interrupt-lock flag L. When the L flag is one,
all Interrupt, Parity Error and Extended Overflow exceptions are inhibited
regardless of individual mode bits. The state of the L flag is effective
immediately after any instruction which changed it. The L flag is set to one by
any exception.
The L flag can be cleared or kept set in any privilege state (user or supervisor)
or on return to any privilege state. Changing the L flag from zero to one is a
privileged operation, permitted only in supervisor state or on return from
supervisor to supervisor state. A trap to Privilege Error occurs if the L flag is set
under program control from zero to one in user state or on return to user state.
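The behavior of the condition flags C, Z, N and V described above can be illustrated by a short Python sketch that models a 32-bit add and subtract. The function names and the returned dictionary are illustrative assumptions, not part of the instruction set; N is modeled as result bit 31, which is the rule given for instructions other than compare.

```python
MASK32 = 0xFFFFFFFF

def _flags(result, carry, overflow):
    # Z: all 32 result bits are zero; N: result bit 31 (non-compare case)
    return {"C": int(carry), "Z": int(result == 0),
            "N": (result >> 31) & 1, "V": int(overflow)}

def add32(a, b):
    """Model a 32-bit add; returns (result, flags)."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    carry = total > MASK32                       # carry out of bit 31
    # signed overflow: both operands share a sign that the result does not
    overflow = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, _flags(result, carry, overflow)

def sub32(a, b):
    """Model a 32-bit subtract a - b; returns (result, flags)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    borrow = a < b                               # borrow into bit 31
    # signed overflow: operand signs differ and the result sign differs from a
    overflow = (((a ^ b) & (a ^ result)) >> 31) & 1
    return result, _flags(result, borrow, overflow)
```

For example, add32(0x7FFFFFFF, 1) sets N and V but not C, whereas add32(0xFFFFFFFF, 1) sets C and Z but not V.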
The following status information cannot be changed by addressing the SR:
T Trace-Mode Flag. Bit 16 is the trace-mode flag T. When both the T flag and
the trace pending flag P are one, a trace exception occurs after every instruction
except after a Delayed Branch instruction. The T flag is cleared by any
exception.
Note: The T flag can only be changed in the saved return SR and is then
effective after execution of a Return instruction.
P Trace Pending Flag. Bit 17 is the trace pending flag P. It is automatically set to
one by all instructions except by the Return instruction, which restores the P
flag from bit 17 of the saved return SR.
Since for a Trace exception both the P and the T flag must be one, the P flag
determines whether a trace exception occurs (P = 1) or does not occur (P = 0)
immediately after a Return instruction that restored the T flag to one.
When an instruction has ended, both the T and the P flag are one and a trace
exception therefore occurs. If the trace exception handler returned to the main
program with both the T and the P flag set to one, the trace exception would occur
again. To avoid tracing the same instruction in an endless loop, the P flag is cleared
by the Return instruction at the end of the trace exception handler.
Note: The P flag can only be changed in the saved SR. No program except the
trace exception handler should affect the saved P flag. The trace exception
handler must clear the saved P flag to prevent a trace exception on return, in
order to avoid tracing the same instruction in an endless loop.
S Supervisor State Flag. Bit 18 is the supervisor state flag S (see section
1.4. Privilege States). The S flag determines whether the processor is in user
state (S = 0) or supervisor state (S = 1). It is set to one by any exception.
ILC Instruction-Length Code. Bits 20 and 19 represent the instruction-length code
ILC. It is updated by instruction execution. The ILC holds (in general) the
length of the last instruction: ILC values of one, two or three represent an
instruction length of one, two or three halfwords respectively. After a branch
taken, the ILC is invalid. The Return instruction clears the ILC.
Note: Since a Return instruction following an exception clears the ILC, a
program must not rely on the current value of the ILC.
FL Frame Length. Bits 24..21 represent the frame length FL. The FL holds the
number of usable local registers (maximum 16) assigned to the current stack
frame. FL = 0 is always interpreted as FL = 16.
FP Frame Pointer. Bits 31..25 represent the frame pointer FP. The least significant
six bits of the FP point to the beginning of the current stack frame in the local
register set, that is, they point to L0.
The FP contains bits 8..2 of the address at which the content of L0 would be
stored if pushed onto the memory part of the stack.
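The field layout of the SR listed above can be summarized in a short Python sketch that splits a 32-bit SR value into its named fields. The table and function names are illustrative assumptions; the bit positions are those given in the field descriptions.

```python
# (name, highest bit, lowest bit) for each SR field, as listed above
SR_FIELDS = [
    ("FP", 31, 25), ("FL", 24, 21), ("ILC", 20, 19), ("S", 18, 18),
    ("P", 17, 17), ("T", 16, 16), ("L", 15, 15), ("FRM", 14, 13),
    ("FTE", 12, 8), ("I", 7, 7), ("H", 5, 5), ("M", 4, 4),
    ("V", 3, 3), ("N", 2, 2), ("Z", 1, 1), ("C", 0, 0),
]

def decode_sr(sr):
    """Return a dict mapping each SR field name to its value."""
    fields = {}
    for name, high, low in SR_FIELDS:
        width = high - low + 1
        fields[name] = (sr >> low) & ((1 << width) - 1)
    return fields

def frame_length(sr):
    """Number of usable local registers; FL = 0 is interpreted as 16."""
    fl = decode_sr(sr)["FL"]
    return 16 if fl == 0 else fl
```

Bit six is reserved and therefore not decoded.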
1.2.3 Floating-Point Exception Register FER, G2
G2 is the floating-point exception register. All bits must be cleared to zero after Reset.
Only bits 12..8 and 4..0 may be changed by a user program, all other bits must remain
unchanged.
Figure 1.8: Floating-Point Exception Register
Fields shown: Reserved, Floating-Point Actual Exceptions (bits 12..8), Reserved for
Operating System, Floating-Point Accrued Exceptions (bits 4..0).
The floating-point trap enable flags FTE and the exception flags are assigned as:
Floating-point     Accrued       Actual
trap enable FTE    exceptions    exceptions    Exception type
SR(12)             G2(4)         G2(12)        Invalid Operation
SR(11)             G2(3)         G2(11)        Division by Zero
SR(10)             G2(2)         G2(10)        Overflow
SR(9)              G2(1)         G2(9)         Underflow
SR(8)              G2(0)         G2(8)         Inexact
The reserved bits G2(31..13) and G2(7..5) must be zero.
A floating-point instruction, except a Floating-point Compare, can raise any of the
exceptions Invalid Operation, Division by Zero, Overflow, Underflow or Inexact. FCMP
and FCMPD can raise only the Invalid Operation exception (at unordered). FCMPU and
FCMPUD cannot raise any exception.
At an exception, the following additional action is performed:
• Any corresponding accrued-exception flag whose corresponding trap-enable flag is
zero (not enabled) is set to one; all other accrued-exception flags remain unchanged.
• If a corresponding trap-enable flag is one (enabled), any corresponding actual-exception
flag is set to one; all other actual-exception flags are cleared. The destination
remains unchanged.
In the present software version, the software emulation routine must branch to the
corresponding user-supplied exception trap handler. The (modified) result, the source
operand, the stack address of the destination operand and the address of the floating-point
instruction are passed to the trap handler. In the future hardware version, a trap to Range
Error will occur; the Range Error handler will then initiate re-execution of the
floating-point instruction by branching to the entry of the corresponding software emulation
routine, which will then act as described before.
The only exceptions that can coincide are Inexact with Overflow and Inexact with
Underflow. An Overflow or Underflow trap, if enabled, takes precedence over an Inexact
trap; the Inexact accrued-exception flag G2(0) must then be set as well.
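The flag handling described in this section can be sketched in Python as follows. This is a minimal model under stated assumptions: the function and dictionary names are hypothetical, the bit positions for Underflow (and for the Inexact trap-enable and actual flags) are assumed to continue the pattern of the assignments given above, and the actual-exception flags are assumed to stay unchanged when no trap is enabled.

```python
# Index i selects one exception type: trap enable is SR(8 + i),
# accrued flag is G2(i), actual flag is G2(8 + i).
# The entries for "underflow" and "inexact" extend the documented pattern
# (only G2(0) for Inexact is stated explicitly in the text).
EXC_INDEX = {"invalid": 4, "div_by_zero": 3, "overflow": 2,
             "underflow": 1, "inexact": 0}

def update_fer(g2, sr, raised):
    """Return (new G2 value, trap_taken) for the raised exceptions."""
    enabled = [e for e in raised if (sr >> (8 + EXC_INDEX[e])) & 1]
    for e in raised:
        if e not in enabled:
            g2 |= 1 << EXC_INDEX[e]          # set accrued flag (trap disabled)
    if enabled:
        g2 &= ~(0x1F << 8)                   # clear all actual-exception flags
        for e in enabled:
            g2 |= 1 << (8 + EXC_INDEX[e])    # set actual flag (trap enabled)
    return g2 & 0xFFFFFFFF, bool(enabled)
```

With the Overflow trap enabled and Inexact disabled, a coinciding Overflow and Inexact sets the Overflow actual flag G2(10) and the Inexact accrued flag G2(0), matching the precedence rule above.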
1.2.4 Stack Pointer SP, G18
G18 is the stack pointer SP. The SP contains the top address + 4 of the memory part of the
stack, that is the address of the first free memory location in which the first local register
would be saved by a push operation (see section 3.29. Frame Instruction for details). Stack
growth is from low to high address.
Bits one and zero of the SP must always be cleared to zero. The SP can be addressed only
via the high global flag H being set. Copying an operand to the SP is a privileged operation.
Note: Stack Pointer SP contains the top address + 4 of the memory part of the stack
(memory part stack), and Frame Pointer FP points to the beginning of the current stack
frame in the local register set (register part stack).
1.2.5 Upper Stack Bound UB, G19
G19 is the upper stack bound UB. The UB contains the address beyond the highest legal
memory stack location. It is used by the Frame instruction to inhibit stack overflow.
Bits one and zero of the UB must always be cleared to zero. The UB can be addressed only
via the high global flag H being set. Copying an operand to the UB is a privileged
operation.
1.2.6 Bus Control Register BCR, G20
G20 is the write-only bus control register BCR. Its content defines the options possible for
bus cycle, parity and refresh control. The BCR defines the parameters (bus timing, refresh
control, page fault and parity error disable) for accessing external memory located in
address spaces MEM0..MEM3. The BCR can be addressed only via the high global flag H
being set. Copying an operand to the BCR is a privileged operation. The BCR register is
described in detail in the bus interface description in section 6.
1.2.7 Timer Prescaler Register TPR, G21
G21 is the write-only timer prescaler register TPR. It adapts the timer clock to different
processor clock frequencies. The TPR can be addressed only via the high global flag H
being set. Copying an operand to the TPR is a privileged operation. The TPR is described
in the timer description in section 5.
1.2.8 Timer Compare Register TCR, G22
G22 is the timer compare register TCR. Its content is compared continuously with the
content of the timer register TR. The TCR can be addressed only via the high global flag H
being set. Copying an operand to the TCR is a privileged operation. The TCR is described
in the timer description in section 5.
1.2.9 Timer Register TR, G23
G23 is the timer register TR. Its content is incremented by one on each time unit. The TR
can be addressed only via the high global flag H being set. Copying an operand to the TR
is a privileged operation. The TR is described in the timer description in section 5.
1.2.10 Watchdog Compare Register WCR, G24
G24 is the watchdog compare register WCR. The WCR can be addressed only via the high
global flag H being set. The WCR is used by the IO3 control mode (Watchdog Mode
FCR(13) = 1, FCR(12) = 0). Copying an operand to the WCR is a privileged operation.
The WCR is described in the bus interface description in section 6.
1.2.11 Input Status Register ISR, G25
G25 is the read-only input status register ISR. The ISR reflects the input levels at the pins
IO1..IO3 as well as the input levels at the four interrupt pins INT1..INT4 and contains the
EvenFlag and the EqualFlag. The ISR can be addressed only via the high global flag H
being set. The ISR is described in the bus interface description in section 6.
1.2.12 Function Control Register FCR, G26
G26 is the write-only function control register FCR. The FCR controls the polarity and
function of the I/O pins IO1..IO3 and the interrupt pins INT1..INT4, the timer interrupt
mask and priority, the bus lock and the Extended Overflow exception. The FCR can be
addressed only via the high global flag H being set. Copying an operand to the FCR is a
privileged operation. The FCR is described in the bus interface description in section 6.
1.2.13 Memory Control Register MCR, G27
G27 is the write-only memory control register MCR. The MCR controls additional
parameters for the external memory, the internal memory refresh rate, the mapping of the
entry table and the processor power management. The MCR can be addressed only via the
high global flag H being set. Copying an operand to the MCR is a privileged operation.
The MCR is described in the bus interface description in section 6.
1.3 Local Register Set
The architecture provides a set of 64 local registers of 32 bits each. The local registers
0..63 represent the register part of the stack, containing the most recent stack frame(s).
Figure 1.9: Local Register Set 0..63
The figure shows the 64 local registers 0..63, each 32 bits wide (bits 31..0), with the
local registers L0..L15 marked within the set.
The local registers can be addressed by the register code (0..15) of an instruction as
L0..L15 only relative to the frame pointer FP; they can also be addressed absolutely as part
of the stack in the stack address mode (see section 3.1.1. Address Modes).
The absolute local register address is calculated from the register code as:
absolute local register address := (FP + register code) modulo 64
That is, only the least significant six bits of the sum FP + register code are used and thus,
the absolute local register addresses for L0..L15 wrap around modulo 64. The local register
set is organized as a circular buffer.
The absolute local register addresses for FP + register code + 1 or FP + FL + offset are
calculated accordingly.
The least significant six bits of the frame pointer FP point to the beginning of the current
stack frame (L0).
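The wrap-around addressing of the local register set can be illustrated with a minimal Python sketch. The function name is an illustrative assumption; the calculation uses only the least significant six bits of FP + register code, as stated above.

```python
LOCAL_REGISTERS = 64

def absolute_local_address(fp, register_code):
    """Absolute local register address for L<register_code>."""
    # only the least significant six bits of the sum are used
    return (fp + register_code) % LOCAL_REGISTERS

def absolute_pair_address(fp, register_code):
    """Addresses of a register pair (register code and register code + 1)."""
    return (absolute_local_address(fp, register_code),
            absolute_local_address(fp, register_code + 1))
```

With FP = 60, for instance, L3 is held in absolute local register 63 and L4 wraps around to absolute local register 0.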
1.4 Privilege States
The architecture provides two privilege states, determined by the supervisor state flag S:
User state (S = 0) and supervisor state (S = 1).
The privilege state may be used by an external memory management unit to control
memory and I/O accesses. The operating system kernel is executed in the higher privileged
supervisor state, thereby restricting access to all sensitive data to a highly reliable system
program. The following operations are also privileged; they may be executed only in
supervisor state or on return from supervisor to supervisor state:
• Copying an operand to any of the high global registers
• Changing the interrupt-lock flag L from zero to one
• Returning through a Return instruction to supervisor state
Any illegal attempt causes a trap to Privilege Error.
The S flag is also saved in bit zero of the saved return PC by the Call, Trap and Software
instructions and by an exception. At a Call instruction (CALL Ld, Rs, const), the old PC and
the S flag are saved in Ld and the old SR is saved in Ldf. A Return instruction restores the
S flag from this bit position to bit position 18 of the SR (thereby overwriting the bit
18 returned from the saved return SR).
If a Return instruction attempts a return from user to supervisor state, a trap to Privilege
Error occurs (S = 1 is saved).
Returning from supervisor to user state is achieved by clearing the S flag in bit zero of the
saved return PC before return. Switching from user to supervisor state is only possible by
executing a Trap instruction or by exception processing through one of the 64 supervisor
subprogram entries (see section 2.4. Entry Tables).
Note: Since the Return instruction restores the PC first to enable the instruction fetch to
start immediately, the restored S flag must also be available immediately to prevent any
memory access with a false privilege state. The S flag is therefore packed in bit zero of the
saved return PC.
The state of the S flag can be signaled at the IO1 pin in each memory or I/O cycle.
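The packing of the S flag into bit zero of the saved return PC can be sketched in Python as shown below. The function names are illustrative assumptions; the sketch relies on instructions being located at halfword boundaries, so that bit zero of the PC itself is always zero and is free to carry the S flag.

```python
def pack_return_pc(pc, s_flag):
    """Saved return PC: instruction address with the S flag in bit zero."""
    return (pc & 0xFFFFFFFE) | (s_flag & 1)

def unpack_return_pc(saved_pc):
    """Return (instruction address, S flag) from a saved return PC."""
    return saved_pc & 0xFFFFFFFE, saved_pc & 1

def return_allowed(current_s, saved_pc):
    """False if a Return would go from user state to supervisor state."""
    _, target_s = unpack_return_pc(saved_pc)
    # a return from user state (S = 0) to supervisor state (S = 1)
    # causes a trap to Privilege Error
    return not (current_s == 0 and target_s == 1)
```

Clearing bit zero of the saved return PC before a Return thus corresponds to returning from supervisor to user state.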
1.5 Register Data Types
Figure 1.10: Register Data Types.
The figure shows the layout of the following data types in a register (bits 31..0) or, for
double-word types, in two registers labeled n and n+1:
• Bitstring (32 bits)
• Double-Word Bitstring (high-order 32 bits, low-order 32 bits)
• Unsigned Integer (32-bit magnitude)
• Unsigned Double-Word Integer (high-order 32-bit magnitude, low-order 32-bit magnitude)
• Signed Integer, Two's Complement (S, 31-bit magnitude)
• Signed Double-Word Integer, Two's Complement (S, high-order 31-bit magnitude,
low-order 32-bit magnitude)
• Two Signed Shorts (two 16-bit values, each with its own sign bit S)
• Complex Signed Short (real part and imaginary part, each a signed short)
• Single Precision Floating-Point Number (S, 8-bit exponent, 23-bit fraction)
• Double Precision Floating-Point Number (S, 11-bit exponent, high-order 20-bit fraction,
low-order 32-bit fraction)
S = sign bit, MSB = most significant bit, LSB = least significant bit
1.6 Memory Organization
The architecture provides a memory address space in the range of 0..2^32 - 1
(0..4,294,967,295) 8-bit bytes (4 GByte). Memory is implied to be organized as 32-bit
words. The following memory data types are available (see figure 1.12):
• Byte unsigned (unsigned 8-bit integer, bitstring or character)
• Byte signed (signed 8-bit integer, two's complement)
• Halfword unsigned (unsigned 16-bit integer or bitstring)
• Halfword signed (signed 16-bit integer, two's complement)
• Word (32-bit undedicated word)
• Double-Word (64-bit undedicated double-word)
Besides the memory address space, a separate I/O address space is provided. In the I/O
address space, only word and double-word data types are available.
Words and double-words must be located at word boundaries, that is, their most significant
byte must be located at an address whose two least significant bits are zero (...xx00).
Halfwords must be located at halfword boundaries, their most significant byte being
located at an address whose least significant bit is zero (...xx0). Bytes may be located at
any address.
The variable-length instructions are located as contiguous sequences of one, two or three
halfwords at halfword boundaries.
Memory- and I/O-accesses are pipelined to an implied depth of two addresses.
Note: All data is located high to low order at addresses ascending from low to high, that is,
the high order part of all data is located at the lower address (Big endian). This scheme
should also be used for the addressing of bit arrays. Though the most significant bit of a
word is numbered as bit position 31 for convenience of use, it should be assigned the bit
address zero to maintain consistent bit addressing in ascending order through word
boundaries.
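The big-endian byte order and the recommended bit addressing can be illustrated with a short Python sketch. The function names and the bytearray standing in for memory are illustrative assumptions.

```python
import struct

def store_word_big_endian(memory, address, value):
    """Store a 32-bit word so that its high-order byte is at the lower address."""
    if address % 4 != 0:
        raise ValueError("words must be located at word boundaries")
    memory[address:address + 4] = struct.pack(">I", value & 0xFFFFFFFF)

def load_halfword_big_endian(memory, address):
    """Load an unsigned 16-bit halfword from a halfword boundary."""
    if address % 2 != 0:
        raise ValueError("halfwords must be located at halfword boundaries")
    return struct.unpack(">H", bytes(memory[address:address + 2]))[0]

def bit_address(bit_position):
    """Bit address within a word: bit position 31 (MSB) has bit address 0."""
    return 31 - bit_position
```

Storing 0x11223344 at address 4 places 0x11 at address 4 and 0x44 at address 7; the halfword at address 6 then reads as 0x3344.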
Figure 1.11: Address of bytes within words: big-endian and little-endian alignment.
The figure shows the byte addresses within the words at word addresses 0, 4 and 8
(bits 31..0) for big-endian and for little-endian alignment.
Figure 1.12 shows the location of data and instructions in memory relative to a binary
address n = ...xxx00 (x = 0 or 1). The memory organization is big-endian.
For all data types, the most significant bit is located at the higher and the least significant
bit at the lower bit position.
1.7 Stack
A runtime stack, called stack here, holds generations of local variables in last-in-first-out
(LIFO) order. A generation of local variables, called stack frame or activation record, is
created upon subprogram entry and released upon subprogram return.
The runtime stack provided by the architecture is divided into a memory part and a register
part. The register part of the stack, implemented by a set of 64 local registers organized as
a circular buffer, holds the most recent stack frame(s). The current stack frame is always
kept in the register part of the stack. The frame pointer FP points to the beginning of the
current stack frame (addressed as register L0). The frame length FL indicates the number
of registers (maximum 16) assigned to the current stack frame. The stack grows from low
to high address. It is guarded by the upper stack bound UB.
The stack is maintained as follows:
• A Call, Trap or Software instruction increments the FP and sets FL to six, thus creating
a new stack frame with a length of six registers (including the return PC and the return
SR).
• An exception increments the FP by the value of FL and then sets FL to two.
• A Frame instruction restructures a stack frame to include (optionally) passed parameters
by decrementing the FP and by resetting the FL to the desired length, and restores a
reserve of 10 local registers for the next subprogram call. If the required number of
registers + 10 do not fit in the register part of the stack, the contents of the differential
(required + 10 - available) number of local registers are pushed onto the memory part of
the stack. A trap to Frame Error occurs after the push operation when the old value of
the stack pointer SP exceeded the upper stack bound UB. The passed parameters then
occupy the registers from L0 upward, as many as are needed to hold them.
Note: A Frame instruction must be executed before executing any other Call, Trap or
Software instruction or before the interrupt-lock flag L is being cleared, otherwise the
beginning of the register part of the stack at the FP could be overwritten without any
warning.
• A Return instruction releases the current stack frame and restores the preceding stack
frame. If the restored stack frame is not fully contained in the register part of the stack,
the content of the missing part of the stack frame is pulled from the memory part of the
stack.
For more details see the descriptions of the specific instructions.
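The push rule of the Frame instruction given above can be written as a one-line calculation. The following Python sketch uses hypothetical names; "available" stands for the number of local registers still free in the register part of the stack, which the hardware determines internally.

```python
RESERVE = 10  # reserve of local registers restored by a Frame instruction

def registers_to_push(required, available):
    """Number of local registers a Frame instruction pushes to memory."""
    # differential number: required + 10 - available, never negative
    return max(0, required + RESERVE - available)
```

For a required frame length of 11 and 15 free registers, 6 registers would be pushed; with 30 free registers, nothing is pushed.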
When the number of local registers required for a stack frame exceeds its maximum length
of 16 (in rare cases), a second runtime stack in memory may be used. This second stack is
also required to hold local record or array data.
The stack is used by routines in user or supervisor state, that is, supervisor stack frames are
appended to user stack frames, and thus, parameters can be passed between user and
supervisor state. A small stack space must be reserved above UB. UB can then be set to a
higher value by the Frame Error handler to free stack space for error handling.
Because the complete stack management is accomplished automatically by the hardware,
programming the stack handling instructions is easy and does not require any knowledge
of the internal working of the stack.
The following example demonstrates how the Call, Frame and Return instructions are
applied to achieve the stack behavior of the register part of the stack shown in figures
1.13 and 1.14. Figure 1.13 shows the creation and release of stack frames in the register
part of the stack.
Program Example:
A: FRAME L13, L3 ; set frame length FL = 13, decrement FP by 3
: ; parameters passed to A can be addressed
: ; in L0, L1, L2
:
:
code of function A
:
:
MOV L7, L5 ; copy L5 to L7 for use as parameter1
MOVI L8, 4 ; set L8 = 4 for use as parameter2
CALL L9, 0, B ; call function B,
: ; save return PC, return SR in L9, L10
:
:
MOVI L0, 20 ; set L0 = 20 as return parameter for caller
RET PC, L3 ; return to function calling A,
; restore frame of caller
B: FRAME L11, L2 ; set frame length FL = 11, decrement FP by 2
: ; passed parameter1 can now be addressed in L0
: ; passed parameter2 can now be addressed in L1
:
:
code of function B
:
:
RET PC, L2 ; return to function A, frame A is restored by
; copying return PC and return SR in L2 and L3
; of frame B to PC and SR
Return from B:
PC := ret. PC for B;
SR := ret. SR for B;
-- returns preceding stack frame
if stack frame contained in local registers then
  next instruction;
else
  pull contents of differential words from memory part of the stack;

Call B:
PC := branch address;
ret. PC for B := old PC;
ret. SR for B := old SR;
FP := FP + reg.code of ret. PC;
FL := 6;
-- reg.code of ret. PC = 9

Frame in B:
FP := FP - code of source reg.;
FL := code of dest.reg.;
if available registers ≥ (required + 10) registers then
  next instruction
else
  push contents of differential number of registers to memory part of stack;
-- code of source reg. = 2
-- code of dest.reg. = 11

Figure 1.13: Stack frame handling (register part)
The figure shows the register part of the stack in three states:
• before Call B and after Return: frame A with FL = 13, holding the parameters for frame A,
ret. PC for A, ret. SR for A and the registers reserved for the maximum number of
variables in frame A; the registers beyond FP + FL must not be used.
• after CALL L9, 0, dest: the new frame B with FL = 6, beginning with ret. PC for B and
ret. SR for B, followed by the registers reserved for the variables in frame B.
• after FRAME L11, L2: frame B with FL = 11, holding the parameters for frame B,
ret. PC for B, ret. SR for B and the registers reserved for the maximum number of
variables in frame B.
A currently activated function A has a frame length of FL = 13, that is, 3 registers (required
to hold the passed parameters) plus 10 further registers. Registers L0..L6 are to be retained
through a subsequent call, registers L7..L12 are temporaries. A call to function B needs 2
parameters to be passed. The parameters are placed by function A in registers L7 and L8
before calling B. The Call instruction addresses L9 as destination for the return PC and
return SR register pair to be used by function B on return to function A.
On entry of function B, the new frame of B has an implicit length of FL = 6. It starts
physically at the former register L9 of frame A. However, since the frame pointer FP has
been incremented by 9 by the Call instruction, this register location is now being addressed
as L0 of frame B. The passed parameters cannot be addressed because they are located
below the new register L0 of frame B. To make them addressable, a Frame instruction
decrements the frame pointer FP by 2. Then, parameter 1 and 2 passed to B can be
addressed as registers L0 and L1 respectively. Note that the return PC is now to be
addressed as L2!
The Frame instruction in B specifies also the new, complete frame length FL = 11
(including the passed parameters as well as the return PC and return SR pair). Besides, a
new reserve of 10 registers for subsequent function calls and traps is provided in the
register stack. A possible overflow of the register stack is checked and handled
automatically by the Frame instruction. A program need not and must not pay attention to
register stack overflow.
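The frame pointer arithmetic of this example can be checked with a small Python model. The class and method names are illustrative assumptions; the model tracks only FP and FL and ignores pushing and pulling to the memory part of the stack.

```python
class RegisterStack:
    """Tracks FP and FL for Call and Frame in the register part of the stack."""

    def __init__(self, fp=0, fl=13):
        self.fp = fp % 128   # FP is a 7-bit field (SR bits 31..25)
        self.fl = fl

    def absolute(self, code):
        # absolute local register address of L<code>, modulo 64
        return (self.fp + code) % 64

    def call(self, dest_code):
        # Call: FP is incremented by the register code of the return PC,
        # and FL is set to six
        self.fp = (self.fp + dest_code) % 128
        self.fl = 6

    def frame(self, dest_code, source_code):
        # Frame: FP is decremented by the source register code,
        # and FL is set to the destination register code
        self.fp = (self.fp - source_code) % 128
        self.fl = dest_code
```

Starting from frame A, call(9) followed by frame(11, 2) makes the former L7 and L8 of frame A addressable as L0 and L1 of frame B, and the return PC (the former L9) addressable as L2.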
At the end of function B, a Return instruction returns control to function A and restores the
frame A. A possible underflow of the register stack is also handled automatically; thus,
frame A is always completely restored, regardless of whether it was wholly or partly pushed
into the memory part of the stack before (in the case when B called other functions).
In the present example with the frame length of FL = 13, any suitable destination register
up to L13 could be specified in the Call instruction. The parameters to be passed to the
function B would then be placed in L11 and L12. It is even possible to append a new frame
to a frame with a length of FL = 16 (coded as FL = 0 in the status register SR): the
destination register in the Call instruction is then coded as L0, but interpreted as the
register past L15.
See also sections 3.27. Call instruction, 3.29. Frame instruction and 3.30. Return
instruction for further details.
Note: With an average frame length of 8 registers, ca. 7..8 Frame instructions succeed a
pulling Return instruction until a push occurs and 7..8 Return instructions succeed a
pushing Frame instruction until a pull occurs. Thus, the built-in hysteresis makes pushing
and pulling a rare event in regular programs!
Figure 1.14 illustrates stack frame pushing and popping. When the frames A and X overlap
modulo 64 in the register part of the stack (that is, the register part of the stack is full),
the Frame instruction for frame X pushes as many words of frame A onto the memory part of
the stack as are needed to provide the space required for frame X. On return to frame A,
the Return instruction pulls these words from the memory part of the stack back into the
register part of the stack.
Figure 1.14: Stack frame pushing and popping
The figure shows the register part and the memory part of the stack, with FP and SP, in
four states:
• before Frame instruction for frame X: A and X overlap modulo 64; part of frame A
occupies the additional space required for X and is to be pushed.
• after Frame instruction for frame X: the number of words required for frame X has been
pushed and appended to the stack space in the memory part of the stack.
• before Return instruction to frame A: the words required for A are to be pulled; the
corresponding words of X are to be overwritten.
• after Return instruction to frame A: the pulled words complete stack frame A, and the
stack space in the memory part of the stack is freed.
1.8 Instruction Cache
The instruction cache is transparent to programs. A program executes correctly even if it
ignores the cache, whereby it is assumed that a program does not modify the instruction
code in the local range contained in the cache.
The instruction cache holds a total of up to 128 bytes (32 unstructured 32-bit words of
instructions). It is implemented as a circular buffer that is guarded by a look-ahead counter
and a look-back counter. The look-ahead counter holds the highest and the look-back
counter the lowest address of the instruction words available in the cache. The cache-mode
flag M is used to optimize special cases in loops (see details below). The cache can be
regarded as a temporary local window into the instruction sequence, moving along with
instruction execution and being halted by the execution of a program loop.
Its function is as follows:
The prefetch control loads unstructured 32-bit instruction words (without regard to
instruction boundaries) from memory into the cache. The load operation is pipelined to a
depth of two stages (see section 3.1. Memory Instructions for details of the load pipeline).
The look-ahead counter is incremented by four at each prefetch cycle. It always contains
the address of the last instruction word for which an address bus cycle is initiated,
regardless of whether the addressed instruction word is in the load pipeline or already
loaded into the instruction cache.
The prefetched instruction word is placed in the cache word location addressed by bits 6..2
of the look-ahead counter. The look-back counter remains unchanged during prefetch
unless the cache word location it addresses with its bits 6..2 is overwritten by a prefetched
instruction word. In this case, it is incremented by four to point to the then lowest-addressed
usable instruction word in the cache. Since the cache is implemented as a
circular buffer, the cache word addresses derived from bits 6..2 of the look-ahead and
look-back counter wrap around modulo 32.
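The interplay of the look-ahead and look-back counters during prefetch can be modeled in a few lines of Python. This is a simplified sketch with hypothetical function names; it models only the counter updates, not the load pipeline or the halting conditions.

```python
CACHE_WORDS = 32

def cache_word_index(address):
    """Cache word location addressed by bits 6..2 of a counter."""
    return (address >> 2) & (CACHE_WORDS - 1)

def prefetch_step(look_ahead, look_back):
    """Advance the counters by one prefetch cycle."""
    look_ahead += 4                      # incremented by four per prefetch
    # if the prefetched word overwrites the location addressed by the
    # look-back counter, the look-back counter moves on by one word
    if cache_word_index(look_ahead) == cache_word_index(look_back):
        look_back += 4
    return look_ahead, look_back
```

Starting with both counters at the same address, the look-back counter stays put for 31 prefetch cycles and advances on the 32nd, when the circular buffer wraps around.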
The prefetch is halted:
• When eight words are prefetched, that is, eight words are available (including those
pending in the load pipeline) in the prefetch sequence succeeding the instruction word
addressed by the program counter PC through the instruction word addressed by the
look-ahead counter. Prefetch is resumed when the PC is advanced by instruction
execution.
• In the cycle preceding the execution cycle of an instruction accessing memory or I/O or
any potentially branch-causing instruction (regardless of whether the branch is taken)
except a forward Branch or Delayed Branch instruction with an instruction length of one
halfword and a branch target contained in the cache. Halting the prefetch in these cases
avoids filling the load pipeline with demands for potentially unnecessary instruction
words. The prefetch is also halted during the execution cycle of any instruction
accessing memory or I/O.
The cache is read in the decode cycle by using bits 6..1 of the PC as an address to the first
halfword of the instruction presently being decoded. The instruction decode needs and uses
only the number (1, 2 or 3) of instruction halfwords defined by the instruction format.
Since only the bits 6..1 of the PC are used for addressing, the halfword addresses wrap
around modulo 64. Idle wait cycles are inserted when the instruction is not or not fully
available in the cache.
At an explicit Branch or Delayed Branch instruction (except when placed as delay
instruction) with an instruction length of one halfword, the location of the branch target is
checked. The branch target is treated as being in the cache when the target address of a
backward branch is not lower than the address in the look-back counter and the target
address of a forward branch is not higher than two words above the address in the look-ahead
counter. That is, the two instruction words succeeding the instruction word
addressed by the content of the look-ahead counter are treated by a forward branch as
being in the cache. Their actual fetch overlaps in most cases with the execution of the
branch instruction and thus, no cycles are wasted. When the branch target is in the cache,
the look-back counter and the look-ahead counter remain unchanged.
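A minimal sketch of the branch-target check for one-halfword Branch and Delayed Branch instructions follows. It assumes that all addresses are plain byte addresses, that one instruction word is four bytes, and that address wrap-around can be ignored; the function and parameter names are hypothetical.

```python
WORD_BYTES = 4  # one instruction word is 32 bits


def target_treated_as_cached(target: int, branch_addr: int,
                             look_back: int, look_ahead: int) -> bool:
    """Apply the in-cache rule for a one-halfword Branch or Delayed Branch."""
    if target < branch_addr:
        # Backward branch: target must not be lower than the look-back counter.
        return target >= look_back
    # Forward branch: target may be up to two words above the look-ahead counter.
    return target <= look_ahead + 2 * WORD_BYTES
```

When the function returns True, the text above says that both counters remain unchanged and the cache is not flushed.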
When a branch is taken by a Delayed Branch instruction with an instruction length of one
halfword to a forward branch target not in the cache and the cache mode flag M is enabled
(1), the look-back counter and the look-ahead counter remain unchanged. Wait cycles are
then inserted until the ongoing prefetch has loaded the branch target instruction into the
cache.
Any other branch taken flushes the cache by placing the branch address in the look-back
and the look-ahead counter. Prefetch then starts immediately at the branch address.
Instruction decoding waits until the branch target instruction is fully available in the cache.
The cache mode flag M (bit four of the SR) can be set or cleared by logical instructions. It
is automatically cleared by a Frame instruction and by any branch taken except a branch
caused by a Delayed Branch or Return instruction; a Delayed Branch instruction leaves the
M flag unchanged and a Return instruction restores the M flag from the saved status
register SR.
Note: Since up to eight instruction words can be loaded into the cache by the prefetch, only
24 instruction words are left to be contained in a program loop. Thus, a program loop can
have a maximum length of 96 or 94 bytes including the branch instruction closing the loop,
depending on the even or odd halfword address location of the first instruction of the loop
respectively.
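The arithmetic in this note can be reproduced as follows. The cache size of 32 words is an inference from the six address bits (64 halfwords) mentioned earlier, not a figure stated in this note; the names are hypothetical.

```python
CACHE_WORDS = 32     # inferred: 64 halfwords addressed by PC bits 6..1
PREFETCH_WORDS = 8   # words the prefetch may load ahead of the PC


def max_loop_bytes(first_instr_addr: int) -> int:
    """Longest program loop, in bytes, that stays contained in the cache."""
    limit = (CACHE_WORDS - PREFETCH_WORDS) * 4          # 24 words = 96 bytes
    starts_on_odd_halfword = (first_instr_addr >> 1) & 1
    # A loop starting on an odd halfword loses the first halfword of its word.
    return limit - 2 if starts_on_odd_halfword else limit
```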
A forward Branch or Delayed Branch instruction with an instruction length of one
halfword into up to two instruction words succeeding the word addressed by the look-ahead
counter treats the branch target as being in the cache and does not flush the cache.
Thus, three or four instruction halfwords, depending on the odd or even halfword address
location of the branch instruction respectively, can always be skipped without flushing the
cache.
Enabling the cache-mode flag M is only required when a program loop to be contained in
the cache contains a forward branch to a branch target in the program loop and more than
three (or four, see above) instruction halfwords are to be skipped. In this case, the enabled
M flag in combination with a Delayed Branch instruction with an instruction length of one
halfword inhibits flushing the cache when the branch target is not yet prefetched.
Since a single-word memory instruction halts the prefetch for two cycles, any sequence of
memory instructions, even with interspersed one-cycle non-memory instructions, halts the
prefetch during its execution. Thus, alternating between instruction and data memory pages
is avoided. If the number of instruction halfwords required by such a sequence is not
guaranteed to be in the cache at the beginning of the sequence, a Fetch instruction
enforcing the prefetch of the sequence may be used. A Fetch instruction may also be used
preceding a branch into a program loop; thus, flushing the cache by the first branch
repeating the loop can be avoided.
A branch taken caused by a Branch or Delayed Branch instruction with an instruction
length of two halfwords always flushes the instruction cache, even if the branch target is in
the cache. Thus, branches can be forced to bypass the cache, thereby reducing the cache to
a prefetch buffer. This reduced function can be used for testing.
1.9 On-Chip Memory (IRAM)
8 KBytes of memory are provided on-chip. The on-chip memory (IRAM) is mapped to hex
address C000 0000 of the memory address space and wraps around modulo 8K up to
DFFF FFFF. The IRAM is implemented as dynamic memory (DRAM) and therefore needs refresh.
The refresh rate must be specified in the MCR bits 18..16 (see section 6.4. Memory
Control Register MCR) before any use (default is refresh disabled). The number given in
MCR(18..16) specifies the refresh rate in CPU clock cycles; e.g. 128 specifies a refresh
cycle automatically inserted every 128 clock cycles. Each refresh cycle refreshes 16 bytes,
thus, 256 refresh cycles are required to refresh the whole IRAM. A high refresh rate does
not degrade performance since the refresh cycles are inserted on idle IRAM cycles
whenever possible.
An access to the IRAM bypasses the access pipeline of the external memory. Thus,
pending external memory accesses do not delay accesses to the IRAM. The IRAM can
hold data as well as instructions. Instruction words from the IRAM are automatically
transferred to the instruction cache on demand; these transfers do not interfere with
external memory accesses. Apart from bypassing the external memory pipeline, memory
instructions accessing the IRAM behave exactly like those accessing external memory.
The minimum delay for a load access is one cycle; that is, the data is not available in the
cycle after the load instruction. One or more wait cycles are automatically inserted if the
target register of the load is addressed before the data is loaded into the target register.
Attention: For selection between an internal and external memory access, bits
31..29 of the specified address register are used before calculation of the effective
address. Therefore, the content of the specified address register must point into
the IRAM address range. The IRAM address range boundary must not be crossed
when the effective memory address is calculated in the displacement address
mode.
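The selection rule can be illustrated with a small sketch. It only models the address decoding described above (bits 31..29 of the address register, wrap-around modulo 8K); the function names are hypothetical.

```python
IRAM_SIZE = 8 * 1024  # 8 KBytes, mapped at C000 0000..DFFF FFFF


def selects_iram(address_register: int) -> bool:
    """Bits 31..29 of the address register decide internal vs. external."""
    return ((address_register >> 29) & 0b111) == 0b110


def iram_offset(address: int) -> int:
    """Offset inside the IRAM; the address range wraps modulo 8K."""
    return address & (IRAM_SIZE - 1)


# Hazard described in the Attention paragraph: the base register lies just
# below the IRAM range, so the access is treated as external even though
# base + displacement would point into the IRAM range.
base, dis = 0xBFFF_FFF0, 0x20
print(selects_iram(base), selects_iram(base + dis))  # False True
```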
2. Instructions General
2.1 Instruction Notation
In the following instruction-set presentation, an informal description of an instruction is
followed by a formal description in the form:
Format Notation Operation
Format denotes the instruction format.
Notation gives the assembler notation of the instruction.
Operation describes the operation in a Pascal-like notation with the following symbols:
Ls denotes any of the local registers L0..L15 used as source register or as source
operand. At memory Load instructions, Ls denotes the load destination register.
Ld denotes any of the local registers L0..L15 used as destination register or as
destination operand.
Rs denotes any of the local registers L0..L15 or any of the global registers G0..G15
used as source register or as source operand. At memory Load, see Ls.
Rd denotes any of the local registers L0..L15 or any of the global registers G0..G15
used as destination register or as destination operand.
Lsf, Ldf, Rsf and Rdf denote the register or operand following after (with a register address
one higher than) Ls, Ld, Rs and Rd respectively.
imm, const, dis, lim, rel, adr and n denote immediate operands (constants) of various
formats and ranges.
Operand(x) denotes a single bit at the bit position x of an operand.
Example: Ld(31) denotes bit 31 of Ld.
Operand(x..y) denotes bits x through y of an operand.
Example: Ls(4..0) denotes bits 4 through 0 of Ls.
Expression^ denotes an operand at a location addressed by the value of the expression.
Depending on the context, the expression addresses a memory location or a local
register.
Example: Ld^ denotes a memory operand whose memory address is the operand Ld.
(FP + FL)^ denotes a local register operand whose register address is FP + FL.
:= signifies the assignment symbol, read as "is replaced by".
// signifies the concatenation symbol. It denotes concatenation of two operand words
to a double-word operand or concatenation of bits and bitstrings.
Examples: Ld//Ldf denotes a double-word operand, 16 zeros//imm1 denotes
expanding of an immediate half-word by 16 leading zeros.
=, ≠, > and < denote the equal, unequal, greater than and less than relations.
Example: The relation Ld = 0 evaluates to one if Ld is equal to zero, otherwise it
evaluates to zero.
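For readers who prefer executable notation, the bit-field and concatenation symbols can be mimicked as shown below. The helper names bits and concat are hypothetical and serve only to illustrate Operand(x..y) and the // symbol.

```python
def bits(operand: int, high: int, low: int) -> int:
    """Operand(high..low): extract an inclusive bit field."""
    width = high - low + 1
    return (operand >> low) & ((1 << width) - 1)


def concat(left: int, right: int, right_width: int) -> int:
    """left//right: place `left` above a `right_width`-bit `right`."""
    return (left << right_width) | (right & ((1 << right_width) - 1))


# Ls(4..0) for Ls = 0b101101, and 16 zeros//imm1 for imm1 = 0xABCD.
print(bits(0b101101, 4, 0), hex(concat(0, 0xABCD, 16)))
```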
2.2 Instruction Execution
On instruction execution, all bits of the operands participate in the operations, except on
the Shift and Rotate instructions (whereat only the 5 least significant bits of the source
operand are used) and except on the byte and half-word Store instructions.
Instruction pipeline:
Instructions are executed by a two-stage pipeline. In the first stage, the instruction is
fetched from the instruction cache and decoded. In the second stage, the instruction is
executed while the next instruction in the first stage is already decoded.
Register instructions:
On register instructions executing in one or two cycles, the corresponding source and
destination operand words are read from their registers and evaluated in each cycle in
which they are used. Then the result word is placed in the corresponding destination
register in the same cycle. Thus, on all single-word register instructions executing in one
cycle, the source operand register and the destination operand register may coincide
without changing the effect of the instruction. On all other instructions, the effect of a
register coincidence depends on execution order and must be examined specifically for
each such instruction.
The content of a source register remains unchanged unless it is used coincidentally as a
destination register (except on memory Load instructions).
Condition flags:
Some instructions set or clear condition flags according to the result and special conditions
occurring during their execution. The conditions may be expressed by single bits, relations
or logical combinations of these. If a condition evaluates to one (true), the corresponding
condition flag is set to one, if it evaluates to zero (false), the corresponding condition flag
is cleared to zero. A trap to Range Error may occur if the specific flags and the destination
are updated.
All instructions may use the result and any flags updated by the preceding instruction. A
time penalty occurs only if the result of a memory Load instruction is not yet available
when needed as destination or source operand. In this case one or more (depending on the
memory access time) idle wait cycles are enforced by a hardware interlock.
Use of local registers:
An instruction must not use any local register of the register sequence beginning with L0
beyond the number of usable registers specified by the current value of the frame length
FL (FL = 0 is interpreted as FL = 16). That is, the value of the corresponding register code
(0..15) addressing a local register must be lower than the interpreted value of the FL
(except with a Call or Frame instruction or some restricted cases). Otherwise, an exception
could overwrite the contents of such a register or the beginning of the register part of the
stack at the SP could be overwritten without any warning when a result is placed in such a
register.
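The frame-length rule amounts to a simple range check, sketched below. The function name is hypothetical, and the exceptions mentioned above (Call, Frame and some restricted cases) are not modelled.

```python
def local_register_usable(register_code: int, fl: int) -> bool:
    """True if local register L<register_code> lies inside the current frame."""
    frame_length = 16 if fl == 0 else fl   # FL = 0 is interpreted as FL = 16
    return 0 <= register_code < frame_length
```

For example, with FL = 6 the registers L0..L5 are usable and L6 is not.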
Double-word instructions denote the high-order word (at the lower address). The low-order
word adjacently following it (at the higher address) is implied.
"Old" denotes the state before the execution of an instruction.
2.3 Instruction Formats
Instructions have a length of one, two or three half-words and must be located on half-word
boundaries. The following formats are provided:
Format  Configuration (bit positions in parentheses) and encoding

LL      OP-code (15..8), Ld-code (7..4), Ls-code (3..0)
        Ls-code encodes L0..L15 for Ls
        Ld-code encodes L0..L15 for Ld

LLext   1st half-word: OP-code (15..8), Ld-code (7..4), Ls-code (3..0)
        2nd half-word: OP-code extension
        Ls-code encodes L0..L15 for Ls
        Ld-code encodes L0..L15 for Ld
        OP-code extension encodes the EXTEND instructions

LR      OP-code (15..9), s (8), Ld-code (7..4), Rs-code (3..0)
        s = 0: Rs-code encodes G0..G15 for Rs
        s = 1: Rs-code encodes L0..L15 for Rs
        Ld-code encodes L0..L15 for Ld

RR      OP-code (15..10), d (9), s (8), Rd-code (7..4), Rs-code (3..0)
        s = 0: Rs-code encodes G0..G15 for Rs
        s = 1: Rs-code encodes L0..L15 for Rs
        d = 0: Rd-code encodes G0..G15 for Rd
        d = 1: Rd-code encodes L0..L15 for Rd

Ln      OP-code (15..9), n (8), Ld-code (7..4), n (3..0)
        Ld-code encodes L0..L15 for Ld
        n: Bit 8//bits 3..0 encode n = 0..31

Rn      OP-code (15..10), d (9), n (8), Rd-code (7..4), n (3..0)
        d = 0: Rd-code encodes G0..G15 for Rd
        d = 1: Rd-code encodes L0..L15 for Rd
        n: Bit 8//bits 3..0 encode n = 0..31

PCadr   OP-code (15..8), adr-byte (7..0)
        adr = 24 ones//adr-byte(7..2)//00

PCrel   OP-code (15..8), 0 (7), low-rel (6..1), S (0)
        S: sign bit of rel
        rel = 25 S//low-rel//0
        range -128..126

PCrel   1st half-word: OP-code (15..8), 1 (7), high-rel (6..0)
        2nd half-word: low-rel (15..1), S (0)
        S: sign bit of rel
        rel = 9 S//high-rel//low-rel//0
        range -8 388 608..8 388 606

Table 2.1: Instruction Formats, Part 1
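As a worked example of the PCrel encoding, the sketch below decodes rel from one or two instruction half-words. It assumes the field positions read from Table 2.1 (bit 7 selects the short or long form; the short form holds low-rel in bits 6..1 and S in bit 0; the long form holds high-rel in bits 6..0 of the first half-word, low-rel in bits 15..1 and S in bit 0 of the second). The function name and the OP-code values used in the calls are arbitrary placeholders.

```python
def decode_pcrel(first: int, second: int = 0) -> int:
    """Decode the signed branch offset rel from a PCrel instruction."""
    if ((first >> 7) & 1) == 0:
        # Short form: rel = 25 S//low-rel//0
        low_rel = (first >> 1) & 0x3F
        sign = first & 1
        return (low_rel << 1) - (sign << 7)
    # Long form: rel = 9 S//high-rel//low-rel//0
    high_rel = first & 0x7F
    low_rel = (second >> 1) & 0x7FFF
    sign = second & 1
    return ((high_rel << 16) | (low_rel << 1)) - (sign << 23)


print(decode_pcrel(0xF07E), decode_pcrel(0xF080, 0x0001))
```

The two calls reproduce the upper limit of the short range (126) and the lower limit of the long range (-8 388 608).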
Format   Configuration (bit positions in parentheses) and encoding

LRconst  1st half-word: OP-code (15..9), s (8), Ld-code (7..4), Rs-code (3..0)
         2nd half-word: e (15), S (14), const1 (13..0)
         3rd half-word: const2
         s = 0: Rs-code encodes G0..G15 for Rs
         s = 1: Rs-code encodes L0..L15 for Rs
         Ld-code encodes L0..L15 for Ld
         S: Sign bit of const
         e = 0: const = 18 S//const1
                range -16 384..16 383
         e = 1: const = 2 S//const1//const2
                range -1 073 741 824..1 073 741 823

RRconst  1st half-word: OP-code (15..10), d (9), s (8), Rd-code (7..4), Rs-code (3..0)
         2nd half-word: e (15), S (14), const1 (13..0)
         3rd half-word: const2
         s = 0: Rs-code encodes G0..G15 for Rs
         s = 1: Rs-code encodes L0..L15 for Rs
         d = 0: Rd-code encodes G0..G15 for Rd
         d = 1: Rd-code encodes L0..L15 for Rd
         S: Sign bit of const
         e = 0: const = 18 S//const1
                range -16 384..16 383
         e = 1: const = 2 S//const1//const2
                range -1 073 741 824..1 073 741 823

RRdis    1st half-word: OP-code (15..10), d (9), s (8), Rd-code (7..4), Rs-code (3..0)
         2nd half-word: e (15), S (14), DD (13..12), dis1 (11..0)
         3rd half-word: dis2
         s = 0: Rs-code encodes G0..G15 for Rs
         s = 1: Rs-code encodes L0..L15 for Rs
         d = 0: Rd-code encodes G0..G15 for Rd
         d = 1: Rd-code encodes L0..L15 for Rd
         S: Sign bit of dis
         e = 0: dis = 20 S//dis1
                range -4 096..4 095
         e = 1: dis = 4 S//dis1//dis2
                range -268 435 456..268 435 455
         DD: D-code, D13..D12 encode data types at memory instructions

Rimm     1st half-word: OP-code (15..10), d (9), n (8), Rd-code (7..4), n (3..0)
         2nd half-word: imm1
         3rd half-word: imm2
         d = 0: Rd-code encodes G0..G15 for Rd
         d = 1: Rd-code encodes L0..L15 for Rd
         n: Bit 8//bits 3..0 encode n = 0..31
         see Table 2.3. Encoding of Immediate Values for encoding of imm

RRlim    1st half-word: OP-code (15..10), d (9), s (8), Rd-code (7..4), Rs-code (3..0)
         2nd half-word: e (15), XXX (14..12), lim1 (11..0)
         3rd half-word: lim2
         s = 0: Rs-code encodes G0..G15 for Rs
         s = 1: Rs-code encodes L0..L15 for Rs
         d = 0: Rd-code encodes G0..G15 for Rd
         d = 1: Rd-code encodes L0..L15 for Rd
         XXX: X-code, X14..X12 encode Index instructions
         e = 0: lim = 20 zeros//lim1
                range 0..4 095
         e = 1: lim = 4 zeros//lim1//lim2
                range 0..268 435 455

Table 2.2: Instruction Formats, Part 2
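The const encoding of the LRconst and RRconst formats can be checked with a short sketch. It assumes the extension half-word layout read from Table 2.2 (e in bit 15, S in bit 14, const1 in bits 13..0, const2 in the following half-word); the function name is hypothetical.

```python
def decode_const(second: int, third: int = 0) -> int:
    """Decode the signed const from the extension half-words."""
    e = (second >> 15) & 1
    sign = (second >> 14) & 1
    const1 = second & 0x3FFF
    if e == 0:
        # const = 18 S//const1
        return const1 - (sign << 14)
    # const = 2 S//const1//const2
    return ((const1 << 16) | (third & 0xFFFF)) - (sign << 30)


print(decode_const(0x4000), decode_const(0xBFFF, 0xFFFF))
```

The two calls give the lower limit of the short range and the upper limit of the long range listed in the table. The dis and lim fields follow the same pattern with different field widths.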
2.3.1 Table of Immediate Values
n       immediate value imm                      Comment
0..16   0..16                                    at CMPBI, n = 0 encodes ANYBZ
                                                 at ADDI and ADDSI, n = 0 encodes CZ
17      imm1//imm2                               range = 0..2³²-1 or -2³¹..2³¹-1
18      16 zeros//imm1                           range = 0..65 535
19      16 ones//imm1                            range = -65 536..-1
20      32                                       bit 5 = 1, all other bits = 0
21      64                                       bit 6 = 1, all other bits = 0
22      128                                      bit 7 = 1, all other bits = 0
23      2³¹                                      bit 31 = 1, all other bits = 0
24      -8
25      -7
26      -6
27      -5
28      -4
29      -3
30      -2
31      2³¹-1 at CMPBI and ANDNI                 bit 31 = 0, all other bits = 1
31      -1 at all other instructions using imm
Table 2.3: Encoding of Immediate Values
Note: 2³¹ provides clear, set and invert of the floating-point sign bit at ANDNI, ORI and
XORI respectively.
2³¹-1 provides a test for floating-point zero at CMPBI and extraction of the sign bit at
ANDNI.
See CMPBI for ANYBZ and ADDI, ADDSI for CZ.
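The mapping of Table 2.3 from n to the immediate value can be expressed as a lookup function. This is a sketch only: the function name is hypothetical, results are returned as unsigned 32-bit values, and the special meanings of n = 0 (ANYBZ, CZ) are not modelled.

```python
def immediate_value(n: int, imm1: int = 0, imm2: int = 0,
                    mnemonic: str = "") -> int:
    """Return the 32-bit immediate selected by n, as an unsigned value."""
    mask = 0xFFFF_FFFF
    if 0 <= n <= 16:
        return n
    if n == 17:
        return ((imm1 << 16) | (imm2 & 0xFFFF)) & mask   # imm1//imm2
    if n == 18:
        return imm1 & 0xFFFF                             # 16 zeros//imm1
    if n == 19:
        return 0xFFFF_0000 | (imm1 & 0xFFFF)             # 16 ones//imm1
    if n in (20, 21, 22):
        return 1 << (n - 15)                             # 32, 64, 128
    if n == 23:
        return 1 << 31
    if 24 <= n <= 30:
        return (n - 32) & mask                           # -8..-2
    if n == 31:
        if mnemonic in ("CMPBI", "ANDNI"):
            return (1 << 31) - 1
        return mask                                      # -1
    raise ValueError("n must be in 0..31")
```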
2.3.2 Table of Instruction Codes
Table 2.4: Table of Instruction Codes (indexed by OP-code bits 15..12 and OP-code bits 11..8)
Instructions listed in the table:
CHK, CHKZ, NOP; XMx, XMxZ; CMP; MOVD, RET; MASK; MOV; ANDN; DIVU; SUM; ADD; OR; DIVS;
XOR; SUBS; ADDSI; SUB; NEG; ADDI; ORI; AND; MOVI; ANDNI; CMPI; CMPBI; SHRDI; SHR;
LDxx.D/A/IOD/IOA; SHRI; FSUBD; DBE; BE; BNE; FADDD; FADD; DBNV; DBV; BNV; LDW.R; LDD.R;
SARDI; SAR; LDxx.N/S; FDIV; FDIVD; BSE; BHT; FMUL; DBNC; DBC; BC; STxx.D/A/IOD/IOA; SHLI;
FCMPUD; DBLE; BLE; BGT; FCMPD; DBNN; DBN; BNN; STW.R; STD.R; ROL; STxx.N/S; MUL; EXTEND;
DO; CALL; FCVTD; FCVT; FRAME; DBR; STW.P; STD.P; TRAPxx, TRAP
2.3.3 Table of Extended DSP Instruction Codes
The Extended DSP instructions are specified by a 16-bit OP-code extension succeeding the
instruction op-code for the EXTEND instruction. See section 3.32. Extended DSP
Spacing of the entries for the Trap instructions and exceptions is four bytes. These entries
are intended to each contain an instruction branching to the associated function. The entries
for the TRAPxx instructions are the same as for TRAP. Table 2.6 shows the trap entries
when the entry table is mapped to the end of memory area MEM3 (default after Reset):
Table 2.6: Trap entry table mapped to the end of MEM3
Table 2.7 shows the trap entries when the entry table is mapped to the beginning of
memory areas MEM0, MEM1, MEM2 or IRAM. x is 0, 4, 8 or C corresponding to the
mapping to MEM0, MEM1, MEM2 or IRAM respectively.
Table 2.7: Trap entry table mapped to the beginning of MEM0, MEM1, MEM2 or IRAM
Table 2.8 below shows the addresses of the first instruction of the emulator code associated
with the floating-point instructions when the trap entry tables are mapped to the end of
memory area MEM3. Spacing of the entries for the Software instructions FADD..DO is 16
bytes.
Address (Hex) Entry Description
FFFF FE00 FADD Floating-point Add, single word
FFFF FE10 FADDD Floating-point Add, double-word
FFFF FE20 FSUB Floating-point Subtract, single word
FFFF FE30 FSUBD Floating-point Subtract, double-word
FFFF FE40 FMUL Floating-point Multiply, single word
FFFF FE50 FMULD Floating-point Multiply, double-word
FFFF FE60 FDIV Floating-point Divide, single word
FFFF FE70 FDIVD Floating-point Divide, double-word
FFFF FE80 FCMP Floating-point Compare, single word
FFFF FE90 FCMPD Floating-point Compare, double-word
FFFF FEA0 FCMPU Floating-point Compare Unordered, single word
FFFF FEB0 FCMPUD Floating-point Compare Unordered, double-word
FFFF FEC0 FCVT Floating-point Convert single word ⇒ double-word
FFFF FED0 FCVTD Floating-point Convert double-word ⇒ single word
FFFF FEE0 Reserved
FFFF FEF0 DO Do instruction
Table 2.8: Floating-Point entry table mapped to the end of MEM3
Table 2.9 below shows the addresses of the first instruction of the emulator code associated
with the floating-point instructions when the trap entry tables are mapped to the beginning
of memory areas MEM0, MEM1, MEM2 or IRAM. x is 0, 4, 8 or C corresponding to the
mapping to MEM0, MEM1, MEM2 or IRAM respectively.
Address (Hex) Entry Description
x000 010C DO Do instruction
x000 011C Reserved
x000 012C FCVTD Floating-point Convert double-word ⇒ single word
x000 013C FCVT Floating-point Convert single word ⇒ double-word
x000 014C FCMPUD Floating-point Compare Unordered, double-word
x000 015C FCMPU Floating-point Compare Unordered, single word
x000 016C FCMPD Floating-point Compare, double-word
x000 017C FCMP Floating-point Compare, single word
x000 018C FDIVD Floating-point Divide, double-word
x000 019C FDIV Floating-point Divide, single word
x000 01AC FMULD Floating-point Multiply, double-word
x000 01BC FMUL Floating-point Multiply, single word
x000 01CC FSUBD Floating-point Subtract, double-word
x000 01DC FSUB Floating-point Subtract, single word
x000 01EC FADDD Floating-point Add, double-word
x000 01FC FADD Floating-point Add, single word
Table 2.9: Floating-Point entry table mapped to the beginning of MEM0, MEM1, MEM2 or IRAM
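Both floating-point entry tables follow a regular 16-byte spacing, so the entry addresses can be computed instead of looked up. The sketch below reproduces the addresses of Tables 2.8 and 2.9; the list ENTRIES and the function name are hypothetical helpers, and area_digit stands for the x in the addresses of Table 2.9.

```python
from typing import Optional

# Entry order as in Table 2.8; None marks the reserved entry.
ENTRIES = ["FADD", "FADDD", "FSUB", "FSUBD", "FMUL", "FMULD", "FDIV", "FDIVD",
           "FCMP", "FCMPD", "FCMPU", "FCMPUD", "FCVT", "FCVTD", None, "DO"]


def entry_address(name: str, area_digit: Optional[int] = None) -> int:
    """Entry address of a Software instruction for either table mapping."""
    index = ENTRIES.index(name)
    if area_digit is None:
        # Table mapped to the end of MEM3: ascending from FFFF FE00.
        return 0xFFFF_FE00 + 16 * index
    # Table mapped to the beginning of an area: descending from x000 01FC.
    return (area_digit << 28) + 0x1FC - 16 * index


print(hex(entry_address("DO")), hex(entry_address("DO", 0xC)))
```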
2.5 Instruction Timing
The following execution times are given in number of processor clock cycles.
All instructions not shown below: 1 cycle
Move Double-Word: 2 cycles
Shift Double-Word: 2 cycles
Test Leading Zeros: 2 cycles
Multiply word:
when both operands are in the range of -2¹⁵..2¹⁵-1: 4 cycles
all other cases: 5 cycles
Multiply double-word signed:
when both operands are in the range of -2¹⁵..2¹⁵-1: 5 cycles
all other cases: 6 cycles
Multiply double-word unsigned:
when both operands are in the range of 0..2¹⁶-1: 4 cycles
all other cases: 6 cycles
Divide unsigned and signed: 36 cycles
Branch instructions when branch not taken: 1 cycle
when branch taken and target in on-chip cache: 2 cycles
when branch taken and target in memory: 2 + memory read latency cycles
(see the definition of read latency later in this section)
Delayed Branch instructions when branch not taken: 1 cycle
when branch taken and target in on-chip cache: 1 cycle
when branch taken and target in memory: 1 + memory read latency cycles exceeding
(delay instruction cycles - 1)
Call and Trap instructions when branch not taken: 1 cycle
when branch taken: 2 + memory read latency cycles
Software instructions: 6 + memory read latency cycles exceeding 4 cycles
Frame when not pushing words on the stack: 3 cycles
additionally when pushing n words on the stack: memory write latency cycles
+ n * bus cycles per access
-- write latency = cycles elapsed until write access cycle of first word stored
(minimum = 1 at a non-RAS access and no pipeline congestion)
Return:
4 + memory read latency cycles exceeding 2 cycles
additionally when pulling n words from the stack: memory RAS latency
+ n * bus cycles per access
(RAS latency applies only at n > 2, otherwise RAS latency is always 0)
-- RAS latency = RAS precharge cycles + RAS to CAS delay cycles
Fetch instruction:
when the required number of instruction half-words is already prefetched in the
instruction cache: 1 cycle
otherwise
1 + (required number of half-words - number of half-words already prefetched)/2
* bus cycles per access
Memory word instructions, non-stack address mode:
1 cycle
Memory word instructions, stack address mode:
3 cycles
Memory double-word instructions:
2 cycles
For timing calculations, double-word memory instructions are treated like a sequence of
two single-word memory instructions.
Idle wait cycles are transparently inserted when a memory instruction has to wait for
execution because the two-stage address pipeline is full.
Instruction execution proceeds after the execution of a Load instruction until the data
requested is needed (that is, the register into which the data is to be loaded is addressed) by
a further instruction.
The cycles executed between the memory instruction cycle requesting the data and the first
cycle at which the data are available are called read latency cycles. These read latency
cycles can be filled with instructions that do not need the requested data. When, after the
execution of these optional fill instruction cycles, the data is still not available in the cycle
needing it, idle wait cycles are inserted until the data is available. The idle wait cycles are
inserted transparently to the program by an on-chip hardware interlock. The read latency
is:
read latency = RAS precharge cycles + RAS to CAS delay cycles +
access cycles + 1
Additional cycles are also inserted and add to the latency when the address pipeline is
congested; these cycles must then also be included in the calculation.
A switch from an external memory or I/O read access to an immediately succeeding write
access inserts one additional bus cycle.
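The read latency formula and the effect of fill instructions can be sketched as follows. This is a simplified model: it ignores address pipeline congestion and the extra bus cycle on a read-to-write switch, and the function names and the cycle counts in the example are made up for illustration.

```python
def read_latency(ras_precharge: int, ras_to_cas: int, access: int) -> int:
    """read latency = RAS precharge + RAS to CAS delay + access cycles + 1."""
    return ras_precharge + ras_to_cas + access + 1


def idle_wait_cycles(latency: int, fill_cycles: int) -> int:
    """Wait cycles left after filling the latency with independent instructions."""
    return max(0, latency - fill_cycles)


latency = read_latency(ras_precharge=2, ras_to_cas=2, access=2)
print(latency, idle_wait_cycles(latency, fill_cycles=3))
```

With these example values the latency is 7 cycles; three cycles of fill instructions leave 4 idle wait cycles to be inserted by the hardware interlock.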
Extended DSP instructions:
The instruction issue time is always 1 cycle. After the issue of an Extended DSP
instruction, execution of non-Extended-DSP instructions proceeds while the Extended DSP
instruction is executed in the multiply/accumulate unit (using separate resources). Latency
cycles are defined as the interval between instruction issue and the result being available in
the register G15 or register pair G14//G15. The latency cycles indicate as well the number
of cycles available for instructions not using the result that can be inserted between the
Extended DSP instruction and the first instruction using the result. When less than the
number of latency cycles are used by these instructions, the execution of the instruction
using the result is delayed until the result is available in G15 or G14//G15.
When an Extended DSP instruction that uses the internal hardware multiplier (EMUL, ...,
EHCMACD) succeeds an Extended DSP instruction that also uses the internal hardware
multiplier after less than latency - 1 cycles, the issue of the succeeding Extended DSP
instruction is delayed until latency - 1 cycles are finished. An Extended DSP instruction
succeeding the EHCSUMD or EHCFFTD instruction after less than the latency cycles for
these two instructions is always delayed until the EHCSUMD or EHCFFTD instruction is
finished.
The latency cycles are as follows:
EMUL instruction:
when both operands are in the range of -2¹⁵..2¹⁵-1: 1 cycle
all other cases: 3 cycles
EMULU instruction:
when both operands are in the range of 0..2¹⁶-1: 2 cycles
all other cases: 4 cycles
EMULS instruction:
when both operands are in the range of -2¹⁵..2¹⁵-1: 3 cycles
all other cases: 4 cycles
EMAC instruction:
when both operands are in the range of -2¹⁵..2¹⁵-1: 2 cycles
all other cases: 3 cycles
EMACD instruction:
when both operands are in the range of -2¹⁵..2¹⁵-1: 3 cycles
all other cases: 4 cycles
EMSUB instruction:
when both operands are in the range of -2¹⁵..2¹⁵-1: 2 cycles
all other cases: 3 cycles
EMSUBD instruction:
when both operands are in the range of -2¹⁵..2¹⁵-1: 3 cycles
The memory instructions load data from memory in a register Rs (or a register pair
Rs//Rsf) or store data from Rs (or Rs//Rsf) to memory using the data types byte
unsigned/signed, half-word unsigned/signed, word or double-word. Since I/O devices are
also addressed by memory instructions, "memory" stands here interchangeably also for I/O
unless memory or I/O address space is specifically denoted.
The memory address is either specified by the operand Rd or Ld, by the sum Rd plus a
signed displacement or by the displacement alone, depending on the address mode.
Memory accesses to words and double-words ignore bits one and zero of the address;
memory accesses to half-words ignore bit zero of the address. Since these operands are
located at word or half-word boundaries respectively, these address bits are redundant.
If the content of any register Rd except SR is zero, the memory is not accessed and a trap
to Pointer Error occurs (see section 6. Exceptions). Thus, uninitialized pointers are
automatically checked.
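The alignment and pointer-check rules above can be modelled in a few lines. The exception class stands in for the trap to Pointer Error, and all names are hypothetical; the sketch covers only address formation, not the memory access itself.

```python
class PointerError(Exception):
    """Stands in for the trap to Pointer Error."""


# Number of low address bits ignored for each data type.
IGNORED_LOW_BITS = {"byte": 0, "half-word": 1, "word": 2, "double-word": 2}


def memory_address(rd_value: int, data_type: str, rd_is_sr: bool = False) -> int:
    """Form the memory address from Rd, trapping on a zero pointer."""
    if rd_value == 0 and not rd_is_sr:
        raise PointerError("address register Rd is zero")
    ignored = (1 << IGNORED_LOW_BITS[data_type]) - 1
    return rd_value & ~ignored & 0xFFFF_FFFF
```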
Load and Store instructions are pipelined to a total depth of two word entries for Load and
Store, thus, a double-word Load or a double-word Store instruction can be executed
without halting the processor in a wait state. (The address pipeline provides a depth of two
addresses common to load and store).
Double-word memory instructions enter two separate word entries into the pipeline and
start two independent memory cycles. The first memory cycle, loading or storing the
high-order word, uses the address specified by the address mode; the second cycle uses this
address incremented by four and also places it on the address bus.
Accessing data in the same DRAM memory page by any number of succeeding memory
cycles is performed in page mode.
Memory instructions leave all condition flags unchanged.
3.1.1 Address Modes
Register Address Mode:
Notation: LDxx.R, STxx.R -- xx: word or double word data type
The content of the destination register Ld is used as an address into memory address space.
LDxx.R Ld, Rs: Ld holds the address ADDR; the DATA at memory location ADDR is loaded into Rs.
STxx.R Ld, Rs: Ld holds the address ADDR; the DATA in Rs is stored at memory location ADDR.
Postincrement Address Mode:
Notation: LDxx.P, STxx.P -- xx: word or double-word data type
The content of the destination register Ld is used as an address into memory address space,
then Ld is incremented according to the specified data size of a word or double-word
memory instruction by 4 or 8 respectively, regardless of any exception occurring. In the
case of a double-word data type, Ld is incremented by 8 at the first memory cycle.
LDxx.P Ld, Rs: Ld holds the address ADDR; the DATA at memory location ADDR is loaded into Rs,
and Ld becomes ADDR + size.
STxx.P Ld, Rs: Ld holds the address ADDR; the DATA in Rs is stored at memory location ADDR,
and Ld becomes ADDR + size.
size = 4 (word) or 8 (double-word)
Displacement Address Mode:
Notation: LDxx.D, STxx.D -- xx: any data type
The sum of the contents of the destination register Rd plus a signed displacement dis is
used as an address into memory address space.
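LDxx.D Rd, Rs, dis: Rd holds ADDR and remains unchanged; the DATA at memory location
ADDR + dis is loaded into Rs.
STxx.D Rd, Rs, dis: Rd holds ADDR and remains unchanged; the DATA in Rs is stored at memory
location ADDR + dis.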
Rd may denote any register except the SR; Rd not denoting the SR differentiates this mode
from the absolute address mode.
In the case of all data types except byte, bit zero of dis is treated as zero for the calculation
of Rd + dis.
Note: Specification of the PC for Rd provides addressing relative to the address of the first
byte after the memory instruction.
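A minimal sketch of the address calculation in the displacement address mode is given below. It models only the treatment of bit zero of dis and the 32-bit wrap of the sum; the function name is hypothetical.

```python
def displacement_address(rd_value: int, dis: int, data_type: str) -> int:
    """Effective address Rd + dis in the displacement address mode."""
    if data_type != "byte":
        dis &= ~1   # bit zero of dis is treated as zero except for bytes
    return (rd_value + dis) & 0xFFFF_FFFF


print(hex(displacement_address(0x1000, 5, "word")),
      hex(displacement_address(0x1000, 5, "byte")))
```

For the word access the displacement 5 is used as 4; for the byte access it is used unchanged.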
Absolute Address Mode:
Notation: LDxx.A, STxx.A -- xx: any data type
The displacement dis is used as an address into memory address space. Rd must denote the
SR to differentiate this mode from the displacement address mode; the content of the SR is
not used.
LDxx.A 0, Rs, dis: the DATA at memory location dis is loaded into Rs.
STxx.A 0, Rs, dis: the DATA in Rs is stored at memory location dis.
In the case of all data types except byte, address bit zero is supplied as zero.
Note: The displacement provides absolute addressing at the beginning and the end (MEM3
area) of the memory.
I/O Displacement Address Mode:
Notation: LDxx.IOD, STxx.IOD -- xx: word or double-word data type
The sum of the contents of the destination register Rd plus a signed displacement dis is
used as an address into I/O address space.
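LDxx.IOD Rd, Rs, dis: Rd holds ADDR and remains unchanged; the DATA at I/O location
ADDR + dis is loaded into Rs.
STxx.IOD Rd, Rs, dis: Rd holds ADDR and remains unchanged; the DATA in Rs is stored at I/O
location ADDR + dis.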
Rd may denote any register except the SR; Rd not denoting the SR differentiates this mode
from the I/O absolute address mode.
Bits one and zero of dis are treated as zero for the calculation of Rd + dis.
Execution of a memory instruction with I/O displacement address mode does not disrupt
any page mode sequence.
Note: The I/O displacement address mode provides dynamic addressing of peripheral
devices.
When a load instruction places only a byte or half-word on the lower part of the
data bus, the higher-order bits are undefined and must be masked out before the loaded
operand is used further.
I/O Absolute Address Mode:
Notation: LDxx.IOA, STxx.IOA -- xx: word or double-word data type
The displacement dis is used as an address into I/O address space.
LDxx.IOA 0, Rs, dis: the DATA at I/O location dis is loaded into Rs.
STxx.IOA 0, Rs, dis: the DATA in Rs is stored at I/O location dis.
Rd must denote the SR to differentiate this mode from the I/O displacement address mode;
the content of the SR is not used.
Address bits one and zero are supplied as zero.
Execution of a memory instruction with I/O absolute address mode does not disrupt any page
mode sequence.
Note: The I/O absolute address mode provides code efficient absolute addressing of
peripheral devices and allows simple decoding of I/O addresses.
When a load instruction places only a byte or a half-word on the lower part of the
data bus, the higher-order bits are undefined and must be masked out before the loaded
operand is used further.
Next Address Mode:
Notation: LDxx.N, STxx.N -- xx: any data type
The content of the destination register Rd is used as an address into memory address space,
then Rd is incremented by the signed displacement dis regardless of any exception
occurring. At a double-word data type, Rd is incremented at the first memory cycle.
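LDxx.N Rd, Rs, dis: Rd holds the address ADDR; the DATA at memory location ADDR is loaded
into Rs, and Rd becomes ADDR + dis.
STxx.N Rd, Rs, dis: Rd holds the address ADDR; the DATA in Rs is stored at memory location
ADDR, and Rd becomes ADDR + dis.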
Rd must not denote the PC or the SR.
In the case of all data types except byte, bit zero of dis is treated as zero for the calculation
of Rd + dis.
Stack Address Mode:
Notation: LDW.S, STW.S -- only word data type
The content of the destination register Rd is used as stack address, then Rd is incremented
by dis regardless of any exception occurring.
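LDxx.S Rd, Rs, dis: Rd holds the stack address ADDR; the DATA at stack location ADDR is
loaded into Rs, and Rd becomes ADDR + dis.
STxx.S Rd, Rs, dis: Rd holds the stack address ADDR; the DATA in Rs is stored at stack
location ADDR, and Rd becomes ADDR + dis.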
A stack address addresses memory address space if it is lower than the stack pointer SP;
otherwise bits 7..2 of it (higher bits are ignored) address a register in the register part of the
stack absolutely (not relative to the frame pointer FP).
Bits one and zero of dis are treated as zero for the calculation of Rd + dis.
Rd must not denote the PC or the SR.
Note: The stack address mode must be used to address an operand in the stack regardless
of its present location either in the memory part or in the register part of the stack. Rd may
be set by the Set Stack Address instruction.
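The decision between the memory part and the register part of the stack can be sketched as follows. The function name `resolve_stack_address` and the returned tuples are hypothetical, and the comparison with SP is assumed to be unsigned.

```python
def resolve_stack_address(addr, sp):
    """Classify a stack address.

    Returns ("memory", addr) if the address lies below the stack pointer,
    otherwise ("register", index), where index is taken from bits 7..2.
    """
    if addr < sp:
        return ("memory", addr)
    # Higher bits are ignored; bits 7..2 select the register absolutely.
    return ("register", (addr >> 2) & 0x3F)
```

With `sp = 0x1000`, the address `0x0FF0` is a memory access, while `0x1044` selects register index 17 in the register part of the stack.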
Address Mode Encoding:
The encoding of the displacement and absolute address mode types of memory instructions
is shown in table 3.1:
                          LDxx.D/A/IOD/IOA            STxx.D/A/IOD/IOA
D-code  dis(1)  dis(0)    Rd does not   Rd denotes    Rd does not   Rd denotes
                          denote SR     SR            denote SR     SR
0       X       X         LDBS.D        LDBS.A        STBS.D        STBS.A
1       X       X         LDBU.D        LDBU.A        STBU.D        STBU.A
2       X       0         LDHU.D        LDHU.A        STHU.D        STHU.A
2       X       1         LDHS.D        LDHS.A        STHS.D        STHS.A
3       0       0         LDW.D         LDW.A         STW.D         STW.A
3       0       1         LDD.D         LDD.A         STD.D         STD.A
3       1       0         LDW.IOD       LDW.IOA       STW.IOD       STW.IOA
3       1       1         LDD.IOD       LDD.IOA       STD.IOD       STD.IOA
Table 3.1: Encoding of Displacement and Absolute Address Mode
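Table 3.1 can be read as a small lookup, sketched below in Python. The function name `decode_table_3_1`, its parameters and the row representation are hypothetical; the sketch only returns the mnemonic and does not model the bit positions of the fields inside the instruction word.

```python
# Rows of table 3.1: (D-code, dis(1), dis(0), data type, I/O prefix).
# None stands for "X" (don't care).
_ROWS = [
    (0, None, None, "BS", ""),
    (1, None, None, "BU", ""),
    (2, None, 0, "HU", ""),
    (2, None, 1, "HS", ""),
    (3, 0, 0, "W", ""),
    (3, 0, 1, "D", ""),
    (3, 1, 0, "W", "IO"),
    (3, 1, 1, "D", "IO"),
]

def decode_table_3_1(d_code, dis1, dis0, rd_is_sr, is_load):
    """Return the mnemonic selected by table 3.1."""
    for code, b1, b0, dtype, io in _ROWS:
        if code == d_code and b1 in (None, dis1) and b0 in (None, dis0):
            mode = "A" if rd_is_sr else "D"   # Rd denoting SR selects absolute mode
            return ("LD" if is_load else "ST") + dtype + "." + io + mode
    raise ValueError("encoding not listed in table 3.1")
```

For instance, D-code 3 with dis(1) = 1, dis(0) = 0 and Rd denoting the SR decodes to `LDW.IOA` for a load.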
The encoding of the next and stack address mode types of memory instructions is shown in
table 3.2:
                          With the instructions below, Rd must
                          not denote the PC or the SR
D-code  dis(1)  dis(0)    LDxx.N/S      STxx.N/S
0       X       X         LDBS.N        STBS.N
1       X       X         LDBU.N        STBU.N
2       X       0         LDHU.N        STHU.N
2       X       1         LDHS.N        STHS.N
3       0       0         LDW.N         STW.N
3       0       1         LDD.N         STD.N
3       1       0         Reserved      Reserved
3       1       1         LDW.S         STW.S
Table 3.2: Encoding of Next and Stack Address Mode
3.1.2 Load Instructions
The Load instructions transfer data from the addressed memory location into a register Rs
or a register pair Rs//Rsf.
In the case of data types word and double-word, one or two words are read from memory
and transferred unchanged into Rs or Rs//Rsf respectively.
In the case of byte and half-word data types, up to one word (depending on bus size) is
read from memory, the byte or half-word addressed by bits one and zero or bit one of the
memory address respectively is extracted, right adjusted, expanded to 32 bits and placed in
Rs. Unsigned bytes and half-words are expanded by leading zeros; signed bytes and
half-words are expanded by leading sign bits.
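The expansion to 32 bits can be illustrated with a short Python sketch. The function name `extend_loaded` is hypothetical, and the selection of the byte or half-word within the memory word (by address bits one and zero) is not modeled; the input is assumed to be already right adjusted.

```python
def extend_loaded(value, bits, signed):
    """Expand a right-adjusted byte (bits=8) or half-word (bits=16) to 32 bits."""
    value &= (1 << bits) - 1
    if signed and value & (1 << (bits - 1)):
        # Signed types: fill the upper bits with copies of the sign bit.
        value |= 0xFFFFFFFF & ~((1 << bits) - 1)
    # Unsigned types (and non-negative signed values) keep leading zeros.
    return value
```

A loaded byte `0x80` therefore becomes `0x00000080` for LDBU and `0xFFFFFF80` for LDBS.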
Execution of a Load instruction enters the register address of Rs, memory address bits one
and zero and a code for the data type into the load pipeline, places the memory address
onto the address bus and starts a memory cycle. A double-word Load instruction enters the
register address of Rsf and the same control information into the load pipeline as a second
entry, places the memory address incremented by four onto the address bus and starts a
second memory cycle.
After execution of a Load instruction, the next instructions are executed without waiting
for the data to be loaded. A wait is enforced only if an instruction uses a register whose
register address is still in the load pipeline. The data read from memory is placed in the
register whose register address is at the head of the load pipeline, its pipeline entry is then
deleted.
In a memory load instruction, Rs denotes the load destination register, which receives the
data loaded from memory, I/O or the stack, and Rd denotes the load source register.
Rs must not denote the PC, the SR, G14 or G15; these registers cannot be loaded from memory.
Format Notation Operation Data Type xx
LR LDxx.R Ld, Rs Rs := Ld^; W,D
3.1.3 Store Instructions
The Store instructions transfer data from the register Rs or the register pair Rs//Rsf to the
addressed memory location.
In the case of data types word or double-word, one or two words are placed unchanged
from Rs or Rs//Rsf respectively onto the data bus to be stored in the memory.
In the case of byte and half-word data types, the low-order byte or half-word is placed onto
the data bus at the byte or half-word position addressed by bits one and zero or bit one of
the memory address respectively; it is implied to be merged (via byte write enable) with
the other data in the same memory word.
In the case of signed byte and signed half-word data types, any content of Rs exceeding the
value range of the specified data type causes a trap to Range Error. The byte or half-word
is stored regardless of a Range Error.
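The range check for signed byte and half-word stores can be sketched as follows. The function name `store_signed` and its return convention are hypothetical; the trap is represented only by a boolean.

```python
def store_signed(rs, bits):
    """Model STBS (bits=8) or STHS (bits=16).

    Returns (stored_value, range_error). The low-order byte or half-word
    is stored regardless of whether a Range Error is signaled.
    """
    signed = rs - (1 << 32) if rs & 0x80000000 else rs   # Rs as a signed integer
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    range_error = not (lo <= signed <= hi)
    return rs & ((1 << bits) - 1), range_error
```

For example, storing `0x00000080` as a signed byte stores `0x80` but signals a Range Error, since 128 exceeds the signed byte range, whereas `0xFFFFFF80` (-128) is stored without error.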
If Rs denotes the SR, zero is stored regardless of the content of SR (or of SR//G2 at
double-word).
Execution of a Store instruction enters the contents of Rs, memory address bits one and
zero and a code for the data type into the store pipeline, places the memory address onto
the address bus and starts a memory cycle. A double-word Store instruction enters the
contents of Rsf and the same control information into the store pipeline as a second entry,
places the memory address incremented by four onto the address bus and starts a second
memory cycle.
After execution of a Store instruction, the next instructions are executed without waiting
for the store memory cycle to finish. The data at the head of the store pipeline is put on the
data bus on demand from the on-chip memory control logic and its pipeline entry is deleted.
When Rsf denotes the same register as Rd (or Ld) at double-word instructions with next
address or postincrement address mode, the incremented content of Rsf is stored in the
second memory cycle; in all other cases, the unchanged content of Rs or Rsf is stored.
Format Notation Operation Data Type xx
LR STxx.R Ld, Rs Ld^ := Rs; W,D
-- next address mode
RRdis STxx.S Rd, Rs, dis Rd^ := Rs; Rd := Rd + dis; W
-- stack address mode
The expressions in brackets are only executed at double-word data types.
In the case of signed byte and half-word data types, a trap to Range Error occurs when the
value of the operand to be stored exceeds the value range of the specified data type; the
byte or half-word is stored regardless of a Range Error.
Data type xx is one of:
BU: byte unsigned; HU: half-word unsigned; W: word;
BS: byte signed; HS: half-word signed; D: double-word;
3.2 Move Word Instructions
The source operand or the immediate operand is copied to the destination register and the
condition flags are set or cleared accordingly.
Format Notation Operation
RR MOV Rd, Rs Rd := Rs;
Z := Rd = 0;
N := Rd(31);
V := undefined;
Rimm MOVI Rd, imm Rd := imm;
Z := Rd = 0;
N := Rd(31);
V := 0;
3.3 Move Double-Word Instruction
The double-word source operand is copied to the double-word destination register pair and
the condition flags are set or cleared accordingly. The high-order word in Rs is copied first.
When the SR is denoted as a source operand, the source operand is supplied as zero
regardless of the content of SR//G2. When the PC is denoted as destination, the Return
instruction RET is executed instead of the Move Double-Word instruction.
Format Notation Operation
RR MOVD Rd, Rs if Rd does not denote PC and Rs does not denote SR then
Rd := Rs;
Rdf := Rsf;
Z := Rd//Rdf = 0;
N := Rd(31);
V := undefined;
RR MOVD Rd, 0 if Rd does not denote PC and Rs denotes SR then
Rd := 0;
Rdf := 0;
Z := 1;
N := 0;
V := undefined;
RR RET PC, Rs if Rd denotes PC then
execute the RET instruction;
3.4 Logical Instructions
The result of a bitwise logical AND, AND not (ANDN), OR or exclusive OR (XOR) of the
source or immediate operand and the destination operand is placed in the destination
register and the Z flag is set or cleared accordingly. At ANDN, the source operand is used
inverted (itself remaining unchanged).
All operands and the result are interpreted as bitstrings of 32 bits each.
Format Notation Operation
RR AND Rd, Rs Rd := Rd and Rs; -- logical AND
Z := Rd = 0;
RR ANDN Rd, Rs Rd := Rd and not Rs; -- logical AND with source used inverted
Z := Rd = 0;
RR OR Rd, Rs Rd := Rd or Rs; -- logical OR
Z := Rd = 0;
RR XOR Rd, Rs Rd := Rd xor Rs; -- logical exclusive OR
Z := Rd = 0;
Rimm ANDNI Rd, imm Rd := Rd and not imm; -- logical AND with imm used inverted
Z := Rd = 0;
Rimm ORI Rd, imm Rd := Rd or imm; -- logical OR
Z := Rd = 0;
Rimm XORI Rd, imm Rd := Rd xor imm; -- logical exclusive OR
Z := Rd = 0;
Note: ANDN and ANDNI are the instructions complementary to OR and ORI: Where OR
and ORI set bits, ANDN and ANDNI clear bits at bit positions with a "one" bit in the
source or immediate operand, thus obviating the need for an inverted mask in most cases.
Register
L0 : $0F0CFFFF
L1 : $FFFF0000
Instruction
AND L0, L1 ; L0 = L0 and L1 = $0F0C0000
ANDN L0, L1 ; L0 = L0 and not L1 = $0000FFFF
OR L0, L1 ; L0 = L0 or L1 = $FFFFFFFF
XOR L0, L1 ; L0 = L0 xor L1 = $F0F3FFFF
ANDNI L0, $1234 ; L0 = L0 and not imm = $0F0CEDCB
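The example values above can be reproduced with a few lines of Python. This is only a sketch: the helper name `andn` is hypothetical, and each instruction is assumed to start from the initial register contents, as the listed results imply.

```python
MASK32 = 0xFFFFFFFF

def andn(rd, rs):
    """AND with the second operand used inverted: clears the bits set in rs."""
    return rd & ~rs & MASK32

L0, L1 = 0x0F0CFFFF, 0xFFFF0000

results = {
    "AND": L0 & L1,
    "ANDN": andn(L0, L1),
    "OR": L0 | L1,
    "XOR": L0 ^ L1,
    "ANDNI": andn(L0, 0x1234),
}
```

Note how `andn(x, mask)` clears exactly the bits that `x | mask` would set, so the same mask can be used for setting and for clearing.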
3.5 Invert Instruction
The source operand is placed bitwise inverted in the destination register and the Z flag is
set or cleared accordingly.
The source operand and the result are interpreted as bitstrings of 32 bits each.
Format Notation Operation
RR NOT Rd, Rs Rd := not Rs;
Z := Rd = 0;
3.6 Mask Instruction
The result of a bitwise logical AND of the source operand and the immediate operand is
placed in the destination register and the Z flag is set or cleared accordingly.
All operands and the result are interpreted as bitstrings of 32 bits each.
Format Notation Operation
RRconst MASK Rd, Rs, const Rd := Rs and const;
Z := Rd = 0;
Note: The Mask instruction may be used to move a source operand with bits partly masked
out by an immediate operand used as mask. The immediate operand const is constrained in
its range by bits 31 and 30 being either both zero or both one (see format RRconst). If
these bits are required to be different, the instruction pair MOVI, AND may be used
instead of MASK.
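The constraint on const stated in the note can be checked as sketched below. The function name `fits_mask_const` is hypothetical, and only the condition on bits 31 and 30 given in the note is modeled; any further limits of format RRconst are not considered.

```python
def fits_mask_const(value):
    """True if bits 31 and 30 of a 32-bit mask are both zero or both one."""
    top = (value >> 30) & 0b11
    return top in (0b00, 0b11)
```

A mask such as `0x3FFFFFFF` or `0xC0000000` satisfies the condition, while `0x40000000` and `0x80000000` would require the MOVI, AND instruction pair.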
3.7 Add Instructions
The source operand, the source operand + C or the immediate operand is added to the
destination operand, the result is placed in the destination register and the condition flags
are set or cleared accordingly.
At ADD, ADDC and ADDI, both operands and the result are interpreted as either all
signed or all unsigned integers. At ADDS and ADDSI, both operands and the result are
signed integers and a trap to Range Error occurs at overflow.
Format Notation Operation
RR ADD Rd, Rs Rd := Rd + Rs; -- signed or unsigned Add
Z := Rd = 0;
N := Rd(31); -- sign
V := overflow;
C := carry;
RR ADDS Rd, Rs Rd := Rd + Rs; -- signed Add with trap
Z := Rd = 0;
N := Rd(31); -- sign
V := overflow;
if overflow then
trap ⇒ Range Error;
RR ADDC Rd, Rs Rd := Rd + Rs + C; -- signed or unsigned Add with carry
Z := Z and (Rd = 0);
N := Rd(31); -- sign
V := overflow;
C := carry;
When the SR is denoted as a source operand at ADD, ADDS and ADDC, C is added
instead of the SR. The notation is then:
Format Notation Operation
RR ADD Rd, C Rd := Rd + C; -- signed or unsigned Add C
RR ADDS Rd, C Rd := Rd + C; -- signed Add C with trap
RR ADDC Rd, C Rd := Rd + C;
The flags and the trap condition are treated as defined by ADD, ADDS or ADDC.
Format Notation Operation
Rimm ADDI Rd, imm Rd := Rd + imm; -- signed or unsigned Add
Z := Rd = 0;
N := Rd(31); -- sign
V := overflow;
C := carry;
Rimm ADDSI Rd, imm Rd := Rd + imm; -- signed Add with trap
Z := Rd = 0;
N := Rd(31); -- sign
V := overflow;
if overflow then
trap ⇒ Range Error;
The following instructions are special cases of ADDI and ADDSI differentiated by n = 0
(see section 2.3.1. Table of Immediate Values):
Format Notation Operation
Rimm ADDI Rd, CZ Rd := Rd + (C and (Z = 0 or Rd(0))); -- round to even
Rimm ADDSI Rd, CZ Rd := Rd + (C and (Z = 0 or Rd(0))); -- round to even
The flags and the trap condition are treated as defined by ADDI or ADDSI.
Note: At ADDC, Z is cleared if Rd ≠ 0, otherwise left unchanged; thus, Z is evaluated
correctly for multi-precision operands.
The effect of a Subtract immediate instruction can be obtained by using the negated 32-bit
value of the immediate operand to be subtracted (except zero). For unsigned operands,
C = 0 then indicates a borrow (the unsigned number range is exceeded below zero).
At "round to even", C is only added to the destination operand if Z = 0 or Rd(0) is one. The
Z flag is assumed to be set or cleared by a preceding Shift Left instruction. "Round to
even" provides a better averaging of rounding errors than "add carry".
"Round to even" is equivalent to the "round to nearest" Floating-Point rounding mode and
may be used to implement it efficiently.
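The "round to even" rule Rd := Rd + (C and (Z = 0 or Rd(0))) can be tried out with the following Python sketch. The function name `round_to_even` is hypothetical. How C and Z are actually produced by the preceding instructions is not modeled; here they are derived directly from the discarded fraction bits, which is an assumption made for illustration.

```python
def round_to_even(x, k):
    """Round an unsigned fixed-point value x with k >= 1 fraction bits."""
    q = x >> k                            # integer part, plays the role of Rd
    c = (x >> (k - 1)) & 1                # highest discarded bit, modeled as C
    z = (x & ((1 << (k - 1)) - 1)) == 0   # remaining discarded bits are zero: Z = 1
    # C is only added if Z = 0 (more than one half) or Rd(0) is one (tie, odd).
    return q + (c & int((not z) or (q & 1) == 1))
```

With one fraction bit, 2.5 (`0b101`) rounds to 2 and 3.5 (`0b111`) rounds to 4, so ties go to the even neighbor; with two fraction bits, 2.75 (`0b1011`) rounds to 3.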
Register
L0 : $00000004
L1 : $FFFFFFFC
Instruction
ADD L0, L1 ; L0 = L0 + L1 = $0
ADDI L0, $120 ; L0 = L0 + imm = $124
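The treatment of Z at ADDC for multi-precision operands can be sketched in Python. The function names `add`, `addc` and `add64` and the flag dictionary are hypothetical; only the Z and C flags are modeled, N and V are left out for brevity.

```python
MASK32 = 0xFFFFFFFF

def add(rd, rs):
    """ADD: returns (result, flags) with Z and C."""
    total = rd + rs
    res = total & MASK32
    return res, {"Z": res == 0, "C": total > MASK32}

def addc(rd, rs, flags):
    """ADDC: adds C; Z stays set only if it was set and the result is zero."""
    total = rd + rs + int(flags["C"])
    res = total & MASK32
    return res, {"Z": flags["Z"] and res == 0, "C": total > MASK32}

def add64(a_hi, a_lo, b_hi, b_lo):
    """64-bit addition: ADD on the low words, ADDC on the high words."""
    lo, flags = add(a_lo, b_lo)
    hi, flags = addc(a_hi, b_hi, flags)
    return hi, lo, flags
```

Adding 1 to `0x00000000FFFFFFFF` yields a zero low word, but Z ends up cleared because the high word becomes 1, so Z reflects the complete 64-bit result.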
3.8 Sum Instructions
The sum of the source operand and the immediate operand is placed in the destination
register and the condition flags are set or cleared accordingly. At SUM, both operands and
the result are interpreted as either all signed or all unsigned integers. At SUMS, both
operands and the result are signed integers and a trap to Range Error occurs at overflow.
Format Notation Operation
RRconst SUM Rd, Rs, const Rd := Rs + const; -- signed or unsigned Sum
Z := Rd = 0;
N := Rd(31); -- sign
V := overflow;
C := carry;
RRconst SUMS Rd, Rs, const Rd := Rs + const; -- signed Sum with trap
Z := Rd = 0;
N := Rd(31); -- sign
V := overflow;
if overflow then
trap ⇒ Range Error;
When the SR is denoted as a source operand at SUM and SUMS, C is added instead of the
SR. The notation is then:
Format Notation Operation
RRconst SUM Rd, C, const Rd := C + const; -- signed or unsigned Sum C
RRconst SUMS Rd, C, const Rd := C + const; -- signed Sum C
The flags are treated as defined by SUM or SUMS. A trap cannot occur.
Note: The effect of a Subtract immediate instruction can be obtained by using the negated
32-bit value of the immediate operand to be subtracted (except zero). For unsigned operands,
C = 0 then indicates a borrow (the unsigned number range is exceeded below zero).
The immediate operand is constrained to the range of const. The instruction pair MOV,
ADDI or MOV, ADDSI may be used where the full integer range is required.
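Subtraction by adding the negated immediate, and the meaning of C in that case, can be illustrated as follows. The function name `sub_imm_via_add` is hypothetical; only the result and the carry flag are modeled.

```python
MASK32 = 0xFFFFFFFF

def sub_imm_via_add(rd, imm):
    """Subtract a non-zero imm from rd by adding its negated 32-bit value.

    Returns (result, C). For unsigned operands, C = 0 indicates a borrow.
    """
    neg = (-imm) & MASK32          # two's complement of the immediate
    total = rd + neg
    return total & MASK32, int(total > MASK32)
```

`sub_imm_via_add(5, 3)` gives `(2, 1)`: a carry and therefore no borrow. `sub_imm_via_add(2, 3)` gives `(0xFFFFFFFF, 0)`: no carry, which indicates the borrow. With `imm = 0` the carry would always be 0, which is why zero is excluded.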
3.9 Subtract Instructions
The source operand or the source operand + C is subtracted from the destination operand,
the result is placed in the destination register and the condition flags are set or cleared
accordingly.
At SUB and SUBC, both operands and the result are interpreted as either all signed or all
unsigned integers. At SUBS, both operands and the result are signed integers and a trap to
Range Error occurs at overflow.
Format Notation Operation
RR SUB Rd, Rs Rd := Rd - Rs; -- signed or unsigned Subtract
Z := Rd = 0;
N := Rd(31); -- sign
V := overflow;
C := borrow;
RR SUBS Rd, Rs Rd := Rd - Rs; -- signed Subtract with trap
Z := Rd = 0;
N := Rd(31); -- sign
V := overflow;
if overflow then
trap ⇒ Range Error;
RR SUBC Rd, Rs Rd := Rd - (Rs + C); -- signed or unsigned Subtract with borrow
Z := Z and (Rd = 0);
N := Rd(31); -- sign
V := overflow;
C := borrow;
When the SR is denoted as a source operand at SUB, SUBS and SUBC, C is subtracted
instead of the SR. The notation is then:
Format Notation Operation
RR SUB Rd, C Rd := Rd - C; -- signed or unsigned Subtract C
RR SUBS Rd, C Rd := Rd - C; -- signed Subtract C with trap
RR SUBC Rd, C Rd := Rd - C;
The flags and the trap condition are treated as defined by SUB, SUBS or SUBC.
Note: At SUBC, Z is cleared if Rd ≠ 0, otherwise left unchanged; thus, Z is evaluated
correctly for multi-precision operands.
Register
L0 : $124
L1 : $4
Instruction
SUB L0, L1 ; L0 = L0 - L1 = $120
3.10 Negate Instructions
The source operand is subtracted from zero, the result is placed in the destination register
and the condition flags are set or cleared accordingly.
At NEG, the source operand and the result are interpreted as either both signed or both
unsigned integers. At NEGS, the source operand and the result are signed integers and a
trap to Range Error occurs at overflow.
Format Notation Operation
RR NEG Rd, Rs Rd := - Rs; -- signed or unsigned Negate
Z := Rd = 0;
N := Rd(31); -- sign
V := overflow;
C := borrow;
RR NEGS Rd, Rs Rd := - Rs; -- signed Negate with trap
Z := Rd = 0;
N := Rd(31); -- sign
V := overflow;
if overflow then
trap ⇒ Range Error;
When the SR is denoted as a source operand at NEG and NEGS, C is negated instead of
the SR. The notation is then:
Format Notation Operation
RR NEG Rd, C Rd := - C; -- signed or unsigned Negate C
if C is set then
Rd := -1;
else
Rd := 0;
RR NEGS Rd, C Rd := - C; -- signed Negate C
if C is set then
Rd := -1;
else
Rd := 0;
The flags are treated as defined by NEG or NEGS. A trap cannot occur.
Register
L0 : $124
L1 : $4
Instruction
NEG L0, L1 ; L0 = - L1 = $FFFFFFFC
3.11 Multiply Word Instruction
The source operand and the destination operand are multiplied, the low-order word of the
product is placed in the destination register (the high-order product word is not evaluated)
and the condition flags are set or cleared according to the single-word product.
Both operands are either signed or unsigned integers, the product is a single-word integer.
Note that the low-order word of the product is identical regardless of whether the operands
are signed or unsigned.
The result is undefined if the PC or the SR is denoted.
Format Notation Operation
RR MUL Rd, Rs Rd := low order word of product Rd ∗ Rs;
Z := singleword product = 0;
N := Rd(31);
-- sign of singleword product;
-- valid for signed operands;
V := undefined;
C := undefined;
3.12 Multiply Double-Word Instructions
The source operand and the destination operand are multiplied, the double-word product is
placed in the destination register pair (the destination register expanded by the register
following it) and the condition flags are set or cleared according to the double-word
product.
At MULS, both operands are signed integers and the product is a signed double-word
integer. At MULU, both operands are unsigned integers and the product is an unsigned
double-word integer.
The result is undefined if the PC or the SR is denoted.
Format Notation Operation
RR MULS Rd, Rs Rd//Rdf := signed doubleword product of Rd ∗ Rs;
Z := Rd//Rdf = 0;
-- doubleword product is zero
N := Rd(31);
-- doubleword product is negative
V := undefined;
C := undefined;
RR MULU Rd, Rs Rd//Rdf := unsigned doubleword product of Rd ∗ Rs;
Z := Rd//Rdf = 0;
-- doubleword product is zero
N := Rd(31);
V := undefined;
C := undefined;
Register
L0 : $5678
L1 : $1234
L2 : $9ABC
Instruction
MUL L0, L2 ; L0 = $3443B020
MULU L0, L2 ; L0 = $0
; L1 = $3443B020
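The multiply results above can be reproduced with the following Python sketch. The function names `mul`, `mulu` and `muls` are hypothetical models of the instructions; flags are not modeled, and the register pair is returned as (Rd, Rdf) with the high-order word in Rd, as the MULU example shows.

```python
MASK32 = 0xFFFFFFFF

def _signed(v):
    """Interpret a 32-bit value as a signed integer."""
    return v - (1 << 32) if v & 0x80000000 else v

def mul(rd, rs):
    """MUL: low-order word of the product (same for signed and unsigned)."""
    return (rd * rs) & MASK32

def mulu(rd, rs):
    """MULU: unsigned double-word product as (Rd, Rdf)."""
    product = rd * rs
    return (product >> 32) & MASK32, product & MASK32

def muls(rd, rs):
    """MULS: signed double-word product as (Rd, Rdf)."""
    product = (_signed(rd) * _signed(rs)) & 0xFFFFFFFFFFFFFFFF
    return product >> 32, product & MASK32
```

`mul(0x5678, 0x9ABC)` returns `0x3443B020`, and `mulu(0x5678, 0x9ABC)` returns `(0x0, 0x3443B020)`, matching the register values listed in the example.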
3.13 Divide Instructions
The double-word destination operand (dividend) is divided by the single-word source
operand (divisor), the quotient is placed in the low-order destination register (Rdf), the
remainder is placed in the high-order destination register (Rd) and the condition flags are
set or cleared according to the quotient.
A trap to Range Error occurs if the divisor is zero or the value of the quotient exceeds the
integer value range (quotient overflow). The result (in Rd//Rdf) is then undefined. At
DIVS, a trap to Range Error also occurs and the result is undefined if the dividend is
negative.
At DIVS, the dividend is a non-negative signed double-word integer, the divisor, the
quotient and the remainder are signed integers; a non-zero remainder has the sign of the
dividend.
At DIVU, the dividend is an unsigned double-word integer, the divisor, the quotient and
the remainder are unsigned integers.
The result is undefined if Rs denotes the same register as Rd or Rdf or if the PC or the SR
is denoted.
Format Notation Operation
RR DIVS Rd, Rs if Rs = 0 or quotient overflow or Rd(31) = 1 then
-- dividend is negative
Rd//Rdf := undefined;
Z := undefined;
N := undefined;
V := 1;
trap ⇒ Range Error;
else
remainder Rd, quotient Rdf := (Rd//Rdf) / Rs;
Z := Rdf = 0; -- quotient is zero
N := Rdf(31); -- quotient is negative
V := 0;
RR DIVU Rd, Rs if Rs = 0 or quotient overflow then
Rd//Rdf := undefined;
Z := undefined;
N := undefined;
V := 1;
trap ⇒ Range Error;
else
remainder Rd, quotient Rdf := (Rd//Rdf) / Rs;
Z := Rdf = 0; -- quotient is zero
N := Rdf(31);
V := 0;
Register
L0 : $1
L1 : $23456789
L2 : $123456
Instruction
DIVU L0, L2 ; L0 = $789
; L1 = $1000
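The DIVU operation and its trap conditions can be sketched in Python as follows. The names `divu` and `RangeError` are hypothetical; the trap is represented by an exception, and the divisor in the example is read as the hexadecimal value $123456, which is consistent with the results shown.

```python
MASK32 = 0xFFFFFFFF

class RangeError(Exception):
    """Stands in for the trap to Range Error."""

def divu(rd, rdf, rs):
    """DIVU: divide the double word Rd//Rdf by Rs.

    Returns (remainder, quotient), i.e. the new contents of (Rd, Rdf).
    """
    if rs == 0:
        raise RangeError("divisor is zero")
    dividend = (rd << 32) | rdf
    quotient, remainder = divmod(dividend, rs)
    if quotient > MASK32:
        raise RangeError("quotient overflow")
    return remainder, quotient
```

`divu(0x1, 0x23456789, 0x123456)` returns `(0x789, 0x1000)`: the remainder goes to the high-order register and the quotient to the low-order register.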
3.14 Shift Left Instructions
The destination operand is shifted left by a number of bit positions specified
at SHLI, SHLDI by n = 0..31 as a shift by 0..31;
at SHL, SHLD by bits 4..0 of the source operand as a shift by 0..31.
The higher-order bits of the source operand are ignored.
The destination operand is interpreted
at SHL and SHLI as a bitstring of 32 bits or as a signed or unsigned integer;
at SHLD and SHLDI as a double-word bitstring of 64 bits or as a signed or
unsigned double-word integer.
All Shift Left instructions insert zeros in the vacated bit positions at the right.
The double-word Shift Left instructions execute in two cycles. The low-order operand in
Ldf is shifted first. At SHLD, the result is undefined if Ls denotes the same register as Ld
or Ldf.
Format Notation Operation insert
Rn SHLI Rd, n Rd := Rd << by n; -- 0..31 zeros
Ln SHLDI Ld, n Ld//Ldf := Ld//Ldf << by n; -- 0..31 zeros
LL SHL Ld, Ls Ld := Ld << by Ls(4..0); -- 0..31 zeros
LL SHLD Ld, Ls Ld//Ldf := Ld//Ldf << by Ls(4..0); -- 0..31 zeros
The condition flags are set or cleared by all Shift Left instructions as follows:
Z := Ld = 0 or Rd = 0 on single-word;
Z := Ld//Ldf = 0 on double-word;
N := Ld(31) or Rd(31);
V := undefined;
C := undefined;
Note: The symbol << signifies "shifted left".
Register
L0 : $FFFF
L1 : $2
Instruction
SHLI L0, $4 ; L0 = $000FFFF0
SHL L0, L1 ; L0 = $0003FFFC
3.15 Shift Right Instructions
The destination operand is shifted right by a number of bit positions specified
at SARI, SARDI, SHRI, SHRDI by n = 0..31 as a shift by 0..31;
at SAR, SARD, SHR, SHRD by bits 4..0 of the source operand as a shift by 0..31.
The higher-order bits of the source operand are ignored.
The destination operand is interpreted
at SAR and SARI as a signed integer;
at SARD and SARDI as a signed double-word integer;
at SHR and SHRI as a bitstring of 32 bits or as an unsigned integer;
at SHRD and SHRDI as a double-word bitstring of 64 bits or as an unsigned
double-word integer.
All Shift Right instructions that interpret the destination operand as signed insert sign bits,
all others insert zeros in the vacated bit positions at the left.
The double-word Shift Right instructions execute in two cycles. The high-order operand in
Ld is shifted first. At SARD and SHRD, the result is undefined if Ls denotes the same
register as Ld or Ldf.
Format Notation Operation insert
Rn SARI Rd, n Rd := Rd >> by n; -- 0..31 sign bits
Ln SARDI Ld, n Ld//Ldf := Ld//Ldf >> by n; -- 0..31 sign bits
LL SAR Ld, Ls Ld := Ld >> by Ls(4..0); -- 0..31 sign bits
LL SARD Ld, Ls Ld//Ldf := Ld//Ldf >> by Ls(4..0); -- 0..31 sign bits
Rn SHRI Rd, n Rd := Rd >> by n; -- 0..31 zeros
Ln SHRDI Ld, n Ld//Ldf := Ld//Ldf >> by n; -- 0..31 zeros
LL SHR Ld, Ls Ld := Ld >> by Ls(4..0); -- 0..31 zeros
LL SHRD Ld, Ls Ld//Ldf := Ld//Ldf >> by Ls(4..0); -- 0..31 zeros
The condition flags are set or cleared by all Shift Right instructions as follows:
Z := Ld = 0 or Rd = 0 on single-word;
Z := Ld//Ldf = 0 on double-word;
N := Ld(31) or Rd(31);
C := last bit shifted out is "one";
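The difference between the arithmetic and the logical single-word right shifts can be sketched as follows. The function names `shr` and `sar` are hypothetical; each returns the result together with C. Treating C as 0 for a shift by zero is an assumption, since no bit is shifted out in that case.

```python
MASK32 = 0xFFFFFFFF

def shr(value, count):
    """SHR: logical shift right by bits 4..0 of count; zeros are inserted."""
    n = count & 0x1F
    carry = (value >> (n - 1)) & 1 if n else 0   # last bit shifted out
    return (value >> n) & MASK32, carry

def sar(value, count):
    """SAR: arithmetic shift right by bits 4..0 of count; sign bits are inserted."""
    n = count & 0x1F
    signed = value - (1 << 32) if value & 0x80000000 else value
    carry = (value >> (n - 1)) & 1 if n else 0   # last bit shifted out
    return (signed >> n) & MASK32, carry
```

Shifting `0x80000001` right by one gives `0x40000000` with SHR and `0xC0000000` with SAR; in both cases C is 1 because the last bit shifted out is a one.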
3.16 Rotate Left Instruction
The destination operand is shifted left by a number of bit positions and the bits shifted out
are inserted in the vacated bit positions; thus, the destination operand is rotated. The
condition flags are set or cleared accordingly. Bits 4..0 of the source operand specify a
rotation by 0..31 bit positions; bits 31..5 of the source operand are ignored.
The destination operand is interpreted as a bitstring of 32 bits.
Format Notation Operation
LL ROL Ld, Ls Ld := Ld rotated left by Ls(4..0);
Z := Ld = 0;
N := Ld(31);
V := undefined;
C := undefined;
Note: The condition flags are set or cleared by the same rules applying to the Shift Left
instructions.
Register
L0 : $C000FFFF
L1 : $4
Instruction
ROL L0, L1 ; L0 = $000FFFFC
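The rotation can be reproduced with a short Python sketch; the function name `rol` is hypothetical and the flags are not modeled.

```python
def rol(value, count):
    """ROL: rotate a 32-bit value left by bits 4..0 of count."""
    n = count & 0x1F                       # bits 31..5 of the source are ignored
    return ((value << n) | (value >> (32 - n))) & 0xFFFFFFFF
```

`rol(0xC000FFFF, 4)` returns `0x000FFFFC`, matching the example above.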
3.17 Index Move Instructions
The source operand is placed shifted left by 0, 1, 2 or 3 bit positions in the destination
register, corresponding to a multiplication by 1, 2, 4 or 8. At XM1..XM4, a trap to Range
Error occurs if the source operand is higher than the immediate operand lim (upper bound).
All condition flags remain unchanged. All operands and the result are interpreted as
unsigned integers.
The SR must not be denoted as a source nor as a destination, nor the PC as a destination
operand; these notations are reserved for future expansion. When the PC is denoted as a
source operand, a trap to Range Error occurs if PC ≥ lim.
Note: The Index Move instructions move an index value scaled (multiplied by 1, 2, 4 or 8).
XM1..XM4 check also the unscaled value for an upper bound, optionally also excluding
zero. If the lower bound is not zero or one, it may be mapped to zero by subtracting it from
the index value before applying an Index Move instruction.
Register
L0 : $456
L1 : $123
Instruction
XM2 L0, L1, $124 ; L0 = $246
XM2 L0, L1, $122 ; Integer Range Error in Task at Address XXXXXXXX
XX2 L0, L1, 0 ; L0 = $246
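The scaling and the optional bounds check can be sketched in Python. The names `index_move` and `RangeError` and the `scale` and `lim` parameters are hypothetical; the bounds in the example are taken as the hexadecimal values $124 and $122 (an assumption consistent with the results shown), and the variants that also exclude zero are not modeled.

```python
class RangeError(Exception):
    """Stands in for the trap to Range Error."""

def index_move(src, scale, lim=None):
    """Scale an index by 1, 2, 4 or 8.

    lim -- upper bound for the unscaled index; None models the forms
           without bounds check.
    """
    if lim is not None and src > lim:
        raise RangeError("index exceeds upper bound")
    shift = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    return (src << shift) & 0xFFFFFFFF
```

With a source of `0x123` and a scale of 2, a bound of `0x124` yields `0x246`, whereas a bound of `0x122` raises the error.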
3.18 Check Instructions
The destination operand is checked and a trap to Range Error occurs
at CHK if the destination operand is higher than the source operand,
at CHKZ if the destination operand is zero.
All registers and all condition flags remain unchanged. All operands are interpreted as
unsigned integers.
CHKZ shares its basic OP-code with CHK, it is differentiated by denoting the SR as source
operand.
Format Notation Operation
RR CHK Rd, Rs if Rs does not denote SR and Rd > Rs then
trap ⇒ Range Error;
RR CHKZ Rd, 0 if Rs denotes SR and Rd = 0 then
trap ⇒ Range Error;
When Rs denotes the PC, CHK traps if Rd ≥ PC. Thus, CHK PC, PC always traps. Since
CHK PC, PC is encoded as 16 zeros, an erroneous jump into a string of zeros causes a trap
to Range Error, thus trapping some address errors.
Note: CHK checks the upper bound of an unsigned value range, implying a lower bound of
zero. If the lower bound is not zero, it can be mapped to zero by subtracting it from the
value to be checked and then checking against a corrected upper bound (lower bound also
subtracted). When the upper bound is a constant not exceeding the range of lim, the Index
Move instructions may be used for bounds checks.
CHKZ may be used to trap on uninitialized pointers with the value zero.
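The mapping of a non-zero lower bound to zero can be illustrated as follows. The names `chk`, `check_bounds` and `RangeError` are hypothetical; operands are treated as 32-bit unsigned values and the trap is represented by an exception.

```python
MASK32 = 0xFFFFFFFF

class RangeError(Exception):
    """Stands in for the trap to Range Error."""

def chk(rd, rs):
    """CHK: trap if the unsigned destination operand is higher than the source."""
    if rd > rs:
        raise RangeError("value exceeds upper bound")

def check_bounds(value, lower, upper):
    """Check lower <= value <= upper with a single unsigned comparison."""
    # Subtracting the lower bound maps the range to 0..(upper - lower);
    # values below the lower bound wrap around to large unsigned numbers.
    chk((value - lower) & MASK32, (upper - lower) & MASK32)
```

`check_bounds(15, 10, 20)` passes, while both `check_bounds(9, 10, 20)` and `check_bounds(21, 10, 20)` raise the error, because 9 - 10 wraps to `0xFFFFFFFF`.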
3.19 No Operation Instruction
The instruction CHK L0, L0 cannot cause any trap. Since CHK leaves all registers and
condition flags unchanged, it can be used as a No Operation instruction with the notation:
Format Notation Operation
RR NOP no operation;
Note: The NOP instruction may be used as a fill instruction.
3.20 Compare Instructions
Two operands are compared by subtracting the source operand or the immediate operand
from the destination operand. The condition flags are set or cleared according to the result;
the result itself is not retained. Note that the N flag indicates the correct compare result
even in the case of an overflow.
All operands and the result are interpreted as either all signed or all unsigned integers.
Format Notation Operation
RR CMP Rd, Rs result := Rd - Rs;
Z := Rd = Rs; -- result is zero
N := Rd < Rs signed; -- result is true negative
V := overflow;
C := Rd < Rs unsigned; -- borrow
Rimm CMPI Rd, imm result := Rd - imm;
Z := Rd = imm; -- result is zero
N := Rd < imm signed; -- result is true negative
V := overflow;
C := Rd < imm unsigned; -- borrow
When the SR is denoted as a source operand at CMP, C is subtracted instead of SR. The
notation is then:
Format Notation Operation
RR CMP Rd, C result := Rd - C;
Z := Rd = C; -- result is zero
N := Rd < C signed; -- result is true negative
V := overflow;
C := Rd < C unsigned; -- borrow
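The statement that N indicates the correct compare result even in the case of an overflow can be checked with the following sketch. The function names `to_signed` and `cmp_flags` are hypothetical; the function returns the discarded 32-bit result together with the flags so that N can be compared with bit 31 of the result.

```python
MASK32 = 0xFFFFFFFF

def to_signed(v):
    """Interpret a 32-bit value as a signed integer."""
    return v - (1 << 32) if v & 0x80000000 else v

def cmp_flags(rd, rs):
    """CMP: returns (result, flags); the result itself is not retained by CMP."""
    result = (rd - rs) & MASK32
    diff = to_signed(rd) - to_signed(rs)
    flags = {
        "Z": rd == rs,
        "N": to_signed(rd) < to_signed(rs),        # true negative
        "V": not (-(1 << 31) <= diff < (1 << 31)), # signed overflow
        "C": rd < rs,                              # unsigned borrow
    }
    return result, flags
```

Comparing `0x80000000` with 1 overflows: the 32-bit result `0x7FFFFFFF` has bit 31 clear, yet N is set, correctly reporting that the destination operand is smaller when both are read as signed integers.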
3.21 Compare Bit Instructions
The result of a bitwise logical AND of the source or immediate operand and the destination
operand is used to set or clear the Z flag accordingly; the result itself is not retained.
All operands and the result are interpreted as bitstrings of 32 bits each.
Format Notation Operation
RR CMPB Rd, Rs Z := (Rd and Rs) = 0;
Rimm CMPBI Rd, imm Z := (Rd and imm) = 0;
The following instruction is a special case of CMPBI differentiated by n = 0 (see section
2.3.1. Table of Immediate Values):
Format Notation Operation
Rimm CMPBI Rd, ANYBZ Z := Rd(31..24) = 0 or Rd(23..16) = 0 or
Rd(15..8) = 0 or Rd(7..0) = 0;
-- any Byte of Rd = 0
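The ANYBZ condition can be expressed directly in Python; the function name `any_byte_zero` is hypothetical and the return value models the Z flag.

```python
def any_byte_zero(rd):
    """Z flag of CMPBI Rd, ANYBZ: true if any of the four bytes of Rd is zero."""
    return any(((rd >> shift) & 0xFF) == 0 for shift in (24, 16, 8, 0))
```

A typical use is scanning a character string one word at a time for a terminating zero byte: `any_byte_zero(0x41420043)` is true, `any_byte_zero(0x41424344)` is false.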
3.22 Test Leading Zeros Instruction
The number of leading zeros in the source operand is tested and placed in the destination
register. A source operand equal to zero yields 32 as a result. All condition flags remain
unchanged.
Format Notation Operation
LL TESTLZ Ld, Ls Ld := number of leading zeros in Ls;
3.23 Set Stack Address Instruction
The frame pointer FP is placed, expanded to the stack address, in the destination register.
The FP itself and all condition flags remain unchanged. The expanded FP address is the
address at which the content of L0 would be stored if pushed onto the memory part of the
stack.
The Set Stack Address instruction shares the basic OP-code SETxx, it is differentiated by
n = 0 and not denoting the SR or the PC.
n Format Notation Operation
0 Rn SETADR Rd Rd := SP(31..9)//SR(31..25)//00 + carry into bit 9
-- SR(31..25) is FP
-- carry into bit 9 := (SP(8) = 1 and SR(31) = 0)
Note: The Set Stack Address instruction calculates the stack address of the beginning of
the current stack frame. L0..L15 of this frame can then be addressed relative to this stack
address in the stack address mode with displacement values of 0..60 respectively.
Provided the stack address of a stack frame has been saved, for example in a global register,
any data in this stack frame can then be addressed also from within all younger generations
of stack frames by using the saved stack address. (Addressing of local variables in older
generations of stack frames is required by all block oriented programming languages like
Pascal, Modula-2 and Ada.)
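The expansion of the FP to a stack address, as given in the operation column above, can be sketched as follows. The function name `setadr` and its parameters are hypothetical; `fp` stands for the 7-bit field SR(31..25).

```python
def setadr(sp, fp):
    """SETADR: Rd := SP(31..9)//FP//00 + carry into bit 9.

    sp -- 32-bit stack pointer
    fp -- 7-bit frame pointer, i.e. SR(31..25)
    """
    base = (sp & 0xFFFFFE00) | ((fp & 0x7F) << 2)
    # carry into bit 9 := (SP(8) = 1 and SR(31) = 0); SR(31) is bit 6 of fp
    carry = ((sp >> 8) & 1) == 1 and ((fp >> 6) & 1) == 0
    return (base + (0x200 if carry else 0)) & 0xFFFFFFFF
```

For `sp = 0x1100` and `fp = 0x05` the result is `0x1214`: SP(8) is one and the most significant FP bit is zero, so the carry into bit 9 is applied. For `fp = 0x45` no carry occurs and the result is `0x1114`.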
The basic OP-code SETxx is shared as indicated:
• n = 0 while not denoting the SR or the PC differentiates the Set Stack Address
instruction.
• n = 1..31 while not denoting the SR or the PC differentiate the Set Conditional
instructions.
• Denoting the SR differentiates the Fetch instruction.
• Denoting the PC is reserved for future use.
3.24 Set Conditional Instructions
The destination register is set or cleared according to the states of the condition flags
specified by n. The condition flags themselves remain unchanged.
The Set Conditional instructions share the basic OP-code SETxx, they are differentiated by
n = 1..31 and not denoting the SR or the PC.
Format is Rn
n Notation or Alternative Operation
1 Reserved
2 SET1 Rd Rd := 1;
3 SET0 Rd Rd := 0;
4 SETLE Rd if N = 1 or Z = 1 then Rd := 1 else Rd := 0;
5 SETGT Rd if N = 0 and Z = 0 then Rd := 1 else Rd := 0;
6 SETLT Rd SETN Rd if N = 1 then Rd := 1 else Rd := 0;
7 SETGE Rd SETNN Rd if N = 0 then Rd := 1 else Rd := 0;
8 SETSE Rd if C = 1 or Z = 1 then Rd := 1 else Rd := 0;
9 SETHT Rd if C = 0 and Z = 0 then Rd := 1 else Rd := 0;
10 SETST Rd SETC Rd if C = 1 then Rd := 1 else Rd := 0;
11 SETHE Rd SETNC Rd if C = 0 then Rd := 1 else Rd := 0;
12 SETE Rd SETZ Rd if Z = 1 then Rd := 1 else Rd := 0;
13 SETNE Rd SETNZ Rd if Z = 0 then Rd := 1 else Rd := 0;
14 SETV Rd if V = 1 then Rd := 1 else Rd := 0;
15 SETNV Rd if V = 0 then Rd := 1 else Rd := 0;
16 Reserved
17 Reserved
18 SET1M Rd Rd := -1;
19 Reserved
20 SETLEM Rd if N = 1 or Z = 1 then Rd := -1 else Rd := 0;
21 SETGTM Rd if N = 0 and Z = 0 then Rd := -1 else Rd := 0;
22 SETLTM Rd SETNM Rd if N = 1 then Rd := -1 else Rd := 0;
23 SETGEM Rd SETNNM Rd if N = 0 then Rd := -1 else Rd := 0;
24 SETSEM Rd if C = 1 or Z = 1 then Rd := -1 else Rd := 0;
25 SETHTM Rd if C = 0 and Z = 0 then Rd := -1 else Rd := 0;
26 SETSTM Rd SETCM Rd if C = 1 then Rd := -1 else Rd := 0;
27 SETHEM Rd SETNCM Rd if C = 0 then Rd := -1 else Rd := 0;
28 SETEM Rd SETZM Rd if Z = 1 then Rd := -1 else Rd := 0;
29 SETNEM Rd SETNZM Rd if Z = 0 then Rd := -1 else Rd := 0;
30 SETVM Rd if V = 1 then Rd := -1 else Rd := 0;
31 SETNVM Rd if V = 0 then Rd := -1 else Rd := 0;
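The table above follows a regular pattern: the codes n = 18 and 20..31 test the same conditions as n = 2 and 4..15 but deliver -1 instead of 1. The following Python sketch exploits this observation. The function name `set_conditional` and the table layout are hypothetical, and the flags are passed as booleans.

```python
# Conditions for n = 2..15; the "M" forms (n = 18, 20..31) reuse them via n & 0xF.
_CONDITIONS = {
    2: lambda N, Z, C, V: True,              # SET1
    3: lambda N, Z, C, V: False,             # SET0
    4: lambda N, Z, C, V: N or Z,            # SETLE
    5: lambda N, Z, C, V: not N and not Z,   # SETGT
    6: lambda N, Z, C, V: N,                 # SETLT / SETN
    7: lambda N, Z, C, V: not N,             # SETGE / SETNN
    8: lambda N, Z, C, V: C or Z,            # SETSE
    9: lambda N, Z, C, V: not C and not Z,   # SETHT
    10: lambda N, Z, C, V: C,                # SETST / SETC
    11: lambda N, Z, C, V: not C,            # SETHE / SETNC
    12: lambda N, Z, C, V: Z,                # SETE / SETZ
    13: lambda N, Z, C, V: not Z,            # SETNE / SETNZ
    14: lambda N, Z, C, V: V,                # SETV
    15: lambda N, Z, C, V: not V,            # SETNV
}
_RESERVED = {1, 16, 17, 19}

def set_conditional(n, N, Z, C, V):
    """Return the value placed in Rd for code n = 1..31 and the given flags."""
    if not 1 <= n <= 31 or n in _RESERVED:
        raise ValueError("reserved or invalid code")
    true_value = 0xFFFFFFFF if n >= 16 else 1    # "M" forms deliver -1
    return true_value if _CONDITIONS[n & 0xF](N, Z, C, V) else 0
```

For example, `set_conditional(21, False, False, False, False)` models SETGTM with N = 0 and Z = 0 and returns `0xFFFFFFFF`, a mask that can be used directly in a following AND.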
3.25 Branch Instructions
The Branch instruction BR, and any of the conditional Branch instructions when the
branch condition is met, place the branch address PC + rel (relative to the address of the
first byte after the Branch instruction) in the program counter PC and clear the cache-mode
flag M; all condition flags remain unchanged. Then instruction execution proceeds at the
branch address placed in the PC.
When the branch condition is not met, the M flag and the condition flags remain unchanged and instruction execution proceeds sequentially.
Besides these explicit Branch instructions, the instructions MOV, MOVI, ADD, ADDI,
SUM, SUB may denote the PC as a destination register and thus be executed as an implicit
branch; the M flag is cleared and the condition flags are set or cleared according to the
specified instruction. All other instructions, except Compare instructions, must not be used
with the PC as destination, otherwise possible Range Errors caused by these instructions
would lead to ambiguous results on backtracking.
Format is PCrel
Notation or alternative Operation Comment
BLE rel if N = 1 or Z = 1 then BR; -- Less or Equal signed
BGT rel if N = 0 and Z = 0 then BR; -- Greater Than signed
BLT rel BN rel if N = 1 then BR; -- Less Than signed
BGE rel BNN rel if N = 0 then BR; -- Greater or Equal signed
BSE rel if C = 1 or Z = 1 then BR; -- Smaller or Equal unsigned
BHT rel if C = 0 and Z = 0 then BR; -- Higher Than unsigned
BST rel BC rel if C = 1 then BR; -- Smaller Than unsigned
BHE rel BNC rel if C = 0 then BR; -- Higher or Equal unsigned
BE rel BZ rel if Z = 1 then BR; -- Equal
BNE rel BNZ rel if Z = 0 then BR; -- Not Equal
BV rel if V = 1 then BR; -- oVerflow
BNV rel if V = 0 then BR; -- Not oVerflow
BR rel PC := PC + rel; M := 0;
Note: rel is signed to allow forward or backward branches.
Instruction
Loop1: MOVI L0, $1234
BLE Loop1 ; if N=1 or Z=1 then branch
BNE Loop1 ; if Z=0 then branch
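The branch rule described in this section can be sketched as follows. This is a minimal model with hypothetical names; it assumes the flags are passed as a dictionary of 0/1 integers and that addresses wrap at 32 bits.

```python
# Illustrative model only; function and table names are hypothetical.
BRANCH_CONDITIONS = {
    "BLE": lambda f: f["N"] == 1 or f["Z"] == 1,
    "BGT": lambda f: f["N"] == 0 and f["Z"] == 0,
    "BLT": lambda f: f["N"] == 1,
    "BGE": lambda f: f["N"] == 0,
    "BSE": lambda f: f["C"] == 1 or f["Z"] == 1,
    "BHT": lambda f: f["C"] == 0 and f["Z"] == 0,
    "BST": lambda f: f["C"] == 1,
    "BHE": lambda f: f["C"] == 0,
    "BE": lambda f: f["Z"] == 1,
    "BNE": lambda f: f["Z"] == 0,
    "BV": lambda f: f["V"] == 1,
    "BNV": lambda f: f["V"] == 0,
    "BR": lambda f: True,
}
MASK32 = 0xFFFFFFFF  # assumption: addresses wrap at 32 bits


def execute_branch(mnemonic, pc_after, rel, flags):
    """Return (new_pc, new_M).

    pc_after is the address of the first byte after the Branch
    instruction; rel is the signed displacement.
    """
    if BRANCH_CONDITIONS[mnemonic](flags):
        return (pc_after + rel) & MASK32, 0   # branch taken: M is cleared
    return pc_after, flags["M"]               # not taken: M is unchanged
```

The condition flags are never written by the model, matching the statement that they remain unchanged in both cases.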
3.26 Delayed Branch Instructions
The Delayed Branch instruction DBR, and any of the conditional Delayed Branch instructions when the branch condition is met, place the branch address PC + rel (relative to
the address of the first byte after the Delayed Branch instruction) in the program counter
PC. All condition flags and the cache mode flag M remain unchanged.
Then the instruction after the Delayed Branch instruction, called the delay instruction, is
executed regardless of whether the delayed branch is taken or not taken.
When the delayed branch is not taken, the delay instruction is executed like a regular
instruction. The PC and the ILC are updated accordingly and instruction execution
proceeds sequentially.
When the delayed branch is taken, the delay instruction is executed before execution
proceeds at the branch target. The PC (containing the delayed-branch target address) is not
updated by the delay instruction. Any reference to the PC by the delay instruction
references the delayed-branch target address.
In the case of an Error exception caused by a delay instruction succeeding a delayed
branch taken, the location of the saved return PC contains the address of the first byte of
the delay instruction. The saved ILC contains the length (1 or 2 half-words) of the Delayed
Branch instruction. In the case of all other exceptions following a delay instruction
succeeding a delayed branch taken, the location of the saved return PC contains the branch
target address of the delayed branch and the saved ILC is invalid.
The following restrictions apply to delay instructions:
• The sum of the length of the Delayed Branch instruction and the delay instruction must not
exceed three half-words, otherwise an arbitrary bit pattern may be supplied and
erroneously used for the second or third half-word of the delay instruction without any
warning.
• The Delayed Branch instruction and the delay instruction are locked against any exception
except Reset.
• A Fetch or any branching instruction must not be placed as a delay instruction. A
misplaced Delayed Branch instruction would be executed like the corresponding
non-delayed Branch instruction to inhibit a permanent exception lock-out.
Format is PCrel
Notation or alternative Operation Comment
DBLE rel if N = 1 or Z = 1 then DBR; -- Less or Equal signed
DBGT rel if N = 0 and Z = 0 then DBR; -- Greater Than signed
DBLT rel DBN rel if N = 1 then DBR; -- Less Than signed
DBGE rel DBNN rel if N = 0 then DBR; -- Greater or Equal signed
DBSE rel if C = 1 or Z = 1 then DBR; -- Smaller or Equal unsigned
DBHT rel if C = 0 and Z = 0 then DBR; -- Higher Than unsigned
DBST rel DBC rel if C = 1 then DBR; -- Smaller Than unsigned
DBHE rel DBNC rel if C = 0 then DBR; -- Higher or Equal unsigned
DBE rel DBZ rel if Z = 1 then DBR; -- Equal
DBNE rel DBNZ rel if Z = 0 then DBR; -- Not Equal
DBV rel if V = 1 then DBR; -- oVerflow
DBNV rel if V = 0 then DBR; -- Not oVerflow
DBR rel PC := PC + rel;
Note: rel is signed to allow forward or backward branches.
Attention: Since the PC seen by the delay instruction depends on the delayed branch
taken or not taken, a delay instruction after a conditional Delayed Branch instruction
must not reference the PC.
Instruction
Loop1: MOVI L0, $1234
DBLE Loop1 ; if N=1 or Z=1 then delayed branch
ADDI L0, $10 ; delay instruction, always executed: L0 = L0 + $10;
; then, if N=1 or Z=1 held at DBLE, branch to Loop1
DBNE Loop1 ; if Z=0 then delayed branch
ADDI L0, $10 ; delay instruction, always executed: L0 = L0 + $10;
; then, if Z=0 held at DBNE, branch to Loop1
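The execution order and the length restriction for delayed branches can be sketched as below. This is only a model with hypothetical function names; it assumes one half-word is two bytes and that instruction lengths are given in half-words.

```python
# Illustrative model only; names are hypothetical.
HALF_WORD_BYTES = 2  # assumption: one half-word is 16 bits


def check_delay_instruction(branch_len, delay_len, delay_is_branch_or_fetch=False):
    """Return a list of violated restrictions (lengths in half-words)."""
    problems = []
    if branch_len + delay_len > 3:
        problems.append("combined length exceeds three half-words")
    if delay_is_branch_or_fetch:
        problems.append("Fetch or branching instruction used as delay instruction")
    return problems


def delayed_branch_trace(taken, branch_addr, branch_len, delay_len, rel):
    """Return the addresses executed: branch, delay instruction, next."""
    # The delay instruction directly follows the Delayed Branch instruction.
    delay_addr = branch_addr + branch_len * HALF_WORD_BYTES
    if taken:
        # rel is relative to the first byte after the Delayed Branch.
        next_addr = delay_addr + rel
    else:
        # Not taken: execution simply continues after the delay instruction.
        next_addr = delay_addr + delay_len * HALF_WORD_BYTES
    return [branch_addr, delay_addr, next_addr]
```

In both cases the delay instruction appears in the trace, which reflects that it is executed regardless of whether the delayed branch is taken.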
3.27 Call Instruction
The Call instruction causes a branch to a subprogram.
The branch address Rs + const, or const alone if Rs denotes the SR, is placed in the
program counter PC. The old PC containing the return address is saved in Ld; the old
supervisor-state flag S is also saved in bit zero of Ld. The old status register SR is saved in
Ldf; the saved instruction-length code ILC contains the length (2 or 3) of the Call
instruction.
Then the frame pointer FP is incremented by the value of the Ld-code (Ld-code = 0 is
interpreted as Ld-code = 16) and the frame length FL is set to six, thus creating a new stack
frame. The cache-mode flag M is cleared. All condition flags remain unchanged. Then
instruction execution proceeds at the branch address placed in the PC.
The value of the Ld-code must not exceed the value of the old FL (FL = 0 is interpreted as
FL = 16), otherwise the beginning of the register part of the stack at the SP could be
overwritten without any warning. Bit zero of const must be 0.
Rs and Ld may denote the same register.
Format Notation Operation
LRconst CALL Ld, Rs, const if Rs does not denote SR then
or CALL Ld, 0, const PC := Rs + const;
else
PC := const;
Ld := old PC(31..1)//old S;
Ldf := old SR;
FP := FP + Ld-code; -- Ld-code = 0 is treated as Ld-code = 16
FL := 6;
M := 0;
Note: At the new stack frame, the saved PC is located in L0 and the saved SR is located in
L1.
A Frame instruction must be executed immediately after a Call instruction, otherwise an
Interrupt, Parity Error, Extended Overflow or Trace exception could separate the Call from
the corresponding Frame instruction before the frame pointer FP is decremented to include
(optionally) passed parameters. After a Call instruction, an Interrupt, Parity Error,
Extended Overflow or Trace exception is locked out for one instruction regardless of the
interrupt lock flag L.
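The state changes performed by a Call can be sketched as follows. This is a simplified model with hypothetical names: FP, FL, M and S are held as separate fields rather than as subfields of SR, FP is a plain integer without wraparound, and the model raises an error where the hardware would proceed without warning.

```python
# Illustrative model only; names and the state layout are hypothetical.
MASK32 = 0xFFFFFFFF  # assumption: addresses wrap at 32 bits


def call(state, ld_code, rs_value, const, rs_is_sr=False):
    """Return (new_state, saved_pc, saved_sr) for CALL Ld, Rs, const."""
    if const & 1:
        raise ValueError("bit zero of const must be 0")
    old_fl = state["FL"] or 16          # FL = 0 is interpreted as FL = 16
    ld = ld_code or 16                  # Ld-code = 0 is interpreted as 16
    if ld > old_fl:
        # The hardware gives no warning here; the model flags it instead.
        raise ValueError("Ld-code exceeds the old frame length")
    saved_pc = (state["PC"] & ~1) | state["S"]   # old PC(31..1)//old S -> Ld
    saved_sr = state["SR"]                       # old SR -> Ldf
    new = dict(state)
    new["PC"] = const if rs_is_sr else (rs_value + const) & MASK32
    new["FP"] = state["FP"] + ld
    new["FL"] = 6
    new["M"] = 0
    return new, saved_pc, saved_sr
```

Here `state["PC"]` stands for the old PC, which already contains the return address.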
3.28 Trap Instructions
The Trap instruction TRAP, and any of the conditional Trap instructions when the trap
condition is met, cause a branch to one out of 64 supervisor subprogram entries (see
section 2.4. Entry Tables).
When the trap condition is not met, instruction execution proceeds sequentially.
When the subprogram branch is taken, the subprogram entry address adr is placed in the
program counter PC and the supervisor-state flag S is set to one. The old PC containing the
return address is saved in the register addressed by FP + FL; the old S flag is also saved in
bit zero of this register. The old status register SR is saved in the register addressed by
FP + FL + 1 (FL = 0 is interpreted as FL = 16); the saved instruction-length code ILC
contains the length (1) of the Trap instruction.
Then the frame pointer FP is incremented by the old frame length FL and FL is set to six,
thus creating a new stack frame. The cache-mode flag M and the trace-mode flag T are
cleared, the interrupt-lock flag L is set to one. All condition flags remain unchanged. Then
instruction execution proceeds at the entry address placed in the PC.
The trap instructions are differentiated by the 12 code values given by the bits 9 and 8 of
the OP-code and bits 1 and 0 of the adr-byte (code = OP(9..8)//adr-byte(1..0)). Since
OP(9..8) = 0 does not denote Trap instructions (the code is occupied by the BR instruction),
trap codes 0..3 are not available.
Format is PCadr
Code Notation Operation
4 TRAPLE trapno if N = 1 or Z = 1 then execute TRAP else execute next instruction;
5 TRAPGT trapno if N = 0 and Z = 0 then execute TRAP else execute next instruction;
6 TRAPLT trapno if N = 1 then execute TRAP else execute next instruction;
7 TRAPGE trapno if N = 0 then execute TRAP else execute next instruction;
8 TRAPSE trapno if C = 1 or Z = 1 then execute TRAP else execute next instruction;
9 TRAPHT trapno if C = 0 and Z = 0 then execute TRAP else execute next instruction;
10 TRAPST trapno if C = 1 then execute TRAP else execute next instruction;
11 TRAPHE trapno if C = 0 then execute TRAP else execute next instruction;
12 TRAPE trapno if Z = 1 then execute TRAP else execute next instruction;
13 TRAPNE trapno if Z = 0 then execute TRAP else execute next instruction;
14 TRAPV trapno if V = 1 then execute TRAP else execute next instruction;
15 TRAP trapno PC := adr;
S := 1;
(FP + FL)^ := old PC(31..1)//old S;
(FP + FL + 1)^ := old SR;
FP := FP + FL; -- FL = 0 is treated as FL = 16
FL := 6;
M := 0;
T := 0;
L := 1;
trapno indicates one of the traps 0..63.
Note: At the new stack frame, the saved PC is located in L0 and the saved SR is located in
L1; L2..L5 are free for use as required.
A Frame instruction must be executed before executing any other Trap, Call or Software
instruction or before the interrupt-lock flag L is cleared, otherwise the beginning of
the register part of the stack at the SP could be overwritten without any warning.
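The trap code formation code = OP(9..8)//adr-byte(1..0) can be sketched as a bit operation. The names below are hypothetical, and the model assumes `op` is an integer in which bits 9 and 8 hold OP(9..8).

```python
# Illustrative model only; names are hypothetical.
TRAP_MNEMONICS = {
    4: "TRAPLE", 5: "TRAPGT", 6: "TRAPLT", 7: "TRAPGE",
    8: "TRAPSE", 9: "TRAPHT", 10: "TRAPST", 11: "TRAPHE",
    12: "TRAPE", 13: "TRAPNE", 14: "TRAPV", 15: "TRAP",
}


def trap_code(op, adr_byte):
    """Concatenate OP(9..8) with adr-byte(1..0) into a 4-bit code."""
    return (((op >> 8) & 0b11) << 2) | (adr_byte & 0b11)


def trap_mnemonic(op, adr_byte):
    """Return the Trap mnemonic, rejecting codes 0..3."""
    code = trap_code(op, adr_byte)
    if code < 4:
        # OP(9..8) = 0 is occupied by the BR instruction.
        raise ValueError("codes 0..3 do not denote Trap instructions")
    return TRAP_MNEMONICS[code]
```

With OP(9..8) restricted to 1..3 and two bits taken from the adr-byte, the scheme gives the 12 codes 4..15 listed in the table.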
3.29 Frame Instruction
A Frame instruction restructures the current stack frame by
• decrementing the frame pointer FP to include (optionally) passed parameters in the local
register addressing range; the first parameter passed is then addressable as L0;
• resetting the frame length FL to the actual number of registers needed for the current
stack frame.
It also restores the reserve number of 10 registers in the register part of the stack to allow
any further Call, Trap or Software instructions and clears the cache mode flag M.
The frame pointer FP is decremented by the value of the Ls-code and the Ld-code is placed
in the frame length FL (FL = 0 is always interpreted as FL = 16). Then the difference
(available number of registers) - (required number of registers + 10) is evaluated and
interpreted as a signed 7-bit integer.
If the difference is not negative, all the registers required plus the reserve of 10 fit into the
register part of the stack; no further action is needed and the Frame instruction is finished.
If the difference is negative, the content of the old stack pointer SP is compared with the
address in the upper stack bound UB. If the value in the SP is equal to or higher than the
value in the UB, a temporary flag is set. Then the contents of the number of local registers
equal to the negative difference evaluated are pushed onto the memory part of the stack,
beginning with the content of the local register addressed absolutely by SP(7..2) being
pushed onto the location addressed by the SP. After each memory cycle, the SP is
incremented by four until the difference is eliminated. A trap to Frame Error occurs after
completion of the push operation when the temporary flag is set.
All condition flags remain unchanged.
Format Notation Operation
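LL FRAME Ld, Ls FP := FP - Ls code;
FL := Ld code;
M := 0;
difference := (available number of registers) - (required number of registers + 10);
-- difference is interpreted as a signed 7-bit integer
-- 10 = number of reserve registers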
if difference ≥ 0 then
continue at next instruction;
-- Frame is finished
else
temporary flag := SP ≥ UB;
repeat
memory SP^ := register SP(7..2)^;
-- local register ⇒ memory
SP := SP + 4;
difference := difference + 1;
until difference = 0;
if temporary flag = 1 then
trap ⇒ Frame Error;
Note: Ls also identifies the same source operand that must be denoted by the Return
instruction to address the saved return PC.
Ld (L0 is interpreted as L16) also identifies the register in which the return PC is being
saved by a Trap or Software instruction or by an exception; therefore only local registers
with a lower register code than the interpreted Ld-code of the Frame instruction may be
used after execution of a Frame instruction.
The reserve of 10 registers is to be used as follows:
• A Call, Trap or Software instruction uses six registers.
• A subsequent exception, occurring before a Frame instruction is executed, uses another
two registers.
• Two registers remain in reserve.
Note that the Frame instruction can write into the memory stack at address locations up to
37 words higher than indicated by the address in the UB. This is due to the fact that the
upper bound is checked before the execution of the Frame instruction.
Attention: The Frame instruction must always be the first instruction executed in a
function entered by a Call instruction, otherwise the Frame instruction could be separated
from the preceding Call instruction by an Interrupt, Parity Error, Extended Overflow or
Trace exception (see section 3.27. Call instruction).
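The push loop of the Frame instruction can be sketched as follows. This is a model under stated assumptions: the names are hypothetical, the already evaluated difference is passed in as an argument, the 64 local registers are a Python list indexed by SP(7..2), and memory is a dictionary keyed by byte address.

```python
# Illustrative model only; names and data layout are hypothetical.
def to_signed7(value):
    """Interpret the low 7 bits of value as a signed integer (-64..63)."""
    value &= 0x7F
    return value - 128 if value & 0x40 else value


def frame_push(sp, ub, difference, registers, memory):
    """Push local registers to memory until difference reaches zero.

    Returns (new_sp, frame_error). frame_error models the temporary
    flag, which is evaluated before the first push.
    """
    if difference >= 0:
        return sp, False                 # Frame is finished, nothing to push
    frame_error = sp >= ub               # checked against the old SP
    while difference < 0:
        memory[sp] = registers[(sp >> 2) & 0x3F]   # register SP(7..2) -> memory
        sp += 4
        difference += 1
    return sp, frame_error
```

Because the bound is compared before the loop runs, the model also shows why writes can land above the address held in UB.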
3.30 Return Instruction
The Return instruction returns control from a subprogram entered through a Call, Trap or
Software instruction or an exception to the instruction located at the return address and
restores the status from the saved return status.
The source operand pair Rs//Rsf is placed in the register pair PC//SR. The program counter
PC is restored first from Rs. Then all bits of the status register SR are replaced by Rsf,
except the supervisor flag S, which is restored from bit zero of Rs and except the
instruction length code ILC, which is cleared to zero.
If the return occurred from user to supervisor state or if the interrupt-lock flag L was
changed from zero to one on return from any state to user state, a trap to Privilege Error
occurs. Exception processing saves the restored contents of the register pair PC//SR; an
illegally set S or L flag is also saved.
Then the difference FP - SP(8..2) between the frame pointer FP and the stack pointer SP is evaluated and
interpreted as a signed 7-bit integer. If the difference is not negative, the register pointed to
by FP(5..0) is in the register part of the stack; no further action is then required and the
Return instruction is completed.
If the difference is negative, the number of words equal to the negative difference are
pulled from the memory part of the stack and transferred to the register part of the stack,
beginning with the contents of the memory location SP - 4 being transferred to the local
register addressed absolutely by bits 7..2 of SP - 4. After each memory cycle, the SP is
decremented by four until the difference is eliminated.
The Return instruction shares its basic OP-code with the Move Double-Word instruction. It
is differentiated from it by denoting the PC as destination register Rd.
The PC or the SR must not be denoted as a source operand; these notations are reserved for
future expansion.
Format Notation Operation
RR RET PC, Rs old S := S;
old L := L;
PC := Rs(31..1)//0;
SR := Rsf(31..21)//00//Rs(0)//Rsf(17..0);
-- ILC := 0;
-- S := Rs(0);
if old S = 0 and S = 1 or
S = 0 and old L = 0 and L = 1 then
trap ⇒ Privilege Error;
difference(6..0) := FP - SP(8..2);
-- difference is signed, difference(6) = sign bit
if difference ≥ 0 then
continue at next instruction;
-- RET is finished
else
repeat
SP := SP - 4;
register SP(7..2)^ := memory SP^;
-- memory ⇒ local register
difference := difference + 1;
until difference = 0;
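The bit-level restore of PC and SR, and the Privilege Error condition, can be sketched from the operation column above. The function names are hypothetical; the S and L flag values are passed in explicitly because their extraction from SR is not modelled here.

```python
# Illustrative model only; names are hypothetical.
def restore_pc_sr(rs, rsf):
    """Return (pc, sr) as restored by RET from the pair Rs//Rsf."""
    pc = rs & 0xFFFFFFFE                 # PC := Rs(31..1)//0
    sr = (
        (rsf & 0xFFE00000)               # Rsf(31..21)
        | ((rs & 1) << 18)               # S := Rs(0); ILC bits 20..19 stay 00
        | (rsf & 0x0003FFFF)             # Rsf(17..0)
    )
    return pc, sr


def privilege_error(old_s, new_s, old_l, new_l):
    """True when the return must trap to Privilege Error."""
    user_to_supervisor = old_s == 0 and new_s == 1
    lock_set_in_user = new_s == 0 and old_l == 0 and new_l == 1
    return user_to_supervisor or lock_set_in_user
```

For instance, `restore_pc_sr(0x00001235, 0xFFFFFFFF)` gives a PC of 0x1234 and an SR in which the two ILC bits are zero and the S position carries bit zero of Rs.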
3.31 Fetch Instruction
The instruction execution is halted until a number of at least n/2 + 1 (n = 0, 2, 4..30)
instruction half-words succeeding the Fetch instruction are prefetched in the instruction
cache. Since instruction words are fetched, one more half-word may be fetched. The
number n/2 is derived by using bits 4..1 of n, bit 0 of n must be zero.
The Fetch instruction must not be placed as a delay instruction; when the preceding branch
is taken, the prefetch is undefined.
The Fetch instruction shares the basic OP-code SETxx; it is differentiated by denoting the
SR for the Rd-code (see section 2.3. Instruction Formats).
n Format Notation Operation
0 Rn FETCH 1 Wait until 1 instruction half-word is fetched;
. . .
. . .
. . .
30 Rn FETCH 16 Wait until 16 instruction half-words are fetched;
Note: The Fetch instruction supplements the standard prefetch of instruction words. It may
be used to speed up the execution of a sequence of memory instructions by avoiding
alternating between instruction and data memory pages. By executing a Fetch instruction
preceding a sequence of memory instructions addressing the same data memory page, the
memory accesses can be constrained to the data memory page by prefetching all required
instructions in advance.
A Fetch instruction may also be used preceding a branch into a program loop; thus,
flushing the cache by the first branch repeating the loop can be avoided.
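The relation between n and the number of half-words waited for can be sketched as a one-line calculation. The function name is hypothetical; the model rejects an odd n because bit 0 of n must be zero.

```python
# Illustrative model only; the function name is hypothetical.
def fetch_half_words(n):
    """Return the minimum number of half-words prefetched for code n."""
    if n & 1 or not 0 <= n <= 30:
        raise ValueError("n must be an even value in 0..30")
    return ((n >> 1) & 0xF) + 1          # n/2 from bits 4..1 of n, plus one
```

This reproduces the two table rows shown: n = 0 corresponds to FETCH 1 and n = 30 to FETCH 16.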