Hynix offices in Korea or the distributors and representatives listed in the address directory
may provide additional information on this manual.
Hynix reserves the right to make changes to any information herein at any time without
notice.
The information, diagrams, and other data in this manual are correct and reliable;
however, Hynix is in no way responsible for any violations of patents or other rights of
third parties resulting from the use of this manual.
Specifications and information in this document are subject to change without notice and do
not represent a commitment on the part of Hynix. Hynix reserves the right to make changes
to improve functioning. Although the information in this document has been carefully
reviewed, Hynix does not assume any liability arising out of the use of the product or circuit
described herein.
Hynix does not authorize the use of the Hynix microprocessor in life support applications
wherein a failure or malfunction of the microprocessor may directly threaten life or cause
injury. The user of the Hynix microprocessor in life support applications assumes all risks of
such use and indemnifies Hynix against all damages.
For further information please contact:
SEOUL OFFICE : Hynix YOUNG DONG Bldg.
891, Daechi-dong, Kangnam-gu,
Seoul, Korea.
PHONE : (02) 3459-3662~3
FAX : (02) 3459-3942
SYSTEM IC : 1, Hyangjeong-dong, Hungduk-gu,
Cheongju, 361-725, Korea.
PHONE : (0431) 270-4030~47
FAX : (0431) 270-4075
Copyright 2001 Hynix Semiconductor Inc.
Revision Jun. 29, 2001.
The HME GMS30C2232 and GMS30C2216 RISC/DSPs are improved versions of
HME’s existing GMS30C2132 and GMS30C2116 RISC/DSPs. Using a 0.35 µm CMOS
technology, the performance of the RISC/DSPs has been further improved. Because they are
pin-compatible with their predecessors, the new RISC/DSPs can be used as direct replacements
in existing customer designs.
The GMS30C2216 and GMS30C2232 RISC/DSPs are based on the hyperstone architecture.
• On-chip DRAM controller: FPM (Fast-Page-Mode) and EDO (Extended-Data-Out) DRAMs.
• 5.0V tolerant inputs
• Control function for the CLKOUT pin
This combination of a high-performance RISC microprocessor with an additional powerful
DSP instruction set and on-chip microcontroller functions offers a high throughput. The
speed is obtained by an optimized combination of the following features:
• Pipelined memory access allows overlapping of memory accesses with execution.
• 8 KByte on-chip memory.
• On-chip instruction cache eliminates instruction fetches in inner loops and provides prefetching.
• Variable-length instructions of 16, 32 or 48 bits provide a large, powerful instruction set,
thereby reducing the number of instructions to be executed.
• The predominantly used 16-bit instructions halve the memory bandwidth required for
instruction fetch compared with conventional RISC architectures that use fixed-length 32-bit
instructions, and yield even better code density than conventional CISC architectures.
• Fast Call and Return by parameter passing via registers.
• An instruction pipeline depth of only two stages — decode/execute — provides
branching without insertion of wait cycles in combination with Delayed Branch
instructions.
• Range and pointer checks are performed without speed penalty, so these checks no longer
need to be turned off, thereby providing higher runtime reliability.
• Separate address and data buses provide a throughput of one 32-bit word each cycle.
The features noted above help reduce the number of idle wait cycles to a bare
minimum. The processor is designed to sustain its execution rate with standard DRAM
memory.
The low power consumption is an advantage in mobile (portable) applications and in
temperature-sensitive environments.
Most of the transistors are used for the on-chip memory, the instruction cache, the register
stack and the multiplier, whereas only a small number is required for the control logic.
Due to their low system cost, the GMS30C2216 and GMS30C2232 RISC/DSPs are very
well suited for embedded-systems applications requiring high performance and lowest cost.
To simplify board design as well as to reduce system costs, the GMS30C2216 and
GMS30C2232 already come with integrated peripherals, such as a timer and memory and
bus control logic. Therefore, complete systems with HME’s microprocessor can be
implemented with a minimum of external components. No glue logic is necessary to connect
any kind of memory or I/O. The processor is even suitable for systems where, until now,
microprocessors with a 16-bit architecture have been used for cost reasons. Its improved
performance compared with conventional microcontrollers can be used to replace many
external peripherals, such as graphics controllers or DSPs, with software.
The software development tools include an optimizing C compiler, an assembler, and a
source-level debugger with profiler, as well as a real-time kernel with an extremely fast
response time. Using this real-time kernel, up to 31 tasks, each with its own virtual timer,
can be developed independently of each other. The synchronization of these tasks is handled
almost automatically by the real-time kernel. To the developer, it seems as if there were up
to 31 HME microprocessors to which the programs can be allocated accordingly. Real-time
debugging of multiple tasks is supported in an optimized way.
The following description gives a brief architectural overview:
Compatibility:
• Pin-compatible with HME GMS30C2116/32 and hyperstone E1-16/32
• Pin- and function-compatible with hyperstone E1-16/32X
PLL (Phase-Locked Loop):
• An internal phase-locked loop (PLL) circuit multiplies the clock rate by a factor of four,
so only an external 27 MHz crystal is required to achieve an internal clock rate of 108 MHz.
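The clock multiplication above is simple arithmetic; the Python sketch below checks it and derives the resulting cycle time. The helper names are hypothetical, and the sketch assumes only the fixed multiplication factor of four stated above (no other PLL settings are modelled).

```python
PLL_FACTOR = 4  # clock multiplication factor of the on-chip PLL


def internal_clock_mhz(crystal_mhz, factor=PLL_FACTOR):
    """Internal clock rate produced by multiplying the crystal frequency."""
    return crystal_mhz * factor


def required_crystal_mhz(target_mhz, factor=PLL_FACTOR):
    """Crystal frequency needed to reach a given internal clock rate."""
    return target_mhz / factor


f_int = internal_clock_mhz(27)
print(f_int)                                # 108
print(required_crystal_mhz(108))            # 27.0
print(f"{1000 / f_int:.2f} ns per cycle")   # 9.26 ns per cycle
```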
Registers:
• 32 global and 64 local registers of 32 bits each
• 16 global and up to 16 local registers are addressable directly
Flags:
• Zero (Z), negative (N), carry (C) and overflow (V) flags
• Compare bits, Compare bits immediate, Compare any byte zero
• Test number of leading zeros
• Set Conditional, save conditions in a register
• Branch unconditional and conditional (12 conditions)
• Delayed Branch unconditional and conditional (12 conditions)
• Call subprogram, unconditional and on overflow
• Trap to supervisor subprogram, unconditional and conditional (11 conditions)
• Frame, structure a new stack frame, include parameters in frame addressing, set frame
length, restore reserve frame length and check for upper stack bound
• Return from subprogram, restore program counter, status register and return-frame
• Software instruction, call an associated subprogram and pass a source operand and the
address of a destination operand to it
• DSP Multiply instructions:
signed and/or unsigned multiplication ⇒ single and double word product
• DSP Multiply-Accumulate instructions:
signed multiply-add and multiply-subtract ⇒ single and double word product sum and
difference
• DSP Halfword Multiply-Accumulate instructions:
signed multiply-add operating on four halfword operands ⇒ single and double word
product sum
• DSP Complex Halfword Multiply instruction:
signed complex halfword multiplication ⇒ real and imaginary single word product
• DSP Complex Halfword Multiply-Accumulate instruction:
signed complex halfword multiply-add ⇒ real and imaginary single word product sum
• DSP Add and Subtract instructions:
signed halfword add and subtract with and without fixed-point adjustment ⇒ single
word sum and difference
• Floating-point instructions are fully integrated into the architecture; in the present version,
they are executed as Software instructions. Floating-point Add, Subtract, Multiply,
Divide, Compare and Compare unordered for single and double precision, and Convert
single ⇔ double are provided.
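The halfword DSP operations listed above can be modelled in software to check expected results. The following Python sketch is only an illustration under stated assumptions: each 32-bit operand word is assumed to hold two signed 16-bit halfwords, the real part is assumed to sit in the high halfword for the complex multiply, and single-word results are simply truncated to 32 bits. The function names are hypothetical; the actual operand placement, fixed-point adjustment and overflow behaviour must be taken from the detailed instruction descriptions.

```python
MASK32 = 0xFFFFFFFF


def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement halfword."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def to_signed32(x):
    """Truncate x to 32 bits and interpret it as a two's-complement word."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x


def pack(hi, lo):
    """Pack two signed halfwords into one 32-bit word (hi in bits 31..16)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def split_halves(word):
    """Split a 32-bit word into its (high, low) signed halfwords."""
    return to_signed16(word >> 16), to_signed16(word)


def halfword_mac(acc, a, b):
    """Multiply-add on four halfword operands, single-word result:
    acc + a.hi * b.hi + a.lo * b.lo, truncated to 32 bits."""
    ah, al = split_halves(a)
    bh, bl = split_halves(b)
    return to_signed32(acc + ah * bh + al * bl)


def complex_halfword_mul(a, b):
    """(ar + j*ai) * (br + j*bi), assuming the real part is the high halfword."""
    ar, ai = split_halves(a)
    br, bi = split_halves(b)
    return to_signed32(ar * br - ai * bi), to_signed32(ar * bi + ai * br)


# (3 + 4j) * (1 - 2j) = 11 - 2j
print(complex_halfword_mul(pack(3, 4), pack(1, -2)))  # (11, -2)
# 10 + 2*5 + (-3)*7 = -1
print(halfword_mac(10, pack(2, -3), pack(5, 7)))      # -1
```

A double-word variant would keep the full 64-bit sum instead of truncating it.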
Exceptions:
• Pointer, Privilege, Frame and Range Error, Extended Overflow, Parity Error, Interrupt
and Trace-mode exceptions
• Watchdog function
• Error-causing instructions can be identified by backtracking, thus allowing a very
detailed error analysis
Timer:
• Two multifunctional timers
Bus Interface:
• Separate address bus of 26 (GMS30C2232) or 22 (GMS30C2216) bits and data bus of
up to 32 (GMS30C2232) or 16 (GMS30C2216) bits, providing a throughput of four or
two bytes per clock cycle
• Data bus width of 32, 16 or 8 bits, individually selectable for each external memory area.
• 8-bit, 16-bit, and 32-bit boot width selectable via two external pins.
• 5V tolerant input
• Configurable I/O pins
• Internal generation of all memory and I/O control signals
• Wait pin function for I/O accesses to peripheral devices.
• Wait pin function for memory accesses to address space MEM2.
• On-chip DRAM controller supporting Fast-Page-Mode DRAMs and EDO DRAMs.
• Up to seven vectored interrupts
• Control function for CLKOUT pin.
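The bus widths above translate directly into an address range and a peak transfer rate. The Python sketch below computes both; it assumes, for illustration only, one bus transfer per cycle at the 108 MHz internal clock with no wait cycles, so real memory systems will achieve less. The helper names are hypothetical.

```python
# (address bus width, maximum data bus width) in bits, from the list above
BUS_WIDTHS = {
    "GMS30C2232": (26, 32),
    "GMS30C2216": (22, 16),
}


def address_space_bytes(addr_bits):
    """Number of bytes addressable with the given address bus width."""
    return 1 << addr_bits


def peak_throughput_mb_s(data_bits, clock_mhz):
    """Peak transfer rate in MB/s, assuming one transfer per clock and no wait cycles."""
    return (data_bits // 8) * clock_mhz


for name, (addr_bits, data_bits) in BUS_WIDTHS.items():
    mib = address_space_bytes(addr_bits) >> 20
    print(f"{name}: {mib} MiB address range, "
          f"{peak_throughput_mb_s(data_bits, 108)} MB/s peak at 108 MHz")
```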
Power Management:
• Operating voltage : 3.3V ± 0.3V.
• Lower power supply current in power-down mode.
• Clock-Off function to further reduce power dissipation (Sleep Mode)
0.2 Block Diagram
Figure 0.1: Block Diagram (blocks shown: Bus Interface Control Unit, Bus Pipeline Control, Memory Address Pipeline, Store Data Pipeline, Instruction Prefetch Control Unit, Instruction Load, Instruction Cache with Cache Control, Instruction Decode, 8 kByte RAM, Register Set with 64 local and the global registers, ALU, Barrel Shifter, Hardware Multiplier, DSP Instruction Control Unit, Execution Unit, Watchdog, Power-Down and Reset Control, Internal Timer, Interrupt Control; separate address bus, data bus with parity, and control bus)
0.3 Pin Configuration
0.3.1 GMS30C2232, 160-Pin MQFP-Package - View from Top Side
0.3.3 Pin Function
Type | Name | State | Use
Power | VCC | I | Power. Connected to the power supply. It can be a 3.3V power supply.
Power | GND | I | Ground. Connected to the system ground. All GND pins must be connected to the system ground.
Clock | XTAL1 | I | Input for Quartz Clock. When the clock is generated by an external clock generator, XTAL1 is used as the clock input.
Clock | XTAL2 | O | Output for Quartz Clock.
Clock | CLKOUT | O | Clock Signal Output. It can be used to supply a clock signal to peripheral devices.
Address Bus | A25..A0 | O/Z | Address Bus. With the GMS30C2232, only A22..A0 are connected to the address bus pins.
Data Bus | D31..D0 | I/O | Data Bus. 32-bit bidirectional data bus.
Data Bus | DP0..DP3 | I/O | Data Parity Signal. Bidirectional parity signals.
Bus Control | RAS# | O/Z | Row Address Strobe. RAS# is activated when the processor accesses a DRAM or performs a refresh cycle. When an SRAM is placed in MEM0, RAS# is used as the chip select signal.
Bus Control | CAS0#..CAS3# | O/Z | Column Address Strobe. They are only used by a DRAM for column access cycles and for “CAS before RAS” refresh.
Bus Control | WE# | O/Z | Write Enable. Active low indicates a write access, active high indicates a read access.
Bus Control | CS1#..CS3# | O/Z | Chip Select. Active low of CS1#..CS3# indicates chip select for the memory areas MEM1..MEM3.
Bus Control | WE0#..WE3# | O/Z | SRAM Write Enable. Active low indicates write enable for the corresponding byte.
Bus Control | OE# | O/Z | Output Enable for SRAMs and EPROMs.
Bus Control | IORD# | O/Z | I/O Read Strobe, optionally I/O Data Strobe. The use of IORD# is specified by I/O address bit 10.
Bus Control | IOWR# | O/Z | I/O Write Strobe.
Bus Control | RQST | O | RQST signals the request for a memory or I/O access.
Bus Control | GRANT# | I | Bus Grant. GRANT# is signaled low by a bus arbiter to grant access to the bus for memory and I/O cycles.
Bus Control | ACT | O | Active as bus master. ACT is signaled high when GRANT# is low and is kept high during a current bus access.
Interrupt | INT1..INT4 | I | Interrupt Request. A signal on one of the INT1..INT4 interrupt request pins causes an interrupt exception when the interrupt lock flag L is clear and the corresponding INTxMask bit in the FCR is not set.
I/O Port | IO1..IO3 | I/O | General Input-Output Port. IO1..IO3 can be individually configured via the IOxDirection bits in the FCR as either input or output pins (port).
System Control | RESET# | I | Reset Processor. RESET# low resets the processor to the initial state and halts all activity. RESET# must be low for at least two cycles.
1. Architecture
1.1 Introduction
1.1.1 RISC Architecture
In the early days of computer history, most computer families started with an instruction
set which was rather simple. The main reason for this simplicity was the high cost of
hardware. Since then, hardware cost has dropped and software cost has risen steadily over
the past three decades.
The net result is that more and more functions have been built into the hardware, making
the instruction set very large and very complex. The growth of instruction sets was also
encouraged by the popularity of microprogrammed control in the 1960s and 1970s. Even
user-defined instruction sets were implemented using microcode in some processors for
special-purpose applications.
The evolution of computer architectures has been dominated by families of increasingly
complex processors. Under market pressures to preserve existing software, Complex Instruction Set Computer (CISC) architectures evolved by the gradual addition of
microcode and increasingly elaborate operations. The intent was to supply more support
for high-level languages and operating systems, as semiconductor advances made it
possible to fabricate more complex integrated circuits. It seemed self-evident that
architectures should become more complex as these technological advances made it
possible to hold more complexity on VLSI devices.
In recent years, however, Reduced Instruction Set Computer (RISC) architectures have
implemented a much more sophisticated handling of the complex interaction between
hardware, firmware and software. RISC concepts emerged from statistical analysis of how
software actually uses the resources of a processor. Dynamic measurements of system
kernels and object modules generated by optimizing compilers show an overwhelming
predominance of the simplest instructions, even in code for CISC machines. Complex
instructions are often ignored because a single way of performing a complex operation
rarely matches the needs of high-level language and system environments. RISC designs
eliminate the microcoded routines and turn the low-level control of the machine over to software.
This approach is not new, but it has been applied more universally in recent years thanks to the
prevalence of high-level languages, the development of compilers that can optimize at the
microcode level, and dramatic advances in semiconductor memory and packaging. It is
now feasible to replace machine microcode ROM with faster RAM, organized as an
instruction cache. Machine control then resides in the instruction cache and is, in fact,
customized on the fly. The instruction stream generated by system- and compiler-generated
code provides a precise fit between the requirements of high-level software and the
capabilities of the hardware. Compilers therefore play a vital role in RISC performance.
The advantages of the RISC architecture can be summarized as follows:
• Simplicity made VLSI implementation possible and thus higher clock rates.
• Hardwired control and separate data and program caches lower the average CPI
(Cycles per Instruction) significantly.
• The dynamic instruction count of a RISC program increases only slightly (by a factor of
less than 2) for ordinary programs.
• Recently, the performance of a typical RISC microprocessor has increased by a factor of
5/(2 × 0.1) = 25 over that of a typical CISC microprocessor, as follows from the three
factors below.
• The clock rate increased from 10 MHz on a CISC processor to 50 MHz on a CMOS/
RISC microprocessor.
• The instruction count of a typical RISC program is less than 2 times that of
a typical CISC program.
• The average CPI of a RISC microprocessor decreased to 1.2 (instead of 12 as in a
typical CISC processor).
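The factor of 25 follows from the classic performance equation, execution time = instruction count × CPI / clock rate. The short Python sketch below plugs in the figures from the list (10 MHz vs 50 MHz, CPI 12 vs 1.2, twice the instruction count); the base instruction count of one million is an arbitrary assumption that cancels out. Note that the native MIPS ratio (clock rate / CPI) alone is 50; dividing it by the instruction-count ratio of 2 gives the overall speedup of 25.

```python
def execution_time(instr_count, cpi, clock_hz):
    """Classic performance equation: instructions * cycles per instruction / clock rate."""
    return instr_count * cpi / clock_hz


cisc = execution_time(instr_count=1_000_000, cpi=12.0, clock_hz=10e6)
risc = execution_time(instr_count=2_000_000, cpi=1.2, clock_hz=50e6)

speedup = cisc / risc                        # = 5 / (2 * 0.1)
mips_ratio = (50e6 / 1.2) / (10e6 / 12.0)    # native MIPS ratio, ignores instruction count
print(f"speedup = {speedup:.1f}, MIPS ratio = {mips_ratio:.1f}")
# speedup = 25.0, MIPS ratio = 50.0
```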
1.1.2 Techniques to reduce CPI (Cycles per Instruction)
If the work each instruction performs is simple and straightforward, the time required to
execute each instruction can be shortened and the number of cycles reduced. The goal of
RISC designs has been to achieve an execution rate of one instruction per machine cycle
(multiple-instruction-issue designs now seek to increase this rate to more than one
instruction per cycle). Techniques that help achieve this goal include:
• Instruction pipelines
• Load and store (load/store) architecture
• Delayed load instructions
• Delayed branch instructions
(1) Instruction Pipelines
One way to reduce the number of cycles required to execute an instruction is to overlap the
execution of multiple instructions. Instruction pipelines divide the execution of each
instruction into several discrete portions and then execute multiple instructions
simultaneously. The instruction pipeline technique can be likened to an assembly line: the
instruction progresses from one specialized stage to the next until it is complete (or
issued), just as an automobile moves along an assembly line. (This is in contrast to the
nonpipelined, microcode approach, where all the work is done by one general unit that is
less capable at each individual task.) For example, the execution of an instruction might be
subdivided into four portions, or clock cycles, as shown in Figure 1.1:
Cycle #1                Cycle #2             Cycle #3             Cycle #4
Fetch Instruction (F)   ALU Operation (A)    Access Memory (M)    Write Results (W)
Figure 1.1: Functional Division of a Hypothetical Pipeline
An instruction pipeline can potentially reduce the number of cycles per instruction by a factor
equal to the depth of the pipeline (the number of pipeline stages). For example, in
Figure 1.2 each instruction still requires a total of four clock cycles to execute. However,
because a four-stage instruction pipeline is used, a new instruction can be initiated at
each clock cycle, and the effective execution rate is one cycle per instruction.
Clock Cycles ->    1  2  3  4  5  6  7
Instruction #1     F  A  M  W
Instruction #2        F  A  M  W
Instruction #3           F  A  M  W
Instruction #4              F  A  M  W
Figure 1.2: Multiple Instructions in a Hypothetical Pipeline
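The timing in Figure 1.2 can be generated and counted with a small Python sketch. It assumes an ideal four-stage pipeline with no stalls, in which n instructions complete in depth + n - 1 cycles instead of depth × n cycles; the function names are hypothetical.

```python
STAGES = ["F", "A", "M", "W"]  # Fetch, ALU operation, Memory access, Write results


def pipeline_chart(n_instructions, stages=STAGES):
    """Text rows for an ideal pipeline: one new instruction enters per cycle."""
    rows = []
    for i in range(n_instructions):
        # each stage column is two characters wide, so shift by two per cycle
        rows.append(f"#{i + 1} " + "  " * i + " ".join(stages))
    return rows


def total_cycles(n_instructions, depth=len(STAGES), pipelined=True):
    """Cycles needed to complete n instructions with or without pipelining."""
    if n_instructions == 0:
        return 0
    return depth + n_instructions - 1 if pipelined else depth * n_instructions


for row in pipeline_chart(4):
    print(row)
print(total_cycles(4), total_cycles(4, pipelined=False))  # 7 16
```

For long instruction sequences, total_cycles(n) / n approaches one cycle per instruction, which is the effective rate described above.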
(2) Load/Store Architecture
The discussion of the instruction pipeline illustrates how each instruction can be
subdivided into several discrete parts that permit the processor to execute multiple
instructions in parallel. For this technique to work efficiently, the time required to execute
each instruction subpart should be approximately equal. If one part requires an excessive
length of time, there is an unpleasant choice: either halting the pipeline (inserting wait or
idle cycles), or making all cycles longer to accommodate this lengthier portion of the
instruction.
Instructions that perform operations on operands in memory tend to increase either the
cycle time or the number of cycles per instruction. Such instructions require additional time
for execution to calculate the addresses of the operands, read the required operands from
memory, calculate the result, and store the results of the operation back to memory. To
eliminate the negative impact of such instructions, RISC designs implement a load and store
(load/store) architecture in which the processor has many registers, all operations are
performed on operands held in processor registers, and main memory is accessed only by
load and store instructions.
This approach produces several benefits:
• Reducing the number of memory accesses eases memory bandwidth requirements
• Limiting all operations to registers helps simplify the instruction set
• Eliminating memory operations makes it easier for compilers to optimize register
allocation; this further reduces memory accesses and also reduces the instructions/task
factor
All of these factors help RISC designs approach their goal of executing one
instruction per cycle. However, two classes of instructions hinder achievement of this goal:
load instructions and branch instructions. The following sections discuss how RISC
designs overcome the obstacles raised by these classes of instructions.
(3) Delayed Load Instructions
Load instructions read operands from memory into processor registers for subsequent
operation by other instructions. Because memory typically operates at much slower speeds
than processor clock rates, the loaded operand is not immediately available to subsequent
instructions in an instruction pipeline. The data dependency is illustrated in Figure 1.3.
Instruction 1 (Load)   F  A  M  W
Instruction 2             F  A  M  W
Instruction 3                F  A  M  W
Instruction 4                   F  A  M  W
(Annotation: data from load available for operation)
Figure 1.3: Data Dependency Resulting From a Load Instruction
In this illustration, the operand loaded by instruction 1 is not available for use in the A
cycle (ALU, or Arithmetic/Logic Unit operation) of instruction 2. One way to handle this
dependency is to delay the pipeline by inserting additional clock cycles into the execution
of instruction 2 until the loaded data becomes available. This approach obviously
introduces delays that would increase the cycles-per-instruction factor.
In many RISC designs, the technique used to handle this data dependency is to recognize,
and make visible to compilers, the fact that all load instructions have an inherent latency or
load delay. Figure 1.3 illustrates a load delay or latency of one instruction. The instruction
that immediately follows the load is in the load delay slot. If the instruction in this slot does
not require the data from the load, then no pipeline delay is required.
If this load delay is made visible to software, a compiler can arrange instructions to ensure
that there is no data dependency between a load instruction and the instruction in the load
delay slot. The simplest way of ensuring that there is no data dependency is to insert a
No Operation (NOP) instruction to fill the slot, as follows:
Load R1, A
Load R2, B
NOP <= This instruction fills the delay slot
ADD R3, R1, R2
Although filling the delay slot with NOP instructions eliminates the need for hardware-
controlled pipeline stalls in this case, it is still not a very efficient use of the pipeline stream,
since these additional NOP instructions increase code size and perform no useful work. (In
practice, however, this technique need not have much negative impact on performance.)
A more effective solution to handling the data dependency is to fill the load delay slot with
a useful instruction. Good optimizing compilers can usually accomplish this, especially if
the load delay is only one instruction. The example program below illustrates how a compiler
might rearrange instructions to handle a potential data dependency.
# Consider the code for C := A+B; F := D
Load R1, A
Load R2, B
Add R3, R1, R2 <= This instruction stalls because R2 data is not available
Load R4, D
..... ....
# An alternative code sequence (where delay length = 1)
Load R1, A
Load R2, B
Load R4, D
Add R3, R1, R2 <= No stall since R2 data is available
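The stall in the first sequence, and its absence in the reordered one, can be reproduced with a toy in-order pipeline model in Python. The sketch assumes a load delay of one instruction slot (a value loaded in slot s is usable from slot s + 2) and ignores all other hazards; the Instr type and the count_load_stalls function are hypothetical, not part of any tool described here.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dst: str
    srcs: Tuple[str, ...] = ()


def count_load_stalls(program, load_delay=1):
    """Count stall cycles in a toy in-order pipeline.

    A value loaded in issue slot s is assumed usable from slot
    s + load_delay + 1; an instruction reading it earlier stalls.
    """
    ready_at = {}  # register -> first slot at which its loaded value is usable
    slot = stalls = 0
    for ins in program:
        needed = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        if needed > slot:  # operand not ready yet: insert wait cycles
            stalls += needed - slot
            slot = needed
        if ins.op == "Load":
            ready_at[ins.dst] = slot + load_delay + 1
        slot += 1
    return stalls


original = [Instr("Load", "R1"), Instr("Load", "R2"),
            Instr("Add", "R3", ("R1", "R2")), Instr("Load", "R4")]
reordered = [Instr("Load", "R1"), Instr("Load", "R2"),
             Instr("Load", "R4"), Instr("Add", "R3", ("R1", "R2"))]
print(count_load_stalls(original), count_load_stalls(reordered))  # 1 0
```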
(4) Delayed Branch Instructions
Branch instructions usually delay the instruction pipeline because the processor must
calculate the effective destination of the branch and fetch that instruction. When a cache
access requires an entire cycle, and the fetched branch instruction specifies the target
address, it is impossible to perform this fetch (of the destination instruction) without
delaying the pipeline for at least one pipe stage (one cycle). Conditional branches can
cause further delays because they require the calculation of a condition, as well as the
target address.
Instead of stalling the instruction pipeline to wait for the instruction at the target address,
RISC designs typically use an approach similar to that used with Load instructions: Branch
instructions are delayed and do not take effect until after one or more instructions
immediately following the Branch instruction have been executed. The instruction or
instructions immediately following the Branch instruction are called delay instructions.
Branch and delayed branch instructions are illustrated in Figure 1.4.
Figure 1.4: Block Diagram of Branch/Delayed Branch Instruction (panels: Branch Instruction, Delayed Branch Instruction; elements: Condition ?, YES/NO, Delay Instruction, Branch Target, Next Instruction)
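The difference between the two branch forms in Figure 1.4 can be shown with a toy interpreter. In this Python sketch, which is not a model of the actual GMS30C2232 pipeline, every branch is assumed to be taken and delay instructions are assumed not to be branches themselves; the program format and the run function are hypothetical.

```python
def run(program, max_steps=50):
    """Return the labels of the instructions executed, in order.

    Each program entry is (label, kind, target), where kind is "op",
    "br" (ordinary branch) or "dbr" (delayed branch). All branches are
    taken; after a "dbr", the next instruction (the delay instruction)
    still executes before control passes to the target.
    """
    pc, trace, pending = 0, [], None
    for _ in range(max_steps):
        if pc >= len(program):
            break
        label, kind, target = program[pc]
        trace.append(label)
        next_pc = pc + 1
        if pending is not None:  # this instruction was the delay instruction
            next_pc, pending = pending, None
        if kind == "br":
            next_pc = target     # ordinary branch: takes effect immediately
        elif kind == "dbr":
            pending = target     # delayed branch: takes effect after one more instruction
        pc = next_pc
    return trace


prog = [("i0", "op", None), ("DBR", "dbr", 4), ("delay", "op", None),
        ("skipped", "op", None), ("target", "op", None)]
print(run(prog))  # ['i0', 'DBR', 'delay', 'target']
```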
1.1.3 The pipeline structure of GMS30C2232
The GMS30C2232 has a two-stage pipeline structure, and each stage is composed of two phases
(TM and TV). Although the basic structure is a two-stage pipeline, it is effectively lengthened
for some instructions. For example, a standard ALU instruction uses 5 phases (the 2-stage
pipeline with 4 phases plus 1 additional phase). This additional phase does not use the part
of the datapath that the next instruction needs, so the next instruction need not wait until
the previous ALU instruction has finished. A DSP instruction occupies more than the 2-stage
pipeline for execution and requires the same datapath resources as the next DSP instruction,
so the next DSP instruction is delayed.
The pipeline structure of the GMS30C2232 and the action of the datapath are described in Table 1.1.
Stage | Phase | Datapath Action
Fetch/Decode | TM (Low) | 1. The instruction is read from the instruction cache according to the address of the instruction.
Fetch/Decode | TV (High) | 2. The control signals for Rd (destination operand) and Rs (source operand) are activated according to the instruction that was loaded in the TM phase.
Fetch/Decode | TV (High) | 2.1 The control signals for IR (immediate register (operand)) and IL (instruction length) are activated.
Fetch/Decode | TV (High) | 2.2 The address of the next instruction is calculated and saved in the PC.
Execute/Write | TM (Low) | 1. The next instruction is read from the instruction cache.
Execute/Write | TM (Low) | 1.1 The addresses of Rd and Rs are determined.
Execute/Write | TM (Low) | 1.2 The immediate operand is determined.
Execute/Write | TM (Low) | 1.3 The operand is read from the register stack using the addresses of Rs and Rd.
Execute/Write | TM (Low) | 1.4 The operands XR, YR and QR are controlled.
Execute/Write | TV (High) | 2. The input data of the ALU is obtained.
Execute/Write | TV (High) | 2.1 The ALU datapath is controlled and the instruction is executed in the ALU.
Execute/Write | TV (High) | 2.2 The result of the ALU operation is saved in the register file.
Additional Insertion | Next TM | Additional ALU operation is continued and its result is saved in the register file.
Table 1.1: The pipeline structure of GMS30C2232 and the action of datapath.
1.2 Global Register Set
The architecture provides 32 global registers of 32 bits each. These are:
G0 Program Counter PC
G1 Status Register SR
G2 Floating-point Exception Register FER
G3..G15 General purpose registers
G16..G17 Reserved
G18 Stack Pointer SP
G19 Upper stack Bound UB
G20 Bus Control Register BCR (see section 6. Bus Interface)
G21 Timer Prescaler Register TPR (see section 5. Timer and CPU Clock
Modes)
G22 Timer Compare Register TCR (see section 5. Timer and CPU Clock
Modes)
G23 Timer Register TR (see section 5. Timer and CPU Clock Modes)
G24 Watchdog Compare Register WCR (see section 6. Bus Interface)
G25 Input Status Register ISR (see section 6. Bus Interface)
G26 Function Control Register FCR (see section 6. Bus Interface)
G27 Memory Control Register MCR (see section 6. Bus Interface)
G28..G31 Reserved
Registers G0..G15 can be addressed directly by the register code (0..15) of an instruction.
Registers G18..G27 can be addressed only by a MOV or MOVI instruction with the high
global flag H set to 1.
(Example)
MOVI G2, 0x20 ; G2 := 0x20 (set H flag)
MOV G3, G19 ; G3 := G19 (G19 (UB) is copied to G3)
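The register-code mapping described for the H flag in section 1.2.2 can be sketched as follows. The Python function below is a hypothetical helper for illustration (for example, inside an assembler): G0..G15 are encoded directly, while G16..G31 are encoded as codes 0..15 with the H flag set, so G18..G27 correspond to codes 2..11.

```python
def encode_global(reg):
    """Return (register code, H flag) needed to address global register G<reg>."""
    if not 0 <= reg <= 31:
        raise ValueError(f"G{reg} is not a global register")
    if reg < 16:
        return reg, False        # G0..G15: addressed directly
    return reg - 16, True        # G16..G31: code 0..15 with H set


HIGH_GLOBALS = {18: "SP", 19: "UB", 20: "BCR", 21: "TPR", 22: "TCR",
                23: "TR", 24: "WCR", 25: "ISR", 26: "FCR", 27: "MCR"}

for reg, name in HIGH_GLOBALS.items():
    code, h = encode_global(reg)
    print(f"G{reg} ({name}): code {code}, H = {int(h)}")
```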
Figure 1.5: Global Register Set
1.2.1 Program Counter PC, G0
G0 is the program counter PC. It is updated to the address of the next instruction through
instruction execution. Besides this implicit updating, the PC can also be addressed like a
regular source or destination register. When the PC is referenced as an operand, the
supplied value is the address of the first byte after the instruction that references it (the
address of the next instruction), except when it is referenced by a delay instruction with a
preceding delayed branch taken. For a Delayed Branch instruction whose branch condition is
met, the branch address PC + rel (relative to the address of the first byte after the
Delayed Branch instruction) is placed in the PC (see section 1.26. Delayed Branch Instructions).
Placing a result in the PC has the effect of a taken branch. When a branch is taken, the target
address of the branch is placed in the PC.
Bit zero of the PC is always zero, regardless of any value placed in the PC.
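The PC rules above can be summarised in a few lines of Python. This sketch assumes instruction lengths of 2, 4 or 6 bytes (the 16-, 32- and 48-bit formats) and a 32-bit PC, does not model the special case of a delay instruction following a taken delayed branch, and uses hypothetical function names.

```python
INSTR_LENGTHS = (2, 4, 6)  # 16-, 32- and 48-bit instruction formats, in bytes
PC_MASK = 0xFFFFFFFE       # 32-bit PC with bit zero always zero


def write_pc(value):
    """Placing a value in the PC acts as a taken branch; bit zero is forced to 0."""
    return value & PC_MASK


def pc_operand(instr_addr, instr_len):
    """Value read from the PC by an instruction: the address of the first byte
    after that instruction (delay-instruction special case not modelled)."""
    if instr_len not in INSTR_LENGTHS:
        raise ValueError("instruction length must be 2, 4 or 6 bytes")
    return write_pc(instr_addr + instr_len)


def delayed_branch_target(branch_addr, branch_len, rel):
    """Branch address PC + rel, with PC taken after the Delayed Branch instruction."""
    return write_pc(pc_operand(branch_addr, branch_len) + rel)


print(hex(delayed_branch_target(0x1000, 4, 0x20)))  # 0x1024
```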
1.2.2 Status Register SR, G1
G1 is the status register SR. Its content is updated by instruction execution. Besides this
implicit updating, the SR can also be addressed like a regular register (when the H flag is set).
When addressed as a source or destination operand, all 32 bits are used as an operand.
However, only bits 15..0 of a result can be placed in bits 15..0 of the SR; bits 31..16 of the
result are discarded and bits 31..16 of the SR remain unchanged. When the SR is addressed as
a source operand, it represents the value 0x0. The full content of the SR is replaced only by the
Return Instruction. A result placed in the SR overrules any setting or clearing of the
condition flags as a result of an instruction.
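The partial update of the SR can be expressed as a single mask operation. The Python sketch below models only the rule stated above for results placed in the SR; the Return instruction, which replaces the full SR, is not modelled, and the function name is hypothetical.

```python
def write_sr(sr, result):
    """Place a result in the SR: bits 15..0 come from the result,
    bits 31..16 of the SR are kept unchanged."""
    return (sr & 0xFFFF0000) | (result & 0x0000FFFF)


print(hex(write_sr(0x12345678, 0xABCD0001)))  # 0x12340001
```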
Figure 1.6: Status Register SR (bits 31..16); fields: FP (Frame Pointer), FL (Frame Length), S (Supervisor State Flag), ILC (Instruction-Length Code), P (Trace Pending Flag), T (Trace-Mode Flag)
Figure 1.7: Status Register SR (bits 15..0); fields: L (Interrupt-Lock Flag), FRM (Floating-Point Rounding Mode), FTE (Floating-Point Trap Enable), I (Interrupt-Mode Flag), reserved bit, H (High Global Flag), M (Cache-Mode Flag), V (Overflow Flag), N (Negative Flag), Z (Zero Flag), C (Carry Flag)
The status register SR contains the following status information:
C Carry Flag. Bit zero is the carry condition flag C. In general, when set, it
indicates that the unsigned integer range is exceeded (overflow). For add
operations, it indicates a carry out of bit 31 of the result. For subtract operations,
it indicates a borrow (inverse carry) into bit 31 of the result.
Z Zero Flag. Bit one is the zero condition flag Z. When set, it indicates that all 32
or 64 result bits are equal to zero regardless of any carry, borrow or overflow.
N Negative Flag. Bit two is the negative condition flag N. On compare
instructions, it indicates the arithmetically correct (true) sign of the result
regardless of an overflow. On all other instructions, it is derived from result bit
31, which is the true sign bit when no overflow occurs. In the case of overflow,
result bit 31 and N reflect the inverted sign bit.
V Overflow Flag. Bit three is the overflow condition flag V. In general, when set,
it indicates a signed overflow. For the Move instructions, it indicates a floating-point
NaN (Not a Number).
M Cache-Mode Flag. Bit four is the cache-mode flag M. Besides being set or
cleared under program control, it is also automatically cleared by a Frame
instruction and by any branch taken except a delayed branch. See section
1.8. Instruction Cache for details.
H High Global Flag. Bit five is the high global flag H. When H is set, denoting
G0..G15 addresses G16..G31 instead. Thus, the registers G18..G27 may be
addressed by denoting G2..G11 respectively.
The H flag is effective only in the first cycle of the next instruction after it was
set; then it is cleared automatically.
Only a MOV or MOVI instruction issued as the next instruction may be
used to copy the content of a local register or an immediate value to one of the
high global registers. The MOV instruction may be used to copy the content of
a high global register (except the BCR, TPR, FCR and MCR registers, which
are write-only) to a local register. With all other instructions, the result may be
invalid.
If one of the high global registers is addressed as the destination register in user
state (S = 0), the condition flags are undefined, the destination register remains
unchanged and a trap to Privilege Error occurs.
Reserved Bit six is reserved for future use. It must always be zero.
I Interrupt-Mode Flag. Bit seven is the interrupt-mode flag I. It is set
automatically on interrupt entry and reset to its old value by a Return
instruction. The I flag is used by the operating system; it must never be
changed by any user program.
FTE Floating-Point Trap Enable Flag. Bits 12..8 are the floating-point trap enable
flags. They determine the exception type and trap execution flow (see section
3.33.2. Floating-Point Instructions).
FRM Floating-Point Rounding Mode. Bits 14..13 are the floating-point rounding
modes (see section 3.33.2. Floating-Point Instructions).
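The low-order SR fields described above can be decoded with shifts and masks. This Python sketch uses only the bit positions given in this section (C, Z, N, V, M, H, I, FTE and FRM) and treats a set reserved bit 6 as an error; the function name and dictionary layout are hypothetical.

```python
SR_FLAG_BITS = {"C": 0, "Z": 1, "N": 2, "V": 3, "M": 4, "H": 5, "I": 7}


def decode_sr_low(sr):
    """Decode bits 15..0 of the SR as described in this section."""
    if (sr >> 6) & 1:
        raise ValueError("reserved bit 6 must be zero")
    fields = {name: (sr >> bit) & 1 for name, bit in SR_FLAG_BITS.items()}
    fields["FTE"] = (sr >> 8) & 0x1F   # bits 12..8, floating-point trap enables
    fields["FRM"] = (sr >> 13) & 0x3   # bits 14..13, floating-point rounding mode
    return fields


print(decode_sr_low(0x6325))
# {'C': 1, 'Z': 0, 'N': 1, 'V': 0, 'M': 0, 'H': 1, 'I': 0, 'FTE': 3, 'FRM': 3}
```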