HYNIX GMS30C2232, GMS30C2216 Datasheet

Jun. 29, 2001 Ver. 3.1
16/32 BIT RISC/DSP
GMS30C2216 GMS30C2232
USER’S MANUAL
IDA Team in Hynix Semiconductor Inc.
© Hynix Semiconductor 2001. All Rights Reserved.
Hynix offices in Korea, or the distributors and representatives listed in the address directory, can provide additional information about this manual.
Hynix reserves the right to make changes to any information herein at any time without notice.
The information, diagrams, and other data in this manual are correct and reliable; however, Hynix is in no way responsible for any violations of patents or other rights of third parties arising from the use of this manual.
Specifications and information in this document are subject to change without notice and do not represent a commitment on the part of Hynix. Hynix reserves the right to make changes to improve functioning. Although the information in this document has been carefully reviewed, Hynix does not assume any liability arising out of the use of the product or circuit described herein.
Hynix does not authorize the use of the Hynix microprocessor in life support applications wherein a failure or malfunction of the microprocessor may directly threaten life or cause injury. The user of the Hynix microprocessor in life support applications assumes all risks of such use and indemnifies Hynix against all damages.
For further information please contact:
SEOUL OFFICE : Hynix YOUNG DONG Bldg. 891, Daechi-dong, Kangnam-gu, Seoul, Korea. PHONE : (02) 3459-3662~3 FAX : (02) 3459-3942
SYSTEM IC : 1, Hyangjeong-dong, Hungduk-gu, Cheongju, 361-725, Korea. PHONE : (0431) 270-4030~47 FAX : (0431) 270-4075
Copyright 2001 Hynix Semiconductor Inc. Revision Jun. 29, 2001.
Table of Contents
0. Overview
0.1 GMS30C2216/32 RISC/DSP
0.2 Block Diagram
0.3 Pin Configuration
0.3.1 GMS30C2232, 160-Pin MQFP-Package - View from Top Side
0.3.2 Pin Cross Reference by Pin Name
0.3.3 Pin Cross Reference by Location
0.3.4 Pin Function
1. Architecture
1.1 Introduction
1.1.1 RISC Architecture
1.1.2 Techniques to reduce CPI (Cycles per Instruction)
1.1.3 The pipeline structure of GMS30C2232
1.2 Global Register Set
1.2.1 Program Counter PC, G0
1.2.2 Status Register SR, G1
1.2.3 Floating-Point Exception Register FER, G2
1.2.4 Stack Pointer SP, G18
1.2.5 Upper Stack Bound UB, G19
1.2.6 Bus Control Register BCR, G20
1.2.7 Timer Prescaler Register TPR, G21
1.2.8 Timer Compare Register TCR, G22
1.2.9 Timer Register TR, G23
1.2.10 Watchdog Compare Register WCR, G24
1.2.11 Input Status Register ISR, G25
1.2.12 Function Control Register FCR, G26
1.2.13 Memory Control Register MCR, G27
1.3 Local Register Set
1.4 Privilege States
1.5 Register Data Types
1.6 Memory Organization
1.7 Stack
1.8 Instruction Cache
1.9 On-Chip Memory (IRAM)
2. Instructions General
2.1 Instruction Notation
2.2 Instruction Execution
2.3 Instruction Formats
2.3.1 Table of Immediate Values
2.3.2 Table of Instruction Codes
2.3.3 Table of Extended DSP Instruction Codes
2.4 Entry Tables
2.5 Instruction Timing
3. Instruction Set
3.1 Memory Instructions
3.1.1 Address Modes
3.1.2 Load Instructions
3.1.3 Store Instructions
3.2 Move Word Instructions
3.3 Move Double-Word Instruction
3.4 Logical Instructions
3.5 Invert Instruction
3.6 Mask Instruction
3.7 Add Instructions
3.8 Sum Instructions
3.9 Subtract Instructions
3.10 Negate Instructions
3.11 Multiply Word Instruction
3.12 Multiply Double-Word Instructions
3.13 Divide Instructions
3.14 Shift Left Instructions
3.15 Shift Right Instructions
3.16 Rotate Left Instruction
3.17 Index Move Instructions
3.18 Check Instructions
3.19 No Operation Instruction
3.20 Compare Instructions
3.21 Compare Bit Instructions
3.22 Test Leading Zeros Instruction
3.23 Set Stack Address Instruction
3.24 Set Conditional Instructions
3.25 Branch Instructions
3.26 Delayed Branch Instructions
3.27 Call Instruction
3.28 Trap Instructions
3.29 Frame Instruction
3.30 Return Instruction
3.31 Fetch Instruction
3.32 Extended DSP Instructions
3.33 Software Instructions
3.33.1 Do Instruction
3.33.2 Floating-Point Instructions
4. Exceptions
4.1 Exception Processing
4.2 Exception Types
4.2.1 Reset
4.2.2 Range, Pointer, Frame and Privilege Error
4.2.3 Extended Overflow
4.2.4 Parity Error
4.2.5 Interrupt
4.2.6 Trace Exception
4.3 Exception Backtracking
5. Timer and CPU Clock Modes
5.1 Overview
5.1.1 Timer Prescaler Register TPR
5.1.2 Timer Register TR
5.1.3 Timer Compare Register TCR
5.1.4 Power-Down Mode
5.1.5 Additional Power Saving
5.1.6 Sleep Mode
6. Bus Interface
6.1 Bus Control General
6.1.1 Boot Width Selection
6.1.2 SRAM and ROM Bus Access
6.1.3 DRAM Bus Access
6.1.3.1 DRAM Row Address Bits Multiplexing
6.2 I/O Bus Control
6.2.1 I/O Bus Control
6.3 Bus Control Register BCR
6.4 Memory Control Register MCR
6.4.1 MEMx Parity Disable
6.4.2 MEMx Wait Disable
6.4.3 MEMx Byte Mode
6.4.4 Power Down
6.4.5 IRAM Refresh Test
6.4.6 IRAM Refresh Rate
6.4.7 DRAM Type
6.4.8 Entry Table Map
6.4.10 MEMx Bus Size
6.5 Input Status Register ISR
6.6 Function Control Register FCR
6.7 Watchdog Compare Register WCR
6.8 IO3 Control Modes
6.8.1 IO3Standard Mode
6.8.2 Watchdog Mode
6.8.3 IO3Timing Mode
6.8.4 IO3TimerInterrupt Mode
6.9 Bus Signals
6.9.1 Bus Signals for the GMS30C2232 Processor
6.9.2 Bus Signals for the GMS30C2216 Processor
6.9.3 Bus Signal Description
6.10 Bus Cycles
6.10.1 MEMx Byte Mode = 1
6.10.1.1 SRAM and ROM Single-Cycle Read Access
6.10.1.2 SRAM and ROM Single-Cycle Write Access
6.10.1.3 SRAM and ROM Multi-Cycle Read Access
6.10.1.4 SRAM Multi-Cycle Write Access
6.10.2 MEMx Byte Mode = 0
6.10.2.1 SRAM Single-Cycle Read Access
6.10.2.2 SRAM Single-Cycle Write Access
6.10.2.3 SRAM Multi-Cycle Read Access
6.10.2.4 SRAM Multi-Cycle Write Access
6.10.3 MEM2 Read Access with WAIT Pin
6.10.4 I/O Read Access
6.10.5 I/O Read Access with WAIT Pin
6.10.6 I/O Write Access
6.10.7 DRAM
6.10.7.1 Fast Page Mode DRAM Access
6.10.7.2 EDO DRAM Single-Cycle Access
6.10.7.3 EDO DRAM Multi-Cycle Access
6.10.7.4 DRAM Refresh (CAS Before RAS Refresh)
6.10 DC Characteristics
7. Mechanical Data
7.1 GMS30C2232, 160-Pin MQFP-Package
7.1.1 Pin Configuration - View from Top Side
7.1.2 Pin Cross Reference by Pin Name
7.1.3 Pin Cross Reference by Location
7.2 GMS30C2232, 144-Pin TQFP-Package
7.2.1 Pin Configuration - View from Top Side
7.2.2 Pin Cross Reference by Pin Name
7.2.3 Pin Cross Reference by Location
7.3 GMS30C2216, 100-Pin TQFP-Package
7.3.1 Pin Configuration - View from Top Side
7.3.2 Pin Cross Reference by Pin Name
7.3.3 Pin Cross Reference by Location
7.4 Package Dimensions
Appendix. Instruction Set Detail
0. Overview
0.1 GMS30C2216/32 RISC/DSP
The HME GMS30C2232 and GMS30C2216 RISC/DSP are improved versions of HME’s existing GMS30C2132 and GMS30C2116 RISC/DSP. The use of a 0.35 µm CMOS technology further improves the performance of the RISC/DSP. Being pin-compatible with their predecessors, the new RISC/DSP can be used as direct replacements in existing customer designs.
The GMS30C2216 and GMS30C2232 RISC/DSP are based on the hyperstone architecture.
Improved Points
Maximum Operating Frequency : 108MHz @3.3V
Operating Voltage : 3.3V ± 0.3V
8KByte on-chip memory
On-chip phase-locked loop (PLL) circuit (x0.5, x1, x2, x4)
Boot bus width selectable by two external pins
Wait Pin Function
On-chip DRAM controller : FPM (Fast-Page-Mode) and EDO (Extended-Data-Out) DRAMs
5.0V Tolerant Input
Control CLKOUT pin Function
This combination of a high-performance RISC microprocessor with an additional powerful DSP instruction set and on-chip microcontroller functions offers a high throughput. The speed is obtained by an optimized combination of the following features:
Pipelined memory access allows overlapping of memory accesses with execution.
8KByte on-chip memory.
On-chip instruction cache omits instruction fetch in inner loops and provides prefetch.
Variable-length instructions of 16, 32 or 48 bits provide a large, powerful instruction set, thereby reducing the number of instructions to be executed.
Primarily used 16-bit instructions halve the memory bandwidth required for instruction fetch in comparison to conventional RISC architectures with fixed-length 32-bit instructions, and also yield better code economy than conventional CISC architectures.
Orthogonal instruction set
Most instructions execute in one cycle.
Pipelined DSP instructions.
Parallel execution of ALU and DSP instructions.
Single-cycle halfword multiply-accumulate operation.
Fast Call and Return by parameter passing via registers.
An instruction pipeline depth of only two stages (decode and execute) provides branching without insertion of wait cycles in combination with Delayed Branch instructions.
Range and pointer checks are performed without speed penalty; these checks therefore no longer need to be turned off, which provides higher runtime reliability.
Separate address and data buses provide a throughput of one 32-bit word each cycle.
The features noted above help reduce the number of idle wait cycles to a bare minimum. The processor is designed to sustain its execution rate with standard DRAM memory.
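The bandwidth argument for variable-length instructions can be made concrete with a small calculation. The sketch below compares the bytes fetched for a variable-length encoding (16, 32 or 48 bits) against a fixed 32-bit encoding. The instruction mix is a hypothetical assumption chosen for illustration, not a figure from this manual, and the comparison assumes the same instruction count for both encodings.

```python
# Hypothetical instruction mix: number of 16-, 32- and 48-bit instructions
# in some code sequence. These counts are illustrative assumptions only.
MIX = {16: 80, 32: 15, 48: 5}


def fetch_bytes_variable(mix):
    """Bytes fetched when every instruction occupies only its own length."""
    return sum((bits // 8) * count for bits, count in mix.items())


def fetch_bytes_fixed32(mix):
    """Bytes fetched if every instruction occupied a fixed 32-bit word."""
    return 4 * sum(mix.values())


variable = fetch_bytes_variable(MIX)
fixed = fetch_bytes_fixed32(MIX)
print(f"variable-length: {variable} bytes, fixed 32-bit: {fixed} bytes")
print(f"ratio: {variable / fixed:.3f}")
```

A sequence consisting only of 16-bit instructions gives a ratio of exactly 0.5, which is the "halved" bandwidth the text refers to; longer instructions in the mix move the ratio toward 1.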
The low power consumption is of advantage for mobile (portable) applications or in temperature-sensitive environments.
Most of the transistors are used for the on-chip memory, the instruction cache, the register stack and the multiplier, whereas only a small number is required for the control logic.
Due to their low system cost, the GMS30C2216 and GMS30C2232 RISC/DSP are well suited for embedded-systems applications requiring high performance at low cost. To simplify board design and reduce system costs, the GMS30C2216 and GMS30C2232 come with integrated peripherals, such as a timer and memory and bus control logic. Complete systems based on the HME microprocessor can therefore be implemented with a minimum of external components. No glue logic is necessary to connect any kind of memory or I/O. The processor is even suitable for systems where microprocessors with a 16-bit architecture have been used until now for cost reasons. Its improved performance compared to conventional microcontrollers can be used to replace many external peripherals, such as graphics controllers or DSPs, with software.
The software development tools include an optimizing C compiler, an assembler, a source-level debugger with profiler, and a real-time kernel with an extremely fast response time. Using this real-time kernel, up to 31 tasks, each with its own virtual timer, can be developed independently of each other. The synchronization of these tasks is handled almost automatically by the real-time kernel. To the developer, it appears as if up to 31 HME microprocessors were available, each of which can be allocated its own program. Real-time debugging of multiple tasks is also supported.
The following description gives a brief architectural overview:
Compatibility:
Pin-compatible with HME GMS30C2116/32 and hyperstone E1-16/32
Pin- and function-compatible with hyperstone E1-16/32X
PLL (Phase-Locked Loop):
An internal phase-locked loop (PLL) circuit provides clock rate multiplication by a factor of four; only an external 27 MHz crystal is required to achieve an internal clock rate of 108 MHz.
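The relation between crystal frequency and internal clock rate is a simple multiplication. The sketch below uses the multiplication factors listed under "Improved Points" (x0.5, x1, x2, x4); the function name is an illustrative assumption, and how a factor is selected on the device is not modelled here.

```python
# PLL multiplication factors as listed under "Improved Points".
PLL_FACTORS = (0.5, 1, 2, 4)


def internal_clock_mhz(crystal_mhz, factor):
    """Internal clock rate in MHz for a given crystal frequency and PLL factor."""
    if factor not in PLL_FACTORS:
        raise ValueError(f"unsupported PLL factor: {factor}")
    return crystal_mhz * factor


# The case described in the text: a 27 MHz crystal multiplied by four.
print(internal_clock_mhz(27, 4))
```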
Registers:
32 global and 64 local registers of 32 bits each
16 global and up to 16 local registers are addressable directly
Flags:
Zero (Z), negative (N), carry (C) and overflow (V) flags
Interrupt-mode, interrupt-lock, trace-mode, trace-pending, supervisor state, cache-mode and high global flag
Register Data Types:
Unsigned integer, signed integer, signed short, signed complex short, 16-bit fixed-point, bitstring, IEEE-754 floating-point, each either 32 or 64 bits
External Memory:
Address space of 4Gbytes, divided into five areas
Separate I/O address space
Load/Store architecture
Pipelined memory and I/O accesses
High-order data located and addressed at lower address (big endian)
Instructions and double-word data may cross DRAM page boundaries
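Big-endian ordering means the high-order byte of a word is stored at the lowest address. The following sketch models memory as a byte array to show where each byte of a 32-bit word ends up; the helper names are illustrative assumptions and do not correspond to anything in the processor or its tools.

```python
def store_word_big_endian(memory, address, value):
    """Store a 32-bit word with the high-order byte at the lowest address."""
    memory[address:address + 4] = value.to_bytes(4, "big")


def load_word_big_endian(memory, address):
    """Load a 32-bit word stored in big-endian byte order."""
    return int.from_bytes(memory[address:address + 4], "big")


memory = bytearray(8)
store_word_big_endian(memory, 0, 0x12345678)

# The high-order byte 0x12 sits at address 0, the low-order byte 0x78 at address 3.
print([hex(b) for b in memory[:4]])
print(hex(load_word_big_endian(memory, 0)))
```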
On-chip Memory:
8Kbytes internal (on-chip) memory
Memory Data Types:
Unsigned and signed byte (8 bit)
Unsigned and signed halfword (16 bit), located on halfword boundary
Undedicated word (32 bit), located on word boundary
Undedicated double-word (64 bit), located on word boundary
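The boundary rules for the memory data types can be expressed as a small alignment check. This is a minimal sketch that encodes only the boundaries stated in the list above (note that a double-word is located on a word boundary, not on an 8-byte boundary); the table and function names are illustrative assumptions.

```python
# Required address alignment in bytes for each memory data type,
# taken from the list above.
ALIGNMENT = {
    "byte": 1,
    "halfword": 2,
    "word": 4,
    "double-word": 4,  # 64-bit data is located on a word boundary
}


def is_aligned(address, data_type):
    """Return True if the address satisfies the boundary rule for the type."""
    return address % ALIGNMENT[data_type] == 0


print(is_aligned(0x1002, "halfword"))     # halfword boundary
print(is_aligned(0x1002, "word"))         # not a word boundary
print(is_aligned(0x1004, "double-word"))  # word boundary is sufficient
```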
Runtime Stack:
Runtime stack is divided into memory part and register part
Register part is implemented by the 64 local registers holding the most recent stack frame(s)
Current stack frame (maximum 16 registers) is always kept in register part of the stack
Data transfer between memory and register part of the stack is automatic
Upper stack bound is guarded
Instruction Cache:
An on-chip instruction cache reduces instruction memory access substantially
Instructions General:
Variable-length instructions of one, two or three halfwords halve required memory bandwidth
Pipeline depth of only two stages assures immediate refill after branches
Register instructions of type "source operator destination -> destination" or "source operator immediate -> destination"
All register bits participate in an operation
Immediate operands of 5, 16 and 32 bits, zero- or sign-expanded
Large address displacement of up to 28 bits
Two sets of signed arithmetical instructions: instructions either only set or clear the overflow flag, or additionally trap to a Range Error routine on overflow
DSP instructions operate on 16-bit integer, real and complex fixed-point data and 32-bit integer data into 32-bit and 64-bit hardware accumulators
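Zero- and sign-expansion of immediate operands can be illustrated as follows. This is a generic sketch of the two expansion rules for an n-bit field widened to a 32-bit register value; which instruction uses which rule is defined by the instruction formats and is not modelled here. The function names are illustrative assumptions.

```python
def zero_expand(value, bits):
    """Expand an n-bit field to a register value by filling with zeros."""
    return value & ((1 << bits) - 1)


def sign_expand(value, bits):
    """Expand an n-bit field to a signed value by replicating its top bit."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def to_register32(value):
    """Bit pattern of a (possibly negative) value in a 32-bit register."""
    return value & 0xFFFFFFFF


# A 5-bit immediate with all bits set: 31 when zero-expanded, -1 when sign-expanded.
print(zero_expand(0x1F, 5), sign_expand(0x1F, 5))
# The sign-expanded value fills all upper register bits with ones.
print(hex(to_register32(sign_expand(0x1F, 5))))
```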
Instruction Summary:
Memory instructions pipelined to a depth of two stages, trap on address register equal to zero (check for invalid pointers)
Memory address modes: register address, register postincrement, register + displacement (including PC relative), register postincrement by displacement (next address), absolute, stack address, I/O absolute and I/O displacement
Load, all data types, bytes and halfwords right adjusted and zero- or sign-expanded, execution proceeds after Load until data is needed
Store, all data types, trap when range of signed byte or halfword is exceeded
Move, Move immediate, Move double-word
Logical instructions AND, AND not, OR, XOR, NOT, AND not immediate, OR immediate, XOR immediate
Mask source and immediate -> destination
Add unsigned/signed, Add signed with trap on overflow, Add with carry
Add unsigned/signed immediate, Add signed immediate with trap on overflow
Sum source + immediate -> destination, unsigned/signed and signed with trap on overflow
Subtract unsigned/signed, Subtract signed with trap on overflow, Subtract with carry
Negate unsigned/signed, Negate signed with trap on overflow
Multiply word * word -> low-order word unsigned or signed, Multiply word * word -> double-word unsigned and signed
Divide double-word by word -> quotient and remainder, unsigned and signed
Shift left unsigned/signed, single and double-word, by constant and by content of register, Shift left signed by constant with trap on loss of high-order bits
Shift right unsigned and signed, single and double-word, by constant and by content of register
Rotate left single word by content of register
Index Move, move an index value scaled by 1, 2, 4 or 8, optionally with bounds check
Check a value for an upper bound specified in a register or check for zero
Compare unsigned/signed, Compare unsigned/signed immediate
Compare bits, Compare bits immediate, Compare any byte zero
Test number of leading zeros
Set Conditional, save conditions in a register
Branch unconditional and conditional (12 conditions)
Delayed Branch unconditional and conditional (12 conditions)
Call subprogram, unconditional and on overflow
Trap to supervisor subprogram, unconditional and conditional (11 conditions)
Frame, structure a new stack frame, include parameters in frame addressing, set frame length, restore reserve frame length and check for upper stack bound
Return from subprogram, restore program counter, status register and return-frame
Software instruction, call an associated subprogram and pass a source operand and the address of a destination operand to it
DSP Multiply instructions: signed and/or unsigned multiplication -> single and double word product
DSP Multiply-Accumulate instructions: signed multiply-add and multiply-subtract -> single and double word product sum and difference
DSP Halfword Multiply-Accumulate instructions: signed multiply-add operating on four halfword operands -> single and double word product sum
DSP Complex Halfword Multiply instruction: signed complex halfword multiplication -> real and imaginary single word product
DSP Complex Halfword Multiply-Accumulate instruction: signed complex halfword multiply-add -> real and imaginary single word product sum
DSP Add and Subtract instructions: signed halfword add and subtract with and without fixed-point adjustment -> single word sum and difference
Floating-point instructions are architecturally fully integrated; in the present version they are executed as Software instructions. Floating-point Add, Subtract, Multiply, Divide, Compare and Compare unordered for single and double-precision, and Convert single <-> double are provided.
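As an illustration of the complex halfword multiply named in the summary above, the sketch below computes the real and imaginary products of two complex numbers with signed 16-bit components, using the conventional definition of complex multiplication. Operand packing in registers, fixed-point adjustment and overflow behavior of the actual instruction are not modelled; the function names are illustrative assumptions.

```python
def to_int16(value):
    """Interpret the low 16 bits of a value as a signed halfword."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def complex_halfword_multiply(a, b):
    """Multiply two complex numbers given as (real, imaginary) halfword pairs.

    Returns (real, imaginary) products as plain Python integers; overflow
    of a 32-bit result word is not modelled in this sketch.
    """
    a_re, a_im = to_int16(a[0]), to_int16(a[1])
    b_re, b_im = to_int16(b[0]), to_int16(b[1])
    real = a_re * b_re - a_im * b_im
    imag = a_re * b_im + a_im * b_re
    return real, imag


# (1 + 2i) * (3 + 4i) = -5 + 10i
print(complex_halfword_multiply((1, 2), (3, 4)))
```

A multiply-accumulate variant would add these two products to running real and imaginary sums.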
Exceptions:
Pointer, Privilege, Frame and Range Error, Extended Overflow, Parity Error, Interrupt and Trace mode exception
Watchdog function
Error-causing instructions can be identified by backtracking, thus allowing a very detailed error analysis
Timer:
Two multifunctional timers
Bus Interface:
Separate address bus of 26 (GMS30C2232) or 22 (GMS30C2216) bits and data bus of up to 32 (GMS30C2232) or 16 bits (GMS30C2216) provide a throughput of four or two bytes at each clock cycle
Data bus width of 32, 16 or 8 bits, individually selectable for each external memory area.
8-bit, 16-bit, and 32-bit boot width selectable via two external pins.
5V tolerant input
Configurable I/O pins
Internal generation of all memory and I/O control signals
Wait pin function for I/O accesses to peripheral devices.
Wait pin function for memory accesses to address space MEM2.
On-chip DRAM controller supporting Fast-Page-Mode DRAMs and EDO DRAMs.
Up to seven vectored interrupts
Control function for CLKOUT pin.
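The bus figures above translate into peak numbers as follows. This sketch computes an upper bound on data throughput from the bus width and clock rate, and the number of bytes reachable with a given number of address lines. It assumes one transfer per clock cycle at the 108 MHz maximum operating frequency with no wait states, and it does not model how address lines map onto the memory areas.

```python
def peak_throughput_mbytes_per_s(clock_mhz, bus_width_bits):
    """Upper bound on throughput, assuming one full-width transfer per cycle."""
    return clock_mhz * bus_width_bits // 8


def addressable_bytes(address_bits):
    """Number of distinct byte addresses selectable with the given address lines."""
    return 1 << address_bits


# GMS30C2232: 32-bit data bus, 26 address bits.
print(peak_throughput_mbytes_per_s(108, 32), addressable_bytes(26))
# GMS30C2216: 16-bit data bus, 22 address bits.
print(peak_throughput_mbytes_per_s(108, 16), addressable_bytes(22))
```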
Power Management:
Operating voltage : 3.3V ± 0.3V.
Lower power supply current in power-down mode.
Clock-Off function to further reduce power dissipation (Sleep Mode)
0.2 Block Diagram
Figure 0.1: Block Diagram (diagram not reproduced; labelled blocks include the bus interface control unit, bus pipeline control, 8 kByte RAM, register set with local and global registers, ALU, barrel shifter, instruction cache and cache control, instruction decode, instruction prefetch control unit, DSP execution unit with hardware multiplier, store data pipeline, memory address pipeline, internal timer, interrupt control, watchdog, and power-down and reset control)
0.3 Pin Configuration
0.3.1 GMS30C2232, 160-Pin MQFP-Package - View from Top Side
Figure 0.2: GMS30C2232, 160-Pin MQFP-Package (pin diagram not reproduced; pin assignments are listed in the cross reference in 0.3.2)
0.3.2 Pin Cross Reference by Pin Name
Signal Location Signal Location Signal Location Signal Location
A0...................97 D3 .....................59 GND ................. 50 NC................... 118
A1...................98 D4 .....................58 GND ................. 56 NC...................123
A2...................99 D5 .....................57 GND ................. 65 NC...................124
A3.................100 D6 .....................51 GND ................. 68 NC...................157
A4.................137 D7 .....................48 GND ................. 73 NC...................158
A5.................138 D8 .....................47 GND ................. 79 OE#................. 113
A6.................139 D9 .....................45 GND ................. 82 RAS# ................ 11
A7.................141 D10 ...................36 GND ................. 90 RESET#............74
A8.................142 D11....................35 GND ................. 96 RQST................89
A9...................20 D12 ...................34 GND ............... 108 VCC....................1
A10.................21 D13 ...................33 GND ................119 VCC..................13
A11.................22 D14 ...................31 GND ............... 122 VCC..................24
A12.................23 D15 ...................30 GND ............... 126 VCC..................32
A13...............127 D16 .................103 GND ............... 130 VCC..................40
A14...............131 D17 .................102 GND ............... 136 VCC..................41
A15...............150 D18 .................101 GND ............... 145 VCC..................49
A16...............151 D19 ...................69 GND ............... 148 VCC..................53
A17...............154 D20 ...................67 GND ............... 153 VCC..................60
A18...............155 D21 ...................66 GND ............... 159 VCC..................64
A19.................12 D22 ...................55 GRANT#........... 75 VCC ..................72
A20.................14 D23 ...................54 INT1.................. 85 VCC..................76
A21.................15 D24 ...................52 INT2.................. 86 VCC..................80
A22...............143 D25 ...................29 INT3/WAIT........ 87 VCC ..................81
A23...............146 D26 ...................27 INT4.................. 88 VCC................104
A24...............147 D27 ...................26 IO1.................... 91 VCC................ 112
A25...............149 D28 ...................25 IO2.................. 105 VCC................120
ACT..............128 D29 ...................19 IO3...................... 5 VCC................121
BOOTB.........156 D30 ...................18 IORD# .............114 VCC ................129
BOOTW..........93 D31 ...................17 IOWR#................ 6 VCC ................133
CAS0#.......... 132 DP0...................94 NC ...................... 3 VCC ................140
CAS1#.......... 109 DP1...................95 NC ...................... 4 VCC ................144
CAS2#.......... 110 DP2...................70 NC .................... 37 VCC ................152
CAS3#...........111 DP3...................71 NC.................... 38 VCC................160
CLKOUT.........92 GND....................2 NC .................... 43 WE#................125
CS1#................9 GND..................10 NC .................... 44 WE0#/BE0#....135
CS2#.................8 GND..................16 NC .................... 77 WE1#/BE1#....134
CS3#................7 GND..................28 NC .................... 78 WE2#/BE2#.... 115
D0...................63 GND ..................39 NC.................... 83 WE3#/BE3#....116
D1...................62 GND ..................42 NC.................... 84 XTAL1/CLKIN .107
D2...................61 GND ..................46 NC...................117 XTAL2.............106
0.3.3 Pin Function
Type            Name            State   Use
Power           VCC             I       Power. Connected to the power supply. It can be a 3.3V power supply.
                GND             I       Ground. Connected to the system ground. All GND pins must be connected to the system ground.
Clock           XTAL1           I       Input for Quartz Clock. When the clock is generated by an external clock generator, XTAL1 is used as clock input.
                XTAL2           O       Output for Quartz Clock.
                CLKOUT          O       Clock Signal Output. It can be used to supply a clock signal to peripheral devices.
Address Bus     A25..A0         O/Z     Address Bus. With the GMS30C2232, only A22..A0 are connected to the address bus pins.
Data Bus        D31..D0         I/O     Data Bus. 32-bit bidirectional data bus.
                DP0..DP3        I/O     Data Parity Signal. Bidirectional parity signals.
Bus Control     RAS#            O/Z     Row Address Strobe. RAS# is activated when the processor accesses a DRAM or performs a refresh cycle. When an SRAM is placed in MEM0, RAS# is used as the chip select signal.
                CAS0#..CAS3#    O/Z     Column Address Strobe. They are only used by a DRAM for column access cycles and for "CAS before RAS" refresh.
                WE#             O/Z     Write Enable. Active low indicates a write access, active high indicates a read access.
                CS1#..CS3#      O/Z     Chip Select. Active low of CS1#..CS3# indicates chip select for the memory areas MEM1..MEM3.
                WE0#..WE3#      O/Z     SRAM Write Enable. Active low indicates write enable for the corresponding byte.
                OE#             O/Z     Output Enable for SRAMs and EPROMs.
                IORD#           O/Z     I/O Read Strobe, optionally I/O Data Strobe. The use of IORD# is specified in the I/O address bit 10.
                IOWR#           O/Z     I/O Write Strobe.
Bus Control     RQST            O       RQST signals the request for a memory or I/O access.
                GRANT#          I       Bus Grant. GRANT# is signaled low by a bus arbiter to grant access to the bus for memory and I/O cycles.
                ACT             O       Active as bus master. ACT is signaled high when GRANT# is low and it is kept high during a current bus access.
Interrupt       INT1..INT4      I       Interrupt Request. A signal on one of the INT1..INT4 interrupt request pins causes an interrupt exception when the interrupt lock flag L is clear and the corresponding INTxMask bit in the FCR is not set.
I/O Port        IO1..IO3        I/O     General Input-Output Port. IO1..IO3 can be individually configured via the IOxDirection bits in the FCR as either input or output pins (port).
System Control  RESET#          I       Reset Processor. RESET# low resets the processor to the initial state and halts all activity. RESET# must be low for at least two cycles.
1. Architecture
1.1 Introduction
1.1.1 RISC Architecture
In the early days of computer history, most computer families started with an instruction set which was rather simple. The main reason for being simple then was the high cost for hardware. The hardware cost has dropped and the software cost has gone up steadily in the past three decades.
The net result is that more and more functions have been built into the hardware, making the instruction set very large and very complex. The growth of instruction sets was also encouraged by the popularity of microprogrammed control in the 1960s and 1970s. Even user-defined instruction sets were implemented using microcodes in some processors for special-purpose applications.
The evolution of computer architectures has been dominated by families of increasingly complex processors. Under market pressures to preserve existing software, Complex Instruction Set Computer (CISC) architectures evolved by the gradual addition of microcode and increasingly elaborate operations. The intent was to supply more support for high-level languages and operating systems, as semiconductor advances made it possible to fabricate more complex integrated circuits. It seemed self-evident that architectures should become more complex as these technological advances made it possible to hold more complexity on VLSI devices.
In recent years, however, Reduced Instruction Set Computer (RISC) architectures have implemented a much more sophisticated handling of the complex interaction between hardware, firmware and software. RISC concepts emerged from statistical analysis of how software actually uses the resources of a processor. Dynamic measurements of system kernels and of object modules generated by optimizing compilers show an overwhelming predominance of the simplest instructions, even in code for CISC machines. Complex instructions are often ignored because a single way of performing a complex operation rarely matches the needs of high-level languages and system environments. RISC designs eliminate the microcoded routines and turn the low-level control of the machine over to software.
This approach is not new, but its application has become more universal in recent years thanks to the prevalence of high-level languages, the development of compilers that can optimize at the microcode level, and dramatic advances in semiconductor memory and packaging. It is now feasible to replace machine microcode ROM with faster RAM, organized as an instruction cache. Machine control then resides in the instruction cache and is, in fact, customized on the fly. The instruction stream generated by system- and compiler-generated code provides a precise fit between the requirements of high-level software and the capabilities of the hardware. Compilers therefore play a vital role in RISC performance.
The advantages of the RISC architecture are as follows:
Simplicity makes VLSI implementation possible and thus allows higher clock rates.
Hardwired control and separate data and program caches lower the average CPI (Cycles per Instruction) significantly.
The dynamic instruction count of a RISC program increases only slightly (by a factor of less than 2) for an ordinary program.
The MIPS (Million Instructions per Second) rate of a typical RISC microprocessor has increased by a factor of 5/(2*0.1) = 25 over that of a typical CISC microprocessor. This factor combines the three figures below.
The clock rate increased from 10 MHz on a CISC processor to 50 MHz on a CMOS/RISC microprocessor (a factor of 5).
The instruction count in a typical RISC program increased to less than 2 times that of a typical CISC program.
The average CPI for a RISC microprocessor decreased to 1.2, instead of 12 as in a typical CISC processor (a factor of 0.1).
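The factor of 25 can be checked with a few lines of arithmetic. The following is a minimal Python sketch using only the figures quoted in the list; the function name `risc_speedup` is an illustrative choice, not something defined by this manual.

```python
def risc_speedup(clock_ratio, instr_count_ratio, cpi_ratio):
    """Relative speedup: clock gain divided by (instruction growth x CPI change)."""
    return clock_ratio / (instr_count_ratio * cpi_ratio)

clock_ratio = 50e6 / 10e6   # clock rate: 10 MHz -> 50 MHz
instr_count_ratio = 2.0     # upper bound: RISC program is less than 2x longer
cpi_ratio = 1.2 / 12.0      # average CPI: 12 -> 1.2

speedup = risc_speedup(clock_ratio, instr_count_ratio, cpi_ratio)
print(f"speedup = {speedup:.1f}x")
```

Because the instruction-count ratio of 2 is an upper bound, the result is a conservative estimate under the stated figures.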
1.1.2 Techniques to reduce CPI (Cycles per Instruction)
If the work each instruction performs is simple and straightforward, the time required to execute each instruction can be shortened and the number of cycles reduced. The goal of RISC designs has been to achieve an execution rate of one instruction per machine cycle (multiple-instruction-issue designs now seek to increase this rate to more than one instruction per cycle). Techniques that help achieve this goal include:
Instruction pipelines
Load and store (load/store) architecture
Delayed load instructions
Delayed branch instructions
(1) Instruction Pipelines
One way to reduce the number of cycles required to execute an instruction is to overlap the execution of multiple instructions. Instruction pipelines divide the execution of each instruction into several discrete portions and then execute multiple instructions simultaneously. The instruction pipeline technique can be likened to an assembly line: the instruction progresses from one specialized stage to the next until it is complete (or issued), just as an automobile moves along an assembly line. (This is in contrast to the non-pipelined, microcoded approach, where all the work is done by one general unit that is less capable at each individual task.) For example, the execution of an instruction might be subdivided into four portions, or clock cycles, as shown in Figure 1.1:
Cycle #1: Fetch Instruction (F)
Cycle #2: ALU Operation (A)
Cycle #3: Access Memory (M)
Cycle #4: Write Results (W)
Figure 1.1: Functional Division of a Hypothetical Pipeline
An instruction pipeline can potentially reduce the number of cycles per instruction by a factor equal to the depth of the pipeline (the depth of the pipeline = the number of resources, or stages). For example, in Figure 1.2 each instruction still requires a total of four clock cycles to execute. However, if a four-level instruction pipeline is used, a new instruction can be initiated at each clock cycle and the effective execution rate is one cycle per instruction.
Clock Cycles ->
Instruction #1:  F A M W
Instruction #2:    F A M W
Instruction #3:      F A M W
Instruction #4:        F A M W
Figure 1.2: Multiple Instructions in a Hypothetical Pipeline
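The effect of overlapping instructions can be made concrete with a small calculation. The sketch below is a hypothetical model of an ideal four-stage pipeline with no stalls (the function names are illustrative assumptions): the first instruction needs the full pipeline depth, and each later instruction completes one cycle after its predecessor.

```python
def pipeline_cycles(n_instructions, depth=4):
    """Cycles to complete n instructions on an ideal pipeline without stalls."""
    if n_instructions == 0:
        return 0
    # The first instruction fills the pipeline; each further one adds one cycle.
    return depth + (n_instructions - 1)

def unpipelined_cycles(n_instructions, depth=4):
    """Cycles when each instruction must finish before the next one starts."""
    return depth * n_instructions

for n in (1, 4, 100):
    piped = pipeline_cycles(n)
    print(n, unpipelined_cycles(n), piped, round(piped / n, 2))
```

For long instruction streams the cycles-per-instruction value approaches 1, which is the effective execution rate described above.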
(2) Load/Store Architecture
The discussion of the instruction pipeline illustrates how each instruction can be subdivided into several discrete parts that permit the processor to execute multiple instructions in parallel. For this technique to work efficiently, the time required to execute each instruction subpart should be approximately equal. If one part requires an excessive length of time, there is an unpleasant choice: either halting the pipeline (inserting wait or idle cycles), or making all cycles longer to accommodate this lengthier portion of the instruction.
Instructions that perform operations on operands in memory tend to increase either the cycle time or the number of cycles per instruction. Such instructions require additional time to calculate the addresses of the operands, read the required operands from memory, calculate the result, and store the result of the operation back to memory. To eliminate the negative impact of such instructions, RISC designs implement a load and store (load/store) architecture in which the processor has many registers, all operations are performed on operands held in processor registers, and main memory is accessed only by load and store instructions.
This approach produces several benefits:
Reducing the number of memory accesses eases memory bandwidth requirements.
Limiting all operations to registers helps simplify the instruction set.
Eliminating memory operations makes it easier for compilers to optimize register allocation; this further reduces memory accesses and also reduces the instructions/task factor.
All of these factors help RISC designs approach their goal of executing one instruction per cycle. However, two classes of instructions hinder achievement of this goal: load instructions and branch instructions. The following sections discuss how RISC designs overcome the obstacles raised by these classes of instructions.
(3) Delayed Load Instructions
Load instructions read operands from memory into processor registers for subsequent operation by other instructions. Because memory typically operates at much slower speeds than processor clock rates, the loaded operand is not immediately available to subsequent instructions in an instruction pipeline. The data dependency is illustrated in Figure 1.3.
Instruction 1 (Load):  F A M W
Instruction 2:           F A M W
Instruction 3:             F A M W
Instruction 4:               F A M W
Data from the load is available as an operand once instruction 1 completes its M cycle.
Figure 1.3: Data Dependency Resulting From a Load Instruction
In this illustration, the operand loaded by instruction 1 is not available for use in the A cycle (ALU, or Arithmetic/Logic Unit operation) of instruction 2. One way to handle this dependency is to delay the pipeline by inserting additional clock cycles into the execution of instruction 2 until the loaded data becomes available. This approach obviously introduces delays that would increase the cycles/instructions factor.
In many RISC designs the technique used to handle this data dependency is to recognize, and make visible to compilers, the fact that all load instructions have an inherent latency or load delay. Figure 1.3 illustrates a load delay or latency of one instruction. The instruction that immediately follows the load is in the load delay slot. If the instruction in this slot does not require the data from the load, then no pipeline delay is required.
If this load delay is made visible to software, a compiler can arrange instructions to ensure that there is no data dependency between a load instruction and the instruction in the load delay slot. The simplest way of ensuring that there is no data dependency is to insert a No Operation (NOP) instruction to fill the slot, as follows:
Load R1, A
Load R2, B
NOP               <= This instruction fills the delay slot
ADD R3, R1, R2
Although filling the delay slot with NOP instructions eliminates the need for hardware-controlled pipeline stalls in this case, it still is not a very efficient use of the pipeline stream
since these additional NOP instructions increase code size and perform no useful work. (In practice, however, this technique need not have much negative impact on performance.)
A more effective solution to handling the data dependency is to fill the load delay slot with a useful instruction. Good optimizing compilers can usually accomplish this, especially if the load delay is only one instruction. The example program below illustrates how a compiler might rearrange instructions to handle a potential data dependency.
# Consider the code for C := A+B; F := D
Load R1, A
Load R2, B
Add R3, R1, R2    <= This instruction stalls because R2 data is not available
Load R4, D
.....
# An alternative code sequence (where delay length = 1)
Load R1, A
Load R2, B
Load R4, D
Add R3, R1, R2    <= No stall since R2 data is available
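The difference between the two sequences can be counted with a simple model. The Python sketch below is an illustration only, not a model of any particular processor: it assumes one instruction issues per cycle, that a loaded register cannot be read by the next `load_delay` instructions, and that the hardware stalls until the data is ready. The tuple format and the name `count_load_stalls` are hypothetical.

```python
def count_load_stalls(program, load_delay=1):
    """Count stall cycles caused by reading a register too soon after a load.

    program: list of (op, dest, sources) tuples, executed in order.
    """
    stalls = 0
    ready_at = {}   # register -> first issue slot at which the loaded value is usable
    slot = 0
    for op, dest, sources in program:
        # Wait until every source register written by an earlier load is ready.
        needed = max((ready_at.get(reg, 0) for reg in sources), default=0)
        if needed > slot:
            stalls += needed - slot
            slot = needed
        if op == "load":
            ready_at[dest] = slot + 1 + load_delay
        slot += 1
    return stalls

original = [
    ("load", "R1", ()), ("load", "R2", ()),
    ("add", "R3", ("R1", "R2")), ("load", "R4", ()),
]
rearranged = [
    ("load", "R1", ()), ("load", "R2", ()),
    ("load", "R4", ()), ("add", "R3", ("R1", "R2")),
]
print(count_load_stalls(original), count_load_stalls(rearranged))
```

Moving the independent load of D into the delay slot removes the stall without adding a NOP, which is the rearrangement a compiler would aim for.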
(4) Delayed Branch Instructions
Branch instructions usually delay the instruction pipeline because the processor must calculate the effective destination of the branch and fetch that instruction. When a cache access requires an entire cycle and the fetched branch instruction specifies the target address, it is impossible to perform this fetch (of the destination instruction) without delaying the pipeline for at least one pipe stage (one cycle). Conditional branches can cause further delays because they require the calculation of a condition as well as the target address.
Instead of stalling the instruction pipeline to wait for the instruction at the target address, RISC designs typically use an approach similar to that used with Load instructions: Branch instructions are delayed and do not take effect until after one or more instructions immediately following the Branch instruction (the delay instructions) have been executed. Branch and delayed branch instructions are illustrated in Figure 1.4.
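[Flowchart not reproduced. Branch Instruction: if the condition is met (YES), execution continues at the Branch Target; otherwise (NO) at the Next Instruction. Delayed Branch Instruction: the Delay Instruction is executed first; execution then continues at the Branch Target if the condition was met (YES), otherwise (NO) at the Next Instruction.]
Figure 1.4: Block Diagram of Branch/Delayed Branch Instruction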
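The execution order of a delayed branch can be sketched as a tiny interpreter. This is a hypothetical illustration with one delay instruction and branches that are always taken; the program format and the function `run` are assumptions made for the example and do not describe the GMS30C2232 instruction set.

```python
def run(program, delayed=True, max_steps=20):
    """Return the labels of the instructions in the order they execute.

    program: list of (label, branch_target_index_or_None); branches are always taken.
    """
    trace = []
    pc = 0
    pending = None                  # branch target waiting for the delay instruction
    while pc < len(program) and len(trace) < max_steps:
        label, target = program[pc]
        trace.append(label)
        next_pc = pc + 1
        if pending is not None:     # the delay instruction has just executed
            next_pc = pending
            pending = None
        elif target is not None:
            if delayed:
                pending = target    # takes effect after the next instruction
            else:
                next_pc = target    # ordinary branch: takes effect immediately
        pc = next_pc
    return trace

program = [("branch", 3), ("delay_instr", None), ("skipped", None), ("target", None)]
print(run(program, delayed=False))
print(run(program, delayed=True))
```

With the ordinary branch the instruction after the branch is never executed; with the delayed branch it runs before control reaches the target, so a compiler can place useful work there.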
1.1.3 The pipeline structure of GMS30C2232
The GMS30C2232 has a two-stage pipeline, and each stage consists of two phases (TM and TV). Although the basic structure is a two-stage pipeline, some instructions lengthen it. For example, a standard ALU instruction uses five phases: the two pipeline stages (four phases) plus one additional phase. This additional phase does not use the part of the datapath needed by the next instruction, so the next instruction need not wait until the previous ALU instruction has finished. A DSP instruction takes longer than the two pipeline stages to execute and occupies datapath resources that the next DSP instruction also requires, so the next DSP instruction is delayed.
The pipeline structure of the GMS30C2232 and the actions of the datapath are described in Table 1.1.
Stage           Phase       Datapath Action
Fetch/Decode    TM (Low)    1. The instruction is read from the instruction cache according to the address of the instruction.
                TV (High)   2. The control signals of Rd (destination operand) and Rs (source operand) are activated according to the instruction that was loaded in the TM phase.
                            2.1 The control signals of IR (immediate register (operand)) and IL (instruction length) are activated.
                            2.2 The address of the next instruction is calculated and saved in the PC.
Execute/Write   TM (Low)    1. The next instruction is read from the instruction cache.
                            1.1 The addresses of Rs and Rd are determined.
                            1.2 The immediate operand is determined.
                            1.3 The operand is read from the register stack using the addresses of Rs and Rd.
                            1.4 The operands XR, YR and QR are controlled.
                TV (High)   2. The input data of the ALU is attained.
                            2.1 The control of the ALU datapath is made and the instruction is executed in the ALU.
                            2.2 The result of the ALU operation is saved in the register file.
Additional      Next TM     The additional ALU operation is continued and its result is saved in the register file.
Insertion
Table 1.1: The pipeline structure of GMS30C2232 and the action of datapath.
1.2 Global Register Set
The architecture provides 32 global registers of 32 bits each. These are:
G0        Program Counter PC
G1        Status Register SR
G2        Floating-Point Exception Register FER
G3..G15   General purpose registers
G16..G17  Reserved
G18       Stack Pointer SP
G19       Upper Stack Bound UB
G20       Bus Control Register BCR (see section 6. Bus Interface)
G21       Timer Prescaler Register TPR (see section 5. Timer and CPU Clock Modes)
G22       Timer Compare Register TCR (see section 5. Timer and CPU Clock Modes)
G23       Timer Register TR (see section 5. Timer and CPU Clock Modes)
G24       Watchdog Compare Register WCR (see section 6. Bus Interface)
G25       Input Status Register ISR (see section 6. Bus Interface)
G26       Function Control Register FCR (see section 6. Bus Interface)
G27       Memory Control Register MCR (see section 6. Bus Interface)
G28..G31  Reserved
Registers G0..G15 can be addressed directly by the register code (0..15) of an instruction. Registers G18..G27 can be addressed only by a MOV or MOVI instruction with the high global flag H set to 1.
(Example)
MOVI G2, 0x20     ; G2 := 0x20 (set H flag)
MOV G3, G19       ; G3 := G19 (G19 (UB) is copied to G3)
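The register selection can be sketched in a few lines. The description of the H flag in section 1.2.2 states that, with H set, denoting G0..G15 addresses G16..G31 instead. The Python sketch below models only that mapping; the function name and the lookup table are illustrative assumptions, and the privilege check and the automatic clearing of H are not modeled.

```python
HIGH_GLOBAL_NAMES = {
    18: "SP", 19: "UB", 20: "BCR", 21: "TPR", 22: "TCR",
    23: "TR", 24: "WCR", 25: "ISR", 26: "FCR", 27: "MCR",
}

def global_register_index(register_code, h_flag):
    """Map a 4-bit global register code (0..15) to a register number G0..G31."""
    if not 0 <= register_code <= 15:
        raise ValueError("register code must be in the range 0..15")
    # With H set, codes 0..15 select G16..G31 instead of G0..G15.
    return register_code + 16 if h_flag else register_code

index = global_register_index(3, h_flag=True)
print(f"G{index}", HIGH_GLOBAL_NAMES.get(index, "reserved"))
```

Under this mapping, register code 3 selects G3 when H is clear and G19 (UB) when H is set, and codes 2..11 cover G18..G27.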
[Register map not reproduced: 32 registers G0..G31 of 32 bits each (bits 31..0), with the assignments listed at the start of section 1.2.]
Figure 1.5: Global Register Set
1.2.1 Program Counter PC, G0
G0 is the program counter PC. It is updated to the address of the next instruction through instruction execution. Besides this implicit updating, the PC can also be addressed like a regular source or destination register. When the PC is referenced as an operand, the supplied value is the address of the first byte after the instruction that references it (the address of the next instruction), except when it is referenced by a delay instruction following a taken delayed branch. For a Delayed Branch instruction whose branch condition is met, the branch address PC + rel (relative to the address of the first byte after the Delayed Branch instruction) is placed in the PC (see section 1.26. Delayed Branch Instructions).
Placing a result in the PC has the effect of a branch taken. When branch is taken, the target address of branch is placed in PC.
Bit zero of the PC is always zero, regardless of any value placed in the PC.
1.2.2 Status Register SR, G1
G1 is the status register SR. Its content is updated by instruction execution. Besides this implicit updating, the SR can also be addressed like a regular register (when the H flag is set). When it is addressed as a source or destination operand, all 32 bits are used as an operand. However, only bits 15..0 of a result can be placed in bits 15..0 of the SR; bits 31..16 of the result are discarded and bits 31..16 of the SR remain unchanged. When the SR is addressed as a source operand, it represents the value 0x0. The full content of the SR is replaced only by the Return instruction. A result placed in the SR overrules any setting or clearing of the condition flags as a result of an instruction.
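The rule for placing a result in the SR amounts to a 16-bit mask. A minimal Python sketch follows, assuming plain 32-bit unsigned values; the function name `write_sr` is hypothetical, and the Return instruction, which replaces the full register, is not modeled.

```python
def write_sr(old_sr, result):
    """Place a result in the SR: only bits 15..0 are taken from the result."""
    upper = old_sr & 0xFFFF0000   # bits 31..16 of the SR remain unchanged
    lower = result & 0x0000FFFF   # bits 31..16 of the result are discarded
    return upper | lower

new_sr = write_sr(0xABCD0001, 0x12345678)
print(hex(new_sr))
```

The upper half, which holds fields such as the frame pointer and frame length, is therefore protected from ordinary register writes.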
[Bit-field drawing not reproduced. SR bits 31..16, fields from high to low: FP (Frame Pointer), FL (Frame Length), ILC (Instruction-Length Code), S (Supervisor State Flag), P (Trace Pending Flag), T (Trace-Mode Flag).]
Figure 1.6: Status Register SR (bits 31..16)
[Bit-field drawing not reproduced. SR bits 15..0: L (Interrupt-Lock Flag, bit 15), FRM (Floating-Point Rounding Mode, bits 14..13), FTE (Floating-Point Trap Enable, bits 12..8), I (Interrupt-Mode Flag, bit 7), Reserved (bit 6), H (High Global Flag, bit 5), M (Cache-Mode Flag, bit 4), V (Overflow Flag, bit 3), N (Negative Flag, bit 2), Z (Zero Flag, bit 1), C (Carry Flag, bit 0).]
Figure 1.7: Status Register SR (bits 15..0)
The status register SR contains the following status information:
C Carry Flag. Bit zero is the carry condition flag C. In general, when set it indicates that the unsigned integer range is exceeded (overflow). On add operations, it indicates a carry out of bit 31 of the result. On subtract operations, it indicates a borrow (inverse carry) into bit 31 of the result.
Z Zero Flag. Bit one is the zero condition flag Z. When set, it indicates that all 32 or 64 result bits are equal to zero regardless of any carry, borrow or overflow.
N Negative Flag. Bit two is the negative condition flag N. On compare instructions, it indicates the arithmetically correct (true) sign of the result regardless of an overflow. On all other instructions, it is derived from result bit 31, which is the true sign bit when no overflow occurs. In the case of overflow, result bit 31 and N reflect the inverted sign bit.
V Overflow Flag. Bit three is the overflow condition flag V. In general, when set it indicates a signed overflow. On the Move instructions, it indicates a floating-point NaN (Not a Number).
M Cache-Mode Flag. Bit four is the cache-mode flag M. Besides being set or cleared under program control, it is also automatically cleared by a Frame instruction and by any branch taken except a delayed branch. See section 1.8. Instruction Cache for details.
H High Global Flag. Bit five is the high global flag H. When H is set, denoting G0..G15 addresses G16..G31 instead. Thus, the registers G18..G27 may be addressed by denoting G2..G11 respectively. The H flag is effective only in the first cycle of the next instruction after it was set; then it is cleared automatically. Only a MOV or MOVI instruction, issued as the next instruction, may be used to copy the content of a local register or an immediate value to one of the high global registers. The MOV instruction may be used to copy the content of a high global register (except the BCR, TPR, FCR and MCR registers, which are write-only) to a local register. With all other instructions, the result may be invalid. If one of the high global registers is addressed as the destination register in user state (S = 0), the condition flags are undefined, the destination register remains unchanged and a trap to Privilege Error occurs.
Reserved Bit six is reserved for future use. It must always be zero.
I Interrupt-Mode Flag. Bit seven is the interrupt-mode flag I. It is set automatically on interrupt entry and reset to its old value by a Return instruction. The I flag is used by the operating system; it must never be changed by any user program.
FTE Floating-Point Trap Enable Flag. Bits 12..8 are the floating-point trap enable flags. They determine the exception type and trap execution flow (see section 3.33.2. Floating-Point Instructions).
FRM Floating-Point Rounding Mode. Bits 14..13 are the floating-point rounding modes (see section 3.33.2. Floating-Point Instructions).
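The bit positions listed above can be exercised with a short decoder. The Python sketch below uses only the positions stated in the list for bits 14..0. The helper `add32_flags` is a generic two's-complement model of the C, Z, N and V definitions for a 32-bit add; it is an assumption for illustration and has not been checked against the processor. All names are hypothetical.

```python
SR_FLAG_BITS = {"C": 0, "Z": 1, "N": 2, "V": 3, "M": 4, "H": 5, "I": 7}

def decode_sr_low(sr):
    """Extract the flags and fields in bits 14..0 of the status register."""
    fields = {name: (sr >> bit) & 1 for name, bit in SR_FLAG_BITS.items()}
    fields["FTE"] = (sr >> 8) & 0x1F    # bits 12..8
    fields["FRM"] = (sr >> 13) & 0x3    # bits 14..13
    return fields

def add32_flags(a, b):
    """Condition flags for a 32-bit add of two unsigned 32-bit operands."""
    total = a + b
    result = total & 0xFFFFFFFF
    carry = total >> 32                  # carry out of bit 31
    zero = int(result == 0)
    negative = result >> 31              # result bit 31
    # Signed overflow: both operands share a sign that differs from the result's.
    overflow = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, {"C": carry, "Z": zero, "N": negative, "V": overflow}

print(decode_sr_low(0x20))
print(add32_flags(0x7FFFFFFF, 1))
print(add32_flags(0xFFFFFFFF, 1))
```

In the second call the signed range is exceeded, so V is set and N shows the inverted sign bit, matching the description of N in the overflow case; in the third call the unsigned range is exceeded, so C and Z are set.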