HYNIX GMS30C2132 Datasheet

Jun 13, 2001 Ver. 3.1
16/32 BIT RISC/DSP
GMS30C2116 GMS30C2132
USER’S MANUAL
Revision 3.1 Published by
IDA Team in Hynix Semiconductor Inc.
© Hynix Semiconductor 2001. All Rights Reserved.
Hynix offices in Korea and the distributors and representatives listed in the address directory can provide additional information about this manual.
Hynix reserves the right to make changes to any information herein at any time without notice. The information, diagrams, and other data in this manual are correct and reliable; however, Hynix is in no way responsible for any violations of patents or other rights of third parties resulting from the use of this manual.
Specifications and information in this document are subject to change without notice and do not represent a commitment on the part of Hynix. Hynix reserves the right to make changes to improve functioning. Although the information in this document has been carefully reviewed, Hynix does not assume any liability arising out of the use of the product or circuit described herein.
Hynix does not authorize the use of the Hynix microprocessor in life support applications wherein a failure or malfunction of the microprocessor may directly threaten life or cause injury. The user of the Hynix microprocessor in life support applications assumes all risks of such use and indemnifies Hynix against all damages.
For further information please contact:
SEOUL OFFICE: Hynix Semiconductor, YOUNG DONG Bldg. 891, Daechi-dong, Kangnam-gu, Seoul, Korea.
PHONE: (02) 3459-3662~3 FAX: (02) 3459-3942
SYSTEM IC: 1, Hyangjeong-dong, Hungduk-gu, Cheongju, 361-725, Korea.
PHONE: (0431) 270-4030~47 FAX: (0431) 270-4075
Copyright 2001 Hynix Semiconductor Inc. Revision Jun. 29, 2001
TABLE OF CONTENTS i
Table of Contents
0. Overview
0.1 GMS30C2116/32 RISC/DSP.............................................................................. 0-1
0.2 Block Diagram.................................................................................................... 0-6
0.3 Pin Configuration................................................................................................ 0-7
0.3.1 GMS30C2132, 160-Pin MQFP-Package - View from Top Side........ 0-7
0.3.2 Pin Cross Reference by Pin Name ...................................................... 0-8
0.3.3 Pin Function ......................................................................................... 0-9
1. Architecture
1.1 Introduction...................................................................................................... 1-1
1.1.1 RISC Architecture............................................................................... 1-1
1.1.2 Techniques to reduce CPI (Cycles per Instruction)............................. 1-2
1.1.3 The pipeline structure of GMS30C2132............................................. 1-6
1.2 Global Register Set..........................................................................................1-7
1.2.1 Program Counter PC, G0 .................................................................... 1-8
1.2.2 Status Register SR, G1........................................................................ 1-9
1.2.3 Floating-Point Exception Register FER, G2..................................... 1-12
1.2.4 Stack Pointer SP, G18....................................................................... 1-13
1.2.5 Upper Stack Bound UB, G19............................................................ 1-13
1.2.6 Bus Control Register BCR, G20 ....................................................... 1-13
1.2.7 Timer Prescaler Register TPR, G21.................................................. 1-14
1.2.8 Timer Compare Register TCR, G22.................................................. 1-14
1.2.9 Timer Register TR, G23.................................................................... 1-14
1.2.10 Watchdog Compare Register WCR, G24........................................ 1-14
1.2.11 Input Status Register ISR, G25 ....................................................... 1-14
1.2.12 Function Control Register FCR, G26..............................................1-14
1.2.13 Memory Control Register MCR, G27............................................. 1-15
1.3 Local Register Set.......................................................................................... 1-15
1.4 Privilege States .............................................................................................. 1-16
1.5 Register Data Types ....................................................................................... 1-17
1.6 Memory Organization .................................................................................... 1-18
1.7 Stack............................................................................................................... 1-20
1.8 Instruction Cache...........................................................................................1-25
1.9 On-Chip Memory (IRAM)............................................................................. 1-28
2. Instructions General
2.1 Instruction Notation..........................................................................................2-1
2.2 Instruction Execution........................................................................................2-2
2.3 Instruction Formats...........................................................................................2-3
2.3.1 Table of Immediate Values..................................................................2-5
2.3.2 Table of Instruction Codes...................................................................2-6
2.3.3 Table of Extended DSP Instruction Codes ..........................................2-7
2.4 Entry Tables......................................................................................................2-8
2.5 Instruction Timing..........................................................................................2-12
3. Instruction Set
3.1 Memory Instructions ........................................................................................3-1
3.1.1 Address Modes.....................................................................................3-2
3.1.2 Load Instructions..................................................................................3-7
3.1.3 Store Instructions .................................................................................3-9
3.2 Move Word Instructions.................................................................................3-11
3.3 Move Double-Word Instruction .....................................................................3-11
3.4 Logical Instructions........................................................................................3-12
3.5 Invert Instruction ............................................................................................3-13
3.6 Mask Instruction.............................................................................................3-13
3.7 Add Instructions .............................................................................................3-14
3.8 Sum Instructions.............................................................................................3-16
3.9 Subtract Instructions.......................................................................................3-17
3.10 Negate Instructions.......................................................................................3-18
3.11 Multiply Word Instruction............................................................................3-19
3.12 Multiply Double-Word Instructions.............................................................3-19
3.13 Divide Instructions .......................................................................................3-20
3.14 Shift Left Instructions...................................................................................3-22
3.15 Shift Right Instructions.................................................................................3-23
3.16 Rotate Left Instruction..................................................................................3-24
3.17 Index Move Instructions...............................................................................3-25
3.18 Check Instructions........................................................................................3-26
3.19 No Operation Instruction..............................................................................3-26
3.20 Compare Instructions....................................................................................3-27
3.21 Compare Bit Instructions..............................................................................3-27
3.22 Test Leading Zeros Instruction.....................................................................3-28
3.23 Set Stack Address Instruction.......................................................................3-28
3.24 Set Conditional Instructions .........................................................................3-28
3.25 Branch Instructions.......................................................................................3-30
3.26 Delayed Branch Instructions ........................................................................3-31
3.27 Call Instruction ............................................................................................ 3-33
3.28 Trap Instructions.......................................................................................... 3-34
3.29 Frame Instruction......................................................................................... 3-35
3.30 Return Instruction........................................................................................ 3-37
3.31 Fetch Instruction.......................................................................................... 3-38
3.32 Extended DSP Instructions..........................................................................3-39
3.33 Software Instructions...................................................................................3-41
3.33.1 Do Instruction.................................................................................. 3-42
3.33.2 Floating-Point Instructions.............................................................. 3-43
4. Exceptions
4.1 Exception Processing....................................................................................... 4-1
4.2 Exception Types .............................................................................................. 4-2
4.2.1 Reset.................................................................................................... 4-2
4.2.2 Range, Pointer, Frame and Privilege Error ......................................... 4-2
4.2.3 Extended Overflow.............................................................................. 4-2
4.2.4 Parity Error.......................................................................................... 4-3
4.2.5 Interrupt............................................................................................... 4-3
4.2.6 Trace Exception................................................................................... 4-3
4.3 Exception Backtracking................................................................................... 4-4
5. Timer
5.1 Overview.......................................................................................................... 5-1
5.1.1 Timer Prescaler Register TPR............................................................. 5-1
5.1.2 Timer Register TR............................................................................... 5-2
5.1.3 Timer Compare Register TCR ............................................................ 5-2
6. Bus Interface
6.1 Bus Control General ........................................................................................ 6-1
6.1.1 SRAM and ROM Bus Access............................................................. 6-1
6.1.1.1 SRAM and ROM Single-Cycle Read Access....................6-2
6.1.1.2 SRAM and ROM Multi-Cycle Read Access ..................... 6-2
6.1.1.3 SRAM Single-Cycle Write Access.................................... 6-3
6.1.1.4 SRAM Multi-Cycle Write Access ...................................... 6-3
6.1.2 DRAM Bus Access ............................................................................. 6-4
6.1.2.1 DRAM Access ................................................................... 6-5
6.1.2.2 DRAM Refresh (CAS before RAS Refresh) ..................... 6-6
6.1.3 I/O Bus Access.................................................................................... 6-7
6.1.3.1 I/O Read Access................................................................. 6-7
6.1.3.2 I/O Write Access................................................................ 6-8
6.2 I/O Bus Control ................................................................................................6-9
6.3 Bus Control Register BCR .............................................................................6-10
6.4 Memory Control Register MCR.....................................................................6-13
6.4.1 Output Voltage...................................................................................6-14
6.4.2 Input Threshold..................................................................................6-14
6.4.3 Power Down.......................................................................................6-14
6.4.4 IRAM Refresh Test............................................................................6-15
6.4.5 IRAM Refresh Rate ...........................................................................6-15
6.4.6 Entry Table Map ................................................................................6-15
6.4.7 MEMx Bus Hold Break .....................................................................6-15
6.5 Input Status Register ISR ...............................................................................6-16
6.6 Function Control Register FCR......................................................................6-17
6.7 Watchdog Compare Register WCR................................................................6-19
6.8 IO3 Control Modes.........................................................................................6-19
6.8.1 IO3Standard Mode.............................................................................6-19
6.8.2 Watchdog Mode.................................................................................6-19
6.8.3 IO3Timing Mode ...............................................................................6-20
6.8.4 IO3TimerInterrupt Mode ...................................................................6-20
6.9 Bus Signals.....................................................................................................6-21
6.9.1 Bus Signals for the GMS30C2132 Processor....................................6-21
6.9.2 Bus Signals for the GMS30C2116 Processor....................................6-22
6.9.3 Bus Signal Description.......................................................................6-23
6.10 DC Characteristics........................................................................................6-27
6.11 AC Characteristics........................................................................................6-29
6.11.1 Processor Clock................................................................................6-29
6.11.2 DRAM RAS Access.........................................................................6-30
6.11.3 DRAM Fast Page Mode Access.......................................................6-31
6.11.3.1 Multi-Cycle Access.........................................................6-31
6.11.3.2 Single-Cycle Access .......................................................6-32
6.11.4 DRAM CAS-Before-RAS Refresh..................................................6-34
6.11.5 SRAM Access..................................................................................6-35
6.11.5.1 Multi-Cycle Access.........................................................6-35
6.11.5.2 Single-Cycle Access .......................................................6-37
6.11.6 I/O Access.........................................................................................6-38
7. Mechanical Data
7.1 GMS30C2132, 160-Pin MQFP-Package....................................................... 7-1
7.1.1 Pin Configuration - View from Top Side............................................ 7-1
7.1.2 Pin Cross Reference by Pin Name ...................................................... 7-2
7.1.3 Pin Cross Reference by Location........................................................ 7-3
7.2 GMS30C2132, 144-Pin TQFP-Package........................................................ 7-4
7.2.1 Pin Configuration - View from Top Side............................................ 7-4
7.2.2 Pin Cross Reference by Pin Name ...................................................... 7-5
7.2.3 Pin Cross Reference by Location........................................................ 7-6
7.3 GMS30C2116, 100-Pin TQFP-Package........................................................ 7-7
7.3.1 Pin Configuration - View from Top Side............................................ 7-7
7.3.2 Pin Cross Reference by Pin Name ...................................................... 7-8
7.3.3 Pin Cross Reference by Location........................................................ 7-9
7.4 Package-Dimensions...................................................................................... 7-10
Appendix. Instruction Set Details
Overview 0-1
0. Overview
0.1 GMS30C2116/32 RISC/DSP
The GMS30C2116 and GMS30C2132 RISC/DSP represent a new class of microprocessors: the combination of a high-performance RISC microprocessor with an additional powerful DSP instruction set and on-chip micro-controller functions. The high throughput is achieved not by raw clock speed but by a sophisticated, novel architecture that combines the advantages of RISC and DSP technology.
The speed is obtained by an optimized combination of the following features:
● The most recent stack frames are kept in a register stack, thereby reducing data memory accesses to a minimum by keeping almost all local data in registers.
● Pipelined memory access allows overlapping of memory accesses with execution.
● 4 KByte on-chip memory.
● On-chip instruction cache omits instruction fetch in inner loops and provides pre-fetch.
● Variable-length instructions of 16, 32 or 48 bits provide a large, powerful instruction set, thereby reducing the number of instructions to be executed.
● The predominantly used 16-bit instructions halve the memory bandwidth required for instruction fetch in comparison to conventional RISC architectures with fixed-length 32-bit instructions, and yield even better code economy than conventional CISC architectures.
● Regular instruction set allows hardwiring of control logic at low component count.
● Most instructions execute in one cycle.
● Pipelined DSP instructions.
● Parallel execution of ALU and DSP instructions.
● Single-cycle half word multiply-accumulate operation.
● Fast Call and Return by parameter passing via registers.
● An instruction pipeline depth of only two stages - decode/execute - provides branching without insertion of wait cycles in combination with Delayed Branch instructions.
● Range and pointer checks are performed without speed penalty; these checks therefore no longer need to be turned off, which provides higher runtime reliability.
● Separate address and data buses provide a throughput of one 32-bit word each cycle.
The features noted above help reduce the number of idle wait cycles to a bare minimum. The processor is designed to sustain its execution rate with a standard DRAM memory.
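The effect of instruction length on fetch bandwidth can be estimated with simple arithmetic. The sketch below is illustrative only: the instruction mix is a hypothetical assumption rather than a figure from this manual, the function name `fetch_bytes` is invented for the example, and the comparison assumes a fixed-length machine would need the same number of instructions.

```python
# Bytes fetched for a program, given how many instructions of each length it has.
# The mix below is hypothetical and only illustrates the arithmetic.

def fetch_bytes(mix):
    """mix maps instruction length in bits (16, 32 or 48) to an instruction count."""
    for bits in mix:
        if bits not in (16, 32, 48):
            raise ValueError(f"unsupported instruction length: {bits} bits")
    return sum(bits // 8 * count for bits, count in mix.items())

hypothetical_mix = {16: 80, 32: 15, 48: 5}          # 100 instructions in total
variable_length = fetch_bytes(hypothetical_mix)     # 80*2 + 15*4 + 5*6 bytes
fixed_32_bit = 4 * sum(hypothetical_mix.values())   # same count, 4 bytes each

print(variable_length, fixed_32_bit, variable_length / fixed_32_bit)
```

A program consisting only of 16-bit instructions would need exactly half the fetch bytes of the fixed 32-bit case; any 32-bit or 48-bit instructions in the mix move the ratio above one half.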
The low power consumption is an advantage for mobile (portable) applications or in temperature-sensitive environments.
In the current version, the GMS30C2116 and GMS30C2132 RISC/DSP are implemented in a 0.6 µm CMOS process. The GMS30C2116 and GMS30C2132 RISC/DSP are based on the hyperstone architecture.
Most of the transistors are used for the on-chip memory, the instruction cache, the register stack and the multiplier, whereas only a small number is required for the control logic.
Due to their low system cost, the GMS30C2116 and GMS30C2132 RISC/DSP are very well suited for embedded-systems applications requiring high performance at lowest cost. To simplify board design as well as to reduce system costs, the GMS30C2116 and GMS30C2132 already come with integrated peripherals, such as a timer and memory and bus control logic. Therefore, complete systems with the Hynix microprocessor can be implemented with a minimum of external components. To connect any kind of memory or I/O, no glue logic is necessary. The processor is even suitable for systems where, up to now, microprocessors with a 16-bit architecture have been used for cost reasons. Its improved performance compared to conventional micro-controllers can be used to substitute many external peripherals, such as graphics controllers or DSPs, in software.
The software development tools include an optimizing C compiler, an assembler and a source-level debugger with profiler, as well as a real-time kernel with an extremely fast response time. Using this real-time kernel, up to 31 tasks, each with its own virtual timer, can be developed independently of each other. The synchronization of these tasks is handled almost automatically by the real-time kernel. To the developer, it appears as if up to 31 Hynix microprocessors were available, to which programs can be allocated accordingly. Real-time debugging of multiple tasks is also supported.
The following description gives a brief architectural overview:
Registers:
● 32 global and 64 local registers of 32 bits each
● 16 global and up to 16 local registers are addressable directly
Flags:
● Zero (Z), negative (N), carry (C) and overflow (V) flags
● Interrupt-mode, interrupt-lock, trace-mode, trace-pending, supervisor state, cache-mode and high global flag
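As an illustration of the four condition flags, the sketch below computes Z, N, C and V for a 32-bit addition using the conventional definitions of these flags. It is a model under that assumption only: the function name `add_with_flags` is invented here, and which instructions actually update which flags is defined in the instruction set chapters, not by this example.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000

def add_with_flags(a, b):
    """Add two 32-bit values; return (result, flags) using conventional flag rules."""
    a &= MASK32
    b &= MASK32
    full = a + b                      # may need 33 bits
    result = full & MASK32            # what a 32-bit register would hold
    flags = {
        "Z": result == 0,                                   # result is zero
        "N": (result & SIGN32) != 0,                        # sign bit of result
        "C": full > MASK32,                                 # carry out of bit 31
        "V": ((a ^ result) & (b ^ result) & SIGN32) != 0,   # signed overflow
    }
    return result, flags

print(add_with_flags(0x7FFFFFFF, 1))   # largest positive value plus one
print(add_with_flags(0xFFFFFFFF, 1))   # wraps around to zero
```

The first call overflows as a signed addition (V set, N set) but produces no carry; the second produces a carry and a zero result without signed overflow.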
Register Data Types:
● Unsigned integer, signed integer, signed short, signed complex short, 16-bit fixed-point, bit-string, IEEE-754 floating-point, each either 32 or 64 bits
External Memory:
● Address space of 4 Gbytes, divided into five areas
● Separate I/O address space
● Load/Store architecture
● Pipelined memory and I/O accesses
● High-order data located and addressed at lower address (big endian)
● Instructions and double-word data may cross DRAM page boundaries
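The big-endian rule stated above (high-order data at the lower address) can be visualized with Python's standard `struct` module. This is a host-side sketch of the byte layout only; the sample values and the helper name `load_word` are assumptions made for the example.

```python
import struct

# Lay out a 32-bit word and a 64-bit double-word as they would appear in
# big-endian memory: the high-order byte sits at the lowest address.
word = 0x12345678
memory = struct.pack(">I", word)                   # ">" selects big-endian order
print([hex(byte) for byte in memory])              # high-order byte 0x12 first

double_word = 0x0011223344556677
memory64 = struct.pack(">Q", double_word)
high_word, low_word = struct.unpack(">II", memory64)
print(hex(high_word), hex(low_word))               # high-order word at lower address

def load_word(image, address):
    """Read a big-endian 32-bit word from a bytes object (hypothetical helper)."""
    return struct.unpack_from(">I", image, address)[0]

print(hex(load_word(memory64, 4)))                 # low-order word at offset 4
```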
On-chip Memory:
● 4 KByte internal (on-chip) memory
Memory Data Types:
● Unsigned and signed byte (8 bit)
● Unsigned and signed half word (16 bit), located on half word boundary
● Undedicated word (32 bit), located on word boundary
● Undedicated double-word (64 bit), located on word boundary
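The alignment rules in this list can be expressed as a small lookup. The sketch below assumes byte addressing; the table and function names are invented for the example. Note that, per the list above, a double-word only needs word (4-byte) alignment.

```python
# Required address alignment in bytes for each memory data type listed above.
ALIGNMENT = {
    "byte": 1,
    "half word": 2,
    "word": 4,
    "double-word": 4,   # 64 bits wide, but located on a word boundary
}

def is_aligned(data_type, address):
    """Return True if a byte address satisfies the boundary rule for data_type."""
    return address % ALIGNMENT[data_type] == 0

print(is_aligned("half word", 0x1002))    # True: half word boundary
print(is_aligned("word", 0x1002))         # False: not a multiple of 4
print(is_aligned("double-word", 0x1004))  # True: word boundary is sufficient
```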
Runtime Stack:
● Runtime stack is divided into memory part and register part
● Register part is implemented by the 64 local registers holding the most recent stack frame(s)
● Current stack frame (maximum 16 registers) is always kept in register part of the stack
● Data transfer between memory and register part of the stack is automatic
● Upper stack bound is guarded
Instruction Cache:
● An on-chip instruction cache reduces instruction memory access substantially
Instructions General:
● Variable-length instructions of one, two or three half words halve required memory bandwidth
● Pipeline depth of only two stages assures immediate refill after branches
● Register instructions of type "source operator destination → destination" or "source operator immediate → destination"
● All register bits participate in an operation
● Immediate operands of 5, 16 and 32 bits, zero- or sign-expanded
● Large address displacement of up to 28 bits
● Two sets of signed arithmetical instructions: instructions either only set or clear the overflow flag, or additionally trap to a Range Error routine on overflow
● DSP instructions operate on 16-bit integer, real and complex fixed-point data and 32-bit integer data into 32-bit and 64-bit hardware accumulators
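Zero- and sign-expansion of a short immediate to register width can be sketched as follows. The helper names are invented for this example, and the instruction encodings that select between the two expansions are described in the instruction format sections rather than here.

```python
def zero_expand(value, bits):
    """Keep the low `bits` bits; all higher bits become zero."""
    return value & ((1 << bits) - 1)

def sign_expand(value, bits):
    """Interpret the low `bits` bits as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def as_register(value):
    """Show a signed Python integer as the 32-bit pattern a register would hold."""
    return value & 0xFFFFFFFF

print(zero_expand(0x1F, 5))                  # 31
print(sign_expand(0x1F, 5))                  # -1
print(hex(as_register(sign_expand(0x8000, 16))))
```

The same 5-bit pattern 0x1F therefore yields 31 when zero-expanded and -1 (register pattern 0xFFFFFFFF) when sign-expanded.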
Instruction Summary:
● Memory instructions pipelined to a depth of two stages, trap on address register equal to zero (check for invalid pointers)
● Memory address modes: register address, register post-increment, register + displacement (including PC relative), register post-increment by displacement (next address), absolute, stack address, I/O absolute and I/O displacement
● Load, all data types, bytes and half words right adjusted and zero- or sign-expanded, execution proceeds after Load until data is needed
● Store, all data types, trap when range of signed byte or half word is exceeded
● Move, Move immediate, Move double-word
● Logical instructions AND, AND not, OR, XOR, NOT, AND not immediate, OR immediate, XOR immediate
● Mask source and immediate → destination
● Add unsigned/signed, Add signed with trap on overflow, Add with carry
● Add unsigned/signed immediate, Add signed immediate with trap on overflow
● Sum source + immediate → destination, unsigned/signed and signed with trap on overflow
● Subtract unsigned/signed, Subtract signed with trap on overflow, Subtract with carry
● Negate unsigned/signed, Negate signed with trap on overflow
● Multiply word × word → low-order word unsigned or signed, Multiply word × word → double-word unsigned and signed
● Divide double-word by word → quotient and remainder, unsigned and signed
● Shift left unsigned/signed, single and double-word, by constant and by content of register, Shift left signed by constant with trap on loss of high-order bits
● Shift right unsigned and signed, single and double-word, by constant and by content of register
● Rotate left single word by content of register
● Index Move, move an index value scaled by 1, 2, 4 or 8, optionally with bounds check
● Check a value for an upper bound specified in a register or check for zero
● Compare unsigned/signed, Compare unsigned/signed immediate
● Compare bits, Compare bits immediate, Compare any byte zero
● Test number of leading zeros
● Set Conditional, save conditions in a register
● Branch unconditional and conditional (12 conditions)
● Delayed Branch unconditional and conditional (12 conditions)
● Call subprogram, unconditional and on overflow
● Trap to supervisor subprogram, unconditional and conditional (11 conditions)
● Frame, structure a new stack frame, include parameters in frame addressing, set frame length, restore reserve frame length and check for upper stack bound
● Return from subprogram, restore program counter, status register and return-frame
● Software instructions, call an associated subprogram and pass a source operand and the address of a destination operand to it
● DSP Multiply instructions:
  signed and/or unsigned multiplication → single and double word product
● DSP Multiply-Accumulate instructions:
  signed multiply-add and multiply-subtract → single and double word product sum and difference
● DSP Half word Multiply-Accumulate instructions:
  signed multiply-add operating on four half word operands → single and double word product sum
● DSP Complex Half word Multiply instruction:
  signed complex half word multiplication → real and imaginary single word product
● DSP Complex Half word Multiply-Accumulate instruction:
  signed complex half word multiply-add → real and imaginary single word product sum
● DSP Add and Subtract instructions:
  signed half word add and subtract with and without fixed-point adjustment → single word sum and difference
● Floating-point instructions are architecturally fully integrated; in the present version they are executed as Software instructions. Floating-point Add, Subtract, Multiply, Divide, Compare and Compare unordered for single and double precision, and Convert single ↔ double are provided.
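The arithmetic behind a complex half word multiply and multiply-accumulate can be sketched with ordinary integer math. This models only the mathematics of a complex product on signed 16-bit operands: the operand packing in registers, the accumulator behavior on overflow and any fixed-point adjustment are not taken from this overview and are deliberately left out. All function names are hypothetical.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed half word."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def complex_halfword_multiply(a, b):
    """a and b are (real, imaginary) pairs of half words; returns (real, imag)."""
    a_re, a_im = (to_signed16(x) for x in a)
    b_re, b_im = (to_signed16(x) for x in b)
    real = a_re * b_re - a_im * b_im     # real part of the product
    imag = a_re * b_im + a_im * b_re     # imaginary part of the product
    return real, imag

def complex_halfword_mac(acc, a, b):
    """Add the complex product of a and b to the (real, imag) accumulator pair."""
    real, imag = complex_halfword_multiply(a, b)
    return acc[0] + real, acc[1] + imag

# (3 + 4j) * (1 - 2j), with -2 given as the 16-bit pattern 0xFFFE
print(complex_halfword_multiply((3, 4), (1, 0xFFFE)))
print(complex_halfword_mac((1, 1), (3, 4), (1, 0xFFFE)))
```

Python integers do not overflow, so results that would exceed a single word on the real hardware are not flagged by this model.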
Exceptions:
● Pointer, Privilege, Frame and Range Error, Extended Overflow, Parity Error, Interrupt and Trace mode exception
● Watchdog function
● Error-causing instructions can be identified by backtracking, thus allowing a very detailed error analysis
Timer:
● Two multifunctional timers
Bus Interface:
● Separate address bus of 26 (GMS30C2132) or 22 (GMS30C2116) bits and data bus of up to 32 (GMS30C2132) or 16 bits (GMS30C2116) provide a throughput of four or two bytes at each clock cycle
● Data bus width of 32, 16 or 8 bits, individually selectable for each external memory area
● Up to seven vectored interrupts
● Configurable I/O pins
● Internal generation of all memory and I/O control signals
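The bus figures above translate into peak numbers as follows. This is a back-of-the-envelope sketch: the 40 MHz clock is a hypothetical value chosen only to make the arithmetic concrete (actual clock limits are given under AC Characteristics), the peak rate assumes one transfer per clock cycle with no wait cycles, and how the address lines map onto the memory areas is not modeled. The function names are invented for the example.

```python
def peak_bytes_per_second(data_bus_bits, clock_hz):
    """Peak transfer rate with one full-width bus transfer per clock cycle."""
    return data_bus_bits // 8 * clock_hz

def address_count(address_bus_bits):
    """Number of distinct values the address lines alone can encode."""
    return 1 << address_bus_bits

HYPOTHETICAL_CLOCK_HZ = 40_000_000   # assumption for illustration only

print(peak_bytes_per_second(32, HYPOTHETICAL_CLOCK_HZ))  # GMS30C2132, 32-bit bus
print(peak_bytes_per_second(16, HYPOTHETICAL_CLOCK_HZ))  # GMS30C2116, 16-bit bus
print(address_count(26), address_count(22))
```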
0.2 Block Diagram
[Block diagram; blocks shown: register set (64 local, 26 global), ALU, barrel shifter, X/Y decode, DSP execution unit with hardware multiplier, instruction cache and cache control, instruction decode, instruction control unit, instruction prefetch control unit, load decode, store data pipeline, memory address pipeline, bus pipeline control, 4 kByte RAM, bus interface control unit, timer, watchdog, interrupt control, power-down and reset control; external connections: data bus, address bus, bus parity and control bus]
Figure 0.1: Block Diagram
0.3 Pin Configuration
0.3.1 GMS30C2132, 160-Pin MQFP-Package - View from Top Side
[Pin-out drawing of the 160-pin MQFP package, viewed from the top side; the pin assignments are listed in the cross reference in section 0.3.2]
Figure 0.2: GMS30C2132, 160-Pin MQFP-Package
0.3.2 Pin Cross Reference by Pin Name
Signal Location Signal Location Signal Location Signal Location
A0...................97 D5......................57 GND..................65 NC...................124
A1...................98 D6......................51 GND..................68 NC...................157
A2...................99 D7......................48 GND..................73 NC...................158
A3.................100 D8......................47 GND..................79 OE#.................113
A4.................137 D9......................45 GND..................82 RAS# ................11
A5.................138 D10....................36 GND..................90 RESET#............74
A6.................139 D11....................35 GND..................96 RQST................89
A7.................141 D12....................34 GND................108 VCC ....................1
A8.................142 D13....................33 GND................119 VCC ..................13
A9...................20 D14....................31 GND................122 VCC ..................24
A10.................21 D15....................30 GND................126 VCC ..................32
A11.................22 D16..................103 GND................130 VCC ..................40
A12.................23 D17..................102 GND................136 VCC ..................41
A13...............127 D18..................101 GND................145 VCC ..................49
A14...............131 D19....................69 GND................148 VCC ..................53
A15...............150 D20....................67 GND................153 VCC ..................60
A16...............151 D21....................66 GND................159 VCC ..................64
A17...............154 D22....................55 GRANT#........... 75 VCC ..................72
A18...............155 D23....................54 INT1.................. 85 VCC ..................76
A19.................12 D24....................52 INT2.................. 86 VCC ..................80
A20.................14 D25....................29 INT3.................. 87 VCC ..................81
A21.................15 D26....................27 INT4.................. 88 VCC ..................93
A22...............143 D27....................26 IO1.................... 91 VCC ................104
A23...............146 D28....................25 IO2.................. 105 VCC ................112
A24...............147 D29....................19 IO3...................... 5 VCC ................120
A25...............149 D30....................18 IORD#............. 114 VCC ................121
ACT..............128 D31....................17 IOWR#................ 6 VCC ................133
CAS0#..........132 DP0...................94 NC ...................... 3 VCC ................140
CAS1#..........109 DP1...................95 NC ...................... 4 VCC ................156
CAS2#..........110 DP2...................70 NC .................... 37 VCC ................160
CAS3#..........111 DP3...................71 NC .................... 38 VCC ................129
CLKOUT.........92 GND ....................2 NC .................... 43 VCC ................144
CS1# ................9 GND ..................10 NC ....................44 VCC ................152
CS2# ................8 GND ..................16 NC ....................77 WE#................125
CS3# ................7 GND ..................28 NC ....................78 WE0#..............135
D0...................63 GND ..................39 NC....................83 WE1#..............134
D1...................62 GND ..................42 NC....................84 WE2#..............115
D2...................61 GND ..................46 NC..................117 WE3#..............116
D3...................59 GND ..................50 NC..................118 XTAL1/CLKIN.107
D4...................58 GND ..................56 NC..................123 XTAL2.............106
0.3.3 Pin Function
Type            Name            State   Use
Power           VCC             I       Power. Connected to the power supply. A 5.0V or 3.3V power supply can be selected.
                GND             I       Ground. Connected to the system ground. All GND pins must be connected to the system ground.
Clock           XTAL1           I       Input for Quartz Clock. When an external clock generator generates the clock, XTAL1 is used as clock input.
                XTAL2           O       Output for Quartz Clock.
                CLKOUT          O       Clock Signal Output. It can be used to supply a clock signal to peripheral devices.
Address Bus     A25..A0         O/Z     Address Bus. With the GMS30C2132, only A22..A0 are connected to the address bus pins.
Data Bus        D31..D0         I/O     Data Bus. 32-bit bi-directional data bus.
                DP0..DP3        I/O     Data Parity Signal. Bi-directional parity signals.
Bus Control     RAS#            O/Z     Row Address Strobe. RAS# is activated when the processor accesses a DRAM or performs a refresh cycle. When an SRAM is placed in MEM0, RAS# is used as the chip select signal.
                CAS0#..CAS3#    O/Z     Column Address Strobe. They are only used by a DRAM for column access cycles and for "CAS before RAS" refresh.
                WE#             O/Z     Write Enable. Active low indicates a write access, active high indicates a read access.
                CS1#..CS3#      O/Z     Chip Select. Active low of CS1#..CS3# indicates chip select for the memory areas MEM1..MEM3.
                WE0#..WE3#      O/Z     SRAM Write Enable. Active low indicates write enable for the corresponding byte.
                OE#             O/Z     Output Enable for SRAMs and EPROMs.
                IORD#           O/Z     I/O Read Strobe, optionally I/O Data Strobe. The use of IORD# is specified in the I/O address bit 10.
                IOWR#           O/Z     I/O Write Strobe.
                RQST            O       RQST signals the request for a memory or I/O access.
                GRANT#          I       Bus Grant. GRANT# is signaled low by a bus arbiter to grant access to the bus for memory and I/O cycles.
                ACT             O       Active as bus master. ACT is signaled high when GRANT# is low and it is kept high during a current bus access.
Interrupt       INT1..INT4      I       Interrupt Request. A signal on the INT1..INT4 interrupt request pins causes an interrupt exception when the interrupt lock flag L is clear and the corresponding INTxMask bit in FCR is not set.
I/O Port        IO1..IO3        I/O     General Input-Output Port. IO1..IO3 can be individually configured via the IOxDirection bits in the FCR as either input or output pins (port).
System Control  RESET#          I       Reset Processor. RESET# low resets the processor to the initial state and halts all activity. RESET# must be low for at least two cycles.
1. Architecture
1.1 Introduction
1.1.1 RISC Architecture
In the early days of computer history, most computer families started with an instruction set which was rather simple. The main reason for being simple then was the high cost for hardware. The hardware cost has dropped and the software cost has gone up steadily in the past three decades.
The net result is that more and more functions have been built into the hardware, making the instruction set very large and very complex. The growth of instruction sets was also encouraged by the popularity of microprogrammed control in the 1960s and 1970s. Even user-defined instruction sets were implemented using microcodes in some processors for special-purpose applications.
The evolution of computer architectures has been dominated by families of increasingly complex processors. Under market pressures to preserve existing software, Complex Instruction Set Computer (CISC) architectures evolved by the gradual addition of microcode and increasingly elaborate operations. The intent was to supply more support for high-level languages and operating systems, as semiconductor advances made it possible to fabricate more complex integrated circuits. It seemed self-evident that architectures should become more complex as these technological advances made it possible to hold more complexity on VLSI devices.
In recent years, however, Reduced Instruction Set Computer (RISC) architectures have implemented a much more sophisticated handling of the complex interaction between hardware, firmware and software. RISC concepts emerged from statistical analysis of how software actually uses the resources of a processor. Dynamic measurements of system kernels and object modules generated by optimizing compilers show an overwhelming predominance of the simplest instructions, even in the code for CISC machines. Complex instructions are often ignored because a single way of performing a complex operation rarely fits the needs of high-level language and system environments. RISC designs eliminate the microcoded routines and turn the low-level control of the machine over to software.
This approach is not new. But its application is more universal in recent years thanks to the prevalence of high-level languages, the development of compilers that can optimize at the microcode level, and dramatic advances in semiconductor memory and packaging. It is now feasible to replace machine microcode ROM with faster RAM, organized as an instruction cache. Machine control then resides in the instruction cache and is, in fact, customized on the fly. The instruction stream generated by system- and compiler-generated code provides a precise fit between the requirements of high-level software and the capabilities of the hardware. So compilers are playing a vital role in RISC performance.
The advantages of the RISC architecture are as follows:
- Simplicity made VLSI implementation possible and thus higher clock rates.
- Hardwired control and separate data and program caches lower the average CPI (Cycles per Instruction) significantly.
- The dynamic instruction count of a RISC program increases only slightly (by a factor of less than 2) for an ordinary program.
- Recently, the MIPS (Million Instructions per Second) rate of a typical RISC microprocessor increased by a factor of 5/(2*0.1) = 25 over that of a typical CISC microprocessor.
- The clock rate increased from 10 MHz on a CISC processor to 50 MHz on a CMOS/RISC microprocessor.
- The instruction count of a typical RISC program increased less than 2 times from that of a typical CISC program.
- The average CPI for a RISC microprocessor decreased to 1.2 (instead of 12 as in a typical CISC processor).
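The factor of 25 quoted in this list combines the three ratios given in the other items: clock rate, instruction count and CPI. The following minimal sketch reproduces that arithmetic; the function name and parameter names are illustrative only and do not come from this manual.

```python
def risc_speedup(clock_ratio, instruction_count_ratio, cpi_ratio):
    """Combine the three RISC/CISC ratios in the way the manual does.

    clock_ratio:             RISC clock rate / CISC clock rate
    instruction_count_ratio: RISC instruction count / CISC instruction count
    cpi_ratio:               RISC average CPI / CISC average CPI
    """
    return clock_ratio / (instruction_count_ratio * cpi_ratio)


# Figures taken from the list above.
clock_ratio = 50 / 10   # 50 MHz versus 10 MHz
count_ratio = 2         # upper bound on the instruction count growth
cpi_ratio = 1.2 / 12    # CPI of 1.2 versus 12

speedup = risc_speedup(clock_ratio, count_ratio, cpi_ratio)
print(f"speedup factor: {speedup:.1f}")
```

Because the instruction count ratio is an upper bound ("less than 2"), the resulting factor is best read as a lower bound for these example figures.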
1.1.2 Techniques to reduce CPI (Cycles per Instruction)
If the work each instruction performs is simple and straightforward, the time required to execute each instruction can be shortened and the number of cycles reduced. The goal of RISC designs has been to achieve an execution rate of one instruction per machine cycle (multiple-instruction-issue designs now seek to increase this rate to more than one instruction per cycle). Techniques that help achieve this goal include:
- Instruction pipelines
- Load and store (load/store) architecture
- Delayed load instructions
- Delayed branch instructions
(1) Instruction Pipelines
One way to reduce the number of cycles required to execute an instruction is to overlap the execution of multiple instructions. Instruction pipelines divide the execution of each instruction into several discrete portions and then execute multiple instructions simultaneously. The instruction pipeline technique can be likened to an assembly line: the instruction progresses from one specialized stage to the next until it is complete (or issued), just as an automobile moves along an assembly line. (This is in contrast to the nonpipelined, microcode approach, where all the work is done by one general unit that is less capable at each individual task.) For example, the execution of an instruction might be subdivided into four portions, or clock cycles, as shown in Figure 1.1:
Cycle #1   Fetch Instruction (F)
Cycle #2   ALU Operation (A)
Cycle #3   Access Memory (M)
Cycle #4   Write Results (W)
Figure 1.1: Functional Division of a Hypothetical Pipeline
An instruction pipeline can potentially reduce the number of cycles per instruction by a factor equal to the depth of the pipeline (the depth of the pipeline = the number of stages). For example, in Figure 1.2 each instruction still requires a total of four clock cycles to execute. However, if a four-level instruction pipeline is used, a new instruction can be initiated at each clock cycle and the effective execution rate is one cycle per instruction.
Clock Cycles      1  2  3  4  5  6  7
Instruction #1    F  A  M  W
Instruction #2       F  A  M  W
Instruction #3          F  A  M  W
Instruction #4             F  A  M  W
Figure 1.2: Multiple Instructions in a Hypothetical Pipeline
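The overlap shown in Figure 1.2 can be sketched in a few lines. This is only an illustration of the hypothetical four-stage pipeline (F, A, M, W) described above, assuming no stalls; the function names are invented for the example.

```python
def pipeline_schedule(num_instructions, stages="FAMW"):
    """Return one text row per instruction.

    Character c of a row is the stage that instruction occupies in
    clock cycle c + 1, or a space if it is not in the pipeline then.
    Instruction i enters the pipeline one cycle after instruction i - 1.
    """
    total = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = [" "] * total
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append("".join(row))
    return rows


def total_cycles(num_instructions, depth=4):
    """Cycles needed to complete all instructions without stalls."""
    return depth + num_instructions - 1


for row in pipeline_schedule(4):
    print(row)

# For long instruction streams the cost approaches one cycle per instruction.
print(total_cycles(1000) / 1000)
```

Each instruction still takes four cycles from fetch to write-back, but the pipeline fill time of three cycles is paid only once.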
(2) Load/Store Architecture
The discussion of the instruction pipeline illustrates how each instruction can be subdivided into several discrete parts that permit the processor to execute multiple instructions in parallel. For this technique to work efficiently, the time required to execute each instruction subpart should be approximately equal. If one part requires an excessive length of time, there is an unpleasant choice: either halting the pipeline (inserting wait or idle cycles), or making all cycles longer to accommodate this lengthier portion of the instruction.
Instructions that perform operations on operands in memory tend to increase either the cycle time or the number of cycles per instruction. Such instructions require additional time to calculate the addresses of the operands, read the required operands from memory, calculate the result, and store the result of the operation back to memory. To eliminate the negative impact of such instructions, RISC designs implement a load and store (load/store) architecture in which the processor has many registers, all operations are performed on operands held in processor registers, and main memory is accessed only by load and store instructions.
This approach produces several benefits:
- Reducing the number of memory accesses eases memory bandwidth requirements.
- Limiting all operations to registers helps simplify the instruction set.
- Eliminating memory operations makes it easier for compilers to optimize register allocation; this further reduces memory accesses and also reduces the instructions/task factor.
All of these factors help RISC designs approach their goal of executing one instruction per cycle. However, two classes of instructions hinder achievement of this goal: load instructions and branch instructions. The following sections discuss how RISC designs overcome the obstacles raised by these classes of instructions.
(3) Delayed Load Instructions
Load instructions read operands from memory into processor registers for subsequent operation by other instructions. Because memory typically operates at much slower speeds than processor clock rates, the loaded operand is not immediately available to subsequent instructions in an instruction pipeline. The data dependency is illustrated in Figure 1.3.
Load Instruction 1    F  A  M  W
Instruction 2            F  A  M  W
Instruction 3               F  A  M  W
Instruction 4                  F  A  M  W
(Figure annotation: data from load available as operand.)
Figure 1.3: Data Dependency Resulting From a Load Instruction
In this illustration, the operand loaded by instruction 1 is not available for use in the A cycle (ALU, or Arithmetic/Logic Unit operation) of instruction 2. One way to handle this dependency is to delay the pipeline by inserting additional clock cycles into the execution of instruction 2 until the loaded data becomes available. This approach obviously introduces delays that would increase the cycles/instruction factor.
In many RISC designs the technique used to handle this data dependency is to recognize and make visible to compilers the fact that all load instructions have an inherent latency or load delay. Figure 1.3 illustrates a load delay or latency of one instruction. The instruction that immediately follows the load is in the load delay slot. If the instruction in this slot does not require the data from the load, then no pipeline delay is required.
If this load delay is made visible to software, a compiler can arrange instructions to ensure that there is no data dependency between a load instruction and the instruction in the load delay slot. The simplest way of ensuring that there is no data dependency is to insert a No Operation (NOP) instruction to fill the slot, as follows:
    Load R1, A
    Load R2, B
    NOP             <= This instruction fills the delay slot
    ADD R3, R1, R2
Although filling the delay slot with NOP instructions eliminates the need for hardware-controlled pipeline stalls in this case, it still is not a very efficient use of the pipeline stream since these additional NOP instructions increase code size and perform no useful work. (In practice, however, this technique need not have much negative impact on performance.)
A more effective solution to handling the data dependency is to fill the load delay slot with a useful instruction. Good optimizing compilers can usually accomplish this, especially if the load delay is only one instruction. The example program below illustrates how a compiler might rearrange instructions to handle a potential data dependency.
    # Consider the code for C := A+B; F := D
    Load R1, A
    Load R2, B
    Add R3, R1, R2  <= This instruction stalls because R2 data is not available
    Load R4, D
    ....

    # An alternative code sequence (where delay length = 1)
    Load R1, A
    Load R2, B
    Load R4, D
    Add R3, R1, R2  <= No stall since R2 data is available
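The effect of the reordering can be checked with a small model of the load delay. The sketch below is a simplification made for illustration, not a description of any real pipeline: it assumes one instruction is issued per cycle, that a loaded register becomes usable `load_delay` instructions after the load, and that every other result is usable by the next instruction. The tuple encoding of instructions is hypothetical.

```python
def count_stalls(program, load_delay=1):
    """Count stall cycles caused by load-use dependencies.

    program: list of (op, dest, sources) tuples, where op is "load",
             "add" or "nop", dest is a register name or None, and
             sources is a tuple of register names read by the instruction.
    """
    ready_at = {}   # register -> first issue slot in which it may be read
    slot = 0        # issue slot of the current instruction
    stalls = 0
    for op, dest, sources in program:
        earliest = max((ready_at.get(r, 0) for r in sources), default=0)
        if earliest > slot:
            stalls += earliest - slot   # wait for the loaded operand
            slot = earliest
        if dest is not None:
            extra = load_delay if op == "load" else 0
            ready_at[dest] = slot + 1 + extra
        slot += 1
    return stalls


original = [
    ("load", "R1", ()),
    ("load", "R2", ()),
    ("add", "R3", ("R1", "R2")),   # needs R2 one slot too early
    ("load", "R4", ()),
]
reordered = [
    ("load", "R1", ()),
    ("load", "R2", ()),
    ("load", "R4", ()),            # useful work in the delay slot
    ("add", "R3", ("R1", "R2")),
]
with_nop = [
    ("load", "R1", ()),
    ("load", "R2", ()),
    ("nop", None, ()),             # NOP in the delay slot
    ("add", "R3", ("R1", "R2")),
]

print(count_stalls(original), count_stalls(reordered), count_stalls(with_nop))
```

Both the reordered sequence and the NOP sequence avoid the stall, but only the reordered one does so without an instruction that performs no useful work.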
(4) Delayed Branch Instructions
Branch instructions usually delay the instruction pipeline because the processor must calculate the effective destination of the branch and fetch that instruction. When a cache access requires an entire cycle, and the fetched branch instruction specifies the target address, it is impossible to perform this fetch (of the destination instruction) without delaying the pipeline for at least one pipe stage (one cycle). Conditional branches can cause further delays because they require the calculation of a condition, as well as the target address.
Instead of stalling the instruction pipeline to wait for the instruction at the target address, RISC designs typically use an approach similar to that used with Load instructions: Branch instructions are delayed and do not take effect until after one or more instructions immediately following the Branch instruction (the delay instructions) have been executed. Branch and delayed branch instructions are illustrated in Figure 1.4.
Branch Instruction:
    Condition?   YES -> Branch Target
                 NO  -> Next Instruction
Delayed Branch Instruction:
    Condition?   YES -> Delay Instruction -> Branch Target
                 NO  -> Next Instruction
Figure 1.4: Block Diagram of Branch/Delayed Branch Instruction
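The two flows of Figure 1.4 can be compared with a toy interpreter. This is a sketch under simplifying assumptions: a single delay instruction, branch outcomes fixed in advance, and an invented tuple encoding of instructions. None of the names below come from the processor's instruction set.

```python
def trace(program, delayed):
    """Return the names of the instructions in the order they execute.

    program: list of ("op", name) or ("br", taken, target_index) tuples.
    delayed: if True, a taken branch first executes the instruction that
             immediately follows it (the delay instruction).
    """
    executed = []
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "op":
            executed.append(instr[1])
            pc += 1
            continue
        _, taken, target = instr
        executed.append("branch")
        if not taken:
            pc += 1                            # fall through to the next instruction
        elif delayed:
            executed.append(program[pc + 1][1])  # delay instruction runs first
            pc = target
        else:
            pc = target
    return executed


program = [
    ("op", "i0"),
    ("br", True, 4),      # taken branch to index 4
    ("op", "delay"),
    ("op", "skipped"),
    ("op", "target"),
]

print(trace(program, delayed=False))
print(trace(program, delayed=True))
```

With the delayed form, the slot after the branch does useful work while the target instruction is being fetched, which is the same idea as filling the load delay slot.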
1.1.3 The pipeline structure of GMS30C2132
The GMS30C2132 has a two-stage pipeline structure, and each stage is composed of two phases (TM and TV). The basic structure is a two-stage pipeline, but it is lengthened when an instruction requires it. For example, a standard ALU instruction uses 5 phases (the 2-stage pipeline (4 phases) + 1 additional phase). This additional phase does not use the part of the datapath that is used by the next instruction, so execution of the next instruction need not wait until the previous ALU instruction has ended. A DSP instruction takes longer than the 2-stage pipeline to execute and requires the same datapath resource that the next DSP instruction requires, so the next DSP instruction is delayed.
The pipeline structure of the GMS30C2132 and the action of the datapath are described in Table 1.1.
Stage                 Phase      Datapath Action
Fetch/Decode          TM (Low)   1. The instruction is read from the instruction cache according to the address of instruction.
                      TV (High)  2. The control signal of Rd (destination operand) and Rs (source operand) is activated according to the instruction that was loaded in TM phase.
                                 2.1 The control signal of IR (immediate register (operand)) and IL (instruction length) is activated.
                                 2.2 The address of next instruction is calculated and saved in PC.
Execute/Write         TM (Low)   1. The next instruction is read from the instruction cache.
                                 1.1 The addresses of Rs and Rd are determined.
                                 1.2 The immediate operand is determined.
                                 1.3 The operand is read from register stack using the address of Rs and Rd.
                                 1.4 The operands XR, YR and QR are controlled.
                      TV (High)  2. The input data of ALU is attained.
                                 2.1 The control of ALU datapath is made and instruction is executed in ALU.
                                 2.2 The result of ALU operation is saved in the register file.
Additional Insertion  Next TM    Additional ALU operation is continued and its result is saved in the register file.
Table 1.1: The pipeline structure of GMS30C2132 and the action of datapath.
1.2 Global Register Set
The architecture provides 32 global registers of 32 bit each. These are:
G0         Program Counter PC
G1         Status Register SR
G2         Floating-point Exception Register FER
G3..G15    General purpose registers
G16..G17   Reserved
G18        Stack Pointer SP
G19        Upper Stack Bound UB
G20        Bus Control Register BCR (see section 6. Bus Interface)
G21        Timer Prescaler Register TPR (see section 5. Timer)
G22        Timer Compare Register TCR (see section 5. Timer)
G23        Timer Register TR (see section 5. Timer)
G24        Watchdog Compare Register WCR (see section 6. Bus Interface)
G25        Input Status Register ISR (see section 6. Bus Interface)
G26        Function Control Register FCR (see section 6. Bus Interface)
G27        Memory Control Register MCR (see section 6. Bus Interface)
G28..G31   Reserved
Registers G0..G15 can be addressed directly by the register code (0..15) of an instruction. Registers G18..G27 can be addressed only by a MOV or MOVI instruction with the high global flag H set to 1.
(Example)
    MOVI G2, 0x20   ; G2 := 0x20 (set H flag)
    MOV G3, G19     ; G3 := G19 (G19 (UB) is copied to G3)
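Section 1.2.2 states that when the high global flag H is set, denoting G0..G15 addresses G16..G31 instead, so that G18..G27 are reached by denoting G2..G11. The following minimal sketch shows only this index mapping; the function name is hypothetical, and the automatic clearing of H after one instruction is not modeled.

```python
def global_register_index(register_code, h_flag):
    """Map a 4-bit global register code to the register actually addressed.

    register_code: the code 0..15 denoted in the instruction.
    h_flag:        state of the high global flag H for this instruction.
    """
    if not 0 <= register_code <= 15:
        raise ValueError("register code must be in the range 0..15")
    return register_code + 16 if h_flag else register_code


# Codes 2..11 with H set reach the high global registers G18..G27.
for code in range(2, 12):
    print(f"code {code:2d} with H=1 -> G{global_register_index(code, True)}")
```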
[Figure: the 32 global registers G0..G31, each drawn as a 32-bit register (bits 31..0): G0 Program Counter PC, G1 Status Register SR, G2 Floating-Point Exception Register FER, G3..G15 General Purpose Registers, G16..G17 Reserved, G18 Stack Pointer SP, G19 Upper Stack Bound UB, G20 Bus Control Register BCR, G21 Timer Prescaler Register TPR, G22 Timer Compare Register TCR, G23 Timer Register TR, G24 Watchdog Compare Register WCR, G25 Input Status Register ISR, G26 Function Control Register FCR, G27 Memory Control Register MCR, G28..G31 Reserved.]
Figure 1.5: Global Register Set
1.2.1 Program Counter PC, G0
G0 is the program counter PC. It is updated to the address of the next instruction through instruction execution. Besides this implicit updating, the PC can also be addressed like a regular source or destination register. When the PC is referenced as an operand, the supplied value is the address of the first byte after the instruction which references it (the address of the next instruction), except when referenced by a delay instruction with a preceding delayed branch taken. On a delayed branch instruction, when the branch condition is met, the branch address PC + rel (relative to the address of the first byte after the Delayed Branch instruction) is placed in the PC (see section 3.26. Delayed Branch Instructions).
Placing a result in the PC has the effect of a branch taken. When branch is taken, the target address of branch is placed in PC.
Bit zero of the PC is always zero, regardless of any value placed in the PC.
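The two PC rules above (a result placed in the PC acts as a branch, and bit zero always reads as zero) can be sketched as follows. The function names are illustrative, and the wrap-around to 32 bits is an assumption made for the example rather than something stated in this section.

```python
MASK32 = 0xFFFFFFFF


def write_pc(value):
    """Return the value the PC holds after 'value' is placed in it.

    Bit zero is forced to zero regardless of the value written.
    """
    return value & MASK32 & ~1


def delayed_branch_target(address_after_branch, rel):
    """Branch address PC + rel, relative to the first byte after the branch.

    Assumes 32-bit wrap-around arithmetic (an assumption of this sketch).
    """
    return write_pc(address_after_branch + rel)


print(hex(write_pc(0x1001)))
print(hex(delayed_branch_target(0x2000, -0x10)))
```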
1.2.2 Status Register SR, G1
G1 is the status register SR. Its content is updated by instruction execution. Besides this implicit updating, the SR can also be addressed like a regular register (when the H flag is set). When addressed as source or destination operand, all 32 bits are used as an operand. However, only bits 15..0 of a result can be placed in bits 15..0 of the SR; bits 31..16 of the result are discarded and bits 31..16 of the SR remain unchanged. When the SR is addressed as source operand, it represents the value 0x0. The full content of the SR is replaced only by the Return instruction. A result placed in the SR overrules any setting or clearing of the condition flags as a result of an instruction.
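The rule that only bits 15..0 of a result reach the SR amounts to a masked merge. A minimal sketch, with an illustrative function name:

```python
def write_sr(old_sr, result):
    """Place a 32-bit result in the SR as described above.

    Bits 15..0 are taken from the result; bits 31..16 of the result are
    discarded and bits 31..16 of the SR keep their old value.
    """
    return (old_sr & 0xFFFF0000) | (result & 0x0000FFFF)


print(hex(write_sr(0xABCD1234, 0xFFFF0020)))
```

The Return instruction, which replaces the full content of the SR, is the one case this sketch does not cover.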
[Figure: SR bits 31..16 with the fields FP (Frame Pointer), FL (Frame Length), ILC (Instruction-Length Code), S (Supervisor State Flag), P (Trace Pending Flag) and T (Trace-Mode Flag).]
Figure 1.6: Status Register SR (bits 31..16)
[Figure: SR bits 15..0 with the fields L (Interrupt-Lock Flag), FRM (Floating-Point Rounding Mode), FTE (Floating-Point Trap Enable), I (Interrupt-Mode Flag), Reserved, H (High Global Flag), M (Cache-Mode Flag), V (Overflow Flag), N (Negative Flag), Z (Zero Flag) and C (Carry Flag).]
Figure 1.7: Status Register SR (bits 15..0)
The status register SR contains the following status information:
C         Carry Flag. Bit zero is the carry condition flag C. In general, when set it indicates that the unsigned integer range is exceeded (overflow). At add operations, it indicates a carry out of bit 31 of the result. At subtract operations, it indicates a borrow (inverse carry) into bit 31 of the result.
Z         Zero Flag. Bit one is the zero condition flag Z. When set, it indicates that all 32 or 64 result bits are equal to zero regardless of any carry, borrow or overflow.
N         Negative Flag. Bit two is the negative condition flag N. On compare instructions, it indicates the arithmetically correct (true) sign of the result regardless of an overflow. On all other instructions, it is derived from result bit 31, which is the true sign bit when no overflow occurs. In the case of overflow, result bit 31 and N reflect the inverted sign bit.
V         Overflow Flag. Bit three is the overflow condition flag V. In general, when set it indicates a signed overflow. At the Move instructions, it indicates a floating-point NaN (Not a Number).
M         Cache-Mode Flag. Bit four is the cache-mode flag M. Besides being set or cleared under program control, it is also automatically cleared by a Frame instruction and by any branch taken except a delayed branch. See section 1.8. Instruction Cache for details.
H         High Global Flag. Bit five is the high global flag H. When H is set, denoting G0..G15 addresses G16..G31 instead. Thus, the registers G18..G27 may be addressed by denoting G2..G11 respectively. The H flag is effective only in the first cycle of the next instruction after it was set; then it is cleared automatically. Only a MOV or MOVI instruction issued as the next instruction may be used to copy the content of a local register or an immediate value to one of the high global registers. The MOV instruction may be used to copy the content of a high global register (except the BCR, TPR, FCR and MCR registers, which are write-only) to a local register. With all other instructions, the result may be invalid. If one of the high global registers is addressed as the destination register in user state (S = 0), the condition flags are undefined, the destination register remains unchanged and a trap to Privilege Error occurs.
Reserved  Bit six is reserved for future use. It must always be zero.
I         Interrupt-Mode Flag. Bit seven is the interrupt-mode flag I. It is set automatically on interrupt entry and reset to its old value by a Return instruction. The I flag is used by the operating system; it must never be changed by any user program.
FTE       Floating-Point Trap Enable Flag. Bits 12..8 are the floating-point trap enable flags. They determine the exception type and trap execution flow (see section 3.33.2. Floating-Point Instructions).
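The general meaning of the condition flags C, Z, N and V can be illustrated for a 32-bit add. The sketch below follows the descriptions above (carry out of bit 31, all result bits zero, result bit 31, signed overflow) and uses the bit positions given in this section. It is an illustration only, not a model of any particular instruction; the function name is invented.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000

# Bit positions of the condition flags in the SR, as listed above.
C = 1 << 0
Z = 1 << 1
N = 1 << 2
V = 1 << 3


def add32_flags(a, b):
    """Add two 32-bit values and return (result, condition flag bits)."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = 0
    if full > MASK32:              # carry out of bit 31
        flags |= C
    if result == 0:                # all 32 result bits are zero
        flags |= Z
    if result & SIGN:              # N is derived from result bit 31
        flags |= N
    # Signed overflow: both operands have the same sign and the sign of
    # the result differs from it.
    if ~(a ^ b) & (a ^ result) & SIGN:
        flags |= V
    return result, flags


for x, y in [(1, 2), (0xFFFFFFFF, 1), (0x7FFFFFFF, 1)]:
    res, fl = add32_flags(x, y)
    print(f"{x:#010x} + {y:#010x} = {res:#010x}, flags = {fl:04b}")
```

In the last case the result bit 31 is set although the true sum is positive, which is the situation the N description refers to when it says that N reflects the inverted sign bit on overflow.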