HYNIX GMS30C2132 Datasheet

Jun 13, 2001 Ver. 3.1

16/32 BIT RISC/DSP

GMS30C2116 GMS30C2132

USER’S MANUAL

Revision 3.1 Published by

IDA Team in Hynix Semiconductor Inc.


Additional information about this manual is available from Hynix offices in Korea and from the distributors and representatives listed in the address directory.

Hynix reserves the right to make changes to any information herein at any time without notice. The information, diagrams, and other data in this manual are correct and reliable; however, Hynix is in no way responsible for any violation of patents or other rights of third parties arising from the use of this manual.

Specifications and information in this document are subject to change without notice and do not represent a commitment on the part of Hynix. Hynix reserves the right to make changes to improve functioning. Although the information in this document has been carefully reviewed, Hynix does not assume any liability arising out of the use of the product or circuit described herein.

Hynix does not authorize the use of the Hynix microprocessor in life support applications wherein a failure or malfunction of the microprocessor may directly threaten life or cause injury. The user of the Hynix microprocessor in life support applications assumes all risks of such use and indemnifies Hynix against all damages.

For further information please contact:

SEOUL OFFICE: Hynix Semiconductor YOUNG DONG Bldg. 891, Daechi-dong, Kangnam-gu, Seoul, Korea.
PHONE: (02) 3459-3662~3 FAX: (02) 3459-3942

SYSTEM IC: 1, Hyangjeong-dong, Hungduk-gu, Cheongju, 361-725, Korea.
PHONE: (0431) 270-4030~47 FAX: (0431) 270-4075


Table of Contents

0. Overview
0.1 GMS30C2116/32 RISC/DSP
0.2 Block Diagram
0.3 Pin Configuration
0.3.1 GMS30C2132, 160-Pin MQFP-Package - View from Top Side
0.3.2 Pin Cross Reference by Pin Name
0.3.3 Pin Function

1. Architecture
1.1 Introduction
1.1.1 RISC Architecture
1.1.2 Techniques to reduce CPI (Cycles per Instruction)
1.1.3 The pipeline structure of GMS30C2132
1.2 Global Register Set
1.2.1 Program Counter PC, G0
1.2.2 Status Register SR, G1
1.2.3 Floating-Point Exception Register FER, G2
1.2.4 Stack Pointer SP, G18
1.2.5 Upper Stack Bound UB, G19
1.2.6 Bus Control Register BCR, G20
1.2.7 Timer Prescaler Register TPR, G21
1.2.8 Timer Compare Register TCR, G22
1.2.9 Timer Register TR, G23
1.2.10 Watchdog Compare Register WCR, G24
1.2.11 Input Status Register ISR, G25
1.2.12 Function Control Register FCR, G26
1.2.13 Memory Control Register MCR, G27
1.3 Local Register Set
1.4 Privilege States
1.5 Register Data Types
1.6 Memory Organization
1.7 Stack
1.8 Instruction Cache
1.9 On-Chip Memory (IRAM)


2. Instructions General
2.1 Instruction Notation
2.2 Instruction Execution
2.3 Instruction Formats
2.3.1 Table of Immediate Values
2.3.2 Table of Instruction Codes
2.3.3 Table of Extended DSP Instruction Codes
2.4 Entry Tables
2.5 Instruction Timing

3. Instruction Set
3.1 Memory Instructions
3.1.1 Address Modes
3.1.2 Load Instructions
3.1.3 Store Instructions
3.2 Move Word Instructions
3.3 Move Double-Word Instruction
3.4 Logical Instructions
3.5 Invert Instruction
3.6 Mask Instruction
3.7 Add Instructions
3.8 Sum Instructions
3.9 Subtract Instructions
3.10 Negate Instructions
3.11 Multiply Word Instruction
3.12 Multiply Double-Word Instructions
3.13 Divide Instructions
3.14 Shift Left Instructions
3.15 Shift Right Instructions
3.16 Rotate Left Instruction
3.17 Index Move Instructions
3.18 Check Instructions
3.19 No Operation Instruction
3.20 Compare Instructions
3.21 Compare Bit Instructions
3.22 Test Leading Zeros Instruction
3.23 Set Stack Address Instruction
3.24 Set Conditional Instructions
3.25 Branch Instructions
3.26 Delayed Branch Instructions
3.27 Call Instruction
3.28 Trap Instructions
3.29 Frame Instruction
3.30 Return Instruction
3.31 Fetch Instruction
3.32 Extended DSP Instructions
3.33 Software Instructions
3.33.1 Do Instruction
3.33.2 Floating-Point Instructions

4. Exceptions
4.1 Exception Processing
4.2 Exception Types
4.2.1 Reset
4.2.2 Range, Pointer, Frame and Privilege Error
4.2.3 Extended Overflow
4.2.4 Parity Error
4.2.5 Interrupt
4.2.6 Trace Exception
4.3 Exception Backtracking

5. Timer
5.1 Overview
5.1.1 Timer Prescaler Register TPR
5.1.2 Timer Register TR
5.1.3 Timer Compare Register TCR

6. Bus Interface
6.1 Bus Control General
6.1.1 SRAM and ROM Bus Access
6.1.1.1 SRAM and ROM Single-Cycle Read Access
6.1.1.2 SRAM and ROM Multi-Cycle Read Access
6.1.1.3 SRAM Single-Cycle Write Access
6.1.1.4 SRAM Multi-Cycle Write Access
6.1.2 DRAM Bus Access
6.1.2.1 DRAM Access
6.1.2.2 DRAM Refresh (CAS before RAS Refresh)
6.1.3 I/O Bus Access
6.1.3.1 I/O Read Access
6.1.3.2 I/O Write Access
6.2 I/O Bus Control
6.3 Bus Control Register BCR
6.4 Memory Control Register MCR
6.4.1 Output Voltage
6.4.2 Input Threshold
6.4.3 Power Down
6.4.4 IRAM Refresh Test
6.4.5 IRAM Refresh Rate
6.4.6 Entry Table Map
6.4.7 MEMx Bus Hold Break
6.5 Input Status Register ISR
6.6 Function Control Register FCR
6.7 Watchdog Compare Register WCR
6.8 IO3 Control Modes
6.8.1 IO3Standard Mode
6.8.2 Watchdog Mode
6.8.3 IO3Timing Mode
6.8.4 IO3TimerInterrupt Mode
6.9 Bus Signals
6.9.1 Bus Signals for the GMS30C2132 Processor
6.9.2 Bus Signals for the GMS30C2116 Processor
6.9.3 Bus Signal Description
6.10 DC Characteristics
6.11 AC Characteristics
6.11.1 Processor Clock
6.11.2 DRAM RAS Access
6.11.3 DRAM Fast Page Mode Access
6.11.3.1 Multi-Cycle Access
6.11.3.2 Single-Cycle Access
6.11.4 DRAM CAS-Before-RAS Refresh
6.11.5 SRAM Access
6.11.5.1 Multi-Cycle Access
6.11.5.2 Single-Cycle Access
6.11.6 I/O Access


7. Mechanical Data
7.1 GMS30C2132, 160-Pin MQFP-Package
7.1.1 Pin Configuration - View from Top Side
7.1.2 Pin Cross Reference by Pin Name
7.1.3 Pin Cross Reference by Location
7.2 GMS30C2132, 144-Pin TQFP-Package
7.2.1 Pin Configuration - View from Top Side
7.2.2 Pin Cross Reference by Pin Name
7.2.3 Pin Cross Reference by Location
7.3 GMS30C2116, 100-Pin TQFP-Package
7.3.1 Pin Configuration - View from Top Side
7.3.2 Pin Cross Reference by Pin Name
7.3.3 Pin Cross Reference by Location
7.4 Package-Dimensions

Appendix. Instruction Set Details


0. Overview

0.1 GMS30C2116/32 RISC/DSP

The GMS30C2116 and GMS30C2132 RISC/DSP represent a new class of microprocessors: the combination of a high-performance RISC microprocessor with an additional powerful DSP instruction set and on-chip micro-controller functions. The high throughput is achieved not by raw clock speed but by a sophisticated, novel architecture that combines the advantages of RISC and DSP technology.

The speed is obtained by an optimized combination of the following features:

● The most recent stack frames are kept in a register stack, thereby reducing data memory accesses to a minimum by keeping almost all local data in registers.
● Pipelined memory access allows overlapping of memory accesses with execution.
● 4KByte on-chip memory.
● On-chip instruction cache omits instruction fetch in inner loops and provides pre-fetch.
● Variable-length instructions of 16, 32 or 48 bits provide a large, powerful instruction set, thereby reducing the number of instructions to be executed.
● The primarily used 16-bit instructions halve the memory bandwidth required for instruction fetch in comparison to conventional RISC architectures with fixed-length 32-bit instructions, and also yield better code economy than conventional CISC architectures.
● Regular instruction set allows hardwiring of control logic at low component count.
● Most instructions execute in one cycle.
● Pipelined DSP instructions.
● Parallel execution of ALU and DSP instructions.
● Single-cycle half word multiply-accumulate operation.
● Fast Call and Return by parameter passing via registers.
● An instruction pipeline depth of only two stages (decode/execute) provides branching without insertion of wait cycles in combination with Delayed Branch instructions.
● Range and pointer checks are performed without speed penalty; these checks therefore no longer need to be turned off, which provides higher runtime reliability.
● Separate address and data buses provide a throughput of one 32-bit word each cycle.

The features noted above help reduce the number of idle wait cycles to a bare minimum. The processor is designed to sustain its execution rate with standard DRAM memory.

The low power consumption is an advantage for mobile (portable) applications or in temperature-sensitive environments.

In the current version, the GMS30C2116 and GMS30C2132 RISC/DSP are implemented in a 0.6 µm CMOS process. The GMS30C2116 and GMS30C2132 RISC/DSP are based on the hyperstone architecture.


Most of the transistors are used for the on-chip memory, the instruction cache, the register stack and the multiplier, whereas only a small number are required for the control logic.

Due to their low system cost, the GMS30C2116 and GMS30C2132 RISC/DSP are very well suited for embedded-systems applications requiring high performance and lowest cost. To simplify board design as well as to reduce system costs, the GMS30C2116 and GMS30C2132 come with integrated periphery, such as a timer and memory and bus control logic. Therefore, complete systems with the Hynix microprocessor can be implemented with a minimum of external components. No glue logic is necessary to connect any kind of memory or I/O. The processor is even suitable for systems where, up to now, microprocessors with 16-bit architecture have been used for cost reasons. Its improved performance compared to conventional micro-controllers can be used to replace many external peripherals, such as graphics controllers or DSPs, with software.

The software development tools include an optimizing C compiler, an assembler, a source-level debugger with profiler, and a real-time kernel with an extremely fast response time. Using this real-time kernel, up to 31 tasks, each with its own virtual timer, can be developed independently of each other. The synchronization of these tasks is handled almost automatically by the real-time kernel. To the developer, it appears as if up to 31 Hynix microprocessors were available, each of which can be allocated its own program. Real-time debugging of multiple tasks is also supported.

The following description gives a brief architectural overview:

Registers:

● 32 global and 64 local registers of 32 bits each
● 16 global and up to 16 local registers are addressable directly

Flags:

● Zero(Z), negative(N), carry(C) and overflow(V) flag
● Interrupt-mode, interrupt-lock, trace-mode, trace-pending, supervisor state, cache-mode and high global flag

Register Data Types:

● Unsigned integer, signed integer, signed short, signed complex short, 16-bit fixed-point, bit-string, IEEE-754 floating-point, each either 32 or 64 bits

External Memory:

● Address space of 4Gbytes, divided into five areas
● Separate I/O address space
● Load/Store architecture
● Pipelined memory and I/O accesses
● High-order data located and addressed at lower address (big endian)
● Instructions and double-word data may cross DRAM page boundaries
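
As an illustration of the big-endian convention listed above, the following Python sketch lays out a 32-bit word byte by byte, with the high-order byte at the lowest address. It is not taken from the manual; the function name `word_to_memory_bytes` and the sample value are hypothetical.

```python
import struct


def word_to_memory_bytes(value: int) -> bytes:
    """Return the four bytes of a 32-bit word in big-endian memory order.

    Index 0 of the result corresponds to the lowest address and holds the
    high-order byte. (Hypothetical helper, for illustration only.)
    """
    return struct.pack(">I", value & 0xFFFFFFFF)


if __name__ == "__main__":
    layout = word_to_memory_bytes(0x12345678)
    for offset, byte in enumerate(layout):
        print(f"base+{offset}: 0x{byte:02X}")
```

Reading the bytes back with `struct.unpack(">I", layout)` recovers the original word, which is a quick way to check that a memory dump was captured in the expected byte order.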


On-chip Memory:

● 4KByte internal (on-chip) memory

Memory Data Types:

● Unsigned and signed byte (8 bit)
● Unsigned and signed half word (16 bit), located on half word boundary
● Undedicated word (32 bit), located on word boundary
● Undedicated double-word (64 bit), located on word boundary
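
The boundary rules in the list above can be expressed as a small alignment check. This is a minimal sketch, assuming byte addresses; the table name `ALIGNMENT_BYTES` and the function `is_aligned` are hypothetical and not part of the manual. Note that a double-word only needs word (4-byte) alignment.

```python
# Required boundary in bytes for each memory data type listed above.
# A double-word (64 bit) is located on a word boundary, not an 8-byte boundary.
ALIGNMENT_BYTES = {
    "byte": 1,
    "half_word": 2,
    "word": 4,
    "double_word": 4,
}


def is_aligned(data_type: str, address: int) -> bool:
    """Return True if a byte address satisfies the boundary rule for data_type."""
    return address % ALIGNMENT_BYTES[data_type] == 0


if __name__ == "__main__":
    for data_type, address in [("half_word", 3), ("word", 8), ("double_word", 4)]:
        print(data_type, hex(address), is_aligned(data_type, address))
```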

Runtime Stack:

● Runtime stack is divided into memory part and register part
● Register part is implemented by the 64 local registers holding the most recent stack frame(s)
● Current stack frame (maximum 16 registers) is always kept in register part of the stack
● Data transfer between memory and register part of the stack is automatic
● Upper stack bound is guarded

Instruction Cache:

● An on-chip instruction cache reduces instruction memory access substantially

Instructions General:

● Variable-length instructions of one, two or three half words halve required memory bandwidth
● Pipeline depth of only two stages assures immediate refill after branches
● Register instructions of type "source operator destination ⇒ destination" or "source operator immediate ⇒ destination"
● All register bits participate in an operation
● Immediate operands of 5, 16 and 32 bits, zero- or sign-expanded
● Large address displacement of up to 28 bits
● Two sets of signed arithmetical instructions: instructions set or clear either only the overflow flag or trap additionally to a Range Error routine on overflow
● DSP instructions operate on 16-bit integer, real and complex fixed-point data and 32-bit integer data into 32-bit and 64-bit hardware accumulators
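
To make the phrase "zero- or sign-expanded" concrete, the following Python sketch expands a short immediate field to a 32-bit register value in both ways. It is only an illustration: the function names are hypothetical, and which instructions use which expansion is not stated in this overview.

```python
WORD_MASK = 0xFFFFFFFF  # registers are 32 bits wide


def zero_expand(field: int, bits: int) -> int:
    """Expand a `bits`-wide immediate to 32 bits by filling with zeros."""
    return field & ((1 << bits) - 1)


def sign_expand(field: int, bits: int) -> int:
    """Expand a `bits`-wide immediate to 32 bits by replicating its top bit."""
    field &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    # XOR then subtract converts the field to a signed value; the mask turns
    # that value back into a 32-bit two's-complement bit pattern.
    return ((field ^ sign_bit) - sign_bit) & WORD_MASK


if __name__ == "__main__":
    print(hex(zero_expand(0x8000, 16)))
    print(hex(sign_expand(0x8000, 16)))
```

For a 16-bit field of 0x8000, zero expansion gives 0x00008000 while sign expansion gives 0xFFFF8000, so the same encoded bits can represent either 32768 or -32768.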

Instruction Summary:

● Memory instructions pipelined to a depth of two stages, trap on address register equal to zero (check for invalid pointers)
● Memory address modes: register address, register post-increment, register + displacement (including PC relative), register post-increment by displacement (next address), absolute, stack address, I/O absolute and I/O displacement
● Load, all data types, bytes and half words right adjusted and zero- or sign-expanded, execution proceeds after Load until data is needed
● Store, all data types, trap when range of signed byte or half word is exceeded
● Move, Move immediate, Move double-word
● Logical instructions AND, AND not, OR, XOR, NOT, AND not immediate, OR immediate, XOR immediate
● Mask source and immediate ⇒ destination
● Add unsigned/signed, Add signed with trap on overflow, Add with carry
● Add unsigned/signed immediate, Add signed immediate with trap on overflow
● Sum source + immediate ⇒ destination, unsigned/signed and signed with trap on overflow
● Subtract unsigned/signed, Subtract signed with trap on overflow, Subtract with carry
● Negate unsigned/signed, Negate signed with trap on overflow
● Multiply word ∗ word ⇒ low-order word unsigned or signed, Multiply word ∗ word ⇒ double-word unsigned and signed
● Divide double-word by word ⇒ quotient and remainder, unsigned and signed
● Shift left unsigned/signed, single and double-word, by constant and by content of register
● Shift right unsigned and signed, single and double-word, by constant and by content of register
● Rotate left single word by content of register
● Index Move, move an index value scaled by 1, 2, 4 or 8, optionally with bounds check
● Check a value for an upper bound specified in a register or check for zero
● Compare unsigned/signed, Compare unsigned/signed immediate
● Compare bits, Compare bits immediate, Compare any byte zero
● Test number of leading zeros
● Set Conditional, save conditions in a register
● Branch unconditional and conditional (12 conditions)
● Delayed Branch unconditional and conditional (12 conditions)
● Call subprogram, unconditional and on overflow
● Trap to supervisor subprogram, unconditional and conditional (11 conditions)
● Frame, structure a new stack frame, include parameters in frame addressing, set frame length, restore reserve frame length and check for upper stack bound
● Return from subprogram, restore program counter, status register and return-frame

● Software instructions, call an associated subprogram and pass a source operand and the address of a destination operand to it
● DSP Multiply instructions: signed and/or unsigned multiplication ⇒ single and double word product
● DSP Multiply-Accumulate instructions: signed multiply-add and multiply-subtract ⇒ single and double word product sum and difference
● DSP Half word Multiply-Accumulate instructions: signed multiply-add operating on four half word operands ⇒ single and double word product sum
● DSP Complex Half word Multiply instruction: signed complex half word multiplication ⇒ real and imaginary single word product
● DSP Complex Half word Multiply-Accumulate instruction: signed complex half word multiply-add ⇒ real and imaginary single word product sum
● DSP Add and Subtract instructions: signed half word add and subtract with and without fixed-point adjustment ⇒ single word sum and difference
● Floating-point instructions are architecturally fully integrated; they are executed as Software instructions by the present version. Floating-point Add, Subtract, Multiply, Divide, Compare and Compare unordered for single and double-precision, and Convert single ⇔ double are provided.
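
The complex half word multiply entry in the list above can be read as ordinary complex multiplication on signed 16-bit real and imaginary parts. The Python sketch below shows only that arithmetic. It is an assumption-laden illustration, not the instruction's definition: the function names are hypothetical, and the hardware's fixed-point adjustment, register packing and overflow behaviour are not modeled because this overview does not specify them.

```python
def to_signed16(raw: int) -> int:
    """Interpret the low 16 bits of raw as a signed half word."""
    raw &= 0xFFFF
    return raw - 0x10000 if raw & 0x8000 else raw


def complex_half_word_multiply(a_re: int, a_im: int, b_re: int, b_im: int):
    """Multiply (a_re + j*a_im) by (b_re + j*b_im) on signed 16-bit parts.

    Returns (real, imaginary) as plain Python integers; no word-size
    wrap-around or fixed-point scaling is applied in this sketch.
    """
    a_re, a_im, b_re, b_im = (to_signed16(v) for v in (a_re, a_im, b_re, b_im))
    real = a_re * b_re - a_im * b_im
    imaginary = a_re * b_im + a_im * b_re
    return real, imaginary


if __name__ == "__main__":
    print(complex_half_word_multiply(1, 2, 3, 4))
```

A multiply-accumulate variant would add the two results to running real and imaginary sums instead of returning them directly.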

Exceptions:

● Pointer, Privilege, Frame and Range Error, Extended Overflow, Parity Error, Interrupt and Trace mode exception
● Watchdog function
● Error-causing instructions can be identified by backtracking, thus allowing a very detailed error analysis

Timer:

● Two multifunctional timers

Bus Interface:

● Separate address bus of 26 (GMS30C2132) or 22 (GMS30C2116) bits and data bus of up to 32 (GMS30C2132) or 16 bits (GMS30C2116) provide a throughput of four or two bytes at each clock cycle
● Data bus width of 32, 16 or 8 bits, individually selectable for each external memory area
● Up to seven vectored interrupts
● Configurable I/O pins
● Internal generation of all memory and I/O control signals
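
The bus widths quoted above translate into simple figures. The sketch below computes how many distinct addresses an N-bit address bus can express and the peak transfer rate of a data bus that moves one full-width item per clock cycle. It assumes that each address selects one byte, and the clock frequency in the example is a placeholder, not a value from this manual; the function names are hypothetical.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct addresses an address bus of this width can express."""
    return 1 << address_bits


def peak_bytes_per_second(data_bus_bits: int, clock_hz: int) -> int:
    """Peak transfer rate when one full-width transfer completes every cycle."""
    return (data_bus_bits // 8) * clock_hz


if __name__ == "__main__":
    example_clock_hz = 50_000_000  # hypothetical clock, chosen only for illustration
    for name, address_bits, data_bits in [("GMS30C2132", 26, 32), ("GMS30C2116", 22, 16)]:
        print(name,
              addressable_bytes(address_bits),
              peak_bytes_per_second(data_bits, example_clock_hz))
```

Under the byte-address assumption, 26 address bits cover 67,108,864 addresses (64 MiB) and 22 address bits cover 4,194,304 addresses (4 MiB). These are limits of the external address pins, separate from the 4Gbyte architectural address space.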

0.2 Block Diagram

Figure 0.1: Block Diagram (drawing not reproduced). Labeled blocks include the 64 local and 26 global registers, ALU, barrel shifter, DSP execution unit, hardware multiplier, instruction cache, instruction cache control, instruction decode, instruction control unit, instruction prefetch control unit, load decode, bus interface control unit, store data pipeline, memory address pipeline, 4 kByte RAM, timer, watchdog, interrupt control, and power down and reset control, together with the data, address, parity and control buses.

0.3 Pin Configuration

0.3.1 GMS30C2132, 160-Pin MQFP-Package - View from Top Side

Figure 0.2: GMS30C2132, 160-Pin MQFP-Package (pin-out drawing not reproduced; signal locations are listed in section 0.3.2)


0.3.2 Pin Cross Reference by Pin Name

Signal   Location   Signal   Location   Signal   Location   Signal   Location
A0       97         D5       57         GND      65         NC       124
A1       98         D6       51         GND      68         NC       157
A2       99         D7       48         GND      73         NC       158
A3       100        D8       47         GND      79         OE#      113
A4       137        D9       45         GND      82         RAS#     11
A5       138        D10      36         GND      90         RESET#   74
A6       139        D11      35         GND      96         RQST     89
A7       141        D12      34         GND      108        VCC      1
A8       142        D13      33         GND      119        VCC      13
A9       20         D14      31         GND      122        VCC      24
A10      21         D15      30         GND      126        VCC      32
A11      22         D16      103        GND      130        VCC      40
A12      23         D17      102        GND      136        VCC      41
A13      127        D18      101        GND      145        VCC      49
A14      131        D19      69         GND      148        VCC      53
A15      150        D20      67         GND      153        VCC      60
A16      151        D21      66         GND      159        VCC      64
A17      154        D22      55         GRANT#   75         VCC      72
A18      155        D23      54         INT1     85         VCC      76
A19      12         D24      52         INT2     86         VCC      80
A20      14         D25      29         INT3     87         VCC      81
A21      15         D26      27         INT4     88         VCC      93
A22      143        D27      26         IO1      91         VCC      104
A23      146        D28      25         IO2      105        VCC      112
A24      147        D29      19         IO3      5          VCC      120
A25      149        D30      18         IORD#    114        VCC      121
ACT      128        D31      17         IOWR#    6          VCC      133
CAS0#    132        DP0      94         NC       3          VCC      140
CAS1#    109        DP1      95         NC       4          VCC      156
CAS2#    110        DP2      70         NC       37         VCC      160

CAS3#..........111 DP3...................71 NC .................... 38 VCC ................129

CLKOUT.........92 GND ....................2 NC .................... 43 VCC ................144

CS1# ................9 GND ..................10 NC ....................44 VCC ................152

CS2# ................8 GND ..................16 NC ....................77 WE#................125

CS3# ................7 GND ..................28 NC ....................78 WE0#..............135

D0...................63 GND ..................39 NC....................83 WE1#..............134

D1...................62 GND ..................42 NC....................84 WE2#..............115

D2...................61 GND ..................46 NC..................117 WE3#..............116

D3...................59 GND ..................50 NC..................118 XTAL1/CLKIN.107

D4...................58 GND ..................56 NC..................123 XTAL2.............106

0.3.3 Pin Function

Type            Name            State   Use

Power           VCC             I       Power. Connected to the power supply. A 5.0V or 3.3V power supply can be selected.
                GND             I       Ground. Connected to the system ground. All GND pins must be connected to the system ground.

Clock           XTAL1           I       Input for Quartz Clock. When an external clock generator generates the clock, XTAL1 is used as clock input.
                XTAL2           O       Output for Quartz Clock.
                CLKOUT          O       Clock Signal Output. It can be used to supply a clock signal to peripheral devices.

Address Bus     A25..A0         O/Z     Address Bus. With the GMS30C2132, only A22..A0 are connected to the address bus pins.

Data Bus        D31..D0         I/O     Data Bus. 32-bit bi-directional data bus.
                DP0..DP3        I/O     Data Parity Signal. Bi-directional parity signals.

Bus Control     RAS#            O/Z     Row Address Strobe. RAS# is activated when the processor accesses a DRAM or performs a refresh cycle. When an SRAM is placed in MEM0, RAS# is used as the chip select signal.
                CAS0#..CAS3#    O/Z     Column Address Strobe. They are only used by a DRAM for column access cycles and for "CAS before RAS" refresh.
                WE#             O/Z     Write Enable. Active low indicates a write access, active high indicates a read access.
                CS1#..CS3#      O/Z     Chip Select. Active low of CS1#..CS3# indicates chip select for the memory areas MEM1..MEM3.
                WE0#..WE3#      O/Z     SRAM Write Enable. Active low indicates write enable for the corresponding byte.
                OE#             O/Z     Output Enable for SRAMs and EPROMs.
                IORD#           O/Z     I/O Read Strobe, optionally I/O Data Strobe. The use of IORD# is specified in the I/O address bit 10.
                IOWR#           O/Z     I/O Write Strobe.
                RQST            O       RQST signals the request for a memory or I/O access.
                GRANT#          I       Bus Grant. GRANT# is signaled low by a bus arbiter to grant access to the bus for memory and I/O cycles.
                ACT             O       Active as bus master. ACT is signaled high when GRANT# is low and it is kept high during a current bus access.

Interrupt       INT1..INT4      I       Interrupt Request. A signal on one of the INT1..INT4 interrupt request pins causes an interrupt exception when the interrupt lock flag L is clear and the corresponding INTxMask bit in FCR is not set.

I/O Port        IO1..IO3        I/O     General Input-Output Port. IO1..IO3 can be individually configured via the IOxDirection bits in the FCR as either input or output pins (port).

System Control  RESET#          I       Reset Processor. RESET# low resets the processor to the initial state and halts all activity. RESET# must be low for at least two cycles.


1. Architecture

1.1 Introduction

1.1.1 RISC Architecture

In the early days of computer history, most computer families started with a rather simple instruction set. The main reason for this simplicity was the high cost of hardware. Over the past three decades, hardware cost has dropped steadily while software cost has risen.

The net result is that more and more functions have been built into the hardware, making the instruction set very large and very complex. The growth of instruction sets was also encouraged by the popularity of microprogrammed control in the 1960s and 1970s. Even user-defined instruction sets were implemented using microcodes in some processors for special-purpose applications.

The evolution of computer architectures has been dominated by families of increasingly complex processors. Under market pressures to preserve existing software, Complex Instruction Set Computer (CISC) architectures evolved by the gradual addition of microcode and increasingly elaborate operations. The intent was to supply more support for high-level languages and operating systems, as semiconductor advances made it possible to fabricate more complex integrated circuits. It seemed self-evident that architectures should become more complex as these technological advances made it possible to hold more complexity on VLSI devices.

In recent years, however, Reduced Instruction Set Computer (RISC) architectures have implemented a much more sophisticated handling of the complex interaction between hardware, firmware and software. RISC concepts emerged from statistical analysis of how software actually uses the resources of a processor. Dynamic measurements of system kernels and of object modules generated by optimizing compilers show an overwhelming predominance of the simplest instructions, even in code for CISC machines. Complex instructions are often ignored because a single way of performing a complex operation rarely fits the needs of high-level languages and system environments. RISC designs eliminate the microcoded routines and turn the low-level control of the machine over to software.

This approach is not new. But its application is more universal in recent years thanks to the prevalence of high-level languages, the development of compilers that can optimize at the microcode level, and dramatic advances in semiconductor memory and packaging. It is now feasible to replace machine microcode ROM with faster RAM, organized as an instruction cache. Machine control then resides in the instruction cache and is, in fact, customized on the fly. The instruction stream generated by system- and compiler-generated code provides a precise fit between the requirements of high-level software and the capabilities of the hardware. So compilers are playing a vital role in RISC performance.

The advantages of RISC architecture can be summarized as follows:

• Simplicity made VLSI implementation possible, and thus higher clock rates.

• Hardwired control and separate data and program caches lower the average CPI (Cycles per Instruction) significantly.

• The dynamic instruction count of an ordinary RISC program increased only slightly (by a factor of less than 2).

• The MIPS (Million Instructions per Second) rate of a typical RISC microprocessor increased by a factor of 5/(2*0.1) = 25 over that of a typical CISC microprocessor. This factor combines the three figures below.

• The clock rate increased from 10 MHz on a CISC processor to 50 MHz on a CMOS/RISC microprocessor (a factor of 5).

• The instruction count of a typical RISC program increased to less than 2 times that of a typical CISC program.

• The average CPI for a RISC microprocessor decreased to 1.2, instead of 12 as in a typical CISC processor (a factor of 0.1).
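The arithmetic behind the factor of 25 can be checked with a short calculation. The sketch below is illustrative only: the function name relative_speedup is a hypothetical helper, and the input values are simply the figures quoted in the list above (with the instruction count ratio taken at its upper bound of 2).

```python
from fractions import Fraction

def relative_speedup(clock_ratio, instruction_count_ratio, cpi_ratio):
    """Combine the three ratios the way the manual does: 5 / (2 * 0.1).

    All arguments are RISC-to-CISC ratios. Fractions keep the result exact.
    """
    return clock_ratio / (instruction_count_ratio * cpi_ratio)

# Figures quoted in the text (illustrative, not measured data).
clock_ratio = Fraction(50, 10)               # 50 MHz versus 10 MHz
instruction_count_ratio = Fraction(2)        # "less than 2 times" taken as 2
cpi_ratio = Fraction(12, 10) / Fraction(12)  # CPI of 1.2 versus 12

speedup = relative_speedup(clock_ratio, instruction_count_ratio, cpi_ratio)
print(speedup)  # 25
```

Because the instruction count ratio is an upper bound, 25 is the lower end of the improvement implied by these figures.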

1.1.2 Techniques to reduce CPI (Cycles per Instruction)

If the work each instruction performs is simple and straightforward, the time required to execute each instruction can be shortened and the number of cycles reduced. The goal of RISC designs has been to achieve an execution rate of one instruction per machine cycle (multiple-instruction-issue designs now seek to increase this rate to more than one instruction per cycle). Techniques that help achieve this goal include:

• Instruction pipelines

• Load and store (load/store) architecture

• Delayed load instructions

• Delayed branch instructions

(1) Instruction Pipelines

One way to reduce the number of cycles required to execute an instruction is to overlap the execution of multiple instructions. Instruction pipelines divide the execution of each instruction into several discrete portions and then execute multiple instructions simultaneously. The instruction pipeline technique can be likened to an assembly line: the instruction progresses from one specialized stage to the next until it is complete (or issued), just as an automobile moves along an assembly line. (This is in contrast to the nonpipelined, microcoded approach, where all the work is done by one general unit that is less capable at each individual task.) For example, the execution of an instruction might be subdivided into four portions, or clock cycles, as shown in Figure 1.1:

Cycle               Cycle               Cycle               Cycle
Fetch Instruction   ALU Operation       Access Memory       Write Results
(F)                 (A)                 (M)                 (W)

Figure 1.1: Functional Division of a Hypothetical Pipeline

An instruction pipeline can potentially reduce the number of cycles per instruction by a factor equal to the depth of the pipeline (the depth of the pipeline = the number of resources, or stages). For example, in Figure 1.2 each instruction still requires a total of four clock cycles to execute. However, if a four-level instruction pipeline is used, a new instruction can be initiated at each clock cycle and the effective execution rate is one cycle per instruction.

[Figure: instructions plotted against clock cycles; each instruction passes through the stages F, A, M and W.]

Figure 1.2: Multiple Instructions in a Hypothetical Pipeline
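The overlap in the hypothetical four-stage pipeline can be sketched as a small timing chart. This is an idealized model that assumes no stalls and one new instruction per clock cycle; the function names pipeline_chart and cycles_per_instruction are hypothetical and not part of any GMS30C2132 tool.

```python
STAGES = ["F", "A", "M", "W"]  # fetch, ALU operation, memory access, write results

def pipeline_chart(num_instructions, stages=STAGES):
    """Return one row per instruction; each row lists the stage active in each cycle."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = ["."] * total_cycles
        for offset, stage in enumerate(stages):
            row[i + offset] = stage  # instruction i enters the pipeline in cycle i
        rows.append(row)
    return rows

def cycles_per_instruction(num_instructions, depth=len(STAGES)):
    """Effective CPI of an ideal pipeline, including the cycles needed to fill it."""
    return (num_instructions + depth - 1) / num_instructions

for number, row in enumerate(pipeline_chart(4), start=1):
    print(f"Instruction {number}: {' '.join(row)}")
print(cycles_per_instruction(1000))
```

For long instruction sequences the fill time becomes negligible and the effective rate approaches one cycle per instruction, as stated above.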

(2) Load/Store Architecture

The discussion of the instruction pipeline illustrates how each instruction can be subdivided into several discrete parts that permit the processor to execute multiple instructions in parallel. For this technique to work efficiently, the time required to execute each instruction subpart should be approximately equal. If one part requires an excessive length of time, there is an unpleasant choice: either halting the pipeline (inserting wait or idle cycles), or making all cycles longer to accommodate this lengthier portion of the instruction.

Instructions that perform operations on operands in memory tend to increase either the cycle time or the number of cycles per instruction. Such instructions require additional execution time to calculate the addresses of the operands, read the required operands from memory, calculate the result, and store the result of the operation back to memory. To eliminate the negative impact of such instructions, RISC designs implement a load and store (load/store) architecture in which the processor has many registers, all operations are performed on operands held in processor registers, and main memory is accessed only by load and store instructions.

This approach produces several benefits:

• Reducing the number of memory accesses eases memory bandwidth requirements.

• Limiting all operations to registers helps simplify the instruction set.

• Eliminating memory operations makes it easier for compilers to optimize register allocation - this further reduces memory accesses and also reduces the instructions/task factor.

All of these factors help RISC designs approach their goal of one cycle per instruction. However, two classes of instructions hinder achievement of this goal: load instructions and branch instructions. The following sections discuss how RISC designs overcome the obstacles raised by these classes of instructions.


(3) Delayed Load Instructions

Load instructions read operands from memory into processor registers for subsequent operation by other instructions. Because memory typically operates at much slower speeds than processor clock rates, the loaded operand is not immediately available to subsequent instructions in an instruction pipeline. The data dependency is illustrated in Figure 1.3.

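[Figure: a load instruction and the following instruction passing through the stages F, A, M and W, with a marker showing where the data from the load becomes available as an operand.]

Figure 1.3: Data Dependency Resulting From a Load Instruction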

In this illustration, the operand loaded by instruction 1 is not available for use in the A cycle (ALU, or Arithmetic/Logic Unit, operation) of instruction 2. One way to handle this dependency is to delay the pipeline by inserting additional clock cycles into the execution of instruction 2 until the loaded data becomes available. This approach obviously introduces delays that would increase the cycles/instruction factor.

In many RISC designs the technique used to handle this data dependency is to recognize and make visible to compilers the fact that all load instructions have an inherent latency or load delay. Figure 1.3 illustrates a load delay or latency of one instruction. The instruction that immediately follows the load is in the load delay slot. If the instruction in this slot does not require the data from the load, then no pipeline delay is required.

If this load delay is made visible to software, a compiler can arrange instructions to ensure that there is no data dependency between a load instruction and the instruction in the load delay slot. The simplest way of ensuring that there is no data dependency is to insert a No Operation (NOP) instruction to fill the slot, as follows:

    Load R1, A
    Load R2, B
    NOP             <= This instruction fills the delay slot
    ADD R3, R1, R2

Although filling the delay slot with NOP instructions eliminates the need for hardware-controlled pipeline stalls in this case, it still is not a very efficient use of the pipeline stream, since these additional NOP instructions increase code size and perform no useful work. (In practice, however, this technique need not have much negative impact on performance.)

A more effective solution to handling the data dependency is to fill the load delay slot with a useful instruction. Good optimizing compilers can usually accomplish this, especially if the load delay is only one instruction. The following example program illustrates how a compiler might rearrange instructions to handle a potential data dependency.

    # Consider the code for C := A+B; F := D
    Load R1, A
    Load R2, B
    Add R2, R1, R2  <= This instruction stalls because R2 data is not available
    Load R4, D
    ..... ....

    # An alternative code sequence (where delay length = 1)
    Load R1, A
    Load R2, B
    Load R4, D
    Add R3, R1, R2  <= No stall since R2 data is available
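The effect of the rearrangement can be illustrated with a toy stall counter. This is a minimal sketch under stated assumptions, not a model of any real processor: instructions issue in order, one per slot, a loaded register becomes usable only after LOAD_DELAY further instruction slots, and the tuple encoding (operation, destination, sources) and the function count_stalls are hypothetical.

```python
LOAD_DELAY = 1  # assumed load delay in instructions, as in the example above

def count_stalls(program, load_delay=LOAD_DELAY):
    """Count the stall slots needed by a list of (op, dest, sources) tuples."""
    ready_at = {}  # register -> first issue slot in which its value can be used
    slot = 0
    stalls = 0
    for op, dest, sources in program:
        needed = max((ready_at.get(reg, 0) for reg in sources), default=0)
        if needed > slot:          # an operand is still in flight: stall
            stalls += needed - slot
            slot = needed
        latency = load_delay if op == "load" else 0
        if dest is not None:
            ready_at[dest] = slot + 1 + latency
        slot += 1
    return stalls

original = [
    ("load", "R1", ()),
    ("load", "R2", ()),
    ("add", "R2", ("R1", "R2")),  # uses R2 directly after its load
    ("load", "R4", ()),
]
reordered = [
    ("load", "R1", ()),
    ("load", "R2", ()),
    ("load", "R4", ()),           # useful work fills the load delay slot
    ("add", "R3", ("R1", "R2")),
]
print(count_stalls(original), count_stalls(reordered))
```

In this model the first sequence needs one stall slot and the rearranged sequence needs none, which matches the comments in the listing above.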

(4) Delayed Branch Instructions

Branch instructions usually delay the instruction pipeline because the processor must calculate the effective destination of the branch and fetch that instruction. When a cache access requires an entire cycle, and the fetched branch instruction specifies the target address, it is impossible to perform this fetch (of the destination instruction) without delaying the pipeline for at least one pipe stage (one cycle). Conditional branches can cause further delays because they require the calculation of a condition, as well as the target address.

Instead of stalling the instruction pipeline to wait for the instruction at the target address, RISC designs typically use an approach similar to that used with Load instructions: Branch instructions are delayed and do not take effect until after one or more instructions immediately following the Branch instruction (the delay instructions) have been executed. Branch and delayed branch instructions are illustrated in Figure 1.4.

[Figure: two flowcharts side by side. "Branch Instruction": Condition ?, YES leads to Branch Target, otherwise Next Instruction. "Delayed Branch Instruction": Delayed Branch, Condition ?, Delay Instruction, YES leads to Branch Target, otherwise Next Instruction.]

Figure 1.4: Block Diagram of Branch/Delayed Branch Instruction
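The difference between the two kinds of branch can be sketched with a tiny interpreter. Everything here is a simplifying assumption made for illustration: the instruction encoding, the function run, a single delay instruction, and branches that are always taken. It does not describe the GMS30C2132 instruction set.

```python
def run(program, max_steps=100):
    """Return the sequence of instruction indices executed.

    program is a list of (op, arg) tuples; op is "op", "branch" or
    "delayed_branch", and arg is the target index for the two branch kinds.
    """
    executed = []
    pc = 0
    pending_target = None  # target of a delayed branch waiting for its delay instruction
    while pc < len(program) and len(executed) < max_steps:
        op, arg = program[pc]
        executed.append(pc)
        next_pc = pc + 1
        if pending_target is not None:
            next_pc = pending_target  # the delay instruction has now been executed
            pending_target = None
        if op == "branch":
            next_pc = arg             # ordinary branch takes effect immediately
        elif op == "delayed_branch":
            pending_target = arg      # takes effect after the next instruction
        pc = next_pc
    return executed

ordinary = [("branch", 3), ("op", None), ("op", None), ("op", None)]
delayed = [("delayed_branch", 3), ("op", None), ("op", None), ("op", None)]
print(run(ordinary))
print(run(delayed))
```

With the ordinary branch, index 1 is skipped; with the delayed branch, index 1 acts as the delay instruction and is executed before control reaches index 3.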

1.1.3 The pipeline structure of GMS30C2132

The GMS30C2132 has a two-stage pipeline structure, and each stage is composed of two phases (TM and TV). Although the basic structure is a two-stage pipeline, it is lengthened when an instruction requires it. For example, a standard ALU instruction uses 5 phases: the 2 pipeline stages (4 phases) plus 1 additional phase. This additional phase does not use the part of the datapath that is used by the next instruction, so the execution of the next instruction need not wait until the previous ALU instruction has ended. A DSP instruction takes more than 2 pipeline stages to execute and requires the same datapath resources as the next DSP instruction, so the next DSP instruction is delayed.

The pipeline structure of GMS30C2132 and the action of the datapath are described in Table 1.1.

Stage           Phase       Datapath Action

Fetch/Decode    TM (Low)    1. The instruction is read from the instruction cache according to the address of the instruction.
                TV (High)   2. The control signals of Rd (destination operand) and Rs (source operand) are activated according to the instruction that was loaded in the TM phase.
                            2.1 The control signals of IR (immediate register (operand)) and IL (instruction length) are activated.
                            2.2 The address of the next instruction is calculated and saved in PC.

Execute/Write   TM (Low)    1. The next instruction is read from the instruction cache.
                            1.1 The addresses of Rs and Rd are determined.
                            1.2 The immediate operand is determined.
                            1.3 The operand is read from the register stack using the addresses of Rs and Rd.
                            1.4 The operands XR, YR and QR are controlled.
                TV (High)   2. The input data of the ALU is attained.
                            2.1 The control of the ALU datapath is made and the instruction is executed in the ALU.
                            2.2 The result of the ALU operation is saved in the register file.

Additional      Next TM     The additional ALU operation is continued and its result is saved in the register file.
Insertion

Table 1.1: The pipeline structure of GMS30C2132 and the action of datapath.


1.2 Global Register Set

The architecture provides 32 global registers of 32 bits each. These are:

    G0         Program Counter PC
    G1         Status Register SR
    G2         Floating-point Exception Register FER
    G3..G15    General purpose registers
    G16..G17   Reserved
    G18        Stack Pointer SP
    G19        Upper Stack Bound UB
    G20        Bus Control Register BCR (see section 6. Bus Interface)
    G21        Timer Prescaler Register TPR (see section 5. Timer)
    G22        Timer Compare Register TCR (see section 5. Timer)
    G23        Timer Register TR (see section 5. Timer)
    G24        Watchdog Compare Register WCR (see section 6. Bus Interface)
    G25        Input Status Register ISR (see section 6. Bus Interface)
    G26        Function Control Register FCR (see section 6. Bus Interface)
    G27        Memory Control Register MCR (see section 6. Bus Interface)
    G28..G31   Reserved

Registers G0..G15 can be addressed directly by the register code (0..15) of an instruction. Registers G18..G27 can be addressed only by a MOV or MOVI instruction with the high global flag H set to 1.

(Example)

    MOVI G2, 0x20   ; G2 := 0x20 (set H flag)
    MOV G3, G19     ; G3 := G19 (G19 (UB) is copied to G3)
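The way the high global flag H selects the upper half of the global register set (section 1.2.2 states that with H set, denoting G0..G15 addresses G16..G31 instead) can be sketched as a simple mapping. The function addressed_global is a hypothetical helper written for illustration; it models only the register selection, not the privilege checks or the automatic clearing of H.

```python
def addressed_global(register_code, h_flag):
    """Return the index of the global register selected by a 4-bit register code.

    With H clear the code selects G0..G15; with H set it selects G16..G31.
    """
    if not 0 <= register_code <= 15:
        raise ValueError("register code must be in the range 0..15")
    return register_code + 16 if h_flag else register_code

# Names of the high global registers listed in section 1.2.
HIGH_GLOBAL_NAMES = {
    18: "SP", 19: "UB", 20: "BCR", 21: "TPR", 22: "TCR",
    23: "TR", 24: "WCR", 25: "ISR", 26: "FCR", 27: "MCR",
}

for code in range(2, 12):
    index = addressed_global(code, h_flag=True)
    print(f"code G{code} with H set -> G{index} ({HIGH_GLOBAL_NAMES[index]})")
```

The loop reproduces the correspondence given in the manual: denoting G2..G11 with H set addresses G18..G27.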

[Figure: the 32 global registers G0..G31, each 32 bits wide (bits 31..0), labelled with the register names listed in section 1.2.]

Figure 1.5: Global Register Set

1.2.1 Program Counter PC, G0

G0 is the program counter PC. It is updated to the address of the next instruction through instruction execution. Besides this implicit updating, the PC can also be addressed like a regular source or destination register. When the PC is referenced as an operand, the supplied value is the address of the first byte after the instruction which references it (the address of the next instruction), except when referenced by a delay instruction with a preceding delayed branch taken. On a Delayed Branch instruction, when the branch condition is met, the branch address PC + rel (relative to the address of the first byte after the Delayed Branch instruction) is placed in the PC (see section 3.26. Delayed Branch Instructions).

Placing a result in the PC has the effect of a branch taken. When a branch is taken, the target address of the branch is placed in the PC.

Bit zero of the PC is always zero, regardless of any value placed in the PC.


1.2.2 Status Register SR, G1

G1 is the status register SR. Its content is updated by instruction execution. Besides this implicit updating, the SR can also be addressed like a regular register (when the H flag is set). When addressed as source or destination operand, all 32 bits are used as an operand. However, only bits 15..0 of a result can be placed in bits 15..0 of the SR; bits 31..16 of the result are discarded and bits 31..16 of the SR remain unchanged. When the SR is addressed as a source operand, it represents the value 0x0. The full content of the SR is replaced only by the Return instruction. A result placed in the SR overrules any setting or clearing of the condition flags as a result of an instruction.
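The rule for placing a result in the SR can be expressed with two bit masks. The sketch below is illustrative: write_sr is a hypothetical name, and the function models only the masking described above (it does not model the Return instruction, which replaces the full content).

```python
LOW_HALF = 0x0000FFFF   # bits 15..0, writable by a result
HIGH_HALF = 0xFFFF0000  # bits 31..16, unchanged by a result

def write_sr(current_sr, result):
    """Return the new SR after a 32-bit result is placed in it."""
    return (current_sr & HIGH_HALF) | (result & LOW_HALF)

new_sr = write_sr(0xABCD0000, 0x12345678)
print(hex(new_sr))  # 0xabcd5678
```

Bits 31..16 of the result (0x1234) are discarded, and the upper half of the old SR (0xABCD) is preserved.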

[Figure: SR bits 31..16, containing the fields FP (Frame Pointer), FL (Frame Length), ILC (Instruction-Length Code), S (Supervisor State Flag), Trace Pending Flag and Trace-Mode Flag.]

Figure 1.6: Status Register SR (bits 31..16)

[Figure: SR bits 15..0, containing the fields L (Interrupt-Lock Flag), FRM (Floating-Point Rounding Mode), FTE (Floating-Point Trap Enable), I (Interrupt-Mode Flag), Reserved, H (High Global Flag), M (Cache-Mode Flag), V (Overflow Flag), N (Negative Flag), Z (Zero Flag) and C (Carry Flag).]

Figure 1.7: Status Register SR (bits 15..0)


The status register SR contains the following status information:

C    Carry Flag. Bit zero is the carry condition flag C. In general, when set it indicates that the unsigned integer range is exceeded (overflow). At add operations, it indicates a carry out of bit 31 of the result. At subtract operations, it indicates a borrow (inverse carry) into bit 31 of the result.

Z    Zero Flag. Bit one is the zero condition flag Z. When set, it indicates that all 32 or 64 result bits are equal to zero regardless of any carry, borrow or overflow.

N    Negative Flag. Bit two is the negative condition flag N. On compare instructions, it indicates the arithmetically correct (true) sign of the result regardless of an overflow. On all other instructions, it is derived from result bit 31, which is the true sign bit when no overflow occurs. In the case of overflow, result bit 31 and N reflect the inverted sign bit.

V    Overflow Flag. Bit three is the overflow condition flag V. In general, when set it indicates a signed overflow. At the Move instructions, it indicates a floating-point NaN (Not a Number).

M    Cache-Mode Flag. Bit four is the cache-mode flag M. Besides being set or cleared under program control, it is also automatically cleared by a Frame instruction and by any branch taken except a delayed branch. See section 1.8. Instruction Cache for details.

H    High Global Flag. Bit five is the high global flag H. When H is set, denoting G0..G15 addresses G16..G31 instead. Thus, the registers G18..G27 may be addressed by denoting G2..G11 respectively. The H flag is effective only in the first cycle of the next instruction after it was set; then it is cleared automatically. Only a MOV or MOVI instruction, issued as the next instruction, may be used to copy the content of a local register or an immediate value to one of the high global registers. The MOV instruction may also be used to copy the content of a high global register (except the BCR, TPR, FCR and MCR registers, which are write-only) to a local register. With all other instructions, the result may be invalid. If one of the high global registers is addressed as the destination register in user state (S = 0), the condition flags are undefined, the destination register remains unchanged and a trap to Privilege Error occurs.

Reserved    Bit six is reserved for future use. It must always be zero.

I    Interrupt-Mode Flag. Bit seven is the interrupt-mode flag I. It is set automatically on interrupt entry and reset to its old value by a Return instruction. The I flag is used by the operating system; it must never be changed by any user program.

FTE  Floating-Point Trap Enable Flags. Bits 12..8 are the floating-point trap enable flags. They determine the exception type and trap execution flow (see section 3.33.2. Floating-Point Instructions).
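The descriptions of the condition flags C, Z, N and V can be made concrete for a 32-bit add. The following is a minimal sketch that follows the general wording above (carry out of bit 31, all result bits zero, result bit 31, signed overflow); the function add_flags is hypothetical, and the sketch does not cover the special cases for compare, Move or 64-bit operations.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000

def add_flags(a, b):
    """Add two 32-bit values and return (result, flags) for a plain add."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    flags = {
        "C": int(full > MASK32),           # carry out of bit 31
        "Z": int(result == 0),             # all 32 result bits are zero
        "N": int(bool(result & SIGN_BIT)), # derived from result bit 31
        # Signed overflow: both operands have the same sign and the result differs.
        "V": int(bool((a ^ result) & (b ^ result) & SIGN_BIT)),
    }
    return result, flags

print(add_flags(0xFFFFFFFF, 0x00000001))
print(add_flags(0x7FFFFFFF, 0x00000001))
```

The first call wraps around to zero and sets C and Z; the second overflows the signed range, so V is set and N reflects the inverted sign bit, as described for the N flag.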
