HYNIX GMS30C2232, GMS30C2216 Datasheet

Jun. 29, 2001 Ver. 3.1

16/32 BIT RISC/DSP

GMS30C2216 GMS30C2232

USER’S MANUAL

Revision 3.1, published by the IDA Team, Hynix Semiconductor Inc.


Hynix offices in Korea, or the distributors and representatives listed in the address directory, can provide additional information about this manual.

Hynix reserves the right to make changes to any information herein at any time without notice.

The information, diagrams, and other data in this manual are correct and reliable; however, Hynix is in no way responsible for any infringement of patents or other rights of third parties arising from the use of this manual.

Specifications and information in this document are subject to change without notice and do not represent a commitment on the part of Hynix. Hynix reserves the right to make changes to improve functioning. Although the information in this document has been carefully reviewed, Hynix does not assume any liability arising out of the use of the product or circuit described herein.

Hynix does not authorize the use of the Hynix microprocessor in life support applications wherein a failure or malfunction of the microprocessor may directly threaten life or cause injury. The user of the Hynix microprocessor in life support applications assumes all risks of such use and indemnifies Hynix against all damages.

For further information please contact:

SEOUL OFFICE: Hynix YOUNG DONG Bldg. 891, Daechi-dong, Kangnam-gu, Seoul, Korea. PHONE: (02) 3459-3662~3 FAX: (02) 3459-3942

SYSTEM IC: 1, Hyangjeong-dong, Hungduk-gu, Cheongju, 361-725, Korea. PHONE: (0431) 270-4030~47 FAX: (0431) 270-4075

Table of Contents i

Table of Contents

0. Overview

0.1 GMS30C2216/32 RISC/DSP

0.2 Block Diagram

0.3 Pin Configuration

0.3.1 GMS30C2232, 160-Pin MQFP-Package - View from Top Side

0.3.2 Pin Cross Reference by Pin Name

0.3.3 Pin Cross Reference by Location

0.3.4 Pin Function

1. Architecture

1.1 Introduction

1.1.1 RISC Architecture

1.1.2 Techniques to reduce CPI (Cycles per Instruction)

1.1.3 The pipeline structure of GMS30C2232

1.2 Global Register Set

1.2.1 Program Counter PC, G0

1.2.2 Status Register SR, G1

1.2.3 Floating-Point Exception Register FER, G2

1.2.4 Stack Pointer SP, G18

1.2.5 Upper Stack Bound UB, G19

1.2.6 Bus Control Register BCR, G20

1.2.7 Timer Prescaler Register TPR, G21

1.2.8 Timer Compare Register TCR, G22

1.2.9 Timer Register TR, G23

1.2.10 Watchdog Compare Register WCR, G24

1.2.11 Input Status Register ISR, G25

1.2.12 Function Control Register FCR, G26

1.2.13 Memory Control Register MCR, G27

1.3 Local Register Set

1.4 Privilege States

1.5 Register Data Types

1.6 Memory Organization

1.7 Stack

1.8 Instruction Cache

1.9 On-Chip Memory (IRAM)


2. Instructions General

2.1 Instruction Notation

2.2 Instruction Execution

2.3 Instruction Formats

2.3.1 Table of Immediate Values

2.3.2 Table of Instruction Codes

2.3.3 Table of Extended DSP Instruction Codes

2.4 Entry Tables

2.5 Instruction Timing

3. Instruction Set

3.1 Memory Instructions

3.1.1 Address Modes

3.1.2 Load Instructions

3.1.3 Store Instructions

3.2 Move Word Instructions

3.3 Move Double-Word Instruction

3.4 Logical Instructions

3.5 Invert Instruction

3.6 Mask Instruction

3.7 Add Instructions

3.8 Sum Instructions

3.9 Subtract Instructions

3.10 Negate Instructions

3.11 Multiply Word Instruction

3.12 Multiply Double-Word Instructions

3.13 Divide Instructions

3.14 Shift Left Instructions

3.15 Shift Right Instructions

3.16 Rotate Left Instruction

3.17 Index Move Instructions

3.18 Check Instructions

3.19 No Operation Instruction

3.20 Compare Instructions

3.21 Compare Bit Instructions

3.22 Test Leading Zeros Instruction

3.23 Set Stack Address Instruction

3.24 Set Conditional Instructions

3.25 Branch Instructions

3.26 Delayed Branch Instructions

3.27 Call Instruction

3.28 Trap Instructions

3.29 Frame Instruction

3.30 Return Instruction

3.31 Fetch Instruction

3.32 Extended DSP Instructions

3.33 Software Instructions

3.33.1 Do Instruction

3.33.2 Floating-Point Instructions

4. Exceptions

4.1 Exception Processing

4.2 Exception Types

4.2.1 Reset

4.2.2 Range, Pointer, Frame and Privilege Error

4.2.3 Extended Overflow

4.2.4 Parity Error

4.2.5 Interrupt

4.2.6 Trace Exception

4.3 Exception Backtracking

5. Timer and CPU Clock Modes

5.1 Overview

5.1.1 Timer Prescaler Register TPR

5.1.2 Timer Register TR

5.1.3 Timer Compare Register TCR

5.1.4 Power-Down Mode

5.1.5 Additional Power Saving

5.1.6 Sleep Mode


6. Bus Interface

6.1 Bus Control General

6.1.1 Boot Width Selection

6.1.2 SRAM and ROM Bus Access

6.1.3 DRAM Bus Access

6.1.3.1 DRAM Row Address Bits Multiplexing

6.2 I/O Bus Control

6.2.1 I/O Bus Control

6.3 Bus Control Register BCR

6.4 Memory Control Register MCR

6.4.1 MEMx Parity Disable

6.4.2 MEMx Wait Disable

6.4.3 MEMx Byte Mode

6.4.4 Power Down

6.4.5 IRAM Refresh Test

6.4.6 IRAM Refresh Rate

6.4.7 DRAM Type

6.4.8 Entry Table Map

6.4.10 MEMx Bus Size

6.5 Input Status Register ISR

6.6 Function Control Register FCR

6.7 Watchdog Compare Register WCR

6.8 IO3 Control Modes

6.8.1 IO3 Standard Mode

6.8.2 Watchdog Mode

6.8.3 IO3 Timing Mode

6.8.4 IO3 Timer Interrupt Mode

6.9 Bus Signals

6.9.1 Bus Signals for the GMS30C2232 Processor

6.9.2 Bus Signals for the GMS30C2216 Processor

6.9.3 Bus Signal Description

6.10 Bus Cycles

6.10.1 MEMx Byte Mode = 1

6.10.1.1 SRAM and ROM Single-Cycle Read Access

6.10.1.2 SRAM and ROM Single-Cycle Write Access

6.10.1.3 SRAM and ROM Multi-Cycle Read Access

6.10.1.4 SRAM Multi-Cycle Write Access

6.10.2 MEMx Byte Mode = 0

6.10.2.1 SRAM Single-Cycle Read Access

6.10.2.2 SRAM Single-Cycle Write Access

6.10.2.3 SRAM Multi-Cycle Read Access

6.10.2.4 SRAM Multi-Cycle Write Access

6.10.3 MEM2 Read Access with WAIT Pin

6.10.4 I/O Read Access

6.10.5 I/O Read Access with WAIT Pin

6.10.6 I/O Write Access

6.10.7 DRAM

6.10.7.1 Fast Page Mode DRAM Access

6.10.7.2 EDO DRAM Single-Cycle Access

6.10.7.3 EDO DRAM Multi-Cycle Access

6.10.7.4 DRAM Refresh (CAS Before RAS Refresh)

6.11 DC Characteristics

7. Mechanical Data

7.1 GMS30C2232, 160-Pin MQFP-Package

7.1.1 Pin Configuration - View from Top Side

7.1.2 Pin Cross Reference by Pin Name

7.1.3 Pin Cross Reference by Location

7.2 GMS30C2232, 144-Pin TQFP-Package

7.2.1 Pin Configuration - View from Top Side

7.2.2 Pin Cross Reference by Pin Name

7.2.3 Pin Cross Reference by Location

7.3 GMS30C2216, 100-Pin TQFP-Package

7.3.1 Pin Configuration - View from Top Side

7.3.2 Pin Cross Reference by Pin Name

7.3.3 Pin Cross Reference by Location

7.4 Package Dimensions

Appendix. Instruction Set Detail

Overview 0-1

0. Overview

0.1 GMS30C2216/32 RISC/DSP

The HME GMS30C2232 and GMS30C2216 RISC/DSP are improved versions of HME’s existing GMS30C2132 and GMS30C2116 RISC/DSP. The use of a 0.35 µm CMOS technology further improves the performance of the RISC/DSP. Because they are pin-compatible with their predecessors, the new RISC/DSP can be used as direct replacements in existing customer designs.

The GMS30C2216 and GMS30C2232 RISC/DSP are based on the hyperstone architecture.

Improved Points

• Maximum operating frequency: 108MHz @ 3.3V

• Operating voltage: 3.3V ± 0.3V

• 8KByte on-chip memory

• On-chip phase-locked loop (PLL) circuit (x0.5, x1, x2, x4)

• Boot bus width selectable by two external pins

• Wait pin function

• On-chip DRAM controller: Fast-Page-Mode (FPM) and Extended-Data-Out (EDO) DRAMs

• 5.0V tolerant input

• CLKOUT pin control function

This combination of a high-performance RISC microprocessor with an additional powerful DSP instruction set and on-chip microcontroller functions offers a high throughput. The speed is obtained by an optimized combination of the following features:

• Pipelined memory access allows overlapping of memory accesses with execution.

• 8KByte on-chip memory.

• On-chip instruction cache omits instruction fetch in inner loops and provides prefetch.

• Variable-length instructions of 16, 32 or 48 bits provide a large, powerful instruction set, thereby reducing the number of instructions to be executed.

• Primarily used 16-bit instructions halve the memory bandwidth required for instruction fetch in comparison to conventional RISC architectures with fixed-length 32-bit instructions, and also yield better code economy than conventional CISC architectures.

• Orthogonal instruction set

• Most instructions execute in one cycle.

• Pipelined DSP instructions.

• Parallel execution of ALU and DSP instructions.

• Single-cycle halfword multiply-accumulate operation.

• Fast Call and Return by parameter passing via registers.


• An instruction pipeline depth of only two stages (decode/execute) provides branching without insertion of wait cycles in combination with Delayed Branch instructions.

• Range and pointer checks are performed without speed penalty; these checks therefore no longer need to be turned off, which provides higher runtime reliability.

• Separate address and data buses provide a throughput of one 32-bit word each cycle.

The features noted above help reduce the number of idle wait cycles to a bare minimum. The processor is designed to sustain its execution rate with a standard DRAM memory.
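As a rough illustration of the instruction-fetch bandwidth point in the feature list above, the following Python sketch compares the bytes fetched for a mix of 16-, 32- and 48-bit instructions with the bytes fetched if every instruction were a fixed 32-bit word. The function names and the example mix are hypothetical and chosen only for illustration; the sketch also assumes the same instruction count in both encodings, which is a simplification.

```python
def fetch_bytes(mix):
    """Bytes fetched for a variable-length instruction mix.

    mix maps an instruction length in bits (16, 32 or 48) to a count.
    """
    for bits in mix:
        if bits not in (16, 32, 48):
            raise ValueError(f"unsupported instruction length: {bits}")
    return sum(bits // 8 * count for bits, count in mix.items())


def fixed32_fetch_bytes(mix):
    """Bytes fetched if every instruction were a fixed 32-bit word."""
    return 4 * sum(mix.values())


# Hypothetical mix, chosen only for illustration (not measured data).
example_mix = {16: 80, 32: 15, 48: 5}
print(fetch_bytes(example_mix), fixed32_fetch_bytes(example_mix))  # prints "250 400"
```

When every instruction in the mix is 16 bits long, the first figure is exactly half the second, which is the limiting case the list describes.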

The low power consumption is an advantage for mobile (portable) applications or in temperature-sensitive environments.

Most of the transistors are used for the on-chip memory, the instruction cache, the register stack and the multiplier, whereas only a small number is required for the control logic.

Due to their low system cost, the GMS30C2216 and GMS30C2232 RISC/DSP are very well suited for embedded-systems applications requiring high performance at the lowest cost. To simplify board design and to reduce system costs, the GMS30C2216 and GMS30C2232 come with integrated peripherals, such as a timer and memory and bus control logic. Complete systems with HME’s microprocessor can therefore be implemented with a minimum of external components. No glue logic is necessary to connect any kind of memory or I/O. The processor is even suitable for systems where, up to now, microprocessors with a 16-bit architecture have been used for cost reasons. Its improved performance compared to conventional microcontrollers can be used to replace many external peripherals, such as graphics controllers or DSPs, with software.

The software development tools include an optimizing C compiler, an assembler, and a source-level debugger with profiler, as well as a real-time kernel with an extremely fast response time. Using this real-time kernel, up to 31 tasks, each with its own virtual timer, can be developed independently of each other. The real-time kernel synchronizes these tasks almost automatically. To the developer, it seems as if up to 31 HME microprocessors were available, to which programs can be allocated accordingly. Real-time debugging of multiple tasks is also supported.

The following description gives a brief architectural overview:

Compatibility:

• Pin-compatible with HME GMS30C2116/32 and hyperstone E1-16/32

• Pin- and function-compatible with hyperstone E1-16/32X

PLL (Phase-Locked Loop):

• An internal phase-locked loop (PLL) circuit provides clock rate multiplication by a factor of four; only an external crystal of 27MHz is required to achieve an internal clock rate of 108MHz.
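The clock arithmetic behind this statement can be sketched as follows. The multiplication factors are the ones this manual lists for the on-chip PLL (x0.5, x1, x2, x4); the function and constant names are hypothetical, and how a factor is actually selected on the device is not modeled here.

```python
# Multiplication factors this manual lists for the on-chip PLL.
PLL_FACTORS = (0.5, 1, 2, 4)


def internal_clock_mhz(crystal_mhz, factor):
    """Internal clock rate in MHz for a given crystal and PLL factor."""
    if factor not in PLL_FACTORS:
        raise ValueError(f"unsupported PLL factor: {factor}")
    return crystal_mhz * factor


print(internal_clock_mhz(27, 4))  # prints 108
```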


Registers:

• 32 global and 64 local registers of 32 bits each

• 16 global and up to 16 local registers are addressable directly

Flags:

• Zero (Z), negative (N), carry (C) and overflow (V) flags

• Interrupt-mode, interrupt-lock, trace-mode, trace-pending, supervisor state, cache-mode and high global flags

Register Data Types:

• Unsigned integer, signed integer, signed short, signed complex short, 16-bit fixed-point, bitstring, IEEE-754 floating-point, each either 32 or 64 bits

External Memory:

• Address space of 4Gbytes, divided into five areas

• Separate I/O address space

• Load/Store architecture

• Pipelined memory and I/O accesses

• High-order data located and addressed at lower address (big endian)

• Instructions and double-word data may cross DRAM page boundaries
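The big-endian ordering noted in the list above (high-order data at the lower address) can be illustrated with a small Python sketch. The byte-addressed `bytearray` stands in for external memory, and the helper names are hypothetical; this shows only the byte order, not the processor's bus behavior.

```python
def store_word(memory, address, value):
    """Store a 32-bit word with the high-order byte at the lowest address."""
    memory[address:address + 4] = value.to_bytes(4, "big")


def load_word(memory, address):
    """Load a 32-bit big-endian word from memory."""
    return int.from_bytes(memory[address:address + 4], "big")


memory = bytearray(8)
store_word(memory, 0, 0x12345678)
print(hex(memory[0]), hex(memory[3]))  # prints "0x12 0x78"
```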

On-chip Memory:

• 8Kbytes internal (on-chip) memory

Memory Data Types:

• Unsigned and signed byte (8 bit)

• Unsigned and signed halfword (16 bit), located on halfword boundary

• Undedicated word (32 bit), located on word boundary

• Undedicated double-word (64 bit), located on word boundary
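The boundary rules in the list above can be expressed as a simple alignment check. This is a minimal sketch: the table and function names are hypothetical, and the required alignments are taken directly from the list (halfwords on a halfword boundary, words and double-words on a word boundary).

```python
# Required address alignment in bytes for each memory data type,
# as stated in the list above.
ALIGNMENT = {
    "byte": 1,
    "halfword": 2,
    "word": 4,
    "double-word": 4,  # double-words are located on a word boundary
}


def is_aligned(address, data_type):
    """Return True if address satisfies the boundary rule for data_type."""
    return address % ALIGNMENT[data_type] == 0
```

For example, address 6 is a valid halfword address but not a valid word address, while address 12 is valid for both a word and a double-word.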

Runtime Stack:

• Runtime stack is divided into memory part and register part

• Register part is implemented by the 64 local registers holding the most recent stack frame(s)

• Current stack frame (maximum 16 registers) is always kept in register part of the stack

• Data transfer between memory and register part of the stack is automatic

• Upper stack bound is guarded


Instruction Cache:

• An on-chip instruction cache reduces instruction memory access substantially

Instructions General:

• Variable-length instructions of one, two or three halfwords halve the required memory bandwidth

• Pipeline depth of only two stages assures immediate refill after branches

• Register instructions of type "source operator destination ⇒ destination" or "source operator immediate ⇒ destination"

• All register bits participate in an operation

• Immediate operands of 5, 16 and 32 bits, zero- or sign-expanded

• Large address displacement of up to 28 bits

• Two sets of signed arithmetic instructions: one set only sets or clears the overflow flag, the other additionally traps to a Range Error routine on overflow

• DSP instructions operate on 16-bit integer, real and complex fixed-point data and 32-bit integer data, with results going into 32-bit and 64-bit hardware accumulators
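The difference between the two sets of signed arithmetic instructions can be sketched in Python for a 32-bit signed add. The function names and the `RangeError` exception class are hypothetical stand-ins: the first function only reports an overflow flag, the second models the trapping variant by raising an exception in place of entering a Range Error routine.

```python
class RangeError(Exception):
    """Stand-in for a trap to the Range Error routine."""


def add_signed(a, b):
    """32-bit signed add that only reports overflow.

    Returns (result, overflow_flag); the result wraps to 32 bits.
    """
    raw = a + b
    wrapped = (raw + 2**31) % 2**32 - 2**31
    return wrapped, wrapped != raw


def add_signed_trapping(a, b):
    """32-bit signed add that traps (raises) on overflow."""
    result, overflow = add_signed(a, b)
    if overflow:
        raise RangeError("signed overflow in add")
    return result
```

With the first variant, software decides whether to test the flag; with the second, an overflow cannot go unnoticed.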

Instruction Summary:

• Memory instructions pipelined to a depth of two stages, trap on address register equal to zero (check for invalid pointers)

• Memory address modes: register address, register postincrement, register + displacement (including PC relative), register postincrement by displacement (next address), absolute, stack address, I/O absolute and I/O displacement

• Load, all data types, bytes and halfwords right adjusted and zero- or sign-expanded, execution proceeds after Load until data is needed

• Store, all data types, trap when range of signed byte or halfword is exceeded

• Move, Move immediate, Move double-word

• Logical instructions AND, AND not, OR, XOR, NOT, AND not immediate, OR immediate, XOR immediate

• Mask source and immediate ⇒ destination

• Add unsigned/signed, Add signed with trap on overflow, Add with carry

• Add unsigned/signed immediate, Add signed immediate with trap on overflow

• Sum source + immediate ⇒ destination, unsigned/signed and signed with trap on overflow

• Subtract unsigned/signed, Subtract signed with trap on overflow, Subtract with carry

• Negate unsigned/signed, Negate signed with trap on overflow

• Multiply word * word ⇒ low-order word unsigned or signed, Multiply word * word ⇒ double-word unsigned and signed

• Divide double-word by word ⇒ quotient and remainder, unsigned and signed

• Shift left unsigned/signed, single and double-word, by constant and by content of register

• Shift right unsigned and signed, single and double-word, by constant and by content of register

• Rotate left single word by content of register

• Index Move, move an index value scaled by 1, 2, 4 or 8, optionally with bounds check

• Check a value for an upper bound specified in a register or check for zero

• Compare unsigned/signed, Compare unsigned/signed immediate

• Compare bits, Compare bits immediate, Compare any byte zero

• Test number of leading zeros

• Set Conditional, save conditions in a register

• Branch unconditional and conditional (12 conditions)

• Delayed Branch unconditional and conditional (12 conditions)

• Call subprogram, unconditional and on overflow

• Trap to supervisor subprogram, unconditional and conditional (11 conditions)

• Frame, structure a new stack frame, include parameters in frame addressing, set frame length, restore reserve frame length and check for upper stack bound

• Return from subprogram, restore program counter, status register and return-frame

• Software instruction, call an associated subprogram and pass a source operand and the address of a destination operand to it

• DSP Multiply instructions: signed and/or unsigned multiplication ⇒ single and double word product

• DSP Multiply-Accumulate instructions: signed multiply-add and multiply-subtract ⇒ single and double word product sum and difference

• DSP Halfword Multiply-Accumulate instructions: signed multiply-add operating on four halfword operands ⇒ single and double word product sum

• DSP Complex Halfword Multiply instruction: signed complex halfword multiplication ⇒ real and imaginary single word product

• DSP Complex Halfword Multiply-Accumulate instruction: signed complex halfword multiply-add ⇒ real and imaginary single word product sum


• DSP Add and Subtract instructions: signed halfword add and subtract with and without fixed-point adjustment ⇒ single word sum and difference

• Floating-point instructions are architecturally fully integrated; in the present version they are executed as Software instructions. Floating-point Add, Subtract, Multiply, Divide, Compare and Compare unordered for single and double-precision, and Convert single ⇔ double are provided.
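To make the DSP Complex Halfword Multiply entry in the summary above more concrete, the following Python sketch computes the real and imaginary products of two complex values with signed 16-bit parts. It shows only the standard complex multiplication; the function name is hypothetical, and operand packing, fixed-point adjustment and overflow handling on the actual hardware are not modeled.

```python
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


def complex_halfword_multiply(a, b):
    """Multiply two complex numbers given as (real, imag) halfword pairs.

    Returns the (real, imag) products as plain Python integers.
    """
    for part in (*a, *b):
        if not INT16_MIN <= part <= INT16_MAX:
            raise ValueError(f"operand {part} is not a signed halfword")
    a_re, a_im = a
    b_re, b_im = b
    real = a_re * b_re - a_im * b_im
    imag = a_re * b_im + a_im * b_re
    return real, imag


print(complex_halfword_multiply((1, 2), (3, 4)))  # prints (-5, 10)
```

A multiply-accumulate variant would add these two products to running real and imaginary sums.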

Exceptions:

• Pointer, Privilege, Frame and Range Error, Extended Overflow, Parity Error, Interrupt and Trace mode exception

• Watchdog function

• Error-causing instructions can be identified by backtracking, thus allowing a very detailed error analysis

Timer:

• Two multifunctional timers

Bus Interface:

• Separate address bus of 26 (GMS30C2232) or 22 (GMS30C2216) bits and data bus of up to 32 (GMS30C2232) or 16 bits (GMS30C2216) provide a throughput of four or two bytes at each clock cycle

• Data bus width of 32, 16 or 8 bits, individually selectable for each external memory area.

• 8-bit, 16-bit, and 32-bit boot width selectable via two external pins.

• 5V tolerant input

• Configurable I/O pins

• Internal generation of all memory and I/O control signals

• Wait pin function for I/O accesses to peripheral devices.

• Wait pin function for memory accesses to address space MEM2.

• On-chip DRAM controller supporting Fast-Page-Mode DRAMs and EDO DRAMs.

• Up to seven vectored interrupts

• Control function for CLKOUT pin.
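The throughput figure in the first item of the list above (four or two bytes per clock cycle) translates into a peak transfer rate as sketched below. The names are hypothetical, and the calculation assumes one bus transfer per clock cycle at the given clock rate with no wait cycles, so it is an upper bound only and says nothing about sustained rates.

```python
# Maximum data bus width in bits for each device, from the list above.
DATA_BUS_BITS = {"GMS30C2232": 32, "GMS30C2216": 16}


def peak_throughput_mbytes_per_s(device, clock_mhz):
    """Peak bus throughput in Mbytes/s, assuming one transfer per clock."""
    bytes_per_cycle = DATA_BUS_BITS[device] // 8
    return bytes_per_cycle * clock_mhz


print(peak_throughput_mbytes_per_s("GMS30C2232", 108))  # prints 432
print(peak_throughput_mbytes_per_s("GMS30C2216", 108))  # prints 216
```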

Power Management:

• Operating voltage : 3.3V ± 0.3V.

• Lower power supply current in power-down mode.

• Clock-Off function to further reduce power dissipation (Sleep Mode)

0.2 Block Diagram

Figure 0.1: Block Diagram

(Figure not reproduced. Blocks labeled in the diagram include the bus interface control unit with data bus parity and bus pipeline control, the 8 kByte RAM, the register set (labeled 64 Local, 26 Global) with X and Y decode, the ALU and barrel shifter, the DSP execution unit with hardware multiplier, the instruction cache with cache control, instruction decode, the instruction prefetch control unit, the store data pipeline, the memory address pipeline, the internal timer, interrupt control, the watchdog, and power-down and reset control, connected to the address and control buses.)

0.3 Pin Configuration

0.3.1 GMS30C2232, 160-Pin MQFP-Package - View from Top Side

[Figure: pinout drawing of the 160-pin MQFP package, pins 1..160. The pin assignments are listed in section 0.3.2.]

Figure 0.2: GMS30C2232, 160-Pin MQFP-Package


0.3.2 Pin Cross Reference by Pin Name

Signal   Location   Signal  Location   Signal     Location   Signal       Location
A0       97         D3      59         GND        50         NC           118
A1       98         D4      58         GND        56         NC           123
A2       99         D5      57         GND        65         NC           124
A3       100        D6      51         GND        68         NC           157
A4       137        D7      48         GND        73         NC           158
A5       138        D8      47         GND        79         OE#          113
A6       139        D9      45         GND        82         RAS#         11
A7       141        D10     36         GND        90         RESET#       74
A8       142        D11     35         GND        96         RQST         89
A9       20         D12     34         GND        108        VCC          1
A10      21         D13     33         GND        119        VCC          13
A11      22         D14     31         GND        122        VCC          24
A12      23         D15     30         GND        126        VCC          32
A13      127        D16     103        GND        130        VCC          40
A14      131        D17     102        GND        136        VCC          41
A15      150        D18     101        GND        145        VCC          49
A16      151        D19     69         GND        148        VCC          53
A17      154        D20     67         GND        153        VCC          60
A18      155        D21     66         GND        159        VCC          64
A19      12         D22     55         GRANT#     75         VCC          72
A20      14         D23     54         INT1       85         VCC          76
A21      15         D24     52         INT2       86         VCC          80
A22      143        D25     29         INT3/WAIT  87         VCC          81
A23      146        D26     27         INT4       88         VCC          104
A24      147        D27     26         IO1        91         VCC          112
A25      149        D28     25         IO2        105        VCC          120
ACT      128        D29     19         IO3        5          VCC          121
BOOTB    156        D30     18         IORD#      114        VCC          129
BOOTW    93         D31     17         IOWR#      6          VCC          133
CAS0#    132        DP0     94         NC         3          VCC          140
CAS1#    109        DP1     95         NC         4          VCC          144
CAS2#    110        DP2     70         NC         37         VCC          152
CAS3#    111        DP3     71         NC         38         VCC          160
CLKOUT   92         GND     2          NC         43         WE#          125
CS1#     9          GND     10         NC         44         WE0#/BE0#    135
CS2#     8          GND     16         NC         77         WE1#/BE1#    134
CS3#     7          GND     28         NC         78         WE2#/BE2#    115
D0       63         GND     39         NC         83         WE3#/BE3#    116
D1       62         GND     42         NC         84         XTAL1/CLKIN  107
D2       61         GND     46         NC         117        XTAL2        106


0.3.3 Pin Function

Type            Name           State  Use
Power           VCC            I      Power. Connected to the power supply. It can be a 3.3V power supply.
                GND            I      Ground. Connected to the system ground. All GND pins must be connected to the system ground.
Clock           XTAL1          I      Input for Quartz Clock. When the clock is generated by an external clock generator, XTAL1 is used as clock input.
                XTAL2          O      Output for Quartz Clock.
                CLKOUT         O      Clock Signal Output. It can be used to supply a clock signal to peripheral devices.
Address Bus     A25..A0        O/Z    Address Bus. With the GMS30C2232, only A22..A0 are connected to the address bus pins.
Data Bus        D31..D0        I/O    Data Bus. 32-bit bidirectional data bus.
                DP0..DP3       I/O    Data Parity Signal. Bidirectional parity signals.
Bus Control     RAS#           O/Z    Row Address Strobe. RAS# is activated when the processor accesses a DRAM or performs a refresh cycle. When an SRAM is placed in MEM0, RAS# is used as the chip select signal.
                CAS0#..CAS3#   O/Z    Column Address Strobe. They are only used by a DRAM for column access cycles and for "CAS before RAS" refresh.
                WE#            O/Z    Write Enable. Active low indicates a write access, active high indicates a read access.
                CS1#..CS3#     O/Z    Chip Select. Active low of CS1#..CS3# indicates chip select for the memory areas MEM1..MEM3.
                WE0#..WE3#     O/Z    SRAM Write Enable. Active low indicates write enable for the corresponding byte.
                OE#            O/Z    Output Enable for SRAMs and EPROMs.
                IORD#          O/Z    I/O Read Strobe, optionally I/O Data Strobe. The use of IORD# is specified in the I/O address bit 10.
                IOWR#          O/Z    I/O Write Strobe.
Bus Control     RQST           O      RQST signals the request for a memory or I/O access.
                GRANT#         I      Bus Grant. GRANT# is signaled low by a bus arbiter to grant access to the bus for memory and I/O cycles.
                ACT            O      Active as bus master. ACT is signaled high when GRANT# is low and it is kept high during a current bus access.
Interrupt       INT1..INT4     I      Interrupt Request. A signal on the INT1..INT4 interrupt request pins causes an interrupt exception when the interrupt lock flag L is clear and the corresponding INTxMask bit in the FCR is not set.
I/O Port        IO1..IO3       I/O    General Input-Output Port. IO1..IO3 can be individually configured via the IOxDirection bits in the FCR as either input or output pins (port).
System Control  RESET#         I      Reset Processor. RESET# low resets the processor to the initial state and halts all activity. RESET# must be low for at least two cycles.


1. Architecture

1.1 Introduction

1.1.1 RISC Architecture

In the early days of computer history, most computer families started with a rather simple instruction set. The main reason for this simplicity was the high cost of hardware. Over the past three decades, hardware cost has dropped and software cost has risen steadily.

The net result is that more and more functions have been built into the hardware, making the instruction set very large and very complex. The growth of instruction sets was also encouraged by the popularity of microprogrammed control in the 1960s and 1970s. Even user-defined instruction sets were implemented using microcodes in some processors for special-purpose applications.

The evolution of computer architectures has been dominated by families of increasingly complex processors. Under market pressures to preserve existing software, Complex Instruction Set Computer (CISC) architectures evolved by the gradual addition of microcode and increasingly elaborate operations. The intent was to supply more support for high-level languages and operating systems, as semiconductor advances made it possible to fabricate more complex integrated circuits. It seemed self-evident that architectures should become more complex as these technological advances made it possible to hold more complexity on VLSI devices.

In recent years, however, Reduced Instruction Set Computer (RISC) architectures have implemented a much more sophisticated handling of the complex interaction between hardware, firmware and software. RISC concepts emerged from statistical analysis of how software actually uses the resources of a processor. Dynamic measurements of system kernels and of object modules generated by optimizing compilers show an overwhelming predominance of the simplest instructions, even in code for CISC machines. Complex instructions are often ignored because a single way of performing a complex operation rarely matches the needs of high-level language and system environments. RISC designs eliminate the microcoded routines and turn the low-level control of the machine over to software.

This approach is not new, but its application has become more universal in recent years thanks to the prevalence of high-level languages, the development of compilers that can optimize at the microcode level, and dramatic advances in semiconductor memory and packaging. It is now feasible to replace machine microcode ROM with faster RAM, organized as an instruction cache. Machine control then resides in the instruction cache and is, in fact, customized on the fly. The instruction stream generated by system- and compiler-generated code provides a precise fit between the requirements of high-level software and the capabilities of the hardware. Compilers therefore play a vital role in RISC performance.

The advantages of the RISC architecture are as follows:

• Simplicity made VLSI implementation possible, and thus higher clock rates.

• Hardwired control and separate data and program caches lower the average CPI (Cycles per Instruction) significantly.

• The dynamic instruction count of a RISC program increases only slightly (by a factor of less than 2) for an ordinary program.

• Recently, the MIPS (Million Instructions per Second) rate of a typical RISC microprocessor increased by a factor of 5/(2*0.1) = 25 over that of a typical CISC microprocessor.

• The clock rate increased from 10 MHz on a CISC processor to 50 MHz on a CMOS/RISC microprocessor.

• The instruction count in a typical RISC program increased by less than 2 times from that of a typical CISC program.

• The average CPI for a RISC microprocessor decreased to 1.2 (instead of 12 as in a typical CISC processor).
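The factor of 25 in the list above can be reproduced from the usual relation: execution time = instruction count × CPI / clock rate. The following Python sketch uses the figures from the list. The function name and the baseline of one million CISC instructions are illustrative assumptions, and the RISC program is assumed to need the worst-case factor of 2 more instructions.

```python
def execution_time(instruction_count, cpi, clock_hz):
    """Seconds to run a program: instructions x cycles/instruction / clock rate."""
    return instruction_count * cpi / clock_hz

# Figures from the list above. The baseline of 1,000,000 CISC instructions is
# arbitrary (assumption); the RISC program is assumed to need twice as many.
cisc_time = execution_time(1_000_000, cpi=12.0, clock_hz=10e6)
risc_time = execution_time(2_000_000, cpi=1.2, clock_hz=50e6)

# speedup = clock ratio / (instruction-count ratio * CPI ratio) = 5 / (2 * 0.1)
speedup = cisc_time / risc_time
print(f"speedup = {speedup:.1f}x")
```

With a smaller instruction-count penalty than 2, the same relation gives a correspondingly larger speedup.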

1.1.2 Techniques to reduce CPI (Cycles per Instruction)

If the work each instruction performs is simple and straightforward, the time required to execute each instruction can be shortened and the number of cycles reduced. The goal of RISC designs has been to achieve an execution rate of one instruction per machine cycle (multiple-instruction-issue designs now seek to increase this rate to more than one instruction per cycle). Techniques that help achieve this goal include:

• Instruction pipelines

• Load and store (load/store) architecture

• Delayed load instructions

• Delayed branch instructions

(1) Instruction Pipelines

One way to reduce the number of cycles required to execute an instruction is to overlap the execution of multiple instructions. Instruction pipelines divide the execution of each instruction into several discrete portions and then execute multiple instructions simultaneously. The instruction pipeline technique can be likened to an assembly line: the instruction progresses from one specialized stage to the next until it is complete (or issued), just as an automobile moves along an assembly line. (This is in contrast to the nonpipelined, microcoded approach, where all the work is done by one general unit that is less capable at each individual task.) For example, the execution of an instruction might be subdivided into four portions, or clock cycles, as shown in Figure 1.1:

[Figure: four consecutive cycles: Fetch Instruction (F), ALU Operation (A), Access Memory (M), Write Results (W)]

Figure 1.1: Functional Division of a Hypothetical Pipeline


An instruction pipeline can potentially reduce the number of cycles/instruction by a factor equal to the depth of the pipeline (the depth of the pipeline = the number of resources). For example, in Figure 1.2 each instruction still requires a total of four clock cycles to execute. However, if a four-level instruction pipeline is used, a new instruction can be initiated at each clock cycle and the effective execution rate is one cycle per instruction.

[Figure: successive instructions plotted against clock cycles, each passing through F, A, M and W, staggered by one clock cycle]

Figure 1.2: Multiple Instructions in a Hypothetical Pipeline
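The effect of overlapping can be checked with a little arithmetic. The sketch below assumes an ideal four-stage pipeline with no stalls, and the function names are illustrative. The first instruction needs four cycles to pass through F, A, M and W, and every later instruction completes one cycle after its predecessor.

```python
def pipeline_cycles(num_instructions, depth=4):
    """Total clock cycles for an ideal pipeline of the given depth, no stalls."""
    if num_instructions == 0:
        return 0
    # Fill the pipeline once, then retire one instruction per cycle.
    return depth + (num_instructions - 1)

def unpipelined_cycles(num_instructions, depth=4):
    """Without overlap, each instruction occupies the machine for `depth` cycles."""
    return depth * num_instructions

for n in (1, 4, 100):
    total = pipeline_cycles(n)
    # Columns: instructions, pipelined cycles, cycles/instruction, unpipelined cycles
    print(n, total, round(total / n, 2), unpipelined_cycles(n))
```

For long instruction streams the cycles/instruction figure approaches one, which is the effective execution rate described in the text.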

(2) Load/Store Architecture

The discussion of the instruction pipeline illustrates how each instruction can be subdivided into several discrete parts that permit the processor to execute multiple instructions in parallel. For this technique to work efficiently, the time required to execute each instruction subpart should be approximately equal. If one part requires an excessive length of time, there is an unpleasant choice: either halting the pipeline (inserting wait or idle cycles), or making all cycles longer to accommodate this lengthier portion of the instruction.

Instructions that perform operations on operands in memory tend to increase either the cycle time or the number of cycles/instruction. Such instructions require additional execution time to calculate the addresses of the operands, read the required operands from memory, calculate the result, and store the results of the operation back to memory. To eliminate the negative impact of such instructions, RISC designs implement a load and store (load/store) architecture in which the processor has many registers, all operations are performed on operands held in processor registers, and main memory is accessed only by load and store instructions.

This approach produces several benefits:

• Reducing the number of memory accesses eases memory bandwidth requirements

• Limiting all operations to registers helps simplify the instruction set

• Eliminating memory operations makes it easier for compilers to optimize register allocation; this further reduces memory accesses and also reduces the instructions/task factor


All of these factors help RISC designs approach their goal of one cycle per instruction. However, two classes of instructions hinder achievement of this goal: load instructions and branch instructions. The following sections discuss how RISC designs overcome the obstacles raised by these classes of instructions.

(3) Delayed Load Instructions

Load instructions read operands from memory into processor registers for subsequent operation by other instructions. Because memory typically operates at much slower speeds than processor clock rates, the loaded operand is not immediately available to subsequent instructions in an instruction pipeline. The data dependency is illustrated in Figure 1.3.

[Figure: a Load instruction passing through F, A, M and W, with a marker showing when the data from the load becomes available]

Figure 1.3: Data Dependency Resulting From a Load Instruction

In this illustration, the operand loaded by instruction 1 is not available for use in the A cycle (ALU, or Arithmetic/Logic Unit operation) of instruction 2. One way to handle this dependency is to delay the pipeline by inserting additional clock cycles into the execution of instruction 2 until the loaded data becomes available. This approach obviously introduces delays that would increase the cycles/instruction factor.

In many RISC designs, the technique used to handle this data dependency is to recognize, and make visible to compilers, the fact that all load instructions have an inherent latency or load delay. Figure 1.3 illustrates a load delay or latency of one instruction. The instruction that immediately follows the load is in the load delay slot. If the instruction in this slot does not require the data from the load, then no pipeline delay is required.

If this load delay is made visible to software, a compiler can arrange instructions to ensure that there is no data dependency between a load instruction and the instruction in the load delay slot. The simplest way of ensuring that there is no data dependency is to insert a No Operation (NOP) instruction to fill the slot, as follows:

    Load R1, A
    Load R2, B
    NOP               <= This instruction fills the delay slot
    ADD R3, R1, R2

Although filling the delay slot with NOP instructions eliminates the need for hardware-controlled pipeline stalls in this case, it still is not a very efficient use of the pipeline stream since these additional NOP instructions increase code size and perform no useful work. (In practice, however, this technique need not have much negative impact on performance.)

A more effective solution to handling the data dependency is to fill the load delay slot with a useful instruction. Good optimizing compilers can usually accomplish this, especially if the load delay is only one instruction. The example program below illustrates how a compiler might rearrange instructions to handle a potential data dependency.

    # Consider the code for C := A+B; F := D
    Load R1, A
    Load R2, B
    Add  R3, R1, R2   <= This instruction stalls because R2 data is not available
    Load R4, D

    ....

    # An alternative code sequence (where delay length = 1)
    Load R1, A
    Load R2, B
    Load R4, D
    Add  R3, R1, R2   <= No stall since R2 data is available
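The two sequences above can be compared with a small stall counter. This is a minimal sketch under stated assumptions, not a model of any particular processor. The tuple encoding of instructions, the function name and the fixed load delay of one instruction are all illustrative.

```python
LOAD_DELAY = 1  # assumption: loaded data is unavailable to the next instruction only

def count_stalls(program, load_delay=LOAD_DELAY):
    """Count stall cycles for a list of (opcode, dest, source_registers) tuples."""
    stalls = 0
    ready_at = {}  # register -> first issue slot at which its value is usable
    slot = 0       # issue slot of the current instruction, including stalls
    for opcode, dest, sources in program:
        # Wait until every source register written by an earlier load is ready.
        wait = max((ready_at.get(r, 0) - slot for r in sources), default=0)
        if wait > 0:
            stalls += wait
            slot += wait
        if opcode == "load":
            ready_at[dest] = slot + 1 + load_delay
        slot += 1
    return stalls

# C := A+B; F := D, in the original order and in the rearranged order.
naive = [("load", "R1", ()), ("load", "R2", ()),
         ("add", "R3", ("R1", "R2")), ("load", "R4", ())]
reordered = [("load", "R1", ()), ("load", "R2", ()),
             ("load", "R4", ()), ("add", "R3", ("R1", "R2"))]

print(count_stalls(naive), count_stalls(reordered))
```

In this model the original order costs one stall cycle, because the Add directly follows the load of R2. The rearranged order costs none, because the load of R4 fills the delay slot.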

(4) Delayed Branch Instructions

Branch instructions usually delay the instruction pipeline because the processor must calculate the effective destination of the branch and fetch that instruction. When a cache access requires an entire cycle, and the fetched branch instruction specifies the target address, it is impossible to perform this fetch (of the destination instruction) without delaying the pipeline for at least one pipe stage (one cycle). Conditional branches can cause further delays because they require the calculation of a condition, as well as the target address.

Instead of stalling the instruction pipeline to wait for the instruction at the target address, RISC designs typically use an approach similar to that used with Load instructions: Branch instructions are delayed and do not take effect until after one or more instructions immediately following the Branch instruction (the delay instructions) have been executed. Branch and delayed branch instructions are illustrated in Figure 1.4.

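[Figure: two flowcharts, one for a Branch Instruction and one for a Delayed Branch Instruction. In both, the condition is tested and, on YES, control passes to the Branch Target, otherwise to the Next Instruction. In the delayed branch flowchart, the Delay Instruction is executed before the Branch Target.]

Figure 1.4: Block Diagram of Branch/Delayed Branch Instruction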
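The difference between the two flowcharts can be sketched as an execution trace. The program encoding, labels and function name below are illustrative assumptions. Every branch is assumed taken, one delay instruction is assumed, and the delay instruction is assumed not to be a branch itself.

```python
def run(program, delayed):
    """Return the labels of the instructions in the order they execute.

    `program` is a list of (label, branch_target) pairs; branch_target is None
    for ordinary instructions, otherwise the index the (taken) branch jumps to.
    """
    trace, pc = [], 0
    while pc < len(program):
        label, target = program[pc]
        trace.append(label)
        if target is None:
            pc += 1
            continue
        if delayed and pc + 1 < len(program):
            # The delay instruction executes before control is transferred.
            trace.append(program[pc + 1][0])
        pc = target
    return trace

program = [
    ("BRANCH", 3),      # taken branch to index 3
    ("ADD", None),      # instruction immediately after the branch
    ("SUB", None),      # skipped in both cases
    ("TARGET", None),   # branch target
]

print(run(program, delayed=False))
print(run(program, delayed=True))
```

With an ordinary branch, ADD is never executed. With a delayed branch, it runs in the delay slot, so the cycle that would otherwise be lost to the branch does useful work.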


1.1.3 The pipeline structure of GMS30C2232

The GMS30C2232 has a two-stage pipeline, and each stage is composed of two phases (TM and TV). Although the basic structure is a two-stage pipeline, some instructions need additional phases. For example, a standard ALU instruction uses 5 phases: the 4 phases of the two pipeline stages plus 1 additional phase. This additional phase does not use the part of the datapath needed by the next instruction, so execution of the next instruction need not wait until the previous ALU instruction has ended. A DSP instruction takes longer than the two pipeline stages to execute and requires the same datapath resources as a following DSP instruction, so the next DSP instruction is delayed.

The pipeline structure of the GMS30C2232 and the actions of the datapath are described in Table 1.1.

Stage                 Phase      Datapath Action
Fetch/Decode          TM (Low)   1. The instruction is read from the instruction cache according to the address of the instruction.
                      TV (High)  2. The control signal of Rd (destination operand) and Rs (source operand) is activated according to the instruction that was loaded in the TM phase.
                                 2.1 The control signal of IR (immediate register (operand)) and IL (instruction length) is activated.
                                 2.2 The address of the next instruction is calculated and saved in the PC.
Execute/Write         TM (Low)   1. The next instruction is read from the instruction cache.
                                 1.1 The addresses of Rs and Rd are determined.
                                 1.2 The immediate operand is determined.
                                 1.3 The operand is read from the register stack using the address of Rs and Rd.
                                 1.4 The operands XR, YR and QR are controlled.
                      TV (High)  2. The input data of the ALU is attained.
                                 2.1 The control of the ALU datapath is made and the instruction is executed in the ALU.
                                 2.2 The result of the ALU operation is saved in the register file.
Additional Insertion  Next TM    The additional ALU operation is continued and its result is saved in the register file.

Table 1.1: The pipeline structure of GMS30C2232 and the action of datapath.


1.2 Global Register Set

The architecture provides 32 global registers of 32bit each. These are:

G0        Program Counter PC
G1        Status Register SR
G2        Floating-point Exception Register FER
G3..G15   General purpose registers
G16..G17  Reserved
G18       Stack Pointer SP
G19       Upper stack Bound UB
G20       Bus Control Register BCR (see section 6. Bus Interface)
G21       Timer Prescaler Register TPR (see section 5. Timer and CPU Clock Modes)
G22       Timer Compare Register TCR (see section 5. Timer and CPU Clock Modes)
G23       Timer Register TR (see section 5. Timer and CPU Clock Modes)
G24       Watchdog Compare Register WCR (see section 6. Bus Interface)
G25       Input Status Register ISR (see section 6. Bus Interface)
G26       Function Control Register FCR (see section 6. Bus Interface)
G27       Memory Control Register MCR (see section 6. Bus Interface)
G28..G31  Reserved

Registers G0..G15 can be addressed directly by the register code (0..15) of an instruction. Registers G18..G27 can be addressed only by a MOV or MOVI instruction with the high global flag H set to 1.

(Example)

    MOVI G2, 0x20    ; G2 := 0x20 (set H flag)
    MOV  G3, G19     ; G3 := G19 (G19 (UB) is copied to G3)
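The way a 4-bit register code reaches the high global registers can be sketched as follows. The rule used here (with H set, register codes 0..15 select G16..G31) is taken from the description of the H flag in section 1.2.2. The function and constant names are illustrative assumptions.

```python
HIGH_GLOBAL_OFFSET = 16  # with H set, register codes 0..15 select G16..G31

def global_register(code, h_flag):
    """Map a 4-bit register code to the number of the global register it selects."""
    if not 0 <= code <= 15:
        raise ValueError("register code must be in 0..15")
    return code + HIGH_GLOBAL_OFFSET if h_flag else code

# With H set, codes 2..11 reach G18..G27; code 3 selects G19 (UB).
print(global_register(3, h_flag=False), global_register(3, h_flag=True))
```

The sketch covers only the register selection. It does not model the automatic clearing of H after the next instruction or the privilege check in user state.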


[Figure: the global registers G0..G31 drawn as 32-bit registers (bits 31..0), labeled as in the register list in section 1.2]

Figure 1.5: Global Register Set

1.2.1 Program Counter PC, G0

G0 is the program counter PC. It is updated to the address of the next instruction through instruction execution. Besides this implicit updating, the PC can also be addressed like a regular source or destination register. When the PC is referenced as an operand, the supplied value is the address of the first byte after the instruction which references it (the address of the next instruction), except when it is referenced by a delay instruction with a preceding delayed branch taken. For a delayed branch instruction, when the branch condition is met, the branch address PC + rel (relative to the address of the first byte after the Delayed Branch instruction) is placed in the PC (see section 1.26. Delayed Branch Instructions).

Placing a result in the PC has the effect of a branch taken. When branch is taken, the target address of branch is placed in PC.

Bit zero of the PC is always zero, regardless of any value placed in the PC.


1.2.2 Status Register SR, G1

G1 is the status register SR. Its content is updated by instruction execution. Besides this implicit updating, the SR can also be addressed like a regular register (when the H flag is set). When it is addressed as a source or destination operand, all 32 bits are used as an operand. However, only bits 15..0 of a result can be placed in bits 15..0 of the SR; bits 31..16 of the result are discarded and bits 31..16 of the SR remain unchanged. When the SR is addressed as a source operand, it represents the value 0x0. The full content of the SR is replaced only by the Return instruction. A result placed in the SR overrules any setting or clearing of the condition flags as a result of an instruction.
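The write rule for the SR can be expressed as a pair of bit masks. This is a minimal sketch of the rule as stated above. The function name and the `is_return` parameter are illustrative assumptions, and the sketch does not model any other side effects of the Return instruction.

```python
MASK32 = 0xFFFFFFFF
SR_LOW_MASK = 0x0000FFFF   # bits 15..0: can be written by a result
SR_HIGH_MASK = 0xFFFF0000  # bits 31..16: replaced only by the Return instruction

def write_sr(current_sr, result, is_return=False):
    """Return the new SR value after `result` is placed in the SR."""
    result &= MASK32
    if is_return:
        return result  # the Return instruction replaces the full content
    # Keep SR bits 31..16, take bits 15..0 from the result.
    return (current_sr & SR_HIGH_MASK) | (result & SR_LOW_MASK)

print(hex(write_sr(0x12340000, 0xABCD00FF)))
print(hex(write_sr(0x12340000, 0xABCD00FF, is_return=True)))
```

In the first call the upper half of the result (0xABCD) is discarded and the upper half of the SR (0x1234) is preserved. In the second call the whole word is replaced.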

[Figure: SR bits 31..16. Fields shown: FP (Frame Pointer), FL (Frame Length), ILC (Instruction-Length Code), S (Supervisor State Flag), Trace Pending Flag, Trace-Mode Flag]

Figure 1.6: Status Register SR (bits 31..16)

[Figure: SR bits 15..0. Fields shown: L (Interrupt-Lock Flag), FRM (Floating-Point Rounding Mode), FTE (Floating-Point Trap Enable), I (Interrupt-Mode Flag), Reserved, H (High Global Flag), M (Cache-Mode Flag), V (Overflow Flag), N (Negative Flag), Z (Zero Flag), C (Carry Flag)]

Figure 1.7: Status Register SR (bits 15..0)


The status register SR contains the following status information:

C Carry Flag. Bit zero is the carry condition flag C. In general, when set it indicates that the unsigned integer range is exceeded (overflow). At add operations, it indicates a carry out of bit 31 of the result. At subtract operations, it indicates a borrow (inverse carry) into bit 31 of the result.

Z Zero Flag. Bit one is the zero condition flag Z. When set, it indicates that all 32 or 64 result bits are equal to zero regardless of any carry, borrow or overflow.

N Negative Flag. Bit two is the negative condition flag N. On compare instructions, it indicates the arithmetically correct (true) sign of the result regardless of an overflow. On all other instructions, it is derived from result bit 31, which is the true sign bit when no overflow occurs. In the case of overflow, result bit 31 and N reflect the inverted sign bit.

V Overflow Flag. Bit three is the overflow condition flag V. In general, when set it indicates a signed overflow. At the Move instructions, it indicates a floating-point NaN (Not a Number).

M Cache-Mode Flag. Bit four is the cache-mode flag M. Besides being set or cleared under program control, it is also automatically cleared by a Frame instruction and by any branch taken except a delayed branch. See section 1.8. Instruction Cache for details.

H High Global Flag. Bit five is the high global flag H. When H is set, denoting G0..G15 addresses G16..G31 instead. Thus, the registers G18..G27 may be addressed by denoting G2..G11 respectively. The H flag is effective only in the first cycle of the next instruction after it was set; then it is cleared automatically. Only a MOV or MOVI instruction issued as the next instruction may be used to copy the content of a local register or an immediate value to one of the high global registers. The MOV instruction may be used to copy the content of a high global register (except the BCR, TPR, FCR and MCR registers, which are write-only) to a local register. With all other instructions, the result may be invalid. If one of the high global registers is addressed as the destination register in user state (S = 0), the condition flags are undefined, the destination register remains unchanged and a trap to Privilege Error occurs.

Reserved Bit six is reserved for future use. It must always be zero.

I Interrupt-Mode Flag. Bit seven is the interrupt-mode flag I. It is set automatically on interrupt entry and reset to its old value by a Return instruction. The I flag is used by the operating system; it must never be changed by any user program.

FTE Floating-Point Trap Enable Flag. Bits 12..8 are the floating-point trap enable flags. They determine the exception type and trap execution flow (see section 3.33.2. Floating-Point Instructions).

FRM Floating-Point Rounding Mode. Bits 14..13 are the floating-point rounding modes (see section 3.33.2. Floating-Point Instructions).
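The bit positions listed above can be exercised with a short sketch. It computes C, Z, N and V for a 32-bit add according to the general definitions given for these flags, and it decodes the documented fields of SR bits 15..0. The function names are illustrative assumptions, and only the add case is modeled, not compare, subtract or Move instructions.

```python
MASK32 = 0xFFFFFFFF

def add_flags(a, b):
    """Return (result, flags) for a 32-bit add, with C, Z, N, V in bits 0..3."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    c = full >> 32                 # carry out of bit 31
    z = int(result == 0)           # all 32 result bits are zero
    n = result >> 31               # result bit 31
    # Signed overflow: both operands share a sign that the result does not.
    v = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, c | (z << 1) | (n << 2) | (v << 3)

def decode_sr_low(sr):
    """Split SR bits 15..0 into the fields documented in the list above."""
    return {
        "C": sr & 1, "Z": (sr >> 1) & 1, "N": (sr >> 2) & 1, "V": (sr >> 3) & 1,
        "M": (sr >> 4) & 1, "H": (sr >> 5) & 1, "I": (sr >> 7) & 1,
        "FTE": (sr >> 8) & 0x1F, "FRM": (sr >> 13) & 0x3,
    }

result, flags = add_flags(0x7FFFFFFF, 1)
print(hex(result), decode_sr_low(flags))
```

Adding 1 to 0x7FFFFFFF overflows the signed range, so V is set and N reflects the inverted sign bit, as described for the overflow case in the N entry.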
