NSC NS32FX164V-20, NS32FX164V-25, NS32FX164AV-25, N32FX164AVLJ-25 Datasheet

February 1992

NS32FX161-15/NS32FX161-20/NS32FX164-20/NS32FX164-25/NS32FV16-20/NS32FV16-25 Advanced Imaging/Communication Signal Processors

General Description

The NS32FX164, the NS32FV16 and the NS32FX161 are high-performance 32-bit members of the Series 32000®/EP™ family of National's Embedded System Processors™ specifically optimized for CCITT Group 2 and Group 3 Facsimile Applications, Data Modems, Voice Mail Systems, Laser Printers, or any combination of the above.

Unless specified otherwise any reference to the NS32FX164 in this document applies to the NS32FV16 and the NS32FX161 as well.

The NS32FX164 can perform all the computations and control functions required for a stand-alone Fax system, a PC add-in Fax/Voice/Data Modem card or a Laser/Fax system.

It also meets the performance requirements to implement 14400, 9600 and 7200 bps modems complying with CCITT V.17, V.29 and V.27 standards. The NS32FV16 supports V.29 and V.27 standards as well as voice. The NS32FX161 supports V.29 and V.27 standards.

The NS32FX164 provides a 16 Mbyte Linear external address space and a 16-bit external data bus.

The CPU core, which is the same as that of the NS32CG16, incorporates a 32-bit ALU and instruction pipeline, and an 8-byte prefetch queue.

Also integrated on-chip with the CPU are a DSP Module (DSPM) and a 4K-byte RAM Array (2K in the NS32FV16 and NS32FX161). The DSPM is a complete processing unit, capable of autonomous operation parallel to the CPU core operation. The DSPM executes programs stored in an internal on-chip Random Access Memory (RAM), and manipulates data stored either in the internal RAM or in an external off-chip memory. To maximize utilization of hardware resources, the DSPM contains a pipelined DSP-oriented datapath, and a control logic that implements a set of DSP vector commands.

The NS32FX164 capabilities can be expanded by using an external floating point unit (FPU) which directly interfaces to the NS32FX164 using the slave protocol. The CPU-FPU cluster features high speed execution of the floating-point instructions.

The highly efficient architecture of the NS32FX164, combined with the NS32CG16 graphics instructions and the high-performance vector operation capability, makes the device the ideal choice for Postscript™ and Fax applications.

Features

• Software compatible with the Series 32000/EP processors
• Designed around the CPU core of the NS32CG16
• Pin compatible with the NS32FX16
• 32-bit architecture and implementation
• On-chip DSP Module for high-speed DSP operations
• Special support for graphics applications
  - 18 graphics instructions
  - Binary compression/expansion capability for font storage using RLL encoding
  - Pattern magnification
  - Interface to an external BITBLT processing unit for fast color BITBLT operations
• 4K-byte on-chip RAM array (2K in NS32FV16 and NS32FX161)
• On-chip clock generator
• Floating-point support via the NS32081 or NS32181
• Optimal interface to large memory arrays via the NS32CG821 and the DP84xx family of DRAM controllers
• Power save mode
• High-speed CMOS technology
• 68-pin PLCC package

Block Diagram

TL/EE/11267-1

FIGURE 1-1. CPU Block Diagram

Series 32000® is a registered trademark of National Semiconductor Corporation.

EP™ and Embedded System Processors™ are trademarks of National Semiconductor Corporation. Postscript™ is a trademark of Adobe Systems, Inc.

© 1995 National Semiconductor Corporation

TL/EE11267

RRD-B30M115/Printed in U. S. A.


Table of Contents

1.0 PRODUCT INTRODUCTION
  1.1 NS32FX164 Special Features
2.0 ARCHITECTURAL DESCRIPTION
  2.1 Register Set
    2.1.1 General Purpose Registers
    2.1.2 Address Registers
    2.1.3 Processor Status Register
    2.1.4 Configuration Register
    2.1.5 DSP Module Registers
  2.2 Memory Organization
    2.2.1 Address Mapping
  2.3 Modular Software Support
  2.4 Instruction Set
    2.4.1 General Instruction Format
    2.4.2 Addressing Modes
    2.4.3 Instruction Set Summary
  2.5 Graphics Support
    2.5.1 Frame Buffer Addressing
    2.5.2 BITBLT Fundamentals
      2.5.2.1 Frame Buffer Architecture
      2.5.2.2 Bit Alignment
      2.5.2.3 Block Boundaries and Destination Masks
      2.5.2.4 BITBLT Directions
      2.5.2.5 BITBLT Variations
    2.5.3 Graphics Support Instructions
      2.5.3.1 BITBLT (BIT-aligned BLock Transfer)
      2.5.3.2 Pattern Fill
      2.5.3.3 Data Compression, Expansion and Magnify
        2.5.3.3.1 Magnifying Compressed Data
3.0 FUNCTIONAL DESCRIPTION
  3.1 Instruction Execution
    3.1.1 Operating States
    3.1.2 Instruction Endings
      3.1.2.1 Completed Instructions
      3.1.2.2 Suspended Instructions
      3.1.2.3 Terminated Instructions
      3.1.2.4 Partially Completed Instructions
    3.1.3 Slave Processor Instructions
      3.1.3.1 Slave Processor Protocol
      3.1.3.2 Floating-Point Instructions
  3.2 Exception Processing
    3.2.1 Exception Acknowledge Sequence
    3.2.2 Returning from an Exception Service Procedure
    3.2.3 Maskable Interrupts
      3.2.3.1 Non-Vectored Mode
      3.2.3.2 Vectored Mode: Non-Cascaded Case
      3.2.3.3 Vectored Mode: Cascaded Case
    3.2.4 Non-Maskable Interrupt
    3.2.5 Traps
    3.2.6 Priority among Exceptions
    3.2.7 Exception Acknowledge Sequences: Detailed Flow
      3.2.7.1 Maskable/Non-Maskable Interrupt Sequence
      3.2.7.2 SLAVE/ILL/SVC/DVZ/FLG/BPT/UND Trap Sequence
      3.2.7.3 Trace Trap Sequence
  3.3 Debugging Support
    3.3.1 Instruction Tracing
  3.4 DSP Module
    3.4.1 Programming Model
    3.4.2 RAM Organization and Data Types
      3.4.2.1 Integer Values
      3.4.2.2 Aligned-Integer Values
      3.4.2.3 Real Values
      3.4.2.4 Aligned-Real Values
      3.4.2.5 Extended Precision Real Values
      3.4.2.6 Complex Values
    3.4.3 Command List Format
    3.4.4 CPU Core Interface
      3.4.4.1 Synchronization of Parallel Operation
      3.4.4.2 DSPM RAM Organization
    3.4.5 DSPM Instruction Set
      3.4.5.1 Conventions
      3.4.5.2 Type Casting
      3.4.5.3 General Notes
      3.4.5.4 Load Register Instructions
      3.4.5.5 Store Register Instructions
      3.4.5.6 Adjust Register Instructions
      3.4.5.7 Flow Control Instructions
      3.4.5.8 Internal Memory Move Instructions
      3.4.5.9 External Memory Move Instructions
      3.4.5.10 Arithmetic/Logical Instructions
      3.4.5.11 Multiply-and-Accumulate Instructions
      3.4.5.12 Multiply-and-Add Instructions
      3.4.5.13 Clipping and Min/Max Instructions
      3.4.5.14 Special Instructions

  3.5 System Interface
    3.5.1 Power and Grounding
    3.5.2 Clocking
    3.5.3 Power Save Mode
    3.5.4 Resetting
    3.5.5 Bus Cycles
      3.5.5.1 Bus Status
      3.5.5.2 Basic Read and Write Cycles
      3.5.5.3 Cycle Extension
      3.5.5.4 Instruction Fetch Cycles
      3.5.5.5 Interrupt Control Cycles
      3.5.5.6 Special Bus Cycles
      3.5.5.7 Slave Processor Bus Cycles
      3.5.5.8 Data Access Sequences
      3.5.5.9 Bus Access Control
      3.5.5.10 Instruction Status
4.0 DEVICE SPECIFICATIONS
  4.1 NS32FX164 Pin Descriptions
    4.1.1 Supplies
    4.1.2 Input Signals
    4.1.3 Output Signals
    4.1.4 Input-Output Signals
  4.2 Absolute Maximum Ratings
  4.3 Electrical Characteristics
  4.4 Switching Characteristics
    4.4.1 Definitions
    4.4.2 Timing Tables
      4.4.2.1 Output Signals: Internal Propagation Delays
      4.4.2.2 Input Signal Requirements
    4.4.3 Timing Diagrams
APPENDIX A: INSTRUCTION FORMATS
APPENDIX B: INSTRUCTION EXECUTION TIMES
  B.1 Basic and Floating-Point Instructions
    B.1.1 Equations
    B.1.2 Notes on Table Use
    B.1.3 Calculation of the Execution Time TEX for Basic Instructions
    B.1.4 Calculation of the Execution Time TEX for Floating-Point Instructions
  B.2 Special Graphics Instructions
    B.2.1 Execution Time Calculation for Special Graphics Instructions
  B.3 DSPM Instructions

List of Figures

FIGURE 1-1. CPU Block Diagram
FIGURE 2-1. NS32FX164 Internal Registers
FIGURE 2-2. Processor Status Register (PSR)
FIGURE 2-3. Configuration Register (CFG)
FIGURE 2-4. DSP Module Registers Address Map
FIGURE 2-5. Accumulator Format
FIGURE 2-6. X, Y, Z Registers Format
FIGURE 2-7. EABR Register Format
FIGURE 2-8. OVF Register Format
FIGURE 2-9. PARAM Register Format
FIGURE 2-10. REPEAT Register Format
FIGURE 2-11. EXT Register Format
FIGURE 2-12. CLSTAT Register Format
FIGURE 2-13. DSPINT and DSPMASK Register Format
FIGURE 2-14. NMISTAT Register Format
FIGURE 2-15. NS32FX164 Address Mapping
FIGURE 2-16. NS32FX164 Run-Time Environment
FIGURE 2-17. General Instruction Format
FIGURE 2-18. Index Byte Format
FIGURE 2-19. Displacement Encodings
FIGURE 2-20. Correspondence between Linear and Cartesian Addressing
FIGURE 2-21. 32-Pixel by 32-Scan Line Frame Buffer
FIGURE 2-22. Overlapping BITBLT Blocks
FIGURE 2-23. BB Instructions Format
FIGURE 2-24. BITWT Instruction Format
FIGURE 2-25. EXTBLT Instruction Format
FIGURE 2-26. MOVMPi Instruction Format
FIGURE 2-27. TBITS Instruction Format
FIGURE 2-28. SBITS Instruction Format
FIGURE 2-29. SBITPS Instruction Format
FIGURE 2-30. Bus Activity for a Simple BITBLT Operation
FIGURE 3-1. Operating States
FIGURE 3-2. Slave Processor Protocol
FIGURE 3-3. Slave Processor Status Word
FIGURE 3-4. Interrupt Dispatch and Cascade Tables
FIGURE 3-5. Exception Acknowledge Sequence: Direct-Exception Mode Disabled
FIGURE 3-6. Exception Acknowledge Sequence: Direct-Exception Mode Enabled
FIGURE 3-7. Return from Trap (RETTn) Instruction Flow: Direct-Exception Mode Disabled
FIGURE 3-8. Return from Interrupt (RETI) Instruction Flow: Direct-Exception Mode Disabled
FIGURE 3-9. Interrupt Control Unit Connections (16 Levels)
FIGURE 3-10. Cascaded Interrupt Control Unit Connections
FIGURE 3-11. Exception Processing Flowchart
FIGURE 3-12. Service Sequence
FIGURE 3-13. DSP Module Block Diagram
FIGURE 3-14. Power and Ground Connections
FIGURE 3-15. Crystal Interconnections - 30 MHz
FIGURE 3-16. Crystal Interconnections - 40 MHz, 50 MHz
FIGURE 3-17. Recommended Reset Connections
FIGURE 3-18. Power-On Reset Requirements
FIGURE 3-19. General Reset Timing
FIGURE 3-20. Bus Connections
FIGURE 3-21. Read Cycle Timing
FIGURE 3-22. Write Cycle Timing
FIGURE 3-23. Cycle Extension of a Read Cycle
FIGURE 3-24. Special Bus Cycle Timing
FIGURE 3-25. Slave Processor Read Cycle
FIGURE 3-26. Slave Processor Write Cycle
FIGURE 3-27. NS32FX164 and FPU Interconnections
FIGURE 3-28. Memory Interface
FIGURE 3-29. HOLD Timing (Bus Initially Idle)
FIGURE 3-30. HOLD Timing (Bus Initially Not Idle)
FIGURE 4-1. Connection Diagram
FIGURE 4-2. Output Signals Specification Standard
FIGURE 4-3a. Input Signals Specification Standard
FIGURE 4-3b. RSTI, INT, NMI Hysteresis
FIGURE 4-4. Read Cycle
FIGURE 4-5. Write Cycle
FIGURE 4-6. Special Bus Cycle
FIGURE 4-7. HOLD Acknowledge Timing (Bus Initially Not Idle)
FIGURE 4-8. HOLD Timing (Bus Initially Idle)
FIGURE 4-9. External DMA Controller Bus Cycle
FIGURE 4-10. Slave Processor Write Timing
FIGURE 4-11. Slave Processor Read Timing
FIGURE 4-12. SPC Timing
FIGURE 4-13. PFS Signal Timing
FIGURE 4-14. ILO Signal Timing
FIGURE 4-15. Clock Waveforms
FIGURE 4-16. INT Signal Timing
FIGURE 4-17. NMI Signal Timing
FIGURE 4-18. Power-On Reset
FIGURE 4-19. Non-Power-On Reset
FIGURE 4-20. Interrupt Out

List of Tables

TABLE 2-1. NS32FX164 Addressing Modes
TABLE 2-2. NS32FX164 Instruction Set Summary
TABLE 2-3. 'op' and 'i' Field Encodings
TABLE 3-1. Floating-Point Instruction Protocols
TABLE 3-2. Summary of Exception Processing
TABLE 3-3. External Oscillator Specifications Crystal Characteristics
TABLE 3-4. Interrupt Sequences
TABLE 3-5. Bus Cycle Categories
TABLE 3-6. Data Access Sequences
TABLE B-1. Basic Instructions
TABLE B-2. Floating-Point Instructions: CPU Portion
TABLE B-3. Average Instruction Execution Times with No Wait-States
TABLE B-4. Average Instruction Execution Times with Wait-States

1.0 Product Introduction

The NS32FX164 is a high speed CMOS microprocessor in the Series 32000/EP family.

It includes two main execution units: the NS32CG16 compatible CPU core and the DSP Module. The CPU core is designed for general purpose computations and system control functions. The DSP Module is tuned to perform the DSP primitives needed in Voice Band Modems. The NS32FX164 also incorporates a 4K-byte RAM Array as a shared resource for both the CPU core and the DSP Module (2K-byte in the NS32FV16 and the NS32FX161).

The NS32FX164 is software-compatible with all other CPUs in the family.

The device incorporates all of the Series 32000 advanced architectural features, with the exception of the virtual memory capability.

Brief descriptions of the NS32FX164 features that are shared with other members of the family are provided below:

Powerful Addressing Modes. Nine addressing modes available to all instructions are included to access data structures efficiently.

Data Types. The architecture provides for numerous data types, such as byte, word, doubleword, and BCD, which may be arranged into a wide variety of data structures.

Symmetric Instruction Set. While avoiding special case instructions that compilers can't use, the Series 32000 family incorporates powerful instructions for control operations, such as array indexing and external procedure calls, which save considerable space and time for compiled code.

Memory-to-Memory Operations. The Series 32000 CPUs represent two-address machines. This means that each operand can be referenced by any one of the addressing modes provided.

This powerful memory-to-memory architecture permits memory locations to be treated as registers for all useful operations. This is important for temporary operands as well as for context switching.

Large, Uniform Addressing. The NS32FX164 has 24-bit address pointers that can address up to 16 megabytes without any segmentation; this addressing scheme provides flexible memory management without add-on expense.
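As a quick check of the figure quoted above, a 24-bit pointer spans 2^24 byte addresses, which is 16 Mbytes. The sketch below only works through that arithmetic; the constant and variable names are illustrative and do not come from the datasheet.

```python
# Illustrative arithmetic only; the names are assumptions made for this example.
ADDRESS_BITS = 24                                   # pointer width stated in the text

address_space_bytes = 1 << ADDRESS_BITS             # number of distinct byte addresses
address_space_mbytes = address_space_bytes // (1024 * 1024)
highest_address = address_space_bytes - 1           # last address in the linear space
```

With these values the linear address range runs from 0 to 0xFFFFFF with no segment boundaries in between.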

Modular Software Support. Any software package for the Series 32000 architecture can be developed independent of all other packages, without regard to individual addressing. In addition, ROM code is totally relocatable and easy to access, which allows a significant reduction in hardware and software cost.

Software Processor Concept. The Series 32000 architecture allows future expansions of the instruction set that can be executed by special slave processors, acting as extensions to the CPU. This concept of slave processors is unique to the Series 32000 architecture. It allows software compatibility even for future components because the slave hardware is transparent to the software. With future advances in semiconductor technology, the slaves can be physically integrated on the CPU chip itself.

To summarize, the architectural features cited above provide three primary performance advantages and characteristics:

• High-Level Language Support
• Easy Future Growth Path
• Application Flexibility

1.1 NS32FX164 SPECIAL FEATURES

In addition to the above Series 32000 features, the NS32FX164 provides features that make the device extremely attractive for a wide range of applications where graphics support, low chip count, and low power consumption are required.

The most relevant of these features are the enhanced Digital Signal Processing performance, which makes the chip very attractive for facsimile applications, and the graphics support capabilities, which can be used in applications such as printers, CRT terminals, and other varieties of display systems where text and graphics are to be handled.

Graphics support is provided by eighteen instructions that allow operations such as BITBLT, data compression/expansion, fills, and line drawing, to be performed very efficiently. In addition, the device can be easily interfaced to an external BITBLT Processing Unit (BPU) for high BITBLT performance.

The NS32FX164 allows systems to be built with a relatively small amount of random logic. The bus is highly optimized to allow simple interfacing to a large variety of DRAMs and peripheral devices. All the relevant bus access signals and clock signals are generated on-chip. The cycle extension logic is also incorporated on-chip.

The device is fabricated in a low-power, high speed CMOS technology. It also includes a power-save feature that allows the clock to be slowed down under software control, thus minimizing the power consumption. This feature can be used in those applications where power saving during periods of low performance demand is highly desirable.

The power save feature, the DSP Module and the Bus Characteristics are described in the "Functional Description" section. A general overview of BITBLT operations and a description of the graphics support instructions are provided in Section 2.5. Details on all the NS32FX164 graphics instructions can be found in the NS32CG16 Printer/Display Processor Programmer's Reference Supplement.


Below is a summary of the instructions that are directly applicable to graphics along with their intended use.

Instruction	Application

BBAND, BBOR, BBFOR, BBXOR, BBSTOD, BITWT, EXTBLT	The BITBLT group of instructions provides a method of quickly imaging characters, creating patterns, windowing and other block oriented effects.

MOVMP	Move Multiple Pattern is a very fast instruction for clearing memory and drawing patterns and lines.

TBITS	Test Bit String will measure the length of 1's or 0's in an image, supporting many data compression methods (RLL). TBITS may also be used to test for boundaries of images.

SBITS	Set Bit String is a very fast instruction for filling objects, outline characters and drawing horizontal lines. The TBITS and SBITS instructions support Group 3 and Group 4 CCITT standards for compression and decompression algorithms.

SBITPS	Set Bit Perpendicular String is a very fast instruction for drawing vertical, horizontal and 45° lines. In printing applications SBITS and SBITPS may be used to express portrait and landscape respectively from the same compressed font data. The size of the character may be scaled as it is drawn.

SBIT, CBIT, TBIT, IBIT	The Bit group of instructions enable single pixels anywhere in memory to be set, cleared, tested or inverted.

INDEX	The INDEX instruction combines a multiply-add sequence into a single instruction. This provides a fast translation of an X-Y address to a pixel relative address.
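Two of the operations in the table can be sketched in software to show the kind of work the hardware instructions take over. The Python below is a behavioral illustration only and does not reproduce the instruction operands or encodings. `pixel_offset` assumes a row-major frame buffer whose width is given in pixels, and `run_length` counts a run of identical bits in the spirit of TBITS. The function names, the list-of-bits representation and the `max_run` parameter are all assumptions made for this example.

```python
def pixel_offset(x, y, width):
    """Multiply-add translation of an X-Y address to a linear pixel offset.

    Assumes a row-major frame buffer that is `width` pixels wide.
    """
    if not (0 <= x < width) or y < 0:
        raise ValueError("coordinate outside the frame buffer")
    return y * width + x


def run_length(bits, start, max_run):
    """Count consecutive bits equal to bits[start], up to max_run bits."""
    value = bits[start]
    count = 0
    while (start + count < len(bits)
           and count < max_run
           and bits[start + count] == value):
        count += 1
    return value, count


# Run-length encode one hypothetical scan line by repeated run measurement.
scan_line = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
runs = []
pos = 0
while pos < len(scan_line):
    value, count = run_length(scan_line, pos, max_run=len(scan_line))
    runs.append((value, count))
    pos += count
```

Each pass of the loop measures one run and advances past it, which is the basic step of the run-length (RLL) schemes the table refers to.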

2.0 Architectural Description

2.1 REGISTER SET

The NS32FX164 has 32 internal registers. 17 of these registers belong to the CPU portion of the device and are addressed either implicitly by specific instructions or through the register addressing mode. The other 15 control the operation of the DSP Module, and are memory mapped. Figure 2-1 shows the NS32FX164 internal registers.

CPU Registers

General Purpose (32 bits): R0-R7

Address: PC, SP0, SP1, FP, SB, INTBASE, MOD

Processor Status: PSR

Configuration: CFG

Peripherals Registers

DSP Module: EABR, CLPTR, OVF, PARAM, REPEAT, ABORT, EXT, CLSTAT, DSPINT, DSPMASK, NMISTAT

FIGURE 2-1. NS32FX164 Internal Registers

2.1.1 General Purpose Registers

There are eight registers (R0-R7) used for satisfying the high speed general storage requirements, such as holding temporary variables and addresses. The general purpose registers are free for any use by the programmer. They are 32 bits in length. If a general purpose register is specified for an operand that is 8 or 16 bits long, only the low part of the register is used; the high part is not referenced or modified.
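The partial-register rule above can be illustrated with a small model: a write of an 8-bit or 16-bit operand replaces only the low part of the 32-bit register and leaves the high part untouched. This is a minimal sketch, not a description of the hardware; the function names and the byte-count `size` parameter are assumptions introduced for the example.

```python
# Masks for 1-, 2- and 4-byte operands (names are illustrative only).
SIZE_MASKS = {1: 0xFF, 2: 0xFFFF, 4: 0xFFFFFFFF}


def write_register(old, value, size):
    """Return the 32-bit register contents after writing a `size`-byte operand."""
    mask = SIZE_MASKS[size]
    # Keep the high part of the old contents, replace only the low part.
    return ((old & ~mask) | (value & mask)) & 0xFFFFFFFF


def read_register(reg, size):
    """Return the `size`-byte operand held in the low part of the register."""
    return reg & SIZE_MASKS[size]


r0 = 0x12345678
r0_after_byte = write_register(r0, 0xAB, 1)      # only bits 7..0 change
r0_after_word = write_register(r0, 0xBEEF, 2)    # only bits 15..0 change
```

In this model the byte write yields 0x123456AB and the word write yields 0x1234BEEF, so the upper bits of the original value survive in both cases.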


2.1.2 Address Registers

The seven address registers are used by the processor to implement specific address functions. Except for the MOD register that is 16 bits wide, all the others are 32 bits. A description of the address registers follows.

PC - Program Counter. The PC register is a pointer to the first byte of the instruction currently being executed. The PC is used to reference memory in the program section.

SP0, SP1 - Stack Pointers. The SP0 register points to the lowest address of the last item stored on the INTERRUPT STACK. This stack is normally used only by the operating system. It is used primarily for storing temporary data, and holding return information for operating system subroutines and interrupt and trap service routines. The SP1 register points to the lowest address of the last item stored on the USER STACK. This stack is used by normal user programs to hold temporary data and subroutine return information.

When a reference is made to the selected Stack Pointer (see PSR S-bit), the terms "SP Register" or "SP" are used. SP refers to either SP0 or SP1, depending on the setting of the S bit in the PSR register. If the S bit in the PSR is 0, SP refers to SP0. If the S bit in the PSR is 1 then SP refers to SP1.

Stacks in the Series 32000 architecture grow downward in memory. A Push operation pre-decrements the Stack Pointer by the operand length. A Pop operation post-increments the Stack Pointer by the operand length.
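The stack behavior described above (selection of SP0 or SP1 by the S bit, pre-decrement on Push, post-increment on Pop) can be sketched as a small model. The class name, the dictionary used as memory and the byte order chosen for storing values are assumptions made for this illustration only.

```python
class StackModel:
    """Behavioral sketch of the two stack pointers selected by the PSR S bit."""

    def __init__(self, sp0, sp1, s_bit=0):
        self.sp = [sp0, sp1]   # sp[0] models SP0, sp[1] models SP1
        self.s_bit = s_bit     # 0 selects SP0, 1 selects SP1
        self.memory = {}       # sparse byte-addressed memory (assumption)

    def push(self, value, length):
        # Pre-decrement the selected stack pointer by the operand length.
        self.sp[self.s_bit] -= length
        base = self.sp[self.s_bit]
        for i in range(length):
            # Least significant byte first: an assumption of this sketch.
            self.memory[base + i] = (value >> (8 * i)) & 0xFF

    def pop(self, length):
        base = self.sp[self.s_bit]
        value = 0
        for i in range(length):
            value |= self.memory[base + i] << (8 * i)
        # Post-increment the selected stack pointer by the operand length.
        self.sp[self.s_bit] += length
        return value


stacks = StackModel(sp0=0x1000, sp1=0x2000, s_bit=1)   # S = 1 selects SP1
stacks.push(0x11223344, 4)
stacks.push(0xBEEF, 2)
```

After the two pushes the modeled SP1 has moved down by six bytes while SP0 is untouched, and the stack pointer always holds the lowest address of the last item stored.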

FP - Frame Pointer. The FP register is used by a procedure to access parameters and local variables on the stack. The FP register is set up on procedure entry with the ENTER instruction and restored on procedure termination with the EXIT instruction.

The frame pointer holds the address in memory occupied by the old contents of the frame pointer.

SB - Static Base. The SB register points to the global variables of a software module. This register is used to support relocatable global variables for software modules. The SB register holds the lowest address in memory occupied by the global variables of a module.

INTBASE - Interrupt Base. The INTBASE register holds the address of the dispatch table for interrupts and traps (Section 3.2.1).

MOD - Module. The MOD register holds the address of the module descriptor of the currently executing software module. The MOD register is 16 bits long, therefore the module table must be contained within the first 64 Kbytes of memory.

2.1.3 Processor Status Register

The Processor Status Register (PSR) holds status information for the microprocessor.

The PSR is sixteen bits long, divided into two eight-bit halves. The low order eight bits are accessible to all programs, but the high order eight bits are accessible only to programs executing in Supervisor Mode.

FIGURE 2-2. Processor Status Register (PSR)

C: The C bit indicates that a carry or borrow occurred after an addition or subtraction instruction. It can be used with the ADDC and SUBC instructions to perform multiple-precision integer arithmetic calculations. It may have a setting of 0 (no carry or borrow) or 1 (carry or borrow).

T: The T bit causes program tracing. If this bit is set to 1, a TRC trap is executed after every instruction (Section 3.3.1).

L: The L bit is altered by comparison instructions. In a comparison instruction the L bit is set to "1" if the second operand is less than the first operand, when both operands are interpreted as unsigned integers. Otherwise, it is set to "0". In Floating-Point comparisons, this bit is always cleared.

K: Reserved for use by the CPU.

J: Reserved for use by the CPU.

F: The F bit is a general condition flag, which is altered by many instructions (e.g., integer arithmetic instructions use it to indicate overflow).

Z: The Z bit is altered by comparison instructions. In a comparison instruction the Z bit is set to "1" if the second operand is equal to the first operand; otherwise it is set to "0".

N: The N bit is altered by comparison instructions. In a comparison instruction the N bit is set to "1" if the second operand is less than the first operand, when both operands are interpreted as signed integers. Otherwise, it is set to "0".

U: If the U bit is "1" no privileged instructions may be executed. If the U bit is "0" then all instructions may be executed. When U = 0 the processor is said to be in Supervisor Mode; when U = 1 the processor is said to be in User Mode. A User Mode program is restricted from executing certain instructions and accessing certain registers which could interfere with the operating system. For example, a User Mode program is prevented from changing the setting of the flag used to indicate its own privilege mode. A Supervisor Mode program is assumed to be a trusted part of the operating system, hence it has no such restrictions.

S: The S bit specifies whether the SP0 register or SP1 register is used as the Stack Pointer. The bit is automatically cleared on interrupts and traps. It may have a setting of 0 (use the SP0 register) or 1 (use the SP1 register).

P: The P bit prevents a TRC trap from occurring more than once for an instruction (Section 3.3.1). It may have a setting of 0 (no trace pending) or 1 (trace pending).

I: If I = 1, then all interrupts will be accepted. If I = 0, only the NMI interrupt is accepted. Trap enables are not affected by this bit.

B: Reserved for use by the CPU. This bit is set to 1 during the execution of the EXTBLT instruction and causes the BPU signal to become active. Upon reset, B is set to zero and the BPU signal is set high.

Note 1: When an interrupt is acknowledged, the B, I, P, S and U bits are set to zero and the BPU signal is set high. A return from interrupt will restore the original values from the copy of the PSR register saved in the interrupt stack.

Note 2: If BITBLT (BB) or EXTBLT instructions are executed in an interrupt routine, the PSR bits J and K must be cleared first.
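
The comparison flags and the interrupt-acknowledge behavior described above can be sketched in software. The bit positions used here are an assumption: the fields are placed in the order in which they are listed (C at bit 0 through B at bit 12), and should be checked against Figure 2-2. All function names are hypothetical.

```python
# Assumed bit positions: fields taken in the order listed in the text,
# starting at bit 0 (C) and ending at bit 12 (B). Verify against Figure 2-2.
PSR_BITS = {name: 1 << pos for pos, name in enumerate("CTLKJFZNUSPIB")}


def compare_flags(op1, op2, bits=32):
    """Return the L, Z and N settings for a comparison of op1 and op2."""
    mask = (1 << bits) - 1
    u1, u2 = op1 & mask, op2 & mask
    sign = 1 << (bits - 1)
    s1 = u1 - (1 << bits) if u1 & sign else u1   # signed interpretation
    s2 = u2 - (1 << bits) if u2 & sign else u2
    return {
        "L": int(u2 < u1),    # second operand less than first, unsigned
        "Z": int(u2 == u1),   # operands equal
        "N": int(s2 < s1),    # second operand less than first, signed
    }


def acknowledge_interrupt(psr):
    """Clear B, I, P, S and U, as Note 1 describes for interrupt acknowledge."""
    cleared = 0
    for name in "BIPSU":
        cleared |= PSR_BITS[name]
    return psr & ~cleared & 0xFFFF


def selected_stack_pointer(psr):
    """Name the register that 'SP' refers to for a given PSR value."""
    return "SP1" if psr & PSR_BITS["S"] else "SP0"
```

Comparing 0xFFFFFFFF (first operand) with 1 (second operand) shows why both L and N exist: the second operand is lower when the values are read as unsigned integers, but higher when they are read as signed integers.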

2.1.4 Configuration Register

The Configuration Register (CFG) is 32 bits wide, of which 5 bits are implemented. The implemented bits enable various operating modes for the CPU, including vectoring of interrupts, execution of floating-point instructions, processing of exceptions and selection of clock scaling factor. The CFG is programmed by the SETCFG instruction. The format of CFG is shown in Figure 2-3. The various control bits are described below.

31	8	7	0

Reserved	DE	Res	C	M	F	I

FIGURE 2-3. Configuration Register (CFG)

I: Interrupt vectoring. This bit controls whether maskable interrupts are handled in nonvectored (I = 0) or vectored (I = 1) mode. Refer to Section 3.2.3 for more information.

F: Floating-point instruction set. This bit indicates whether a floating-point unit (FPU) is present to execute floating-point instructions. If this bit is 0 when the CPU executes a floating-point instruction, a Trap (UND) occurs. If this bit is 1, then the CPU transfers the instruction and any necessary operands to the FPU using the slave-processor protocol described in Section 3.1.3.1.

M: Clock scaling. This bit is used in conjunction with the C-bit to select the clock scaling factor.

C: Clock scaling. Same as the M-bit above. Refer to Section 3.5.3 on "Power Save Mode" for details.

DE: Direct-Exception mode enable. This bit enables the Direct-Exception mode for processing exceptions. When this mode is selected, the CPU response time to interrupts and other exceptions is significantly improved. Refer to Section 3.2 for more information.

2.1.5 DSP Module Registers

The DSP Module (DSPM) contains 15 memory-mapped registers. All the registers, except OVF, CLSTAT, ABORT, DSPINT and NMISTAT, are readable and writable. OVF, CLSTAT, DSPINT and NMISTAT are read-only. ABORT is write-only.

The DSPM registers are divided into two groups, according to their function. PARAM, OVF, X, Y, Z, A, REPEAT, CLPTR and EABR are called DSPM dedicated registers. CLSTAT, ABORT, DSPINT, DSPMASK, EXT and NMISTAT are called CPU core interface registers.

Accesses to these registers must be aligned; word and double-word accesses must occur on word and double-word address boundaries respectively. Failing to do so will cause unpredictable results. Figure 2-4 shows the address map of the DSP Module registers.

Register Name	Register Address
PARAM	FFFF8000
OVF	FFFF8004
X	FFFF8008
Y	FFFF800C
Z	FFFF8010
A	FFFF8014
REPEAT	FFFF8018
CLPTR	FFFF8020
EABR	FFFF8024
CLSTAT	FFFF9000
ABORT	FFFF9004
DSPINT	FFFF9008
DSPMASK	FFFF900C
EXT	FFFF9010
NMISTAT	FFFF9014

FIGURE 2-4. DSP Module Registers Address Map
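
As an illustration, the address map and the access rules stated above can be captured in a small lookup table. The addresses and the read-only and write-only sets are copied from the text; the function `check_access` and its parameters are hypothetical helpers for a host-side model, not part of any National Semiconductor software.

```python
# Addresses copied from Figure 2-4.
DSPM_REGS = {
    "PARAM": 0xFFFF8000, "OVF": 0xFFFF8004, "X": 0xFFFF8008,
    "Y": 0xFFFF800C, "Z": 0xFFFF8010, "A": 0xFFFF8014,
    "REPEAT": 0xFFFF8018, "CLPTR": 0xFFFF8020, "EABR": 0xFFFF8024,
    "CLSTAT": 0xFFFF9000, "ABORT": 0xFFFF9004, "DSPINT": 0xFFFF9008,
    "DSPMASK": 0xFFFF900C, "EXT": 0xFFFF9010, "NMISTAT": 0xFFFF9014,
}
READ_ONLY = {"OVF", "CLSTAT", "DSPINT", "NMISTAT"}
WRITE_ONLY = {"ABORT"}


def check_access(name, offset, size, write):
    """Return the access address, or raise ValueError for an access the text rules out.

    size is the access width in bytes (1, 2 or 4).
    """
    address = DSPM_REGS[name] + offset
    if size in (2, 4) and address % size != 0:
        raise ValueError(f"{size}-byte access to {address:#010x} is misaligned")
    if write and name in READ_ONLY:
        raise ValueError(f"{name} is read-only")
    if not write and name in WRITE_ONLY:
        raise ValueError(f"{name} is write-only")
    return address
```

A check of this kind is useful in a simulator or test bench because the hardware itself gives unpredictable results rather than an error on a misaligned access.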

A: Accumulator

The format of the accumulator is shown in Figure 2-5.

33	0	33	0

	Imaginary		Real

FIGURE 2-5. Accumulator Format

The A register is a complex accumulator. It has two 34-bit fields: a real part, and an imaginary part. Bits 15 through 30 of the real and the imaginary parts of the accumulator can be read or written by the core in one double-word access. Bits 15 through 30 of the real part are mapped to the operand's bits 0 through 15, and bits 15 through 30 of the imaginary part are mapped to the operand's bits 16 through 31. The accumulator can also be read and written by the command-list execution unit using the SA, SEA, LA and LEA instructions (See Section 3.4 for more information).

Note that when a value is stored in the accumulator by the core, the value of PARAM.RND bit is copied into bit position 14 of both real and imaginary parts of the accumulator. This technique allows rounding of the accumulator's value in the following DSPM instructions (See Section 3.4.5.3 for more information on rounding).

When the Accumulator is loaded either by the core or by the LA or LEA instructions, bits 31-33 of the real and the imaginary accumulators are loaded with the values of bit 30 of the real and the imaginary parts respectively.

When the Accumulator is loaded either by the core or by the LA instruction, bits 0-13 of the real and the imaginary accumulators are loaded with zeros.
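
The rules for a core write to the accumulator can be summarized in a short model. This is a sketch of the bit mapping described above only; the function names are hypothetical and the read path ignores saturation.

```python
def core_write_part(operand16, rnd):
    """Build one 34-bit accumulator part from a 16-bit operand written by the core."""
    operand16 &= 0xFFFF
    value = operand16 << 15        # operand bits 0-15 go to accumulator bits 15-30
    value |= (rnd & 1) << 14       # PARAM.RND is copied into bit 14
    if operand16 & 0x8000:         # bit 30 is set: replicate it into bits 31-33
        value |= 0b111 << 31
    return value                   # bits 0-13 are left at zero


def core_write(operand32, rnd):
    """Split a double-word into (real, imaginary) 34-bit accumulator parts."""
    real = core_write_part(operand32 & 0xFFFF, rnd)
    imag = core_write_part((operand32 >> 16) & 0xFFFF, rnd)
    return real, imag


def core_read_part(part):
    """Return accumulator bits 15-30 as seen by the core (saturation not modeled)."""
    return (part >> 15) & 0xFFFF
```

For example, writing the double-word 0x80000001 with RND = 0 gives a real part with only bit 15 set and an imaginary part with bits 30 through 33 set.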

X, Y, Z: Vector Pointers

The format of X, Y, and Z registers is shown in Figure 2-6.

ADDRESS Reserved WRAP-AROUND INCREMENT

FIGURE 2-6. X, Y, Z Registers Format

The X, Y, and Z registers are used for addressing up to three vector operands. They are 32-bit registers, with three fields: ADDRESS, INCREMENT, and WRAP-AROUND. The value in the ADDRESS field specifies the address of a word in the on-chip memory. This field has 16 bits, and can address up to 64 Kwords of internal memory. The ADDRESS fields are initialized with the vector operands' start addresses by commands in the command list. At the beginning of each vector operation, the contents of the ADDRESS field are copied to incrementors. Increments can be used by vector instructions to step through the corresponding vector operands while executing the appropriate calculations. There is an address wrap-around for those vector instructions that require some of their operands to be located in cyclic buffers. The allowed values for the INCREMENT field are 0 through 15. The actual increment will be 2^INCREMENT words. The allowed values for the WRAP-AROUND field are 0 through 15. The actual wrap-around will be 2^WRAP-AROUND words. The WRAP-AROUND must be greater than or equal to the INCREMENT.
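
A possible reading of the increment and wrap-around fields is sketched below. The power-of-two scaling and the field limits come from the text. The exact wrap-around behavior is an assumption: the cyclic buffer is taken to be an aligned block of 2^WRAP-AROUND words, so that only the low-order address bits wrap. The function name is hypothetical.

```python
def vector_addresses(address, increment, wrap_around, count):
    """List `count` word addresses generated from the three pointer fields."""
    if not (0 <= increment <= 15 and 0 <= wrap_around <= 15):
        raise ValueError("INCREMENT and WRAP-AROUND must be in 0..15")
    if wrap_around < increment:
        raise ValueError("WRAP-AROUND must be >= INCREMENT")
    step = 1 << increment            # actual increment: 2**INCREMENT words
    window = 1 << wrap_around        # actual wrap-around: 2**WRAP-AROUND words
    base = address & ~(window - 1)   # assumed: buffer is aligned to its own size
    offset = address & (window - 1)
    addresses = []
    for _ in range(count):
        addresses.append(base | offset)
        offset = (offset + step) & (window - 1)   # wrap inside the window
    return addresses
```

With ADDRESS = 0x100, INCREMENT = 1 and WRAP-AROUND = 3, this model steps by two words and returns to the start of the eight-word buffer after four elements.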

The X, Y, and Z registers can be read and written by the core. These registers can be read and written by the command-list execution unit, as well as by the core, when using SX, SXL, SXH, SY, SZ, LX, LY and LZ instructions.

EABR: External Address Base Register

The format of the external address base register is shown in Figure 2-7.

31	17	16	0

ADDRESS			0

FIGURE 2-7. EABR Register Format

The EABR register is used together with a 16-bit address field to form a 32-bit external address. External addresses are specified as the sum of the value in EABR and two times the value of the 16-bit address pointed by registers X, Y or Z. The only value allowed to be written into bits 0 through 16 of EABR is "0". The EABR register can be read and written by the core. It can also be written by the command-list execution unit by using the LEABR instruction.

EABR can hold any value except for FFFE0000. Accessing external memory with an FFFE0000 in the EABR will cause unpredictable results.
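
The external address calculation and the two restrictions on EABR can be written out as follows. This is a minimal sketch with a hypothetical function name; it applies the sum given in the text and rejects the values the text disallows.

```python
def external_address(eabr, pointer16):
    """Combine EABR with a 16-bit X, Y or Z address field into an external address."""
    if eabr & 0x1FFFF:
        raise ValueError("bits 0-16 of EABR must be zero")
    if eabr == 0xFFFE0000:
        raise ValueError("EABR must not hold FFFE0000")
    if not 0 <= pointer16 <= 0xFFFF:
        raise ValueError("the address field is 16 bits wide")
    # External address = EABR + 2 * address field.
    return (eabr + 2 * pointer16) & 0xFFFFFFFF
```

Because bits 0 through 16 of EABR are zero and the doubled address field never exceeds 17 bits, the sum never carries into the EABR part of the address.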

CLPTR: Command List Pointer

The CLPTR is a 16-bit register that holds the address of the current command in the internal RAM. Writing into the CLPTR causes the DSPM command-list execution unit to begin executing commands, starting from the address in CLPTR. The CLPTR can be read and written by the core while the command-list execution is idle.

Whenever the DSPM command-list execution unit reads a command from the DSPM RAM, the value of CLPTR is updated to contain the address of the next command to be executed. This implies, for example, that if the last command in a list is in address N, the CLPTR will hold a value of N + 1 following the end of command list execution.

OVF: Overflow Register

The format of the overflow register is shown in Figure 2-8.

15	2	1	0

Reserved		OVF	SAT

FIGURE 2-8. OVF Register Format

The OVF register holds the current status of the DSPM arithmetic unit. It has two fields: OVF and SAT. The OVF bit is set to "1" whenever an overflow is detected in the DSPM 34-bit ALU (e.g., bits 32 and 33 of the ALU are not equal). No overflow detection is provided for integers. The SAT bit is set to "1" whenever a value read from the accumulator cannot be represented within the limits of its data type (e.g., 16 bits for real and integer, and 31 bits for extended real). In this case the value read from the accumulator will either be the maximum allowed value or the minimal allowed value for this data type depending on the sign of the accumulator value. Note that in some cases when the OVF is set, the SAT will not be set. The reason is that if an OVF occurred, the value in the accumulator can no longer be used for proper SAT detection. Upon reset, and whenever the ABORT register is written, the non-reserved bits of the OVF register are cleared to "0".

The OVF is a read-only register. It can be read by the core. It can also be read by the command-list execution unit using the SOVF instruction. Reading the OVF by either the core or the command-list execution unit clears it to "0".
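
The overflow condition, the saturation rule and the read-to-clear behavior can be modeled as shown below. This is a simplified sketch: the bit positions of OVF (bit 1) and SAT (bit 0) are taken from Figure 2-8, while the class and function names are hypothetical and the interaction between OVF and SAT noted above is not modeled.

```python
def alu_overflow(alu34):
    """OVF condition from the text: bits 32 and 33 of the 34-bit ALU value differ."""
    return ((alu34 >> 32) & 1) != ((alu34 >> 33) & 1)


def saturate(value, bits):
    """Clamp a signed value to `bits` bits and return (result, sat_flag)."""
    highest = (1 << (bits - 1)) - 1
    lowest = -(1 << (bits - 1))
    if value > highest:
        return highest, 1
    if value < lowest:
        return lowest, 1
    return value, 0


class OvfRegister:
    """Read-to-clear status register with SAT in bit 0 and OVF in bit 1."""

    def __init__(self):
        self.value = 0

    def record(self, ovf, sat):
        # Flags accumulate until the register is read.
        self.value |= (int(ovf) << 1) | int(sat)

    def read(self):
        # Reading returns the flags and clears the register.
        value, self.value = self.value, 0
        return value
```

Software that polls OVF should keep the value returned by the first read, since a second read returns zero.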

PARAM: Vector Parameter Register

The format of the PARAM register is shown in Figure 2-9.

31	26	25	24 19	18	17	16	15	0

Reserved		RND	OP	SUB	CLR	COJ		LENGTH

FIGURE 2-9. PARAM Register Format

The PARAM register is used to specify the number of iterations and special options for the various instructions. The options are: RND, OP, SUB, CLR, and COJ. The effect of each of the bits of the PARAM register is specified in Section 3.4.

The PARAM register can be read and written by the core. It can also be written by the command-list execution unit, by using the LPARAM instruction. The value written into PARAM.LENGTH must be greater than 0.

The value of PARAM.LENGTH is not changed during command-list execution, unless it is written into using the LPARAM instruction.
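
A PARAM value can be assembled from its fields as sketched below. The field boundaries are read from the bit numbers in Figure 2-9 (LENGTH in bits 15-0, COJ in bit 16, CLR in bit 17, SUB in bit 18, OP in bits 24-19, RND in bit 25) and should be verified against the figure; the function names are hypothetical.

```python
def pack_param(length, rnd=0, op=0, sub=0, clr=0, coj=0):
    """Assemble a 32-bit PARAM value from its fields."""
    if not 1 <= length <= 0xFFFF:
        raise ValueError("PARAM.LENGTH must be in 1..65535")
    if not 0 <= op <= 0x3F:
        raise ValueError("OP is a 6-bit field")
    return (length
            | (coj & 1) << 16
            | (clr & 1) << 17
            | (sub & 1) << 18
            | op << 19
            | (rnd & 1) << 25)


def unpack_param(value):
    """Split a 32-bit PARAM value into a dictionary of fields."""
    return {
        "LENGTH": value & 0xFFFF,
        "COJ": (value >> 16) & 1,
        "CLR": (value >> 17) & 1,
        "SUB": (value >> 18) & 1,
        "OP": (value >> 19) & 0x3F,
        "RND": (value >> 25) & 1,
    }
```

The length check mirrors the requirement that PARAM.LENGTH be greater than 0.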

REPEAT: Command-List Repeat Register

The format of the repeat register is shown in Figure 2-10.

31	16	15	0

COUNT			TARGET

FIGURE 2-10. REPEAT Register Format

The REPEAT register is used, together with appropriate commands, to implement loops and branches in the command list. The count is used to specify the number of times a loop in the command list is to be repeated. The target is used to specify a jump address within the command list.

The REPEAT register can be read and written by the core. It can also be read and written by the command-list execution unit by using SREPEAT and LREPEAT instructions respectively.

The value of REPEAT.COUNT changes during the execution of the DJNZ command.
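
The two REPEAT fields can be packed and unpacked as follows, using the layout of Figure 2-10 (COUNT in bits 31-16, TARGET in bits 15-0). The helper names are hypothetical.

```python
def pack_repeat(count, target):
    """Build a REPEAT value from a loop count and a jump target."""
    if not (0 <= count <= 0xFFFF and 0 <= target <= 0xFFFF):
        raise ValueError("COUNT and TARGET are 16-bit fields")
    return (count << 16) | target


def unpack_repeat(value):
    """Return (COUNT, TARGET) from a 32-bit REPEAT value."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF
```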

ABORT: Abort Register

The ABORT register is used to force execution of the command list to halt. Writing any value into this register stops execution, and clears the contents of OVF, EXT, DSPINT and DSPMASK. The ABORT register is write-only and can be written only by the core.

EXT: External Memory Reference Control Register

The format of the external memory reference control register is shown in Figure 2-11.

15	1	0

Reserved		HOLD

FIGURE 2-11. EXT Register Format

The EXT register controls external references. The command-list execution unit checks the value of EXT.HOLD before each external memory reference. When EXT.HOLD is "0", external memory references are allowed. When EXT.HOLD is "1", and external memory references are requested, the execution of the command list will stop until EXT.HOLD is "0". Upon reset, and whenever the ABORT register is written, EXT.HOLD is cleared to "0". The EXT register can be read or written by the core.

CLSTAT: Command-List Execution Status Register

The format of the command-list execution status register is shown in Figure 2-12.

15	1	0

Reserved		RUN

FIGURE 2-12. CLSTAT Register Format

The CLSTAT register displays the current status of the execution of the command list. When the command-list execution is idle, CLSTAT.RUN is "0", and when it is active, CLSTAT.RUN is "1". Upon reset, the CLSTAT register is cleared to "0". It can only be read, and only by the core.

DSPINT, DSPMASK, NMISTAT: Interrupt Control Registers

The format of DSPINT and DSPMASK is shown in Figure 2-13.

15	1	0

Reserved		HALT

FIGURE 2-13. DSPINT and DSPMASK Register Format

The DSPINT register holds the current status of interrupt requests. Whenever execution of the command list is stopped, the DSPINT.HALT bit is set to "1". The DSPINT is a read-only register. It is cleared to "0" whenever it is read, whenever the ABORT register is written, and upon reset.

The DSPMASK register is used to mask the DSPINT.HALT flag. An interrupt request is transferred to the interrupt logic of the IOUT output pin whenever the DSPINT.HALT bit is set to "1", and the DSPMASK.HALT bit is unmasked (set to "1"). See Section 4.0 for the functionality of IOUT. DSPMASK can be read and written by the core. Upon reset, and whenever the ABORT register is written, all the bits in DSPMASK are cleared to "0".

The format of the NMISTAT register is shown in Figure 2-14.

15	3	2	1	0

Reserved		ERR	UND	EXT

FIGURE 2-14. NMISTAT Register Format

The NMISTAT holds the status of the current pending Non-Maskable Interrupt (NMI) requests.

Whenever the core attempts to access the DSPM address space while the CLSTAT.RUN bit is "1" (except for accesses to the CLSTAT, EXT, DSPINT, NMISTAT, DSPMASK, and ABORT registers) NMISTAT.ERR is set to "1".

Whenever there is an attempt to execute a DBPT instruction, or a reserved DSPM instruction (Section 3.4), the NMISTAT.UND bit is set to "1".

When a high to low transition is detected on the NMI input pin, NMISTAT.EXT bit is set to "1".

When one of the bits in NMISTAT is set to "1", an NMI request to the core is issued.

The NMISTAT register is cleared to 0 upon reset, and each time its contents are read.

When one of the bits in NMISTAT is set to 1, an NMI occurs. The NMI handler can read the NMISTAT register to determine the source of the interrupt. Note that since NMIs may be nested, it is possible that a second NMI handler (invoked while the previous handler has not yet exited) will read and handle more than one set bit in NMISTAT. Since the read operation clears the register, the interrupted handler may find that no bits are set.
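
The read-to-clear behavior has a practical consequence for the NMI handler: the register should be read exactly once, and every bit found set should be serviced. The sketch below models this with the bit positions of Figure 2-14 (EXT in bit 0, UND in bit 1, ERR in bit 2). The class, the handler and the log list are hypothetical stand-ins for real hardware access and real service routines.

```python
NMI_EXT, NMI_UND, NMI_ERR = 1 << 0, 1 << 1, 1 << 2   # positions from Figure 2-14


class Nmistat:
    """Read-to-clear model of the NMISTAT register."""

    def __init__(self):
        self.value = 0

    def raise_source(self, bit):
        self.value |= bit

    def read(self):
        # Reading returns the pending bits and clears the register.
        value, self.value = self.value, 0
        return value


def nmi_handler(nmistat, log):
    """Read NMISTAT once and service every source that was pending."""
    pending = nmistat.read()
    if pending == 0:
        return   # a nested handler has already serviced everything
    if pending & NMI_ERR:
        log.append("ERR")
    if pending & NMI_UND:
        log.append("UND")
    if pending & NMI_EXT:
        log.append("EXT")
```

The early return covers the nested case described above, in which the interrupted handler finds that no bits are set.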

2.2 MEMORY ORGANIZATION

The main memory of the NS32FX164 is a uniform linear address space. Memory locations are numbered sequentially starting at zero and ending at 2^24 - 1. The number specifying a memory location is called an address. The contents of each memory location is a byte consisting of eight bits. Unless otherwise noted, diagrams in this document show data stored in memory with the lowest address on the right and the highest address on the left. Also, when data is shown vertically, the lowest address is at the top of a diagram and the highest address at the bottom of the diagram. When bits are numbered in a diagram, the least significant bit is given the number zero, and is shown at the right of the diagram. Bits are numbered in increasing significance and toward the left.

Byte at Address A

Two contiguous bytes are called a word. Except where noted, the least significant byte of a word is stored at the lower address, and the most significant byte of the word is stored at the next higher address. In memory, the address of a word is the address of its least significant byte, and a word may start at any address.

15	8	7	0

A+1			A
MSB			LSB
	Word at Address A

Two contiguous words are called a double-word. Except where noted, the least significant word of a double-word is stored at the lowest address and the most significant word of the double-word is stored at the address two higher. In memory, the address of a double-word is the address of its least significant byte, and a double-word may start at any address.

31	24	23	16	15	8	7	0

A+3			A+2		A+1		A
MSB							LSB
		Double Word at Address A

Although memory is addressed as bytes, it is actually organized as words. Therefore, words and double-words that are aligned to start at even addresses (multiples of two) are accessed more quickly than words and double-words that are not so aligned.
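
The byte ordering and the effect of alignment can be illustrated with a few lines of code. The load and store helpers follow the byte order defined above. The function `word_slots` is only a rough, assumed cost model: it counts how many 16-bit memory words an access touches, which is one plausible reason why aligned accesses are faster.

```python
def store(memory, address, value, size):
    """Store `size` bytes with the least significant byte at the lowest address."""
    memory[address:address + size] = value.to_bytes(size, "little")


def load(memory, address, size):
    """Load `size` bytes, least significant byte first."""
    return int.from_bytes(memory[address:address + size], "little")


def word_slots(address, size):
    """Count the 16-bit memory words touched by an access (assumed cost model)."""
    first = address // 2
    last = (address + size - 1) // 2
    return last - first + 1


memory = bytearray(8)
store(memory, 1, 0x12345678, 4)   # a double-word may start at any address
```

In this model a double-word at an even address touches two memory words, while the same double-word at an odd address touches three.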

2.2.1 Address Mapping

The NS32FX164 supports the use of memory-mapped peripheral devices and coprocessors. Such memory-mapped devices can be located at arbitrary locations within the 16-Mbyte address range available externally.

Addresses marked as Reserved in Figure 2-15 are not available in the present implementation of the NS32FX164, and should not be used. The top 8-Mbyte block is reserved by National Semiconductor Corporation, and only a few locations within this block are presently used to access the on-chip RAM array and DSP Module registers. Figure 2-15 shows the NS32FX164 address mapping.

Start Address
(HEX)

00000000	Memory and I/O

00FFFE00	Interrupt Control

01000000	Reserved
FFFE0000	DSPM Internal RAM

FFFE1000	Reserved

FFFF8000	DSPM Dedicated Registers

FFFF8028	Reserved

FFFF9000	DSPM Control/Status Registers

FFFF9014	Reserved

FIGURE 2-15. NS32FX164 Address Mapping

2.3 MODULAR SOFTWARE SUPPORT

The NS32FX164 provides special support for software modules and modular programs.

Each module in a NS32FX164 software environment consists of three components:

1. Program Code Segment.

This segment contains the module's code and constant data.

2. Static Data Segment.

Used to store variables and data that may be accessed by all procedures within the module.

3. Link Table.

This component contains two types of entries: Absolute Addresses and Procedure Descriptors.

An Absolute Address is used in the external addressing mode, in conjunction with a displacement and the current MOD Register contents to compute the effective address of an external variable belonging to another module.

The Procedure Descriptor is used in the call external procedure (CXP) instruction to compute the address of an external procedure.

Normally, the linker program specifies the locations of the three components. The Static Data and Link Table typically reside in RAM; the code component can be either in RAM or in ROM. The three components can be mapped into noncontiguous locations in memory, and each can be independently relocated. Since the Link Table contains the absolute addresses of external variables, the linker need not assign absolute memory addresses for these in the module itself; they may be assigned at load time.

To handle the transfer of control from one module to another, the NS32FX164 uses a module table in memory and two registers in the CPU.

The Module Table is located within the first 64 kbytes of memory. This table contains a Module Descriptor (also called a Module Table Entry) for each module in the address space of the program. A Module Descriptor has four 32-bit entries corresponding to each component of a module:

• The Static Base entry contains the address of the beginning of the module's static data segment.

• The Link Table Base points to the beginning of the module's Link Table.

• The Program Base is the address of the beginning of the code and constant data for the module.

• A fourth entry is currently unused but reserved.

The MOD Register in the CPU contains the address of the Module Descriptor for the currently executing module.

The Static Base Register (SB) contains a copy of the Static Base entry in the Module Descriptor of the currently executing module, i.e., it points to the beginning of the current module's static data area.

This register is implemented in the CPU for efficiency purposes. By having a copy of the static base entry on chip, the CPU can avoid reading it from memory each time a data item in the static data segment is accessed.

In an NS32FX164 software environment modules need not be linked together prior to loading. As modules are loaded, a linking loader simply updates the Module Table and fills the Link Table entries with the appropriate values. No modification of a module's code is required. Thus, modules may be stored in read-only memory and may be added to a system independently of each other, without regard to their individual addressing. Figure 2-16 shows a typical NS32FX164 run-time environment.
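
The Module Descriptor and the use of the Link Table for external variables can be sketched as follows. Several details are assumptions: the entries are taken in the order in which they are listed above (Static Base, Link Table Base, Program Base, reserved), each Link Table entry is treated as a 32-bit value, and little-endian storage follows Section 2.2. The function names are hypothetical.

```python
import struct


def read_module_descriptor(memory, mod):
    """Decode the four 32-bit entries of the Module Descriptor at address MOD."""
    if not 0 <= mod <= 0xFFFF:
        raise ValueError("MOD is a 16-bit register")
    # Assumed entry order: Static Base, Link Table Base, Program Base, reserved.
    sb, link_base, program_base, _reserved = struct.unpack_from("<4I", memory, mod)
    return {"SB": sb, "LINK": link_base, "PROGRAM": program_base}


def external_operand_address(memory, mod, disp1, disp2):
    """External addressing: pointer from Link Table entry disp1, plus disp2."""
    link_base = read_module_descriptor(memory, mod)["LINK"]
    # Assumed: each Link Table entry occupies four bytes.
    (pointer,) = struct.unpack_from("<I", memory, link_base + 4 * disp1)
    return (pointer + disp2) & 0xFFFFFFFF
```

Because the external variable's address is fetched from the Link Table at run time, the module's code contains only the entry number and a displacement, which is what allows the code to stay unmodified when modules are loaded.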

2.4 INSTRUCTION SET

2.4.1 General Instruction Format

Figure 2-17 shows the general format of a Series 32000 instruction. The Basic Instruction is one to three bytes long and contains the Opcode and up to two 5-bit General Addressing Mode ("Gen") fields. Following the Basic Instruction field is a set of optional extensions, which may appear depending on the instruction and the addressing modes selected.

Index Bytes appear when either or both Gen fields specify Scaled Index. In this case, the Gen field specifies only the Scale Factor (1, 2, 4 or 8), and the Index Byte specifies which General Purpose Register to use as the index, and which addressing mode calculation to perform before indexing.

Following Index Bytes come any displacements (addressing constants) or immediate values associated with the selected addressing modes. Each Disp/Imm field may contain one or two displacements, or one immediate value. The size of a Displacement field is encoded within the top bits of that field, as shown in Figure 2-19, with the remaining bits interpreted as a signed (two's complement) value. The size of an immediate value is determined from the Opcode field. Both Displacement and Immediate fields are stored most-significant byte first. Note that this is different from the memory representation of data (Section 2.2).
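
A decoder for the displacement field might look like the sketch below. The one-byte form and its range of -64 to +63 come from Figure 2-19. The two-byte and four-byte prefixes (binary 10 and 11) follow the general Series 32000 encoding and are stated here as an assumption, since the figure is not reproduced in this text. Reserved bit patterns, if any, are not checked, and the function name is hypothetical.

```python
def decode_displacement(code, offset=0):
    """Decode one displacement from the instruction stream; return (value, length)."""
    first = code[offset]
    if first & 0x80 == 0:         # 0xxxxxxx: 7-bit value in one byte
        length, field_bits = 1, 7
    elif first & 0xC0 == 0x80:    # 10xxxxxx: 14-bit value in two bytes (assumed)
        length, field_bits = 2, 14
    else:                         # 11xxxxxx: 30-bit value in four bytes (assumed)
        length, field_bits = 4, 30
    # Displacements are stored most-significant byte first.
    raw = int.from_bytes(code[offset:offset + length], "big")
    raw &= (1 << field_bits) - 1          # drop the size prefix
    if raw & (1 << (field_bits - 1)):     # sign-extend the two's complement value
        raw -= 1 << field_bits
    return raw, length
```

The returned length lets a caller step over the field when an instruction carries more than one displacement.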

Some instructions require additional "implied" immediates and/or displacements, apart from those associated with addressing modes. Any such extensions appear at the end of the instruction, in the order that they appear within the list of operands in the instruction definition (Section 2.4.3).

Note: Dashed lines indicate information copied to register during transfer of control between modules.

FIGURE 2-16. NS32FX164 Run-Time Environment

FIGURE 2-17. General Instruction Format

FIGURE 2-18. Index Byte Format

2.4.2 Addressing Modes

The NS32FX164 CPU generally accesses an operand by calculating its Effective Address based on information available when the operand is to be accessed. The method to be used in performing this calculation is specified by the programmer as an "addressing mode".

Addressing modes in the NS32FX164 are designed to optimally support high-level language accesses to variables. In nearly all cases, a variable access requires only one addressing mode, within the instruction that acts upon that variable. Extraneous data movement is therefore minimized.

NS32FX164 Addressing Modes fall into nine basic types:

Register: The operand is available in one of the eight General Purpose Registers. In certain Slave Processor instructions, an auxiliary set of eight registers may be referenced instead.

Register Relative: A General Purpose Register contains an address to which is added a displacement value from the instruction, yielding the Effective Address of the operand in memory.

Memory Space: Identical to Register Relative above, except that the register used is one of the dedicated registers PC, SP, SB or FP. These registers point to data areas generally needed by high-level languages.

Memory Relative: A pointer variable is found within the memory space pointed to by the SP, SB or FP register. A displacement is added to that pointer to generate the Effective Address of the operand.

Immediate: The operand is encoded within the instruction. This addressing mode is not allowed if the operand is to be written.

Absolute: The address of the operand is specified by a displacement field in the instruction.

External: A pointer value is read from a specified entry of the current Link Table. To this pointer value is added a displacement, yielding the Effective Address of the operand.

Top of Stack: The currently-selected Stack Pointer (SP0 or SP1) specifies the location of the operand. The operand is pushed or popped, depending on whether it is written or read.

Scaled Index: Although encoded as an addressing mode, Scaled Indexing is an option on any addressing mode except Immediate or another Scaled Index. It has the effect of calculating an Effective Address, then multiplying any General Purpose Register by 1, 2, 4 or 8 and adding into the total, yielding the final Effective Address of the operand.

Table 2-1 is a brief summary of the addressing modes. For a complete description of their actions, see the Series 32000 Instruction Set Reference Manual.
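
Three of the effective address calculations described above are sketched here. The function names and the `read32` callback (which returns the 32-bit value stored at an address) are hypothetical, and results are masked to 32 bits as a simplifying assumption.

```python
MASK32 = 0xFFFFFFFF


def ea_register_relative(reg, disp):
    """Register Relative and Memory Space: displacement plus register."""
    return (reg + disp) & MASK32


def ea_memory_relative(read32, reg, disp1, disp2):
    """Memory Relative: pointer found at disp1 + register, then disp2 is added."""
    pointer = read32((reg + disp1) & MASK32)
    return (pointer + disp2) & MASK32


def ea_scaled_index(base_ea, index_reg, scale):
    """Scaled Index: base effective address plus scale times a general register."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 1, 2, 4 or 8")
    return (base_ea + scale * index_reg) & MASK32
```

Scaled Index composes with the other modes: the base effective address is computed first with one of the functions above, and the scaled register is then added to it.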

In addition to the general modes, Register-Indirect with auto-increment/decrement and warps or pitch are available on several of the graphics instructions.

Byte Displacement: Range -64 to +63

FIGURE 2-19. Displacement Encodings

TABLE 2-1. NS32FX164 Addressing Modes

ENCODING	MODE	ASSEMBLER SYNTAX	EFFECTIVE ADDRESS
Register
00000	Register 0	R0 or F0	None: Operand is in the specified register.
00001	Register 1	R1 or F1	None: Operand is in the specified register.
00010	Register 2	R2 or F2	None: Operand is in the specified register.
00011	Register 3	R3 or F3	None: Operand is in the specified register.
00100	Register 4	R4 or F4	None: Operand is in the specified register.
00101	Register 5	R5 or F5	None: Operand is in the specified register.
00110	Register 6	R6 or F6	None: Operand is in the specified register.
00111	Register 7	R7 or F7	None: Operand is in the specified register.
Register Relative
01000	Register 0 relative	disp(R0)	Disp + Register.
01001	Register 1 relative	disp(R1)	Disp + Register.
01010	Register 2 relative	disp(R2)	Disp + Register.
01011	Register 3 relative	disp(R3)	Disp + Register.
01100	Register 4 relative	disp(R4)	Disp + Register.
01101	Register 5 relative	disp(R5)	Disp + Register.
01110	Register 6 relative	disp(R6)	Disp + Register.
01111	Register 7 relative	disp(R7)	Disp + Register.
Memory Relative
10000	Frame memory relative	disp2(disp1(FP))	Disp2 + Pointer; Pointer found at address Disp1 + Register.
10001	Stack memory relative	disp2(disp1(SP))	Disp2 + Pointer; Pointer found at address Disp1 + Register. "SP" is either SP0 or SP1, as selected in PSR.
10010	Static memory relative	disp2(disp1(SB))	Disp2 + Pointer; Pointer found at address Disp1 + Register.
Reserved
10011	(Reserved for Future Use)
Immediate
10100	Immediate	value	None: Operand is input from instruction queue.
Absolute
10101	Absolute	@disp	Disp.
External
10110	External	EXT(disp1) + disp2	Disp2 + Pointer; Pointer is found at Link Table Entry number Disp1.
Top Of Stack
10111	Top of stack	TOS	Top of current stack, using either User or Interrupt Stack Pointer, as selected in PSR. Automatic Push/Pop included.
Memory Space
11000	Frame memory	disp(FP)	Disp + Register.
11001	Stack memory	disp(SP)	Disp + Register; "SP" is either SP0 or SP1, as selected in PSR.
11010	Static memory	disp(SB)	Disp + Register.
11011	Program memory	* + disp	Disp + Register.
Scaled Index
11100	Index, bytes	mode[Rn:B]	EA (mode) + Rn.
11101	Index, words	mode[Rn:W]	EA (mode) + 2 × Rn.
11110	Index, double words	mode[Rn:D]	EA (mode) + 4 × Rn.
11111	Index, quad words	mode[Rn:Q]	EA (mode) + 8 × Rn.

"Mode" and "n" are contained within the Index Byte. EA (mode) denotes the effective address generated using mode.

2.4.3 Instruction Set Summary

Table 2-2 presents a brief description of the NS32FX164 instruction set. The Format column refers to the Instruction Format tables (Appendix A). The Instruction column gives the instruction as coded in assembly language, and the Description column provides a short description of the function provided by that instruction. Further details of the exact operations performed by each instruction may be found in the Series 32000 Instruction Set Reference Manual and the NS32CG16 Printer/Display Processor Programmer's Reference.

Notations:

i = Integer length suffix: B = Byte, W = Word, D = Double Word

f = Floating Point length suffix: F = Standard Floating, L = Long Floating

gen = General operand. Any addressing mode can be specified.

short = A 4-bit value encoded within the Basic Instruction (see Appendix A for encodings).

imm = Implied immediate operand. An 8-bit value appended after any addressing extensions.

disp = Displacement (addressing constant): 8, 16 or 32 bits. All three lengths legal.

reg = Any General Purpose Register: R0-R7.

areg = Any Processor Register: SP, SB, FP, INTBASE, MOD, PSR, US (bottom 8 PSR bits).

cond = Any condition code, encoded as a 4-bit field within the Basic Instruction (see Appendix A for encodings).

		TABLE 2-2. NS32FX164 Instruction Set Summary
MOVES
Format	Operation	Operands	Description
4	MOVi	gen,gen	Move a value.
2	MOVQi	short,gen	Extend and move a signed 4-bit constant.
7	MOVMi	gen,gen,disp	Move multiple: disp bytes (1 to 16).
7	MOVZBW	gen,gen	Move with zero extension.
7	MOVZiD	gen,gen	Move with zero extension.
7	MOVXBW	gen,gen	Move with sign extension.
7	MOVXiD	gen,gen	Move with sign extension.
4	ADDR	gen,gen	Move effective address.
INTEGER ARITHMETIC
Format	Operation	Operands	Description
4	ADDi	gen,gen	Add.
2	ADDQi	short,gen	Add signed 4-bit constant.
4	ADDCi	gen,gen	Add with carry.
4	SUBi	gen,gen	Subtract.
4	SUBCi	gen,gen	Subtract with carry (borrow).
6	NEGi	gen,gen	Negate (2's complement).
6	ABSi	gen,gen	Take absolute value.
7	MULi	gen,gen	Multiply.
7	QUOi	gen,gen	Divide, rounding toward zero.
7	REMi	gen,gen	Remainder from QUO.
7	DIVi	gen,gen	Divide, rounding down.
7	MODi	gen,gen	Remainder from DIV (Modulus).
7	MEIi	gen,gen	Multiply to extended integer.
7	DEIi	gen,gen	Divide extended integer.
PACKED DECIMAL (BCD) ARITHMETIC
Format	Operation	Operands	Description
6	ADDPi	gen,gen	Add packed.
6	SUBPi	gen,gen	Subtract packed.

INTEGER COMPARISON
Format	Operation	Operands	Description
4	CMPi	gen,gen	Compare.
2	CMPQi	short,gen	Compare to signed 4-bit constant.
7	CMPMi	gen,gen,disp	Compare multiple: disp bytes (1 to 16).
LOGICAL AND BOOLEAN
Format	Operation	Operands	Description
4	ANDi	gen,gen	Logical AND.
4	ORi	gen,gen	Logical OR.
4	BICi	gen,gen	Clear selected bits.
4	XORi	gen,gen	Logical exclusive OR.
6	COMi	gen,gen	Complement all bits.
6	NOTi	gen,gen	Boolean complement: LSB only.
2	Scondi	gen	Save condition code (cond) as a Boolean variable of size i.
SHIFTS
Format	Operation	Operands	Description
6	LSHi	gen,gen	Logical shift, left or right.
6	ASHi	gen,gen	Arithmetic shift, left or right.
6	ROTi	gen,gen	Rotate, left or right.

BIT FIELDS

Bit fields are values in memory that are not aligned to byte boundaries. Examples are PACKED arrays and records used in Pascal. "Extract" instructions read and align a bit field. "Insert" instructions write a bit field from an aligned source.

Format	Operation	Operands	Description
8	EXTi	reg,gen,gen,disp	Extract bit field (array oriented).
8	INSi	reg,gen,gen,disp	Insert bit field (array oriented).
7	EXTSi	gen,gen,imm,imm	Extract bit field (short form).
7	INSSi	gen,gen,imm,imm	Insert bit field (short form).
8	CVTP	reg,gen,gen	Convert to bit field pointer.
ARRAYS
Format	Operation	Operands	Description
8	CHECKi	reg,gen,gen	Index bounds check.
8	INDEXi	reg,gen,gen	Recursive indexing step for multiple-dimensional arrays.

STRINGS

String instructions assign specific functions to the General Purpose Registers:

R4: Comparison Value
R3: Translation Table Pointer
R2: String 2 Pointer
R1: String 1 Pointer
R0: Limit Count

Options on all string instructions are:

B (Backward):	Decrement string pointers after each step rather than incrementing.
U (Until match):	End instruction if String 1 entry matches R4.
W (While match):	End instruction if String 1 entry does not match R4.

All string instructions end when R0 decrements to zero.

Format	Operation	Operands	Description
5	MOVSi	options	Move string 1 to string 2.
	MOVST	options	Move string, translating bytes.
5	CMPSi	options	Compare string 1 to string 2.
	CMPST	options	Compare, translating string 1 bytes.
5	SKPSi	options	Skip over string 1 entries.
	SKPST	options	Skip, translating bytes for until/while.
JUMPS AND LINKAGE
Format	Operation	Operands	Description
3	JUMP	gen	Jump.
0	BR	disp	Branch (PC Relative).
0	Bcond	disp	Conditional branch.
3	CASEi	gen	Multiway branch.
2	ACBi	short,gen,disp	Add 4-bit constant and branch if non-zero.
3	JSR	gen	Jump to subroutine.
1	BSR	disp	Branch to subroutine.
1	CXP	disp	Call external procedure.
3	CXPD	gen	Call external procedure using descriptor.
1	SVC		Supervisor call.
1	FLAG		Flag trap.
1	BPT		Breakpoint trap.
1	ENTER	[reg list], disp	Save registers and allocate stack frame (Enter Procedure).
1	EXIT	[reg list]	Restore registers and reclaim stack frame (Exit Procedure).
1	RET	disp	Return from subroutine.
1	RXP	disp	Return from external procedure call.
1	RETT	disp	Return from trap. (Privileged)
1	RETI		Return from interrupt. (Privileged)
CPU REGISTER MANIPULATION
Format	Operation	Operands	Description
1	SAVE	[reg list]	Save general purpose registers.
1	RESTORE	[reg list]	Restore general purpose registers.
2	LPRi	areg,gen	Load dedicated register. (Privileged if PSR or INTBASE)
2	SPRi	areg,gen	Store dedicated register. (Privileged if PSR or INTBASE)
3	ADJSPi	gen	Adjust stack pointer.
3	BISPSRi	gen	Set selected bits in PSR. (Privileged if not Byte length)
3	BICPSRi	gen	Clear selected bits in PSR. (Privileged if not Byte length)
5	SETCFG	[option list]	Set configuration register. (Privileged)

FLOATING POINT
Format	Operation	Operands	Description
11	MOVf	gen,gen	Move a floating point value.
9	MOVLF	gen,gen	Move and shorten a long value to standard.
9	MOVFL	gen,gen	Move and lengthen a standard value to long.
9	MOVif	gen,gen	Convert any integer to standard or long floating.
9	ROUNDfi	gen,gen	Convert to integer by rounding.
9	TRUNCfi	gen,gen	Convert to integer by truncating, toward zero.
9	FLOORfi	gen,gen	Convert to largest integer less than or equal to value.
11	ADDf	gen,gen	Add.
11	SUBf	gen,gen	Subtract.
11	MULf	gen,gen	Multiply.
11	DIVf	gen,gen	Divide.
11	CMPf	gen,gen	Compare.
11	NEGf	gen,gen	Negate.
11	ABSf	gen,gen	Take absolute value.
9	LFSR	gen	Load FSR.
9	SFSR	gen	Store FSR.
12	POLYf	gen,gen	Polynomial Step.
12	DOTf	gen,gen	Dot Product.
12	SCALBf	gen,gen	Binary Scale.
12	LOGBf	gen,gen	Binary Log.
MISCELLANEOUS
Format	Operation	Operands	Description
1	NOP		No operation.
1	WAIT		Wait for interrupt.
1	DIA		Diagnose. Single-byte "Branch to Self" for hardware breakpointing. Not for use in programming.
GRAPHICS
Format	Operation	Operands	Description
5	BBOR	options*	Bit-aligned block transfer 'OR'.
5	BBAND	options	Bit-aligned block transfer 'AND'.
5	BBFOR		Bit-aligned block transfer fast 'OR'.
5	BBXOR	options	Bit-aligned block transfer 'XOR'.
5	BBSTOD	options	Bit-aligned block source to destination.
5	BITWT		Bit-aligned word transfer.
5	EXTBLT	options	External bit-aligned block transfer.
5	MOVMPi		Move multiple pattern.
5	TBITS	options	Test bit string.
5	SBITS		Set bit string.
5	SBITPS		Set bit perpendicular string.
BITS
Format	Operation	Operands	Description
4	TBITi	gen,gen	Test bit.
6	SBITi	gen,gen	Test and set bit.
6	SBITIi	gen,gen	Test and set bit, interlocked.
6	CBITi	gen,gen	Test and clear bit.
6	CBITIi	gen,gen	Test and clear bit, interlocked.
6	IBITi	gen,gen	Test and invert bit.
8	FFSi	gen,gen	Find first set bit.

*Note: Options are controlled by fields of the instruction, PSR status bits, or dedicated register values.

2.5 GRAPHICS SUPPORT

The following sections provide a brief description of the NS32FX164 graphics support capabilities. Basic discussions on frame buffer addressing and BITBLT operations are also provided. More detailed information on the NS32FX164 graphics support instructions can be found in the NS32CG16 Printer/Display Processor Programmer's Reference.

2.5.1 Frame Buffer Addressing

There are two basic addressing schemes for referencing pixels within the frame buffer: Linear and Cartesian (or x-y). Linear addressing associates a single number to each pixel representing the physical address of the corresponding bit in memory. Cartesian addressing associates two numbers to each pixel representing the x and y coordinates of the pixel relative to a point in the Cartesian space taken as the origin. The Cartesian space is generally defined as having the origin in the upper left. A movement to the right increases the x coordinate; a movement downward increases the y coordinate.

The correspondence between the location of a pixel in the Cartesian space and the physical (BIT) address in memory is shown in Figure 2-20. The origin of the Cartesian space (x = 0, y = 0) corresponds to the bit address 'ORG'. Incrementing the x coordinate increments the bit address by one. Incrementing the y coordinate increments the bit address by an amount representing the warp (or pitch) of the Cartesian space. Thus, the linear address of a pixel at location (x, y) in the Cartesian space can be found by the following expression.

ADDR = ORG + y * WARP + x

Warp is the distance (in bits) in the physical memory space between two vertically adjacent bits in the Cartesian space.
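As a minimal sketch of the expression above, the following Python model computes the linear bit address of a pixel and splits it into a byte address and a bit number. The function names and the 32-pixel-wide frame buffer are illustrative assumptions and are not part of the NS32FX164 instruction set.

```python
def pixel_bit_address(org, warp, x, y):
    """Linear bit address of pixel (x, y): ORG + y * WARP + x."""
    return org + y * warp + x


def byte_and_bit(bit_address):
    """Split a bit address into (byte address, bit number within the byte)."""
    return bit_address >> 3, bit_address & 7


# Hypothetical frame buffer: origin at bit address 0, warp of 32 bits.
addr = pixel_bit_address(org=0, warp=32, x=4, y=16)
print(hex(addr))           # 0x204
print(byte_and_bit(addr))  # (64, 4)
```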

Example 1 below shows two NS32FX164 instruction sequences to set a single pixel given the x and y coordinates. Example 2 shows how to create a fat pixel by setting four adjacent bits in the Cartesian space.

Example 1: Set pixel at location (x, y)

Setup:	R0	x coordinate
	R1	y coordinate

Instruction Sequence 1:
MULD	WARP, R1	; Y*WARP
ADDD	R0, R1	; + X = BIT OFFSET
SBITD	R1, ORG	; SET PIXEL

Instruction Sequence 2:
INDEXD	R1, (WARP-1), R0	; Y*WARP + X
SBITD	R1, ORG	; SET PIXEL

Example 2: Create fat pixel by setting bits at locations (x, y), (x+1, y), (x, y+1) and (x+1, y+1).

Setup:	R0	x coordinate
	R1	y coordinate

Instruction Sequence:
INDEXD	R1, (WARP-1), R0	; BIT ADDRESS
SBITD	R1, ORG	; SET FIRST PIXEL
ADDQD	1, R1	; (X+1, Y)
SBITD	R1, ORG	; SECOND PIXEL
ADDD	(WARP-1), R1	; (X, Y+1)
SBITD	R1, ORG	; THIRD PIXEL
ADDQD	1, R1	; (X+1, Y+1)
SBITD	R1, ORG	; LAST PIXEL


FIGURE 2-20. Correspondence between Linear and Cartesian Addressing
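The fat-pixel sequence of Example 2 can be followed step by step in a small Python model. This is only a sketch: the frame buffer is a bytearray, bit 0 of each byte is assumed to be the lowest bit address, and the helper names are hypothetical.

```python
def set_bit(buf, org, offset):
    """Set the bit at bit address org + offset (bit 0 = LSB of byte 0)."""
    addr = org + offset
    buf[addr >> 3] |= 1 << (addr & 7)


def fat_pixel(buf, org, warp, x, y):
    """Set (x, y), (x+1, y), (x, y+1) and (x+1, y+1), as in Example 2."""
    offset = y * warp + x      # INDEXD R1, (WARP-1), R0
    set_bit(buf, org, offset)  # first pixel  (x, y)
    offset += 1                # ADDQD 1, R1
    set_bit(buf, org, offset)  # second pixel (x+1, y)
    offset += warp - 1         # ADDD (WARP-1), R1
    set_bit(buf, org, offset)  # third pixel  (x, y+1)
    offset += 1                # ADDQD 1, R1
    set_bit(buf, org, offset)  # last pixel   (x+1, y+1)


# Hypothetical 16-pixel by 4-line frame buffer (8 bytes), origin at bit 0.
frame = bytearray(8)
fat_pixel(frame, org=0, warp=16, x=3, y=1)
print(frame.hex())  # 0000180018000000
```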

2.5.2 BITBLT Fundamentals

BITBLT, BIT-aligned BLock Transfer, is a general operator that provides a mechanism to move an arbitrary size rectangle of an image from one part of the frame buffer to another. During the data transfer process a bitwise logical operation can be performed between the source and the destination data. BITBLT is also called RasterOp: operations on rasters. It defines two rectangular areas, source and destination, and performs a logical operation (e.g., AND, OR, XOR) between these two areas and stores the result back to the destination. It can be expressed in simple notation as:

Source op Destination -> Destination, where op is AND, OR, XOR, etc.
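The notation above can be illustrated with a short Python sketch that applies a logical operation word by word and stores the result back to the destination. The 16-bit word size, the list representation and the names are assumptions made for the example.

```python
import operator

# Logical operations selectable for the transfer (illustrative subset).
RASTER_OPS = {"AND": operator.and_, "OR": operator.or_, "XOR": operator.xor}


def raster_op(op, source, destination):
    """Return `source op destination`, word by word, for 16-bit words."""
    fn = RASTER_OPS[op]
    return [fn(s, d) & 0xFFFF for s, d in zip(source, destination)]


src = [0x00FF, 0x0F0F]
dst = [0x0F0F, 0x0F0F]
dst = raster_op("XOR", src, dst)   # result replaces the destination
print([hex(w) for w in dst])       # ['0xff0', '0x0']
```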


2.5.2.1 Frame Buffer Architecture

There are two basic types of frame buffer architectures: plane-oriented or pixel-oriented. BITBLT takes advantage of the plane-oriented frame buffer architecture's attribute of multiple, adjacent pixels-per-word, facilitating the movement of large blocks of data. The source and destination starting addresses are expressed as pixel addresses. The width and height of the block to be moved are expressed in terms of pixels and scan lines. The source block may start and end at any bit position of any word, and the same applies for the destination block.

2.5.2.2 Bit Alignment

Before a logical operation can be performed between the source and the destination data, the source data must first be bit aligned to the destination data. In Figure 2-21, the source data needs to be shifted three bits to the right in order to align the first pixel (i.e., the pixel at the top left corner) in the source data block to the first pixel in the destination data block.

2.5.2.3 Block Boundaries and Destination Masks

Each BITBLT destination scan line may start and end at any bit position in any data word. The neighboring bits (bits sharing the same word address with any words in the destination data block, but not a part of the BITBLT rectangle) of the BITBLT destination scan line must remain unchanged after the BITBLT operation.

Due to the plane-oriented frame buffer architecture, all memory operations must be word-aligned. In order to preserve the neighboring bits surrounding the BITBLT destination block, both a left mask and a right mask are needed for all the leftmost and all the rightmost data words of the destination block. The left mask and the right mask both remain the same during a BITBLT operation.

The following example illustrates the bit alignment requirements. In this example, the memory data path is 16 bits wide. Figure 2-21 shows a 32 pixel by 32 scan line frame buffer which is organized as a long bit stream which wraps around every two words (32 bits). The origin (top left corner) of the frame buffer starts from the lowest word in memory (word address 00 (hex)).

Each word in the memory contains 16 bits, D0-D15. The least significant bit of a memory word, D0, is defined as the first displayed pixel in a word. In this example, BITBLT addresses are expressed as pixel addresses relative to the origin of the frame buffer. The source block starting address is 021 (hex) (the second pixel in the third word). The destination block starting address is 204 (hex) (the fifth pixel in the 33rd word). The block width is 13 (hex), and the height is 06 (hex) (corresponding to 6 scan lines). The shift value is 3.


FIGURE 2-21. 32-Pixel by 32-Scan Line Frame Buffer
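The shift value and the edge masks of this example can be reproduced with a few lines of Python. This is a sketch under stated assumptions: masks are computed relative to the destination word, a one marks a bit the BITBLT may modify, and the right-hand edge is passed in as a bit position (7, read off the right mask quoted for this example) rather than derived from the block width. All function names are hypothetical.

```python
WORD_BITS = 16


def shift_value(src_start, dst_start):
    """Number of bit positions the source must move to line up with the destination."""
    return (dst_start - src_start) % WORD_BITS


def left_mask(first_bit):
    """Ones from bit `first_bit` up to D15."""
    return (0xFFFF << first_bit) & 0xFFFF


def right_mask(last_bit):
    """Ones from D0 up to and including bit `last_bit`."""
    return (1 << (last_bit + 1)) - 1


def display_order(mask):
    """Format a mask with D0 first, in groups of four bits."""
    bits = "".join(str((mask >> n) & 1) for n in range(WORD_BITS))
    return ",".join(bits[i:i + 4] for i in range(0, WORD_BITS, 4))


print(shift_value(0x021, 0x204))                    # 3
print(display_order(left_mask(0x204 % WORD_BITS)))  # 0000,1111,1111,1111
print(display_order(right_mask(7)))                 # 1111,1111,0000,0000
```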


2.5.2.4 BITBLT Directions

A BITBLT operation moves a rectangular block of data in a frame buffer. The operation itself can be considered as a subroutine with two nested loops. The loops are preceded by setup operations. In the outer loop the source and destination starting addresses are calculated, and the test for completion is performed. In the inner loop the actual data movement for a single scan line takes place. The length of the inner loop is the number of (aligned) words spanned by each scan line. The length of the outer loop is equal to the height (number of scan lines) of the block to be moved. A skeleton of the subroutine representing the BITBLT operation follows.

BITBLT:	calculate BITBLT setup parameters (once per BITBLT operation),
	such as
		width, height
		bit misalignment (shift number)
		left, right masks
		horizontal, vertical directions
		etc.
OUTERLOOP:	calculate source, dest addresses (once per scan line).
INNERLOOP:	move data, (logical operation) and increment addresses (once per word).
	UNTIL done horizontally
	UNTIL done vertically
	RETURN (from BITBLT).

Note: In the NS32FX164 only the setup operations must be done by the programmer. The inner and outer loops are automatically executed by the BITBLT instructions.
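The skeleton above can be sketched in Python as two nested loops. This is an illustrative software model only, not a description of how the BITBLT instructions are implemented: the frame buffer is a list of scan lines, each a list of 0/1 pixels, the inner loop runs once per pixel rather than once per word, and all names are assumptions.

```python
def bitblt(fb, src, dst, width, height, op=lambda s, d: s,
           right_to_left=False, bottom_to_top=False):
    """Apply `op(source, destination)` over a width x height rectangle."""
    sx, sy = src
    dx, dy = dst
    rows = range(height - 1, -1, -1) if bottom_to_top else range(height)
    cols = range(width - 1, -1, -1) if right_to_left else range(width)
    for r in rows:                      # OUTERLOOP: once per scan line
        for c in cols:                  # INNERLOOP: once per pixel in this model
            s = fb[sy + r][sx + c]
            d = fb[dy + r][dx + c]
            fb[dy + r][dx + c] = op(s, d) & 1
    return fb


frame = [[1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 1]]
bitblt(frame, src=(0, 0), dst=(2, 2), width=2, height=2,
       op=lambda s, d: s | d)
print(frame[2], frame[3])  # [0, 0, 1, 0] [0, 0, 0, 1]
```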

Each loop can be executed in one of two directions: the inner loop from left to right or right to left, the outer loop from top to bottom (down) or bottom to top (up).

The ability to move data starting from any corner of the BITBLT rectangle is necessary to avoid destroying the BITBLT source data as a result of destination writes when the source and destination are overlapped (i.e., when they share pixels). This situation is routinely encountered while panning or scrolling.

A determination of the correct execution directions of the BITBLT must be performed whenever the source and destination rectangles overlap. Any overlap will result in the destruction of source data (from a destination write) if the correct vertical direction is not used. Horizontal BITBLT direction is of concern only in certain cases of overlap, as will be explained below.

Figures 2-22(a) and (b) illustrate two cases of overlap. Here, the BITBLT rectangles are three pixels wide by five scan lines high; they overlap by a single pixel in (a) and a single column of pixels in (b). For purposes of illustration, the BITBLT is assumed to be carried out pixel-by-pixel. This convention does not affect the conclusions.

In Figure 2-22(a), if the BITBLT is performed in the UP direction (bottom-to-top) one of the transfers of the bottom scan line of the source will write to the circled pixel of the destination. Due to the overlap, this pixel is also part of the uppermost scan line of the source rectangle. Thus, data needed later is destroyed. Therefore, this BITBLT must be performed in the DOWN direction. Another example of this occurs any time the screen is moved in a purely vertical direction, as in scrolling text. It should be noted that, in both of these cases, the choice of horizontal BITBLT direction may be made arbitrarily.

Figure 2-22(b) demonstrates a case in which the horizontal BITBLT direction may not be chosen arbitrarily. This is an instance of purely horizontal movement of data (panning). Because the movement from source to destination involves data within the same scan line, the incorrect direction of movement will overwrite data which will be needed later. In this example, the correct direction is from right to left.

FIGURE 2-22. Overlapping BITBLT Blocks, cases (a) and (b)

For the example of Figure 2-21, the left mask and the right mask are 0000,1111,1111,1111 and 1111,1111,0000,0000 respectively.

Note 1: Zeros in either the left mask or the right mask indicate the destination bits which will not be modified.

Note 2: The BB(function) and EXTBLT instructions use different setup parameters and techniques.
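The direction rules discussed for overlapping blocks can be sketched as follows. The rule coded here (move bottom-to-top when the destination lies below the source, right-to-left when it lies to the right on the same scan line) is a common convention that agrees with the two cases described above; it is an assumption for illustration, and the function names are hypothetical. Integer values stand in for pixels so that overwritten data is easy to spot.

```python
def choose_directions(src, dst):
    """Return (right_to_left, bottom_to_top) for overlapping blocks."""
    (sx, sy), (dx, dy) = src, dst
    bottom_to_top = dy > sy                 # destination below the source
    right_to_left = dy == sy and dx > sx    # panning to the right
    return right_to_left, bottom_to_top


def copy_row(row, sx, dx, width, right_to_left):
    """Copy `width` pixels within one scan line in the given direction."""
    cols = range(width - 1, -1, -1) if right_to_left else range(width)
    for c in cols:
        row[dx + c] = row[sx + c]
    return row


# Panning four pixels two positions to the right on the same scan line.
rtl, _ = choose_directions(src=(0, 0), dst=(2, 0))
print(copy_row([1, 2, 3, 4, 0, 0], 0, 2, 4, right_to_left=rtl))
# [1, 2, 1, 2, 3, 4]  source data preserved
print(copy_row([1, 2, 3, 4, 0, 0], 0, 2, 4, right_to_left=False))
# [1, 2, 1, 2, 1, 2]  source overwritten before it was read
```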

2.5.2.5 BITBLT Variations

The "classical" definition of BITBLT, as described in "Smalltalk-80 The Language and its Implementation", by Adele Goldberg and David Robson, provides for three operands: source, destination and mask/texture. This third operand is commonly used in monochrome systems to incorporate a stipple pattern into an area. These stipple patterns provide the appearance of multiple shades of gray in single-bit-per-pixel systems, in a manner similar to the "halftone" process used in printing.

Texture op1 Source op2 Destination -> Destination

While the NS32FX164 and the external BPU (if used) are essentially two-operand devices, three-operand BITBLT operations can be implemented quite flexibly and efficiently by performing the two operations serially.
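A three-operand BITBLT built from two serial two-operand passes might look like the following Python sketch. The use of a scratch buffer for the intermediate result, the 16-bit word lists and the function names are assumptions made for the example; the text above does not prescribe how the two passes are arranged.

```python
import operator


def two_operand_blt(op, source, destination):
    """One two-operand pass: source op destination, word by word."""
    return [op(s, d) & 0xFFFF for s, d in zip(source, destination)]


def three_operand_blt(op1, op2, texture, source, destination):
    """Texture op1 Source op2 Destination, as two serial passes."""
    scratch = two_operand_blt(op1, texture, source)     # pass 1 (assumed scratch area)
    return two_operand_blt(op2, scratch, destination)   # pass 2


texture = [0xAAAA, 0x5555]          # stipple pattern
source = [0xFFFF, 0x00FF]
destination = [0x0000, 0xF000]
result = three_operand_blt(operator.and_, operator.or_,
                           texture, source, destination)
print([hex(w) for w in result])     # ['0xaaaa', '0xf055']
```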

2.5.3 GRAPHICS SUPPORT INSTRUCTIONS

The NS32FX164 provides eleven instructions for supporting graphics oriented applications. These instructions are divided into three groups according to the operations they perform. General descriptions for each of them and the related formats are provided in the following sections.

2.5.3.1 BITBLT (BIT-aligned BLock Transfer)

This group includes seven instructions. They are used to move characters and objects into the frame buffer which will be printed or displayed. One of the instructions works in conjunction with an external BITBLT Processing Unit (BPU) to maximize performance. The other six are executed by the NS32FX164.

BIT-aligned BLock Transfer

Syntax: BB(function) Options

Setup:	R0	base address, source data
	R1	base address, destination data
	R2	shift value
	R3	height (in lines)
	R4	first mask
	R5	second mask
	R6	source warp (adjusted)
	R7	destination warp (adjusted)
	0(SP)	width (in words)
Function:	AND, OR, XOR, FOR, STOD
Options:	IA	Increasing Address (default option). When IA is selected, scan lines are transferred in the increasing BIT/BYTE order.
	DA	Decreasing Address.
	S	True Source (default option).
	-S	Inverted Source.

These five instructions perform standard BITBLT operations between source and destination blocks. The operations available include the following:

BBAND:	src	AND	dst
	-src	AND	dst
BBOR:	src	OR	dst
	-src	OR	dst
BBXOR:	src	XOR	dst
	-src	XOR	dst
BBFOR:	src	OR	dst
BBSTOD:	src	TO	dst
	-src	TO	dst

'src' and '-src' stand for 'True Source' and 'Inverted Source' respectively; 'dst' stands for 'Destination'.

Note 1: For speed reasons, the BB instructions require the masks to be specified with respect to the source block. In Figure 2-21 masking was defined relative to the destination block.

Note 2: The options -S and DA are not available for the BBFOR instruction.

Note 3: BBFOR performs the same operation as BBOR with IA and S options.

Note 4: IA and DA are mutually exclusive and so are S and -S.

Note 5: The width is defined as the number of words of source data to read.

Note 6: An odd number of bytes can be specified for the source warp. However, word alignment of source scan lines will result in faster execution.

The horizontal and vertical directions of the BITBLT operations performed by the above instructions, with the exception of BBFOR, are both programmable. The horizontal direction is controlled by the IA and DA options. The vertical direction is controlled by the sign of the source and destination warps. Figure 2-23 and Table 2-3 show the format of the BB instructions and the encodings for the 'op' and 'i' fields.

23           16  15            8  7             0
0 0 0 0 0 0 D X  S 0 [ op  ] [i]  0 0 0 0 1 1 1 0

• D is set when the DA option is selected
• S is set when the -S option is selected
• X is set for BBAND, and it is clear for all other BB instructions

FIGURE 2-23. BB Instructions Format

TABLE 2-3. 'op' and 'i' Field Encodings

Instruction	Options	'op' Field	'i' Field
BBAND	Yes	1010	11
BBOR	Yes	0110	01
BBXOR	Yes	1110	01
BBFOR	No	1100	01
BBSTOD	Yes	0100	01
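As an illustration of how Figure 2-23 and Table 2-3 combine, the following Python sketch assembles the 24-bit BB instruction pattern. It assumes the figure is read with D in bit 17, X in bit 16, S in bit 15, 'op' in bits 13-10, 'i' in bits 9-8 and 00001110 in the low byte; the function and table names are hypothetical.

```python
# name: ('op' field, 'i' field, X bit, options allowed), from Table 2-3
BB_FIELDS = {
    "BBAND":  (0b1010, 0b11, 1, True),
    "BBOR":   (0b0110, 0b01, 0, True),
    "BBXOR":  (0b1110, 0b01, 0, True),
    "BBFOR":  (0b1100, 0b01, 0, False),
    "BBSTOD": (0b0100, 0b01, 0, True),
}


def encode_bb(name, decreasing_address=False, inverted_source=False):
    """Return the 24-bit pattern for a BB instruction (assumed bit layout)."""
    op, i, x, options_allowed = BB_FIELDS[name]
    if not options_allowed and (decreasing_address or inverted_source):
        raise ValueError(name + " does not accept options")
    return (int(decreasing_address) << 17 | x << 16
            | int(inverted_source) << 15 | op << 10 | i << 8 | 0x0E)


print(hex(encode_bb("BBAND")))                     # 0x12b0e
print(hex(encode_bb("BBOR", decreasing_address=True,
                    inverted_source=True)))        # 0x2990e
```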

BIT-aligned Word Transfer

Syntax: BITWT
Setup:	R0	Base address, source word
	R1	Base address, destination double word
	R2	Shift value

The BITWT instruction performs a fast logical OR operation between a source word and a destination double word, stores the result into the destination double word and increments registers R0 and R1 by two. Before performing the OR operation, the source word is shifted left (i.e., in the direction of increasing bit numbers) by the value in register R2.


This instruction can be used within the inner loop of a block OR operation. Its use assumes that the source data is 'clean' and does not need masking. The BITWT format is shown in Figure 2-24.

23           16  15            8  7             0
0 0 0 0 0 0 0 0  0 0 1 0 0 0 0 1  0 0 0 0 1 1 1 0

FIGURE 2-24. BITWT Instruction Format
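The operation of BITWT described above can be modeled in a few lines of Python. This is a sketch rather than a cycle-accurate description: memory is a bytearray, operands are assumed to be stored least significant byte first, the shift value is assumed to lie in the range 0 to 15, and the function name is hypothetical.

```python
def bitwt(mem, r0, r1, r2):
    """OR the source word, shifted left by r2, into the destination double word.

    Returns the updated (R0, R1) pair, each incremented by two.
    """
    src = int.from_bytes(mem[r0:r0 + 2], "little")   # 16-bit source word
    dst = int.from_bytes(mem[r1:r1 + 4], "little")   # 32-bit destination
    dst |= (src << r2) & 0xFFFFFFFF
    mem[r1:r1 + 4] = dst.to_bytes(4, "little")
    return r0 + 2, r1 + 2


memory = bytearray(8)
memory[0:2] = (0x8001).to_bytes(2, "little")   # source word at address 0
r0, r1 = bitwt(memory, r0=0, r1=4, r2=3)       # destination at address 4
print(memory[4:8].hex(), r0, r1)               # 08000400 2 6
```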
External BITBLT
Syntax: EXTBLT
Setup:	R0	base address, source data
	R1	base address, destination data
	R2	width (in bytes)
	R3	height (in lines)
	R4	horizontal increment/decrement
	R5	temporary register (current width)
	R6	source warp (adjusted)
	R7	destination warp (adjusted)

Note 1: R0 and R1 are updated after execution to point to the last source and destination addresses plus related warps. R2, R3 and R5 will be modified. R4, R6, and R7 are returned unchanged.

Note 2: Source and destination pointers should point to word-aligned operands to maximize speed and minimize external interface logic.

This instruction performs an entire BITBLT operation in conjunction with an external BITBLT Processing Unit (BPU). The external BPU Control Register should be loaded by the software before the instruction is executed (refer to the DP8510 or DP8511 data sheets for more information on the BPU). The NS32FX164 generates a series of source read, destination read and destination write bus cycles until the entire data block has been transferred. The BITBLT operation can be performed in either horizontal direction, as controlled by the sign of the contents of register R4.

Depending on the relative alignment of the source and destination blocks, an extra source read may be required at the beginning of each scan line, to load the pipeline register in the external BPU. The L bit in the PSR register determines whether the extra source read is performed. If L is 1, no extra read is performed. The instructions CMPQB 2,1 or CMPQB 1,2 could be executed to provide the right setting for the L bit just before executing EXTBLT. Figure 2-25 shows the EXTBLT format. The bus activity for a simple BITBLT operation is shown in Figure 2-30.

23           16  15            8  7             0
0 0 0 0 0 0 0 0  0 0 0 1 0 1 1 1  0 0 0 0 1 1 1 0

FIGURE 2-25. EXTBLT Instruction Format

2.5.3.2 Pattern Fill

Only one instruction is in this group. It is usually used for clearing RAM and drawing patterns and lines.

Move Multiple Pattern

Syntax: MOVMPi

Setup:	R0	base address of the destination
	R1	pointer increment (in bytes)
	R2	number of pattern moves
	R3	source pattern

Note: R1 and R3 are not modified by the instruction. R2 will always be returned as zero. R0 is modified to reflect the last address into which a pattern was written.

This instruction stores the pattern in register R3 into the destination area whose address is in register R0. The pattern count is specified in register R2. After each store operation the destination address is changed by the contents of register R1. This allows the pattern to be stored in rows, in columns, and in any direction, depending on the value and sign of R1. The MOVMPi instruction format is shown in Figure 2-26.

23           16  15            8  7             0
0 0 0 0 0 0 0 0  0 0 0 1 1 1 [i]  0 0 0 0 1 1 1 0

FIGURE 2-26. MOVMPi Instruction Format
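The store-and-step behavior of MOVMPi can be sketched in Python as shown below. The model is an assumption-laden illustration: memory is a bytearray, the pattern is stored least significant byte first, `size` stands for the integer length suffix (1, 2 or 4 bytes), and the function returns the list of addresses written instead of modeling the final register contents.

```python
def movmp(mem, r0, r1, r2, r3, size):
    """Store the low `size` bytes of r3 at r0, r2 times, stepping r0 by r1."""
    pattern = (r3 & ((1 << (8 * size)) - 1)).to_bytes(size, "little")
    written = []
    while r2 != 0:
        mem[r0:r0 + size] = pattern   # store the pattern
        written.append(r0)
        r0 += r1                      # signed pointer increment, in bytes
        r2 -= 1
    return written


# Draw a "column": one byte every 4 bytes (a hypothetical 4-byte-wide buffer).
memory = bytearray(12)
print(movmp(memory, r0=0, r1=4, r2=3, r3=0xAA, size=1))  # [0, 4, 8]
print(memory.hex())  # aa000000aa000000aa000000
```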

2.5.3.3 Data Compression, Expansion and Magnify

The three instructions in this group can be used to compress data and restore data from compression. A compressed character set may require from 30% to 50% less memory space for its storage.

The compression ratio possible can be 50:1 or higher depending on the data and algorithm used. TBITS can also be used to find boundaries of an object. As a character is needed, the data is expanded and stored in a RAM buffer. The expand instructions (SBITS, SBITPS) can also function as line drawing instructions.

Test Bit String

Syntax: TBITS option

Setup:	R0	base address, source (byte address)
	R1	starting source bit offset
	R2	destination run length limited code
	R3	maximum value run length limit
	R4	maximum source bit offset
Option:	1	count set bits until a clear bit is found
	0	count clear bits until a set bit is found

Note: R0, R3 and R4 are not modified by the instruction execution. R1 reflects the new bit offset. R2 holds the result.

This instruction starts at the base address, adds a bit offset, and tests the bit for clear if "option" = 0 (and for set if "option" = 1). If clear (or set), the instruction increments to the next higher bit and tests for clear (or set). This testing for clear proceeds through memory until a set bit is found or until the maximum source bit offset or maximum run length value is reached. The total number of clear bits is stored in the destination as a run length value.

When TBITS finds a set bit and terminates, the bit offset is adjusted to reflect the current bit address. Offset is then ready for the next TBITS instruction with "option" = 0. After the instruction is executed, the F flag is set to the value of the bit previous to the bit currently being pointed to (i.e., the value of the bit on which the instruction completed execution). In the case of a starting bit offset exceeding the maximum bit offset (R1 ≥ R4), the F flag is set if the option was 1 and clear if the option was 0. The L flag is set when the desired bit is found, or if the run length equalled the maximum run length value and the bit was not found. It is cleared otherwise. Figure 2-27 shows the TBITS instruction format.


23           16  15            8  7             0
0 0 0 0 0 0 0 0  S 0 1 0 0 1 1 1  0 0 0 0 1 1 1 0

• S is set for 'TBITS 1' and clear for 'TBITS 0'.

FIGURE 2-27. TBITS Instruction Format
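The run-length counting performed by TBITS can be approximated by the following Python sketch. It is a simplified model with several assumptions: bit 0 of each byte is the lowest bit address, the maximum source bit offset in R4 is treated as an exclusive bound, the returned offset is left on the bit that ended the run, and the F and L flags are not modeled. The helper `run_lengths` is a hypothetical usage example that alternates the two options along a scan line.

```python
def tbits(mem, r0, r1, r3, r4, option):
    """Count bits equal to `option` starting at bit offset r1.

    Returns (run_length, new_bit_offset).
    """
    run = 0
    while r1 < r4 and run < r3:
        bit = (mem[r0 + (r1 >> 3)] >> (r1 & 7)) & 1
        if bit != option:
            break                 # terminating bit found
        run += 1
        r1 += 1
    return run, r1


def run_lengths(mem, total_bits, max_run=0xFFFF):
    """Encode a bit string as alternating clear/set run lengths."""
    runs, offset, option = [], 0, 0
    while offset < total_bits:
        run, offset = tbits(mem, 0, offset, max_run, total_bits, option)
        runs.append(run)
        option ^= 1
    return runs


line = bytes([0b00111000, 0xFF])   # bits, lowest first: 000 111 00 11111111
print(tbits(line, 0, 0, 0xFFFF, 16, 0))  # (3, 3)
print(run_lengths(line, 16))             # [3, 3, 2, 8]
```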


Set Bit String

Syntax: SBITS

Setup:	R0	base address of the destination
	R1	starting bit offset (signed)
	R2	number of bits to set (unsigned)
	R3	address of string look-up table

Note: When the instruction terminates, the registers are returned unchanged.

SBITS sets a number of contiguous bits in memory to 1, and is typically used for data expansion operations. The instruction draws the number of ones specified by the value in R2, starting at the bit address provided by registers R0 and R1. In order to maximize speed and allow drawing of patterned lines, an external 1k byte lookup table is used. The lookup table is specified in the NS32CG16 Printer/Display Processor Programmer's Reference Supplement.

When SBITS begins executing, it compares the value in R2 with 25. If the value in R2 is less than or equal to 25, the F flag is cleared and the appropriate number of bits are set in memory. If R2 is greater than 25, the F flag is set and no other action is performed. This allows the software to use a faster algorithm to set longer strings of bits. Figure 2-28 shows the SBITS instruction format.

23           16  15            8  7             0
0 0 0 0 0 0 0 0  0 0 1 1 0 1 1 1  0 0 0 0 1 1 1 0

FIGURE 2-28. SBITS Instruction Format
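The visible effect of SBITS, including the check of R2 against 25, can be sketched in Python as follows. The sketch sets bits directly and does not model the external lookup table; bit 0 of each byte is assumed to be the lowest bit address, and the function name and return convention (the F flag as a Boolean) are illustrative choices.

```python
SBITS_LIMIT = 25   # longest string handled by the instruction itself


def sbits(mem, r0, r1, r2):
    """Set r2 contiguous bits starting at bit offset r1 from byte address r0.

    Returns the F flag: True if r2 exceeds the limit and nothing was done.
    """
    if r2 > SBITS_LIMIT:
        return True
    for n in range(r1, r1 + r2):
        mem[r0 + (n >> 3)] |= 1 << (n & 7)
    return False


memory = bytearray(4)
print(sbits(memory, r0=0, r1=5, r2=6), memory.hex())   # False e0070000
print(sbits(memory, r0=0, r1=0, r2=26), memory.hex())  # True e0070000
```

When the returned flag is True, software would fall back to a different routine for the long string, as the text above describes.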

Set BIT Perpendicular String

Syntax: SBITPS

Setup:	R0	base address, destination (byte address)
	R1	starting bit offset
	R2	number of bits to set
	R3	destination warp (signed value, in bits)

Note: When the instruction terminates, the R0 and R3 registers are returned unchanged. R1 becomes the final bit offset. R2 is zero.

SBITPS can be used to set a string of bits in any direction. This allows a font to be expanded with a 90 or 270 degree rotation, as may be required in a printer application. SBITPS sets a string of bits starting at the bit address specified in registers R0 and R1. The number of bits in the string is specified in R2. After the first bit is set, the destination warp is added to the bit address and the next bit is set. The process is repeated until all the bits have been set. A negative raster warp offset value leads to a 90 degree rotation. A positive raster warp value leads to a 270 degree rotation. If the R3 value is equal to (space warp +1 or -1), then the result is a 45 degree line. If the R3 value is +1 or -1, a horizontal line results.

SBITS and SBITPS allow expansion on any 90 degree angle, giving portrait, landscape and mirror images from one font. Figure 2-29 shows the SBITPS instruction format.

23           16  15            8  7             0
0 0 0 0 0 0 0 0  0 0 1 0 1 1 1 1  0 0 0 0 1 1 1 0

FIGURE 2-29. SBITPS Instruction Format
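The set-and-step behavior of SBITPS lends itself to a compact Python sketch. Assumptions: bit 0 of each byte is the lowest bit address, the frame buffer in the usage example is 8 pixels wide (a warp of 8 bits), and the returned bit offset is the one reached after the last warp addition, which may differ from the value the hardware leaves in R1.

```python
def sbitps(mem, r0, r1, r2, r3):
    """Set r2 bits, starting at bit offset r1, stepping by warp r3 each time.

    Returns the final (bit offset, count) pair of this model.
    """
    while r2 != 0:
        mem[r0 + (r1 >> 3)] |= 1 << (r1 & 7)   # set the current bit
        r1 += r3                               # add the destination warp
        r2 -= 1
    return r1, r2


WARP = 8                       # hypothetical 8-pixel-wide frame buffer
vertical = bytearray(3)
sbitps(vertical, 0, 2, 3, WARP)        # R3 = warp: vertical line
print(vertical.hex())                  # 040404

diagonal = bytearray(3)
sbitps(diagonal, 0, 0, 3, WARP + 1)    # R3 = warp + 1: 45 degree line
print(diagonal.hex())                  # 010204
```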


FIGURE 2-30. Bus Activity for a Simple BITBLT Operation

Note 1: This example is for a block 4 words wide and 1 line high.

Note 2: The sequence is common with all logical operations of the DP8510/DP8511 BPU.

Note 3: Mask values, shift values and number of bit planes do not affect the performance.

Note 4: Zero wait states are assumed throughout the BITBLT operation.

Note 5: The extra read is performed when the BPU pipeline register needs to be preloaded.


2.5.3.3.1 Magnifying Compressed Data

Restoring data is just one application of the SBITS and SBITPS instructions. Multiplying the "length" operand used by the SBITS and SBITPS instructions causes the resulting pattern to be wider, or a multiple of "length".

As the pattern of data is expanded, it can be magnified by 2x, 3x, 4x, . . . , 10x and so on. This creates several sizes of the same style of character, or changes the size of a logo. A magnify in both dimensions X and Y can be accomplished by drawing a single line, then using the MOVS (Move String) or the BB instructions to duplicate the line, maintaining an equal aspect ratio.

More information on this subject is provided in the NS32CG16 Printer/Display Processor Programmer's Reference Supplement.
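The magnification scheme described above (scale each run length, then duplicate the expanded line) can be sketched in Python. The representation of compressed data as alternating clear/set run lengths per scan line, and all function names, are assumptions chosen for the illustration.

```python
def expand(runs, first_bit=0):
    """Expand alternating run lengths into a list of 0/1 pixels."""
    bits, value = [], first_bit
    for length in runs:
        bits.extend([value] * length)
        value ^= 1
    return bits


def magnify(rows_of_runs, factor):
    """Magnify by `factor` in X (scaled lengths) and Y (duplicated lines)."""
    out = []
    for runs in rows_of_runs:
        line = expand([length * factor for length in runs])
        out.extend(list(line) for _ in range(factor))
    return out


for row in magnify([[1, 2, 1]], 2):
    print(row)
# [0, 0, 1, 1, 1, 1, 0, 0]
# [0, 0, 1, 1, 1, 1, 0, 0]
```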

3.0 Functional Description

This chapter provides details on the functional characteristics of the NS32FX164 microprocessor.

The chapter is divided into five main sections: Instruction Execution, Exception Processing, Debugging, DSP Module and System Interface.

3.1 INSTRUCTION EXECUTION

To execute an instruction, the NS32FX164 performs the following operations:

• Fetch the Instruction
• Read Source Operands, if Any (1)
• Calculate Results
• Write Result Operands, if Any
• Modify Flags, if Necessary
• Update the Program Counter

Under most circumstances, the CPU can be conceived to execute instructions by completing the operations above in strict sequence for one instruction and then beginning the sequence of operations for the next instruction. However, due to the internal instruction pipelining, as well as the occurrence of exceptions, the sequence of operations performed during the execution of an instruction may be altered. Furthermore, exceptions also break the sequentiality of the instructions executed by the CPU.

Note 1: In this and following sections, memory locations read by the CPU to calculate effective addresses for Memory-Relative and External addressing modes are considered like source operands, even if the effective address is being calculated for an operand with access class of write.

3.1.1 Operating States

The CPU has four operating states regarding the execution of instructions and the processing of exceptions: Reset, Executing-Instructions, Processing-An-Exception and Waiting-For-An-Interrupt. The various states and transitions between them are shown in Figure 3-1.

Whenever the RSTI signal is asserted, the CPU enters the reset state. The CPU remains in the reset state until the RSTI signal is driven inactive, at which time it enters the Executing-Instructions state. In the Reset state the contents of certain registers are initialized. Refer to Section 3.5.4 for details.

FIGURE 3-1. Operating States

In the Executing-Instructions state, the CPU executes instructions. It exits this state when an exception is recognized or a WAIT instruction is encountered, at which time it enters the Processing-An-Exception state or the Waiting-For-An-Interrupt state, respectively.

While in the Processing-An-Exception state, the CPU saves the PC, PSR and MOD register contents on the stack and reads the new PC and module linkage information to begin execution of the exception service procedure.

Following the completion of all data references required to process an exception, the CPU enters the Executing-Instructions state.

In the Waiting-For-An-Interrupt state, the CPU is idle. A special status identifying this state is presented on the system interface (Section 3.5). When an interrupt is detected, the CPU enters the Processing-An-Exception State.
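The state transitions described above can be summarized as a small transition function. This is a minimal sketch for illustration only: the type, enumerator and function names are hypothetical, and the sketch assumes that an interrupt reported while executing instructions has already been recognized by the CPU.

```c
#include <assert.h>

/* The four operating states of Section 3.1.1 (names are illustrative). */
typedef enum {
    CPU_RESET,
    CPU_EXECUTING,
    CPU_PROCESSING_EXCEPTION,
    CPU_WAITING_FOR_INTERRUPT
} cpu_state;

/* Events that cause transitions (names are illustrative). */
typedef enum {
    EV_RSTI_ASSERTED,        /* RSTI driven active                      */
    EV_RSTI_RELEASED,        /* RSTI driven inactive                    */
    EV_EXCEPTION_RECOGNIZED, /* trap or interrupt recognized            */
    EV_WAIT_INSTRUCTION,     /* WAIT instruction encountered            */
    EV_EXCEPTION_DONE,       /* all exception data references complete  */
    EV_INTERRUPT             /* interrupt detected                      */
} cpu_event;

cpu_state next_state(cpu_state s, cpu_event e)
{
    assert((int)e >= 0 && (int)e <= (int)EV_INTERRUPT);

    /* Asserting RSTI forces the reset state from any state. */
    if (e == EV_RSTI_ASSERTED)
        return CPU_RESET;

    switch (s) {
    case CPU_RESET:
        return (e == EV_RSTI_RELEASED) ? CPU_EXECUTING : CPU_RESET;
    case CPU_EXECUTING:
        if (e == EV_EXCEPTION_RECOGNIZED || e == EV_INTERRUPT)
            return CPU_PROCESSING_EXCEPTION;
        if (e == EV_WAIT_INSTRUCTION)
            return CPU_WAITING_FOR_INTERRUPT;
        return CPU_EXECUTING;
    case CPU_PROCESSING_EXCEPTION:
        return (e == EV_EXCEPTION_DONE) ? CPU_EXECUTING
                                        : CPU_PROCESSING_EXCEPTION;
    case CPU_WAITING_FOR_INTERRUPT:
        return (e == EV_INTERRUPT) ? CPU_PROCESSING_EXCEPTION
                                   : CPU_WAITING_FOR_INTERRUPT;
    }
    return s;
}
```

For example, next_state(CPU_WAITING_FOR_INTERRUPT, EV_INTERRUPT) yields CPU_PROCESSING_EXCEPTION, matching the last transition described above.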

3.1.2 Instruction Endings

The NS32FX164 checks for exceptions at various points while executing instructions. Certain exceptions, like interrupts, are in most cases recognized between instructions. Other exceptions, like Divide-By-Zero Trap, are recognized during execution of an instruction. When an exception is recognized during execution of an instruction, the instruction ends in one of four possible ways: completed, suspended, terminated, or partially completed. Each type of exception causes a particular ending, as specified in Section 3.2.

3.1.2.1 Completed Instructions

When an exception is recognized after an instruction is completed, the CPU has performed all of the operations for that instruction and for all other instructions executed since the last exception occurred. Result operands have been written, flags have been modified, and the PC saved on the Interrupt Stack contains the address of the next instruction to execute. The exception service procedure can, at its conclusion, execute the RETT instruction (or the RETI instruction for maskable interrupts), and the CPU will begin executing the instruction following the completed instruction.

3.1.2.2 Suspended Instructions

An instruction is suspended when one of several trap conditions is detected during execution of the instruction. A suspended instruction has not been completed, but all other instructions executed since the last exception occurred have been completed. Result operands and flags due to be affected by the instruction may have been modified, but only modifications that allow the instruction to be executed again and completed can occur. For certain exceptions (Trap (UND)), the CPU clears the P-flag in the PSR before saving the copy that is pushed on the Interrupt Stack. The PC saved on the Interrupt Stack contains the address of the suspended instruction.

To complete a suspended instruction, the exception service procedure takes either of two actions:

1. The service procedure can simulate the suspended instruction's execution. After calculating and writing the instruction's results, the flags in the PSR copy saved on the Interrupt Stack should be modified, and the PC saved on the Interrupt Stack should be updated to point to the next instruction to execute. The service procedure can then execute the RETT instruction, and the CPU begins executing the instruction following the suspended instruction. This is the action taken when floating-point instructions are simulated by software in systems without a hardware floating-point unit.

2. The suspended instruction can be executed again after the service procedure has eliminated the trap condition that caused the instruction to be suspended. The service procedure should execute the RETT instruction at its conclusion; then the CPU begins executing the suspended instruction again. This is the action taken by a debugger when it encounters a BPT instruction that was temporarily placed in another instruction's location in order to set a breakpoint.

Note 1: It may be necessary for the exception service procedure to alter the P-flag in the PSR copy saved on the Interrupt Stack. If the exception service procedure simulates the suspended instruction and the P-flag was cleared by the CPU before saving the PSR copy, then the saved T-flag must be copied to the saved P-flag (as in the floating-point instruction simulation described above). If the exception service procedure executes the suspended instruction again and the P-flag was not cleared by the CPU before saving the PSR copy, then the saved P-flag must be cleared (as in the breakpoint trap described above). Otherwise, no alteration to the saved P-flag is necessary.
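The P-flag rule in the note above can be expressed as a short helper that a service procedure might apply to the saved PSR copy. This is a sketch only. The bit positions used for the T-flag and P-flag are assumptions taken from the usual Series 32000 PSR layout and should be checked against the PSR description in this document; the function and type names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED bit positions in the PSR (verify against the PSR definition). */
#define PSR_T ((uint16_t)(1u << 1))   /* Trace flag         */
#define PSR_P ((uint16_t)(1u << 10))  /* Trace-pending flag */

/* How the service procedure intends to finish the suspended instruction. */
typedef enum {
    RESUME_SIMULATED,  /* action 1: instruction simulated by software */
    RESUME_REEXECUTE   /* action 2: instruction executed again        */
} resume_kind;

/* Returns the saved PSR copy with the P-flag adjusted as Note 1 requires. */
uint16_t fix_saved_p_flag(uint16_t saved_psr, resume_kind kind,
                          int p_cleared_by_cpu)
{
    assert(kind == RESUME_SIMULATED || kind == RESUME_REEXECUTE);

    if (kind == RESUME_SIMULATED && p_cleared_by_cpu) {
        /* Copy the saved T-flag into the saved P-flag. */
        if (saved_psr & PSR_T)
            saved_psr |= PSR_P;
        else
            saved_psr &= (uint16_t)~PSR_P;
    } else if (kind == RESUME_REEXECUTE && !p_cleared_by_cpu) {
        /* Clear the saved P-flag. */
        saved_psr &= (uint16_t)~PSR_P;
    }
    /* In every other case the saved P-flag is left unchanged. */
    return saved_psr;
}
```

A floating-point emulation handler entered through Trap (UND) would call this with RESUME_SIMULATED and p_cleared_by_cpu set; a debugger restoring an instruction over a BPT would call it with RESUME_REEXECUTE and p_cleared_by_cpu clear.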

3.1.2.3 Terminated Instructions

An instruction being executed is terminated when reset occurs. Any result operands and flags due to be affected by the instruction are undefined, as are the contents of the PC.

3.1.2.4 Partially Completed Instructions

When an interrupt condition is recognized during execution of a string instruction, the instruction is said to be partially completed. A partially completed instruction has not completed, but all other instructions executed since the last exception occurred have been completed. Result operands and flags due to be affected by the instruction may have been modified, but the values stored in the string pointers and other general-purpose registers used during the instruction's execution allow the instruction to be executed again and completed.

The CPU clears the P-flag in the PSR before saving the copy that is pushed on the Interrupt Stack. The PC saved on the Interrupt Stack contains the address of the partially completed instruction. The exception service procedure can, at its conclusion, simply execute the RETT instruction (or the RETI instruction for maskable interrupts), and the CPU will resume executing the partially completed instruction.

3.1.3 Slave Processor Instructions

The NS32FX164 supports only one group of instructions, the floating-point instruction set, as being executable by a slave processor. The floating-point instruction set is validated by the F-bit in the CFG register.

If a floating-point instruction is encountered and the F-bit in the CFG register is not set, a Trap (UND) will result, without any slave processor communication attempted by the CPU. This allows software emulation in case an external floating-point unit (FPU) is not used.

3.1.3.1 Slave Processor Protocol

Slave Processor instructions have a three-byte Basic Instruction field, consisting of an ID Byte followed by an Operation Word. The ID Byte has three functions:

1. It identifies the instruction as being a Slave Processor instruction.

2. It specifies which Slave Processor will execute it.

3. It determines the format of the following Operation Word of the instruction.

Upon receiving a Slave Processor instruction, the CPU initiates the sequence outlined in Figure 3-2. While applying Status Code 1111 (Broadcast ID, Section 3.5.5.1), the CPU transfers the ID Byte on the least-significant half of the Data Bus (AD0-AD7). All Slave Processors input this byte and decode it. The Slave Processor selected by the ID Byte is activated, and from this point the CPU is communicating only with it. If any other slave protocol was in progress (e.g., an aborted Slave instruction), this transfer cancels it.

The CPU next sends the Operation Word while applying Status Code 1101 (Transfer Slave Operand, Section 3.5.5.1). Upon receiving it, the Slave Processor decodes it, and at this point both the CPU and the Slave Processor are aware of the number of operands to be transferred and their sizes. The Operation Word is swapped on the Data Bus; that is, bits 0-7 appear on pins AD8-AD15 and bits 8-15 appear on pins AD0-AD7.
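The byte swap applied to the Operation Word can be illustrated with a short sketch. It treats the Data Bus as a 16-bit value with AD0 as bit 0; the function names are hypothetical and are not part of any National Semiconductor software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value seen on AD0-AD15 when the CPU sends the given Operation Word:
 * bits 0-7 of the word drive AD8-AD15, bits 8-15 drive AD0-AD7. */
uint16_t opword_to_bus(uint16_t op_word)
{
    return (uint16_t)((op_word << 8) | (op_word >> 8));
}

/* Same swap, reported separately for each half of the Data Bus. */
void opword_bus_lanes(uint16_t op_word, uint8_t *ad0_7, uint8_t *ad8_15)
{
    assert(ad0_7 != NULL && ad8_15 != NULL);
    *ad0_7  = (uint8_t)(op_word >> 8);    /* bits 8-15 of the word */
    *ad8_15 = (uint8_t)(op_word & 0xFFu); /* bits 0-7 of the word  */
}
```

Because the swap is its own inverse, a Slave Processor model can recover the Operation Word by applying opword_to_bus to the value it samples from the bus.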

Using the Address Mode fields within the Operation Word, the CPU starts fetching operands and issuing them to the Slave Processor. To do so, it references any Addressing Mode extensions which may be appended to the Slave Processor instruction. Since the CPU is solely responsible for memory accesses, these extensions are not sent to the Slave Processor. The Status Code applied is 1101 (Transfer Slave Processor Operand, Section 3.5.5.1).

After the CPU has issued the last operand, the Slave Processor starts the actual execution of the instruction. Upon completion, it will signal the CPU by pulsing SPC low.

While the Slave Processor is executing the instruction, the CPU is free to prefetch instructions into its queue. If it fills the queue before the Slave Processor finishes, the CPU will wait, applying Status Code 0011 (Waiting for Slave).

Status Combinations:

Send ID (ID): Code 1111

Xfer Operand (OP): Code 1101

Read Status (ST): Code 1110

Step	Status	Action
1	ID	CPU Sends ID Byte
2	OP	CPU Sends Operation Word
3	OP	CPU Sends Required Operands
4	-	Slave Starts Execution. CPU Pre-Fetches.
5	-	Slave Pulses SPC Low
6	ST	CPU Reads Status Word. (Trap? Alter Flags?)
7	OP	CPU Reads Results (If Any).

FIGURE 3-2. Slave Processor Protocol

Upon receiving the pulse on SPC, the CPU uses SPC to read a Status Word from the Slave Processor, applying Status Code 1110 (Read Slave Status). This word has the format shown in Figure 3-3. If the Q-bit (``Quit'', Bit 0) is set, this indicates that an error was detected by the Slave Processor. The CPU will not continue the protocol, but will immediately trap through the Slave vector in the Interrupt Table. Certain Slave Processor instructions cause CPU PSR bits to be loaded from the Status Word.

The last step in the protocol is for the CPU to read a result, if any, and transfer it to the destination. The Read cycles from the Slave Processor are performed by the CPU while applying Status Code 1101 (Transfer Slave Operand).
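The order of status codes presented during one Slave Processor instruction can be sketched as follows. This is an illustrative model, not a description of the actual bus timing: it counts abstract bus transfers (an operand wider than the 16-bit bus needs more than one), it omits the prefetch and Waiting-for-Slave cycles, and all names are hypothetical. Only the status code values and the Q-bit position come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Status codes used by the slave protocol. */
enum {
    STATUS_BROADCAST_ID  = 0xF, /* 1111: send ID Byte            */
    STATUS_XFER_OPERAND  = 0xD, /* 1101: transfer slave operand  */
    STATUS_READ_SLAVE_ST = 0xE  /* 1110: read Slave Status Word  */
};

#define SLAVE_STATUS_Q 0x0001u  /* Q-bit ("Quit"), bit 0 */

/* Fills codes[] with the status code of each transfer, in order, and
 * returns the number of transfers. *trapped is set when the Q-bit ends
 * the protocol before any result is read. */
size_t slave_bus_sequence(unsigned operand_xfers, unsigned result_xfers,
                          uint16_t status_word,
                          uint8_t *codes, size_t cap, int *trapped)
{
    size_t n = 0;
    unsigned i;

    assert(codes != NULL && trapped != NULL);
    assert(cap >= (size_t)3 + operand_xfers + result_xfers);

    codes[n++] = STATUS_BROADCAST_ID;       /* step 1: ID Byte        */
    codes[n++] = STATUS_XFER_OPERAND;       /* step 2: Operation Word */
    for (i = 0; i < operand_xfers; i++)     /* step 3: operands       */
        codes[n++] = STATUS_XFER_OPERAND;

    /* Steps 4-5: the slave executes and pulses SPC low (no transfer). */

    codes[n++] = STATUS_READ_SLAVE_ST;      /* step 6: Status Word    */
    *trapped = (status_word & SLAVE_STATUS_Q) != 0;
    if (*trapped)
        return n;                           /* CPU traps; no results  */

    for (i = 0; i < result_xfers; i++)      /* step 7: results        */
        codes[n++] = STATUS_XFER_OPERAND;
    return n;
}
```

With two operand transfers, one result transfer and a clear Q-bit, the sequence is 1111, 1101, 1101, 1101, 1110, 1101. With the Q-bit set, the sequence stops after the 1110 cycle.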

3.1.3.2 Floating-Point Instructions

Table 3-1 gives the protocols followed for each Floating-Point instruction. The instructions are referenced by their mnemonics. For the bit encodings of each instruction, see Appendix A.

TABLE 3-1. Floating-Point Instruction Protocols
Mnemonic	Operand 1 Class	Operand 2 Class	Operand 1 Issued	Operand 2 Issued	Returned Value Type and Dest.	PSR Bits Affected
ADDf	read.f	rmw.f	f	f	f to Op.2	none
SUBf	read.f	rmw.f	f	f	f to Op.2	none
MULf	read.f	rmw.f	f	f	f to Op.2	none
DIVf	read.f	rmw.f	f	f	f to Op.2	none
MOVf	read.f	write.f	f	N/A	f to Op.2	none
ABSf	read.f	write.f	f	N/A	f to Op.2	none
NEGf	read.f	write.f	f	N/A	f to Op.2	none
CMPf	read.f	read.f	f	f	N/A	N,Z,L
FLOORfi	read.f	write.i	f	N/A	i to Op.2	none
TRUNCfi	read.f	write.i	f	N/A	i to Op.2	none
ROUNDfi	read.f	write.i	f	N/A	i to Op.2	none
MOVFL	read.F	write.L	F	N/A	L to Op.2	none
MOVLF	read.L	write.F	L	N/A	F to Op.2	none
MOVif	read.i	write.f	i	N/A	f to Op.2	none
LFSR	read.D	N/A	D	N/A	N/A	none
SFSR	N/A	write.D	N/A	N/A	D to Op. 2	none
POLYf	read.f	read.f	f	f	f to F0	none
DOTf	read.f	read.f	f	f	f to F0	none
SCALBf	read.f	rmw.f	f	f	f to Op. 2	none
LOGBf	read.f	write.f	f	N/A	f to Op. 2	none

Notes:

D = Double Word

i = Integer size (B, W, D) specified in mnemonic.

f = Floating-Point type (F, L) specified in mnemonic.

N/A = Not Applicable to this instruction.

The Operand class columns give the Access Class for each general operand, defining how the addressing modes are interpreted (see Series 32000 Instruction Set Reference Manual).

The Operand Issued columns show the sizes of the operands issued to the Floating-Point Unit by the CPU. ``D'' indicates a 32-bit Double Word. ``i'' indicates that the instruction specifies an integer size for the operand (B = Byte, W = Word, D = Double Word). ``f'' indicates that the instruction specifies a Floating-Point size for the operand (F = 32-bit Standard Floating, L = 64-bit Long Floating).

The Returned Value Type and Destination column gives the size of any returned value and where the CPU places it. The PSR Bits Affected column indicates which PSR bits, if any, are updated from the Slave Processor Status Word (Figure 3-3).

FIGURE 3-3. Slave Processor Status Word

Any operand indicated as being of type ``f'' will not cause a transfer if the Register addressing mode is specified. This is because the Floating-Point Registers are physically on the Floating-Point Unit and are therefore available without CPU assistance.

3.2 EXCEPTION PROCESSING

Exceptions are special events that alter the sequence of instruction execution. The CPU recognizes two basic types of exceptions: interrupts and traps.

An interrupt occurs in response to an event generated either internally, by the on-chip DSP Module, or externally, by activating NMI or INT. External interrupts are typically requested by peripheral devices that require the CPU's attention.

Traps occur as a result either of exceptional conditions (e.g., attempted division by zero) or of specific instructions whose purpose is to cause a trap to occur (e.g., supervisor call instruction).

When an exception is recognized, the CPU saves the PC, PSR and optionally the MOD register contents on the interrupt stack and then it transfers control to an exception service procedure.

Details on the operations performed in the various cases by the CPU to enter and exit the exception service procedure are given in the following sections.

Note that the reset operation is not treated here as an exception, even though, like any exception, it alters the instruction execution sequence. The reason is that the CPU handles reset in a significantly different way than it handles exceptions.

Refer to Section 3.5.4 for details on the reset operation.

3.2.1 Exception Acknowledge Sequence

When an exception is recognized, the CPU goes through three major steps:

1. Adjustment of Registers. Depending on the source of the exception, the CPU may restore and/or adjust the contents of the Program Counter (PC), the Processor Status Register (PSR) and the currently-selected Stack Pointer (SP). A copy of the PSR is made, and the PSR is then set to reflect Supervisor Mode and selection of the Interrupt Stack. Trap (TRC) is always disabled. Maskable interrupts are also disabled if the exception is caused by an interrupt.

2. Vector Acquisition. A vector is either obtained from an external interrupt control unit or is supplied internally by default.

3. Service Call. The CPU performs one of two sequences common to all exceptions to complete the acknowledge process and enter the appropriate service procedure. The selection between the two sequences depends on whether the Direct-Exception mode is disabled or enabled.

Direct-Exception Mode Disabled

The Direct-Exception mode is disabled while the DE bit in the CFG register is 0 (Section 2.1.4). In this case the CPU first pushes the saved PSR copy along with the contents of the MOD and PC registers on the Interrupt Stack. Then it reads the double-word entry from the Interrupt Dispatch Table at address ``INTBASE + vector × 4''. See Figures 3-4 and 3-5. The CPU uses this entry to call the exception service procedure, interpreting the entry as an external procedure descriptor.

A new module number is loaded into the MOD register from the least-significant word of the descriptor, and the static-base pointer for the new module is read from memory and loaded into the SB register. Then the program-base pointer for the new module is read from memory and added to the most-significant word of the module descriptor, which is interpreted as an unsigned value. Finally, the result is loaded into the PC register.
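The address arithmetic of this sequence can be sketched in a few lines: the dispatch table entry is located at INTBASE + vector × 4, its least-significant word supplies the new MOD value, and its most-significant word is an unsigned offset added to the program-base pointer. The sketch below is illustrative only. The function and type names are hypothetical, the program-base pointer is passed in rather than read from the module table, and the 24-bit range check is an assumption based on the 16-Mbyte address space of the device.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the Interrupt Dispatch Table entry for a given vector. */
uint32_t dispatch_entry_address(uint32_t intbase, uint32_t vector)
{
    /* ASSUMPTION: INTBASE lies within the 16-Mbyte address space. */
    assert(intbase <= 0x00FFFFFFu);
    return intbase + vector * 4u;
}

/* The two 16-bit fields of a double-word dispatch table entry. */
typedef struct {
    uint16_t mod;     /* least-significant word: new MOD register value */
    uint16_t offset;  /* most-significant word: unsigned entry offset   */
} ext_proc_descriptor;

ext_proc_descriptor decode_descriptor(uint32_t entry)
{
    ext_proc_descriptor d;
    d.mod    = (uint16_t)(entry & 0xFFFFu);
    d.offset = (uint16_t)(entry >> 16);
    return d;
}

/* New PC: program-base pointer of the new module plus the offset. */
uint32_t service_entry_pc(uint32_t program_base, ext_proc_descriptor d)
{
    return program_base + (uint32_t)d.offset;
}
```

For instance, with INTBASE at 0x1000 and vector 5, the entry is read from 0x1014. An entry of 0x00400020 selects module 0x0020 and places the service procedure 0x40 bytes past that module's program base.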

Direct-Exception Mode Enabled

The Direct-Exception mode is enabled when the DE bit in the CFG register is set to 1. In this case the CPU first pushes the saved PSR copy along with the contents of the PC register on the Interrupt Stack. The word stored on the Interrupt Stack between the saved PSR and PC register is reserved for future use; its contents are undefined. The CPU then reads the double-word entry from the Interrupt Dispatch Table at address ``INTBASE + vector × 4''. The CPU uses this entry to call the exception service procedure, interpreting the entry as an absolute address that is simply loaded into the PC register. Figure 3-6 illustrates the acknowledge sequence. Note that while the Direct-Exception mode is enabled, the CPU can respond more quickly to interrupts and other exceptions because fewer memory references are required to process an exception. The MOD and SB registers, however, are not initialized before the CPU transfers control to the service procedure. Consequently, the service procedure is restricted from executing any instructions, such as CXP, that use the contents of the MOD or SB registers in effective address calculations.

FIGURE 3-4. Interrupt Dispatch and Cascade Tables

3.2.2 Returning from an Exception Service Procedure

To return control to an interrupted program, one of two instructions can be used: RETT (Return from Trap) and RETI (Return from Interrupt).

RETT is used to return from any trap or non-maskable interrupt service procedure. Since some traps are often used deliberately as a call mechanism for supervisor mode procedures, RETT can also adjust the Stack Pointer (SP) to discard a specified number of bytes from the original stack as surplus parameter space.

RETI is used to return from a maskable interrupt service procedure. Unlike RETT, RETI also informs the on-chip ICU as well as any external interrupt control logic that interrupt service has completed. Since interrupts are generally asynchronous external events, RETI does not discard parameters from the stack.

Both of the above instructions always restore the Program Counter (PC) and the Processor Status Register from the interrupt stack. If the Direct-Exception mode is disabled, they also restore the MOD and SB register contents. Figures 3-7 and 3-8 show the RETT and RETI instruction flows when the Direct-Exception mode is disabled.
