SGS-THOMSON IMSA110 Technical data

IMSA110

IMAGE AND SIGNAL PROCESSING SUB–SYSTEM

.1-D/2-D SOFTWARE CONFIGURABLE CONVOLVER/FILTER
.ON-CHIP PROGRAMMABLE LINE DELAYS (0 TO 1120 STAGES)
.8-BIT DATA AND 8.5-BIT COEFFICIENT SLICE
.21 MULTIPLY-AND-ACCUMULATE STAGES
.1-D (21) OR 2-D (3 x 7) CONVOLUTION WINDOW
.ON-CHIP POST PROCESSOR FOR DATA TRANSFORMATION
.FULLY CASCADABLE IN WINDOW SIZE AND ACCURACY
.20 MHZ DATA THROUGHPUT (420 MOPS)
.SIGNED/UNSIGNED DATA AND COEFFICIENTS
.MICROPROCESSOR INTERFACE
.HIGH SPEED CMOS IMPLEMENTATION
.TTL COMPATIBLE
.SINGLE +5V ± 10% SUPPLY
.POWER DISSIPATION < 2.0 WATTS
.100 PIN CERAMIC PGA

APPLICATIONS

.1-D and 2-D digital convolution and correlation
.Real time image processing and enhancement
.Edge and feature detection
.Data transformation and histogram equalisation
.Computer vision and robotics
.Template matching
.Pulse compression
.1-D or 2-D interpolation

PGA100

(Ceramic Grid Array Package)

ORDERING INFORMATION

Part Number     Package   Clock Speed   Military/commercial
IMSA110-G20S    PGA100    20MHz         commercial

July 1992


PIN CONNECTIONS

Index  1         2         3         4         5          6          7          8         9         10
A      PSRIN[6]  PSRIN[4]  PSRIN[2]  PSRIN[1]  PSROUT[1]  PSROUT[2]  PSROUT[5]  COUT[0]   COUT[1]   COUT[6]
B      CIN[3]    CLK       PSRIN[7]  PSRIN[3]  PSROUT[0]  GND        PSROUT[6]  COUT[2]   Vcc       COUT[7]
C      CIN[4]    CIN[2]    CIN[0]    PSRIN[5]  GND        PSROUT[3]  PSROUT[7]  COUT[4]   GND       COUT[9]
D      Vcc       GND       CIN[5]    CIN[1]    PSRIN[0]   PSROUT[4]  COUT[3]    Vcc       COUT[8]   COUT[10]
E      CIN[8]    CIN[6]    CIN[7]    GND       GND        COUT[5]    COUT[11]   COUT[12]  COUT[13]  COUT[14]
F      CIN[9]    CIN[10]   CIN[11]   CIN[12]   Vcc        D[6]       COUT[16]   Vcc       GND       COUT[15]
G      CIN[13]   CIN[15]   CIN[17]   GND       ADR[5]     GND        GND        COUT[19]  COUT[18]  COUT[17]
H      CIN[14]   CIN[19]   CIN[21]   ADR[2]    ADR[7]     E2         GND        Vcc       Vcc       COUT[20]
J      CIN[16]   CIN[20]   RESET     ADR[3]    ADR[6]     E1         D[2]       D[5]      D[7]      COUT[21]
K      CIN[18]   ADR[0]    ADR[1]    ADR[4]    ADR[8]     W          D[0]       D[1]      D[3]      D[4]

Notes :
1. All VCC pins must be connected to the 5 Volt power supply.
2. All GND pins must be connected to ground.

1. INTRODUCTION

The IMSA110 is a single-chip reconfigurable and cascadable subsystem suitable for many high speed image and signal processing applications. Apart from its powerful multiply-accumulate capability (420 MOPs), the strength of the IMSA110 lies in its extensive programmable support for data conditioning and transformation.

2. DESCRIPTION

The IMSA110 consists of a configurable array of multiply-accumulators, three programmable-length shift registers of up to 1120 stages each, a versatile post-processing unit and a microprocessor interface for configuration and control purposes. The comprehensive on-chip facilities make a single device capable of dealing with many image processing operations.


Figure 1 : IMSA110 Users Model

[Block diagram. Asynchronous functions: the microprocessor interface (ENABLE 1, ENABLE 2, WRITE, 8-bit DATA, 9-bit ADDRESS) and its memory decode logic, giving access to the configuration and control registers (PCR0, PCR1, PCR2, SCR, ACR, TCR), the backend control registers (BCR), the MMB and OUB buffers, the update and current coefficient registers (21 x 8-bit each), and the 256 x 8-bit data transformation look up RAM with USR and LSR. Synchronous functions: CLOCK and RESET control logic; the 8-bit PSRIN bus feeding three 1120 stage programmable shift registers (PSRC, PSRB, PSRA), each driving a 7-stage multiply-accumulate array (C, B, A), with the 8-bit PSROUT bus at the end of the chain; the 22-bit CASCADE INPUT and the array output feeding the backend post-processing unit (normalization, saturation, and data transformation), which drives the 22-bit CASCADE OUTPUT.]

The IMSA110 has five interfaces through which data can be transferred (Figure 1). The microprocessor interface allows access to the coefficient registers, the configuration and status registers, and the data transformation tables. The remaining four interfaces allow high speed data input and output to the IMSA110 and the cascading of several devices. A typical IMSA110 system is shown in Figure 3. If N devices are used in the cascade, they can be configured, entirely under software control, as a 21N stage 1-D transversal filter or as a 7X by 3Y 2-D window, where X and Y are any integers satisfying N = XY. For example, 4 cascaded devices can be software configured as: an 84-stage 1-D filter, a 7 by 12 2-D window, a 28 by 3 2-D window, or a 14 by 6 2-D window.
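The cascade rule above can be checked with a short Python sketch. The function name `cascade_configurations` is illustrative and not part of any IMSA110 software; the sketch simply enumerates the factor pairs X, Y of N and reports the resulting window sizes.

```python
def cascade_configurations(n_devices):
    """List the window shapes available from n_devices cascaded parts.

    Returns tuples (kind, width, height): one 21N-stage 1-D filter plus
    one 7X by 3Y 2-D window for every factor pair with X * Y == N.
    """
    configs = [("1-D", 21 * n_devices, 1)]
    for x in range(1, n_devices + 1):
        if n_devices % x == 0:
            y = n_devices // x
            configs.append(("2-D", 7 * x, 3 * y))
    return configs


for kind, width, height in cascade_configurations(4):
    print(kind, width, "by", height)
```

For four devices this reproduces the four configurations listed in the text.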

The final output of the chip is 22 bits wide in twos complement format.

Figure 2 shows the distribution of the delays inside the part.

The latency between PSRin and COUT is dependent upon the length of PSRc. For example, with PSRc set to 0, and all coefficients set to zero except CR0c[6] (so the data passes through all MAC stages), the COUT bus will correspond to the PSRin bus delayed by 47 clock cycles.

The latency between PSRin and PSRout is 5 cycles PLUS the lengths of PSRc, PSRb and PSRa. If the shift registers are bypassed by setting SCR[1] to 1 then PSRout will be PSRin delayed by 2 clock cycles.

The latency between the cascade input (CIN) and cascade output (COUT) is 6 cycles. This is shown lumped at the cascade input and cascade output pads in Figure 2. Figure 4 gives details of the data pipelining through the backend datapath.
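As a worked example of the PSRin to PSRout figures quoted above, the following Python sketch computes the delay for a given set of shift register lengths. The function and parameter names are hypothetical; only the constants (5 cycles plus the three lengths, or 2 cycles in bypass) come from the text.

```python
MAX_STAGES = 1120  # maximum programmable length of each shift register


def psr_latency(len_c, len_b, len_a, bypass=False):
    """Clock cycles from PSRin to PSRout for the given register lengths."""
    for length in (len_c, len_b, len_a):
        if not 0 <= length <= MAX_STAGES:
            raise ValueError("shift register length must be 0 to 1120")
    if bypass:
        # Bypass mode (SCR[1] = 1): two delay stages only
        return 2
    return 5 + len_c + len_b + len_a
```

For instance, three registers each set to a 512-sample line length would give 5 + 3 x 512 = 1541 cycles under this reading.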


Figure 2 : Synchronous Functions of the IMSA110

[Block diagram. The 8-bit PSRIN bus passes through the programmable shift registers PSRC, PSRB and PSRA (0 to 1120 stages each) to PSROUT, with a multiplexer selecting the shift register bypass path. Each shift register output feeds a 7-stage multiply-accumulate array with two banks of 7 x 8 bit coefficient registers (CR0c/CR1c, CR0b/CR1b, CR0a/CR1a). The 22-bit array output and the 22-bit cascade input (CIN) enter the backend processing unit, which includes the cascade data path, normalization, saturation units and data transformation look up tables (see Figure 4 for detail) and drives the 22-bit cascade output (COUT). Boxes marked D show the pipeline delay stages.]


Figure 3 : A Typical IMSA110 Based System

[Block diagram. A general purpose microprocessor and a common clock serve three cascaded IMSA110 devices. The input enters PSRIN of the first device, each PSROUT feeds PSRIN of the next device, and each Cascade OUT feeds Cascade IN of the next device. The Cascade OUT of the last device is the system output.]

3. PROGRAMMABLE SHIFT REGISTERS

The three shift registers are 8 bits wide and are each programmable from 0 up to 1120 clock cycles in length. The lengths are programmed into control registers via the microprocessor interface.

Data is clocked into the device via the PSRin bus (Programmable Shift Register in) at a maximum rate of 20MHz. On-chip, the input data is then fed through a pipeline of the three shift registers. The output of the first shift register passes to the first 7-stage mac array and also to the input of the second shift register. Having passed through all three shift registers, the data is output on the PSRout bus and can be used for cascading. Alternatively, as shown in Figure 2, the shift registers can be bypassed and the input data transferred to the PSRout bus after two delay stages. This mode can be controlled via the on-chip control registers and significantly simplifies software configuration of a cascade arrangement.

4. MAC ARRAY

As shown in Figure 2, the processing core of the device consists of a configurable array of multiply-accumulators (macs). The mac array consists of three 7-stage transversal filters which can be configured either as a 21-stage linear pipeline or as a 3 × 7 two-dimensional window. The input data is 8 bits wide and is fed to the mac array via three programmable shift registers.

The output of each shift register is supplied as input to one of the three 7-stage transversal filters. For each of the three transversal filters the associated input data is fed simultaneously to all 7 mac stages. At each stage the input sample is multiplied by a coefficient stored in memory, and added to the output of the previous stage delayed by one clock cycle. The output of each 7-stage mac is fed, via a delay stage, to the first stage in the next transversal filter.
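The structure described above, where every stage sees the same input sample and partial sums move along the chain one stage per clock, can be sketched in Python as follows. This is a behavioural illustration only: it ignores the additional pipeline delays of the real part, feeds zero into the first stage, and the coefficient ordering is an assumption rather than the device's register numbering.

```python
def transversal_filter(samples, coeffs):
    """Simulate one chain of mac stages, one list entry per clock cycle.

    Each stage computes coeff * input + (previous stage output, delayed
    by one clock). The output of the last stage is collected.
    """
    regs = [0] * len(coeffs)  # registered output of each mac stage
    outputs = []
    for x in samples:
        new_regs = []
        for i, c in enumerate(coeffs):
            carried = regs[i - 1] if i > 0 else 0  # first stage gets zero
            new_regs.append(c * x + carried)
        regs = new_regs
        outputs.append(regs[-1])
    return outputs


# A unit impulse walks the coefficients out of the last stage in reverse.
impulse = [1, 0, 0, 0, 0, 0, 0]
print(transversal_filter(impulse, [1, 2, 3, 4, 5, 6, 7]))
```

With 8-bit data and coefficients, seven such products and their sums stay well inside the 22-bit output width, which is consistent with the statement that no truncation or rounding is needed inside the array.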

The coefficient word in the mac array is 8 bits wide. Two banks of coefficients are provided. At any instant one set of coefficients is in use within the mac array. The set in use is defined by the state of the ‘Current Bank’ bit, ACR[0]. The other set can be altered via the microprocessor interface. Once a new set of coefficients has been loaded, the activities of the two coefficient banks can be interchanged without interrupting the flow of data. Alternatively, by setting the ‘continuous bank swap’ bit SCR[0], the two coefficient banks are swapped automatically after each data input. In this case the ‘Current Bank’ bit only determines which bank is used first. Both data input and coefficients can be programmed independently to support twos complement or positive unsigned formats, allowing multiple devices to be used as a ‘slice’ in higher accuracy systems.

Within the mac array no truncation or rounding is performed on the partial products. The mac array output is fed to the backend post-processing unit which is responsible for data transformation / normalisation and cascading function.

5. BACKEND POST-PROCESSOR — hardware description

The Backend Post-Processor consists of four major blocks: the input block (shifter, cascade adder and rectifier unit), a statistics monitor, the data conditioning unit, which itself consists of the data transformation unit and the data normaliser, and the output block (output adder and multiplexers).

A detailed diagram of the Backend Post-Processor is given in Figure 4.

All operations performed in the backend are on twos complement signed numbers unless otherwise stated.


5.1 Shifter, Cascade Adder and Rectifier

Data from the mac array enters the datapath via a programmable shifter. The shifter is capable of arithmetic right shifts (divides) of up to 8 bits with rounding, and left shifts of up to 8 bits. The size of this shift is controlled by the status bits BCR0[5-1]. The output of the shifter passes into the cascade adder where it is added, along with any rounding generated by the shifter, to either the cascade input bus (BCR0[0] = 0), or a zero value (BCR0[0] = 1).

If the result of this 22-bit signed addition is greater than $2^{21} - 1$ ($2097151_{10}$) then the adder will generate a positive overflow. Likewise, if it is less than $-2^{21}$ ($-2097152_{10}$) a negative overflow will be generated. In other words, a positive overflow is generated if the result of adding two positive numbers (both MSBs = 0) is negative (resulting MSB = 1). Conversely, a negative overflow is generated if the result of adding two negative numbers (both MSBs = 1) is positive (MSB = 0). Adding two numbers of different signs cannot cause the adder to overflow.

The output of the cascade adder can optionally be full-wave or half wave rectified under the control of BCR0[7,6]. The output of the rectifier passes onto the X bus. Overflows on the X bus are signalled to both the statistics monitor and the data conditioner.
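The behaviour of the input block can be modelled with a few lines of Python. This is a sketch under stated assumptions: the names are illustrative, only right shifts are modelled, and rounding is taken to mean adding the last bit shifted out, which the text does not spell out. The register bit encodings for BCR0 are not modelled at all.

```python
WIDTH = 22
MAX_VAL = (1 << (WIDTH - 1)) - 1   # 2**21 - 1
MIN_VAL = -(1 << (WIDTH - 1))      # -2**21


def wrap22(value):
    """Reduce an integer to a 22-bit twos complement value."""
    value &= (1 << WIDTH) - 1
    return value - (1 << WIDTH) if value & (1 << (WIDTH - 1)) else value


def cascade_add(mac, cin, right_shift=0, zero_cascade=False):
    """Return (adder result, overflow flag) for one sample.

    The flag is "positive", "negative" or None.
    """
    if not 0 <= right_shift <= 8:
        raise ValueError("right shift must be 0 to 8")
    scaled = mac >> right_shift
    # Assumed rounding: add the most significant discarded bit
    rounding = (mac >> (right_shift - 1)) & 1 if right_shift else 0
    total = scaled + rounding + (0 if zero_cascade else cin)
    if total > MAX_VAL:
        flag = "positive"
    elif total < MIN_VAL:
        flag = "negative"
    else:
        flag = None
    return wrap22(total), flag


def rectify(value, mode=None):
    """Full-wave ("full"), half-wave ("half") or no rectification."""
    if mode == "full":
        return abs(value)
    if mode == "half":
        return max(value, 0)
    return value
```

The wrapped result shows why the overflow flags matter: adding 1 to the largest positive value yields the most negative value on the bus, so downstream units need the flag to know the true sign.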

5.2 Statistics Monitor

The statistics monitor allows the user to set up watchdogs on the dynamics of the data on the X bus. It cannot affect the data on the X bus. The statistics gathered provide information on the system behaviour which can be used to ensure correct data scaling and normalisation. The information is also useful in the control of the overall system’s analogue frontend.

Hardware/Functions

The statistics monitor consists of a 24 bit Min/Max register (MMR), a 24 bit Min/Max Buffer (MMB), a 22 bit Over/UnderShoot Counter (OUC), a 22 bit Over/UnderShoot Buffer (OUB) and a 22 bit twos complement comparator.

It can perform one of four functions :

MAX REGISTER : Capture the maximum value of data and store it in the MMR.

MIN REGISTER : Capture the minimum value of data and store it in the MMR.

OVERSHOOT COUNTER : Increment the OUC each time the data value exceeds the preset value in the MMR.

UNDERSHOOT COUNTER : Increment the OUC each time the data value is less than the preset value in the MMR.

The mode of operation is determined by the Max/Min switch BCR1[0], and the Static Threshold switch BCR1[1].

Operation

Each sample on the X bus is compared against the threshold stored in the MMR.

If the unit is configured as an overshoot counter and the data on the X bus exceeds the threshold in the MMR, then the counter (OUC) is incremented. If the data is less than or equal to the threshold, then no action will occur. The OUC is unsigned and will not wrap around. Thus it behaves as a saturating counter with a maximum value of $2^{22} - 1$ ($\mathrm{3FFFFF}_{16}$, $4194303_{10}$). If there is a positive overflow on the X bus, then the counter will increment since the correct X bus value must exceed the threshold. Similarly a negative overflow on the X bus will not increment the counter since the correct X bus value cannot exceed the preset threshold.

If the unit is configured as an undershoot counter then the counter will be incremented whenever the sample is less than the preset threshold. In this case a negative overflow will cause the counter to increment.

If the unit is configured as a max register and the X bus exceeds the current threshold in the MMR, then the value on the X bus is loaded into the MMR and becomes the new threshold, and the counter is incremented. If the threshold is not exceeded then no action occurs. Thus the value in the MMR is the maximum value that has appeared on the X bus, and the value in the OUC has been incremented by the number of times that the threshold has been updated.

If the unit is configured as a min register then the threshold is updated and the counter incremented whenever the X bus is less than the current threshold.

When operating as a min/max register, overflows on the X bus can never cause the threshold to be updated as this would load an erroneous value into the MMR.
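The four modes can be summarised in a small behavioural model. The class below is a sketch, not a register-accurate description: the mode names are illustrative strings rather than BCR1 bit settings, and it assumes that in min/max mode an overflow leaves the counter unchanged as well as the threshold, which the text does not state explicitly.

```python
COUNTER_MAX = (1 << 22) - 1  # the OUC saturates instead of wrapping


class StatisticsMonitor:
    """Behavioural model of the MMR threshold and the OUC counter."""

    def __init__(self, mode, threshold=0):
        # mode is one of "max", "min", "overshoot", "undershoot"
        self.mode = mode
        self.mmr = threshold
        self.ouc = 0

    def _count(self):
        self.ouc = min(self.ouc + 1, COUNTER_MAX)

    def sample(self, x, overflow=None):
        """Process one X bus sample; overflow is "positive"/"negative"/None."""
        upward = self.mode in ("max", "overshoot")
        if self.mode in ("overshoot", "undershoot"):
            if overflow is not None:
                # The true value lies beyond the range in the overflow direction
                hit = overflow == ("positive" if upward else "negative")
            else:
                hit = x > self.mmr if upward else x < self.mmr
            if hit:
                self._count()
        elif overflow is None:
            # Min/max register: overflowed samples never update the threshold
            if (x > self.mmr) if upward else (x < self.mmr):
                self.mmr = x
                self._count()
```

Running a max register over the samples 3, 1, 7, 7, 2 from an initial threshold of 0 leaves 7 in the MMR and 2 in the OUC, since the threshold was updated twice.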


Figure 4 : Detailed Block Diagram of the Backend Post-processing Unit

[Block diagram, drawn against clock cycles from the cascade input pads to the cascade output pads. The 22-bit output from the MAC array passes through the shifter [8:0] and is added, with rounding, to the 22-bit cascade input or to zero (selected by a MUX) in the cascade adder, which signals positive and negative overflow. The rectifier output forms the 22-bit X bus. STATISTICS MONITOR: comparator GT/LT with control, min/max register and min/max buffer, over/undershoot count and over/undershoot buffer. DATA TRANSFORMATION UNIT: prescaler, over/under select, 64 x 32 bit RAM with USR and LSR, byte select, and the 32-bit Y bus. DATA NORMALIZER: shifter (-2 to 14) controlled through a MUX from the BCR or from Y bus bits [26:22], zero data unit and rounding. The output adder combines the normaliser output with Y bus bits [21:0]. Two MUXes can place the selected LUT byte on cascade output bits [21:14] and [7:0]; bits [13:8] come from the output adder.]


Overflows

Bit 22 of the MMR records the history of positive overflows on the X bus. Similarly bit 23 records the history of negative overflows. These bits in the MMR are set to zero by writing to the MMR copy location and are active independently of whether the Static Threshold bit is set. When the MMR is read, then bits 22 and 23 are interpreted as follows:

bit 23  bit 22  condition
0       0       No overflow has occurred
0       1       One or more positive overflows have occurred
1       0       One or more negative overflows have occurred
1       1       Both positive and negative overflows have occurred


Access to registers

The MMR and OUC are accessed, through the memory interface, only via their associated buffers (MMB and OUB respectively) and are not accessible directly. In order to load the MMR with a value, the host must first write the value to the MMB and then transfer the data from the MMB to the MMR by performing a WRITE to the copy MMR location, $\mathrm{0B4}_{16}$. To read the MMR the host must first perform a READ cycle from location $\mathrm{0B4}_{16}$ (which transfers the contents of the MMR into the MMB) and then read the MMB. The OUB is accessed in the same way except that the dummy writes and reads are done to and from location $\mathrm{0BC}_{16}$.

Copies from MMR to MMB and OUC to OUB (reads) can be performed at any time giving a snapshot of the contents of the MMR and OUC respectively. Copies from MMB to MMR and OUB to OUC (writes) can also be performed at any time allowing the threshold and counter to be updated dynamically.
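The buffered access sequence can be illustrated with a host-side sketch in Python. Everything here apart from the copy location 0B4 hex is an assumption made for illustration: the `bus` object with `read` and `write` methods, the byte addresses of the MMB (supplied by the caller, since they are not given in this section), the least-significant-byte-first ordering, and the `FakeBus` class that stands in for real hardware.

```python
COPY_MMR = 0x0B4  # copy location for MMB <-> MMR transfers (from the text)


def load_mmr(bus, mmb_addrs, value):
    """Write value into the MMB byte by byte, then copy MMB -> MMR."""
    for i, addr in enumerate(mmb_addrs):      # assumed LSB-first byte order
        bus.write(addr, (value >> (8 * i)) & 0xFF)
    bus.write(COPY_MMR, 0)                    # dummy write triggers the copy


def read_mmr(bus, mmb_addrs):
    """Copy MMR -> MMB with a dummy read, then read the MMB bytes."""
    bus.read(COPY_MMR)
    return sum(bus.read(addr) << (8 * i) for i, addr in enumerate(mmb_addrs))


class FakeBus:
    """Stand-in for the device, used only to exercise the sequence."""

    def __init__(self, mmb_addrs):
        self.mmb_addrs = mmb_addrs
        self.mmb = {addr: 0 for addr in mmb_addrs}
        self.mmr = 0

    def write(self, addr, data):
        if addr == COPY_MMR:
            self.mmr = sum(self.mmb[a] << (8 * i)
                           for i, a in enumerate(self.mmb_addrs))
        else:
            self.mmb[addr] = data & 0xFF

    def read(self, addr):
        if addr == COPY_MMR:
            for i, a in enumerate(self.mmb_addrs):
                self.mmb[a] = (self.mmr >> (8 * i)) & 0xFF
            return 0
        return self.mmb[addr]


PLACEHOLDER_MMB = (0x00, 0x01, 0x02)  # placeholder addresses, not real ones
bus = FakeBus(PLACEHOLDER_MMB)
load_mmr(bus, PLACEHOLDER_MMB, 0x001234)
print(hex(read_mmr(bus, PLACEHOLDER_MMB)))
```

The same pattern would apply to the OUC and OUB with the copy location 0BC hex in place of 0B4 hex.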

5.3 Data transformation unit

The data transformation unit consists of a prescaler, an under/over select detector, a look up table and a byte selector. It can be used in isolation to perform arbitrary data mappings, or in conjunction with the data normaliser to implement sophisticated dynamic range compression functions.

Prescaler

This allows an 8-bit field anywhere within the 22-bit X bus to be selected as the address to the LUT. This is performed by right shifting the X bus so that the required 8 bits are at the least significant end. The amount of right shift is programmed in BCR2[4-0] and can have a value from 0 to 16.

Over/under select detector

With PosLUTAddr (SCR[6]) set to zero, this unit monitors whether the amount of right shift performed by the prescaler is sufficient to include all significant bits in, and maintain the sign of, the selected 8 bit field (i.e. an over or under select is generated if the most significant bit of the selected 8 bit field differs from any subsequent bit right up to and including the most significant bit of the right shifted X bus). This will be an overselect if the X bus is positive (Bit 21 = 0), and an underselect if the X bus is negative (Bit 21 = 1). In other words the LUT address is always deemed to be signed with an address range of -128 to 127.

If however the control bit PosLUTAddr (SCR[6]) is set to one, the unit monitors whether the amount of right shift performed by the prescaler is sufficient to include all significant bits in the selected 8 bit field AND that all unselected bits are zero (i.e. an over or under select is generated if the first selected bit (bit 9) is not zero OR differs from any subsequent bit right up to and including the most significant bit of the right shifted X bus). This will be an overselect if the X bus is positive and an underselect WHENEVER the X bus is negative. Thus, in this mode, the address range of the LUT is 0 to 255.

Prescaler under/over selects and X bus positive/negative overflows are passed to the LUT along with the selected 8 bit address field.
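The two address modes reduce to a range check on the right-shifted value, which the following Python sketch makes explicit. The function name and return convention are illustrative; the signed X bus value is represented as an ordinary Python integer.

```python
def prescale(x, shift, pos_lut_addr=False):
    """Return (8-bit LUT address field, select flag) for an X bus value.

    The flag is "over", "under" or None. With pos_lut_addr False the
    shifted value must fit in -128..127; with it True, in 0..255.
    """
    if not 0 <= shift <= 16:
        raise ValueError("prescaler shift must be 0 to 16")
    shifted = x >> shift                      # arithmetic right shift
    low, high = (0, 255) if pos_lut_addr else (-128, 127)
    if low <= shifted <= high:
        select = None
    elif x < 0:
        select = "under"                      # negative X bus value
    else:
        select = "over"                       # positive X bus value
    return shifted & 0xFF, select
```

For example, the value 200 with no shift is an overselect in signed mode but a valid address in the 0 to 255 mode, while any negative value is an underselect in the 0 to 255 mode.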

Look up table (LUT) and byte select

The LUT consists of 64 words, 32 bits wide plus two special 32 bit locations called the upper and lower saturation registers (USR and LSR respectively). Thus the LUT is actually 66 words by 32 bits. The 32 bit output of the LUT is called the Y bus.

The most significant 6 bits of the 8 bit address field are used to address one of 64 words in the LUT. The least significant pair of bits in the 8 bit field are used to control a byte select on the output. Thus in addition to operating as a 64+2 word look up table of 32 bit words, it can be used as an 8 bit, 256+2 byte LUT providing 8-bit to 8-bit transformations.

Positive overflows on the X bus, and over selects in the prescaler cause the LUT to access the USR overriding the address given by the prescaler. Likewise negative overflows and under selects cause the LUT to access the LSR. Any sort of overflow on the X bus or prescaler will cause the byte select control to be overridden and the most significant byte (byte 3) of the appropriate Saturation Register will appear on the byte wide output of the data transformation unit.


If there are simultaneous overflows on the X bus and in the prescaler then the overflow from the X bus takes priority.

The USR and LSR can thus be used to model the saturating behaviour of analogue circuits instead of the usual ‘wrap around’ encountered in digital systems. Alternatively the USR and LSR could signal error conditions within the backend directly on the output pins via one of the output multiplexers.

The LUT is loaded via the memory interface. The addressing for the LUT corresponds to the 8 bit field, assuming that the byte selector is being used. In order to access the look up table, USR and LSR from the microprocessor interface, the LUT Access control bit ACR[1] must be set to zero. This will force the Y bus to zero and the normaliser to be controlled by BCR3[7-3] regardless of the setting of the dynamic normalisation bit, BCR3[2]. The LUT, USR and LSR can then be loaded with any arbitrary value via the microprocessor interface. Setting the LUT access control bit to one will then allow the LUT to be used in the data transformation unit.
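The addressing and saturation behaviour of the look up table can be sketched as a small Python class. The class and method names are illustrative, and the mapping of the two low address bits directly onto byte numbers 0 to 3 (with byte 3 the most significant) is an assumption consistent with, but not stated by, the text.

```python
class LookUpTable:
    """64 words of 32 bits plus the USR and LSR saturation registers."""

    def __init__(self):
        self.words = [0] * 64
        self.usr = 0
        self.lsr = 0

    def lookup(self, address, condition=None):
        """Return (Y bus word, selected byte) for an 8-bit address field.

        condition is "positive" or "over" (use the USR), "negative" or
        "under" (use the LSR), or None for a normal table access.
        """
        if condition in ("positive", "over"):
            word, byte_index = self.usr, 3    # byte select overridden
        elif condition in ("negative", "under"):
            word, byte_index = self.lsr, 3
        else:
            word = self.words[(address >> 2) & 0x3F]  # top 6 bits: word
            byte_index = address & 0x3                # low 2 bits: byte
        return word, (word >> (8 * byte_index)) & 0xFF


lut = LookUpTable()
lut.words[5] = 0xAABBCCDD
lut.usr = 0x7F000000
print(lut.lookup(5 * 4 + 1))
print(lut.lookup(0, condition="over"))
```

Loading the most significant byte of the USR and LSR with the desired clip levels gives the saturating behaviour described above for the byte wide output.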

5.4 Data normaliser

This unit consists of a shifter capable of right shifts of up to 14 bits and left shifts up to 2 bits, followed by a zero data unit and an adder. The shifter is controllable from one of two 5 bit sources : control bits BCR3[7-3] or bits 26 to 22 of the Y bus. The control bit Enable Dynamic Normalisation (BCR3[2]) determines which source is in control of the normaliser. If this bit is set to zero the normaliser is controlled by BCR3[7-3]. The five bit field is a twos complement number between 14 and -2. This indicates the amount of right shift (negative meaning left shift). Any value outside this range causes the output of the shifter to be forced to zero. The output of the shifter, with any rounding generated by the shifter, goes into the output adder.
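The decoding of the five bit shift field can be illustrated in Python. The function names are hypothetical, rounding is assumed to add the last bit shifted out (the text does not define it), and the left-shifted result is not wrapped to 22 bits.

```python
def decode_shift(field):
    """Interpret a 5-bit field as a twos complement shift amount."""
    field &= 0x1F
    return field - 32 if field & 0x10 else field


def normalise(x, field):
    """Apply the normaliser shift selected by the 5-bit field to x."""
    shift = decode_shift(field)
    if not -2 <= shift <= 14:
        return 0                              # out of range: zero data
    if shift > 0:
        # Right shift with assumed rounding on the last discarded bit
        return (x >> shift) + ((x >> (shift - 1)) & 1)
    return x << -shift                        # left shift by 0, 1 or 2
```

A field of 11110 binary decodes to -2 and therefore multiplies by four, while a field of 01111 binary (15) is out of range and forces the shifter output to zero.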

5.5 Output adder

This is a 22 bit adder with one of its inputs coming from the data normaliser. The other input is either bits 21 to 0 of the Y bus from the data transformation unit, or set to zero under the control of BCR3[1]. Note that any overflow occurring due to left shifting in the normaliser or the subsequent addition in the output adder is not detected by the IMSA110.

5.6 Output multiplexers

These two multiplexers allow the currently selected byte from the LUT to be optionally selected to drive either the most significant byte and/or the least significant byte of the Cascade Output pins. This is controlled by the state of BCR2[5] and BCR2[6]. Enabling either of these multiplexers overrides the state of the Cascade Output pins only on the relevant 8 pins. The remaining pins will continue to represent the output of the output adder.

6. BACKEND POST-PROCESSOR — Modes of Operation

The backend post-processing unit is capable of performing many functions including data scaling, transformation, dynamic range compression and histogram equalisation.

6.1 Default mode (after Reset)

At power up or after reset the state of the backend post-processor is such that data from the MAC array and the cascade input are added and pass straight through the datapath unaffected.

The default mode for the statistics monitor is min register although the values in the OUB, OUC, MMR and MMB will be undefined. Likewise the contents of the LUT, USR and LSR will be undefined, the LUT Access control bit will be zero forcing the Y bus to zero and allowing the microprocessor interface to access the LUT, USR and LSR.

Note that the cascade output pins and the PSR output pins are tristated.

6.2 Cascade adder / MAC data scalar

These units allow the cascading of IMSA110s where the output of the MAC array may be scaled before it is added to the cascade input data. The shifter can also be used for combining devices to obtain extended precision in input data, coefficient word length or both.

The ability to zero the cascade input provides a simple means of controlling the number of ‘active’ devices cascaded as well as a means of debugging large systems.

6.3 Rectification

Rectification, the removal of negative results, is needed in several image processing functions.

For example, edge detection using a Sobel operator usually requires full wave rectification due to the different signs obtained at differing edge transitions. Edge detection using a Laplacian operator produces a change of sign at an edge. In this case, removing negative numbers using half wave rectification can produce better results as full wave rectification can lead to some blurring of the edge transition.
