1-D/2-D SOFTWARE CONFIGURABLE CONVOLVER/FILTER
• ON-CHIP PROGRAMMABLE LINE DELAYS (0 TO 1120 STAGES)
• 8-BIT DATA AND 8.5-BIT COEFFICIENT SLICE
• 21 MULTIPLY-AND-ACCUMULATE STAGES
• 1-D (21) OR 2-D (3 x 7) CONVOLUTION WINDOW
• ON-CHIP POST PROCESSOR FOR DATA TRANSFORMATION
• FULLY CASCADABLE IN WINDOW SIZE AND ACCURACY
• 20 MHz DATA THROUGHPUT (420 MOPS)
• SIGNED/UNSIGNED DATA AND COEFFICIENTS
• MICROPROCESSOR INTERFACE
• HIGH SPEED CMOS IMPLEMENTATION
• TTL COMPATIBLE
• SINGLE +5V ± 10% SUPPLY
• POWER DISSIPATION < 2.0 WATTS
• 100 PIN CERAMIC PGA
IMSA110
PGA100
(Ceramic Grid Array Package)
APPLICATIONS
• 1-D and 2-D digital convolution and correlation
• Real time image processing and enhancement
• Edge and feature detection
• Data transformation and histogram equalisation
• Computer vision and robotics
• Template matching
• Pulse compression
• 1-D or 2-D interpolation
July 1992
ORDERING INFORMATION
Part Number     Package   Clock Speed   Military/commercial
IMSA110-G20S    PGA100    20MHz         commercial
PIN CONNECTIONS
[Pin-out diagram of the 100-pin PGA (rows A to K, columns 1 to 10); the individual pin assignments are not recoverable from this copy. Signals shown: PSRIN, PSROUT, CIN[21:0], COUT[21:0], ADR, D[7:0], E1, E2, W, CLK, RESET, Vcc and GND.]
Notes : 1. All VCC pins must be connected to the 5 Volt power supply.
        2. All GND pins must be connected to ground.
1. INTRODUCTION
The IMSA110 is a single-chip reconfigurable and
cascadable subsystem suitable for many high
speed image and signal processing applications.
Apart from its powerful multiply-accumulate capability (420 MOPS), the strength of the IMSA110 lies
in its extensive programmable support for data
conditioning and transformation.
2. DESCRIPTION
The IMSA110 consists of a configurable array of
multiply-accumulators, three programmable length
1120 stage shift registers, a versatile post-processing unit and a microprocessor interface for configuration and control purposes. The comprehensive
on-chip facilities make a single device capable of
dealing with many image processing operations.
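Figure 1 : IMSA110 Users Model
[Block diagram. Asynchronous functions: microprocessor interface (ENABLE 1, ENABLE 2, WRITE, DATA, ADDRESS) with decode logic, 21 x 8-bit update coefficient registers, configuration and control registers (PCR0, PCR1, PCR2, SCR, ACR, BCR, TCR, MMB, OUB) and the 256 x 8-bit data transformation look up RAM with USR and LSR. Synchronous functions: 21 x 8-bit current coefficient registers, three 1120 stage programmable shift registers (PSRA, PSRB, PSRC) between PSRIN and PSROUT, three 7-stage multiply-accumulate arrays (A, B, C), and the backend post-processing unit (normalization, saturation and data transformation) between the 22-bit CASCADE INPUT and CASCADE OUTPUT, with CLOCK and RESET.]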
The IMSA110 has five interfaces through which
data can be transferred (Figure 1). The microprocessor interface allows access to the coefficient
registers, the configuration and status registers,
and the data transformation tables. The remaining
four interfaces allow high speed data input and
output to the IMSA110 and the cascading of several
devices. A typical IMSA110 system is shown in
Figure 3. If N devices are used in the cascade, they
can be configured, entirely under software control,
as a 21N stage 1-D transversal filter or as a 7X by
3Y 2-D window, where X and Y are any integers
satisfying XY ≤ N. For example, 4 cascaded devices
can be software configured as: an 84-stage 1-D
filter, a 7 by 12 2-D window, a 28 by 3 2-D window,
or a 14 by 6 2-D window.
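The cascade arithmetic above can be sketched in a few lines. This is only an illustration of the 21N and 7X by 3Y rules; the function names are hypothetical, and the sketch lists only the windows that use all N devices (X times Y equal to N), as in the 4-device example.

```python
def filter_length(n_devices):
    """Length of the 1-D transversal filter built from n_devices chips (21 stages each)."""
    return 21 * n_devices


def window_configurations(n_devices):
    """Return the (width, height) 2-D windows that use exactly n_devices chips.

    Each device contributes a 7 by 3 block, so X * Y devices give a
    7X by 3Y window. Only factor pairs with X * Y == n_devices are listed.
    """
    configs = []
    for x in range(1, n_devices + 1):
        if n_devices % x == 0:
            y = n_devices // x
            configs.append((7 * x, 3 * y))
    return configs


print(filter_length(4))          # 84
print(window_configurations(4))  # [(7, 12), (14, 6), (28, 3)]
```

For 4 devices this reproduces the 84-stage filter and the three window shapes quoted in the text.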
The final output of the chip is 22 bits wide in twos
complement format.
Figure 2 shows the distribution of the delays inside
the part.
The latency between PSRin and COUT is dependent upon the length of PSRc. For example, with
PSRc set to 0, and all coefficients set to zero except
CR0c[6] (so the data passes through all MAC
stages), the COUT bus will correspond to the
PSRin bus delayed by 47 clock cycles.
The latency between PSRin and PSRout is 5 cycles
plus the lengths of PSRc, PSRb and PSRa. If the
shift registers are bypassed by setting SCR[1] to 1
then PSRout will be PSRin delayed by 2 clock
cycles.
The latency between the cascade input (CIN) and
cascade output (COUT) is 6 cycles. This is shown
lumped at the cascade input and cascade output
pads in Figure 2. Figure 4 gives details of the data
pipelining through the backend datapath.
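The PSRin to PSRout latency rule can be written as a small helper. This is a sketch of the arithmetic stated above only; the function name and argument names are hypothetical, and the `bypass` flag stands for SCR[1] = 1.

```python
MAX_PSR_LENGTH = 1120  # each shift register is programmable from 0 to 1120 stages


def psr_latency(len_a, len_b, len_c, bypass=False):
    """Clock cycles from PSRin to PSRout for the given shift register lengths.

    With the shift registers bypassed (SCR[1] = 1) the latency is 2 cycles;
    otherwise it is 5 cycles plus the three programmed lengths.
    """
    for length in (len_a, len_b, len_c):
        if not 0 <= length <= MAX_PSR_LENGTH:
            raise ValueError("shift register length must be in 0..1120")
    if bypass:
        return 2
    return 5 + len_a + len_b + len_c


print(psr_latency(512, 512, 512))               # three 512-sample line delays
print(psr_latency(512, 512, 512, bypass=True))  # shift registers bypassed
```

The 512-sample line length is an arbitrary illustration, not a value taken from the data sheet.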
Figure 2 : Synchronous Functions of the IMSA110
[Block diagram: PSRIN feeds three cascaded 8-bit programmable shift registers (PSRC, PSRB, PSRA; 0 to 1120 stages each) and, through a MUX, PSROUT. Each shift register drives a 7-stage multiply-accumulate array with two banks of coefficient registers (CR0 and CR1, 7 x 8 bits each, for arrays a, b and c). The 22-bit array output enters the backend processing unit, including cascade data path, normalization, saturation units and data transformation look up tables (see Figure 4 for detail), which connects to CIN and COUT.]
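Figure 3 : A Typical IMSA110 Based System
[Block diagram: a general purpose microprocessor controls a chain of IMSA110 devices. The input feeds PSRIN of the first device; the PSROUT and Cascade OUT of each device connect to the PSRIN and Cascade IN of the next; the final Cascade OUT is the system output.]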
3. PROGRAMMABLE SHIFT REGISTERS
The three shift registers are 8 bits wide and are
each programmable from 0 up to 1120 clock cycles
in length. The lengths are programmed into control
registers via the microprocessor interface.
Data is clocked into the device via the PSRin bus
(Programmable Shift Register in) at a maximum
rate of 20MHz. On-chip, the input data is then fed
through a pipeline of the three shift registers. The
output of the first shift register passes to the first
7-stage mac array and also to the input of the
second shift register. Having passed through all
three shift registers the data is output on the
PSRout bus and can be used for cascading. Alternatively, as shown in Figure 2, the shift registers can
be bypassed and the input data transferred to the
PSRout bus after two delay stages. This mode can
be controlled via the on-chip control registers and
significantly simplifies software configuration of a
cascade arrangement.
4. MAC ARRAY
As shown in Figure 2, the processing core of the
device consists of a configurable array of multiply-accumulators (macs). The mac array consists of
three 7-stage transversal filters which can be configured either as a 21-stage linear pipeline or as a
3 × 7 two-dimensional window. The input data is
8 bits wide and is fed to the mac array via three
programmable shift registers.
The output of each shift register is supplied as input
to one of the three 7-stage transversal filters. For
each of the three transversal filters the associated
input data is fed simultaneously to all 7 mac stages.
At each stage the input sample is multiplied by a
coefficient stored in memory, and added to the
output of the previous stage delayed by one clock
cycle. The output of each 7-stage mac is fed, via a
delay stage, to the first stage in the next transversal
filter.
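The stage behaviour described above (every stage sees the same input sample, multiplies it by its own coefficient and adds the previous stage's output from one clock earlier) can be modelled as follows. This is a behavioural sketch of a single 7-stage filter only; it ignores the extra pipeline delays, the inter-array delay stage and the signed/unsigned format controls, and the function name is hypothetical.

```python
def mac_array(samples, coefficients):
    """Behavioural model of one transversal filter of multiply-accumulate stages.

    regs[k] holds the registered output of stage k. On every clock each
    stage computes sample * coefficient[k] + (stage k-1 output from the
    previous clock); the first stage has no upstream input in this model.
    """
    regs = [0] * len(coefficients)
    outputs = []
    for x in samples:
        previous = regs[:]  # stage outputs latched on the previous clock
        for k, c in enumerate(coefficients):
            upstream = previous[k - 1] if k > 0 else 0
            regs[k] = x * c + upstream
        outputs.append(regs[-1])  # output of the last stage
    return outputs


# An impulse produces the coefficients in reverse stage order.
print(mac_array([1, 0, 0, 0, 0, 0, 0], [1, 2, 3, 4, 5, 6, 7]))
```

With this structure the coefficient in the last stage acts on the newest sample, so the impulse response is the coefficient list read from the last stage back to the first.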
The coefficient word width in the mac array is 8 bits.
Two banks of coefficients are provided. At any
instant one set of coefficients is in use within the
mac array. The set in use is defined by the state of
the ‘Current Bank’ bit, ACR[0]. The other set can be
altered via the microprocessor interface. Once a
new set of coefficients has been loaded, the activities of the two coefficient banks can be interchanged without interrupting the flow of data. Alternatively, by setting the ‘continuous bank swap’ bit
SCR[0], the two coefficient banks are swapped
automatically after each data input. In this case the
‘Current Bank’ bit only determines which bank is
used first. Both data input and coefficients can be
programmed independently to support twos complement or positive unsigned formats, allowing multiple devices to be used as a ‘slice’ in higher accuracy systems.
Within the mac array no truncation or rounding is
performed on the partial products. The mac array
output is fed to the backend post-processing unit
which is responsible for data transformation, normalisation and cascading functions.
5. BACKEND POST-PROCESSOR - hardware
description
The Backend Post-Processor consists of four major blocks: the input block (shifter, cascade adder
and rectifier unit), a statistics monitor, the data conditioning unit, which itself consists of the data transformation unit and the data normaliser, and the
output block (output adder and multiplexers).
A detailed diagram of the Backend Post-Processor
is given in Figure 4.
All operations performed in the backend are on
twos complement signed numbers unless otherwise stated.
5.1 Shifter, Cascade Adder and Rectifier
Data from the mac array enters the datapath via a
programmable shifter. The shifter is capable of
arithmetic right shifts (divides) of up to 8 bits with
rounding, and left shifts of up to 8 bits. The size of
this shift is controlled by the status bits BCR0[5-1].
The output of the shifter passes into the cascade
adder where it is added, along with any rounding
generated by the shifter, to either the cascade input
bus (BCR0[0] = 0), or a zero value (BCR0[0] = 1).
If the result of this 22-bit signed addition is greater
than 2^21 - 1 (2097151 decimal) then the adder will generate a positive overflow. Likewise, if it is less than
-2^21 (-2097152 decimal) a negative overflow will be generated. In other words, a positive overflow is generated if the result of adding two positive numbers
(both MSBs = 0) is negative (resulting MSB = 1).
Conversely, a negative overflow is generated if the
result of adding two negative numbers (both
MSBs = 1) is positive (MSB = 0). Adding two numbers of different signs cannot cause the adder to
overflow.
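The overflow rule can be checked with a short model of a 22-bit twos complement adder. This is an illustrative sketch, not a description of the chip's internal logic; the function names are hypothetical, and the wrapped result simply follows from the "two positives give a negative" description above.

```python
WIDTH = 22
MAX_VALUE = (1 << (WIDTH - 1)) - 1   # 2^21 - 1 = 2097151
MIN_VALUE = -(1 << (WIDTH - 1))      # -2^21 = -2097152


def wrap22(value):
    """Wrap an integer into the 22-bit twos complement range."""
    value &= (1 << WIDTH) - 1
    if value & (1 << (WIDTH - 1)):   # MSB set: negative number
        value -= 1 << WIDTH
    return value


def cascade_add(a, b):
    """Return (wrapped_sum, positive_overflow, negative_overflow)."""
    total = a + b
    return wrap22(total), total > MAX_VALUE, total < MIN_VALUE


print(cascade_add(2097151, 1))    # two positives wrap to a negative result
print(cascade_add(-2097152, -1))  # two negatives wrap to a positive result
print(cascade_add(100, -200))     # mixed signs never overflow
```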
The output of the cascade adder can optionally be
full-wave or half-wave rectified under the control of
BCR0[7,6]. The output of the rectifier passes onto
the X bus. Overflows on the X bus are signalled to
both the statistics monitor and the data conditioner.
5.2 Statistics Monitor
The statistics monitor allows the user to set up
watchdogs on the dynamics of the data on the X
bus. It cannot affect the data on the X bus. The
statistics gathered provide information on the system behaviour which can be used to ensure correct
data scaling and normalisation. The information is
also useful in the control of the overall system’s
analogue frontend.
Hardware/Functions
The statistics monitor consists of a 24 bit Min/Max
register (MMR), a 24 bit Min/Max Buffer (MMB), a
22 bit Over/UnderShoot Counter (OUC), a 22 bit
Over/UnderShoot Buffer (OUB) and a 22 bit twos
complement comparator.
It can perform one of four functions:
• MAX REGISTER : Capture the maximum value
of data and store it in the MMR.
• MIN REGISTER : Capture the minimum value of
data and store it in the MMR.
• OVERSHOOT COUNTER : Increment the OUC
each time the data value exceeds the preset
value in the MMR.
• UNDERSHOOT COUNTER : Increment the OUC
each time the data value is less than the preset
value in the MMR.
The mode of operation is determined by the
Max/Min switch BCR1[0], and the Static Threshold
switch BCR1[1].
Operation
Each sample on the X bus is compared against the
threshold stored in the MMR.
If the unit is configured as an overshoot counter
and the data on the X bus exceeds the threshold in
the MMR, then the counter (OUC) is incremented.
If the data is less than or equal to the threshold, then
no action will occur. The OUC is unsigned and will
not wrap around. Thus it behaves as a saturating
counter with a maximum value of 2^22 - 1
(3FFFFF hex, 4194303 decimal). If there is a positive overflow on the X bus, then the counter will increment
since the correct X bus value must exceed the
threshold. Similarly, a negative overflow on the X
bus will not increment the counter since the correct
X bus value cannot exceed the preset threshold.
If the unit is configured as an undershoot counter
then the counter will be incremented whenever the
sample is less than the preset threshold. In this
case a negative overflow will cause the counter to
increment.
If the unit is configured as a max register and the
X bus exceeds the current threshold in the MMR,
then the value on the X bus is loaded into the MMR
and becomes the new threshold and the counter is
incremented. If the threshold is not exceeded then
no action occurs. Thus the value in the MMR is the
maximum value that has appeared on the X bus,
and the value in the OUC has been incremented by
the number of times that the threshold has been
updated.
If the unit is configured as a min register then the
threshold is updated and the counter incremented
whenever the X bus is less than the current threshold.
When operating as a min/max register, overflows
on the X bus can never cause the threshold to be
updated as this would load an erroneous value into
the MMR.
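The four modes can be summarised in a small behavioural model. This is a sketch under stated assumptions: the mode is passed as a string rather than decoded from BCR1[1:0] (the bit encoding is not reproduced here), the overflow history bits are not modelled, and the class and method names are hypothetical.

```python
OUC_MAX = (1 << 22) - 1  # the counter saturates at 2^22 - 1 and never wraps


class StatisticsMonitor:
    """Behavioural model of the MMR threshold and the OUC counter."""

    MODES = ("max", "min", "overshoot", "undershoot")

    def __init__(self, mode, threshold=0):
        if mode not in self.MODES:
            raise ValueError("unknown mode: " + mode)
        self.mode = mode
        self.mmr = threshold  # threshold, or running min/max
        self.ouc = 0

    def _count(self):
        self.ouc = min(self.ouc + 1, OUC_MAX)

    def sample(self, x, pos_overflow=False, neg_overflow=False):
        """Process one X bus sample together with its overflow flags."""
        overflow = pos_overflow or neg_overflow
        if self.mode == "overshoot":
            # A positive overflow always counts; a negative one never does.
            if pos_overflow or (not overflow and x > self.mmr):
                self._count()
        elif self.mode == "undershoot":
            if neg_overflow or (not overflow and x < self.mmr):
                self._count()
        elif self.mode == "max":
            # Overflowed samples never update the threshold.
            if not overflow and x > self.mmr:
                self.mmr = x
                self._count()
        else:  # "min"
            if not overflow and x < self.mmr:
                self.mmr = x
                self._count()


monitor = StatisticsMonitor("max", threshold=-5)
for value in (1, 7, 3, 9):
    monitor.sample(value)
print(monitor.mmr, monitor.ouc)  # running maximum and number of updates
```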
Figure 4 : Detailed Block Diagram of the Backend Post-processing Unit
[Block diagram, with pipeline stages labelled by clock cycle: data from the MAC array passes through the shifter (with rounding) into the cascade adder, which also takes the 22-bit cascade input pads, then through the rectifier onto the 22-bit X bus, with positive and negative overflow signals. The X bus feeds the statistics monitor (min/max register and buffer, GT/LT comparator, over/undershoot count and buffer), the data transformation unit (prescaler, over/under select, 64 x 32 bit RAM with USR and LSR, byte select, 32-bit Y bus) and the data normalizer (shifter -2 to 14, with rounding). Multiplexers and the output adder drive the 22-bit cascade output pads.]
Overflows
Bit 22 of the MMR records the history of positive
overflows on the X bus. Similarly, bit 23 records the
history of negative overflows. These bits in the
MMR are set to zero by writing to the MMR copy
location and are active independently of whether
the Static Threshold bit is set. When the MMR is
read, bits 22 and 23 are interpreted as follows:
bit 23   bit 22   condition
0        0        No overflow has occurred
0        1        One or more positive overflows have occurred
1        0        One or more negative overflows have occurred
1        1        Both positive and negative overflows have occurred
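A host routine could unpack an MMR read-back along these lines. This is a sketch that assumes the 24-bit word has already been assembled from the 8-bit data bus and that bits 21 to 0 hold the twos complement threshold; the function name is hypothetical.

```python
def decode_mmr(mmr_word):
    """Split a 24-bit MMR value into (threshold, positive_seen, negative_seen).

    Bit 22 is the positive overflow history flag and bit 23 the negative
    one. Bits 21..0 are assumed to be the twos complement threshold.
    """
    threshold = mmr_word & 0x3FFFFF
    if threshold & 0x200000:          # sign bit of the 22-bit field
        threshold -= 0x400000
    positive_seen = bool(mmr_word & (1 << 22))
    negative_seen = bool(mmr_word & (1 << 23))
    return threshold, positive_seen, negative_seen


print(decode_mmr(0x400005))  # threshold 5, positive overflow recorded
print(decode_mmr(0xBFFFFF))  # threshold -1, negative overflow recorded
```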
Access to registers
The MMR and OUC are accessed, through the
memory interface, only via their associated buffers
(MMB and OUB respectively) and are not accessible directly. In order to load the MMR with a value,
the host must first write the value to the MMB and
then transfer the data from the MMB to the MMR
by performing a WRITE to the copy MMR location,
0B4 hex. To read the MMR the host must first perform
a READ cycle from location 0B4 hex (which transfers
the contents of the MMR into the MMB) and then
read the MMB. The OUB is accessed in the same
way except that the dummy writes and reads are
done to and from location 0BC hex.
Copies from MMR to MMB and OUC to OUB
(reads) can be performed at any time giving a
snapshot of the contents of the MMR and OUC
respectively. Copies from MMB to MMR and OUB
to OUC (writes) can also be performed at any time
allowing the threshold and counter to be updated
dynamically.
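The two-step access protocol can be illustrated with a toy register model. This is a sketch only: the class, its methods and the helper functions are hypothetical, the buffers are modelled as plain integers rather than byte-wide bus locations, and only the copy locations 0B4 hex and 0BC hex come from the text.

```python
COPY_MMR_ADDR = 0x0B4  # copy location for MMB <-> MMR transfers
COPY_OUC_ADDR = 0x0BC  # copy location for OUB <-> OUC transfers


class MonitorRegisters:
    """Toy model: MMR and OUC are reachable only through MMB and OUB."""

    def __init__(self):
        self.mmr = self.mmb = 0
        self.ouc = self.oub = 0

    def write_copy(self, address):
        """A WRITE to a copy location moves buffer -> register."""
        if address == COPY_MMR_ADDR:
            self.mmr = self.mmb
        elif address == COPY_OUC_ADDR:
            self.ouc = self.oub
        else:
            raise ValueError("not a copy location")

    def read_copy(self, address):
        """A READ from a copy location moves register -> buffer."""
        if address == COPY_MMR_ADDR:
            self.mmb = self.mmr
        elif address == COPY_OUC_ADDR:
            self.oub = self.ouc
        else:
            raise ValueError("not a copy location")


def load_threshold(device, value):
    device.mmb = value                # step 1: write the value to the MMB
    device.write_copy(COPY_MMR_ADDR)  # step 2: dummy WRITE copies MMB to MMR


def snapshot_counter(device):
    device.read_copy(COPY_OUC_ADDR)   # step 1: dummy READ copies OUC to OUB
    return device.oub                 # step 2: read the OUB


device = MonitorRegisters()
load_threshold(device, 1000)
print(device.mmr)
```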
5.3 Data transformation unit
The data transformation unit consists of a prescaler, an under/over select detector, a look up table
and a byte selector. It can be used in isolation to
perform arbitrary data mappings, or in conjunction
with the data normaliser to implement sophisticated
dynamic range compression functions.
Prescaler
This allows an 8-bit field anywhere within the 22-bit
X bus to be selected as the address to the LUT. This
is performed by right shifting the X bus so that the
required 8 bits are at the least significant end. The
amount of right shift is programmed in BCR2[4-0]
and can have a value from 0 to 16.
Over/under select detector
With PosLUTAddr (SCR[6]) set to zero, this unit
monitors whether the amount of right shift performed by the prescaler is sufficient to include all
significant bits in, and maintain the sign of, the
selected 8 bit field (i.e. an over or under select is
generated if the most significant bit of the selected
8 bit field differs from any subsequent bit right up to
and including the most significant bit of the right
shifted X bus). This will be an overselect if the X
bus is positive (Bit 21 = 0), and an underselect if
the X bus is negative (Bit 21 = 1). In other words
the LUT address is always deemed to be signed
with an address range of -128 to 127.
If however the control bit PosLUTAddr (SCR[6]) is
set to one, the unit monitors whether the amount of
right shift performed by the prescaler is sufficient to
include all significant bits in the selected 8 bit field
AND that all unselected bits are zero (i.e. an over
or under select is generated if the first selected bit
(bit 9) is not zero OR differs from any subsequent
bit right up to and including the most significant bit
of the right shifted X bus). This will be an overselect
if the X bus is positive and an underselect WHENEVER the X bus is negative. Thus, in this mode, the
address range of the LUT is 0 to 255.
Prescaler under/over selects and X bus positive/negative overflows are passed to the LUT
along with the selected 8 bit address field.
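The address selection and the two range checks can be expressed compactly by comparing the shifted value against the stated address range. This is a behavioural sketch, assuming an arithmetic (sign-preserving) right shift; the function name is hypothetical and `pos_lut_addr` stands for the PosLUTAddr bit, SCR[6].

```python
def prescale(x, shift, pos_lut_addr=False):
    """Return (address, overselect, underselect) for a 22-bit signed X value.

    The 8-bit LUT address is the low byte of x shifted right by `shift`.
    A select error is flagged when the shifted value does not fit the
    address range: -128..127 when pos_lut_addr is False, 0..255 when True.
    """
    if not 0 <= shift <= 16:
        raise ValueError("shift must be in 0..16")
    if not -(1 << 21) <= x < (1 << 21):
        raise ValueError("x must fit in 22-bit twos complement")
    shifted = x >> shift  # Python's >> on ints is an arithmetic shift
    if pos_lut_addr:
        in_range = 0 <= shifted <= 255
    else:
        in_range = -128 <= shifted <= 127
    overselect = (not in_range) and x >= 0
    underselect = (not in_range) and x < 0
    return shifted & 0xFF, overselect, underselect


print(prescale(1000, 3))                     # fits the signed range
print(prescale(1000, 2))                     # too large for -128..127
print(prescale(1000, 2, pos_lut_addr=True))  # fits the 0..255 range
print(prescale(-1, 0, pos_lut_addr=True))    # negative input in unsigned mode
```

In the unsigned mode every negative input fails the 0 to 255 check, which matches the statement that an underselect occurs whenever the X bus is negative.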
Look up table (LUT) and byte select
The LUT consists of 64 words, 32 bits wide, plus two
special 32 bit locations called the upper and lower saturation registers (USR and LSR respectively).
Thus the LUT is actually 66 words by 32 bits. The
32 bit output of the LUT is called the Y bus.
The most significant 6 bits of the 8 bit address field
are used to address one of 64 words in the LUT.
The least significant pair of bits in the 8 bit field are
used to control a byte select on the output. Thus in
addition to operating as a 64+2 word look up table
of 32 bit words, it can be used as an 8 bit, 256+2
byte LUT providing 8-bit to 8-bit transformations.
Positive overflows on the X bus, and over selects
in the prescaler, cause the LUT to access the USR,
overriding the address given by the prescaler. Likewise negative overflows and under selects cause
the LUT to access the LSR. Any sort of overflow on
the X bus or prescaler will cause the byte select
control to be overridden and the most significant byte (byte 3) of the appropriate Saturation Register
will appear on the byte wide output of the data
transformation unit.
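The byte-wide lookup path can be sketched as below. Assumptions are labelled in the comments: byte 0 is taken to be the least significant byte of a word (the text only states that byte 3 is the most significant), the byte select value n is taken to pick byte n, and the function and argument names are hypothetical.

```python
def lut_byte(lut, usr, lsr, address, saturate_high=False, saturate_low=False):
    """Return the byte-wide output of the data transformation unit.

    lut is a list of 64 32-bit words; usr and lsr are the upper and lower
    saturation registers. saturate_high stands for a positive overflow or
    overselect, saturate_low for a negative overflow or underselect.
    """
    if saturate_high:
        return (usr >> 24) & 0xFF     # byte 3 of the USR
    if saturate_low:
        return (lsr >> 24) & 0xFF     # byte 3 of the LSR
    word = lut[(address >> 2) & 0x3F]  # top 6 address bits pick the word
    byte_index = address & 0x3         # bottom 2 bits pick the byte (assumed order)
    return (word >> (8 * byte_index)) & 0xFF


table = [0] * 64
table[5] = 0xAABBCCDD
print(hex(lut_byte(table, 0xFF000000, 0x00112233, (5 << 2) | 3)))
print(hex(lut_byte(table, 0xFF000000, 0x00112233, 0, saturate_high=True)))
```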