Notes: 1. All VCC pins must be connected to the 5 Volt power supply.
2. All GND pins must be connected to ground.
The IMSA110 is a single-chip reconfigurable and
cascadable subsystem suitable for many high
speed image and signal processing applications.
Apart from its powerful multiply-accumulate capability (420 MOPs), the strength of the IMSA110 lies
in its extensive programmable support for data
conditioning and transformation.
The IMSA110 consists of a configurable array of
multiply-accumulators, three shift registers of
programmable length (up to 1120 stages), a versatile post-processing unit and a microprocessor interface for configuration and control purposes. The comprehensive
on-chip facilities make a single device capable of
dealing with many image processing operations.
The IMSA110 has five interfaces through which
data can be transferred, Figure 1. The microprocessor interface allows access to the coefficient
registers, the configuration and status registers,
and the data transformation tables. The remaining
four interfaces allow high speed data input and
output to the IMSA110 and the cascading of several
devices. A typical IMSA110 system is shown in
Figure 3. If N devices are used in the cascade, they
can be configured, entirely under software control,
as a 21N stage 1-D transversal filter or as a 7X by
3Y 2-D window, where X and Y are any integers
satisfying N ≤ XY. For example, 4 cascaded devices
can be software configured as: an 84-stage 1-D
filter, a 7 by 12 2-D window, a 28 by 3 2-D window,
or a 14 by 6 2-D window.
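The four-device example can be reproduced with a short sketch. It assumes that every device in the cascade is used (X times Y equal to N), which is the case for each configuration listed above; the function names are illustrative only and are not part of any IMSA110 software.

```python
def filter_stages(n_devices):
    """Length of the 1-D transversal filter built from n_devices parts."""
    return 21 * n_devices


def window_sizes(n_devices):
    """(width, height) of each 7X by 3Y window with X * Y == n_devices."""
    sizes = []
    for x in range(1, n_devices + 1):
        if n_devices % x == 0:
            y = n_devices // x
            sizes.append((7 * x, 3 * y))
    return sizes


print(filter_stages(4))   # 84
print(window_sizes(4))    # [(7, 12), (14, 6), (28, 3)]
```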
The final output of the chip is 22 bits wide in twos
complement format.
[Block diagram labels: configuration and control registers; 256 x 8-bit data look up RAM (look up table); MAC arrays C, B and A; post-processing unit (normalization, saturation, and data transformation).]
Figure 2 shows the distribution of the delays inside
the part.
The latency between PSRin and COUT is dependent upon the length of PSRc. For example, with
PSRc set to 0, and all coefficients set to zero except
CR0c[6] (so the data passes through all MAC
stages), the COUT bus will correspond to the
PSRin bus delayed by 47 clock cycles.
The latency between PSRin and PSRout is 5 cycles
plus the lengths of PSRc, PSRb and PSRa. If the
shift registers are bypassed by setting SCR[1] to 1
then PSRout will be PSRin delayed by 2 clock
cycles.
The latency between the cascade input (CIN) and
cascade output (COUT) is 6 cycles. This is shown
lumped at the cascade input and cascade output
pads in Figure 2. Figure 4 gives details of the data
pipelining through the backend datapath.
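As a worked illustration of the PSRin to PSRout figures above, the following sketch computes the delay in clock cycles. The function name and argument names are hypothetical; only the constants (5 cycles plus the register lengths, 2 cycles in bypass, lengths of 0 to 1120) come from the text.

```python
def psr_latency(len_c, len_b, len_a, bypass=False):
    """Clock cycles from PSRin to PSRout.

    len_c, len_b, len_a are the programmed lengths of PSRc, PSRb and PSRa.
    bypass=True models SCR[1] = 1 (shift registers bypassed).
    """
    if bypass:
        return 2
    for length in (len_c, len_b, len_a):
        if not 0 <= length <= 1120:
            raise ValueError("shift register length must be 0..1120")
    return 5 + len_c + len_b + len_a


print(psr_latency(0, 0, 0))                     # 5
print(psr_latency(512, 512, 512))               # 1541
print(psr_latency(512, 512, 512, bypass=True))  # 2
```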
The three shift registers are 8 bits wide and are
each programmable from 0 up to 1120 clock cycles
in length. The lengths are programmed into control
registers via the microprocessor interface.
Data is clocked into the device via the PSRin bus
(Programmable Shift Register in) at a maximum
rate of 20 MHz. On-chip, the input data is then fed
through a pipeline of the three shift registers. The
output of the first shift register passes to the first
7-stage mac array and also to the input of the
second shift register. Having passed through all
three shift registers the data is output on the
PSRout bus and can be used for cascading. Alternatively, as shown in Figure 2, the shift registers can
be bypassed and the input data transferred to the
PSRout bus after two delay stages. This mode can
be controlled via the on-chip control registers and
significantly simplifies software configuration of a
cascade arrangement.
As shown in Figure 2, the processing core of the
device consists of a configurable array of multiply-accumulators (macs). The mac array consists of
three 7-stage transversal filters which can be configured either as a 21-stage linear pipeline or as a
3 × 7 two-dimensional window. The input data is
8 bits wide and is fed to the mac array via three
programmable shift registers.
The output of each shift register is supplied as input
to one of the three 7-stage transversal filters. For
each of the three transversal filters the associated
input data is fed simultaneously to all 7 mac stages.
At each stage the input sample is multiplied by a
coefficient stored in memory, and added to the
output of the previous stage delayed by one clock
cycle. The output of each 7-stage mac is fed, via a
delay stage, to the first stage in the next transversal
filter.
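The stage behaviour described above (every stage sees the same input sample, multiplies it by its own coefficient, and adds the one-clock-delayed output of the previous stage) can be modelled in a few lines. This is a behavioural sketch only: it ignores the additional pipeline delays of the real device, and the ordering of the coefficient list (index 0 is the first stage) is an assumption, not the chip's register naming.

```python
def transversal_filter(samples, coeffs):
    """Model one transversal filter: a chain of multiply-accumulate stages."""
    stages = [0] * len(coeffs)        # registered output of each stage
    outputs = []
    for x in samples:
        # Each stage adds the previous stage's output from the last clock;
        # the first stage has nothing feeding it in this model.
        previous = [0] + stages[:-1]
        stages = [x * c + p for c, p in zip(coeffs, previous)]
        outputs.append(stages[-1])    # the last stage drives the output
    return outputs


# A unit impulse shows the coefficients emerging last stage first.
print(transversal_filter([1, 0, 0, 0, 0, 0, 0, 0], [1, 2, 3, 4, 5, 6, 7]))
# [7, 6, 5, 4, 3, 2, 1, 0]
```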
The coefficient word width in the mac array is 8 bits.
Two banks of coefficients are provided. At any
instant one set of coefficients is in use within the
mac array. The set in use is defined by the state of
the ‘Current Bank’ bit, ACR[0]. The other set can be
altered via the microprocessor interface. Once a
new set of coefficients has been loaded, the activities of the two coefficient banks can be interchanged without interrupting the flow of data. Alternatively, by setting the ‘continuous bank swap’ bit
SCR[0], the two coefficient banks are swapped
automatically after each data input. In this case the
‘Current Bank’ bit only determines which bank is
used first. Both data input and coefficients can be
programmed independently to support twos complement or positive unsigned formats, allowing multiple devices to be used as a ‘slice’ in higher accuracy systems.
Within the mac array no truncation or rounding is
performed on the partial products. The mac array
output is fed to the backend post-processing unit
which is responsible for data transformation / normalisation and cascading function.
The Backend Post-Processor consists of four major blocks: the input block (shifter, cascade adder
and rectifier unit), a statistics monitor, the data conditioning unit (which itself consists of the data transformation unit and the data normaliser), and the
output block (output adder and multiplexers).
A detailed diagram of the Backend Post-Processor
is given in Figure 4.
All operations performed in the backend are on
twos complement signed numbers unless otherwise stated.
Data from the mac array enters the datapath via a
programmable shifter. The shifter is capable of
arithmetic right shifts (divides) of up to 8 bits with
rounding, and left shifts of up to 8 bits. The size of
this shift is controlled by the status bits BCR0[5-1].
The output of the shifter passes into the cascade
adder where it is added, along with any rounding
generated by the shifter, to either the cascade input
bus (BCR0[0] = 0), or a zero value (BCR0[0] = 1).
If the result of this 22-bit signed addition is greater
than 2^21 - 1 (2097151 decimal) then the adder will generate a positive overflow. Likewise, if it is less than
-2^21 (-2097152 decimal) a negative overflow will be generated. In other words, a positive overflow is generated if the result of adding two positive numbers
(both MSBs = 0) is negative (resulting MSB = 1).
Conversely, a negative overflow is generated if the
result of adding two negative numbers (both
MSBs = 1) is positive (MSB = 0). Adding two numbers of different signs cannot cause the adder to
overflow.
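The overflow rule for the 22-bit cascade adder can be sketched as follows. The function name and the string flags are illustrative. Returning the wrapped twos complement result alongside the flag is an assumption made for the model; the text only defines when each overflow is signalled.

```python
WIDTH = 22
MAX_VALUE = (1 << (WIDTH - 1)) - 1    # 2097151
MIN_VALUE = -(1 << (WIDTH - 1))       # -2097152


def cascade_add(a, b):
    """Add two 22-bit signed values; return (wrapped result, overflow flag)."""
    total = a + b
    if total > MAX_VALUE:
        flag = "positive"
    elif total < MIN_VALUE:
        flag = "negative"
    else:
        flag = None
    # Wrap into the 22-bit twos complement range (assumed behaviour).
    wrapped = (total - MIN_VALUE) % (1 << WIDTH) + MIN_VALUE
    return wrapped, flag


print(cascade_add(2097151, 1))     # (-2097152, 'positive')
print(cascade_add(-2097152, -1))   # (2097151, 'negative')
print(cascade_add(100, -200))      # (-100, None)
```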
The output of the cascade adder can optionally be
full-wave or half-wave rectified under the control of
BCR0[7,6]. The output of the rectifier passes onto
the X bus. Overflows on the X bus are signalled to
both the statistics monitor and the data conditioner.
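A minimal sketch of the two rectification options, using the usual definitions: full-wave rectification takes the absolute value, half-wave rectification replaces negative values with zero. The mode strings are hypothetical and do not represent the BCR0[7,6] encoding, which is not given here.

```python
def rectify(x, mode):
    """Rectify a signed sample; mode is 'none', 'half' or 'full'."""
    if mode == "none":
        return x
    if mode == "half":
        return max(x, 0)   # negative results are removed
    if mode == "full":
        return abs(x)      # negative results are made positive
    raise ValueError("unknown rectification mode")


print([rectify(v, "half") for v in (-5, 0, 7)])   # [0, 0, 7]
print([rectify(v, "full") for v in (-5, 0, 7)])   # [5, 0, 7]
```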
5.2 Statistics Monitor
The statistics monitor allows the user to set up
watch dogs on the dynamics of the data on the X
bus. It cannot affect the data on the X bus. The
statistics gathered provide information on the system behaviour which can be used to ensure correct
data scaling and normalisation. The information is
also useful in the control of the overall system’s
analogue frontend.
The statistics monitor consists of a 24 bit Min/Max
register (MMR), a 24 bit Min/Max Buffer (MMB), a
22 bit Over/UnderShoot Counter (OUC), a 22 bit
Over/UnderShoot Buffer (OUB) and a 22 bit twos
complement comparator.
It can perform one of four functions:
• MAX REGISTER : Capture the maximum value
of data and store it in the MMR.
• MIN REGISTER : Capture the minimum value of
data and store it in the MMR.
• OVERSHOOT COUNTER : Increment the OUC
each time the data value exceeds the preset
value in the MMR.
• UNDERSHOOT COUNTER : Increment the OUC
each time the data value is less than the preset
value in the MMR.
The mode of operation is determined by the
Max/Min switch BCR1[0], and the Static Threshold
switch BCR1[1].
Each sample on the X bus is compared against the
threshold stored in the MMR.
If the unit is configured as an overshoot counter
and the data on the X bus exceeds the threshold in
the MMR, then the counter (OUC) is incremented.
If the data is less than or equal to the threshold, then
no action will occur. The OUC is unsigned and will
not wrap around. Thus it behaves as a saturating
counter with a maximum value of 2^22 - 1
(4194303 decimal). If there is a positive overflow
on the X bus, then the counter will increment
since the correct X bus value must exceed the
threshold. Similarly a negative overflow on the X
bus will not increment the counter since the correct
X bus value cannot exceed the preset threshold.
If the unit is configured as an undershoot counter
then the counter will be incremented whenever the
sample is less than the preset threshold. In this
case a negative overflow will cause the counter to
increment.
If the unit is configured as a max register and the
X bus exceeds the current threshold in the MMR,
then the value on the X bus is loaded into the MMR
and becomes the new threshold and the counter is
incremented. If the threshold is not exceeded then
no action occurs. Thus the value in the MMR is the
maximum value that has appeared on the X bus,
and the value in the OUC has been incremented by
the number of times that the threshold has been
exceeded.
If the unit is configured as a min register then the
threshold is updated and the counter incremented
whenever the X bus is less than the current threshold.
When operating as a min/max register, overflows
on the X bus can never cause the threshold to be
updated as this would load an erroneous value into
the MMR.
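The four modes can be summarised in a small behavioural model. This is a sketch under stated assumptions: the class, its mode strings and its method are hypothetical; the buffers (MMB, OUB) and the overflow history bits are left out; and incrementing the counter on an overflow while in min/max register mode is an assumption, since the text only states that the threshold is not updated in that case.

```python
OUC_MAX = (1 << 22) - 1   # saturating limit of the counter, 4194303


class StatisticsMonitor:
    def __init__(self, mode, threshold, count=0):
        self.mode = mode        # 'max', 'min', 'overshoot' or 'undershoot'
        self.mmr = threshold    # threshold held in the MMR
        self.ouc = count        # over/undershoot counter

    def sample(self, x, overflow=None):
        """Process one X bus value; overflow is None, 'positive' or 'negative'."""
        watching_high = self.mode in ("max", "overshoot")
        if overflow is None:
            hit = x > self.mmr if watching_high else x < self.mmr
        else:
            # An overflowed sample counts according to its true sign.
            hit = overflow == ("positive" if watching_high else "negative")
        if not hit:
            return
        self.ouc = min(self.ouc + 1, OUC_MAX)      # never wraps around
        if self.mode in ("max", "min") and overflow is None:
            self.mmr = x                           # overflows never update the MMR


monitor = StatisticsMonitor("max", threshold=0)
for value in (3, 1, 7, 7, -2):
    monitor.sample(value)
print(monitor.mmr, monitor.ouc)   # 7 2
```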
Figure 4 : Detailed Block Diagram of the Backend Post-processing Unit
Bit 22 of the MMR records the history of positive
overflows on the X bus. Similarly bit 23 records the
history of negative overflows. These bits in the
MMR are set to zero by writing to the MMR copy
location and are active independently of whether
the Static Threshold bit is set. When the MMR is
read, then bits 22 and 23 are interpreted as follows:
bit 23   bit 22   condition
0        0        No overflow has occurred
0        1        One or more positive overflows have occurred
1        0        One or more negative overflows have occurred
1        1        Both positive and negative overflows have occurred
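Decoding the two history bits from a 24-bit MMR value read by the host could look like the sketch below. The function name is illustrative; only the bit positions (22 for positive, 23 for negative overflow) are taken from the text.

```python
def overflow_history(mmr_word):
    """Return (positive_seen, negative_seen) from a 24-bit MMR value."""
    positive_seen = bool((mmr_word >> 22) & 1)
    negative_seen = bool((mmr_word >> 23) & 1)
    return positive_seen, negative_seen


print(overflow_history(0x400000))   # (True, False)
print(overflow_history(0xC00000))   # (True, True)
```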
Access to registers
The MMR and OUC are accessed, through the
memory interface, only via their associated buffers
(MMB and OUB respectively) and are not accessible directly. In order to load the MMR with a value,
the host must first write the value to the MMB and
then transfer the data from the MMB to the MMR
by performing a WRITE to the copy MMR location.
To read the MMR the host must first perform
a READ cycle from location 0B4 (which transfers
the contents of the MMR into the MMB) and then
read the MMB. The OUB is accessed in the same
way except that the dummy writes and reads are
done to and from location 0BC.
Copies from MMR to MMB and OUC to OUB
(reads) can be performed at any time giving a
snapshot of the contents of the MMR and OUC
respectively. Copies from MMB to MMR and OUB
to OUC (writes) can also be performed at any time
allowing the threshold and counter to be updated.
5.3 Data transformation unit
The data transformation unit consists of a prescalar, an under/over select detector, a look up table
and a byte selector. It can be used in isolation to
perform arbitrary data mappings, or in conjunction
with the data normaliser to implement sophisticated
dynamic range compression functions.
The prescalar allows an 8-bit field anywhere within the 22-bit
X bus to be selected as the address to the LUT. This
is performed by right shifting the X bus so that the
required 8 bits are at the least significant end. The
amount of right shift is programmed in BCR2[4-0]
and can have a value from 0 to 16.
Over/under select detector
With PosLUTAddr (SCR[6]) set to zero, this unit
monitors whether the amount of right shift performed by the prescalar is sufficient to include all
significant bits in, and maintain the sign of, the
selected 8 bit field (i.e. an over or under select is
generated if the most significant bit of the selected
8 bit field differs from any subsequent bit right up to
and including the most significant bit of the right
shifted X bus). This will be an overselect if the X
bus is positive (Bit 21 = 0), and an underselect if
the X bus is negative (Bit 21 = 1). In other words
the LUT address is always deemed to be signed
with an address range of -128 to 127.
If however the control bit PosLUTAddr (SCR[6]) is
set to one, the unit monitors whether the amount of
right shift performed by the prescaler is sufficient to
include all significant bits in the selected 8 bit field
AND that all unselected bits are zero (i.e. an over
or under select is generated if the first selected bit
(bit 9) is not zero OR differs from any subsequent
bit right up to and including the most significant bit
of the right shifted X bus). This will be an overselect
if the X bus is positive and an underselect WHENEVER the X bus is negative. Thus, in this mode, the
address range of the LUT is 0 to 255.
Prescalar under/over selects and X bus positive/negative overflows are passed to the LUT
along with the selected 8 bit address field.
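The prescalar and the over/under select detector can be sketched together. The model relies on one observation: requiring all bits above the selected field to match its sign bit is the same as requiring the shifted value to lie in -128 to 127, and requiring them all to be zero is the same as requiring 0 to 255. The function name, argument names and string results are illustrative only.

```python
def select_lut_address(x, shift, positive_addr=False):
    """Right shift the X bus value and return (8-bit address, select status).

    positive_addr models PosLUTAddr (SCR[6]); status is None, 'overselect'
    or 'underselect'.
    """
    if not 0 <= shift <= 16:
        raise ValueError("prescalar shift must be 0..16")
    shifted = x >> shift                       # arithmetic right shift
    low, high = (0, 255) if positive_addr else (-128, 127)
    if shifted > high:
        status = "overselect"                  # X bus positive, field too small
    elif shifted < low:
        status = "underselect"                 # X bus negative
    else:
        status = None
    return shifted & 0xFF, status


print(select_lut_address(1000, 3))                    # (125, None)
print(select_lut_address(1024, 3))                    # (128, 'overselect')
print(select_lut_address(-1, 0, positive_addr=True))  # (255, 'underselect')
```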
Look up table (LUT) and byte select
The LUT consists of 64 words, 32 bits wide, plus two
special 32 bit locations called the upper and lower saturation registers (USR and LSR respectively).
Thus the LUT is actually 66 words by 32 bits. The
32 bit output of the LUT is called the Y bus.
The most significant 6 bits of the 8 bit address field
are used to address one of 64 words in the LUT.
The least significant pair of bits in the 8 bit field are
used to control a byte select on the output. Thus in
addition to operating as a 64+2 word look up table
of 32 bit words, it can be used as an 8 bit, 256+2
byte LUT providing 8 bit to 8 bit transformations.
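The address split can be illustrated as follows. The function name is hypothetical, and the byte numbering (byte 0 as the least significant byte of the word) is an assumption; the text only identifies byte 3 as the most significant byte.

```python
def lut_lookup(lut, address):
    """Return (32-bit word, selected byte) for an 8-bit LUT address."""
    word = lut[(address >> 2) & 0x3F]     # top 6 bits pick one of 64 words
    byte_index = address & 0x3            # bottom 2 bits pick a byte
    selected = (word >> (8 * byte_index)) & 0xFF
    return word, selected


table = [0] * 64
table[5] = 0xAABBCCDD
word, byte = lut_lookup(table, (5 << 2) | 3)
print(hex(word), hex(byte))   # 0xaabbccdd 0xaa
```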
Positive overflows on the X bus, and over selects
in the prescalar cause the LUT to access the USR
overriding the address given by the prescalar. Likewise negative overflows and under selects cause
the LUT to access the LSR. Any sort of overflow on
the X bus or prescalar will cause the byte select
control to be overridden and the most significant byte (byte 3) of the appropriate Saturation Register
will appear on the byte wide output of the data
transformation unit.
If there are simultaneous overflows on the X bus
and in the prescalar then the overflow from the X
bus takes priority.
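The override rules, including the priority of X bus overflows over prescalar selects, reduce to a short decision function. The function name and the string arguments are illustrative only.

```python
def lut_source(x_overflow=None, select_status=None):
    """Decide which location drives the output: 'LUT', 'USR' or 'LSR'.

    x_overflow is None, 'positive' or 'negative'; select_status is None,
    'overselect' or 'underselect'. X bus overflows are checked first
    because they take priority over prescalar selects.
    """
    if x_overflow == "positive":
        return "USR"
    if x_overflow == "negative":
        return "LSR"
    if select_status == "overselect":
        return "USR"
    if select_status == "underselect":
        return "LSR"
    return "LUT"


print(lut_source("negative", "overselect"))   # LSR
print(lut_source(None, "overselect"))         # USR
print(lut_source())                           # LUT
```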
The USR and LSR can thus be used to model the
saturating behaviour of analogue circuits instead of
the usual ‘wrap around’ encountered in digital systems. Alternatively the USR and LSR could signal
error conditions within the backend directly on the
output pins via one of the output multiplexers.
The LUT is loaded via the memory interface. The
addressing for the LUT corresponds to the 8 bit
field, assuming that the byte selector is being used.
In order to access the look up table, USR and LSR
from the microprocessor interface, the LUT Access control bit ACR[1] must be set to zero. This
will force the Y bus to zero and the normaliser to be
controlled by BCR3[7-3] regardless of the setting of
the dynamic normalisation bit, BCR3[2]. The LUT,
USR and LSR can then be loaded with any arbitrary
value via the microprocessor interface. Setting the
LUT access control bit to one will then allow the LUT
to be used in the data transformation unit.
5.4 Data normaliser
This unit consists of a shifter capable of right shifts
of up to 14 bits and left shifts up to 2 bits, followed
by a zero data unit and an adder. The shifter is
controllable from one of two 5 bit sources: control
bits BCR3[7-3] or bits 26 to 22 of the Y bus. The
control bit Enable Dynamic Normalisation
(BCR3[2]) determines which source is in control of
the normaliser. If this bit is set to zero the normaliser
is controlled by BCR3[7-3]. The five bit field is a
twos complement number between 14 and -2. This
indicates the amount of right shift (negative meaning left shift). Any value outside this range causes
the output of the shifter to be forced to zero. The
output of the shifter, with any rounding generated
by the shifter, goes into the output adder.
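Decoding the five bit shift field can be sketched as below. The function name is hypothetical, and the model deliberately omits the rounding generated by the shifter, since its exact rule is not described here; right shifts simply truncate towards minus infinity.

```python
def normalise(value, field):
    """Apply the normaliser shift selected by a 5-bit twos complement field."""
    if not 0 <= field <= 31:
        raise ValueError("field must be a 5-bit value")
    shift = field - 32 if field & 0x10 else field   # sign-extend 5 bits
    if not -2 <= shift <= 14:
        return 0                  # out-of-range settings force a zero output
    if shift >= 0:
        return value >> shift     # right shift (rounding not modelled)
    return value << -shift        # negative field means left shift


print(normalise(1024, 4))         # 64
print(normalise(3, 0b11110))      # 12 (field = -2, left shift by 2)
print(normalise(5, 15))           # 0  (15 is outside 14..-2)
```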
5.5 Output adder
This is a 22 bit adder with one of its inputs coming
from the data normaliser. The other input is either
bits 21 to 0 of the Y bus from the data transformation
unit, or set to zero under the control of BCR3[1].
Note that any overflow occurring due to left shifting
in the normaliser or the subsequent addition in the
output adder is not detected by the IMSA110.
5.6 Output multiplexers
These two multiplexers allow the currently selected
byte from the LUT to be optionally selected to drive
the most significant byte, the least significant
byte, or both, of the Cascade Output pins. This is
controlled by the state of BCR2[5] and BCR2[6].
Enabling either of these multiplexers overrides the
state of the Cascade Output pins only on the relevant 8 pins. The remaining pins will continue to
represent the output of the output adder.
The backend post-processing unit is capable of
performing many functions including data scaling,
transformation, dynamic range compression and
histogram equalisation.
6.1 Default mode (after Reset)
At power up or after reset the state of the backend
post-processor is such that data from the MAC
array and the cascade input are added and pass
straight through the datapath unaffected.
The default mode for the statistics monitor is min register, although the values in the OUB, OUC,
MMR and MMB will be undefined. Likewise the
contents of the LUT, USR and LSR will be undefined. The LUT Access control bit will be zero,
forcing the Y bus to zero and allowing the microprocessor interface to access the LUT, USR and
LSR.
Note that the cascade output pins and the PSR
output pins are tristated.
6.2 Cascade adder / MAC data scalar
These units allow the cascading of IMSA110s
where the output of the MAC array may be scaled
before it is added to the cascade input data. The
shifter can also be used for combining devices to
obtain extended precision in input data, coefficient
word length or both.
The ability to zero the cascade input provides a
simple means of controlling the number of ‘active’
devices cascaded as well as a means of debugging
large systems.
6.3 Rectification
Rectification, the removal of negative results, is
needed in several image processing functions.
For example, edge detection using a Sobel operator usually requires full wave rectification due to the
different signs obtained at differing edge transitions. Edge detection using a Laplacian operator
produces a change of sign at an edge. In this case,
removing negative numbers using half wave rectification can produce better results as full wave
rectification can lead to some blurring of the edge
