MITEL PDSP16488AMA Datasheet

PDSP16488A MA
Single Chip 2D Convolver with Integral Line Delays
Supersedes January 1997 version, DS3742 - 3.1
DS3742 - 4.0 January 2000
The PDSP16488A is a fully integrated, application specific, image processing device. It performs a two dimensional convolution between the pixels within a video window and a set of stored coefficients. An internal multiplier accumulator array can be multi-cycled at double or quadruple the pixel clock rate. This gives the window size options listed in Table 1.
An internal 32k bit RAM can be configured to provide either four or eight line delays. The length of each delay can be programmed to the user's requirement, up to a maximum of 1024 pixels per line. The line delays are arranged in two groups, which may be internally connected in series or may be configured to accept separate pixel inputs. This allows interlaced video or frame to frame operations to be supported.
The 8 bit coefficients are also stored internally and can be downloaded from a host computer or from an EPROM. No additional logic is required to support the EPROM and a single device can support up to 16 convolvers.
The PDSP16488A contains an expansion adder and delay network which allows several devices to be cascaded. Convolvers with larger windows can then be fabricated as shown in Table 2.
Intermediate 32 bit precision is provided to avoid any danger of overflow, but the final result will not normally occupy all bits. The PDSP16488A thus provides a multiplier in the output path, which allows the user to align the result to the most significant end of the 32 bit word.
FEATURES
The PDSP16488A is a fully compatible replacement for the PDSP16488
8 or 16 bit pixels with rates up to 40 MHz
Window sizes up to 8 x 8 with a single device
Eight internal line delays
Supports interlace and frame to frame operations
Coefficients supplied from an EPROM or remote host
Expandable in both X and Y for larger windows
Gain control and pixel output manipulation
132 pin QFP
Rev A B C D
Date MAR 1993 JUL 1996 JAN1997
NOTE
Polyimide is used as an inter-layer dielectric and as glassivation.
Polymeric material is also used for die attach which, according to the requirement in paragraph 1.2.1.b. (2), precludes categorising this device as fully compliant. In every other respect this device has been manufactured and screened in full accordance with the requirements of Mil-Std 883 (latest revision).
Data Size   Window Size       Max Pixel   Line Delays
            Width X Depth     Rate
8           4 x 4             40MHz       4x1024
8           8 x 4             20MHz       4x1024
8           8 x 8             10MHz       8x512
16          4 x 4             20MHz       4x512
16          8 x 4             10MHz       4x512
Table 1 Single Device Configurations

Pixel   Max Pixel Rate
8       10MHz
16      10MHz
8       20MHz
16      20MHz
8       40MHz
16      40MHz
Window size and device count entries (column alignment lost in the source): 1 1 1 2 1 2 1 4 1 4 2 1 4 2 2 6 4 4 4 6 4 8
* Maximum rate is limited to 30 MHz by line store expansion delays
Table 2 Devices needed to implement typical window sizes
CHANGE NOTIFICATION
The change notification requirements of MIL-PRF-38535 will be implemented on this device type. Known customers will be notified of any changes since the last buy when ordering further parts if significant changes have been made.
[Figure: block diagram showing a pixel clock generator, sync extract, A/D converter, optional field store, EPROM (ADDR, DATA), power on reset and the PDSP16488A convolver (DATA IN, AUX DATA, BYPASS, SYNC, CLK, RES), producing delayed sync and output data from composite video.]
Fig. 1 Typical, Stand Alone, Real Time System
[Figure: block diagram showing the control inputs (CE, DS, R/W, PC0, PC1, RES, CS3:0, PROG, MASTER, SINGLE), the multi purpose data bus X15:0 with X delay, the control registers, pixel inputs IP7:0 and L7:0 with BYPASS and Y delays, the line delay groups (1, 3 and 4 line delays), the coefficient store (64), the 8 x 8 array of MACs, clock, adder, scaler, comparator and output mux, and the outputs DELOP, BIN, overflow and D15:0 data out with OEN.]
Fig. 2 Functional Block Diagram
PIN NO  FUNCTION   PIN NO  FUNCTION   PIN NO  FUNCTION   PIN NO  FUNCTION
A1      L0         M3      X15        K12     RES        B9      D7
B1      F1         N3      X14        K13     CS0        A9      D8
C2      L1         M4      X13        J12     CS1        B8      CLK
C1      L2         N4      SPARE      J13     CS2        B7      SPARE
D2      L3         M5      SINGLE     H12     CS3        A7      D9
D1      SPARE      N5      X12        G12     PROG       B6      D10
E2      L4         M6      X11        G13     DS         A5      D11
E1      L5         M7      MASTER     F12     CE         B5      SPARE
F2      L6         N7      X10        E13     R/W        A4      D12
G2      L7         M8      X9         E12     HRES       B4      D13
G1      IP7        N9      X8         D13     OV         A3      D14
H2      SPARE      M9      X7         D12     PC1        B3      D15
J1      IP6        N10     X6         C13     BIN        A2      F0
J2      IP5        M10     X5         C12     OEN        F1      VDD
K1      IP4        N11     X4         B13     D0         N6      VDD
K2      SPARE      M11     X3         A13     D1         F13     VDD
L1      IP3        N12     X2         A12     D2         A6      VDD
L2      IP2        N13     X1         B11     D3         H1      GND
M1      IP1        M13     X0         A11     D4         N8      GND
N1      IP0        L12     DELOP      B10     D5         H13     GND
N2      BYPASS     L13     PC0        A10     D6         A8      GND
Pin out Table (84 pin PGA - AC84, pin numbers refer to the AC package)
NAME  TYPE  DESCRIPTION
IP7:0  INPUT  Pixel data input to the first line delay. [most significant byte in 16 bit mode]
L7:0  I/O  Pixel data input to the second group of line delays. [least significant byte in 16 bit mode]. Alternatively an output from the last line delay when the appropriate mode bit is set.
BYPASS  INPUT  The first line delay in the first group is bypassed when this input is active (High). No internal pull up.
HRES  INPUT  Resets the line delay address pointers when high. Normally the composite sync signal in real time applications. In non real time systems it defines a frame store update period, when low.
X15:0  DUAL FUNCTION  Address/data connections from a MASTER or SINGLE device to the external coefficient source, with X15 defining EPROM or Host support. Otherwise they provide the expansion data input.
D15:0  OUTPUT  Signed 16 bit scaled data or multiplexed 32 bit intermediate data. During intermediate transfers the most significant half is valid when the clock is low, and the least significant half when clock is high.
PC1  OUTPUT  During programming a MASTER device outputs a timing strobe on this pin. This is passed down the chain in a multiple device system, using the PC0 input on the next device.
PC0  INPUT  This pin is used in conjunction with PC1 in multiple device systems. It terminates the write strobe from a MASTER device which is EPROM supported.
DELOP  OUTPUT  This output provides a version of the HRES input which has been delayed by an amount defined by the user.
DS  I/O  The data strobe from a host computer. Active low. This pin will be an output from an EPROM supported MASTER device which provides strobes to the remaining devices.
CE  INPUT  An active low enable which is internally gated with R/W and DS to perform reads or writes to the internal registers. In a SINGLE or MASTER device, which is supported from an EPROM, the bottom 72 addresses are always used and CE is not needed. CE can then be used to initiate a new register load sequence after the power on load sequence.
R/W  INPUT  Read / not write line from the host CPU. When an EPROM is used this pin should be tied low.
PROG  I/O  This pin is normally an input which signifies that registers are to be changed or examined. It is, however, an output from an EPROM supported SINGLE or MASTER device indicating to the rest of the system that registers are being updated.
CLK  INPUT  Clock. All events are triggered on the rising edge of the clock, except the latching of least significant expansion inputs. Internally the clock can be multiplied by two or four in order to increase the effective number of multipliers.
BIN  OUTPUT  This output indicates the result from the internal comparison. A high value indicates that the pixel was greater than the internal threshold. The output is only valid from the last device in a chain.
OV  OUTPUT  When high this output indicates that there has been a gain control overflow.
RES  INPUT  Active low power on reset signal.
SINGLE  INPUT  Tied to ground to indicate a SINGLE device system. Internal pull up resistor.
MASTER  INPUT  Tied to ground to indicate the MASTER device in a multiple device system. Must be left open circuit in a SINGLE device system. Internal pull up.
OEN  INPUT  Output enable signal. Active low.
CS3:0  OUTPUTS  Four address bits from a MASTER specifying one of sixteen devices in a multiple device system. Must be externally decoded to provide chip enables for the additional devices.
F1:0  OUTPUTS  These bits indicate the field selection given by the auto select logic. The same coding as that used for Control Register bits C5:4 is used.
VCC / GND  SUPPLY  Four Power and ground pairs. All must be connected.
BASIC OPERATION
The PDSP16488A convolver performs a weighted sum of all the pixels within an N x N two dimensional window. Each pixel value is multiplied by a signed coefficient, or weight, and the products are summed together. In practice positive weights would be used to produce averaging effects, with various distribution laws, and negative weights would be used for edge enhancement. The window is moved continuously over the video frame, and for real time operation a new result must be obtained for every pixel clock. In most applications odd sized windows will be used, resulting in a centre pixel whose value is modified by the surrounding pixels.
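The weighted sum described above can be sketched in software as follows. This is only an illustrative model, assuming a frame held as a list of rows; the function name `convolve_window` and the two example kernels are hypothetical, and the sketch ignores the device's coefficient ordering, line delays and pipeline timing.

```python
def convolve_window(frame, coeffs):
    """Weighted sum of an N x N window at every position where the window fits."""
    n = len(coeffs)
    rows, cols = len(frame), len(frame[0])
    out = []
    for r in range(rows - n + 1):
        out_row = []
        for c in range(cols - n + 1):
            acc = 0
            # Multiply each pixel in the window by its weight and accumulate.
            for i in range(n):
                for j in range(n):
                    acc += frame[r + i][c + j] * coeffs[i][j]
            out_row.append(acc)
        out.append(out_row)
    return out


# Hypothetical example kernels: positive weights average,
# negative weights around a positive centre enhance edges.
AVERAGE = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
EDGE = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]
```

On a completely flat image the averaging kernel returns nine times the pixel value, while the edge kernel, whose weights sum to zero, returns zero everywhere.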
OUTPUT ACCURACY
With 8 bit pixels, and an 8 x 8 window, it is possible for the accumulated sum to grow to 22 bits within a single device. With 16 bit pixels, and an 8 x 4 window (the maximum possible), the sum can grow to 29 bits. The PDSP16488A actually allows for word growth up to 32 bits, and thus allows several devices to be cascaded without any danger of overflow. Since coefficients can be negative, the final result is a 32 bit signed two's complement number.
In a particular application the desired output will lie somewhere within these 32 bits, the actual position being dependent on the coefficient values used. This causes problems in physically choosing which output pins to connect to the rest of the system. To overcome this problem the PDSP16488A contains an output multiplier, or gain control, which allows the final result to be aligned to the most significant end of the 32 bit internal result. The provision of a multiplier, rather than a simple shifter, allows the gain to be defined more accurately.
The sixteen most significant bits of the adjusted result are available on output pins, and contain a sign bit.
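The 22 bit and 29 bit figures quoted above follow from simple word growth arithmetic: each product needs pixel bits plus coefficient bits, and summing W x D products adds ceil(log2(W x D)) further bits. The helper below is a hypothetical illustration of that arithmetic, not part of any device interface.

```python
def accumulator_bits(pixel_bits, coeff_bits, width, depth):
    """Bits needed to hold the sum of width*depth pixel-by-coefficient products."""
    product_bits = pixel_bits + coeff_bits
    # Summing k products can add up to ceil(log2(k)) bits of growth.
    growth_bits = (width * depth - 1).bit_length()
    return product_bits + growth_bits


print(accumulator_bits(8, 8, 8, 8))    # 8 bit pixels, 8 x 8 window
print(accumulator_bits(16, 8, 8, 4))   # 16 bit pixels, 8 x 4 window
```

The difference between these values and the 32 bit internal word is the headroom left for adding partial sums from cascaded devices.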
MULTIPLIER ARRAY
The PDSP16488A contains sixteen 8x8 multipliers each producing a 16 bit result. Internally the pixel clock supplied by the user can be multiplied by two or four, which together with the proprietary architecture, allows each multiplier to be used several times within a pixel clock period. This increases the effective number of multipliers, which are available to the user, from 16 to 32 or 64 respectively. This architecture produces a very efficient utilization of chip area, and allows the line delays to be accommodated on the same device.
The sixteen multipliers are arranged in a 4 deep by 4 wide array, resulting in effective arrays of 4 by 8 or 8 by 8 with the multi-cycling options. The multiplier array can also be configured to handle 16 bit signed pixels; the effective number of available multipliers is then halved.
LINE DELAY OPERATION
Internal RAM is arranged in two separate groups, and can be configured to provide line delays to match the chosen size of the convolver. When a four deep arrangement is used, with 8 bit pixels, four line delays are available, and each can be programmed to contain up to 1024 pixels. In an eight deep array, or if 16 bit pixels are needed, each line can contain up to 512 pixels. Figure 4 illustrates the options available.
The first line delay in one of the groups can optionally be switched in or out under the control of an input pin. It is used to delay the pixel input when data is obtained from another convolver in a multiple device system, or it is used to support interlaced video.
Signals L7:0 may be used as pixel inputs or outputs. They are configured as inputs at power-on to avoid possible bus conflicts, but by setting a mode control bit can become outputs. They can then be used to drive another device when multiple PDSP16488A's are required.
OUTPUT SATURATION
If the output from the convolver is driving a display, negative pixels will give erroneous results. An option is thus provided which forces all negative results to zero, which are then interpreted as black by the display. At the same time positive results, which overflow the gain control, are forced to saturate at the most positive number, i.e. peak white. In this mode the output sign bit is always zero, and should not be connected to an A/D converter.
A separate option forces both negative and positive overflows to saturate at their respective maximum values, but in-scale negative results remain valid. A gain control overflow warning flag is also available, which can be used in a host CPU supported system to change the gain parameters if overflows are not acceptable.
BINARY OUTPUT
The PDSP16488A contains a 16 bit arithmetic comparator which allows the output from the gain control to be compared with a previously programmed value. An output flag allows the user to determine if the result was above or below a value contained within an internal register.
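The two saturation options and the comparator can be modelled behaviourally as below. This is a sketch under assumptions: `value` stands for the gain control result before it is limited to 16 bits, and the function names are hypothetical rather than taken from the datasheet.

```python
INT16_MAX = 0x7FFF
INT16_MIN = -0x8000


def clamp_for_display(value):
    """Display option: negative results become zero (black),
    positive overflows saturate at the most positive number (peak white)."""
    if value < 0:
        return 0
    return min(value, INT16_MAX)


def saturate_signed(value):
    """Signed option: overflows saturate at their respective limits,
    in-scale negative results pass through unchanged."""
    return max(INT16_MIN, min(value, INT16_MAX))


def binary_output(value, threshold):
    """Comparator flag: high when the result is greater than the programmed value."""
    return value > threshold
```

With the display option the sign bit of the result is always zero, which matches the statement in the text that it carries no information in that mode.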
INTERLACED VIDEO
When using real time interlaced video, a picture or frame is composed from two fields, with odd lines in one field and even lines in the other. An external field delay is thus required to gather information from adjacent lines, and the convolver needs two input busses. The bus providing the delayed pixels has an extra internal line delay. This is only used in the field containing the upper line in any pair of lines, and must be bypassed in the other field. It ensures that data from the previous field always corresponds to the line above the present active line, and avoids the need to change the position of the coefficients from one field to the next.
Figure 3 shows the translation from physical to internal line positions, for single device interlaced systems. Line N is the line presently being convolved, which is either one or two lines previous to the line presently being produced.
When windows requiring four or more lines are to be implemented, the first line delay, in the group supplied from the L7:0 pins, must always be by-passed. This by-pass option is controlled by Register B, bit 7 and is not affected by the BYPASS input pin. The coefficients must be loaded into the locations shown, which match the translated line positions, with unused coefficients, shown shaded, loaded with zeros.
[Figure: three panels showing, for the odd field, how video lines from IP7:0 and from the field delay on L7:0 map onto the internal line delays and coefficient positions. Panels: 3 X 3 WINDOW (4 X 4 or 8 X 4 array, 1024 pixel line delays); 5 X 5 WINDOW and 8 X 8 WINDOW (8 X 8 array, 512 pixel line delays, first L7:0 delay by-passed with REG B, BIT 7 set). Annotations note that the output is shifted by 1 or 2 lines in every field.]
Figure 3. Line Delay Allocations in Single Device Interlaced Systems
[Figure: line delay arrangements fed from IP7:0, L7:0 and BYPASS, showing groups of 512 pixel delays and 1024 pixel delays supplying the 8 X 8 array and the 4 X 4 or 8 X 4 array, including the 16 bit pixel arrangements.]
Fig. 4. Line Delay Configurations
DEFINING THE LENGTH OF THE LINE DELAY
Figure 4 defines the maximum line lengths available in each of the window size options. The actual line lengths can be defined in one of three ways, to support both real time applications, taking pixels directly from a camera, and also use in systems supported by a frame store. In the former case the line delays must be referenced to video synchronization pulses. In the latter case the line lengths are well defined, and the horizontal flyback 'dead times' will have been removed.
To support real time applications an option is provided in which the length of the line delay is defined by the number of clocks obtained whilst an input pin ( HRES ) is in-active. HRES would normally be composite sync when the convolver is directly attached to an NTSC or PAL video camera.
Conceptually, the line delay is achieved by reading the previous contents of a RAM based line store, and then writing new information to the same address. When HRES is active write operations are inhibited, and the address counter is reset. During an active line the counter is incremented by the pixel clock. If the maximum count is reached before the end of a line, then write operations are terminated and wrap-around effects avoided.
The active going edge of HRES, marking the end of a line, is normally asynchronous to the pixel clock, and it is possible for an additional pixel to be stored on some lines. This has no effect on the convolver operation, and will not cause a cumulative shift in the pixel position from line to line.
An alternative means of defining the line length is, however, provided when an exact number of pixels is needed. HRES going in-active then starts the delay operation for every line, but it ceases when the 10 bit value contained in two registers is reached. This method can avoid the need to store blank pixels at the end of a line before sync goes active. With this method the line must contain an even number of pixels, but the value loaded into the control registers defining the line length, must be one less than the even number needed.
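The rule that the programmed value must be one less than the (even) number of pixels can be sketched as below. The split of the 10 bit value into a low byte and a two bit upper part for the "Pixels / Line LSB" and "Pixels / Line MSB" registers is an assumption made for illustration, and the function name is hypothetical.

```python
def line_length_registers(pixels_per_line):
    """Return (lsb, msb) register values for an even line length up to 1024 pixels.

    Assumes the low 8 bits go in the LSB register and the upper 2 bits
    in the MSB register; check the register description before relying on this.
    """
    if pixels_per_line % 2 != 0:
        raise ValueError("line must contain an even number of pixels")
    if not 2 <= pixels_per_line <= 1024:
        raise ValueError("line length must be between 2 and 1024 pixels")
    value = pixels_per_line - 1          # programmed value is one less
    return value & 0xFF, (value >> 8) & 0x03
```

For configurations limited to 512 pixels per line the upper bound in this sketch would need to be reduced accordingly.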
In an image processing system, the pixel clock is often re-synchronized, or even inhibited, during blanking or sync. The next line is then started with a precise time interval from the end of sync to the first pixel clock edge. This avoids any visible pixel jitter at the beginning of the line, which would otherwise be present since pixel clock is asynchronous with respect to video sync pulses.
When using the PDSP16488A the pixel clock should not be inhibited, or re-synchronized, until the delayed version of the HRES input goes active. This is present on the DELOP output pin. This will ensure that no pixels on the right hand edge are lost due to the internal pipeline delay.
If the pixel clock is a continuous signal, the user must ensure that the HRES in-active transition meets the timing requirements defined in Figure 10. The active going edge at the end of a line need not be synchronized.
When pixels are read/written to a frame store, an alternative line delay configuration is needed. Within the frame store lines would be stored in contiguous locations, with no gaps caused by the flyback period between the lines. This method of use makes the HRES defined line delay operation difficult to use, and an alternative mode of operation is provided. The HRES input is then driven by a system provided signal, which defines a complete frame store update period. It is not a line defining signal. The high to low transition of this signal will initiate the line store update sequence and allow the internal address pointers to increment. These pointers will be synchronously reset at the end of a line, when they reach the pre-programmed value. They will then immediately start a new operation using address zero. The actual line delay must be pre-loaded into two control registers as described previously.
Write operations back to the frame store must allow for the total pipeline delay. This can be achieved by inhibiting write operations until the delayed version of HRES goes low at the DELOP output pin. Write operations then continue until it goes back high. The PDSP16488A assumes that data is valid when a clock signal is applied, and that it also meets the set up and hold requirements given in Figure 10. If data is not valid, due for example to a frame store DRAM refresh cycle, then the user must externally inhibit the clock. The clock supplied to the convolver will in this mode be a signal which defines a frame store cycle time.
The use of the convolver in a line scan system is similar to its use with a frame store. These systems have no flyback period, and the address counter must be synchronously reset at the end of the line and then allowed to continue.
GAIN CONTROL
The gain control is provided as an aid to locating the bits of interest in the 32 bit internal result. The magnitude of the largest convolved output will depend on the size of the window, and the coefficient values used. The function of the gain control is then to produce an output, which is accurate to 16 bits, and which is aligned to the most significant end of this 32 bit word. The sixteen most significant bits of the word are available on output pins, and the largest number need only have one sign bit if the gain control is correctly adjusted.
Figure 5 indicates the mechanism employed, with the required function implemented in two steps. Two mode control bits allow one of four 20 bit fields to be selected from the final 32 bit value. These four fields are positioned with the first at the most significant end, and then at four bit displacements down to the least significant end.
By setting an enabling bit, the field selection can optionally be done automatically. This feature should only be used in the real time operating mode, when HRES defines video lines. Internal logic examines the most significant 13, 9, or 5 bits from the 32 bit result, and makes a field selection dependent on which group does not contain identical sign bits. If less than five sign bits are obtained, the logic will select the field containing the most significant 20 bits.
The automatic selection is particularly useful when a fixed scene is being processed. The selection is reset when any internal register is updated (i.e. PROG has been active) and is then held in-active for ten further occurrences of the HRES input. This allows the internal multiplier/accumulator array to be completely flushed before a field selection is made. As convolver outputs of greater magnitude are produced the field selection logic will respond by selecting a more significant field. The most significant field found necessary remains selected until PROG again goes active. Even if the automatic field selection is not enabled, two outputs, F1:0, will still indicate which field would have been selected. These are coded in the same way as Register C, bits 5:4.
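One way to read the automatic selection rule is sketched below: if the top 13 bits of the 32 bit sum are all copies of the sign, the value fits in the least significant 20 bit field; 9 identical bits allow the next field up, 5 the one above that, and otherwise the most significant field is used. The field numbering (0 = most significant, 3 = least significant) and the names `auto_field` and `FieldSelector` are hypothetical, and the reset on PROG and the ten line hold-off are not modelled.

```python
def auto_field(sum32):
    """Return the least significant 20 bit field that still holds the value."""
    word = sum32 & 0xFFFFFFFF
    for field, sign_bits in ((3, 13), (2, 9), (1, 5)):
        top = word >> (32 - sign_bits)
        # All zeros or all ones means these bits are pure sign extension.
        if top == 0 or top == (1 << sign_bits) - 1:
            return field
    return 0


class FieldSelector:
    """Holds the most significant field found necessary so far."""

    def __init__(self):
        self.field = 3  # start at the least significant field

    def update(self, sum32):
        self.field = min(self.field, auto_field(sum32))
        return self.field
```

The selector only ever moves towards a more significant field, mirroring the description that the most significant field found necessary remains selected.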
Having chosen a field, either manually or automatically, it is then multiplied by a 4 bit unsigned integer. This is contained within a user programmed register, and the multiplication will produce a 24 bit result. The middle 16 bits of this result contain the required output bits. The gain control multiplier can overflow into the unused most significant four bits if the parameters are chosen wrongly. This condition is indicated by an overflow flag.
By setting appropriate mode control bits, further manipulation of the gain control output is possible. One option allows all negative outputs to be forced to zero, and at the same time positive gain control overflows will saturate at the maximum positive number. A different option will saturate positive and negative overflows at their respective maximum values, but otherwise leaves them unchanged. Occasional overflows can be tolerated in some systems, and this option prevents any gross errors.
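The two step gain control (field selection, then multiplication by a 4 bit gain) can be modelled as below. The sketch assumes field 0 is bits 31:12 and each further field sits four bits lower, and that the "middle 16 bits" of the 24 bit product are bits 19:4; the field numbering and the function names are hypothetical and do not necessarily match the Register C coding.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def select_field(sum32, field):
    """Pick one of four 20 bit fields: 0 -> bits 31:12 ... 3 -> bits 19:0."""
    shift = 12 - 4 * field
    return to_signed(sum32 >> shift, 20)


def apply_gain(field20, gain):
    """Multiply by a 4 bit unsigned gain; return (16 bit output, overflow flag)."""
    if not 0 <= gain <= 15:
        raise ValueError("gain is a 4 bit unsigned integer")
    product = field20 * gain            # fits in 24 bits, signed
    middle = product >> 4               # drop the 4 least significant bits
    out16 = to_signed(middle, 16)       # keep bits 19:4 of the product
    overflow = middle != out16          # true if the top 4 bits carry data
    return out16, overflow
```

Under these assumptions the output equals the selected field scaled by gain/16, which is why a multiplier gives finer control than a shifter alone.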
EXPANSION
Multiple devices can be connected in cascade in order to fabricate window sizes larger than those provided by a single device. This requires an additional adder in each device which is fed from expansion data inputs. This adder is not used by a single device or the first device in a cascaded system, and can be disabled by a mode control bit.
The first device in the cascaded system must be designated as a MASTER device by tying an input pin low. Its expansion input bus is then used as the source of data for the coefficient and control registers in all devices in the system.
In order to reduce the pin count required for 32 bit busses, both expansion in and data out are time multiplexed with the phases of the pixel clock. When the clock is high the least significant half will be valid, and when the clock is low the most significant half will be valid.
In practice this multiplexing is only possible with pixel clocks up to 20MHz. Above these frequencies the multiplexing must be inhibited by setting a Mode Control bit ( Register A, Bit 7 ). The intermediate data accuracy will then be reduced, since only the lower 16 bits of the internal 32 bit intermediate sum are available on the output pins. In such systems the coefficients must be scaled down in order to keep the intermediate and final results down to 16 bits. The final device should not use the gain control, and instead should simply output the non-multiplexed 16 bit result. The overflow flag and pixel saturation options will not be available.
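The time multiplexing of a 32 bit word over a 16 bit bus can be illustrated as follows. This is a behavioural sketch only, with hypothetical function names; it follows the text in placing the most significant half on the clock low phase and the least significant half on the clock high phase.

```python
def split_halves(word32):
    """Split a signed 32 bit sum into the two 16 bit bus values."""
    word32 &= 0xFFFFFFFF
    return {"clock_low": word32 >> 16,       # most significant half
            "clock_high": word32 & 0xFFFF}   # least significant half


def join_halves(ms_half, ls_half):
    """Rebuild the signed 32 bit value from the two bus phases."""
    word = ((ms_half & 0xFFFF) << 16) | (ls_half & 0xFFFF)
    return word - (1 << 32) if word & 0x80000000 else word
```

When the multiplexing is inhibited only the lower half is transferred, so a receiving device would see just the `clock_high` value from this model.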
PIXEL INPUT AND OUTPUT DELAYS
In a real time system, when line delays are referenced to video sync pulses present on the HRES input, the first pixel from the last line delay does not appear on the L7:0 pins until the fifth active pixel clock edge after HRES has gone low. This is illustrated in Figure 7. In a vertically expanded system, this output provides the input to the first line delays in the vertically displaced devices. The internal logic is thus designed to always expect this five clock delay. Compensation must thus be applied to the devices which are directly connected to the video source, such that the first pixel is not valid until the fifth clock edge.
For this reason the PDSP16488A contains an optional four clock pipeline delay on each of the pixel data inputs. When the delay is used the first pixel in a video line must be available on the input pins after the first pixel clock edge. This would be so if the device were connected to an A/D converter, since that would introduce a one pixel pipeline delay. If the system introduces any further external pipeline delays, then the internal delay should be bypassed, and the user should ensure that the first pixel is valid after the fifth clock edge.
The use of this four clock delay is controlled by Bit 3, in Control Register B. This delay is in addition to the delays which are provided to support expansion in both the X and Y directions, and are controlled by Register D, Bits 3:2. Both delays are in fact simply added together in the device, but are provided for conceptually different reasons.
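[Figure: the 32 bit sum from the expansion adder feeds a MUX which selects one of four 20 bit fields; the selected field is multiplied by the 4 bit gain register value to give a 24 bit product, of which 16 bits pass through the saturate logic to D15:0.]
Fig. 5. Gain Control Operation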
[Figure: rows of cascaded devices between INPUT and OUTPUT. Each device shows its optional 4 clock input delay (B3 = 1 on devices connected directly to the input, B3 = 0 otherwise), its line delays, and its expansion delay. In the first row D3:2 = 00; in later rows the delay D = 4+S(N-1) is defined by D3:2, where S is the partial window width and N the number of devices in the row. The expansion input delay is 0 if S = 4 and 4 if S = 8 (D0 = 0 or 1); the first device in the chain has a zero expansion input.]
Fig. 6. Multi-Device Delay Paths
DELAY COMPENSATION FOR LARGE WINDOWS
A large window is composed of several partial windows each of which is implemented in an individual device. If necessary the partial window must be padded with zero coefficients to become one of the standard sizes. When constructing a large window it is necessary to delay the expansion data inputs in order to compensate for growth in the horizontal direction. Delays in the partial sums are also necessary to compensate for the total pipeline delay needed to produce the previous complete horizontal stripe.
Within each device in a horizontal stripe, apart from the first, the expansion input must be delayed by the width of the partial window, before it is added to the internal sum. Since partial windows can only be 4 or 8 pixels wide, a delay of 4 or 8 pixel clocks is needed. There is, however, an in-built delay of 4 pixels in the inter device connection, and the PDSP16488A thus only needs an option to delay the expansion input by an additional four pixels.
The data from the last device in a horizontal row of convolvers feeds the expansion input of the first device in the next row. This is shown in Figure 6. With this arrangement, the position of the partial window as illustrated, is the inverse of its vertical position on a normal TV screen. Thus the top, left hand, device corresponds to the bottom, left hand, portion of the complete window.
The output from the last device in the row is delayed with respect to the original data input by an amount given by the formula:
DELAY = 4 + (N-1) x S
where N is the number of devices in a row and S is the partial window width, i.e. 4 or 8.
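The row delay formula is easy to tabulate for the supported partial window widths. The helper below is a hypothetical illustration of the arithmetic only; it does not select the Register D bits that program the delay.

```python
def row_output_delay(devices_in_row, partial_width):
    """DELAY = 4 + (N - 1) * S, in pixel clocks, for one row of cascaded devices."""
    if partial_width not in (4, 8):
        raise ValueError("partial window width must be 4 or 8")
    if devices_in_row < 1:
        raise ValueError("a row needs at least one device")
    return 4 + (devices_in_row - 1) * partial_width


for n in (1, 2, 3):
    print(n, row_output_delay(n, 4), row_output_delay(n, 8))
```

A single device row always gives the 4 clock minimum, since the (N-1) term vanishes.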
The internal convolver sums, in each of the devices in the next row, must be delayed by this amount before they are added to results from the previous row. This is more conveniently achieved by delaying data going into the line stores. The required cumulative delay with respect to the first horizontal stripe is then automatically obtained when more than two rows of devices are needed.
Two bits in Control Register D are used to define one of four delay options. These delays have been selected to support systems needing from two to eight devices and are described in the applications section.
COEFFICIENTS
Sixty-four coefficients are stored internally and must be initially loaded from an external source. Table 3 gives the coefficient addresses within a device, with coefficient C0 specified by the least significant address and C63 by the most significant address. Table 5 shows the physical window position within the device which is allocated to each coefficient in the various modes of operation. Horizontally the coefficient positions correspond to the convolution process as if it were conceptually observed on a viewing screen, i.e. the left hand pixel is multiplied with C0. In the vertical direction the lines of coefficients are inverted with respect to a visual screen, i.e. the line starting with C0 is actually at the bottom of the visualized window.
The coefficients may be provided from a Host CPU using conventional addressing, a read/write line, data strobe, and a chip enable. Alternatively, in stand alone systems, an EPROM may be used. A single EPROM can support up to 16 devices with no additional hardware.
When windows are to be fabricated which are smaller than the maximum size that the device will provide in the required configuration, then the areas which are not to be used must contain zero coefficients. The pipeline delay will then be that of a completely filled window.
Function             Hex. Addr
Mode Reg A           00
Mode Reg B           01
Mode Reg C           02
Mode Reg D           03
Comparator LSB       04
Comparator MSB       05
Scale Value          06
Pixels / Line LSB    07
Pixels / Line MSB    08
C0 - C15             40 - 4F
C16 - C31            50 - 5F
C32 - C47            60 - 6F
C48 - C63            70 - 7F
Unused               09 - 3F
Table 3 Internal Register Addressing
TOTAL PIPELINE DELAY
The total pipeline delay is dependent on the device configuration and the number of devices in the system. Table 4 gives the delays obtained with the various single device configurations when the gain control is used. These delays are the internal processing delays and do not include the delays needed to move a given size window completely into a field of interest. When multiple devices are needed, additional delays are produced which must be calculated for the particular application. These delays are discussed in the applications section.
The PDSP16488A contains facilities for outputting a delayed version of HRES to match any processing delay. Control register bits allow this delay to be selected from any value between 29 and 92 pixel clocks.
Data size   Window Size   Pipeline Delay
8           4x4           34
8           8x4           30
8           8x8           26
16          4x4           28
16          8x4           26
Table 4 Pipeline delays
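[Figure: timing diagram of HRES [SYNC] and CLOCK over an active line period. It shows the asynchronous back edge of HRES, the period in which line store writes are inhibited, the last 2 pixels being internally stored, the set up time after HRES goes low, the first pixel valid point when B3 is set, and the first pixel from the line store becoming valid on a later clock edge.]
Fig.7 Pixel Input Delays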