MITEL VP2611 Datasheet

VP2611
H.261 Encoder
DS3487 - 4.1, December 1998. Supersedes June 1996 edition, DS3487 - 4.0.
FEATURES
Fully integrated H.261 video encoder
Up to full CIF resolution and 30 Hz frame rates
Inputs YUV data in 8 x 8 sub-block format
Outputs run-length coded coefficients
On-chip motion vector estimator with +/-7 pixel search window
Addresses and control generated internally for DRAM frame store
QFP package
ASSOCIATED PRODUCTS
VP510 Colour Space Converter
VP520S CIF/QCIF Converter
VP2612 Video Multiplexer
VP2614 Video Demultiplexer
VP2615 H.261 Decoder
DESCRIPTION
The VP2611 Video Compression Source Coder forms part of a chip set used in video conferencing, video telephony and multimedia applications. It produces data which conforms to the H.261 standard for video compression at rates between 64 kbit/s and 2 Mbit/s. With a 27 MHz clock the device accepts data at full CIF resolution and 30 Hz frame rates. The pipeline latency through the device is only three macroblock periods.
The VP2611 contains all the elements necessary for the compression algorithm. It incorporates a Motion Vector Estimator which performs a +/-7 pixel search. The device decides whether to use inter or intra frame compression, and the selected data blocks are read from the frame store. New or difference data is then passed through a Discrete Cosine Transformer and quantized. Data from the quantizer is also inverse quantized and passed through an Inverse Discrete Cosine Transformer. This reconstructed data is then written to the frame store for use in the next frame period. The frame store is managed by an internal DRAM controller, and no external logic is needed.
The input data must be in YUV space, and must also conform to the six sub-blocks per macroblock format defined by H.261. Any conversion from RGB format is performed by the VP510 Colour Space Converter. Any reduction in spatial resolution, down to CIF or QCIF requirements, is done by the VP520 Three Channel Video Filter.
The quantized data is zig-zag scanned and run length coded before being output, together with block information and motion vectors.
[Fig 1 : Typical Video Conferencing Transmission System. Components: NTSC/PAL composite video decoder (video, sync), VP510 colour space converter (R, G, B inputs), VP520 3 channel video filter with a 16 x 128K CIF frame store, VP2611 integrated video encoder with a 16 x 128K CIF frame store, VP2612 video multiplexer with a 32K x 8 TX buffer, and a system controller with user interface. Y and Cr/Cb enter the filter at CCIR601 resolution (PAL: Y 720 x 288, Cr/Cb 360 x 288; NTSC: Y 720 x 240, Cr/Cb 360 x 240) and leave as macroblocks at CIF resolution (Y 352 x 288, Cr/Cb 176 x 144), handshaken by REQYUV and FRMIN. The VP2611 passes RLC data and flags to the VP2612, which outputs an H.261 bit stream at 64 kb/s to 2 Mb/s.]
PIN DESCRIPTIONS
YUV7:0 This input bus accepts YUV data one pixel at a time from the preprocessor, clocked in on the rising edge of PCLK.
PCLK This signal is used to strobe in data at the YUV port and must be derived by dividing SYSCLK by an integer greater than one.
FRMIN This input should be pulled high to prepare the VP2611 to code a new frame. It must be held high for at least one SYSCLK cycle and must then be pulled low again before the next frame begins. The VP2611 responds to the rising edge of FRMIN by asserting REQYUV approximately 184 SYSCLK cycles later.
REQYUV This output is pulled high to request that YUV data be input for a new macroblock. It is pulled low again 1871 SYSCLK cycles later. It remains low during dummy macroblocks and during the inactive period between frames.
DBUS7:0 This output bus serves several functions, as defined by DMODE3:0. In addition to providing the quantized coefficients and motion vectors, it is used to output control information.
R/W1 Read/Write control for external DRAM 1.
R/W2 Read/Write control for external DRAM 2. N/C if 256k DRAMs.
OE1 Output Enable control for external DRAM 1, or ADR8.
OE2 Output Enable control for external DRAM 2. N/C if 256k DRAMs.
ADR7:0 Address output for the external DRAMs.
CBUS7:0 Bidirectional data bus for use by a microprocessor. Data and instructions are clocked on and off the chip on the rising edge of CSTR.
CSTR Data strobe for the CBUS port.
CEN An enabling signal for the CBUS port.
CADR When high, this signal defines CBUS as a data bus; when low, as an instruction input.
SYSCLK System clock, run at 27MHz maximum. The clock must be high for between 35% and 65% of each clock cycle. This clock is used for all internal operations.
DMODE3:0 Output flag port for the DBUS7:0 bus. The value at this port identifies the data type appearing on DBUS7:0 during the same period.
DCLK This output pulses high for a minimum of 37ns each time new data is output on DBUS or DMODE. It can be used as an edge-sensitive strobe signal or a level-sensitive "valid" signal.
SW15:0 This bidirectional port is connected to the frame store.
RAS Row Address Strobe output for the external DRAMs.
CAS Column Address Strobe output for the external DRAMs.
RESET Active low power-on reset, which must be held low for at least 2064 cycles.
TCK Test clock for JTAG.
TMS Test Mode Select for JTAG.
TDI Input JTAG test data.
TDO Output JTAG test data.
TRST Reset JTAG controller (active low).
NOTE:
"Barred" active low signals do not appear with a bar in the main body of the text.
[Fig 2 : Simplified Block Diagram. YUV block format input feeds the motion vector estimator (search window), the MC decision block and the inter/intra processor (Force Intra). Forward path: SUB, DCT, Q (Q step), zig-zag scan and RLC, out to the data bus and flags; a low pass filter (Force Filter) acts on the predicted block. Reconstruction path: IQ, IDCT and ADD back to the frame store interface (address, control, data). Control logic handles host data and control and outputs motion vectors and block info.]
OPERATION OF MAJOR BLOCKS
Motion Vector Estimator
The motion estimator calculates the mean absolute error (MAE) for each possible position of the combined luminance block in a search window taken from the previous frame. The combined luminance block consists of 16 x 16 pixels, and within the search window it is displaced by -7 to +7 pixels vertically and -8 to +7 pixels horizontally. The two lsb's of each pixel are discarded, so the MAE value is contained within 14 bits.
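To make the search concrete, here is a minimal behavioural sketch in Python, not the chip's implementation. It assumes that the "MAE" is the accumulated absolute difference over the 256 pixels after dropping two lsb's from each (256 x 63 = 16128, which fits in 14 bits). It also assumes that ties favour zero displacement and that candidate positions outside the picture are skipped, neither of which the datasheet specifies. `block_error` and `motion_search` are hypothetical names.

```python
BLOCK = 16
DY_RANGE = range(-7, 8)   # -7..+7 vertically
DX_RANGE = range(-8, 8)   # -8..+7 horizontally


def block_error(cur, ref, y0, x0, dy, dx):
    """Sum of absolute differences on 6-bit pixels (two lsb's dropped).

    256 pixels * 63 maximum difference = 16128, which fits in 14 bits.
    """
    total = 0
    for i in range(BLOCK):
        for j in range(BLOCK):
            c = cur[y0 + i][x0 + j] >> 2
            r = ref[y0 + i + dy][x0 + j + dx] >> 2
            total += abs(c - r)
    return total


def motion_search(cur, ref, y0, x0):
    """Full search; returns (best_error, best_dy, best_dx, zero_error)."""
    height, width = len(ref), len(ref[0])
    zero_error = block_error(cur, ref, y0, x0, 0, 0)
    best = (zero_error, 0, 0)  # assumption: ties keep zero displacement
    for dy in DY_RANGE:
        for dx in DX_RANGE:
            # Assumption: positions falling outside the picture are skipped.
            if not (0 <= y0 + dy and y0 + dy + BLOCK <= height
                    and 0 <= x0 + dx and x0 + dx + BLOCK <= width):
                continue
            err = block_error(cur, ref, y0, x0, dy, dx)
            if err < best[0]:
                best = (err, dy, dx)
    return best[0], best[1], best[2], zero_error
```

Both the best-fit error and the zero-displacement error are returned because the decision stage that follows needs both.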
The minimum MAE value, representing the best match between the previous and current block, is passed to the motion compensation decision block together with the position of this best fit in the search window. The zero displacement MAE value is also passed to this block, which then decides whether the best fit is sufficiently better than the zero displacement fit. It uses the characteristic shown in Figure 3, where the 14 bit MAE values are given in hex. All points lying to the right of the line, as defined by the two MAE values, cause motion compensation to be applied. In that case the best fit MAE value is used by the inter/intra decision processor; otherwise the zero displacement value is used.
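A minimal sketch of this decision follows. It assumes that the characteristic is the straight line x = 1.125y labelled in Figure 3, with x the zero displacement error and y the minimum (best fit) error, and that MC is on strictly to the right of it. Any separate low-error region near the origin that the original figure may show is not modelled. `mc_decision` is a hypothetical name.

```python
def mc_decision(zero_mae, best_mae):
    """Return (mc_on, mae_passed_to_inter_intra_decision).

    Assumption: boundary x = 1.125y from Figure 3, MC on when x > 1.125y.
    x > 1.125*y is rewritten as 8x > 9y to stay in integer arithmetic.
    """
    mc_on = 8 * zero_mae > 9 * best_mae
    return mc_on, (best_mae if mc_on else zero_mae)
```

For example, a zero displacement error of 0x120 against a best fit of 0x100 lies exactly on the line and leaves MC off, while 0x121 switches it on.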
Inter/Intra Decision Processor
The MAE value passed by the motion compensation decision block is compared to the simplified variance of the current block. This simplified variance is calculated by summing the moduli of the differences between each luminance pixel and the mean luminance value over the whole macroblock. Eight bit pixels are used, and the variance value is expressed in 14 bits by discarding the two lsb's from the actual 16 bit result. It can then be compared directly to the 14 bit MAE value.
If the MAE value is below a user-defined threshold, inter mode coding is always selected. The default threshold is 3, on a scale from 0 to 255 using the 8 msb's of the 14 bit value. Above this threshold, inter mode is selected only if the variance of the current block is greater than or equal to the MAE value in use.
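The two paragraphs above can be sketched as follows. This is an illustrative model, not the chip's arithmetic: it assumes that the macroblock mean is truncated to an integer (the rounding is not specified), and the function names are hypothetical.

```python
def simplified_variance(mb):
    """Sum of |pixel - mean| over a 16 x 16 luminance macroblock, as 14 bits.

    Assumption: the mean is an integer (truncated) mean.
    """
    pixels = [p for row in mb for p in row]
    mean = sum(pixels) // len(pixels)
    total = sum(abs(p - mean) for p in pixels)  # fits in 16 bits
    return total >> 2  # discard the two lsb's: 16-bit -> 14-bit


def inter_or_intra(mae14, var14, threshold=3):
    """Inter/intra choice; the threshold applies to the 8 msb's of the MAE."""
    if (mae14 >> 6) < threshold:  # 8 msb's of a 14-bit value
        return "inter"
    return "inter" if var14 >= mae14 else "intra"
```

A flat macroblock has zero variance, so once its MAE clears the threshold it is always coded intra. That matches the intuition that a uniform block is cheap to send from scratch.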
In order to avoid gradual picture degradation, every 61st macroblock input to the VP2611 is coded in intra mode regardless of the above decision. Because 61 is a prime number, it shares no factor with the number of macroblocks in a picture, so each macroblock will be transmitted in intra mode at least once in every 61 transmissions. If FIX MACROBLOCK or SKIP PICTURE is invoked, this 'Force Intra' counter is disabled.
[Fig 3 : MC Decision Slope. Minimum mean absolute error in hex (vertical axis, 0 to 180) plotted against zero movement absolute error in hex (horizontal axis, ticks at 40, 80, C0, 100, 140, 180). The boundary line x = 1.125y separates the MC Off region from the MC On region to its right.]
The user may override the internal inter/intra decision at any time using the CBUS control port. A user-generated forced inter mode overrides an internally generated 'Force Intra'.
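The refresh argument can be checked numerically. The sketch below assumes that a single counter runs over every macroblock input across frame boundaries. It counts how often each macroblock position is forced intra over 61 pictures, and shows the override priority described above. `final_mode` and its `host_override` parameter are hypothetical stand-ins for a mode written over CBUS.

```python
REFRESH_PERIOD = 61  # every 61st macroblock input is forced intra


def refresh_counts(mbs_per_frame, frames, period=REFRESH_PERIOD):
    """How often each macroblock position is forced intra.

    Assumption: one counter runs continuously across frames.
    """
    counts = [0] * mbs_per_frame
    for k in range(mbs_per_frame * frames):
        if (k + 1) % period == 0:  # the 61st, 122nd, ... macroblock input
            counts[k % mbs_per_frame] += 1
    return counts


def final_mode(internal_mode, force_intra, host_override=None):
    """A host override wins over everything, including Force Intra."""
    if host_override is not None:
        return host_override
    if force_intra:
        return "intra"
    return internal_mode
```

With 396 macroblocks per CIF picture or 99 per QCIF picture, neither of which is divisible by 61, every position is forced intra exactly once in any 61 consecutive pictures.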
Low Pass Filter
In motion compensated inter mode coding, the macroblock selected from the previous frame is filtered before it is subtracted from the current block. This decision can be overridden externally by the system controller. The filter uses a simple [1 2 1]/4 characteristic in both the vertical and horizontal dimensions, as specified in H.261; at the block edges, where a tap would fall outside the block, [0 1 0] is used instead.
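A minimal sketch of this separable filter on one 8 x 8 block follows. It follows the H.261 loop-filter description: full precision through both passes and a single rounding, with halves rounded up. It is not a description of the VP2611's internal datapath, and `loop_filter_block` is a hypothetical name.

```python
def loop_filter_block(block):
    """Separable [1 2 1]/4 filter on an 8 x 8 block; [0 1 0] at the edges."""
    n = len(block)

    def pass_1d(v):
        # Output is scaled by 4: edge taps become [0 4 0], interior [1 2 1].
        return [4 * v[i] if i in (0, n - 1) else v[i - 1] + 2 * v[i] + v[i + 1]
                for i in range(n)]

    rows = [pass_1d(r) for r in block]                 # horizontal pass
    cols = [pass_1d([rows[i][j] for i in range(n)])    # vertical pass per column
            for j in range(n)]
    # cols[j][i] is the value for row i, column j, scaled by 16; round once.
    return [[(cols[j][i] + 8) // 16 for j in range(n)] for i in range(n)]
```

A flat block passes through unchanged, while a single bright pixel is spread into the familiar 4:2:1 pattern around it.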
SYMBOL  PARAMETER                   MINIMUM   MAXIMUM
tRAC    Access time from RAS        -         105ns
tCAC    Access time from CAS        -         25ns
tRP     RAS precharge time          -         50ns
tCP     CAS precharge time          -         15ns
tRAS    RAS pulse width             -         90ns
tCAS    CAS pulse width             -         50ns
tREF    Time to refresh 256 rows    0.25ms    -
N.B. All times are quoted assuming 27MHz operation. For lower clock frequencies, increase the above values proportionately.
Table 1 : External DRAM timing requirements
Frame Store Manager
The previous picture is stored in an external CIF DRAM frame store, which is connected by a glueless interface. The internal Frame Store Manager controls all read, write, and refresh operations to these DRAMs. No provision is made for using smaller DRAMs if only QCIF operation is required.
During the coding of each macroblock, columns of the search window are read from these DRAMs, and finally the best fit macroblock is obtained. When coding is complete, the fully processed new macroblock is decoded again and written to the DRAMs. In this way the frame store maintains a bit-accurate duplicate of the image seen by the decoder (excepting transmission errors).
Several configurations can provide the required 128K x 16 store. Two 64K x 16 DRAMs may be used; in this case, select the default 1M DRAM mode when setting up the chip. Alternatively, a single 256K x 16 DRAM or four 256K x 4 DRAMs may be used. In these last two cases, use OE1 as ADR8 and R/W1 as R/W, leave R/W2 and OE2 unconnected, and use the Setup instruction at the CPORT to put the device into 4M DRAM mode.
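As a sanity check on the 128K x 16 figure, the sketch below counts the 16-bit words one 4:2:0 picture occupies. Packing two 8-bit pixels into each word of SW15:0 is an assumption made for this estimate. The datasheet does not describe the memory layout. `frame_store_words` is a hypothetical name.

```python
STORE_WORDS = 128 * 1024  # the required 128K x 16 store


def frame_store_words(width, height):
    """16-bit words for one picture; assumes two 8-bit pixels per word."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # U and V at half resolution each way
    return (luma + chroma) // 2
```

A CIF picture (352 x 288) needs 76032 words, a little under 60% of the 128K store, and a QCIF picture needs 19008.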
Table 1 details the critical timing parameters which the external DRAM must meet with SYSCLK running at 27MHz. If the device is run at slower speeds, the DRAM timing requirements are relaxed, with the exception of refresh. The number of refresh cycles the VP2611 produces is directly proportional to the SYSCLK frequency, so at lower clock rates the DRAM must retain its data for longer between refreshes.
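The proportional scaling in the N.B. under Table 1 can be sketched as below. Every limit grows by 27 MHz divided by the actual SYSCLK frequency. For the access, precharge and pulse-width limits that is a relaxation. For tREF, a minimum, it is a tighter demand on the DRAM. `dram_limits` is a hypothetical helper.

```python
# Table 1 at SYSCLK = 27 MHz. Maximum limits in ns; tREF is a minimum in ms.
TABLE1_NS_AT_27MHZ = {
    "tRAC": 105.0, "tCAC": 25.0, "tRP": 50.0,
    "tCP": 15.0, "tRAS": 90.0, "tCAS": 50.0,
}
TREF_MS_AT_27MHZ = 0.25


def dram_limits(sysclk_mhz):
    """Scale Table 1 in proportion to the clock period."""
    factor = 27.0 / sysclk_mhz
    limits = {name: ns * factor for name, ns in TABLE1_NS_AT_27MHZ.items()}
    # Larger maxima relax the DRAM requirements; a larger tREF minimum
    # means the DRAM must hold its data longer between refreshes.
    return limits, TREF_MS_AT_27MHZ * factor
```

At 13.5 MHz, for example, tRAC may be up to 210ns, but the DRAM must tolerate 0.5ms to refresh 256 rows.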
Discrete Cosine Transform
This circuit performs a Discrete Cosine Transform on each 8x8 sub-block, whether in inter or intra mode. In intra mode, eight bit pixel data is used, with a ninth implied sign bit (all pixel data is positive). In inter mode the difference between the current and best fit previous block is used; this difference is a two's complement number. Twelve bit coefficients are produced by the DCT and passed on to the quantizer.
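For reference, a direct floating-point 8 x 8 forward DCT in the scaling used by H.261 is sketched below: F(u,v) = 1/4 C(u) C(v) sum f(x,y) cos(pi(2x+1)u/16) cos(pi(2y+1)v/16), with C(0) = 1/sqrt(2). It is only a model for checking ranges, not the chip's fixed-point transform, and `dct_8x8` is a hypothetical name.

```python
import math


def _c(k):
    return math.sqrt(0.5) if k == 0 else 1.0


def dct_8x8(block):
    """Reference 8 x 8 forward DCT (H.261 scaling), rounded to integers."""
    out = [[0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos(math.pi * (2 * x + 1) * u / 16)
                          * math.cos(math.pi * (2 * y + 1) * v / 16))
            out[u][v] = round(0.25 * _c(u) * _c(v) * s)
    return out
```

A block of all-255 intra pixels gives a DC coefficient of 2040 and zero AC coefficients, which sits inside the twelve bit (about +/-2048) coefficient range quoted above.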
Quantize
This section quantizes the results of the DCT by dividing the 12 bit output from the DCT by a host-supplied value. The 5 bit quantization value supplied corresponds to division of the 12 bit coefficients (range ±2048) by values from 2 to 62, in steps of 2. This variable quantization strategy allows the volume of data generated by the encoder to be adjusted dynamically, depending on the fullness of the transmission buffer. For H.261 applications the quantizer uses the quantization value provided at the control port during the previous macroblock period (or at some earlier time). An option is provided which allows two quantization values to be used, one for inter coded macroblocks and the other for intra coded macroblocks.
As specified in H.261, the DC coefficient of an intra coded block is treated differently: its 12 bit value is always divided by 8.
When the quantization value is small and the DCT coefficient is large, there is a danger of overflow in the eight bit output. To avoid this, a clipping circuit is included at the output of the quantizer, which saturates at the maximum values.
Zig Zag Scan
This is essentially an address generator which reorders the DCT coefficients according to the standard zig-zag scan pattern. This has the effect of concentrating the significant coefficients at the beginning of the sub-block, improving the efficiency of the Run Length Coder.
Run Length Coder
Each coefficient output from the zig-zag scan is examined. If it is non-zero, the Run Length Coding circuit passes the coefficient magnitude to the output port along with its zero count, i.e. the number of zero magnitude coefficients preceding it within the same 8x8 sub-block.
Inverse Quantize
This circuit replicates the operation of the inverse quantizer in the decoder. It reconstructs the 12 bit DCT coefficients from the 8 bit quantized inputs, using the 5 bit quantization value, according to the following formulae.
If QUANT is odd:
REC = QUANT*(2*LEVEL+1) : LEVEL > 0
REC = QUANT*(2*LEVEL-1) : LEVEL < 0
If QUANT is even:
REC = QUANT*(2*LEVEL+1)-1 : LEVEL > 0
REC = QUANT*(2*LEVEL-1)+1 : LEVEL < 0
For intra coded DC coefficients:
REC = 8*LEVEL, except if LEVEL = 255, when REC = 1024
If LEVEL = 0 then REC = 0 in all cases.
The reconstructed values (REC) are passed through a clipping circuit in case of arithmetic overflow.
Thus the inverse quantizer restores the DCT coefficients to their original values, but with quantization error.
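The quantize, zig-zag, run-length and inverse-quantize steps can be sketched together as below. The inverse quantizer follows the formulae listed above exactly. For the forward quantizer, truncation toward zero and saturation at +/-127 are assumptions (the datasheet gives neither), as is the -2048..2047 clipping range for REC. All function names are hypothetical.

```python
def quantize(coef, quant):
    """Divide by 2*QUANT (2..62) and saturate to 8 bits (assumed +/-127)."""
    level = int(coef / (2 * quant))  # assumption: truncate toward zero
    return max(-127, min(127, level))


def zigzag_order(n=8):
    """(row, col) pairs in zig-zag order: (0,0), (0,1), (1,0), (2,0), (1,1), ..."""
    order = []
    for s in range(2 * n - 1):  # s = row + col, one anti-diagonal at a time
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)  # even diagonals are walked upwards
        order.extend((r, s - r) for r in rows)
    return order


def run_length(levels):
    """(run, level) pairs: run = zeros preceding each non-zero level."""
    pairs, run = [], 0
    for lv in levels:
        if lv == 0:
            run += 1
        else:
            pairs.append((run, lv))
            run = 0
    return pairs  # trailing zeros produce no pair


def inverse_quantize(level, quant, intra_dc=False):
    """Reconstruction formulae from the text, then clipping (assumed 12-bit)."""
    if intra_dc:
        rec = 1024 if level == 255 else 8 * level
    elif level == 0:
        rec = 0
    elif level > 0:
        rec = quant * (2 * level + 1) - (0 if quant % 2 else 1)
    else:
        rec = quant * (2 * level - 1) + (0 if quant % 2 else 1)
    return max(-2048, min(2047, rec))
```

For example, a coefficient of 100 with QUANT = 4 quantizes to LEVEL 12 and is reconstructed as 99, the difference being the quantization error mentioned above.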
Inverse DCT
This circuit replicates the operation of the Inverse Discrete Cosine Transform in the decoder, and outputs 9 bit signed pixel data (intra mode) or pixel difference data (inter mode). The IDCT fully meets the CCITT specification.
Reconstruction Adder
In Inter Mode, the IDCT data is added to the best fit block from the previous frame store. In Intra mode, the IDCT data is simply added to zero. After the adder, the sign bit is removed from the result to give 8 bit pixels. Clipping circuits ensure that any pixels with values exceeding 255 are clipped to 255, and any with negative values are clipped to zero (such values are possible due to quantization noise).
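The IDCT and reconstruction adder can be modelled as below. The sketch uses a direct floating-point inverse DCT in H.261 scaling, clipped to the 9 bit signed range. Prediction (or zero in intra mode) is then added and the result clipped to 0..255. It is an illustrative reference, not the chip's IDCT. `idct_8x8` and `reconstruct` are hypothetical names.

```python
import math


def _c(k):
    return math.sqrt(0.5) if k == 0 else 1.0


def idct_8x8(coefs):
    """Reference inverse DCT (H.261 scaling), clipped to 9-bit signed."""
    out = [[0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            s = 0.0
            for u in range(8):
                for v in range(8):
                    s += (_c(u) * _c(v) * coefs[u][v]
                          * math.cos(math.pi * (2 * x + 1) * u / 16)
                          * math.cos(math.pi * (2 * y + 1) * v / 16))
            out[x][y] = max(-256, min(255, round(0.25 * s)))
    return out


def reconstruct(residual, predicted=None):
    """Add the prediction (zero in intra mode) and clip pixels to 0..255."""
    return [[max(0, min(255, residual[i][j] + (predicted[i][j] if predicted else 0)))
             for j in range(8)] for i in range(8)]
```

A difference block that drives a pixel to 265 or to -5 is clipped to 255 or 0 respectively, the quantization-noise case the paragraph above describes.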
Macroblock period (2064 cycles):   1       2       3       4
YUV Input                          MB1     MB2     MB3     MB4
Frame Store Read                   DUMMY   MB1     MB2     MB3
Control Decisions                  DUMMY   MB1     MB2     MB3
Frame Store Write                  DUMMY   DUMMY   DUMMY   MB1
DBUS Output                        DUMMY   DUMMY   DUMMY   MB1
Fig 4: MacroBlock Pipelining
OPERATION OF INTERFACES
Macroblock Delays
The VP2611 has a three macroblock pipeline delay between pixel inputs and run length coded outputs. This is illustrated in Figure 4. While the second macroblock is being input, the best fit macroblock from the previous frame is being identified and then read from the frame store. At this time any Control Decisions which are to affect the first macroblock must be supplied by the host controller. The run length coded outputs for the first macroblock are not available until the fourth macroblock is supplied at the input pins.
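The schedule in Figure 4 amounts to a fixed lag per activity, which the sketch below encodes. It assumes that every macroblock period lasts the minimum 2064 SYSCLK cycles. The stage names and `mb_in_stage` are hypothetical labels for the rows of the figure.

```python
SLOT_CYCLES = 2064  # assumed length of one macroblock period (SYSCLK cycles)

# Macroblock periods by which each activity lags the YUV input (Figure 4).
STAGE_LAG = {
    "YUV input": 0,
    "frame store read": 1,
    "control decisions": 1,
    "frame store write": 3,
    "DBUS output": 3,
}


def mb_in_stage(stage, slot):
    """Macroblock (1-based) handled by `stage` in period `slot`; None = dummy."""
    mb = slot - STAGE_LAG[stage]
    return mb if mb >= 1 else None


PIPELINE_LATENCY_CYCLES = STAGE_LAG["DBUS output"] * SLOT_CYCLES
```

Under that assumption the three-period latency is 6192 SYSCLK cycles, about 229 microseconds at 27 MHz.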
YUV Input Port
The YUV port accepts pixel data from the preprocessor in block format, as illustrated in Figure 5. Within a complete system the VP2611 is always the master device, and must be supplied with macroblock data when it makes a demand. The order in which pixels are supplied is predetermined and must be strictly maintained. There are 64 pixels per sub-block, and 4 luminance and 2 chrominance sub-blocks per macroblock. The macroblocks themselves are divided into groups of blocks (GOBs), and the sequence specified in H.261 must also be maintained. Note that, since the chrominance resolution is half the luminance resolution both vertically and horizontally, the two chrominance blocks cover the same picture area as the four luminance blocks.
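As an illustration of this ordering, the sketch below flattens one macroblock into the 384 bytes presented at the YUV port. It sends luminance sub-blocks 1 to 4 (top-left, top-right, bottom-left, bottom-right), then U (5) and V (6). It assumes that the 64 pixels within each sub-block are sent row by row, as the 00 to 63 numbering of Figure 5 suggests. `macroblock_byte_order` is a hypothetical name.

```python
def macroblock_byte_order(y, u, v):
    """Flatten one macroblock (y: 16 x 16, u and v: 8 x 8) into port order."""
    out = []
    for top, left in ((0, 0), (0, 8), (8, 0), (8, 8)):  # sub-blocks 1, 2, 3, 4
        for r in range(8):
            out.extend(y[top + r][left:left + 8])  # assumed raster order
    for chroma in (u, v):  # sub-blocks 5 (U) and 6 (V)
        for r in range(8):
            out.extend(chroma[r][:8])
    return out
```

The ninth byte sent is therefore the first pixel of the second luminance row, not the ninth pixel of the first row.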
The preprocessor producing macroblock data must produce a frame start signal (FRMIN) when it has a complete frame of data available. This resets the input controller within the VP2611, which then generates sequential GOB and macroblock numbers for the coded outputs referenced to this input.
FRMIN must go high for at least one system clock period, and must go low before the next frame is available. The VP2611 responds to FRMIN with a request for macroblock data (REQYUV), which occurs approximately 184 SYSCLK periods after FRMIN. It must then receive a complete macroblock within 1871 SYSCLK periods, at the end of which REQYUV goes inactive. The VP2611 must be provided with a PCLK signal to strobe in the data. This must be derived from SYSCLK, and must only be present when there is valid data at the input. Data must meet the setup and hold times with respect to PCLK specified in Figure 6.
The maximum peak rate for PCLK is the SYSCLK rate divided by two, but since there are 384 bytes per macroblock, the average rate theoretically need only be 384/1871 times the SYSCLK rate. Note that PCLK must always be obtained by dividing SYSCLK by an integer greater than one. When the VP520 CIF/QCIF Converter is supplying the VP2611 with data, it provides a peak PCLK rate equivalent to SYSCLK divided by two, and an average rate of SYSCLK divided by four.
Fig 5 : Ordering of Pixels
Sub-block order within macroblock (luminance 1 to 4, then U and V):
  1 2    5 (U)    6 (V)
  3 4
Pixel order within sub-block:
  00 01 02 03 04 05 06 07
  08 09 10 11 12 13 14 15
  16 17 18 19 20 21 22 23
  24 25 26 27 28 29 30 31
  32 33 34 35 36 37 38 39
  40 41 42 43 44 45 46 47
  48 49 50 51 52 53 54 55
  56 57 58 59 60 61 62 63
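The rate arithmetic above can be checked with a few lines of Python. The sketch assumes a steady PCLK of SYSCLK divided by an integer, which is simpler than the bursty clock a real preprocessor may produce. The helper names are hypothetical.

```python
BYTES_PER_MB = 384          # 4 x 64 luminance + 2 x 64 chrominance pixels
LOAD_WINDOW_CYCLES = 1871   # SYSCLK cycles for which REQYUV stays high


def min_average_pclk(sysclk_hz):
    """Lowest average PCLK rate that still delivers a macroblock in time."""
    return sysclk_hz * BYTES_PER_MB / LOAD_WINDOW_CYCLES


def fits_window(divider):
    """True if a steady PCLK = SYSCLK/divider loads 384 bytes within 1871 cycles."""
    if divider < 2:
        raise ValueError("PCLK must be SYSCLK divided by an integer greater than one")
    return BYTES_PER_MB * divider <= LOAD_WINDOW_CYCLES


def largest_steady_divider():
    return LOAD_WINDOW_CYCLES // BYTES_PER_MB
```

At 27 MHz the minimum average rate is about 5.54 MHz. A steady SYSCLK/4 (1536 cycles per macroblock) fits the window, as with the VP520's average rate, while SYSCLK/5 (1920 cycles) does not.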
The minimum interval between successive REQYUV assertions is 2064 SYSCLK periods. In full CIF mode, "dummy" macroblocks are inserted internally between rows to give the chip sufficient time to load a new search window. No new YUV data may be loaded during these dummy macroblocks, and REQYUV remains inactive. No dummy macroblocks are required in QCIF mode. With a 27MHz SYSCLK, all macroblocks are coded in less than a 30Hz frame period, and there is then a period of inactivity before FRMIN goes active again. During this period the output bus remains static at all ones, and no output strobe (DCLK) is produced.
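The frame-time claim can be checked with simple arithmetic. The sketch below counts SYSCLK cycles if every macroblock period takes the minimum 2064 cycles. The number of dummy macroblocks per row is not given here, so it only reports how many spare periods remain for them. The helper names are hypothetical.

```python
MB_PERIOD = 2064  # minimum SYSCLK cycles between REQYUV assertions


def frame_cycles(coded_mbs, dummy_mbs=0):
    """SYSCLK cycles for one picture if every period takes the minimum."""
    return (coded_mbs + dummy_mbs) * MB_PERIOD


def spare_periods(coded_mbs, sysclk_hz=27_000_000, frame_rate=30):
    """Whole macroblock periods left in one frame time after the coded macroblocks."""
    budget = sysclk_hz // frame_rate
    return (budget - frame_cycles(coded_mbs)) // MB_PERIOD
```

A CIF picture (22 x 18 = 396 macroblocks) takes 817344 of the 900000 cycles available at 27 MHz and 30 Hz. That leaves room for 40 further periods, dummies included, before the next FRMIN. A QCIF picture (99 macroblocks) takes only 204336 cycles.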
[Fig 6 : Timing at YUV Port. PCLK and YUV7:0 waveforms, annotated SCLK/2, 20ns, 20ns, 10ns and 0ns.]
N.B. All timings given are MINIMUM values.