DS3487 - 4.1 December 1998
Supersedes June 1996 edition, DS3487 - 4.0
FEATURES
■Fully integrated H.261 video encoder
■Up to full CIF resolution and 30 Hz frame rates
■Inputs YUV data in 8 x 8 sub-block format
■Outputs run length coded coefficients
■On-chip motion vector estimator with +/-7 pixel search window
■Addresses and control signals for the DRAM frame store generated internally
■QFP package
ASSOCIATED PRODUCTS
■VP510 Colour Space Converter
■VP520S CIF/QCIF Converter
■VP2612 Video Multiplexer
■VP2614 Video Demultiplexer
■VP2615 H.261 Decoder
DESCRIPTION
The VP2611 Video Compression Source Coder forms part of a chip set used in video conferencing, video telephony and multimedia applications. It produces data which conforms to the H.261 standard for video compression at rates between 64 kbit/s and 2 Mbit/s. With a 27 MHz clock the device will accept full CIF resolution data at 30 Hz frame rates. The pipeline latency through the device is only 3 macroblock periods.
The VP2611 contains all the elements necessary for the compression algorithm. It incorporates a Motion Vector Estimator which performs a +/-7 pixel search. The decision to use inter or intra frame compression is made by the device, and the selected data blocks are read from the frame store. New (intra) or difference (inter) data is then passed through a Discrete Cosine Transformer and quantized. Data from the quantizer is also inverse quantized and passed through an Inverse Discrete Cosine Transformer. This reconstructed data is then written to the frame store for use in the next frame period. The frame store is managed by an internal DRAM controller, and no external logic is needed.
The input data must be in YUV space and must conform to the six sub-blocks per macroblock format defined by H.261. Any conversion from RGB format is performed by the VP510 Colour Space Converter. Any reduction in spatial resolution, down to CIF or QCIF requirements, is done by the VP520 Three Channel Video Filter.
The quantized data is zig-zag scanned and run length
coded before being output, together with block information
and motion vectors.
Fig 1 : Typical Video Conferencing Transmission System
[Figure: an NTSC/PAL composite video decoder supplies RGB and video sync to the VP510 Colour Space Converter. Y and Cr/Cb at CCIR601 resolution (Y 720 x 288, Cr/Cb 360 x 288 PAL; Y 720 x 240, Cr/Cb 360 x 240 NTSC) pass to the VP520 3 Channel Video Filter with its 16 x 128K CIF frame store. Macroblocks at CIF resolution (Y 352 x 288, Cr/Cb 176 x 144) pass to the VP2611 Integrated Video Encoder, with its own 16 x 128K CIF frame store, under REQYUV/FRMIN handshaking. RLC data and flags go to the VP2612 Video Multiplexer with a 32K x 8 TX buffer, producing the H.261 bit stream at 64 kb/s to 2 Mb/s. A system controller with a user interface oversees the chain.]
VP2611
PIN DESCRIPTIONS
YUV7:0    This input bus accepts YUV data one pixel at a time from the preprocessor, clocked in on the rising edge of PCLK.
PCLK    This signal is used to strobe in data at the YUV port and must be derived by dividing SYSCLK by an integer greater than one.
FRMIN    This input should be pulled high to prepare the VP2611 to code a new frame. It must be held high for at least one SYSCLK cycle and must then be pulled low again before the next frame begins. The VP2611 responds to the rising edge of FRMIN by asserting REQYUV approximately 184 SYSCLK cycles later.
REQYUV    This output is pulled high to request that YUV data be input for a new macroblock. It is pulled low again 1871 SYSCLK cycles later. It remains low during dummy macroblocks and during the idle period between frames.
DBUS7:0    This output bus serves several functions, as defined by DMODE3:0. In addition to providing the quantized coefficients and motion vectors, it is used to output control information.
R/W1    Read/Write control for external DRAM 1.
R/W2    Read/Write control for external DRAM 2. N/C if 256K DRAMs are used.
OE1    Output Enable control for external DRAM 1, or ADR8 if 256K DRAMs are used.
OE2    Output Enable control for external DRAM 2. N/C if 256K DRAMs are used.
ADR7:0    Address output for the external DRAMs.
CBUS7:0    Bidirectional data bus for use by a microprocessor. Data and instructions are clocked on and off the chip on the rising edge of CSTR.
CSTR    Data strobe for the CBUS port.
CEN    An enabling signal for the CBUS port.
CADR    When high, this signal defines CBUS as a data bus; when low, as an instruction input.
SYSCLK    System clock, run at 27MHz maximum. The clock must be high for between 35% and 65% of each clock cycle. This clock is used for all internal operations.
DMODE3:0    Output flag port for the DBUS7:0 bus. The value at this port identifies the type of data appearing on DBUS7:0 during the same period.
DCLK    This output pulses high for a minimum of 37ns each time new data is output on DBUS or DMODE. It can be used as an edge-sensitive strobe signal or as a level-sensitive "valid" signal.
SW15:0    This bidirectional port is connected to the frame store.
RAS    Row Address Strobe output for the external DRAMs.
CAS    Column Address Strobe output for the external DRAMs.
RESET    Active low power-on reset, which must be held low for at least 2064 cycles.
TCK    Test clock for JTAG.
TMS    Test Mode Select for JTAG.
TDI    Input JTAG test data.
TDO    Output JTAG test data.
TRST    Reset for the JTAG controller (active low).
NOTE:
"Barred" active low signals do not appear with a bar in the main body of the text.
Fig 2 : Simplified Block Diagram
[Figure: blocks shown are the YUV block format input, the motion vector estimator (search window), the MC decision block, the inter/intra processor, the low pass filter (predicted block), the forward path (SUB, DCT, Q, zig zag, RLC), the reconstruction path (IQ, IDCT, ADD), the frame store interface (address, control, data) and the control logic (host data and control; Q step, Force Intra and Force Filter), which drives the data bus (RLC data, motion vectors, block info) and the bus flags.]
OPERATION OF MAJOR BLOCKS
Motion Vector Estimator
The motion estimator calculates the mean absolute error (MAE) for each possible position of the combined luminance block within a search window taken from the previous frame. The combined luminance block consists of 16 x 16 pixels, and within the search window it is displaced between -7 and +7 pixels vertically and between -8 and +7 pixels horizontally. The two LSBs of each pixel are discarded, so the MAE value is contained within 14 bits (256 pixel differences of at most 63 each sum to at most 16128).
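The exhaustive search can be sketched in Python as follows. This is a minimal sketch, not the chip's implementation: it assumes frames are held as lists of rows of 8-bit integers and that the caller keeps the whole search window inside the previous frame. The function names are illustrative, and resolving ties in favour of the zero displacement (then scan order) is an assumption the datasheet does not specify.

```python
import random


def mae(cur, prev, bx, by, dx, dy):
    """Sum of absolute differences between the 16x16 luminance block at
    (bx, by) in `cur` and the block displaced by (dx, dy) in `prev`,
    after discarding the two LSBs of each pixel."""
    total = 0
    for y in range(16):
        for x in range(16):
            c = cur[by + y][bx + x] >> 2
            p = prev[by + dy + y][bx + dx + x] >> 2
            total += abs(c - p)
    return total  # at most 256 * 63 = 16128, so it fits in 14 bits


def best_match(cur, prev, bx, by):
    """Search dy in -7..+7 and dx in -8..+7 (240 positions).
    Returns (minimum MAE, (dx, dy), zero displacement MAE)."""
    zero = mae(cur, prev, bx, by, 0, 0)
    best, vector = zero, (0, 0)
    for dy in range(-7, 8):
        for dx in range(-8, 8):
            m = mae(cur, prev, bx, by, dx, dy)
            if m < best:  # strict: ties keep the earlier candidate
                best, vector = m, (dx, dy)
    return best, vector, zero


# Demo: the current block is the previous image moved by (dx, dy) = (3, -2)
rng = random.Random(1)
prev = [[rng.randrange(256) for _ in range(48)] for _ in range(48)]
cur = [row[:] for row in prev]
for y in range(16, 32):
    for x in range(16, 32):
        cur[y][x] = prev[y - 2][x + 3]
result = best_match(cur, prev, 16, 16)
print(result)  # minimum MAE 0 at (3, -2)
```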
The minimum MAE value, representing the best match between the previous and current block, is passed to the motion compensation decision block, together with the position of this best fit in the search window. The zero displacement MAE value is also passed to this block, which then decides whether the best fit is sufficiently better than the zero displacement fit. It uses the characteristic shown in Figure 3, in which both 14 bit MAE values are plotted in hex. Any point defined by the two MAE values that lies to the right of the line causes motion compensation to be applied. In this case the best fit MAE value is used by the inter/intra decision processor; otherwise the zero displacement value is used.
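Fig 3 labels the decision line x = 1.125y, with x the zero movement error and y the minimum error. A minimal sketch of the decision, assuming that line passes through the origin and that points exactly on it do not enable motion compensation (both assumptions; any other feature of the figure is ignored). Returning a zero vector when motion compensation is off is also an assumption of this sketch.

```python
def use_motion_compensation(zero_mae: int, best_mae: int) -> bool:
    """Model of the Fig 3 line x = 1.125y (x: zero displacement MAE,
    y: minimum MAE). Points strictly to the right of the line enable MC."""
    # x > 1.125 * y, kept in integer arithmetic as 8x > 9y
    return 8 * zero_mae > 9 * best_mae


def mae_in_use(zero_mae: int, best_mae: int, best_vector):
    """MAE and motion vector handed on to the inter/intra decision."""
    if use_motion_compensation(zero_mae, best_mae):
        return best_mae, best_vector
    return zero_mae, (0, 0)


# Example values in hex, as plotted in Fig 3
print(mae_in_use(0x100, 0x80, (3, -2)))  # (128, (3, -2)): MC on
print(mae_in_use(0x100, 0xF0, (1, 0)))   # (256, (0, 0)): MC off
```

Working in integers (8x > 9y) avoids floating-point rounding on a boundary that, in hardware, would be evaluated on 14 bit integers.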
Inter/Intra Decision Processor
The MAE value passed on by the motion compensation decision block is compared with the simplified variance of the current block. This simplified variance is calculated by summing the absolute differences between each luminance pixel and the mean luminance value over the whole macroblock. Eight bit pixels are used, and the variance value is expressed in 14 bits by discarding the two LSBs of the actual 16 bit result, so that it can be compared directly with the 14 bit MAE value.
If the MAE value is below a user defined threshold, inter mode coding is always selected. The default threshold is 3, on a scale from 0 to 255 formed by the 8 MSBs of the 14 bit value. Above this threshold, inter mode is only selected if the variance of the current block is greater than or equal to the MAE value in use.
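A sketch of the inter/intra decision follows. The truncated integer mean and the strict "below the threshold" comparison are assumptions; the datasheet does not give the rounding or the comparison at the boundary. Function names are hypothetical.

```python
def simplified_variance(luma):
    """luma: the 256 eight-bit luminance pixels of one macroblock.
    Returns sum(|pixel - mean|) with the two LSBs of the 16 bit sum dropped."""
    if len(luma) != 256:
        raise ValueError("a macroblock has 256 luminance pixels")
    mean = sum(luma) // 256  # assumption: truncated integer mean
    total = sum(abs(p - mean) for p in luma)  # at most 255 * 256, i.e. 16 bits
    return total >> 2  # 16 bit result -> 14 bits


def choose_inter(mae14, variance14, threshold=3):
    """True selects inter coding, False selects intra coding."""
    if (mae14 >> 6) < threshold:  # compare the 8 MSBs of the 14 bit MAE
        return True
    return variance14 >= mae14


flat = [128] * 256
print(choose_inter(1000, simplified_variance(flat)))  # False: intra
print(choose_inter(150, simplified_variance(flat)))   # True: 150 >> 6 = 2 < 3
```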
Fig 3 : MC Decision Slope
[Figure: minimum mean absolute error in hex (vertical axis) plotted against zero movement absolute error in hex (horizontal axis), with ticks from 40 to 180 on both axes. The line x = 1.125y separates the MC On region, to the right of the line, from the MC Off region.]
In order to avoid gradual picture degradation, every 61st macroblock input to the VP2611 is coded in intra mode regardless of the above decision. Because 61 is a prime number, it shares no factor with the number of macroblocks in a picture (396 in CIF, 99 in QCIF), so the forced position moves from frame to frame and each macroblock is transmitted in intra mode at least once in every 61 transmissions. If FIX MACROBLOCK or SKIP PICTURE is invoked, this `Force Intra' counter will be disabled.
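The effect of the 61-macroblock period can be checked numerically. This sketch assumes a hypothetical model in which one counter counts every coded macroblock continuously across frames (dummy macroblocks excluded) and fires on every 61st; the datasheet does not describe the counter at this level of detail.

```python
def forced_intra_frames(mb_per_frame, position, frames, period=61):
    """Frames (0-based) in which the macroblock at `position` is forced to
    intra mode, under the hypothetical continuous-counter model."""
    hits = []
    for f in range(frames):
        count = f * mb_per_frame + position + 1  # 1-based macroblock count
        if count % period == 0:
            hits.append(f)
    return hits


# CIF (396 macroblocks) and QCIF (99): every position is refreshed exactly
# once in every 61 consecutive frames.
for n in (396, 99):
    for pos in range(n):
        hits = forced_intra_frames(n, pos, 3 * 61)
        assert len(hits) == 3 and all(b - a == 61 for a, b in zip(hits, hits[1:]))

# A period sharing a factor with 396 (here 66) never refreshes some positions.
print(forced_intra_frames(396, 0, 200, period=66))  # []
```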
The user may override the internal inter/intra decision at any time using the CBUS control port. A user generated forced inter mode will override an internally generated `Force Intra'.
Low Pass Filter
In motion compensated inter mode coding, the macroblock selected from the previous frame is filtered before it is subtracted from the current block. This decision can be overridden externally by the system controller. As specified in H.261, the filter is separable, with a simple [1 2 1] characteristic in both the vertical and horizontal dimensions; at the block edges, including the macroblock boundaries, where a tap would fall outside the block, [0 1 0] is used instead.
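A sketch of the separable filter applied to one 8x8 block. Following the H.261 loop filter, it assumes the coefficients are 1/4, 1/2, 1/4 (that is, [1 2 1]/4), that both passes keep full precision, and that the result is rounded once at the end with halves rounded up; the chip's internal arithmetic is not described here. A 16x16 luminance area would be processed as four such blocks.

```python
def loop_filter_block(block):
    """Filter one 8x8 block (8 rows of 8 ints): [1 2 1] taps inside the block,
    [0 1 0] where a tap would fall outside it."""
    def taps(v, i):
        # Returns 4x the 1-D filtered value, so both cases share one scale
        if i == 0 or i == len(v) - 1:
            return 4 * v[i]                     # edge: [0 1 0]
        return v[i - 1] + 2 * v[i] + v[i + 1]   # interior: [1 2 1]

    h = [[taps(row, x) for x in range(8)] for row in block]  # horizontal pass
    out = [[0] * 8 for _ in range(8)]
    for x in range(8):
        col = [h[y][x] for y in range(8)]
        for y in range(8):
            out[y][x] = (taps(col, y) + 8) // 16  # divide by 16, halves round up
    return out


impulse = [[0] * 8 for _ in range(8)]
impulse[3][3] = 160
filtered = loop_filter_block(impulse)
print(filtered[3][3], filtered[3][4], filtered[2][2])  # 40 20 10
```

An interior impulse spreads into the 3 x 3 neighbourhood with weights 4:2:1 (out of 16), while corner pixels pass through unchanged because both of their 1-D filters become [0 1 0].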
SYMBOL   PARAMETER                   MINIMUM          MAXIMUM
tRAC     Access time from RAS                         105ns or under
tCAC     Access time from CAS                         25ns or under
tRP      RAS precharge time          50ns or under
tCP      CAS precharge time          15ns or under
tRAS     RAS pulse width             90ns or under
tCAS     CAS pulse width             50ns or under
tREF     Time to refresh 256 rows                     0.25ms or over
N.B. All times are quoted assuming 27MHz operation. For lower clock frequencies, increase the above values proportionately.
Table 1 : External DRAM timing requirements
Frame Store Manager
The previous picture is stored in an external CIF DRAM frame store, which is connected by a glueless interface. The internal Frame Store Manager controls all read, write and refresh operations to these DRAMs. No provision is made to allow the use of smaller DRAMs if only QCIF operation is required.
During the coding of each macroblock, columns of the search window are read from these DRAMs, and finally the "best fit" macroblock is obtained. At the completion of coding, the fully processed new macroblock is decoded again (inverse quantized, inverse transformed and added to its prediction) and written to the DRAMs. In this way the frame store maintains a bit-accurate duplicate of the image seen by the decoder (excepting transmission errors).
Several configurations are possible to make the required 128K x 16 store. Two 64K x 16 DRAMs can be employed; in this case use the default 1M DRAM mode when setting up the chip. Alternatively, a single 256K x 16 DRAM or four 256K x 4 DRAMs can be used. In these last two cases use OE1 as ADR8 and R/W1 as R/W, and do not connect R/W2 and OE2. Also, use the Setup instruction at the CPORT to put the device into 4M DRAM mode.
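The arithmetic behind the 128K x 16 figure, and the three configurations listed above, can be tabulated as below. The way the parts are combined (the two x16 parts stacked in depth, each with its own R/W and OE, and the four x4 parts side by side in width) is our reading of the text, not a wiring diagram from the datasheet.

```python
# Storage needed for one CIF picture at 8 bits per sample
Y_BYTES = 352 * 288                    # 101376
C_BYTES = 2 * 176 * 144                # Cb + Cr = 50688
CIF_WORDS = (Y_BYTES + C_BYTES) // 2   # 76032 sixteen-bit words
STORE_WORDS = 128 * 1024               # the 128K x 16 store

# Configurations named in the text: (parts, words per part, bits per part, setup mode)
CONFIGS = {
    "2 x 64K x 16": (2, 64 * 1024, 16, "1M DRAM mode (default)"),
    "1 x 256K x 16": (1, 256 * 1024, 16, "4M DRAM mode"),
    "4 x 256K x 4": (4, 256 * 1024, 4, "4M DRAM mode"),
}


def organisation(parts, words, bits):
    """(depth in words, width in bits) of the assembled store."""
    if bits == 16:
        return parts * words, 16   # x16 parts stacked in depth
    return words, parts * bits     # narrow parts side by side in width


for name, (parts, words, bits, mode) in CONFIGS.items():
    depth, width = organisation(parts, words, bits)
    assert width == 16 and depth >= STORE_WORDS >= CIF_WORDS
    print(f"{name}: {depth // 1024}K x {width}, {mode}")
```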
Table 1 details the critical timing parameters which the external DRAM must meet with SYSCLK running at 27MHz. If the device is used at slower speeds, the requirements on the DRAM timing are relaxed, with the exception of refresh. The number of refresh cycles the VP2611 produces is directly proportional to the SYSCLK frequency, so at a lower SYSCLK the rows are refreshed less often and the DRAM must retain its data for longer.
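The scaling rule in the note under Table 1 can be expressed as a small checker. The limits are the 27 MHz values as read from Table 1 (tREF in ms, the rest in ns); the DRAM figures in the demo belong to a hypothetical part used only for illustration. Scaling every value by 27/f loosens the "or under" limits and raises the "or over" refresh limit, which is why refresh is the one requirement that gets harder at lower clock rates.

```python
REF_MHZ = 27.0

# Table 1 limits at 27 MHz: (limit, rule). "at_most": the DRAM's figure must be
# at or under the limit; "at_least": it must be at or over it.
TABLE1 = {
    "tRAC": (105.0, "at_most"),
    "tCAC": (25.0, "at_most"),
    "tRP": (50.0, "at_most"),
    "tCP": (15.0, "at_most"),
    "tRAS": (90.0, "at_most"),
    "tCAS": (50.0, "at_most"),
    "tREF": (0.25, "at_least"),
}


def limits_at(sysclk_mhz):
    """Scale every limit by 27/f, as the note under Table 1 instructs."""
    factor = REF_MHZ / sysclk_mhz
    return {name: (value * factor, rule) for name, (value, rule) in TABLE1.items()}


def dram_ok(dram, sysclk_mhz):
    """dram: the part's datasheet figures, keyed like TABLE1."""
    for name, (limit, rule) in limits_at(sysclk_mhz).items():
        if rule == "at_most" and dram[name] > limit:
            return False
        if rule == "at_least" and dram[name] < limit:
            return False
    return True


# Hypothetical DRAM figures, for illustration only
part = {"tRAC": 110, "tCAC": 25, "tRP": 50, "tCP": 15,
        "tRAS": 90, "tCAS": 50, "tREF": 4.0}
print(dram_ok(part, 27.0))  # False: tRAC 110ns exceeds 105ns
print(dram_ok(part, 20.0))  # True: the tRAC limit becomes about 141.75ns
```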
Discrete Cosine Transform
This circuit performs a Discrete Cosine Transform on each 8x8 sub-block, whether in inter or intra mode. In intra mode, eight bit pixel data is used, with a ninth implied sign bit (all pixel data is positive). In inter mode the difference between the current and best fit previous block is used; this is a two's complement number in the range -255 to +255. Twelve bit coefficients are produced by the DCT and passed on to the quantizer.
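For reference, the transform can be written directly from the H.261 definition, F(u,v) = 1/4 C(u) C(v) sum over x,y of f(x,y) cos[pi(2x+1)u/16] cos[pi(2y+1)v/16], with C(0) = 1/sqrt(2) and C(k) = 1 otherwise. The floating point arithmetic and rounding to the nearest integer are assumptions; the VP2611's internal precision is not stated. A flat block of 255 gives a DC coefficient of 2040, which shows why 12 bits (range ±2048) are enough.

```python
import math

# cos[pi(2x+1)u/16] for u, x in 0..7, and the normalisation factors C(k)
COS = [[math.cos(math.pi * (2 * x + 1) * u / 16) for x in range(8)] for u in range(8)]
C = [1 / math.sqrt(2)] + [1.0] * 7


def dct8x8(block):
    """Forward 2-D DCT of block[y][x]; returns out[v][u], rounded to integers."""
    out = [[0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            s = 0.0
            for y in range(8):
                for x in range(8):
                    s += block[y][x] * COS[u][x] * COS[v][y]
            out[v][u] = round(0.25 * C[u] * C[v] * s)
    return out


flat = dct8x8([[255] * 8 for _ in range(8)])
print(flat[0][0])  # 2040: inside the 12 bit range
```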
Quantize
This section quantizes the results of the DCT by dividing the 12 bit output from the DCT by a host supplied value. The 5 bit quantization value supplied corresponds to division of the 12 bit coefficients (range ± 2048) by values from 2 to 62, in steps of 2. This variable quantization strategy allows the volume of data generated by the encoder to be adjusted dynamically, depending on the fullness of the transmission buffer. For H.261 applications the quantizer uses the quantization value provided at the control port during the previous macroblock period (or at some earlier time). An option is provided which allows two quantization values to be used, one for inter coded macroblocks and the other for intra coded macroblocks.
As specified in H.261, the DC coefficient of an intra coded block is treated differently, and its 12 bit value is always divided by 8.
When the quantization value is small and the DCT coefficient is large, there is a danger of overflow in the eight bit output. To avoid this, a clipping circuit is included at the output of the quantizer, which saturates at the maximum values.
Zig Zag Scan
This is essentially an address generator which reorders the DCT coefficients according to the standard zig-zag scan pattern. This has the effect of concentrating the significant coefficients at the beginning of the sub-block, improving the efficiency of the Run Length Coder.
Run Length Coder
Each coefficient output from the zig-zag scan is examined. If it is non-zero, the Run Length Coding circuit passes the coefficient value to the output port along with its zero count, i.e. the number of zero coefficients preceding it within the same 8x8 sub-block.
Inverse Quantize
This circuit replicates the operation of the inverse quantizer in the decoder. It reconstructs the 12 bit DCT coefficients from the 8 bit quantized inputs, using the 5 bit quantization value. This is achieved using the following formulae.
If QUANT is odd:
REC = QUANT*(2*LEVEL+1) : LEVEL > 0
REC = QUANT*(2*LEVEL-1) : LEVEL < 0
If QUANT is even:
REC = QUANT*(2*LEVEL+1)-1 : LEVEL > 0
REC = QUANT*(2*LEVEL-1)+1 : LEVEL < 0
For intra coded DC coefficients:
REC = 8*LEVEL, except if LEVEL=255, when REC=1024
If LEVEL=0 then REC=0 in all cases.
The reconstructed values (REC) are passed through a clipping circuit in case of arithmetic overflow. Thus the inverse quantizer restores the DCT coefficients to their original values, apart from the quantization error.
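The quantizer, zig-zag scan, run length coder and inverse quantizer described above can be chained in a short sketch. The inverse quantizer follows the formulae given in the Inverse Quantize section. The rest is assumed: the forward quantizer truncates toward zero and saturates at ±127, it handles only coefficients other than the intra DC, and the inverse quantizer clips to -2048..2047.

```python
def quantize(coef, quant):
    """Coefficient other than intra DC: divide by 2*QUANT (QUANT = 1..31).
    Assumptions: truncation toward zero, saturation at +/-127."""
    mag = abs(coef) // (2 * quant)
    level = mag if coef >= 0 else -mag
    return max(-127, min(127, level))


def zigzag_order(n=8):
    """(row, col) positions in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def run_length(levels):
    """(zero count, level) pairs for the non-zero levels of one sub-block."""
    pairs, run = [], 0
    for level in levels:
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs


def inverse_quantize(level, quant, intra_dc=False):
    """Reconstruction formulae from the Inverse Quantize section."""
    if level == 0:
        return 0
    if intra_dc:
        return 1024 if level == 255 else 8 * level
    rec = quant * (2 * level + 1) if level > 0 else quant * (2 * level - 1)
    if quant % 2 == 0:  # even QUANT: one step toward zero
        rec += -1 if level > 0 else 1
    return max(-2048, min(2047, rec))  # clipping range: an assumption


coeffs = [[0] * 8 for _ in range(8)]
coeffs[0][1], coeffs[1][0], coeffs[2][2] = 100, -40, 17
quant = 4
scanned = [quantize(coeffs[r][c], quant) for r, c in zigzag_order()]
pairs = run_length(scanned)
recon = [inverse_quantize(level, quant) for _, level in pairs]
print(pairs)  # [(1, 12), (0, -5), (9, 2)]
print(recon)  # [99, -43, 19]
```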
Inverse DCT
This circuit replicates the operation of the Inverse Discrete Cosine Transform in the decoder, and outputs 9 bit signed pixel data (intra mode) or pixel difference data (inter mode). The IDCT fully meets the CCITT specification.
Reconstruction Adder
In Inter Mode, the IDCT data is added to the best fit block
from the previous frame store. In Intra mode, the IDCT data is
simply added to zero. After the adder, the sign bit is removed
from the result to give 8 bit pixels. Clipping circuits ensure that
any pixels with values exceeding 255 are clipped to 255, and
any with negative values are clipped to zero (such values are
possible due to quantization noise).
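The adder and its clipping reduce to a few lines. This sketch works on flat lists of pixels, and the function name is illustrative.

```python
def reconstruct(idct, prediction=None):
    """idct: signed 9 bit IDCT outputs; prediction: best fit pixels from the
    previous frame (inter mode), or None for intra mode, where the IDCT data
    is added to zero."""
    if prediction is None:
        prediction = [0] * len(idct)
    # Clip to the 8 bit pixel range: >255 -> 255, negative -> 0
    return [max(0, min(255, p + d)) for p, d in zip(prediction, idct)]


print(reconstruct([10, -20, 250], [250, 10, 10]))  # [255, 0, 255]
print(reconstruct([-5, 100]))                      # [0, 100]
```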
(each column is one macroblock period of 2064 cycles)
                     Period 1   Period 2   Period 3   Period 4
YUV Input            MB1        MB2        MB3        MB4
Frame Store Read     DUMMY      MB1        MB2        MB3
Control Decisions    DUMMY      MB1        MB2        MB3
Frame Store Write    DUMMY      DUMMY      DUMMY      MB1
DBUS Output          DUMMY      DUMMY      DUMMY      MB1
Fig 4: MacroBlock Pipelining
OPERATION OF INTERFACES
Macroblock Delays
The VP2611 has a three macroblock pipeline delay between pixel inputs and run length coded outputs. This is illustrated in Figure 4. Whilst the second macroblock is being input, the best fit macroblock for the first from the previous frame is being identified and then read from the frame store. At this time any Control Decisions which are to affect the first macroblock must be supplied by the host controller. The run length coded outputs for the first macroblock are not available until the fourth macroblock is supplied at the input pins.
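Reading each row of Fig 4 as a fixed delay behind the YUV input gives a small model of the pipeline. The delays below are taken from the figure; the slot numbering and function name are illustrative.

```python
MB_PERIOD = 2064  # SYSCLK periods per macroblock slot

# Delay of each Fig 4 row behind the YUV input, in macroblock slots
DELAY = {
    "YUV Input": 0,
    "Frame Store Read": 1,
    "Control Decisions": 1,
    "Frame Store Write": 3,
    "DBUS Output": 3,
}
LATENCY_CYCLES = DELAY["DBUS Output"] * MB_PERIOD  # 6192 SYSCLK periods


def slot_contents(slot):
    """Macroblock handled by each row in a 1-based slot ('DUMMY' while filling)."""
    return {row: f"MB{slot - d}" if slot - d >= 1 else "DUMMY"
            for row, d in DELAY.items()}


for slot in range(1, 5):
    print(slot, slot_contents(slot))
```

In this model the control decisions for macroblock n fall due in slot n + 1, while macroblock n + 1 is being input, which matches the text above.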
YUV Input Port
The YUV port accepts pixel data from the preprocessor in block format as illustrated in Figure 5. Within a complete system the VP2611 is always the master device, and must be supplied with macroblock data when it makes a demand. The order in which pixels are supplied is predetermined and must be strictly maintained. There are 64 pixels per sub-block, and 4 luminance and 2 chrominance sub-blocks per macroblock (384 bytes in all). The macroblocks themselves are divided into groups of blocks (GOBs), and the sequence specified in H.261 must also be maintained. Note that, since the chrominance resolution is half the luminance resolution both vertically and horizontally, the two chrominance blocks cover the same picture area as the four luminance blocks.
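The H.261 macroblock sequence can be generated as follows. The GOB layout (11 x 3 macroblocks per GOB; 12 GOBs in two columns for CIF, 3 in one column for QCIF) and the Y1 to Y4, Cb, Cr block order come from H.261. The raster order of pixels inside each 8x8 sub-block is an assumption here, since that order is defined by Figure 5.

```python
def macroblock_order(cif=True):
    """(mb_x, mb_y) of each macroblock in H.261 order, in 16x16 units."""
    gobs, columns = (12, 2) if cif else (3, 1)
    order = []
    for g in range(gobs):
        gx, gy = g % columns, g // columns
        for m in range(33):  # macroblocks within a GOB, raster order
            order.append((gx * 11 + m % 11, gy * 3 + m // 11))
    return order


def macroblock_bytes():
    """(sub-block, row, col) for the 384 bytes of one macroblock.
    Y1..Y4 are top-left, top-right, bottom-left, bottom-right; then Cb, Cr.
    Raster order inside each 8x8 sub-block is an assumption."""
    return [(blk, r, c)
            for blk in ("Y1", "Y2", "Y3", "Y4", "Cb", "Cr")
            for r in range(8) for c in range(8)]


cif = macroblock_order()
print(len(cif), cif[11], cif[33])  # 396 (0, 1) (11, 0)
```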
The preprocessor producing macroblock data must produce a frame start signal (FRMIN) when it has a complete frame of data available. This resets the input controller within the VP2611, which then generates sequential GOB and macroblock numbers for the coded outputs, referenced to this input.
FRMIN must go high for at least one system clock period, and must go low before the next frame is available. The VP2611 responds to FRMIN with a request for macroblock data (REQYUV), which occurs approximately 184 SYSCLK periods after FRMIN. It must then receive a complete macroblock within 1871 SYSCLK periods, at the end of which REQYUV goes inactive. The VP2611 must be provided with a PCLK signal to strobe in the data. This must be derived from SYSCLK, and must only be present when there is valid data at the input. Data must meet the setup and hold times with respect to PCLK specified in Figure 6.
The maximum peak rate for PCLK is the SYSCLK rate divided by two, but since there are 384 bytes per macroblock, the average rate theoretically need only be 384/1871 times the SYSCLK rate (about 5.54 MHz with a 27 MHz SYSCLK). Note that PCLK must always be obtained by dividing SYSCLK by an integer greater than one. When the VP520 CIF/QCIF Converter is supplying the VP2611 with data, it provides a peak PCLK rate equivalent to SYSCLK divided by two, and an average rate of SYSCLK divided by four.
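The PCLK arithmetic can be checked with a few lines. The 384-byte macroblock, the 1871-period window and the integer-divider rule come from the text; the function names are illustrative.

```python
SYSCLK_HZ = 27_000_000
BYTES_PER_MB = 384  # 4 Y + Cb + Cr sub-blocks of 64 pixels
WINDOW = 1871       # SYSCLK periods allowed after REQYUV goes high


def valid_peak_divider(n):
    """PCLK must be SYSCLK divided by an integer greater than one."""
    return isinstance(n, int) and n > 1


def fits_window(avg_divider):
    """True if 384 bytes, at an average of one byte per `avg_divider`
    SYSCLK periods, arrive within the REQYUV window."""
    return BYTES_PER_MB * avg_divider <= WINDOW


min_avg_pclk_hz = SYSCLK_HZ * BYTES_PER_MB / WINDOW
print(f"{min_avg_pclk_hz / 1e6:.2f} MHz")  # 5.54 MHz
print(fits_window(4), fits_window(5))      # True False: 1536 vs 1920 periods
```

The VP520's average rate of SYSCLK/4 therefore delivers a macroblock in 1536 periods, with margin inside the 1871-period window.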
The minimum interval between successive assertions of REQYUV is 2064 SYSCLK periods. In full CIF mode, "dummy" macroblocks are inserted internally between rows to give the chip sufficient time to load a new search window. No new YUV data may be loaded during these dummy macroblocks, and REQYUV remains inactive. No dummy macroblocks are required in QCIF mode. With a 27MHz SYSCLK, all the macroblocks of a frame are coded in less than one 30Hz frame period, and there is a period of inactivity before FRMIN goes active again. During this period the output bus remains static at all ones, and no output strobe (DCLK) is produced.
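A rough frame budget at 27 MHz follows from the figures above. The number of dummy macroblocks inserted in CIF mode is not given here, so `dummy_mbs` is a hypothetical parameter. The sketch also ignores the three-slot pipeline drain after the last input.

```python
SYSCLK_HZ = 27_000_000
MB_PERIOD = 2064     # minimum SYSCLK periods between REQYUV assertions
FRMIN_LATENCY = 184  # approximate FRMIN -> first REQYUV delay


def frame_cycles(coded_mbs, dummy_mbs=0):
    """SYSCLK periods from FRMIN to the end of the last macroblock slot."""
    return FRMIN_LATENCY + (coded_mbs + dummy_mbs) * MB_PERIOD


frame_period = SYSCLK_HZ // 30  # 900000 periods at 30 Hz
cif = frame_cycles(396)         # 817528 periods, about 30.3 ms
spare_slots = (frame_period - cif) // MB_PERIOD
print(cif, spare_slots)         # room for 39 further macroblock slots
```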
Fig 6 : Timing at YUV Port
[Figure: PCLK and YUV7:0 waveforms, annotated with a PCLK period of SCLK/2 and timings of 20ns, 20ns, 10ns and 0ns.]
N.B. All timings given are MINIMUM values.