The M65727 is a highly efficient motion estimation LSI used to estimate motion vectors for real-time
encoding of dynamic images. M65727 can be used together with the frame memory, the M65721
(Controller LSI) and the M65722 (Pixel Processor LSI). It operates under the control of the M65721.
M65727 accepts the template macro block (MB) data inputs from the M65721 and accepts the search
window image data from the frame memory. It estimates the motion vectors by searching for the minimum
value of the mean absolute error between the template block data and the search window image data. It outputs
the result to the M65721. The M65721 and the M65722 can generate a prediction image using the
above-mentioned motion vector. The M65727 is designed so that it is applicable to MPEG2, the
international video compression standard.
When vertical ±7.5 is searched: 512 pixels / (550) cycles
When vertical ±15.5 is searched: 768 pixels / (806) cycles
*Processing capability
27MHz operation: Processing the search range over horizontal ±7.5 and
vertical ±7.5 for the ITU-R 601 image size is possible.
40MHz operation: Processing the search range over horizontal ±7.5 and
vertical ±15.5 for the ITU-R 601 image size is possible.
*, **: "Top" and "Bottom" mentioned here indicate the field parities. In the
following explanation, a frame with the top field side as the first line is assumed.
*A: The ±7.5 vertical search range mode can be used only under 27MHz
operation.
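The processing capability figures above can be checked with simple arithmetic. The following is a minimal sketch, assuming an ITU-R 601 picture of 720x480 pixels at 30 frames/s (720x576 at 25 frames/s gives the same macroblock rate) and a single chip covering horizontal ±7.5. The function names are illustrative and are not part of the M65727 documentation.

```python
MB_SIZE = 16  # macroblock edge in pixels

def macroblocks_per_second(width, height, frames_per_second):
    """Number of 16x16 macroblocks that must be processed each second."""
    return (width // MB_SIZE) * (height // MB_SIZE) * frames_per_second

def fits(clock_hz, execution_cycle, mb_rate):
    """True if one execution cycle per MB fits in the available clock cycles."""
    return execution_cycle * mb_rate <= clock_hz

# Assumed picture format: 720x480 at 30 frames/s.
MB_RATE = macroblocks_per_second(720, 480, 30)

if __name__ == "__main__":
    for clock_hz, cycles in ((27_000_000, 550), (27_000_000, 806), (40_000_000, 806)):
        print(clock_hz, cycles, fits(clock_hz, cycles, MB_RATE))
```

Under these assumptions the 550-cycle mode needs about 22.3 MHz and the 806-cycle mode about 32.6 MHz, which is consistent with the 27MHz and 40MHz figures quoted above.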
*Block size: 16x16
*Search range: Second field: horizontal ±0.5 pixel, vertical ±0.5 pixel
*Search method: Take the average with the first field data (half-pel is generated as needed)
and estimate 9 points over the second field (including the specified point)
*Evaluation function: Full sampled mean absolute error
*Execution cycle: (550) cycles / MB. The execution cycle mentioned here refers to the throw-in period
(input interval) of an MB, not the processing time of one MB.
*Search window image inputs
First field: 432 (18x24) pixels / (550) cycles
Second field: 432 (18x24) pixels / (550) cycles
*Processing capability
27MHz operation: Processing the search range over horizontal ±7.5 and
vertical ±7.5 for the ITU-R 601 image size is possible.
Frame Dual Prime Mode
*Block size: 16x8
*Search range: Second field: horizontal ±0.5 pixel, vertical ±0.5 pixel
*Search method: Take the average with the first field data (half-pel is generated as needed)
and estimate 9 points over the second field (including the specified point).
Not only the minimum evaluation value but also the values of all 9 points are stored
and output.
*Execution cycle: (550) cycles / MB. The execution cycle mentioned here refers to the throw-in period
(input interval) of an MB, not the processing time of one MB.
*Search window image inputs
First field: 288 (18x16) pixels / (550) cycles
Second field: 288 (18x16) pixels / (550) cycles
*Processing capability
27MHz operation: Processing the search range over horizontal ±7.5 and
vertical ±7.5 for the ITU-R 601 image size is possible.
***:The Frame Dual-Prime Mode supports a part of Dual-Prime prediction specified
*Highly efficient parallel architecture for high speed processing and data transfer, which
eliminates I/O bottleneck.
*Supports prediction modes for MPEG2, Field prediction, Frame prediction, Field Dual-Prime
prediction and Frame Dual-Prime prediction.
*For Field Mode and Frame Mode, it is possible to do simultaneous vector search over 16x16 and two
16x8 blocks.
*Low cost image compression hardware can be implemented using DRAM frame memory with a 32
bit DRAM interface.
*Estimates half-pel precision vectors within the chip.
*The exhaustive search method is used over the integer-pel precision vectors in a search range.
The evaluation function is the full sampled mean absolute error.
*The M65727's scalable architecture allows a wider search range with a multiple-chip configuration.
When expanding the horizontal search range, common search window image data can be supplied
to all chips.
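The exhaustive search named in the feature list can be sketched in software as follows. This is an illustrative model only, not the chip's implementation: it assumes the evaluation value is the un-normalised sum of absolute differences, reports vectors relative to the centre of the supplied window, and does not model the chip's priority rule for vectors with equal distortion. The names sad and full_search are hypothetical.

```python
def sad(template, window, dx, dy, block_w=16, block_h=16):
    """Sum of absolute differences between the template and the window
    region whose top-left corner is at column dx, row dy."""
    total = 0
    for y in range(block_h):
        for x in range(block_w):
            total += abs(template[y][x] - window[dy + y][dx + x])
    return total

def full_search(template, window, block_w=16, block_h=16):
    """Evaluate every integer-pel position inside the window.
    Returns ((vx, vy), distortion), with the vector measured from the
    centre position of the window (an assumption of this sketch)."""
    range_x = len(window[0]) - block_w
    range_y = len(window) - block_h
    cx, cy = range_x // 2, range_y // 2
    best = None
    for dy in range(range_y + 1):
        for dx in range(range_x + 1):
            d = sad(template, window, dx, dy, block_w, block_h)
            # Strict comparison: the first position found wins a tie.
            if best is None or d < best[1]:
                best = ((dx - cx, dy - cy), d)
    return best
```

Every candidate position is evaluated with all pixels of the block ("full sampled"), so the cost grows with the area of the search range. This is why the chip evaluates candidates in parallel processing elements.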
This is the output enable pin. It controls the tri-state of the DOUT port.
2.2 Explanation of Pins
Functions and uses of M65727 pins are explained below. Refer to "2.1 List of Pins" for the bit
configuration of terminals and I/O attributes.
The term "execution cycle" used in this explanation refers to a period of 550 / 806 clock cycles. Within one
execution cycle, vectors can be detected over a horizontal search range of -7 (-8) to +7 at integer precision.
When the horizontal search range is ±15 or greater, the integer precision operation requires
multiple execution cycles.
2.2.1 Data I/O Ports
DSWI This is the 32 bit wide search window image data input port. The search
window image input is processed in parallel with the arithmetic operation.
Therefore, the data input will be used in the next execution cycle.
DMBI This is an 8 bit wide template MB input port. The template MB input is
processed in parallel with the arithmetic operation. Therefore, the data input
will be used in the next execution cycle.
DOUT This is an 8 bit wide output port. During the field or frame mode, on receiving the output
request, OREQC, it outputs the following information in the following order:
horizontal motion vector, vertical motion vector, minimum distortion,
distortion of vector (0,0), half-pel indication code.
During the field dual-prime mode, the M65727 outputs the minimum distortion and the
dmv indication code. During the frame dual-prime mode, it outputs the minimum
distortion, the dmv indication code and the distortions corresponding to all estimation
points.
2.2.2 System Control Pins
CLKI Clock input.
RESETC Reset pin. Hardware reset. Asserted low. Not all registers are reset by
RESETC. The M65727 requires a reset before normal operation.
CEC Asserted low. This pin enables the input clock. This signal is sampled at the rising
edge of CLKI. The next clock cycle is valid when this signal is asserted. An
invalid clock cycle is called a "wait cycle". The chip is designed with static CMOS
circuits, so the internal data is not destroyed during wait cycles.
DENSWC This pin enables the DSWI port. This signal is asserted low. Data is not accepted
during inactive cycles.
DENMBC This pin enables the DMBI port. This signal is asserted low. Data is not accepted
during inactive cycles.
SSYNC This is a sync signal for the DSWI port. It is asserted low. This signal must be
asserted when the leading data for DSWI is input.
MSYNC This is a sync signal for the DMBI port. It is asserted low. This signal must be
asserted when the leading data for DMBI is input.
ESYNC This is a sync signal for the block level pipeline. When this signal is asserted,
one execution cycle (550 / 806 cycles) is activated. It is asserted low.
DSYNC This is a sync signal for the DCNT port. It must be asserted when a dynamic
control signal is input. See DCNT for the content of the dynamic control
signal. This signal is asserted low.
OREQC This is used to request output. The M65727 starts output from the DOUT port after this
signal is asserted. This signal is asserted low.
2.2.4 Pins Specifying Operational Modes
MODE This pin sets the mode of the M65727. The following four modes can be
specified.
00: Field mode, 01: Frame mode,
10: Field Dual-Prime mode, 11: Frame Dual-Prime mode
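HSIZE This pin specifies the horizontal search range. One of ±7.5, ±15.5, ±31.5, ±63.5 and ±127.5
can be specified (see 3.3.2).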
VSIZE This pin specifies the vertical search range. The following two types of range
can be specified. 0: ±7.5, 1: ±15.5
EXTND This pin specifies the vertical search range expansion. When an expansion mode
is selected, the vertical search range is set to ±8.0 / ±16.0. It is possible to
expand the vertical search range using multiple chips. The order of priority among
vectors with the same distortion differs depending on the mode.
00: non-expansion, 01: reserved,
10: upper-range of expansion, 11: lower-range of expansion
HLFPL This switches between half-pel precision search mode and integer-pel precision
search mode.
FMFMT Switches between the external memory (SW image) formats.
0: Field format, 1: Frame format
DCNT This is a dynamic control input. (Dynamic control means control which
differs in each execution cycle.) The following items are input:
Validity control (valid or invalid) for the search range (SKILL)
Leading pixel location of the search window image in the vertical direction in the
dual-prime mode (DVSPO)
The central position of the search window image used for the dual-prime mode
(DCNTR)
M65727 is configured as shown in Fig. 3.11. Its main components are the Input Unit, the Integer-pel
Unit, the Motion detection Unit, the Half-pel / Dual-Prime Unit, and the Output Unit.
When the Field/Frame mode is selected, the search window image data and template MB data are input to the
Input Unit from their respective input ports. After its order is changed, the data is used as the source
data of the Integer-pel Unit. The Integer-pel Unit calculates the mean absolute error every cycle and gives
the result to the Motion detection Unit, which estimates the best integer-pel precision motion
vectors. The Half-pel Unit then estimates the best half-pel precision motion vectors, and the results
are output from the Output Unit.
When the Field/Frame Dual-Prime mode is selected, the template MB data and the search window image
data of the first field and the second field are sent to the Input Unit from their respective input ports
and become the source data of the Dual-Prime Unit. The motion vector estimation is then conducted at
the Dual-Prime Unit and the results are output from the Output Unit.
The functions of each unit are outlined below.
3.2 Block Configuration
3.2.1 Input Unit
The function of the Input Unit is to output the search window image data and the template MB data to the
Integer-pel Unit or Dual-Prime Unit in the necessary sequence and with the necessary timing.
This block lets the user input the needed search window image data comparatively freely,
using the sync signal (SSYNC) and the data enable signal (DENSWC), without regard to the motion
estimation sequence. Similarly, the necessary template MB data can be input fairly freely using the sync
signal (MSYNC) and the data enable signal (DENMBC).
The search window image data is input from the highest line towards the lowest line, scanning left to
right. The output sequence, on the other hand, runs from the leftmost column to the rightmost column,
scanning each column top to bottom. For Dual-Prime, both the input and the output sequence of the search
window image data run from the highest line to the lowest line, scanning left to right. The search window
image data is input as 4 vertically continuous pixels using the 32 bit input port. The Input Unit outputs
one pixel per cycle by parallel-serial conversion.
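The reordering performed by the Input Unit in the Field/Frame mode can be pictured with the sketch below. It is a software model under stated assumptions: the most significant byte of each 32 bit word is taken to be the topmost of the 4 pixels, and the words are taken to arrive strip by strip (4 lines per strip), left to right within a strip. Neither detail is confirmed by this section, and the function names are hypothetical.

```python
def unpack_word(word):
    """Split a 32-bit word into 4 vertically adjacent 8-bit pixels.
    Assumption: the most significant byte is the topmost pixel."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]

def words_to_column_order(words, width, height):
    """Model of the parallel-serial conversion and order change.
    words: one word per column position, strips of 4 lines from top to
    bottom, each strip scanned left to right (assumed input order).
    Returns the pixels column by column, each column top to bottom."""
    if height % 4 != 0 or len(words) != width * height // 4:
        raise ValueError("word count does not match the window size")
    image = [[0] * width for _ in range(height)]
    for i, word in enumerate(words):
        strip, x = divmod(i, width)
        for k, pixel in enumerate(unpack_word(word)):
            image[4 * strip + k][x] = pixel
    # One pixel per cycle, leftmost column first.
    return [image[y][x] for x in range(width) for y in range(height)]
```

For the Dual-Prime modes the text states that the output order is the same line-by-line order as the input, so only the unpacking step would apply.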
3.2.2 Integer-pel Unit
The function of this block is to calculate the mean absolute error using the template MB data and the
search window image data coming from the Input Unit. The Integer-pel Unit is composed of
processing elements arranged in parallel, allowing high speed processing of the data to be evaluated.
Three sets of calculated mean absolute errors (corresponding to the 16x16 block and the two 16x8 blocks)
are given to the Motion detection Unit. Three sets of search window image data and the template MB
data that correspond to the three sets of vectors are transferred from the Integer-pel Unit to the Half-pel
Unit.
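One way to see how three evaluation values can be produced for a single displacement is sketched below for the Field mode split into upper and lower halves. It assumes the evaluation value is the un-normalised sum of absolute differences; whether the chip shares partial sums in this way is not stated in this section, and the function name is hypothetical.

```python
def three_distortions(template, candidate):
    """template, candidate: 16 rows of 16 pixels each.
    Returns (d_16x16, d_upper_16x8, d_lower_16x8) for one displacement."""
    row_sad = [sum(abs(t - c) for t, c in zip(trow, crow))
               for trow, crow in zip(template, candidate)]
    upper = sum(row_sad[:8])   # lines 0..7
    lower = sum(row_sad[8:])   # lines 8..15
    # The 16x16 value is the sum of the two 16x8 values.
    return upper + lower, upper, lower
```

Each of the three blocks then selects its own minimum independently, so the three best vectors may differ. Under the stated assumption the worst-case 16x16 value is 16 x 16 x 255 = 65280, which fits in the two bytes used for each evaluation value in the output data.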
3.2.3 Motion detection Unit
The function of this block is to select the best integer-pel precision motion vector by comparing
the 16-bit mean absolute errors coming from the Integer-pel Unit. In addition, this block stores the
distortion of vector (0,0). When multiple motion vectors have the same distortion, the most
suitable motion vector is determined according to the order of priority.
The output data depends on the mode. When the integer-pel precision search mode is specified during
the Field/Frame mode, the following items are output to the Output Unit: the three sets of best
integer-pel precision motion vectors, the distortion for each, and the distortion of vector (0,0) for each.
When the half-pel precision search mode is specified during the Field/Frame mode, the three sets of best
integer-pel precision motion vectors and the distortion of vector (0,0) for each are output to the Output Unit,
and the distortions for each integer-pel precision vector are output to the Half-pel Unit.
3.2.4 Half-Pel / Dual-Prime Unit
During the Field/Frame mode, this unit calculates the mean absolute errors for half-pel precision
vectors and detects the minimum distortions using the partial search window image data around the best
integer-pel precision vectors.
The search window image consists of 18x18 pixels and two sets of 18x10 pixels around the three sets of
integer-pel precision motion vectors detected at the Motion detection Unit. Eight kinds of interpolated
images are generated by the half-pel interpolation filter. These images are matched against the template
MB data given from the Integer-pel Unit, and the minimum distortions are detected from the
results.
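The half-pel refinement described above can be modelled as follows. This is a sketch, not the chip's filter: it assumes bilinear averaging with the rounding used for MPEG-2 half-sample prediction, takes the window to be the block plus a one-pixel border (18x18 for a 16x16 block), and lets the integer-pel point win ties. The function names are hypothetical.

```python
def half_pel_sample(win, x2, y2):
    """Sample the window at (x2 / 2, y2 / 2) by averaging the neighbouring
    integer pixels. Rounding follows the MPEG-2 convention (assumed here)."""
    x0, y0 = x2 // 2, y2 // 2
    x1, y1 = x0 + (x2 % 2), y0 + (y2 % 2)
    s = win[y0][x0] + win[y0][x1] + win[y1][x0] + win[y1][x1]
    return (s + 2) // 4

def half_pel_refine(template, win, size=16):
    """win is (size + 2) x (size + 2); the integer-pel match starts at (1, 1).
    Returns ((dx, dy) in pixels, distortion) for the best of the 9 points:
    the integer-pel point plus the 8 interpolated ones."""
    best = None
    for oy in (0, -1, 1):
        for ox in (0, -1, 1):
            d = 0
            for y in range(size):
                for x in range(size):
                    p = half_pel_sample(win, 2 * (x + 1) + ox, 2 * (y + 1) + oy)
                    d += abs(template[y][x] - p)
            # (0, 0) is evaluated first and keeps priority on ties (assumption).
            if best is None or d < best[1]:
                best = ((ox / 2, oy / 2), d)
    return best
```

The one-pixel border explains the window sizes quoted in the text: a 16x16 block needs 18x18 pixels and a 16x8 block needs 18x10 pixels to cover all eight half-pel neighbours.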
During the Field/Frame Dual-Prime mode, it detects the best dmv from the template MB data, the first
search window image data and the second search window image data.
The first and second search window images consist of 18x18 (18x10) pixels each. The interpolated
image from the first search window is generated according to the central position information
(a displacement in 0.5 pixel units of the 16x16 (16x8) pixels contained in the 18x18 (18x10) pixels). Similarly,
nine sets of interpolated images are generated from the second search
window through the interpolation filter. Then, nine averaged images of the first and the second images are obtained.
Next, block matching against the template MB data is conducted and the best dmv is obtained. In
the Frame Dual-Prime mode, not only the minimum evaluation value but also the evaluation
values for all displacements are output.
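The selection step of the Dual-Prime search can be sketched as below. The sketch assumes that the first-field prediction and the nine second-field candidates have already been interpolated, and that the two predictions are averaged with upward rounding as in MPEG-2 dual-prime prediction. The names OFFSETS and dual_prime_select are hypothetical.

```python
# The nine displacement points, centre first, then row by row.
OFFSETS = [(0.0, 0.0), (-0.5, -0.5), (0.0, -0.5), (0.5, -0.5),
           (-0.5, 0.0), (0.5, 0.0), (-0.5, 0.5), (0.0, 0.5), (0.5, 0.5)]

def dual_prime_select(template, first_pred, second_candidates):
    """second_candidates maps each offset in OFFSETS to an interpolated
    block from the second search window.
    Returns (best_offset, best_distortion, all_distortions)."""
    distortions = {}
    for off in OFFSETS:
        cand = second_candidates[off]
        d = 0
        for trow, arow, brow in zip(template, first_pred, cand):
            for t, a, b in zip(trow, arow, brow):
                # Average the two field predictions (rounding is an assumption).
                d += abs(t - (a + b + 1) // 2)
        distortions[off] = d
    # min() returns the first minimum, so the centre point wins ties here.
    best = min(OFFSETS, key=lambda off: distortions[off])
    return best, distortions[best], distortions
```

Returning the full dictionary mirrors the Frame Dual-Prime mode, where all nine evaluation values are output in addition to the minimum.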
3.2.5 Output Unit
This is the interface circuit related to the Output Port, DOUT. Necessary data comes from the
Motion detection Unit and the Half-pel / Dual-Prime Unit and is output through DOUT in sync with the
output request signal, OREQC.
M65727 has the following operational modes which can be switched externally. This chapter
outlines the operational modes of M65727.
3.3.1 Field / Frame / Field Dual-Prime / Frame Dual-Prime
M65727 is capable of 4 modes, namely Field mode / Frame mode / Field Dual-Prime mode / Frame
Dual-Prime mode, to work with the prediction mode of MPEG2. These modes are specified as shown
below using MODE pins. They must be specified before the chip operation and fixed during operation.
00: Field mode
01: Frame mode
10: Field Dual-Prime mode
11: Frame Dual-Prime mode
Field mode simultaneously detects three sets of motion vectors, corresponding to the 16x16 block, the 16x8
(upper) block, and the 16x8 (lower) block.
Frame mode simultaneously detects three sets of motion vectors, corresponding to the 16x16 block, the 16x8
(top) block, and the 16x8 (bottom) block.
Field Dual-Prime and Frame Dual-Prime modes detect the dmv.
3.3.2 The Search Range in Horizontal Direction
The search range in the horizontal direction can be selected from ±7.5, ±15.5, ±31.5, ±63.5 and
±127.5. The number of cycles needed for the motion vector estimation increases in proportion to the
size of the horizontal search range. When ±15.5 or larger is specified as the horizontal search range,
multiple chips are expected to be used in an interleaved manner. Refer to Chapter 5.5 for the number
of chips needed when the horizontal search range is ±15.5 or more. The horizontal search range is
specified as shown below using HSIZE pins. The horizontal search range must be specified before the
chip goes into operation and should be fixed during the operation.
3.3.3 The Search Range in Vertical Direction
The vertical search range can be selected from ±7.5 and ±15.5. This range determines the minimum
execution cycle (550 / 806). In the vertical expansion mode, the search range will be ±8.0 / ±16.0.
See Chapter 5.5 for the number of chips needed for the vertical expansion. The vertical search range is
specified as shown below using VSIZE pin. This must be done before the chip operation and it should
be fixed during the operation.
0: ±7.5, 1: ±15.5
The vertical search range of ±7.5 can be used only under 27MHz operation.
3.3.4 Search range expansion for Vertical direction
In the non-expansion mode, the vertical search range is ±7.5 / ±15.5 as explained above. In this mode,
among vectors which have the same distortion, the highest priority is given to the vector
(0,0). When the search range expansion mode of the vertical direction is specified, the range becomes
±8.0 / ±16.0, and it becomes possible to expand the vertical search range using multiple chips. There are
two vertical expansion modes, upper-range of expansion and lower-range of expansion. The
difference between the two is in the priority order for vectors with equal distortion. If upper-range
is specified, the vector (0, +8 / +16) has the highest priority. If lower-range is specified, the
vector (0, -8 / -16) has the highest priority. See Ch. 5.2 for the order of priority for vectors.
See Ch. 5.5 for the number of chips needed for the vertical search range expansion.
This mode is specified by EXTND pin as shown below. This mode must be specified before the
chip goes into operation and it should be fixed during the operation.
00: non-expansion, 01: reserved, 10: upper-range of expansion (Expansion),
11: lower-range of expansion (Expansion)
3.3.5 Half-Pel Precision/Integer-Pel Precision
The motion vector searched by M65727 is integer-pel precision during the integer-pel precision mode.
In case of half-pel precision mode, half-pel precision vector is detected using interpolated search
window image. The order of output data is shown in Table 2 (See Ch. 3.3.7). The above two modes
have different outputs.
This mode is specified by HLFPL pin as shown below. This mode should be specified before the
chip operation and should stay fixed.
M65727 is capable of selecting the external frame memory (SW image) format only when the Frame
mode is ON. This format is specified by the use of FMFMT pin as shown below. In other modes,
only field format can be used. See Ch. 5.4 for details on formats. This mode must be specified prior
to the chip operation and should be fixed during the operation.
0: Field format 1: Frame format
3.3.7 Operation Modes and Output Data
Data output from M65727 differs according to the operational modes. During the Field / Frame
mode, three data groups, one for the 16x16 block and two for the 16x8 blocks, are output in sequence.
The outputs of the half-pel precision search mode and the integer-pel precision search mode are different.
When the Field mode is ON, the data group for the 16x16 block is output first, the data group for the 16x8
(upper) block is output next, and the data group for the 16x8 (lower) block is output last. It takes 21 cycles.
When the Frame mode is ON, the data group for the 16x16 block is output first, the data group for the 16x8
(top) block is output next, and the data group for the 16x8 (bottom) block is output last. It takes 21 cycles.
Table 2 shows a set of data outputs during the Field/Frame mode in the order they are output.
In the Field Dual-Prime mode, the Dual-Prime vector specifying code and its distortion are output in
the order shown in Table 3. It takes 3 cycles to output data.
When the Frame Dual-Prime mode is used, the Dual-Prime vector specifying code, its distortion, and
the distortions for each displacement point are output in the order shown in Table 4. It takes 21 cycles to
output data.
The integer-pel precision motion vector is output as a motion vector corresponding to the 16x16 block even
for a 16x8 block. Therefore, during the Frame Estimation Mode, the vertical component of the motion
vector for the 16x8 block must be converted externally.
When multiple chips are used to expand the vertical search range, the vertical components of the
motion vectors must be converted for all blocks.
Fig. 3.3.7-1 shows the correspondence between 16x8 blocks and vectors.
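Output sequence   Integer-pel precision search mode        Half-pel precision search mode
1                 Motion vector, horizontal component      Motion vector, horizontal component
2                 Motion vector, vertical component        Motion vector, vertical component
3                 Minimum evaluation value (upper 8bits)   Minimum evaluation value (upper 8bits)
4                 Minimum evaluation value (lower 8bits)   Minimum evaluation value (lower 8bits)
5                 (0, 0) evaluation value (upper 8bits)    (0, 0) evaluation value (upper 8bits)
6                 (0, 0) evaluation value (lower 8bits)    (0, 0) evaluation value (lower 8bits)
7                 all 0 (L) output                         Half-pel indication code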
Note 1: The motion vector is a binary number in 2's complement. It is output after it is expanded to
8 bits.
Note 2: The upper 8 bits of the evaluation value are output first and the lower 8 bits are output next, in
natural binary.
Note 3: The half-pel indication code is specified by the lower 4 bits as shown below. The upper 4 bits
are for L output.
0000: Most suitable for integer-pel precision motion vector (0.0, 0.0)
1010: Upper-left direction Half-pel of integer-pel precision vector (-0.5, -0.5)
1001: Upper-right direction Half-pel of integer-pel precision vector (+0.5, -0.5)
0110: Lower-left direction Half-pel of integer-pel precision vector (-0.5, +0.5)
0101: Lower-right direction Half-pel of integer-pel precision vector (+0.5, +0.5)
0010: Left direction Half-pel of integer-pel precision vector (-0.5, +0.0)
0001: Right direction Half-pel of integer-pel precision vector (+0.5, +0.0)
1000: Upper direction Half-pel of integer-pel precision vector (+0.0, -0.5)
0100: Lower direction Half-pel of integer-pel precision vector (+0.0, +0.5)
Note 4: The (0,0) evaluation value is the evaluation value corresponding to no motion. When
the upper range of expansion is specified, the evaluation point of (X, Y) = (0, +8 / +16) is
used as this position; when the lower range of expansion is specified, the evaluation point of
(X, Y) = (0, -8 / -16) is used as this position.
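A host-side decoder for the 21 bytes read from DOUT in the Field/Frame mode could look like the sketch below. It follows the byte order of Table 2 and Notes 1 to 3, and assumes that the two motion vector bytes carry the integer-pel vector and that the half-pel indication code is an offset to be added to it. The function and key names are hypothetical.

```python
# Lower 4 bits of the half-pel indication code -> (dx, dy) in pixels (Note 3).
HALF_PEL_OFFSETS = {
    0b0000: (0.0, 0.0), 0b1010: (-0.5, -0.5), 0b1001: (0.5, -0.5),
    0b0110: (-0.5, 0.5), 0b0101: (0.5, 0.5), 0b0010: (-0.5, 0.0),
    0b0001: (0.5, 0.0), 0b1000: (0.0, -0.5), 0b0100: (0.0, 0.5),
}

def to_signed8(byte):
    """Interpret an 8-bit value as a 2's complement number (Note 1)."""
    return byte - 256 if byte >= 128 else byte

def decode_group(seven_bytes):
    """Decode one 7-byte data group (one block) in Table 2 order."""
    mvx, mvy, min_hi, min_lo, zero_hi, zero_lo, code = seven_bytes
    hx, hy = HALF_PEL_OFFSETS[code & 0x0F]
    return {
        # Assumption: the half-pel offset is added to the integer-pel vector.
        "vector": (to_signed8(mvx) + hx, to_signed8(mvy) + hy),
        "min_distortion": (min_hi << 8) | min_lo,
        "zero_distortion": (zero_hi << 8) | zero_lo,
    }

def decode_field_frame_output(data):
    """data: the 21 bytes output for the 16x16 block and the two 16x8 blocks."""
    if len(data) != 21:
        raise ValueError("expected 21 output bytes")
    return [decode_group(data[i:i + 7]) for i in range(0, 21, 7)]
```

In the integer-pel precision search mode the seventh byte is all 0, so the same decoder returns the integer-pel vector unchanged. The conversion of the vertical component for 16x8 blocks described in 3.3.7 is not modelled here.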
Table 3 Relationship between Field Dual-Prime Estimation Mode and Its Output Data
Output sequence   Output data
1                 Minimum evaluation value (Upper 8bits)
2                 Minimum evaluation value (Lower 8bits)
3                 dmv indication code
Table 4 Relationship between Frame Dual-Prime Estimation Mode and Its Output Data
Output sequence   Output data
1                 Minimum evaluation value (Upper)
2                 Minimum evaluation value (Lower)
3                 dmv indication code
4                 Center evaluation value (Upper)
5                 Center evaluation value (Lower)
6                 Left upper evaluation value (Upper)
7                 Left upper evaluation value (Lower)
8                 Upper evaluation value (Upper)
9                 Upper evaluation value (Lower)
10                Right upper evaluation value (Upper)
11                Right upper evaluation value (Lower)
12                Left evaluation value (Upper)
13                Left evaluation value (Lower)
14                Right evaluation value (Upper)
15                Right evaluation value (Lower)
16                Left lower evaluation value (Upper)
17                Left lower evaluation value (Lower)
18                Lower evaluation value (Upper)
19                Lower evaluation value (Lower)
20                Lower right evaluation value (Upper)
21                Lower right evaluation value (Lower)
Note 1: The evaluated values are output in natural binary. The upper 8 bits are
output first and the lower 8 bits are output next.
Note 2: The dmv indication code is specified using the lower 4 bits as shown below. The upper 4
bits are for L output.
0000: The center point vector is optimum (+0.0, +0.0)
1010: Upper left from the center point vector (-0.5, -0.5)
1001: Upper right from the center point vector (+0.5, -0.5)
0110: Lower left from the center point vector (-0.5, +0.5)
0101: Lower right from the center point vector (+0.5, +0.5)
0010: Left of the center point vector (-0.5, +0.0)
0001: Right of the center point vector (+0.5, +0.0)
1000: Upper direction from the center point vector (+0.0, -0.5)
0100: Lower direction from the center point vector (+0.0, +0.5)
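The 21-byte output of the Frame Dual-Prime mode can be unpacked on the host side as in the sketch below, which follows the order of Table 4 and the notes above. The point names and the function name are hypothetical labels chosen for this example.

```python
# Evaluation points in Table 4 order (output sequence 4 to 21, two bytes each).
POINTS = ["center", "upper_left", "upper", "upper_right",
          "left", "right", "lower_left", "lower", "lower_right"]

def decode_frame_dual_prime(data):
    """data: the 21 bytes read from DOUT in the Frame Dual-Prime mode.
    Returns (minimum, dmv_code, values) where values maps each point
    name to its 16-bit evaluation value."""
    if len(data) != 21:
        raise ValueError("expected 21 output bytes")
    minimum = (data[0] << 8) | data[1]      # upper byte first (Note 1)
    dmv_code = data[2] & 0x0F               # lower 4 bits only (Note 2)
    values = {}
    for i, name in enumerate(POINTS):
        hi, lo = data[3 + 2 * i], data[4 + 2 * i]
        values[name] = (hi << 8) | lo
    return minimum, dmv_code, values
```

The Field Dual-Prime output of Table 3 corresponds to the first three bytes only, so the first two assignments in the function cover that case as well.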
3.3.8 Operational Modes and Dynamic Control Signals (for each processing cycle)
M65727 has controls which need to change every execution cycle. These controls differ according
to operational modes as shown below. They are input to the chip through DCNT pins when DSYNC is
asserted. One assertion is needed for each item of information written into the chip. Therefore, when a mode
needs multiple items of control information, DSYNC must be asserted multiple times. DSYNC is asserted
low.