This document contains proprietary information of LSI Corporation. The information contained herein is not to be used by or disclosed to third parties without the
express written permission of an officer of LSI Corporation.
Document DB14-000045-00, Final Revision F (May, 1998)
This document describes revisions D through F of LSI Logic Corporation’s
L64005 MPEG-2 Audio/Video Decoder and will remain the official reference
source for all revisions/releases of this product until rescinded by an update.
To receive product literature, call us at 1.800.574.4286 (U.S. and Canada);
+32.11.300.531 (Europe); 408.433.7700 (outside U.S., Canada, and Europe)
and ask for Department JDS; or visit us at http://www.lsilogic.com.
LSI Logic Corporation reserves the right to make changes to any products herein
at any time without notice. LSI Logic does not assume any responsibility or liability arising out of the application or use of any product described herein, except
as expressly agreed to in writing by LSI Logic; nor does the purchase or use of
a product from LSI Logic convey a license under any patent rights, copyrights,
trademark rights, or any other of the intellectual property rights of LSI Logic or
third parties.
In particular, supply of the LSI Logic IC L64005 does not convey a license or
imply a right under certain patents and/or other industrial or intellectual property
rights claimed by IRT, CCETT, and Philips, to use this IC in any ready-to-use electronic product. The purchaser is hereby notified that Philips, CCETT, and IRT are
of the opinion that a generally available patent license for such use is required
from them. No warranty or indemnity of any sort is provided by LSI Logic regarding patent infringement.
The LSI Logic logo design is a registered trademark of LSI Logic Corporation. All
other brand and product names may be trademarks of their respective companies.
This book is the primary reference and technical manual for the L64005
MPEG-2 Audio/Video Decoder. It contains a complete functional description
as well as full physical and electrical specifications for the L64005.
Audience
This document assumes that you have some familiarity with microprocessors
and related support devices. The people who benefit from this book are:
♦ Engineers and managers who are evaluating the processor for pos-
sible use in a system
♦ Engineers who are designing the processor into a system
Organization
This document has the following chapters:
♦ Chapter 1 Introduction, describes the system interface and the
architecture of the L64005 MPEG-2 Audio/Video Decoder.
♦ Chapter 2 Registers, discusses the L64005 internal registers. It also
provides a description of the internal memory mapping and how the
registers are accessed from the system interface. This chapter is
intended primarily for system programmers who are developing software drivers.
♦ Chapter 3 Signals, provides detailed information on the L64005 sig-
nals. The signal descriptions are useful for hardware designers who
are interfacing the L64005 with other devices.
♦ Chapter 4 Video Data Flow, describes MPEG bitstream construction,
parsing, and error handling, as well as the operation of the channel buffer.
♦ A level-significant signal that is true or valid when the signal is LOW
always has an overbar over its name.
♦ An edge-significant signal that initiates actions on a HIGH-to-LOW
transition always has an overbar over its name.
The word assert means to drive a signal true or active. The word deassert
means to drive a signal false or inactive.
Hexadecimal numbers are indicated by the prefix “0x” before the number—for example, 0x32CF. Binary numbers are indicated by a subscripted “2” following the number—for example, 0011.0010.1100.1111₂.

Revision History
This section lists the changes in this document from initial release to the
current version.
Version        Release Date       Comments
L64005.ADV.0   March 4, 1996      Initial release.
L64005.ADV.1   August 23, 1996    Major modifications to most chapters.
                                  Changed register map, pinout, and signal
                                  descriptions. Added Section 6.3, “Reduced
                                  Memory Mode,” Section 3.6, “PLL Interface,”
                                  and Section 5.5, “Channel Buffer
                                  Architecture.”
L64005.Final   May 11, 1998       Minor changes to most chapters. Added
                                  corrections from document review and
                                  relevant items from L64005 Rev. E and F
                                  ECNs.
Notice for L64002 Users
This section is for customers who are using the L64002 and want to upgrade
to the L64005. The following is a brief description of the pertinent
changes, with emphasis on pinout and necessary software changes.
Please note: LSI Logic recommends building new boards to
ensure L64005-to-L64002 compatibility. A simple 0 Ω resistor jumper (for pin 69) allows switching between the loop filter and the CAS signal.
Pinout Changes
If the L64005 is used with fast page mode DRAM, then a few changes
are needed. For further information, please refer to Chapter 9, Specifications.
♦ Pin 64 is CAS for the L64005, not BA9 (BA9 has been removed).
♦ For Rev. E and F devices, Pin 69 is now not connected (NC) and no
external loop filter is required. The filter may be left in place on any
board that already has it designed in.
For the L64005 Rev. D, Pin 69 is LP2. Regardless of the DRAM
mode used, an external loop filter must be included in the design
(requires one resistor and two capacitors for an off-chip loop filter).
♦ The DRAM interface now supports both regular and synchronous
DRAM modes. See Section 5.3.2, “Synchronous DRAM Mode,” for
more information on the SDRAM interface.
♦ New AC timing specifications and drawings have been added to
Section 9.1.
♦ Pin 68 is Analog VDD (AVDD), and pin 70 is Analog GND (AGND).
These pins must be isolated from other VDD and VSS pins.
♦ Please note that the L64005 has an on-chip PLL, so the 27-MHz
input clock must have low jitter (< 300 ps).
♦ The duty cycle for SYSCLK has been specified slightly differently.
Please refer to Chapter 9, Specifications, for details.
Software Changes
A few changes must be made to L64005 supporting software. The list below
summarizes the changes; a brief register-access sketch follows the list.
♦ Bit 0 in Group 7, Register 27 must be set for reduced memory mode
(1 = RMM, 0 = normal).
♦ If reduced memory mode is used, Group 7, Register 27, Bits [7:2]
must be set to determine the number of 8-line segments used for a
B-frame decode.
♦ Bits [4:3] of Group 7, Register 1 are no longer used for PMCT (1CAS
enable) or 512-page size select. In the L64005, bits [4:3] are used
to select the DRAM mode. Refer to Chapter 2 for more details.
♦ In the L64005, 32-bit mode is not supported. Bit 5 of Group 7, Register 1 is now reserved.
♦ Bit 6 of Group 7, Register 26 controls line doubling for the interlaced
display mode.
♦ In the L64005, bits [7:0] in Group 7, Register 28 contains the hori-
♦ Additional field status bits have been added to the register map. Odd
Field First and Last Active Field have been added to Group 6, Register 31, Bits [3:2]. Refer to Section 2.8.18, “Group 6 Display Mode
1” for more details.
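The fragment below illustrates how the register changes above might be applied from driver code. It is a sketch only: the read_reg() and write_reg() helpers are hypothetical stand-ins for whatever register-access mechanism a particular host system provides; the bit positions are those listed above.

    /* Hypothetical register-access helpers; the actual mechanism depends
     * on how the L64005 registers are mapped into the host system. */
    extern unsigned char read_reg(int group, int reg);
    extern void write_reg(int group, int reg, unsigned char value);

    /* Enable reduced memory mode: Group 7, Register 27, Bit 0 (1 = RMM),
     * with Bits [7:2] giving the number of 8-line segments used for a
     * B-frame decode. */
    void enable_rmm(unsigned char segments)
    {
        unsigned char r = read_reg(7, 27);
        r |= 0x01;                                        /* Bit 0: 1 = RMM */
        r = (unsigned char)((r & 0x03) | (segments << 2)); /* Bits [7:2] */
        write_reg(7, 27, r);
    }

    /* Select the DRAM mode: Group 7, Register 1, Bits [4:3]; the field
     * encoding is given in Chapter 2. */
    void set_dram_mode(unsigned char mode)
    {
        unsigned char r = read_reg(7, 1);
        r = (unsigned char)(r & ~0x18);        /* clear Bits [4:3] */
        r |= (unsigned char)((mode & 0x03) << 3);
        write_reg(7, 1, r);
    }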
Sections 1.1 through 1.4 explain in general terms the requirements of the
Moving Picture Experts Group MPEG-2 International Standard (IS)
13818 as applied to video compression and decompression. These sections provide a good foundation for the L64005-specific discussion that
follows in Sections 1.5 through 1.7.
1.1 Video Compression and Decompression Concepts
The MPEG standard defines a format for compressed digital video.
Encoders designed to work within the confines of the standard compress
video information, and decoders decompress it.
The MPEG algorithms for video compression and decompression are
flexible, but generally fit the following criteria:
♦ Data rates are about 1 to 1.5 Mbit/s for MPEG-1 and up to 15 Mbit/s
for MPEG-2. The L64005 MPEG-2 decoder’s channel interface is
capable of supporting a 20 Mbit/s serial data rate or a 40 Mbit/s parallel data rate.
♦ Resolutions are about 352 pixels horizontally by about 288 lines
vertically for MPEG-1 and 720 x 576 for MPEG-2 (main profile/main
level). The L64005 is capable of resolutions up to 720 x 576 for either
MPEG-1 or MPEG-2.
♦ Display frame rates range from 24 to 30 frames per second.
1.1.1 Video Encoding
For a video signal to be compressed, it must be sampled, digitized, and
converted to luminance and color difference signals (Y, Cr, Cb). The
MPEG standard stipulates that the luminance component (Y) be sampled
with respect to the color difference signals (Cr and Cb) by a ratio of 4:1.
That is, for every four samples of Y, there is to be one sub-sample each
of Cr and Cb, because the human eye is much more sensitive to luminance (brightness) components than to color components. Video sampling takes place in both the vertical and horizontal directions. Once
video is sampled, it is reformatted, if necessary, into a non-interlaced signal. An interlaced signal contains only part of the picture content (every
other horizontal line, for example) for each complete display scan.
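To make the sampling arithmetic concrete, the following C sketch (illustrative only; the frame dimensions are the MPEG-2 main level maximum mentioned earlier) counts the samples the 4:1 scheme produces for one frame:

    /* Sample counts for 4:2:0 video: chrominance is subsampled 2:1 both
     * horizontally and vertically, so each chroma plane holds one quarter
     * as many samples as the luminance plane. */
    #include <stdio.h>

    int main(void)
    {
        long width = 720, height = 576;   /* MPEG-2 main profile/main level */
        long y  = width * height;         /* one Y sample per pixel */
        long cr = y / 4, cb = y / 4;      /* one Cr and one Cb per four pixels */
        printf("Y=%ld  Cr=%ld  Cb=%ld  total=%ld samples\n",
               y, cr, cb, y + cr + cb);
        return 0;
    }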
The encoder must also choose which picture type to use. A picture corresponds to a single frame of motion video, or to a movie frame. There
are three picture types:
♦ Intracoded pictures (I-pictures) are coded without reference to any
other pictures.
♦ Predictive-coded pictures (P-pictures) are coded using motion-compensated prediction from the past I or P reference pictures.
♦ Bidirectionally predictive-coded pictures (B-pictures) are coded using
motion compensation from a previous and a future I or P-picture.
A typical coding scheme contains a mixture of I, P, and B-pictures. Typically, an I-picture may occur every half second, to give reasonably fast
random access, with two B-pictures inserted between each pair of I- or
P-pictures.
Once the picture types have been defined, the encoder must estimate
motion vectors for each macroblock in the picture. A macroblock consists
of a 16-pixel by 16-line section of luminance component and two spatially
corresponding 8-pixel by 8-line sections, one for each chrominance component.
Motion vectors give the displacement from the stored previous picture.
P-pictures use motion compensation to exploit temporal redundancy in
the video. Motion within the pictures means that the pixels in the previous
picture will be in a different position from the pixels in the current block,
and the displacement is given by motion vectors encoded in the MPEG
bitstream. Motion vectors define the motion of a macroblock, which is the
motion of a 16 x 16 block of luminance pixels and the associated chrominance components.
When an encoder provides B-pictures, it must reorder the picture
sequence so that the decoder operates properly. Because B-pictures use
motion compensation based on previously sent I- or P- pictures, they can
only be decoded after the referenced pictures have been sent.
As mentioned earlier, a macroblock is a 16 x 16 region of video, corresponding to 16 horizontal pixels and 16 vertical display lines. When sampling a block, the video encoder captures the luminance component of
every pixel in the horizontal direction, and the luminance component of
every line in the vertical direction. However, the encoder captures only
every other Cb and Cr chrominance component in each direction. The result is
a 16 x 16 block of luminance components and one 8 x 8 block each of
Cr and Cb components. Each macroblock then consists of a total of six
8 x 8 blocks (four 8 x 8 luminance blocks, one 8 x 8 Cr block, and one
8 x 8 Cb block), as illustrated in Figure 1.1.
Figure 1.1 MPEG Macroblock Structure
(Four 8 x 8 luminance (Y) blocks, numbered 0 through 3, cover a 16 x 16
pixel area; block 4 is the 8 x 8 Cr block and block 5 is the 8 x 8 Cb block.)
It is important to note that the spatial picture area covered by the four
8 x 8 blocks of luminance is the same area covered by each of the 8 x 8
chrominance blocks. Because only half as many chrominance samples are
needed in each direction to cover the same area, they fit into an 8 x 8
block instead of a 16 x 16 block.
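Expressed as a data structure, a macroblock's six component blocks might be sketched in C as follows (an illustration of the layout only, not a representation of the L64005's internal storage):

    /* One MPEG macroblock: four 8 x 8 luminance blocks covering a
     * 16 x 16 pixel area, plus one 8 x 8 Cr block and one 8 x 8 Cb block
     * covering the same area at half resolution in each direction. */
    typedef struct {
        unsigned char y[4][8][8];   /* blocks 0-3: luminance */
        unsigned char cr[8][8];     /* block 4: Cr chrominance */
        unsigned char cb[8][8];     /* block 5: Cb chrominance */
    } Macroblock;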
For a given macroblock, the encoder must choose a coding mode. The
coding mode depends on the picture type, the effectiveness of motion
compensation in the particular region of the picture, and the nature of the
signal within the block. In addition, for MPEG-2 the encoder must choose
to code the macroblock as either a field or frame. After it selects the coding method, the encoder performs a motion-compensated prediction of
the block contents based on past and/or future reference pictures. The
encoder then produces an error signal by subtracting the prediction from
the actual data in the current macroblock. The error signal is separated
into 8 x 8 blocks (four luminance blocks and two chrominance blocks)
and a discrete cosine transform (DCT) is performed on each 8 x 8 block.
The DCT operation converts an 8 x 8 block of pixel values to an 8 x 8
matrix of horizontal and vertical spatial frequency coefficients. An 8 x 8
block of pixel values can be reconstructed by performing the inverse discrete cosine transform (IDCT) on the spatial frequency coefficients. In
general, most of the energy is concentrated in the low frequency coefficients, which are located in the upper left corner of the transformed
matrix. Compression is achieved in a quantization step, in which an index
identifies the quantization interval. Because the encoder identifies the
interval and not the exact value within the interval, the pixel values of the
block reconstructed by the IDCT have reduced accuracy.
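The loss of accuracy can be seen in a minimal sketch of uniform quantization (illustrative only; MPEG's actual quantization applies per-coefficient weighting matrices and a quantizer scale):

    /* Uniform quantization of one DCT coefficient (sketch). The encoder
     * transmits only the interval index, so the decoder's reconstruction
     * is accurate only to within one quantization step. */
    int quantize(int coeff, int step)   { return coeff / step; }
    int dequantize(int index, int step) { return index * step; }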
The DCT coefficient in the upper left location (0, 0) of the block represents the zero horizontal and zero vertical frequencies and is known as
the DC coefficient. The DC coefficient is proportional to the average pixel
value of the 8 x 8 block, and additional compression is provided through
predictive coding because the difference in the average value of neighboring 8 x 8 blocks tends to be relatively small. The other coefficients
represent one or more nonzero horizontal or nonzero vertical spatial frequencies, and are called AC coefficients.
For coefficients corresponding to the higher spatial frequencies, quantization favors the creation of zero-valued AC coefficients: the quantization
step size is chosen such that the human visual system is unlikely to perceive the loss of the particular spatial frequency, unless the coefficient
value lies above the particular quantization level. Statistical encoding of
the expected runs of consecutive zero-valued higher-order coefficients
accounts for additional coding gain.
To cluster nonzero coefficients early in the series and to encode as many
zero coefficients as possible following the last nonzero coefficient in the
ordering, the coefficient sequence is specified to be a zigzag ordering.
Zigzag ordering concentrates the highest spatial frequencies at the end
of the series. The MPEG-2 standard includes additional block scanning
orders.
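The zigzag ordering itself is easy to generate: diagonals of constant row + column are traversed in alternating directions. The C sketch below builds the conventional zigzag scan table; it is presented for illustration, not as the L64005's implementation:

    /* Build the zigzag scan order for an 8 x 8 block. Entry n of the
     * table is the raster index (row * 8 + column) of the n-th coefficient
     * in zigzag order; low spatial frequencies come first. */
    void build_zigzag(int order[64])
    {
        int n = 0;
        for (int s = 0; s <= 14; s++) {          /* s = row + column */
            if (s % 2 == 0)                      /* even diagonal: up and right */
                for (int i = (s < 8 ? s : 7); i >= 0 && s - i < 8; i--)
                    order[n++] = i * 8 + (s - i);
            else                                 /* odd diagonal: down and left */
                for (int j = (s < 8 ? s : 7); j >= 0 && s - j < 8; j--)
                    order[n++] = (s - j) * 8 + j;
        }
    }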
1.1.2 Bitstream Syntax
After block scanning has been performed, the encoder performs run-length coding on the AC coefficients. This process reduces each 8 x 8
block of DCT coefficients to a number of events, each represented by a nonzero coefficient and the number of preceding zero coefficients. Because
many coefficients are likely to be zero after quantization, run-length coding increases the overall compression ratio.
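In code, the reduction to (run, level) events can be sketched as follows (the principle only; the actual MPEG event alphabet and its treatment of end-of-block are defined by the standard):

    /* Reduce 64 zigzag-ordered quantized coefficients to (run, level)
     * events: each nonzero coefficient is paired with the count of zero
     * coefficients that precede it. */
    typedef struct { int run; int level; } RunLevel;

    int run_length_code(const int coeff[64], RunLevel out[64])
    {
        int events = 0, run = 0;
        for (int i = 0; i < 64; i++) {
            if (coeff[i] == 0) {
                run++;                     /* accumulate preceding zeros */
            } else {
                out[events].run = run;
                out[events].level = coeff[i];
                events++;
                run = 0;
            }
        }
        return events;  /* zeros after the last event are implied */
    }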
The encoder then performs variable-length coding (VLC) on the resulting
data. VLC is a reversible procedure for coding that assigns shorter codewords to frequent events and longer codewords to less frequent events,
thereby achieving video compression. Huffman encoding is a particularly
well-known form of VLC that reduces the number of bits necessary to
represent a data set without losing any information.
The final compressed video data is now ready for transmission to either
a local storage device from which a video decoder may later retrieve and
decompress the data, or to a remote video decoder via cable or direct
satellite broadcast, for example.
The MPEG standard specifies the syntax for a compressed bitstream.
The video syntax contains six layers, each of which supports either a signal processing or a system function. The layers and their functions are
described in Table 1.1.
Table 1.1 MPEG Compressed Bitstream Syntax

Syntax Layer              Function
Sequence Layer            Random Access Unit: Context
Group of Pictures Layer   Random Access Unit: Video
Picture Layer             Primary Coding Unit
Slice Layer               Resynchronization Unit
Macroblock Layer          Motion Compensation Unit
Block Layer               DCT Unit
The MPEG syntax layers correspond to a hierarchical structure. A
sequence is the top layer of the video coding hierarchy and consists of
a header and some number of groups-of-pictures (GOPs). The sequence
header initializes the state of the decoder, which allows the decoder to
decode any sequence without being affected by past decoding history.
A GOP is a random access point; that is, it is the smallest coding unit
that can be independently decoded within a sequence, and consists of a
header and some number of pictures. The GOP header contains time
and editing information.
The three types of pictures as explained earlier are:
♦ I-pictures
♦ P-pictures
♦ B-pictures
Note that because of the picture dependencies, the bitstream order (the
order in which pictures are transmitted, stored, or retrieved), is not the
display order, but rather the order in which the decoder requires the pictures for decoding the bitstream. For example, a typical sequence of pictures, in display order, might be as shown in Figure 1.2.
Figure 1.2 Typical Sequence of Frames in Display Order

I0 B1 B2 P3 B4 B5 P6 B7 B8 P9 B10 B11 I12 B13 B14 P15 B16 B17 P18
In contrast, the bitstream order corresponding to the given display order
would be as shown in Figure 1.3.
Figure 1.3 Typical Sequence of Frames in Bitstream Order

I0 P3 B1 B2 P6 B4 B5 P9 B7 B8 I12 B10 B11 P15 B13 B14 P18 B16 B17
Because the B-pictures depend on the subsequent I- or P-picture in display order, the I- or P-picture must be transmitted and decoded before
the dependent B-pictures.
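The reordering rule behind Figures 1.2 and 1.3 can be sketched in a few lines of C: each I- or P-picture is emitted before the B-pictures that precede it in display order. This is an illustration only; it assumes, as in Figure 1.2, that every run of B-pictures is followed by a reference picture.

    /* Convert display order to bitstream order (sketch). */
    #include <stdio.h>

    int main(void)
    {
        const char display[] = "IBBPBBPBBPBBIBBPBBP";  /* Figure 1.2 */
        int pending[32], np = 0;  /* display indices of buffered B-pictures */

        for (int i = 0; display[i] != '\0'; i++) {
            if (display[i] == 'B') {
                pending[np++] = i;                /* hold until references exist */
            } else {
                printf("%c%d ", display[i], i);   /* reference goes out first */
                for (int k = 0; k < np; k++)
                    printf("B%d ", pending[k]);   /* then the B-pictures it anchors */
                np = 0;
            }
        }
        printf("\n");   /* prints: I0 P3 B1 B2 P6 B4 B5 ... (Figure 1.3) */
        return 0;
    }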
Pictures consist of a header and one or more slices. The picture header
contains time, picture type, and coding information.
A slice provides some immunity to data errors. Should the bitstream
become unreadable within a picture, the decoder should be able to
recover by waiting for the next slice, without having to drop an entire
picture.
Slices consist of a header and one or more macroblocks. The slice
header contains position and quantizer scale information.
A macroblock is the basic unit for motion compensation and quantizer
scale changes. In MPEG-2 the block can be either field or frame coded.
Each macroblock consists of a header and six component 8 x 8 blocks:
four blocks of luminance, one block of Cb chrominance, and one block
of Cr chrominance. The macroblock header contains quantizer scale and
motion compensation information.
A macroblock contains a 16-pixel by 16-line section of luminance component and the spatially corresponding 8-pixel by 8-line section of each
chrominance component. A skipped macroblock is one for which no DCT
information is encoded.
Blocks are the basic coding unit, and the DCT is applied at this block
level. Each block contains 64 component pixels arranged in an 8 x 8
order. Note that pixel values are not individually coded, but are components of the coded block.
Note that the picture area covered by the four blocks of luminance is the
same as the area covered by each of the chrominance blocks. Each luminance pixel corresponds to one picture pixel, but because the chrominance information is subsampled with a 2:1 ratio both horizontally and
vertically (4:1 total), each chrominance pixel corresponds to four picture
pixels.
1.1.3 Video Decoding
Video decoding is the reverse of video encoding and is intended to
reconstruct a moving picture sequence from a compressed, encoded bitstream. Decoding is simpler than encoding because there is no motion
estimation performed and there are far fewer options.
The data in the bitstream is decoded according to the syntax defined in
the MPEG-2 standard. The decoder must first identify the beginning of a
coded picture and identify the type of picture, then decode each individual macroblock within a particular picture. Motion vectors and macroblock
types (each of the picture types I, P, and B has its own macroblock
types) present in the bitstream are used to construct a prediction of the
current macroblock based on past and future reference pictures that the
encoder has already stored. Coefficient data is then inverse quantized
and operated on by an inverse DCT process that changes data from the
frequency domain to the time and space domain.
After the decoder processes all of the macroblocks, the picture reconstruction is complete. If the picture just reconstructed is a reference picture (I-picture or P-picture), it replaces the oldest stored reference picture
and is used as the new reference for subsequent pictures. The pictures
may need to be reordered before they are displayed, in accordance with
the display order instead of the coding order. After the pictures are reordered, they may be displayed on an appropriate output device.
1.2 Audio Compression and Decompression Concepts
1.2.1 MPEG Audio Encoding
Given an audio stream of data (for audio data, this is called an
elementary stream), an MPEG encoder first digitally compresses and codes
the data. The MPEG algorithm offers a choice of levels of complexity and
performance for this process.
To prepare a stream of compressed audio data for transmission, it is formatted into audio frames. Each audio frame contains audio data, error-correction data, and optional user-defined ancillary data. The audio
frames are then sent in packets grouped within packs in an ISO MPEG
System Stream.
The packs in system streams may contain a mix of audio packets and
video packets for one or more channels. Packs may contain packets from
separate elementary streams. Thus, MPEG can easily support multiple
channels of program material, and a decoder given access to a system
stream may access large numbers of channels.
MPEG audio encoding is intended to efficiently represent a digitized
audio stream by removing redundant information. Because different
applications have different performance goals, MPEG uses different
encoding techniques. These techniques, called Layers, provide different
trade-offs between compression and signal quality. The MPEG algorithm
uses the following two processes for removing redundant audio
information:
♦ Coding and quantization
♦ Psychoacoustic modeling
Coding and quantization are techniques that are applied to data that has
been mapped into the frequency domain and filtered into subbands.
Psychoacoustic modeling is a technique that determines the best allocation of data within the available data channel bandwidth based on human
perception.
The general structure of an MPEG audio encoder is shown in Figure 1.4.
Figure 1.4 MPEG Audio Encoder Structure
(Digitized audio input passes through a frequency filter bank (mapping); a
psychoacoustic model drives the bit allocation processor (allocation among
subbands, coding, and quantizing); the output goes to a bitstream formatter.)
Once audio data has been coded, it may be stored or transmitted digitally. MPEG provides a framework for use of packet-oriented transmission of compressed data. In particular, ISO CD 11172 defines formats
for digital data streams for both video and audio. The ISO System
Stream format is designed to accommodate both audio packets and
video packets within the same framework for transmission. The data may
be physically delivered in parallel form or serial form. The System Stream
is composed of a sequence of packs, as shown in Figure 1.5.
Figure 1.5 ISO System Stream
(A system stream is a sequence of packs. Each pack begins with a pack
layer header, which contains a Pack Start Code (32 bits) and a System
Clock Reference (128 bits); next comes a system header packet, which
contains various data including the system stream ID; then one or more
packets (variable number), which contain stream data such as audio
frames; the pack ends with an ISO 11172 end code.)

An MPEG pack is composed of a pack layer header, a system header
packet, a sequence of packets, and an ISO 11172 end code.
The pack layer header contains a pack start code used for synchroniza-
tion purposes, and a system clock value. The system header packet contains a variety of housekeeping data and in particular contains a system
stream ID used to differentiate multiple system streams. A sequence of
one or more packets contains either encoded audio or encoded video
stream data. The ISO 11172 end code is the final element in an MPEG
pack. For a detailed definition of pack headers, refer to the ISO CD
11172-1 system stream descriptions.
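A decoder scanning a system stream keys off these 32-bit start codes. The values below are the ISO 11172-1 start codes; the classifier is a sketch of the idea, not the L64005's parser:

    #include <stdint.h>

    #define PACK_START_CODE     0x000001BAu   /* begins a pack layer header */
    #define SYSTEM_HEADER_CODE  0x000001BBu   /* begins a system header packet */
    #define ISO_11172_END_CODE  0x000001B9u   /* ISO 11172 end code */

    /* Classify a 32-bit start code read from the stream (sketch). */
    const char *classify_start_code(uint32_t code)
    {
        switch (code) {
        case PACK_START_CODE:    return "pack layer header";
        case SYSTEM_HEADER_CODE: return "system header packet";
        case ISO_11172_END_CODE: return "end code";
        default:                 return "packet or other start code";
        }
    }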
Any one MPEG packet carries either audio or video data, but not both
simultaneously. An MPEG Audio Packet contains an audio packet header
and one or more Audio Frames. Figure 1.6 shows the packet structure.
Figure 1.6 MPEG Audio Packet Structure
(An audio packet consists of an audio packet header, which contains a
Packet Start Code, Packet Length, and Presentation Time Stamps, followed
by one or more audio frames (quantity varies). Each audio frame contains
an audio frame header, audio frame CRC, audio data, and ancillary/user
data.)
1.2.1.1 Audio Packet Header
An audio packet header contains the following:
♦ Packet Start Code
Identifies this as an audio packet. The Packet Start Code also contains a five-bit audio stream identifier that may be read by the user
to identify the audio channel.
♦ Packet Length
Indicates the number of bytes remaining in the audio packet.
♦ Presentation Time Stamps (PTS)
The PTS indicates when audio data should be presented.
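The fixed part of this header can be sketched in C as follows. The sketch is based on the ISO 11172-1 packet layout: a 24-bit start code prefix (0x000001), a stream ID byte of the form 110xxxxx whose low five bits identify the audio channel, and a 16-bit packet length; the optional PTS fields that may follow are omitted here.

    #include <stdint.h>

    typedef struct {
        uint8_t  stream_num;   /* five-bit audio stream identifier */
        uint16_t length;       /* bytes remaining in the packet */
    } AudioPacketHdr;

    /* Parse the fixed part of an audio packet header (sketch);
     * returns 0 on success, -1 if this is not an audio packet. */
    int parse_audio_packet(const uint8_t *p, AudioPacketHdr *h)
    {
        if (p[0] != 0x00 || p[1] != 0x00 || p[2] != 0x01)
            return -1;                       /* no start code prefix */
        if ((p[3] & 0xE0) != 0xC0)
            return -1;                       /* not an audio stream ID (110xxxxx) */
        h->stream_num = (uint8_t)(p[3] & 0x1F);
        h->length = (uint16_t)((p[4] << 8) | p[5]);
        return 0;
    }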
1.2.1.2 Audio Frame
An Audio Frame contains a slice of the audio data stream together with
some supplementary data. Audio frames have the following elements: