Table 8-1-1 outlines comparisons between DV and other digital VCR formats.
D9 (Digital S)DVC ProDV CAMDV
MaterialMPME
Tape
Sampling frequency
(MHz)
Sampling rate
Width (inch)1/21/4
Track pitch (µm)20
Y13.5
R-Y/B-Y6.753.375
NTSC
PAL
4 : 2 : 2
181510
4 : 1 : 1
4 : 2 : 0 (Line Sequential)
Video
Audio
Error correction code
Modulation method
As shown in the above table, the major specifications of the DV
are almost identical to the D9 (Digital S).
Differences lie only in the tape used, sampling rate and number
of executed pixels.
Samples per line
TV lines per frame
Quantization bits
Compression method
Compression
Sampling frequency (kHz)
Quantization bits
Channels
Compression
Y720
R-Y/B-Y360180 : NTSC,360 : PAL
NTSC
PAL
Table 8-1-1 Comparison of Digital VCR Formats
482
578
1/3.3
48
16
4
480
576 : Y , 288 : R-Y/B-Y
8
In-frame DCT
1/5
48(32)
16(12)
2(4)
Non compressed
Reed-Solomon integration code
SI-NRZI , 24/25
(A) Video signal
High video quality thanks to the 4:1:1 (*4:2:0) component digital recording.
(Note) Descriptions marked * in the text and figures are PAL
data.
• Luminance signal
The Y signal is sampled at 13.5 MHz so the Y signal bandwidth
is about half the sampling frequency, or 6.75 MHz. This provides better resolution than S-VHS and current TV broadcasting
(Y bandwidth of about 4.5 MHz).
8-1
• Chrominance signal
By recording the chrominance signal in a form divided into color
difference signals R-Y and B-Y (component recording method),
recording with excellent color reproduction and little color blur
is achieved.
MHz
Y Signal
First effective line
in each field
Colour difference
Signal (CR, CB)
First effective line
in each field
1/13.5
MHz
1/3.375
First pixel in the effective period
With VHS, the chrominance signal has been converted into a
lower-frequency signal so that its bandwidth has been limited
to 0.5 MHz. The DV has expanded the bandwidth by about 3
times to 1.68 MHz (3.375MHz/2) and records the chrominance
components quite separately from the Y component. The sampling frequency of the R-Y and B-Y signals are 3.375 MHz (*6.75
MHz), which is 1/4 (*1/2) of the Y signal sampling frequency of
13.5 MHz. Based on this ratio, this recording method is referred
to as 4:1:1 (*4:2:0 line sequential) digital component recording.
Line
285
Line
23
Line
286
Line
24
Line
287
Line
25
Line
285
Line
23
Line
286
Line
24
Line
287
Line
25
720 Samples
180 Samples
4:1:1 Sampling
: Effective sample
: Interlaced sample
480 Samples
480 Samples
X2
Y Signal
First effective line
in each field
Colour difference
Signal (CR, CB)
First effective line
in each field
Fig. 8-1-1 4:1:1 Component Digital Recording
MHz
1/13.5
Line
335
Line
23
Line
336
Line
24
Line
337
Line
25
MHz
1/6.75
Line
335
Line
23
Line
336
Line
24
Line
337
Line
25
Line
338
Line
26
Line
339
Line
27
First pixel in the effective period
720 Samples
360 Samples
4:2:0 Sampling
: CR
: CB
576 Samples
288 Samples
X2
8-2
Fig. 8-1-2 *4:2:0 Component Digital Recording
(B) Audio signal
The DV format also records the audio digitally. Two recording
formats (48 kHz, 32 kHz) are provided to allow selection according to the purpose. Compatibility with 44.1 kHz sampling is also
provided for use in the reproduction of software tapes.
• 48 kHz, 16-bit high quality stereo mode
This recording mode provides high audio quality equivalent to
the DAT.
Although compression technology is used in video recording,
the audio recording does not use compression.
• 32 kHz, 12-bit mode with a stereo dubbing capability
This mode divides the audio area into two parts and records 12bit stereo audio channels to both of them, or a total of 4 channels. This makes it possible to record 2 audio channels at the
same time as performing video recording and to dub 2 additional audio channels while leaving the original audio channels.
(C) Other features
• Video signal output
The digital signal read out from a tape during playback is fed to
the image memory to arrange the time axis before being output
as the video signal. This makes it possible to reduce wow &
flutter, which may be caused by head rotation irregularities or
tape transport variations.
8.1.2 Tape Format of DV
Note) Descriptions marked * in the text and figures are PAL
data.
The DV records the video and audio signals independently, in
the video area and in the audio area respectively. With the video
signal, the data of each frame is divided into 10 tracks (*12 tracks)
before being recorded. Namely, the data of a frame is input to a
memory, divided into 10 (*12), and recorded into 10 tracks (*12
tracks) using a 2-channel head by rotating the drum by 5 turns
(*6 turns) in the period of one frame (33.3 ms/*40 ms) of a
normal NTSC (*PAL) signal. The sub-code area records data including the timecode, recording date, index and absolute track
number, and the ITI area records the reference signal of the
absolute height of track (SSA) ad tracking signals. The DV does
not use the tracking control track. The tracking system of the
DV is called the 3-frequency (F0, F1, F2) digital pilot system,
which records digital pilot signals F0 (0Hz), F1 (465 kHz) or
F2 (697.5 kHz) on every track during recording. When the F0
track is reproduced, the DV controls the tracking servo so that
the crosstalk levels of the pilot signals (F1, F2) in the tracks on
the left and right of the F0 track become identical. The pilot
signals are recorded over all the tracks.
Sub
• Timecode enabling editing
The DV tape has a sub-code area for recording signals for use in
editing. Timecodes are automatically recorded frame by frame
in this area so that the video can later be edited on a per-frame
basis. The sub-code area also includes the recording of index ID
signals, which can be used in search and other operations later.
• AUX data recording information on recording and shooting
The video signal recording area on the tape records the signal
called the AUX data together with the video signal. The AUX
data contains information on the recording date and shooting
conditions such as wide-screen shooting, and is recorded automatically during recording. Parts of the AUX data information
can be displayed on the screen as required. The AUX data on
the record mode, etc. is also recorded in the audio area in a
similar way to the video area.
• Digital interface for digital inputs/outputs (optional)
All track data can be input or output directly without altering the
digital format (IEEE1394 compliant).
DV terminals are provided to enable dubbing and editing using
digital signals.
Video
Head writing
Audio
ITI
F0F0F0F0F0F1F2F1F1F2
10 tracks/frame
Fig. 8-1-3 NTSC Format DV Tape
Sub
Video
Head writing
Audio
ITI
F0F0F0F0F0F1F2F1F1F2F0F2
12 tracks/frame
Tape travel
Tape travel
Fig. 8-1-4 PAL Format DV Tape
8-3
(A) Mini-DV cassette tape
The DV cassette uses metallic magnetic particles so that stable, high-power signals can be obtained from even very thin
tracks. Metal tapes include the metal particle coated (MP) tapes
and metal evaporated (ME) tapes. The Metal tape used with the
DV is the ME tape which features a thin magnetic layer and low
demagnetization.
• ID board and memory-in cassette (MIC)
The DV cassette has 4-contact terminals on the back label side
edge of the shell. These terminals are called the “ID board terminals”, and the values of resistance across the terminals express basic cassette ID (BCID) information such as the tape thickness, tape type and applications. The DV cassette can optionally incorporate a memory for storing the BCID as well as the
content information. In this case, these 4 terminals are used as
the serial communication terminals for the storage and readout
of cassette memory contents.
As described above, the ID board terminals are used in two
ways and the cassettes are available as those with or without
internal memory according to the usage. The GY-DV500 and
BR-DV600 are not avairable with this optional function, so information such as the content information cannot be written or
read even when a cassette with internal memory is used with
them.
All these models do through these terminals is read the BCID of
every cassette.
Upper shell
Top lid
ID board terminal
Terminal
No
1
2
3
4
1
2
3
4
thickness
Tape type
Tape grade
Front lid
Accidental erasure
protection hole
Accidental erasure
protection tab
Open
REC possible : Close
REC impossible: Open
Lid lock
Fig. 8-1-5 External View of Mini-DV Cassette
Contents
Tape
7µm
Reserved
ME
Reserved
For cleaning
MP
For consumer
For professional
Reserved
For PC
–
LED hole
Lower shell
Close
Reel lock
MICID Board
Resistor valueFunction
Open
1.80kΩ ± 0.09k
VDD
Open
6.80kΩ ± 0.34k
1.80kΩ ± 0.09k
SDA
S/C
Open
6.80kΩ ± 0.34k
1.80kΩ ± 0.09k
SCK
S/C
–
GNDGND
Bottom lid
Tape
8-4
Table 8-1-2 ID Board Terminal Standard
(B) Main Standard of DV Tape Format (SD)
θr
Tape travel (Ts)
A ch 1
(ch 1, 2)
Sub
Code
G3
A ch 1
(ch 1, 2)
MRG
α1
α2
174°
A ch 2
(ch 3, 4)
Video
0
F0
Opt. track 2
Video
G1
ITI
9
F2
(PF1)
A ch 2
(ch 3, 4)
G3
Audio
8
F0
Sub
G2
Code
MRG
MRG
Sub
Code
G3
Video
H2
Head motion
Lr (θe)
Wt
H0
(We)
1
0
F2
F0
He
H1
(PF1)
Effective data area (NTSC: 134975 bit)
G2
Audio
G1
ITI
8
9
F0
F1
6
7
F0
F2
1 Frm (Pilot Frm 0)
525/60
ITI
5
F1
Opt. track 1
G1
Video
4
F0
Audio
ITI
Tp
MRG
Sub
Code
G3
G2
G2
Audio
G1
2
3
F2
1
F0
F1
Opt. track 2
MRG
Sub
Code
G3
MRG
Sub
Code
G3
MRG
Sub
Code
G3
MRG
Sub
Code
G3
1
F1
(PF0)
Effective data area (PAL: 134850 bit)
Audio
G1
ITI
0
11
F0
10
F2
F0
Video
G2
9
8
F1
F0
ITI
6
7
F0
F2
G2
G2
Audio
Audio
G1
G1
ITI
3
4
5
F0
F1
2
F2
F0
Video
1
F1
Video
0
F0
G2
Audio
G1
ITI
11
F2
Video
Opt. track 1
1 Frm (Pilot Frm 0)
625/50
(PF0)
Fig. 8-1-6 Main Standard of DV Tape Format (SD) (Track Pattern)
Head motion
8-5
8.2
MAJOR SIGNAL PROCESSING OPERATIONS OF DV
Fig. 8-2-1 shows the basic configuration of the major signal
processing circuitry of the DV in the form of a block diagram.
SHUFFLE
SHUFFLINGAUDIO
VIDEO
VIDEO
AUDIO
SHUFFLE
MACRO BLOCK
SHUFFLING
DESHUF.
DESHUFFLING
DESHUF.
DESHUFFLING
CODING
DCT
VLC
DECODING
I-DCT
VLD
ECC
INNER
OUTER
SHUFFLING
Fig. 8-2-1 Basic Configuration of Major Signal Processing Circuitry of DV
8.2.1Flow of Video and Audio Signals in Recording Circuitry
The video and audio signal inputs/outputs shown in Fig. 8-2-1
are sampled digital signals.
The input digital video signal is shuffled block by block, so that
deviations in the compression rate of the picture can be prevented. This is achieved by gathering data from various positions in the picture and compressing it. When the video signal is
supplied to the ECC block for error correction later, de-shuffling
is applied to return the data to the original positions in the video
signal. The shuffled video signal is coded. The coding consists
of replacing the image signal in the conversion block composed
of multiple pixels with signals without correlation by means of
discrete cosine transform (DCT).
The DCT operation consists of dividing the picture into blocks
(each composed of 8 x 8 pixels) and obtaining the transform
coefficient, which indicates the amount of components of the
previously determined basic image pattern (64 pixels) in each
block.
The VLC (Variable Length Coding) quantizes the DCT transformed
signal and applies entropy coding to the quantized signal. The
coding technique used here performs Run Length coding first,
then applies Huffman coding.
The audio input signal is also shuffled block by block in the same
way as with the video signal, but the audio signal is not compressed.
The video and audio signals are input to the ECC block, where
error-correcting codes are appended to them. The error-correcting codes use product codes obtained by double coding of the
Reed-Solomon integration codes.
The DCI-R (Digital Channel Interface for Recording) block performs channel coding of the recorded signals. It performs what
IEEE1394
I/FDV TERMINAL
TG
SERVO
DCI-R
SI-NRZI
24/25
Phase &
Amplitude
Compensator
ECC
OUTER
INNER
DESHUFFLING
DCI-PEQ
3 TO 2
1+D
VITERBI
TRACKING
Note : PAL model is not available the signal output.
See note:
REC.AMP.HEAD
PRE-AMP.
HEAD
TAPE
is usually called modulation by transforming a series of digital
data composed of “1” and “0” to match the properties of magnetic recording systems.
8.2.2Flow of Video and Audio Signals in Playback Circuitry
When a digital VCR plays a recorded signal, the frequency response of the low-frequency and high-frequency components
is degraded. The degradation in the frequency response of the
low-frequency components is because the played signal becomes a differential waveform and a rotary transformer is used.
That in the frequency response of the high-frequency components is caused by the performance of the recording tape itself
and by the gap between the tape and head during recording/
playback.
When a tape in which 1-bit pulses are recorded is played while
the frequency response is degraded, the pulse duration is expanded and intersymbol interference results. The equalizer is
used to suppress the intersymbol interference and reduce code
errors.
The DCI-P (Digital Channel Interface for Playback) block demodulates the signal, which has been coded for recording and turns it
into a signal that can be subjected to error correction.
The ECC (Error Correcting Codes) block corrects errors while
performing shuffling. The audio signal is then de-shuffled and
output. The video signal is sent to the decoding block where
the signal compressed by coding is expanded by means of decoding, and returns to a digital signal. The digital video signal is
then de-shuffled before being output.
8-6
8.3VIDEO/AUDIO SIGNAL PROCESSING IN
RECORDING CIRCUITRY
8.3.1 Division into Blocks
As shown in Fig. 8-3-1, data of each frame is divided into
macroblocks (MBs) (8 x 8 pixels) which are the basis of the DCT
circuitry. Since the luminance signal and two color difference
signals are sampled with different frequencies, 4-luminance signal blocks and each of the color difference signal blocks occupy
the same position and area in the picture. When data of any one
of the 4-luminance signal blocks is lost, the data in other blocks
becomes meaningless. Therefore, the signals of every 6 blocks
are processed as a single processing unit and recorded onto
tape.
(a) With a 525/60 system:
90 (22.5) blocks
(b) With a 625/50 system:
90 (45) blocks
8.3.2 Shuffling (Video)
The 5 macroblocks composing a single video segment is collected from a picture by means of shuffling according to a specified rule.
The objective of shuffling lies in making uniform the quantity of
information that is contained in the 5 macroblocks in a video
segment. The image information of a frame is not usually distributed evenly in the picture, but there are segments with a
large amount of information and those with a small amount of
information as shown in Fig. 8-3-2.
5MB
60 (60)
blocks
Figures inside ( ) are the number of color difference (CR, CB) blocks.
Macroblock
CB
CR
Y
8
8
888
8
8
Six blocks occupying the same position and
area in the picture form a macroblock.
8
72 (36)
blocks
8
Macroblock
CB
8
R
8
C
Y
8
8
88
8
8
Fig. 8-3-1 Division into Blocks, Macroblock
The macroblock is a group composed of the 6 blocks.
The data divided into blocks is processed by the DCT coding as
described later, and the video-recording rate is compressed to
25 Mbps. In fact, however, the data of each frame is compressed
to the specified number of bits, which are recorded onto the
specified number of tracks on tape. Namely, when 25 Mbps is
converted into the number of bits per frame, the following
number of bits are recorded onto 10 (NTSC) or 12 (PAL) tracks:
(25 x 10
(25 x 10
6
)/30 = 833333 bits ............ (NTSC)
6
)/25 = 1000000 bits .......... (PAL)
As seen above, the data in each frame is compressed so as not
to exceed the specified number of bits calculated based on the
video-recording rate. This is referred to as length fixation. Although the length of the data of a frame is fixed so as not to
exceed the calculated number of bits, the actual length fixation
is applied per 5 macroblocks. In other words, the length of the
data of 5 macroblocks is fixed so as not to exceed the following
number of bits:
833333 x 5/1350 = 3086 bits ......... (NTSC)
1000000 x 5/1620 = 3086 bits ....... (PAL)
The 5 macroblocks used as the unit of length fixation is referred
to as a video segment.
Video segment A
(Flat, sky section)
Video segment A
(Few information)
Video segment B
(Much information)
MB1
MB2
MB3
MB4
MB5
MB1
MB2
MB3
MB4
MB5
Length fixationinformation
Video segment B
(Fine branch section)
Coarse quantization for
reducing the amount of
information
Fig. 8-3-2 Distribution of Information in a Picture
As a result, if the fixed-length data obtained from 5 sequential
macroblocks uses the same number of bits for each macroblock,
distortion due to compression or expansion of data would be
noticeable. This would depend on the screen segments (distortion is less noticeable in the segments with little information
but noticeable in segments with much information). To prevent
this by making the information uniform across the video segments and making distortion less noticeable, shuffling is performed in the picture.
The shuffling is performed based on the rule shown in Fig. 8-3-
3.
Each picture is divided vertically into the same number as the
number of macroblocks in a video segment (5), and divided horizontally into the same number as the number of tracks used to
record the data of 1 frame onto tape (NTSC: 10, PAL: 12).
The blocks divided in this way are referred to as superblocks.
8-7
Sequence of MBs in a superblock
No.
0
11
12
23
1
10
2
9
3
8
4
7
5
6
24
13
22
25
14
21
26
15
20
16
19
17
18
8 pixels
2nd Field
1st Field
8 lines
Sequence of
superblocks
Order of shuffling
12 34 5
MB1 MB2 MB3 MB4 MB5
Set at a fixed length of 5 MBs.
Fig. 8-3-3 Shuffling Technique (Example with 525/60)
With shuffling, the first macroblock (No. 0) in each of the 5
superblocks are collected to form a video segment as shown in
Fig. 8-3-3, and the next video segment is formed by collecting
the first macroblocks (No. 1) of the same 5 superblocks. When
all of the macroblocks in the 5 superblocks have been collected,
data collection of the next 5 superblocks starts.
8.3.3 DCT
The video segments formed by shuffling are DCT transformed
per (8 x 8) blocks. There are two DCT transform modes including the stationary mode and the dynamic mode.
(1) Stationary mode (8 x 8 DCT mode)
This is the basic mode, executing (8 x 8) DCT transform to
every (8 x 8) pixels in blocks.
(2) Dynamic mode (2 x 4 x 8 DCT mode)
The (8 x 8) blocks are divided into two (4 x 4) blocks called
fields 1 and 2, and a (4 x 8) DCT transform is applied to every
(4 x 8) pixels.
The following paragraph explains this mode by taking the
case in which an object moves toward the right as an example.
When the object
moves horizontally:
The vertical high-frequency
component increases.
4 lines
(2 x 4 x 8)
4 lines
8 pixels
(1st Field)
8 pixels
(2nd Field)
Fig. 8-3-4 (2 x 4 x 8) DCT Transform in Motion Mode
When the object does not move, the data in the horizontal direction contains the high-frequency component but the data in
the vertical direction contains only the DC current. On the other
hand, when the object moves in the horizontal direction, the
position of the object at the time of 1st field scanning is different from that at the time of 2nd field scanning, so the picture of
the object in a frame appears to have ridged edges. The highfrequency component increases in the vertical direction data as
well as in the horizontal direction data. In such a case, the (8 x 8)
blocks are divided into two pairs of (4 x 8) blocks for fields 1 and
2, and (4 x 8) DCT transform is applied to every (4 x 8) pixels.
This method makes it possible to reduce any increases in the
high-frequency component in the vertical direction data and prevents a drop in the compression rate.
Whether a DCT transform is performed in the stationary mode
or in the dynamic mode is decided by detecting the (8 x 8) blocks
in each video segment by the motion detector circuit located
before the DCT transform circuit.
The data of a block in stationary mode is composed of one DC
component and 63 AC components. But, in dynamic mode each
one of the two (4 x 8) blocks is composed of a DC component
and 31 AC components. To allow the dynamic mode data to be
processed in the same way as the stationary mode data, the
dynamic mode processing calculates the sum and differences
of the factors of the same order and form (8x 8) blocks as shown
in Fig. 8-3-5.
(2 x 4 x 8) DCT factors
8
4
(1st field)
8
4
8
(Sum of factors)
8
(Difference of factors)
8-8
(2nd field)
Fig. 8-3-5 Processing of DCT Transform Factors in Dynamic Mode
The above processing allows both the dynamic mode data and
0
0
1
2
4
7
6
5
3
0
0
1
1
2
2
2
1
0
1
1
2
3
2
2
1
1
1
1
2
3
3
2
2
1
1
2
2
3
3
3
2
1
2
2
3
3
3
3
2
2
2
2
3
3
3
3
3
2
2
3
3
3
3
3
3
DC
0
0
1
2
2
1
1
1234567
(8 x 8) DCT
Vertical direction
0
0
1
2
3
0
1
1
2
1
1
2
2
1
2
2
2
1
2
2
3
2
2
3
3
2
3
3
3
3
3
3
3
DC
0
1
1
1234567
(2 x 4 x 8) DCT
Horizontal direction
(Sum)
4
7
6
5
0
2
1
1
1
2
2
1
1
3
2
2
2
3
2
2
2
3
3
2
2
3
3
3
3
3
3
3
0
1
1
0
Vertical direction
(Difference)
Class No.Area No.
0
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1
0
1
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1
0
2
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1
0
3
15
14
13
12
11
10
9
8
7
6
5
4
3
2
1
0
0
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
2
4
4
4
4
8
8
1
1
1
1
1
1
1
1
1
1
1
2
2
2
2
4
4
4
4
8
8
8
8
2
1
1
1
1
1
1
1
1
2
2
2
2
4
4
4
4
8
8
8
8
16
16
3
1
1
1
1
1
1
1
2
2
2
4
4
4
4
8
8
8
8
16
16
16
16
Quantizer No.
Horizontal direction
stationary mode data to be handled as data composed of one
DC component and 63 AC components in the subsequent
quantization processing.
8.3.4 Quantization
(A) Data quantity estimation
After all of the blocks in a video segment have been DCT transformed, the video segment is stored in the buffer. At this time,
the data quantity estimation block selects the quantizer for use
in quantizing the video segment.
With DCT transform, normalization is performed so that the DCT
factors have a 10-bit dynamic range for the 8-bit pixel block. The
DCT factors are divided by an integer called the quantization
step and allocated to a smaller number of bits before being subjected to variable-length coding (VLC). This operation (division)
is referred to as re-quantization, or simply quantization. The
quantization reduces the values of the factors, turning many of
0 in the high frequencies. As the VLC in the subsequent stage
performs code allocation by forming value 0 run lengths and the
non-0 factor after it into a group, so the quantization improves
the coding efficiency. As the VLC consists of simply allocating
codes to factors, the code quantity (number of bits) is controlled by the VLC block.
(B) Quantization
Each quantizer is composed of a set of 4 quantization steps as
shown in Fig. 8-3-6, and the factors in a block are divided into 4
areas. The data is quantized in the 4 areas using different
quantization steps. The factors are shifted from low-frequency
factors to high-frequency factors as the area number increases,
so the quantization of high-frequency factors are coarser than
for low-frequency factors. This utilizes the fact that the distortion of high-frequency factors is less noticeable to human vision
even when they are coarsely quantized.
Fig. 8-3-6
(C) Classification
Each block in a video segment is classified into one of 4 classes
before being quantized. The quantization steps of the quantizers
vary depending on the class numbers (see Fig. 8-3-6). Table 8-31 shows the definitions of the 4 classes.
Class No.
0
1
2
3
Block with which quantization distortion after compression is highly noticeable. The absolute value of the AC
factor should not exceed 225.
Block with which quantization distortion after compression is less noticeable than Class 0. The absolute value of
the AC factor should not exceed 225.
Block with which quantization distortion after compression is less noticeable than Class 1. The absolute value of
the AC factor should not exceed 225.
Block with which quantization distortion after compression is little noticeable or the absolute value of the AC
factor exceeds 225.
Definition
Table 8-3-1 Definitions for Classification
8-9
The Table 8-3-1 means that a block with a larger class number
has a more approximate degree of definition (activity). The data
in such a block is quantized relatively roughly to compress the
data quantity. Rather approximate quantization of picture areas
with low activity does not cause odd sensations for the human
vision, but approximate quantization of those with high activity
does cause odd sensations. The blocks with low activity in each
segment are quantized as roughly as possible with a reduced
number of allocated bits, unless the odd sensations are not felt.
However, the blocks with high activity are quantized using an
increased number of allocated bits to improve the picture quality experienced by the human vision.
A block containing factors with a larger absolute AC factor than
255 is classified as Class 3, and subjected to an operation called
initial shifting before quantization. This operation divides an AC
factor over 255 by 2 so that errors do not occur in the VLC operation.
Class No.
Initial shifting
Before initial shifting
MSB=1
(ACmax > 255)
After initial shifting
0
Not applied1Not applied2Not applied
9 bits
MSBLSB
1
1-bit shift
01
The value is accommodated within 8 bits.
3
Applied
Fig. 8-3-7 Initial Shifting
While the AC factor before quantization consists of 10 bits, or 9
bits excluding the sign bit, the VLC in the subsequent stage can
codify a non-0 factor value only into an 8-bit code. Initial shifting
is performed to deal with this. As a result, a factor that has a
value in the 9th bit (SMB) (i.e. a factor over 255) can be accommodated in 8 bits and processed by VLC. Fig. 8-3-6 shows that
an increase in the class No. increases the quantization steps in
the same quantizer No. or that quantization of a larger class No.
is more approximate.
Although the quantization steps of Class 2 look less than for
those of Class 3, this is because the Class 3 data has been
subjected to initial shifting before quantization. The actual
quantization steps of Class 3 are more approximate than in those
of Class 2.
8.3.5 VLC
After division into (8 x 8) blocks by DCT transform, the energy
of pixels are concentrated in the DC component and the vertical
and horizontal vertical factors become almost null. To code these
factors, coding uses a technique for varying the code length
according to the incidence of the factors.
This coding technique varying the code length is referred to as
variable length coding (VLC).
(A) Entropy coding
As the VLC achieves efficient coding by allocating short codes
to data with a high DCT factor incidence and long codes to data
with a low incidence, it is sometimes called the entropy coding.
The entropy coding can be expressed with an entropy function.
H = 1 (max.) when Pn = 0.5 (when 0 and 1
both occur at the same probability)
H
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0 0.1
0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
0
P
Fig. 8-3-8 Entropy Function
When VLC approaches the above function, it is regarded as coding with a high compression rate.
(B) Huffman coding
The DV uses Huffman codes to improve the compression efficiency. The Huffman codes determines the minimum average
code length of information with a known incidence probability.
Table 8-3-2 shows the method of construction of Huffman codes.
The symbols in the table should be regarded as the levels of
pixel values.
Symbol IncidenceCode Word
1
0.49
2
0.14
3
0.14
4
0.07
5
0.07
6
0.04
7
0.02
8
0.02
9
0.01
Construction Procedure
0
0
1
0
0
1
0
0
0
1
0
1
1
1
1
0
100
101
1
1100
1101
1110
11110
111110
111111
8-10
Table 8-3-2 Construction of Huffman Codes
The Huffman coding uses the following procedure.
Mode Channels Sampling Frequency
48 k
44.1 k
32 k
32 k
2
4
48 kHz
44.1 kHz
32 kHz
32 kHz
16-bit linear
12-bit nonlinear
Quantization
(1) Information source symbols 1 to 9 are arranged in the order
of incidence probability.
(2) For the symbol with the lowest incidence and that with the
second-lowest incidence, “0” is allocated to one and “1” is
allocated to the other. (Which of the two symbols is “0” and
which is “1” can be decided arbitrarily.)
(3) Then, the above two symbols are considered to be joined
together as a single symbol. The incidence probability of the
joined symbols should be equal to the sum of the incidence
probabilities of the two symbols before joining.
(4) By considering the joined symbols as new individual sym-
bols, these symbols and other symbols (which should exclude the two symbols which were joined) are rearranged in
the order of incidence probability.
(5) Steps (2) to (4) are repeated until the number of symbols
becomes 1.
(6) The values (0 or 1) allocated to the original symbols in step
(2) are read in the reverse order. The read values form the
code word for the symbols.
Transform into (0 run length + non-0 factor values)
(1, 12), (0, 5), (3, 4), (0, 3)... (EOB)
A Huffman code constructed in this way can be decoded instantaneously because, by following the bits one by one from
the first bit at the time of decoding, the end of a code word can
be determined without referring to the head of the next code
word.
As the VLC used with the DV is based on Huffman coding combined with run length coding, it is called “modified 2-dimensional Huffman coding”.
Horizontal frequency
Vertical frequency
Fig. 8-3-9 Zigzag Scanning of DCT Factors
Among the factors transformed by 2D DCT, the 63 AC factors
other than the DC component factor are rearranged by zigzag
scanning as shown in Fig. 8-3-9. When the AC factors are rearranged in the order of scanning, the result is as shown in Fig.
8-3-10, where the scanned factor series is divided into a group
of sequential 0 factors followed by a group of non-0 factors. A
code called the EOB (End Of Block) is appended after the last
non-0 factor scanned.
After the factors are transformed into a group of 0-value run
length and a group of non-0 factor values, a Huffman code table
is compiled according to the incidence probabilities of these
groups and VLC is applied based on this.
VLC
(111011s), (100s), (11011s),... (1110)
s: Sign bit
Fig. 8-3-10 2D Huffman Coding
8.3.6 Shuffling (Audio)
Unlike the video signal, the audio signal is not compressed. It is
subjected only to shuffling as preparation for the ECC.
Table 8-3-3 shows the basic modes of the DV.
Table 8-3-3 Basic Audio Modes
In the 2-channel modes, the quantization is 16-bit linear and the
sampling frequency can be selected from 48 kHz, 44.1 kHz or
32 kHz. One of the 2 channels is recorded in the first 5 tracks of
the 10 tracks of an NTSC frame (first 6 tracks of the 12 tracks of
a PAL frame) after shuffling. Therefore, when an error occurs
with a track, 1/5 (or 1/6) interpolation is applied.
8-11
8.3.7 De-shuffling
The compressed video data is recorded on tracks on tape. The
video data recording area of each track is divided into sub-units
called the sync blocks (SBs). The number of SBs where video
data is recorded is 135 per track, and the number of SBs per
frame is:
135 x 10 = 1350 ............................. (NTSC)
130 x 12 = 1620 ............................. (PAL)
These values are equal to the number of macroblocks per frame.
Therefore, the compressed video segment data (5 macroblocks)
is packed into 5 SBs and recorded.
Fig. 8-3-11 shows the packing of video segment data into 5 SBs.
<Step 1>
• The macroblock data is packed into the specified area (fixed
area) on a per-DCT block basis.
• Overflow data is stored in the macroblock memory.
• This step is performed for each macroblock.
<Step 2>
• The data in MB memory is packed in the vacant area of
the same SB.
• Overflow data is stored in the VS (Video Segment) memory.
• This step is performed for each macroblock.
<Step 3>
• The data in theVS memory is packed in the vacant areas
in the 5 macroblocks.
The section corresponding to the data area of each SB(unit) is
divided into the number of DCT blocks per macroblock (fixed
areas) and the data in the DCT blocks is packed in priority in the
fixed areas. When data packed in 5 units is recorded, it is not
recorded in 5 sequential SBs.
After the data in each video segment has been packed in 5 units,
the data is de-shuffled before being recorded. The data of a
frame, composed by the de-shuffling, is recorded sequentially
on the SBs on tape as shown in Fig. 8-3-12.
Overflow data is stored in the MB memory. (Step 1)
Data of
MB memory 1
block 1
Data of
block 3
Data in the MB memory is packed in the
vacant areas of the same SB. (Step 2)
Overflow data is stored
VS memory
Data of
MB1
Data of
MB3
in the VS memory.
Data in the VS memory is packed in the vacant
areas in the 5 MBs. (Step 3)
MB1
MB2
MB3
MB4
MB5
Fig. 8-3-11 Packing of Compressed Data in SB
1 frame (10 tracks)
Fig. 8-3-12 De-shuffled Recording (NTSC)
The reason why the shuffled data is not recorded on tape but
de-shuffled before being recorded is to make the picture quality
in variable-speed playback easier to view.
Head
Areas updated per head scan
8-12
Areas forming a group on the picture
are updated simultaneously.
Fig. 8-3-13 Picture Updating in Variable-Speed Playback
Fig. 8-3-13 shows the picture updating by data reproduced during variable-speed playback. The picture is easy to see because
the de-shuffling before recording causes the reproduced data
to form a sequential group.
8.3.8 Error Correction
Error correction consists of correcting errors occurring in the
data. For this purpose, data is provided with regularity by adding
redundant parts. The error correction codes are the mathematical systematization of how to add the redundant parts, and the
theory of this is referred to as the coding theory.
(A) Principles of error detection and error correction
With the DV, it may happen that data recorded as “0” is reproduced as “1” or data recorded as “1” is reproduced as “0” due
to thermal noise of the recording/playback amplifier or dust or
scratches on the tape surface. The error correction detects and
corrects such code error produced in the process of recording
or playback.
In the following, the principles of error detection and error correction will be explained by using codes composed by adding a
3-bit redundant part to every 2 bits of data as shown below.
With the coding theory, the part for data is called the information part, the redundant part is called the parity check part, and
the total of information part and parity check part is called the
code word. The operation of transforming the information part
into code word is referred to as coding.
Let us assume that a code word (00110) corresponding to data
(00) has been recorded and is reproduced with an error in 1 bit
(01110). When the reproduced series is compared with previously defined 4 code words, it may be known that the former
coincides with none of the latter. These are the principles of
error detection.
When the above situation is verified in more detail, it can be
established that the reproduced series (01110) is different by 1
bit from the code word (00110) and by 2 or more bits from other
code words.
The fact that a code word is most similar to the reproduced
series means that the probability that it is the recorded code
word is highest. Therefore, coding errors can be corrected by
considering that the most similar code word to the reproduced
series is the recorded code word. These are the principles of
error correction.
The operation of correcting errors in the reproduced series is
referred to as decoding.
(B) Hamming distance
The error detection and correction capabilities of error correction coding can be defined in terms of the hamming distance
between the code words. The hamming distance is the number
of dissimilar components when two code words are compared
component by component. For example, assuming that code
word (00110) is C
tance dH (C
(11111) is C
1 and code word (01101) is C2, hamming dis-
1, C2) is 3. If code word (10011) is C3 and code word
4, hamming distance dH (C3, C4) becomes 2.
Fig. 8-3-14 shows the scheme of the relationship between the
error detection and correction capabilities and the hamming distance.
Min. hamming distance
Code word
Ci
t
S
Code word
Cj
t
2t+1
Fig. 8-3-14 Expressions of Error Detection and Correction
Capabilities by Hamming Distance
To enable error correction coding, detect all of S or more errors,
the minimum value of the hamming distance between code
words (minimum hamming distance) should be S + 1. This is
because, if the minimum hamming distance is less than S, code
word Ci could be turned into another code word Ci if there are
S error items.
Similarly, to correct t or fewer errors, the minimum hamming
distance between code words should be 2t + 1 or more. This is
because, if the minimum hamming distance is less than 2t, code
word Cj which has a shorter hamming distance than Ci would
exist if t error items occur with Ci. The hamming distances between the code words of the above-mentioned code are:
d
H ((00110), (01101)) = 3
d
H ((00110), (10011)) = 3
d
H ((00110), (11111)) = 3
d
H ((01101), (10011)) = 4
d
H ((01101), (11111)) = 2
d
H ((10011), (11111)) = 2
Therefore, the minimum hamming distance between these code
words is 2. Thus, as the number of errors that can be detected
is:
S = 1 (because S + 1 = 2)
This means that the error of a bit can be detected.
It was described above that the error of a bit in code word (00110)
can be corrected. This is because the minimum hamming distance between only code word (00110) and other code words is
3. Therefore:
t = 1 (because 2t + 1 = 3)
However, correction of error of a bit is not possible with other
code words, with which only error detection is possible.
(C) Simple error correction codes
Examples of simple error correction codes will be described in
the following.
(1) Simple parity check code
The parity check code is composed of a k-bit information
part and 1-bit parity check part. The code with which parity
is selected so that the number of “1” contained in a code
word is an even number is referred to as the even parity
code. That with which parity is selected so that the number
of “1” contained in a code word is an odd number is referred to as the odd parity code. For example, a 3-bit code
may have even parity codes as follows:
(00) → (000)
(01) → (011)
(10) → (101)
(11) → (110)
All of the hamming distances between the 4 code words are 2,
so a 1-bit error occurring in any code word can be detected.
8-13
(2) Repetition code
The repetition code consists of n times of repetitions of 1bit information parts, and the number of code words is 2.
For instance, a 3-bit code has two repetition codes as follows:
(0) → (000)
(1) → (111)
The minimum hamming distance between these codes is 2,
so a 1-bit error occurring in a code word can be corrected.
(3) (7, 4) hamming code
The (7, 4) hamming code consists of a 4-bit information part
(i
3, i2, i1, i0) and a 3-bit parity check part (P2, P1, P0) which is
determined according to the following rule.
P0 = i3 + i2 + i0
P1 = i2 + i1 + i0
.... (8.3.1)
P2 = i3 + i2 + i1
In expressions (8.3.1), operators “*” represent exclusive OR,
so 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, and 1 + 1 = 0.
The (7, 4) hamming code has 4 information bits, or 2
4
= 16
code words. Table 8-3-4 shows all of the code words.
Information Part
i
3
i
0
1
0
1
0
1
0
1
0
1
0
1
0
1
0
1
i
2
1
0
0
0
0
1
0
1
0
0
1
0
1
1
1
1
1
0
0
0
0
1
0
1
0
0
1
0
1
1
1
1
1
Parity Check Part
i
P
0
2
0
0
0
1
0
1
0
0
0
1
0
0
0
0
0
1
1
0
1
1
1
1
1
0
1
1
1
0
1
0
1
1
P
P
1
0
0
0
0
1
1
1
1
0
1
0
1
1
0
1
0
0
1
1
1
0
0
0
0
1
0
1
0
0
1
0
1
1
Table 8-3-4 List of Code Words of (7, 4) Hamming Code
This table shows that the minimum hamming distance of the
code is 3 and that a 1-bit error can be corrected.
(4) Basic knowledge on algebra
The examples of error correction codes used in the description of the previous section are beginner-type codes, and
their error detection and correction capabilities are insufficient for application in the DV.
The error correction codes with practical error detection and
correction capabilities for use with the DV are constructed
based on abstract algebra. In the following, the minimum
required knowledge on the algebra for understanding the
error correction will be described. If you want to gain more
knowledge, please also refer to books on mathematics.
(a) Galois field
Here, the term “field” refers to the set of elements with
which the four arithmetical operations of addition, subtraction, multiplication and division can be defined.
The term “field” is used because its functions are working
organically, and have come from German word “Körper”.
For example, a real number or complex number is a field.
However, when the term “field” is used with error correction codes, it does not mean a field with an infinite number
of elements such as a real number, but means a finite field
composed of a finite number of elements.
The “element” is what belongs to a set.
For example, if there is “a” which belongs to set “M”, it is
said, “a is an element of M”. The fact that “a” belongs to
“M” is expressed using a symbol as “a ∈ M2”. “∈“ is the
symbol of the initial letter of “element”. If “a” does not belong to “M”, it is expressed as “a ∉ M”.
A finite field is also called the Galois field, after the French
mathematician Evariste Galois (Oct. 25, 1811 - May 31, 1832)
who created one of the basic algebra theories. A Galois field
composed of P elements is expressed as GF(P).
The simplest Galois field is a GF(2) composed of elements
(0, 1). The addition in GF(2) is defined as exclusive OR and
the multiplication can be defined as multiplication of ordinary integers.
Table 8-3-5 shows the operation tables of GF(2).
0
1
0
0
1
1
1
0
0
1
0
0
1
0
0
1
(a) Addition(b) Multiplication
Table 8-3-5 Operation Tables of GF(2)
8-14
(b) Extension field
The complex number field is created by adding root “i” of
2
“x
+ 1 = 0”, the polynomial which cannot be factorized any
more in a real number field (irreducible polynomial) to a real
number field. In the same way, by adding a root of an irreducible polynomial in Galois field GF(P) to GF(P), it is possible to create a field GF(P
is equal to P
m
which is a P’s power. This operation is re-
m
) where the number of elements
ferred to as a field extension. GF(P) is called the ground field
and GF(P
m
) is called the extension field.
For example, assume that there is an irreducible cubic polynomial as follows:
3
x
+ x + 1 = 0,
If there is an extension field GF (2
3
) created by a root of the
above polynomial “α“ to GF (2), “0”, “1” and “α“ are the elements of extension field GF (2
3
). Then, since multiplication should
be definable in a field, all of α‘s powers become the elements
of GF (2
nomial x
3
). As for the α‘s powers, since “α“ is the root of poly-
3
+ x + 1 = 0,
3
α
= α + 1
Therefore, this relationship means the following:
0
α
= 1
1
α
= α
2
2
α
= α
α3 = α + 1
4
α
= (α + 1) α = α2 + α
5
α
= (α2 + α) α = α2 + α1 + 1
6
α
= (α2 + α + 1) α = α2 + 1
7
α
= (α2 +1) α = 1
:
(Repeated)
α‘s power after α
7
are repetitions of the above.
Fig. 8-3-15 shows this relationship from a different viewpoint.
In this figure, since the cycle is closed, α‘s power after α
replaced by values of α
α = α
1 = α
6
or below.
2
α
1
8
α
0
7
α
3
α
7
are
element with a period of 2
m
- 1 to GF(2). The hamming code
is a cyclic code which uses a primitive polynomial as the
generation polynomial, and its parameters are as shown
below.
Code length: n = 2
m
- 1
Information amount: k = n - m
Min. hamming distance : d
MIN = 3
A hamming code with code length of n and information
amount of k is called a (n, k) hamming code. Since the minimum hamming distance is 2, any 1-bit error occurring in the
code word can be corrected.
(7) BCH code
The BCH (Bose-Chauduri-Hocgenghem) code extends the
error correction of a Hamming code to 2 bits or more. The
BCH codes with t-bit correction become the cyclic codes of
code table n with which generation polynomial G(x) is a
minimum-order polynomial having α, α
assuming that α is a primitive root of GF(2
m
2
- 1.
2
,... α2t as the roots,
m
) and that n =
(8) Reed-Solomon code
The BCH code described in above (7) is defined on GF(2).
The Reed-Solomon code is an extension of the above to
m
GF(2
). Its error detection and correction are applied per symbol, which is an element of the extension field.
In general, a Reed-Solomon code with code length of n and
information amount of k is called the (n, k) RS code.
(9) Reed-Solomon product code
Assuming that C
1 is (n1, k1) codes and C2 is (n2, k2) codes,
the product code of them can be obtained by coding k
umns, each composed of k
k
2 rows, each composed of k1 symbols with C1. Fig. 8-3-16
2 symbols, with C2, then coding
shows the construction of product code.
1 col-
4
α
6
α
Fig. 8-3-15 Cycle of GF (2
5
α
3
)
(5) Cyclic code
The cyclic codes are important codes that can be coded or
decoded easily.
The cyclic code has the following properties:
• The sum of arbitrary code words is a code word.
• The series obtained by cyclic shifting of code words be-
comes a code word.
(6) Hamming code
When one of the roots of an irreducible m-order polynomial
in GF(2) is assumed to be α, the maximum period until α‘s
power returns to the original number is 2
m
- 1 (see Fig. 8-3-
15).
An irreducible polynomial which has a root α with which the
period is 2
m
- 1 is called the primitive polynomial, and α in
this polynomial is called a primitive element.
m
GF(2
) is an extension field created by adding a primitive
Information part
C2’s parity
check part
C1’s parity
check part
C2’s parity
check part on
C1’s parity
check part
Fig. 8-3-16 Construction of Product Codes
The C2’s parity check part on C1’s parity check part is identical
to the C1’s parity check part on C2’s parity check part. As a
result, the same product code can be obtained by coding C1
before coding C2. The DV uses the Reed-Solomon product code
obtained by a double coding of the Reed-Solomon code.
8-15
8.3.9 Data Structure
Random errors and burst errors due to dropouts on tape occur
with the magnetic recording/playback circuitry of the DV. To deal
with this, the DV records data by turning into sync blocks. Fig.
8-3-17 shows the DV data structure.
23
Sync
ID code
area
ID0,ID1,ID
135
23
Sync
ID code
ID0,ID1,ID
area
23
Sync
ID code
area
ID0,ID1,ID
Sync block length: 90 bytes
2
2
VAUX
1
11
2
9
5
2
12
Outer code parity
5
AAUX
58
AAUX
Outer code parity
Sync block length: 12 bytes
52
Sub-code data
52
Sub-code data
85
778
Video data
778
VAUX
Video data
728
Audio data
72
Audio data
7
Inner code
parity
Inner code
parity
Inner code
parity
Inner code
parity
Inner code
parity
Inner code
parity
4
GF (2
)
8
GF (2
)
8
GF (2
)
Unit: Byte
Fig. 8-3-17 Data Structure
The video and audio uses the same sync block configuration to
reduce the scale of hardware. The video or audio data forms a
sync block together with the sync area, the ID code indicating
the data attributes and the inner code parity.
The byte counts of the sync area and ID code in Fig. 8-3-17 are
the values before the 24/25 conversion (by a converter circuit
which produces digitally the pilot signal for use in tracking during playback). In practice, however, they are 24/25-converted
and become scrambled interleaved NRZI-converted signals before being recorded onto tape. The following 17-bit patterns have
been defined as sync patterns.
The ID code is composed of 3 bytes including two ID bytes and
a parity byte.
The ID bytes of a video or audio sector are:
4 bits of sequence number for indicating the continuity of frame,
4 bits of track pair number indicating the track number, and 8
bits of sync block number indicating the arrangement of sync
blocks in each track.
(1) Error correction in the ID code area
In the ID code area, (12, 8, 3) BCH codes are used for common error correction to the audio, video and sub-code sectors.
The configuration is as shown below.
Primitive polynomial: x4 + x + 1
ID
0:C15 C14 C13 C12 C11 C10 C9C8
ID1:C7C6C5C4C3C2C1C0
IDP:P7P6P5P4P3P2P1P0
MSBLSB
ID-CW
0: C14 C12 C10 C8 C6 C4 C2 C0 P6 P4 P2 P0
ID-CW1: C15 C13 C11 C9 C7 C5 C3 C1 P7 P5 P3 P1
The parity can be expressed as follows.
P
7 =C15+C11+C7 +C5
P5 =C15 +C13+C9+C5 +C3
P3 =C15 +C13 +C11+C7+C3 +C1
P1 =+C13+C9 +C7+C1
P6 =C14+C10+C6 +C4
P4 =C14 +C12+C8+C4 +C2
P2 =C14 +C12 +C10+C6+C2 +C0
P0 =+C12+C8 +C6+C0
As the contents of ID code have regularity (inertia) between
successive sync blocks, it is also possible to apply only the
error detection.
(2) Error correction in the video and audio areas
The video data and audio data are composed of Reed-Solomon product codes of GF (2
8
) as shown in Fig. 8-3-17. With
internal codes, a common code length is used to reduce the
burden on the hardware.
(a) Inner codes: same video and audio (85, 77)
α : Primitive element
Primitive polynomial: x
Generation polynomial : g = (x + 1)(x + α)(x + α
8
+ x4 + x3 + x2 + 1
2
)...(x + α7)
As 8 bytes of parity are added, up to 4 errors can be corrected.
(b) Outer codes: Video (149, 138), audio (14, 9)
Primitive polynomial: x
8
+ x4 + x3 + x2 + 1
Generation polynomials:
Video g = (x + 1)(x + α)(x + α
Audio g = (x + 1)(x + α)(x + α
2
)...(x + α10)
2
)(x + α3) (x + α4)
As 11 parity bytes are added to the video data, the maximum burst error correction length assuming that all random
errors are corrected by inner code is 85 x 11 = 935 bytes,
which corresponds to correction of error due to a defect of
about 0.3 mm in the widthwise direction of tape.
Similarly, 5 parity bytes are added to the audio data so the
maximum burst error correction is 85 x 5 = 425 bytes, which
corresponds to correction of error due to a defect to about
0.14 mm in the widthwise direction of tape.
8-16
Loading...
+ 36 hidden pages
You need points to download manuals.
1 point = 1 manual.
You can buy points or you can get point for every manual you upload.