As a pioneer of IP video monitoring systems, Sony has incessantly innovated and
continually enhanced its line of network cameras throughout the years. Among a
wealth of products, the SNC-RX550, SNC-RZ50, and SNC-CS50*1, which belong to
the Sony IPELA™ family, incorporate a variety of intelligent features that provide
high-quality images and efficient operation over IP networks.
These cameras incorporate a number of unique features that have been designed
for surveillance applications, yet can also be useful in other types of monitoring
applications. Among the many features adopted by these cameras, our customers
have specifically requested more detailed information on the following:
This manual is a comprehensive guide covering the above topics and explaining
how each of the intelligent network cameras utilizes the technology, while
also identifying user benefits. It has been written in a manner that is easy
to read and comprehend. Illustrations have been used to depict concepts that are
difficult to explain in words alone. Each section is written so that it can be
read independently from the rest; it is not necessary to read the document from
cover to cover. This manual is targeted at product and marketing managers,
account managers, resellers, system integrators, and end users who have a strong
desire to understand these technologies.
We hope that by reading through this manual, you will fully understand the
innovative technologies that Sony has incorporated in the SNC-RX550, SNC-RZ50,
and SNC-CS50 Series of network cameras. And we hope that you find these
technological benefits to be a great advantage when you think about taking your
surveillance and remote monitoring solution to the next step.
SNC-RZ50, SNC-CS50, SNC-RX550 (Black and White)
*1 In the following text, “SNC-RX550,” “SNC-RZ50,” and “SNC-CS50” refer to both NTSC and PAL models
(i.e. SNC-RX550N/SNC-RX550P, SNC-RZ50N/SNC-RZ50P, and SNC-CS50N/SNC-CS50P).
The Sony SNC-RX550/RZ50/CS50 Series of network
cameras is capable of encoding images using any of the
following three compression formats: JPEG, MPEG-4, and
H.264. This multi-codec capability allows users to flexibly
choose the appropriate compression format to match
their network environment and monitoring applications.
This section provides a general explanation of these three
compression formats beginning with the basics of video
compression.
Basics of Video Compression
Most practical video compression techniques are based on
lossy compression, under which there are two basic
methods of compressing video: intra-frame compression
and inter-frame compression.
Intra-frame compression is a technique that compresses
each video frame independently without reference to any
other frame of video, while inter-frame compression
makes use of data from previous and/or subsequent video
frames. Note that inter-frame compression is generally
used in conjunction with intra-frame compression.
With intra-frame compression, each frame of video is
compressed spatially (i.e. redundant or nonessential data
is removed from the image).
Inter-frame compression, however, is a technique that
compresses multiple video frames by utilizing data from
adjacent frames (i.e. temporal prediction). Inter-frame
compression takes advantage of the characteristics of
video by “capturing” only the difference between
successive frames. By doing so, redundant information
between two frames can be eliminated, resulting in high
compression ratios.
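The idea of capturing only the difference between successive frames can be sketched in a few lines of Python. This is a simplified illustration, not the cameras' actual encoder: the tiny 4 x 4 grayscale frames and the helper names `frame_difference`, `reconstruct`, and `nonzero_count` are assumptions made for this example, and a real codec adds transforms, quantization, and entropy coding on top of this step.

```python
# Illustrative sketch only: frames are small lists of grayscale values.

def frame_difference(reference, current):
    """Return the per-pixel difference between two equally sized frames."""
    return [[c - r for r, c in zip(ref_row, cur_row)]
            for ref_row, cur_row in zip(reference, current)]

def reconstruct(reference, difference):
    """Rebuild the current frame from the reference frame and the difference."""
    return [[r + d for r, d in zip(ref_row, diff_row)]
            for ref_row, diff_row in zip(reference, difference)]

def nonzero_count(frame):
    """Count the values that would actually need to be stored or sent."""
    return sum(1 for row in frame for value in row if value != 0)

frame_1 = [[10, 10, 10, 10],
           [10, 50, 50, 10],
           [10, 50, 50, 10],
           [10, 10, 10, 10]]

# The next frame is identical except for one pixel that became brighter.
frame_2 = [row[:] for row in frame_1]
frame_2[1][1] = 55

diff = frame_difference(frame_1, frame_2)
print(nonzero_count(frame_2))  # 16 non-zero values when the frame is stored as-is
print(nonzero_count(diff))     # 1 non-zero value when only the difference is stored
```

Because the difference is almost entirely zeros, it can be represented far more compactly than the frame itself, and the decoder can still rebuild the frame exactly from the reference plus the difference.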
JPEG
JPEG (standardized by ISO/IEC IS 10918-1/ITU-T T.81) is
the “industry-standard” image compression format for
surveillance applications and is ideal for use when high-quality
still images are required. These individual still
images are captured in a sequence of 30 (NTSC) or 25 (PAL)
frames per second to form video, a method sometimes referred
to as “Motion JPEG.” All of these images are independently
compressed using intra-frame compression (Fig. 1).
Because intra-frame compression is the only method used,
JPEG data is larger than MPEG-4 and H.264, which
employ both intra-frame and inter-frame compression
techniques.
With the SNC-RX550/RZ50/CS50 Series of network
cameras, the JPEG picture quality can be set to a level
within the range of one to ten as shown in the table
below. By presetting the picture quality level, these
cameras output images with a “near-constant” data size,
meaning that the data size fluctuates about a pre-defined
constant value. This is useful for calculating the required
storage capacity and bandwidth for streaming JPEG
images over a network.
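As a rough sketch of how a near-constant frame size simplifies capacity planning, the calculation below derives bandwidth and storage from a single frame size. The 25 KB frame size is an assumed example value, not a figure from the cameras' picture quality table, and the function names are hypothetical. Decimal units are used (1 KB = 1,000 bytes, 1 GB = 1,000,000 KB).

```python
def jpeg_bandwidth_kbps(frame_size_kb, frames_per_second):
    """Bandwidth in kilobits per second for a stream of fixed-size JPEG frames."""
    return frame_size_kb * 8 * frames_per_second

def jpeg_storage_gb(frame_size_kb, frames_per_second, hours):
    """Storage in gigabytes needed to record the stream for the given duration."""
    total_kb = frame_size_kb * frames_per_second * hours * 3600
    return total_kb / 1_000_000

# Assumed example: 25 KB per frame at 30 frames per second (NTSC).
print(jpeg_bandwidth_kbps(25, 30))   # 6000 (Kb/s)
print(jpeg_storage_gb(25, 30, 24))   # 64.8 (GB per day)
```

In practice the frame size fluctuates around the preset value, so some headroom should be added to both results.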
MPEG-4
Before looking at the MPEG-4 compression format
adopted by these cameras, it is important to clarify the
term “MPEG-4.” MPEG-4 is a series of standards
developed by ISO/IEC MPEG (Moving Picture Experts
Group) and has many “Parts,” “Profiles,” and “Levels”
related to multimedia content. Among these “Parts,”
“Profiles,” and “Levels,” the SNC-RX550/RZ50/CS50
Series of network cameras employs MPEG-4 Part 2
(ISO/IEC 14496-2) Simple Profile Level 3, and MPEG-4 Part
10 (ISO/IEC 14496-10), which is also called H.264 and
was jointly developed with ITU-T. In the following text,
“MPEG-4” refers to MPEG-4 Part 2 Simple Profile Level 3
and “H.264” refers to MPEG-4 Part 10.
Structure of MPEG-4
Let’s take a look at the structure of MPEG-4. A video
“frame” in MPEG-4 is referred to as a Video Object Plane
(VOP). Two types of VOPs are used here: an I-VOP (intra-coded) and a
P-VOP (predictive-coded). A Group of VOPs (GOV) consists of an
I-VOP and several P-VOPs. In these cameras, a GOV makes
up one second*2 of video (Fig. 2).
An I-VOP is compressed using the intra-frame compression
technique and is similar to a single JPEG image. This initial
“frame” of a GOV is often called an “anchor.” I-VOPs are
much larger in data size than P-VOPs; however, they are
essential in the GOV structure, and are required when
searching image data.
P-VOP data is generated by predicting the difference
between the “current image” and the previously encoded
I-VOP or P-VOP (reference frame). This is performed using
inter-frame compression. As explained in the section on
“Basics of Video Compression,” this method of prediction
takes advantage of the video property that two consecutive
“frames” are very similar. Because P-VOP data contains
information related only to the difference between two
frames (i.e. VOPs) and not the image data itself, the data size
of P-VOPs is greatly reduced when compared to I-VOPs.
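The GOV layout described above can be sketched as a simple sequence of VOP types. In the example below, the one-to-five-second GOV range comes from this document, while the function names and the I-VOP and P-VOP sizes (20 KB and 2 KB) are assumed values chosen only to show the effect.

```python
def gov_pattern(frames_per_second, gov_seconds=1):
    """Return the VOP types in one GOV: a single I-VOP followed by P-VOPs."""
    if not 1 <= gov_seconds <= 5:
        raise ValueError("GOV length must be between one and five seconds")
    length = frames_per_second * gov_seconds
    return ["I"] + ["P"] * (length - 1)

def gov_size_kb(i_vop_kb, p_vop_kb, frames_per_second, gov_seconds=1):
    """Total data size of one GOV for the given (assumed) VOP sizes."""
    pattern = gov_pattern(frames_per_second, gov_seconds)
    return sum(i_vop_kb if vop == "I" else p_vop_kb for vop in pattern)

pattern = gov_pattern(30)                # one-second GOV at 30 frames per second
print(pattern.count("I"), pattern.count("P"))   # 1 29

# Assumed sizes: 20 KB per I-VOP, 2 KB per P-VOP.
print(gov_size_kb(20, 2, 30))            # 78 (KB per second of video)
print(30 * 20)                           # 600 (KB if every frame were intra-coded)
```

A longer GOV contains fewer I-VOPs per second and so lowers the average data rate, at the cost of fewer anchor points for searching recorded video.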
P-VOPs and Motion Compensation
“Motion Compensation” is the key to predicting
movement within an image and forming P-VOP data to
efficiently compress MPEG-4 video. This section briefly
introduces this technique.
As described above, P-VOP data is generated by predicting
the difference between the previous VOP (reference VOP)
and the current image that is input from the camera. To
predict this movement, “blocks” consisting of 16 x 16
pixels, called macroblocks, are first formed within the
image. Next, motion vectors are calculated based on the
predicted movement within each macroblock. The
prediction process is such that the movement within each
macroblock between the reference VOP and the current
image is compared. The resultant “shift” of the
comparison is represented as a motion vector.*3
In MPEG-4, sub-blocks consisting of 8 x 8 pixels within
the 16 x 16 macroblocks can also be used to predict the
current VOP (Fig. 3). The more finely the “frame” is divided,
the more accurately movement can be predicted, which
can result in an even higher compression ratio.
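The comparison of a block between the reference VOP and the current image can be illustrated with a basic block-matching search. This is a minimal sketch under stated assumptions, not the cameras' algorithm: it uses an exhaustive search with the sum of absolute differences (SAD) as the matching cost, a small 4 x 4 block on an 8 x 8 test frame for readability, and hypothetical helper names. The motion vector is expressed as the (row, column) offset from the block's position in the current image to its best match in the reference.

```python
def sad(reference, current, ref_y, ref_x, cur_y, cur_x, size):
    """Sum of absolute differences between a reference block and a current block."""
    return sum(abs(current[cur_y + dy][cur_x + dx] - reference[ref_y + dy][ref_x + dx])
               for dy in range(size) for dx in range(size))

def find_motion_vector(reference, current, block_y, block_x, size=16, search_range=2):
    """Exhaustively search the reference frame for the best match of one block."""
    height, width = len(reference), len(reference[0])
    best_vector = (0, 0)
    best_cost = sad(reference, current, block_y, block_x, block_y, block_x, size)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ref_y, ref_x = block_y + dy, block_x + dx
            # Skip candidate blocks that fall outside the reference frame.
            if ref_y < 0 or ref_x < 0 or ref_y + size > height or ref_x + size > width:
                continue
            cost = sad(reference, current, ref_y, ref_x, block_y, block_x, size)
            if cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost

def make_frame(patch_y, patch_x, size=8):
    """Build a dark test frame containing one bright 2 x 2 patch."""
    frame = [[0] * size for _ in range(size)]
    for y in range(patch_y, patch_y + 2):
        for x in range(patch_x, patch_x + 2):
            frame[y][x] = 200
    return frame

reference = make_frame(2, 2)   # patch at columns 2-3
current = make_frame(2, 3)     # the patch has moved one pixel to the right

vector, cost = find_motion_vector(reference, current, 2, 2, size=4)
print(vector, cost)  # (0, -1) 0
```

A cost of zero means the block is described completely by its motion vector, so no residual image data is needed for it. Smaller blocks make such exact matches more likely when only part of a macroblock moves.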
Fig. 3 MPEG-4 Motion Compensation Blocks
Fig. 2 MPEG-4 GOV Structure
*2 The default GOV setting of the SNC-RX550/RZ50/CS50 Series of network cameras is one second. The length of a GOV can be set between one and five
seconds.
*3 The actual prediction process utilizes a number of feedback loops and complicated algorithms including triggers to reset the I-VOP when there are
extreme movement patterns. This method helps to accurately produce motion vectors. Further technical details are beyond the scope of this paper.
Fig. 6 plot: PSNR (dB) on the vertical axis vs. bit rate (Kb/s) on the horizontal axis, with curves for JPEG, MPEG-4, and H.264. Video parameters: 10 frames/s, QCIF (176 x 144 pixels), 10 seconds of video (100 frames).
H.264
H.264 (or MPEG-4 Part 10) has been developed with the
aim of providing high-quality video at a much lower bit rate
than MPEG-4. A number of techniques for achieving
efficient compression are incorporated in H.264. One major
contributing factor is the improvement in motion prediction.
As in the case of MPEG-4, each image is divided into
blocks to predict movement. However, with H.264, the
block patterns can be a 16 x 16 pixel macroblock or any
combination of the seven options shown in Fig. 4 (e.g. 4 x
4 sub-blocks in the upper right quadrant of the
macroblock, an 8 x 8 sub-block in the upper left
quadrant, and an 8 x 16 sub-block in the lower half, as
shown in Fig. 5). The block pattern is variably determined
depending on the amount and speed of movement within
the image. If an area of the image has little movement,
the algorithm utilizes large blocks (such as 16 x 16 pixels
or 8 x 8 pixels) to predict the difference between the
previous VOP and the current image. However, where an
area of the image includes significant motion, the
algorithm utilizes smaller blocks for prediction. By
dynamically adapting the size of each block to the
amount of motion, the prediction accuracy for each block
is significantly improved. Because the predicted data is
more accurate, less image data needs to be transmitted;
therefore, compression efficiency is greatly improved
when compared to MPEG-4.
Though motion prediction using variable block sizes
increases prediction accuracy and minimizes the amount
of data to be transmitted, it does require greater
processing power within the codec.
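The idea of adapting block size to the amount of motion can be sketched as a recursive split. This is a deliberately simplified illustration and not the H.264 mode decision itself: it uses only square blocks (16 x 16, 8 x 8, 4 x 4) and omits the rectangular partitions, it measures "motion" as the plain sum of absolute differences between co-located pixels, and the threshold of 100 and all function names are assumptions made for this example.

```python
def activity(reference, current, y, x, size):
    """Amount of change inside one block, measured as a sum of absolute differences."""
    return sum(abs(current[y + dy][x + dx] - reference[y + dy][x + dx])
               for dy in range(size) for dx in range(size))

def choose_partitions(reference, current, y=0, x=0, size=16, threshold=100, min_size=4):
    """Keep a block whole when it changes little; otherwise split it into four."""
    if size == min_size or activity(reference, current, y, x, size) <= threshold:
        return [(y, x, size)]
    half = size // 2
    partitions = []
    for offset_y in (0, half):
        for offset_x in (0, half):
            partitions += choose_partitions(reference, current, y + offset_y,
                                            x + offset_x, half, threshold, min_size)
    return partitions

reference = [[0] * 16 for _ in range(16)]

# A macroblock with no change stays as one 16 x 16 block.
static = [row[:] for row in reference]
print(choose_partitions(reference, static))   # [(0, 0, 16)]

# A change confined to the upper left corner splits only that quadrant.
moving = [row[:] for row in reference]
moving[1][1] = 255
partitions = choose_partitions(reference, moving)
print(len(partitions))   # 7: four 4 x 4 blocks plus three 8 x 8 blocks
```

The recursion also hints at the processing cost mentioned above: every additional candidate block size is another set of comparisons the encoder has to evaluate.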
JPEG/MPEG-4/H.264 Comparison
The differences between the JPEG, MPEG-4, and H.264
compression formats have been explained in the above
sections. Here, let’s relate picture quality to transmission
bit rate.
Fig. 6 is a graph depicting the picture quality vs. the bit
rate of these three compression formats.*4 The vertical
axis (PSNR level) expresses the picture quality, and the
horizontal axis expresses the transmission bit rate. PSNR
(Peak Signal-to-Noise Ratio) is a metric widely used by
engineers to measure the “quality” of compressed video
images.
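For readers unfamiliar with the metric, PSNR is commonly computed from the mean squared error (MSE) between the original and the compressed image as 10 x log10(MAX^2 / MSE), where MAX is the largest possible pixel value. The sketch below applies that standard definition to small 8-bit grayscale frames; the function name and sample pixel values are assumptions for illustration and are unrelated to the measurements behind Fig. 6.

```python
import math

def psnr(original, compressed, max_value=255):
    """Peak Signal-to-Noise Ratio in dB between two equally sized frames."""
    squared_error = 0
    pixel_count = 0
    for orig_row, comp_row in zip(original, compressed):
        for o, c in zip(orig_row, comp_row):
            squared_error += (o - c) ** 2
            pixel_count += 1
    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)

original = [[100, 120], [140, 160]]
compressed = [[101, 119], [141, 159]]   # every pixel is off by one level

print(round(psnr(original, compressed), 2))  # about 48.13 dB
```

A higher PSNR means the compressed image is closer to the original, which is why the curves are compared at equal PSNR values.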
At a PSNR of 35 dB, JPEG images are transmitted at
approximately 260 Kb/s, while MPEG-4 transmits at
approximately 85 Kb/s and H.264 transmits at approximately 50 Kb/s. To
put this into perspective, MPEG-4 requires approximately
one-third of the bandwidth used by JPEG, and H.264
requires approximately one-fifth.
In summary, both MPEG-4 and H.264 are ideal for image
transfer over a network because they require much less
network bandwidth than JPEG.
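The one-third and one-fifth figures follow directly from the approximate bit rates quoted above at a PSNR of 35 dB. The short calculation below reproduces them; the dictionary and function names are assumptions for this example, and the bit rates apply only to the sample video used for Fig. 6.

```python
# Approximate bit rates at a PSNR of 35 dB, as read from Fig. 6.
bit_rates_kbps = {"JPEG": 260, "MPEG-4": 85, "H.264": 50}

def relative_bandwidth(rates, baseline="JPEG"):
    """Express each codec's bit rate as a fraction of the baseline codec's."""
    return {codec: rate / rates[baseline] for codec, rate in rates.items()}

ratios = relative_bandwidth(bit_rates_kbps)
print(round(ratios["MPEG-4"], 2))  # 0.33, roughly one-third of JPEG
print(round(ratios["H.264"], 2))   # 0.19, roughly one-fifth of JPEG
```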
Fig. 4 H.264 Motion Compensation Blocks
Fig. 5 H.264 Combination Block Pattern
*4 The graph shows just one example of comparing bit rates at which JPEG, MPEG-4, and H.264 images can be transmitted. Actual bit rates for
transmitting data using these three compression formats differ with image quality and image size settings.
Fig. 6 Comparison Between H.264, MPEG-4, and JPEG (picture quality vs. bit rate)