Xilinx is providing this product documentation, hereinafter “Information,” to you “AS IS” with no warranty of any kind, express or implied.
Xilinx makes no representation that the Information, or any particular implementation thereof, is free from any claims of infringement. You
are responsible for obtaining any rights you may require for any implementation based on the Information. All specifications are subject to
change without notice.
XILINX EXPRESSLY DISCLAIMS ANY WARRANTY WHATSOEVER WITH RESPECT TO THE ADEQUACY OF THE INFORMATION OR
ANY IMPLEMENTATION BASED THEREON, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OR REPRESENTATIONS THAT
THIS IMPLEMENTATION IS FREE FROM CLAIMS OF INFRINGEMENT AND ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.
Except as stated herein, none of the Information may be copied, reproduced, distributed, republished, downloaded, displayed, posted, or
transmitted in any form or by any means including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without
the prior written consent of Xilinx.
Video Scaler v4.0 User Guide, UG805, March 1, 2011, www.xilinx.com
About This Guide
The LogiCORE™ IP Video Scaler v4.0 User Guide provides information about generating the
Video Scaler core, customizing and simulating the core using the provided example
design, and running the design files through implementation using the Xilinx tools.
Guide Contents
This manual contains the following chapters:
•Chapter 1, Introduction introduces the Xilinx Video Scaler core and provides related
information, including recommended design experience, additional resources,
technical support, and submitting feedback to Xilinx.
•Chapter 2, Overview illustrates examples of video scaler applications.
•Chapter 3, Implementation elaborates on the internal structure in the core and
describes interfacing.
•Chapter 4, Video I/O Interface and Timing describes how to drive the input timing
signals so the scaler can be operated correctly. It also describes the data output signals
and their relation to the output data.
•Chapter 5, Scaler Architectures describes Single-engine for sequential YC processing,
Dual Engine for parallel YC processing, and Triple engine for parallel RGB/4:4:4
processing.
•Chapter 6, Control Interface discusses the three control interface options available to
the user in CORE Generator™ software: EDK pCore, GPP and Constant.
•Chapter 7, Scaler Aperture explains how to define the scaler aperture using the
appropriate dynamic control registers.
•Chapter 8, Coefficients describes the coefficients used by both the Vertical and
Horizontal filter portions of the scaler, in terms of number, range, formatting and
download procedures.
•Chapter 9, Performance emphasizes the importance of available clock rate and
provides some worst-case conversion examples.
•Appendix A, Use Cases illustrates two likely usage scenarios for the video scaler.
•Appendix B, Programmer Guide provides a description of how to program and
control the data flow for the video scaler hardware pCore.
•Appendix C, System Level Design provides an example design extracted from a
known, working EDK project, including other Video IP blocks.
Preface
Additional Resources
To find additional documentation, see the Xilinx website at:
To search the Answer Database of silicon, software, and IP questions and answers, or to
create a technical support WebCase, see the Xilinx website at:
http://www.xilinx.com/support/mysupport.htm
Conventions
This document uses the following conventions. An example illustrates each convention.
The following typographical conventions are used in this document:

Convention: Courier font
Meaning or Use: Messages, prompts, and program files that the system displays
Example: speed grade: - 100

Convention: Courier bold
Meaning or Use: Literal commands that you enter in a syntactical statement
Example: ngdbuild design_name

Convention: Helvetica bold
Meaning or Use: Commands that you select from a menu
Example: File → Open
Meaning or Use: Keyboard shortcuts
Example: Ctrl+C

Convention: Italic font
Meaning or Use: Variables in a syntax statement for which you must supply values
Example: ngdbuild design_name
Meaning or Use: References to other manuals
Example: See the User Guide for more information.
Meaning or Use: Emphasis in text
Example: If a wire is drawn so that it overlaps the pin of a symbol, the two nets are not connected.

Convention: Dark Shading
Meaning or Use: Items that are not supported or reserved
Example: This feature is not supported

Convention: Square brackets [ ]
Meaning or Use: An optional entry or parameter. However, in bus specifications, such as bus[7:0], they are required.
Example: ngdbuild [option_name] design_name

Convention: Braces { }
Meaning or Use: A list of items from which you must choose one or more
Example: lowpwr ={on|off}

Convention: Vertical bar |
Meaning or Use: Separates items in a list of choices
Example: lowpwr ={on|off}

Convention: Angle brackets < >
Meaning or Use: User-defined variable or in code samples
Example: <directory name>

Convention: Vertical ellipsis
Meaning or Use: Repetitive material that has been omitted
Example:
IOB #1: Name = QOUT’
IOB #2: Name = CLKIN’
.
.
.

Convention: Horizontal ellipsis . . .
Meaning or Use: Repetitive material that has been omitted
Example: allow block block_name loc1 loc2 ... locn;

Convention: Notations
Meaning or Use: The prefix ‘0x’ or the suffix ‘h’ indicate hexadecimal notation
Example: A read of address 0x00112975 returned 45524943h.
Meaning or Use: An ‘_n’ means the signal is active low
Example: usr_teof_n is active low.

Online Document
The following conventions are used in this document:

Convention: Blue text
Meaning or Use: Cross-reference link to a location in the current document
Example: See Additional Resources, page 12, for details. See Chapter 3, Basic Architecture for details.

Convention: Blue, underlined text
Meaning or Use: Hyperlink to a website (URL)
Example: Go to www.xilinx.com for the latest speed files.
Chapter 1: Introduction
This chapter introduces the Video Scaler core and provides related information, including
recommended design experience, additional resources, technical support, and submitting
feedback to Xilinx.
About the Core
The Video Scaler core is a Xilinx CORE Generator™ IP core, included in the latest IP
Update on the Xilinx IP Center. For detailed information about the core, see the Video
Scaler product page at www.xilinx.com/products/ipcenter/EF-DI-VID-SCALER.htm.
Recommended Experience
Although the Video Scaler core is a fully verified solution, the challenge associated with
implementing a complete design varies depending on the configuration and functionality
of the application. For best results, previous experience building high performance,
pipelined FPGA designs using Xilinx implementation software and UCF is recommended.
Contact your local Xilinx representative for a closer review and estimation for your specific
requirements.
Additional Core Resources
For detailed information about video scaler technology and updates to the Video Scaler
core, see the following:
Documentation
From the Video Scaler product page:
•Video Scaler Data Sheet
•Video Scaler Release Notes
Technical Support
For technical support, visit www.xilinx.com/support. Questions are routed to a team of
engineers with expertise using the Video Scaler core.
Xilinx will provide technical support for use of this product as described in the
LogiCORE™ IP Video Scaler User Guide. Xilinx cannot guarantee timing, functionality, or
support of this product for designs that do not follow these guidelines.
Providing Feedback
Xilinx welcomes comments and suggestions about the Video Scaler core and the
documentation supplied with the core.
Core
For comments or suggestions about the Video Scaler core, submit a WebCase from
www.xilinx.com/support. Be sure to include the following information:
•Product name
•Core version number
•Explanation of your comments
Documentation
For comments or suggestions about this document, submit a WebCase from
www.xilinx.com/support. Be sure to include the following information:
•Document title
•Document number
•Page number(s) to which your comments refer
•Explanation of your comments
Nomenclature
The following are defined for the purposes of this document:
Table 1-1: Nomenclature

Term: Scaler Aperture
Definition: The input data rectangle used to create the output data rectangle.

Term: Filter Aperture
Definition: The group of contributory data used in a filter to generate one particular output. The number of elements in this group of data is the number of taps. We define the filter aperture size using the num_h_taps and num_v_taps parameters.

Term: Coefficient Phase
Definition: Each tap is multiplied by a coefficient to make its contribution to the output pixel. The coefficients used are selected from a “phase” of num_x_taps coefficients. The phase selection is dependent upon the position of the output pixel in the input sampling grid space. For each dimension of the filter, each coefficient phase consists of num_h_taps or num_v_taps coefficients.

Term: Channel
Definition: For scaler purposes, all monochromatic video streams, for example Y, Cb, Cr, R, G, B, are considered separate channels.

Term: Coefficient Phase Index
Definition: An index that selects the coefficient phase applied to one filter aperture in a FIR. For an n-tap filter, this index points to n coefficients.

Term: Coefficient Bank
Definition: A group of coefficients that will be applied to one video component (Y or C) in one dimension (H or V) for a conversion of one frame. It includes all phases. For an n-tap, m-phase filter, a coefficient bank comprises n x m values. Each tap may be multiplied by any one of m coefficients assigned to it, selected by the phase index, which is applied to all taps.

Term: Coefficient Set
Definition: A group of four coefficient banks (VY, VC, HY, HC). One full set should be written into the scaler before use.
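As a rough illustration of the Coefficient Bank and Coefficient Set definitions, the Python sketch below counts coefficient values. The function names and the example tap and phase counts are hypothetical, and the sketch assumes that all four banks use the same number of phases, which is not stated in this chapter.

```python
def bank_size(num_taps: int, num_phases: int) -> int:
    """Values in one coefficient bank: n taps x m phases."""
    return num_taps * num_phases


def set_size(num_h_taps: int, num_v_taps: int, num_phases: int) -> int:
    """Values in one coefficient set: banks VY, VC, HY, HC.

    Assumes the Y and C banks in each dimension have the same size.
    """
    vertical = 2 * bank_size(num_v_taps, num_phases)    # VY + VC
    horizontal = 2 * bank_size(num_h_taps, num_phases)  # HY + HC
    return vertical + horizontal


# Hypothetical configuration: 4 taps in each dimension, 64 phases.
print(bank_size(4, 64))    # 256
print(set_size(4, 4, 64))  # 1024
```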
Chapter 2: Overview
Video scaling is the process of converting an input color image of dimensions Xin pixels by
Yin lines to an output color image of dimensions Xout pixels by Yout lines.
Within predefined limits, the Xilinx Video Scaler supports the modification of the Xin, Yin,
Xout, Yout input parameters during run-time on a frame basis. Furthermore, you may also
dynamically crop selected subject area from the input image prior to scaling that area. This
dynamic combination lends itself well to applications that require shrink and zoom
functionality.
The Xilinx Video Scaler supports real-time video inputs and memory interface inputs (that
is, a frame buffer). When connected to a real-time input source, the input clock and
horizontal and vertical (H/V) timing signals come directly from the input video stream. In
the case of a memory interface, standard memory handshaking signals may be used in
place of the H/V timing signals.
While maintaining image quality is usually of primary interest, it is subjective and heavily
dependent upon the end application. Moreover, image quality comes at a price in terms of
FPGA resources. Hence, while the core structure and architecture of the scaler is
maintained for all applications, flexibility is made paramount to enable users from all
applications to use this IP.
Chapter 3: Implementation
This chapter elaborates on the internal structure in the core and describes interfacing.
Basic Architecture
The Xilinx Video Scaler LogiCORE™ IP converts a specified rectangular area of an input
digital video image from the original sampling grid to a desired target sampling grid
(Figure 3-1).
Figure 3-1: High Level View of the Functionality
(Diagram: Video Rectangle In, dimensions Xin x Yin, passes through the Video Scaler to produce Video Rectangle Out, dimensions Xout x Yout.)
The input image must be provided in raster scan format (left to right and top to bottom).
The valid outputs will also be given in this order.
The Xilinx Video Scaler makes few assumptions regarding the origin or the destination of
the video data. The input could be fed in real-time from a live video feed, or it could be
read from an external memory. The output could feed directly to another processing stage
in real time, but also could feed an external frame buffer (for example, for a VGA controller,
or a Picture-in-Picture controller). Whatever the configuration, you must assess, given the
clock-frequency available, how much time is available for scaling, and define:
1.Whether to source the scaler using live video or an input-side frame buffer, and
2.Whether the scaler feeds out directly to the next stage or to an output-side frame
buffer.
When using a live video input source, you have no control over the video timing signals.
Hence, the specific requirements must allow for this. For example, when up-scaling by a
factor of 2, two lines must be output for every input line. The scaler core clock-rate (‘clk’)
must allow for this, especially considering the architectural specifics within the scaler that
take advantage of the high speed features of the FPGA to allow for resource sharing.
Feeding data from an input frame buffer is more costly, but it lets you read the required
data as needed while still leaving one full “frame” period in which to process it.
(Figure 3-2 diagram labels: Data Flow, Control Flow, and Clocks. Write side control signals: video_in_clk, active_video_in, line_request, hblank_in, vblank_in, video_data_in. Read side control signals: video_out_clk, video_data_out, video_out_almost_full, video_out_we. Blocks: Async Input Line Buffer, Scaler Module, and Async Output Line Buffers within the Scaler Core, with Control State Machines; the Scaler Module runs on clk.)
Some observations (not exclusively true for all conversions):
•Generally, when up-scaling, or dealing with high definition (HD) rates, it is simplest
to use an input-side frame buffer. This does depend upon the available clock rates.
•When down-scaling, it is often the case that the input-side frame buffer is not
required, because for every input line the scaler is required to generate a maximum of
one valid output line.
•Generally, the output data does not conform to any standard. It is therefore not
possible to feed the output directly to a display driver. Usually, a frame buffer is
ultimately required to smooth the output data over an output frame period. The
output video stream is described later.
I/O Buffering, Clock Domains
Figure 3-2 shows the top level buffering, indicating the different clock domains, and the
scope of the control state-machines.
Figure 3-2: Simplified Top Level Block Diagram, Indicating Clock-domains
To support the many possibilities of input and output configurations, and to take
advantage of the fast FPGA fabric, the scaler core uses a separate clock domain from that
used in controlling data I/O. More information is given in Chapter 9, Performance about
how to calculate the minimum required operational clock frequency. It is also possible to
read the output of the scaler using a third clock domain. These clock domains are isolated
from each other using asynchronous line buffers as shown in Figure 3-2. The control
state-machines monitor the I/O line buffers. They also monitor the current input and output line
numbers.
Chapter 4: Video I/O Interface and Timing
CORE Generator™ software provides two interface options for provision of the video data
into the video scaler core.
1.Live – standard format video signal, along with synchronization signals to be driven
directly into the core.
2.Memory – an internal memory arbiter is included in the core, so the active video area
may be accessed from an external memory block.
Data Source: Live Video
Input Data and Timing Signals
•General Input Handshaking Principles
•Hblank_in Input
•Vblank_in Input
•Frame_rst Signal
•Active_video_in Input
General Input Handshaking Principles
The input data is written into an internal double-buffered line buffer. Availability of space
for one entire line of data is indicated by a high level on the line_request output. One
line of data, of a length up to max_samples_in_per_line, may be written to this buffer
without the need for further arbitration. Following the first valid pixel-write operation to
this line buffer, the line_request output will be driven low by the scaler. This signal
may rise a few (> 3) clock cycles later to indicate availability of the other half of the double
buffer. The number of clock cycles is dependent on the current conversion.
Valid video data is written into the input line buffer using active_video_in as a write-enable. This is shown in Figure 4-1 for the 8-bit 4:2:2 case. The active_video_in signal must remain in a high state for the duration of the active input line.
Figure 4-1: Scaler 8-bit 4:2:2 Input Timing
(Timing diagram: video_in_clk, line_request, and active_video_in, with video_data_in(7:0) carrying luma samples Y0, Y1, ..., Yn, Yn+1, Yn+2, Yn+3, ..., Ysize-1 and video_data_in(15:8) carrying interleaved chroma samples Cb0, Cr0, ..., Cbn, Crn, Cbn+2, Crn+2, ..., Crsize-2.)
The scaler is capable of accepting and delivering 4:4:4 (e.g., RGB), 4:2:2, and 4:2:0 chroma
formats. It will not convert between chroma formats. For delivery of 4:4:4 video data, a
third channel would be added to this diagram, and the three channels would be either R,
G, and B or Y, Cb, and Cr. It is necessary to clarify the I/O format. For bandwidth, 4:2:0 is
essentially the same as 4:2:2 horizontally, but is half the bandwidth vertically. Different
signaling is required for the delivery of the YC 4:2:2 and YC 4:2:0 chroma systems. The
luma (Y) input is a full bandwidth 8-bit input on video_data_in[7:0]. The chroma for
both 4:2:0 and 4:2:2 is also a full-bandwidth input on
video_data_in[(data_width*2)-1:data_width], but Cb and Cr are interleaved
on a pixel basis, as shown in Figure 4-1 for the 8-bit case. An additional input
active_chroma_in is required in the 4:2:0 case. This must be asserted high on all lines
for 4:2:2, but only for alternate lines for 4:2:0, as shown in Figure 4-2.
Figure 4-2 (timing diagram): chroma_in shown against video_data_in(7:0) (Luma) and video_data_in(15:8) (Chroma) over four lines. Chroma is valid on Line 1 and Line 3 and not valid (N/V) on Line 2 and Line 4.
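The channel layout described for 4:2:2 and 4:2:0 input can be sketched in software. The following Python model is not part of the core; the function names and the list-based sample representation are assumptions for illustration. It packs one line of samples into video_data_in words, with luma in the low data_width bits and interleaved Cb/Cr in the next data_width bits, and derives a per-line active_chroma_in level, assuming valid chroma falls on the first line of each pair for 4:2:0.

```python
def pack_422_line(y, cb, cr, data_width=8):
    """Pack one line into video_data_in words.

    y has one sample per pixel; cb and cr have one sample per pixel pair.
    Even pixels carry Cb, odd pixels carry Cr, as in the 8-bit 4:2:2 case.
    """
    if len(y) % 2 or len(cb) != len(y) // 2 or len(cr) != len(y) // 2:
        raise ValueError("expected len(y) even and len(cb) == len(cr) == len(y) / 2")
    mask = (1 << data_width) - 1
    words = []
    for i, luma in enumerate(y):
        chroma = cb[i // 2] if i % 2 == 0 else cr[i // 2]
        words.append(((chroma & mask) << data_width) | (luma & mask))
    return words


def active_chroma_in_per_line(num_lines, chroma_format):
    """Per-line active_chroma_in level: all lines for 4:2:2, alternate lines for 4:2:0."""
    if chroma_format == "4:2:2":
        return [1] * num_lines
    if chroma_format == "4:2:0":
        return [1 if line % 2 == 0 else 0 for line in range(num_lines)]
    raise ValueError("unsupported chroma format")
```

For example, `pack_422_line([0x10, 0x11], [0xA0], [0xB0])` places Cb alongside the first luma sample and Cr alongside the second.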
When running the scaler using Live Mode, you are likely to derive the active_video_in
from timing signals such as horizontal sync or embedded flags like EAV and SAV. In this
case, you will have calculated that the line-rate at the input, often defined by the input
video format, is sufficiently low that the host system will never need to wait for the
line_request signal to be asserted.
Conversely, you may calculate that this is not possible, and that the scaler must
hold off the input data. In that case, hold off the write operation for a new line while
line_request is deasserted. Because it is impossible to hold off a live video feed, the
data must be fed (directly or indirectly) from a frame buffer, with the appropriate external
control provided (Memory Mode).
Hblank_in Input
The horizontal blanking input signal hblank_in is generally used as a line-based reset. It
must be provided to the scaler core in the same clock domain as the video data
(video_in_clk).
The hblank_in signal is used to perform the following operations:
•Reset an internal input pixel counter.
•Reset the internal input side line buffer write-address pointer.
•Increment the input line counter (rising edge of hblank_in).
•Decode the input line count during active data period to open and close an internal
processing “window.”
•Decode the input line count to create a delayed internal frame-based reset signal
(frame_rst) during vblank_in. The line-number is specified in the CORE
Generator GUI (Frame Reset line Number).
The timing of hblank_in must satisfy the following criteria:
•It must be low for the active-data duration of the input line.
•It must be high for a period greater than or equal to 100 video_in_clk-cycles in
duration, once per line. This allows the scaler time to handle inherent line-based
latency in the filters.
•It must be low for a period greater than or equal to 32 video_in_clk-cycles in
duration, once per line.
The hblank_in input must be tied to the horizontal blanking signal provided with the
input video stream. Also, you may choose to use the inverse of hblank_in to create the
active_video_in signal (see the Active_video_in Input section).
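The hblank_in duration criteria lend themselves to a quick testbench-side check. The sketch below is a hypothetical Python helper, not part of the core deliverables. It takes one line period of hblank_in samples, one per video_in_clk cycle, and verifies the minimum high (100 cycles) and low (32 cycles) durations. It assumes each line period contains a single high pulse.

```python
from itertools import groupby

MIN_HBLANK_HIGH_CYCLES = 100  # minimum high period per line
MIN_HBLANK_LOW_CYCLES = 32    # minimum low period per line


def longest_run(samples, level):
    """Length of the longest run of `level` in a list of 0/1 samples."""
    runs = [len(list(group)) for value, group in groupby(samples) if value == level]
    return max(runs, default=0)


def hblank_line_ok(hblank_samples):
    """Check one line period of hblank_in samples against both duration limits."""
    high_ok = longest_run(hblank_samples, 1) >= MIN_HBLANK_HIGH_CYCLES
    low_ok = longest_run(hblank_samples, 0) >= MIN_HBLANK_LOW_CYCLES
    return high_ok and low_ok


# Hypothetical line: 1280 active cycles followed by 370 blanking cycles.
line = [0] * 1280 + [1] * 370
print(hblank_line_ok(line))  # True
```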
Vblank_in Input
The vertical blanking input signal vblank_in is generally used as a frame-based reset. It
must be provided into the scaler core on the same clock domain as the video data
(video_in_clk).
The vblank_in signal is used to perform the following operations:
•Reset input line counter (both edges).
•Generate internal frame-based reset signal (frame_rst) during vertical blanking.
In Live Video mode, Frame Reset Line Number must be set to a value that is lower than
the number of line periods for which vblank_in remains high between frames. To
characterize this further, hblank_in must transition high a larger number of times than Frame Reset Line Number while vblank_in is high.
The vblank_in input must be tied to the vertical blanking signal provided with the input
video stream.
Frame_rst Signal
To maximize robustness of the scaler core, it is preferable to reset internal state-machines,
FIFOs and other processes once per frame. Owing to inherent multi-line period latency in
the system, it is not possible to use vblank_in for this purpose. During vblank_in, hblank_in must continue to be active (as per most video formats). Frame_rst is
generated when the number of hblank_in pulses equals the Frame Reset Line Number
specified in the CORE Generator/EDK GUI. Figure 4-3 is a screen shot from simulation,
showing the relationship between vblank_in, hblank_in and Frame_rst. The line
count shown is an internal counter included in this image for clarity. To achieve the case
illustrated, enter the value 22 into the CORE Generator GUI or pCore GUI.
The Frame_rst signal is used to perform the following operations:
•Trigger the transfer of coefficients from the coefficient FIFO to the coefficient stores if and only if a full set of coefficients exists in the FIFO.
•Trigger the transfer of control register values from the scaler core pins to internal
“active” registers, ready for use during the next frame. Setting bit 1 of the Control
register to 0 prevents this transfer from happening.
•Reset read- and write-pointers of input and output line buffers.
•Reset internal state-machine to indicate next input line as the top line in a frame.
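To illustrate how Frame Reset Line Number relates to the blanking signals, here is a small behavioral sketch in Python. It is an interpretation of the description in this chapter rather than the core's actual logic: the function name and the sample-list inputs are hypothetical, and it assumes the line counter is cleared on every edge of vblank_in and incremented on each rising edge of hblank_in.

```python
def frame_rst_cycles(vblank, hblank, frame_reset_line_number):
    """Return the sample indices at which the modeled frame_rst would fire.

    vblank and hblank are equal-length lists of 0/1 samples, one per
    video_in_clk cycle. Both signals are assumed low before the first sample.
    """
    fired = []
    line_count = 0
    prev_v, prev_h = 0, 0
    for i, (v, h) in enumerate(zip(vblank, hblank)):
        if v != prev_v:
            line_count = 0  # vblank_in edges reset the line counter
        if h and not prev_h:
            line_count += 1  # rising edge of hblank_in counts a line
            if v and line_count == frame_reset_line_number:
                fired.append(i)
        prev_v, prev_h = v, h
    return fired
```

In this model, a Frame Reset Line Number larger than the number of hblank_in pulses seen during vblank_in never produces a reset, which is why the value must be lower than the number of blanking lines.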
Active_video_in Input
The active_video_in signal is generally used as an input data validation signal. It must
be provided into the scaler core on the same clock domain as the video data
(video_in_clk).
The timing of active_video_in must satisfy the following criteria:
•The first low-to-high transition will coincide with the first active data value for the
current line.
•This signal must be low when hblank_in is high.
•Following the transition from low to high, active_video_in must not transition
low during the active period of the current line. Following a high-to-low transition, a
pulse on the hblank_in signal must occur as described previously in the Hblank_in
Input section.
•For each line, while hblank_in = 0, the active_video_in signal must remain high
for at least ApertureEndPixel+1 cycles. For example, to scale an entire 720P image, set
ApertureStartPixel = 0, ApertureEndPixel=1279.
If hblank_in is driven high before this has occurred, the line will not be
acknowledged by the scaler. You provide ApertureEndPixel as an input to the
scaler.
You may choose to use the inverse of hblank_in to create the active_video_in signal.
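The active_video_in rules can be checked in a similar way. This hypothetical Python helper examines one line of hblank_in and active_video_in samples (one per video_in_clk cycle) and reports whether the line satisfies the criteria in this section. The argument aperture_end_pixel stands for the ApertureEndPixel value; the function name is an assumption for illustration.

```python
def active_video_line_ok(hblank, active, aperture_end_pixel):
    """Check one line of samples against the active_video_in criteria."""
    if len(hblank) != len(active):
        raise ValueError("sample lists must have equal length")
    # active_video_in must be low whenever hblank_in is high.
    if any(h and a for h, a in zip(hblank, active)):
        return False
    # Once high, active_video_in must stay high for the whole active
    # period, so only one rising edge is allowed per line.
    rising_edges = sum(
        1 for prev, cur in zip([0] + list(active), active) if cur and not prev
    )
    if rising_edges != 1:
        return False
    # The high period must last at least ApertureEndPixel + 1 cycles.
    return sum(active) >= aperture_end_pixel + 1
```

For the full 720P aperture mentioned above (ApertureEndPixel = 1279), a line passes only if active_video_in is high for at least 1280 consecutive cycles while hblank_in is low.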
Data Source: Memory
This mode is primarily intended for use with a memory controller with rectangular access
capability such as the VFBC port on the MPMC. The VFBC port must be configured to
provide the amount of data that the scaler is expecting for each frame. The port must
contain sufficient buffering for at least one horizontal line of the input video rectangle.
When this video interface mode has been selected in CORE Generator, hblank_in, vblank_in, and active_video_in timing signals are not required. Also, the video data
must be fed into the scaler core via the rd_data port instead of the video_data_in port.
The rd_almost_empty signal must be asserted when the port has less than one line
available in the buffer.
When rd_almost_empty is low and the scaler is ready to accept a new line of input data,
it asserts the rd_re signal high. This signal will remain high for the duration of one line
period (determined by aperture_start_pixel and aperture_end_pixel). The first
(left-most) valid data pixel must be driven onto the rd_data port one clock cycle after
rd_re has been asserted. See Figure 4-4.
Figure 4-4: Interface Timing for Memory Source Mode
It is important for the scaler core to have a concept of frame synchronization so that top-edge filtering may be performed cleanly. For this purpose, you must also supply a vertical
synchronization pulse vsync_in once per frame, before the input of the top line. Only the
rising edge of vsync_in is used internally. It should be provided in the video_in_clk
domain.
In this mode, cropping is not possible within the scaler itself as in Live Video mode.
aperture_start_pixel and aperture_start_line must be set to 0. Cropping can
be achieved using memory offsets. The first pixel and line provided to the scaler will
always be included in the horizontal and vertical apertures.
(Output timing diagram labels: video_out_clk, video_out_we, video_data_out (Luma), and video_data_out (Chroma); the data on both channels is marked Valid, then Not Valid, then Valid.)
Output Data and Timing Signals
Although driving the scaler input using a direct standard video feed is supported, the
equivalent cannot be said for the scaler output. Because of the bursty nature of the vertical
filter portion of the scaling operation, the required size of the output buffering would be
prohibitive. This would be more aptly targeted to an external memory interface, which is
beyond the scope of this LogiCORE™ IP. However, you may decide that your system
can directly handle the bursty data output from the scaler, provided valid data is indicated
by the core. Consequently, simple handshaking is achieved using the video_out_we and
video_out_almost_full signals.
When a line of data becomes available in the output buffer, and the
video_out_almost_full flag is low, the video_out_we flag is asserted as shown in
The video_out_almost_full input is provided to throttle the output from the scaler.
When this is asserted high for a number of line periods, the line_request signal will be
deasserted due to back-pressure through the scaler. If video_out_almost_full is low
at the start of an output line, the entire line will be delivered. The target must de-assert
video_out_almost_full when it is ready to accept the entire line.
Upon completion of the final line requested according to the output_v_size parameter,
the scaler will send a pulse of six video_out_clk cycles on the output_frame_done
signal.
For 4:2:0 outputs, the valid chroma data output will be accompanied by a high level on the
chroma_out signal as shown in Figure 4-6.
Chapter 5: Scaler Architectures
The scaler supports the following possible arrangements of the internal filters.
•Option 1: Single-engine for sequential YC processing
•Option 2: Dual Engine for parallel YC processing
•Option 3: Triple engine for parallel RGB/4:4:4 processing
When using RGB/4:4:4, only Option 3 can be used. The choice between Option 1 and Option 2
significantly affects the trade-off between throughput and resource usage. These three options are
described in detail in this chapter.
Architecture Descriptions
Single-Engine for Sequential YC Processing
This is the most complex of the three options because Y, Cr, and Cb operations are
multiplexed through the same filter engine kernel.
One entire line of one channel (for example luma) is processed before the single-scaler
engine is dedicated to another channel of the same video line. The input buffering
arrangement allows for the channels to be separated on a line-basis. The internal data path
bit widths are shown in Figure 5-1, as implemented for a 4:2:2 or 4:2:0 scaler. DataWidth
may be set to 8, 10, or 12 bits.
The scaler module is flanked by buffers that are large enough to contain one line of data,
double buffered.
At the input, the line buffer size is determined by the parameter
max_samples_in_per_line. At the output, the line-buffer size is determined by the
parameter max_samples_out_per_line. These line buffers enable line-based
arbitration, and avoid pixel-based handshaking issues between the input and the scaler
core. The input line buffer also serves as the “most recent” vertical tap (that is, the lowest
in the image) in the vertical filter.
Figure 5-1: Internal Data Path Bitwidths for Single-Engine YC Mode
(Diagram blocks: Input Line Buffer, Scaler, Output Line Buffer (Y), and Output Line Buffer (Cb/Cr). Data paths are labeled 2*DataWidth at the input and output and 1*DataWidth around the Scaler.)
4:2:0 Special Requirements
When scaling 4:2:0 video, the vertical scale factor applied at the vsf input must not be
less than (2^20)*144/1080. This restriction has been included because Direct Mode 4:2:0 requires
additional input buffering to align the chroma vertical aperture with the correct luma
vertical aperture. In a later release of the video scaler, this restriction will be removed.
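A quick numeric check of this limit, as a Python sketch. The helper name is hypothetical; the threshold is taken directly from the restriction stated in this section, and vsf is assumed to be the integer value applied at the vsf input.

```python
from fractions import Fraction

# Minimum vsf for 4:2:0 operation, (2^20)*144/1080, kept exact.
MIN_VSF_420 = Fraction(2**20 * 144, 1080)


def vsf_allowed_for_420(vsf: int) -> bool:
    """True if vsf is not less than the 4:2:0 minimum."""
    return vsf >= MIN_VSF_420


print(float(MIN_VSF_420))            # about 139810.13
print(vsf_allowed_for_420(2**20))    # True: unity vertical scaling
print(vsf_allowed_for_420(100_000))  # False: below the limit
```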
Dual-Engine for Parallel YC Processing
For this architecture, separate engines are used to process Luma and Chroma channels in
parallel as shown in Figure 5-2.
Figure 5-2: Internal Data Path Bitwidths for Dual-Engine YC Mode
(Diagram: video_data_in (2*DataWidth) splits into a Luma (Y) input line buffer and a Chroma (Cr/Cb) input line buffer, each 1*DataWidth wide. Each feeds its own scaler engine, (Y) or (C), and output line buffer, and the two 1*DataWidth outputs combine into video_data_out (2*DataWidth).)
For the Chroma channel, Cr and Cb are processed sequentially. Due to overheads in
completing each component, the chroma channel operations for each line require slightly
more time than the Luma operation. It is worth noting also that the Y and C operations do
not work in synchrony.
Triple-Engine for RGB/4:4:4 Processing
For this architecture, separate engines are used to process the three channels in parallel, as
shown in Figure 5-3.
Figure 5-3: Internal Data Path Bitwidths for Triple-Engine RGB/4:4:4 Architecture
(Diagram: video_data_in (3*DataWidth) splits into Ch1, Ch2, and Ch3 input line buffers, each 1*DataWidth wide. Each channel has its own scaler engine and output line buffer, and the three 1*DataWidth outputs combine into video_data_out (3*DataWidth).)
For this case, all three channels are processed in synchrony.
GUI Operation
When the chroma format is specified as 4:4:4, the triple-engine parallel architecture is
always selected. Otherwise, selection between the YC Sequential or Parallel options may
be achieved automatically (YC Filter Configuration = Auto Select) or manually in the
CORE Generator GUI or the EDK GUI (see Figure 5-4).
The primary goal of selecting the correct architecture is to optimize resource usage, for a
given worst case operational scenario. When Auto Select is selected, the GUI tries to
establish what the user's worst case is from the following input parameters:
•Input maximum rectangle size
•Output maximum rectangle size
•Target Clock-frequency
•Desired Frame rate
Figure 5-4: Auto Select in GUI
The pseudo-code calculation made by the GUI for the Auto Select option is as follows:
if (TgtFrameRate <= MaxFrameRateOneComponent/2) then
    Use Single engine
else
    Use Dual engine
end if;
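The Auto Select rule above can be written as a small function. This is a minimal sketch, not the GUI's actual code: the function name is hypothetical, and MaxFrameRateOneComponent is passed in as a precomputed number because this section does not give the way the GUI derives it.

```python
def select_yc_engine(tgt_frame_rate: float,
                     max_frame_rate_one_component: float) -> str:
    """Mirror the Auto Select pseudo-code.

    Returns "single" when the target frame rate is no more than half of the
    maximum frame rate achievable for one component, otherwise "dual".
    """
    if tgt_frame_rate <= max_frame_rate_one_component / 2:
        return "single"
    return "dual"


# Example with made-up numbers: a 60 Hz target against two different
# single-component limits.
print(select_yc_engine(60.0, 150.0))
print(select_yc_engine(60.0, 100.0))
```

The comparison is inclusive, as in the pseudo-code, so a target of exactly half the single-component rate still selects the single engine.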
The Information tab (see Figure 5-5) in the CORE Generator GUI (not available in the EDK
GUI) shows the estimated maximum achievable frame rate given the above information,
using a calculation similar to the one above. Check this value; you may elect to force the
GUI to select one architecture or the other. This may be advisable in cases where, for
example, a per-frame overhead higher than 15% is needed. This overhead is intended as a
general way of representing inactive periods in a frame such as blanking, but also includes
filter flushing time, state-machine initialization, etc.
Figure 5-5: CORE Generator GUI Information Tab
Chapter 6: Control Interface
There are three control interface options available in CORE Generator™ software: EDK
pCore, GPP, or Constant. The interface types differ primarily in the method of delivery of
the user-defined control values and filter coefficients. These values are listed in the video
scaler data sheet DS840 in Table 2, under “Dynamic Control Register Interface.”
Control Values
This section briefly describes the function of each control value.
In GPP mode and pCore mode, these values are provided as dynamic inputs and may be
changed during runtime. The user inputs become active once per frame, after completion
of an output frame, using an internal active value capture register.
For the pCore version of the core, CORE Generator software provides the GPP core placed
in a wrapper that allows you to parameterize the scaler core in EDK. The ports are
driven by registers that sit on the AXI4-Lite bus. The address is decoded in the wrapper. A
MicroBlaze™ processor software driver is provided in source-code form to drive these
ports. Typical usage of the pCore is shown in Figure 6-1.
•aperture_start_pixel, aperture_end_pixel, aperture_start_line, aperture_end_line
These parameters define the size and location of the input rectangle. They are
explained in detail in Chapter 7, “Scaler Aperture.”
•output_h_size, output_v_size
These two parameters define the size of the output rectangle. They do not determine
anything about the target video format. You must determine what to do with the scaled
rectangle that emerges from the scaler core.
•hsf, vsf
These are the horizontal and vertical shrink-factors that must be supplied by the user.
They should be supplied as integers, and can typically be calculated as follows:
$$\mathit{hsf} = \operatorname{round}\left(\frac{\mathit{aperture\_end\_pixel} - \mathit{aperture\_start\_pixel} + 1}{\mathit{output\_h\_size}} \times 2^{20}\right)$$
and
$$\mathit{vsf} = \operatorname{round}\left(\frac{\mathit{aperture\_end\_line} - \mathit{aperture\_start\_line} + 1}{\mathit{output\_v\_size}} \times 2^{20}\right)$$
Hence, up-scaling is achieved using a shrink-factor value less than one. Down-scaling is
achieved with a shrink-factor greater than one.
You may wish to work this calculation backwards: for a desired scale-factor, calculate the
output size or the input size. This is application-dependent. Smooth zoom/shrink
applications may take advantage of this approach, coupled with usage of the start-phase
controls described below.
The allowed range of values on these parameters is 1/12 to 12: (0x015555 to 0xC00000).
•num_h_phases, num_v_phases
Although you must specify the maximum number of phases (max_phases) that the core
supports in the CORE Generator GUI, it is not necessary to run the core with a filter that
has that many phases. Under some scaling conditions, you may want a large number of
phases, but under others you may need only a few, or even only one. Non power-of-two
numbers of phases are supported.
•coef_wr_addr, h_coeff_set, v_coeff_set
In GPP and pCore interfaces, you may load coefficients. The scaler can store up to
max_coef_sets coefficient sets internally. coef_wr_addr selects the location of the set
to which you intend to write. The set may subsequently be used by controlling the
h_coeff_set and v_coeff_set values.
•start_hpa_y, start_hpa_c, start_vpa_y, start_vpa_c
These are the start-phase controls. Internally, the scaler accumulates the 24-bit
shrink-factor (hsf, vsf) to determine phase and filter aperture. These four values allow you
to preset the fractional part of the accumulations horizontally (hpa) and vertically (vpa) for
luma (y) and chroma (c).
When dealing with 4:2:2, luma and chroma are always vertically co-sited. Hence the
start_vpa_c value is ignored.
Usage of these parameters is important for scaling interlaced formats cleanly. On
successive input fields, the start_vpa_y value needs to be modified.
Also, when the desired result is a smooth shrink or zoom over a period of time, you may
get better results by changing these parameters for each frame.
The allowed range of values on these parameters is -0.99 to 0.99: (0x100001 to 0x0FFFFF).
The default value for these parameters is 0.
•control
The control register contains only two active bits. The default value for the control register
during continuous operation is “0x3.”
•bit 0 is a general purpose enable. Activated/deactivated on a vblank_in basis, a
value of 0 disables the scaler output.
•bit 1 enables values on the other register inputs to become internally active on a
vblank_in basis. A value of 0 prevents the active internal values from being
changed.
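The shrink-factor calculation and the register ranges described above can be sketched in software as follows. This is an illustrative sketch only: the function names are hypothetical and are not part of the delivered driver, and the 21-bit two's-complement encoding of the start-phase values is an assumption inferred from the stated range (0x100001 to 0x0FFFFF).

```python
import math

FRAC_BITS = 20  # shrink factors are scaled by 2^20, per the formulas above


def shrink_factor(aperture_start: int, aperture_end: int, output_size: int) -> int:
    """Return hsf or vsf as an integer for one dimension."""
    input_size = aperture_end - aperture_start + 1
    # Integer round-half-up of input_size / output_size * 2^20.
    sf = (2 * input_size * (1 << FRAC_BITS) + output_size) // (2 * output_size)
    # Allowed range is 1/12 to 12 (0x015555 to 0xC00000).
    if not 0x015555 <= sf <= 0xC00000:
        raise ValueError(f"shrink factor 0x{sf:06X} is outside 1/12 to 12")
    return sf


def encode_start_phase(phase: float) -> int:
    """Encode a start phase strictly between -1 and 1.

    ASSUMPTION: 21-bit two's complement with 20 fractional bits.
    """
    scaled = math.floor(phase * (1 << FRAC_BITS) + 0.5)
    limit = (1 << FRAC_BITS) - 1
    if not -limit <= scaled <= limit:
        raise ValueError("start phase must lie strictly between -1 and 1")
    return scaled & 0x1FFFFF


# Up-scaling a 1280-pixel aperture to 1920 output pixels gives a shrink
# factor below one (below 0x100000).
print(hex(shrink_factor(0, 1279, 1920)))
print(hex(encode_start_phase(-0.5)))
```

Integer arithmetic is used for the rounding so that the result does not depend on floating-point tie-breaking.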
Constant (Fixed) Mode
When using this mode, the values are fixed at compile time. The user system does not need
to drive any of the parameters. The CORE Generator GUI prompts you to specify:
•coefficient file (.coe)
•hsf
•vsf
•aperture_start_pixel
•aperture_end_pixel
•aperture_start_line
•aperture_end_line
•output_h_size
•output_v_size
•num_h_phases
•num_v_phases
Constant mode has the following restrictions:
•A single coefficient set must be specified using a .coe file; this is the only way to
populate the coefficient memory.
•Coefficients may not be written to the core; the coef_wr_addr control is disabled.
•You may not specify h_coeff_set or v_coeff_set; there is only one set of
coefficients.
•You may not specify start_hpa_y, start_hpa_c, start_vpa_y, start_vpa_c;
they are set internally to zero.
•The control register is always set to “0x00000003,” fixing the scaler in active mode.
General Purpose Processor (GPP) Interface
This interface type exposes all control ports to the user. You are responsible for driving
these ports. Xilinx recommends that GPP mode be used only by experienced scaler users.
Figure 6-1 indicates how the EDK pCore is effectively a wrapper around the GPP mode
core. This should be considered as an example of how you may choose to wrap the GPP
mode core to suit any processor.
In GPP mode, the control values may be changed during runtime – the user input control
values become active once per frame after completion of an output frame, using an internal
active value capture register.
Coefficient Delivery for GPP Interface
In this mode, you must supply all coefficients to the core. See Chapter 8, “Coefficients,” for
all details regarding coefficient loading in GPP mode.
EDK pCore Interface
In contrast to GPP Mode and Constant Mode control interfaces, when you select this
control interface option in CORE Generator, no netlist is created. Instead, a database is
generated containing the necessary files for use in an EDK project. This database includes:
<component_name> -> drivers-> scaler_v3_01_a -> data -> scaler_v2_1_0.mdd
1.Copy the /drivers/scaler_v3_01_a sub-directory from the CORE Generator database
to the /drivers directory in your EDK project repository.
2.Copy the /pcores/axi_scaler_v4_00_a sub-directory from the CORE Generator
database to the /pcores directory in your EDK project repository.
All VHDL files are encrypted. Do not attempt to modify these files.
Parameter Modification in CORE Generator
When "EDK pCore" is selected in the CORE Generator GUI, all parameters are greyed-out.
The user must use the EDK GUI to parameterize the core.
Scaler Software Driver
All files provided by CORE Generator software under the drivers directory are tested
software drivers for the video scaler. They are unencrypted C code, which you may adapt
for your own environment. The driver is intended for a memory-mapped system. The
register map for the scaler registers is given in Appendix B, “Programmer Guide.”
Coefficient Delivery for EDK pCore Interface
Delivery of coefficients to the hardware core is achieved exactly as is described for the GPP
Interface (see Chapter 8, “Coefficients,” for full details). However, the pCore wrapper and
software driver hide this detail from you.
Interrupts
There are seven interrupts:
1.intr_output_frame_done – Issued once per complete output frame.
2.intr_reg_update_done – Issued during Vertical blanking when the register values
have been transferred to the active registers.
3.intr_input_error – Issued if active_video_in is asserted before the scaler is ready
to receive a new line.
4.intr_output_error – Issued if frame period completes before full output frame has been
delivered.
5.intr_coef_wr_error – Issued if coefficient is written into coefficient FIFO when the
FIFO is not ready.
6.intr_coef_fifo_rdy – High when the coefficient FIFO is ready to receive a coefficient for
the current set; stays low once a full set has been written into FIFO; sent high during
Vertical blanking.
7.intr_coef_mem_rdbk_rdy - Sent low after CoefMemRdEn (control register bit (3)) is
written low. Two frames after CoefMemRdEn is written high, this signal is driven high
again.
In GPP mode, all seven interrupts are active.
In Constant mode, only intr_input_error, intr_output_error and intr_output_frame_done are active.
Inside the pCore wrapper, an Interrupt Controller (Xilinx Interrupt Control LogiCORE™
(DS516)) collates these interrupts into one interrupt on the AXI4-Lite bus. The
microprocessor must then read the interrupt status registers to establish the nature of the
interrupt. The interrupt registers are defined in Appendix B, “Programmer Guide.” A
generic n-peripheral system is shown in Figure 6-1. It shows the intended usage of
interrupts in an EDK-based system. It also shows how the Xilinx Interrupt Controller is
used internally to the pCore along with the scaler in GPP mode.
Figure 6-1: Typical EDK-based System Showing Interrupt Structure
Chapter 7: Scaler Aperture
This section explains how to define the scaler aperture using the appropriate dynamic
control registers. The aperture is defined relative to the input timing signals.
Input Aperture Definition
It is vital to understand how to specify the scaler aperture properly. The scaler aperture is
defined as the input data rectangle used to create the output data rectangle. The input
values aperture_start_line, aperture_end_line, aperture_start_pixel and aperture_end_pixel need to be driven correctly.
To scale from a rectangle of size 1280x720, they should be set as follows:
aperture_start_pixel   0
aperture_end_pixel     1279
aperture_start_line    0
aperture_end_line      719
It is also important to understand how “line 0” and “pixel 0” are defined to ensure that
these values are entered correctly. Line 0 is defined as the first active line following a rising
edge in active_video_in. An internal line counter is decoded to signal internally that
the current line is indeed line 0. This line counter is reset on a falling edge of vblank_in.
It increments on a rising edge of hblank_in.
One situation to avoid is the counter effectively starting at 1 instead of 0, which results
in no video output. The correct relationship between hblank_in and vblank_in that avoids
this situation is shown in Figure 7-1: the falling edge of vblank_in occurs while hblank_in
is still high.
Figure 7-1: Hblank_in at Falling Edge of VBlank_in
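The line-counter behavior described above can be illustrated with a small behavioral model. This is only a sketch of the stated rules (reset on a falling edge of vblank_in, increment on a rising edge of hblank_in); it is not derived from the core's RTL. The function name and the per-clock sample tuples are hypothetical.

```python
def first_active_line_count(samples):
    """Return the line-counter value at the first rising edge of
    active_video_in that follows a falling edge of vblank_in.

    samples: list of (vblank_in, hblank_in, active_video_in) tuples,
    one per clock. Returns None if no such edge is found.
    """
    count = None  # counter is undefined until the first vblank_in falling edge
    prev_v, prev_h, prev_a = samples[0]
    for v, h, a in samples[1:]:
        if prev_v == 1 and v == 0:
            count = 0                      # reset on falling edge of vblank_in
        elif count is not None and prev_h == 0 and h == 1:
            count += 1                     # increment on rising edge of hblank_in
        if count is not None and prev_a == 0 and a == 1:
            return count
        prev_v, prev_h, prev_a = v, h, a
    return None


# Correct timing: vblank_in falls while hblank_in is still high.
good = [(1, 1, 0), (1, 1, 0), (0, 1, 0), (0, 0, 1)]
# Problem timing: vblank_in falls while hblank_in is low, so a rising edge
# of hblank_in arrives before the first active line.
bad = [(1, 0, 0), (1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1)]
print(first_active_line_count(good), first_active_line_count(bad))
```

In the second sequence the first active line is seen with a count of 1, which is the situation the text warns against.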
Pixel 0 is defined as the first active pixel after the rising edge of active_video_in. This
is indicated in Figure 7-2. The value 128 is used as the default value in video_data_in
during blanking. In this example, the first pixel in the horizontal scaler aperture is the first
active pixel in the input line.
Figure 7-2: Active_video_in in Relation to First Active Sample
Cropping
When using “Live” mode, you may choose to select a small portion of the input image. To
achieve this, set the aperture_start_line, aperture_end_line, aperture_start_pixel and aperture_end_pixel according to your requirements.
For example, from an input which is 720P, you may want to scale from a rectangle of size
80x60, starting at (pixel, line) = (20, 32). Set the following:
aperture_start_pixel   20
aperture_end_pixel     99
aperture_start_line    32
aperture_end_line      91
Figure 7-3 shows the opening of an internal processing window signal
(t_verticalwindow) with the preceding cropping settings. A similar operation occurs in
the horizontal domain. A useful developer note is that if the largest input rectangle is
cropped from the input, then this size may be used in deciding the
max_pixels_in_per_line parameter. This may save block RAM usage in some cases.
Figure 7-3: Cropping from the Input Image
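The aperture values for a crop rectangle follow directly from its origin and size, since the end values are inclusive. A minimal sketch, using a hypothetical helper name that is not part of the delivered driver:

```python
def crop_aperture(start_pixel: int, start_line: int, width: int, height: int) -> dict:
    """Return the four aperture register values for a crop rectangle.

    End positions are inclusive, so end = start + size - 1.
    """
    if width < 1 or height < 1:
        raise ValueError("crop rectangle must be at least 1x1")
    return {
        "aperture_start_pixel": start_pixel,
        "aperture_end_pixel": start_pixel + width - 1,
        "aperture_start_line": start_line,
        "aperture_end_line": start_line + height - 1,
    }


# The 80x60 crop starting at (pixel, line) = (20, 32) from the example above.
print(crop_aperture(20, 32, 80, 60))
```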
When using “Memory” mode, cropping must be achieved by selecting the appropriate
rectangular area from memory. aperture_start_pixel and aperture_start_line
must be set to zero.
Chapter 8: Coefficients
This section describes the coefficients used by both the Vertical and Horizontal filter
portions of the scaler, in terms of number, range, formatting and download procedures.
Coefficient Table
One single size-configurable, block RAM-based, Dual Port RAM block stores all H and V
coefficients combined, and holds different coefficients for luma and chroma as desired.
This coefficient store may be populated with active coefficients as follows:
•Using the Coefficient Interface (see Coefficient Interface).
•By preloading using a .coe file
Coefficients that are preloaded using a .coe file remain in this memory until they are
overwritten with coefficients loaded by the Coefficient Interface. Overwriting is not
possible when using Constant mode, because coefficients may not be written to the core
in that mode. Preloading with coefficients gives you an easy way of initializing the scaler
from power-up.
When using pCore or GPP interfaces, you may want more than one coefficient set from
which to choose. For example, it may be necessary to select different filter responses for
different shrink factors. This is often true when down-scaling by different factors to
eliminate aliasing artifacts. The user may load (or preload using a .coe file) multiple
coefficient sets.
The number of phases for each set may also vary, dependent upon the nature of the
conversion, and how you have elected to generate and partition the coefficients. The
maximum number of phases per set defines the size of the memory required to store them,
and this may have an impact on resource usage. Careful selection of the parameters
max_phases and max_coef_sets is paramount if optimal resource usage is important.
Each coefficient set is allocated an amount of space equal to $2^{\mathit{max\_phases}}$. Max_phases is a
fixed parameter that is defined at compile time. However, it is not necessary for every set
to have that many phases. The number of phases for each set may be different, provided
you indicate how many phases there are in the current set being used, by setting the input
register values num_h_phases and num_v_phases accordingly. Without setting these
correctly, invalid coefficients will be selected by the phase accumulators.
Horizontal filter coefficients are stored in the lower half of the coefficient memory. Vertical
filter coefficients are stored in the upper half of the coefficient memory. For each of the H
and V sectors, luma coefficients occupy the lower half and chroma coefficients occupy the
upper half. This method simplifies internal addressing. When the chroma format is set to
4:4:4, one set of coefficients will be shared between all three channels (i.e., R, G, and B will
be scaled identically).
If the user specifies in the CORE Generator or EDK GUI that the Luma and Chroma filters
share common coefficients, then there is no coefficient memory space available for chroma
coefficients. In this case, the user must not load chroma coefficients using the Coefficient
interface, and must not specify chroma coefficients in the .coe file.
Similarly, if the user has specified in the CORE Generator or EDK GUI that the Horizontal
and Vertical filters share common coefficients, then there is no coefficient memory space
available for Vertical coefficients. In this case, the user must not load Vertical coefficients
using the Coefficient interface, and must not specify Vertical coefficients in the .coe file.
Note: This option is only available if the number of horizontal taps is equal to the number
of vertical taps.
Coefficient Interface
The scaler uses only one set of coefficients per frame period. To change to a different set of
stored coefficients for the next frame, use the h_coeff_set and v_coeff_set dynamic
register inputs.
You may load new coefficients into a different location in the coefficient store during some
frame period before they are required. You may load a maximum of one coefficient set
(including all of HY, HC, VY, VC components) per frame period. Subsequently, this
coefficient set may be selected for use by controlling h_coeff_set and v_coeff_set.
Filter Coefficients may be loaded into the coefficient memory using the coefficient memory
interface. This comprises:
coef_data_in(31:0)       32-bit coefficient input bus
coef_wr_en               Coefficient write-enable
coef_set_wr_addr(3:0)    Coefficient set write address
The 32-bit input word always holds two coefficients. The scaler supports 16-bit coefficient
bit-widths. The word format is shown in Figure 8-1.
Figure 8-1: Coefficient Write-Format on coef_data_in(31:0)
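Packing 16-bit coefficients into the 32-bit write format can be sketched as follows. The helper name is hypothetical and is not taken from the delivered driver. The first coefficient of each pair goes in bits 15:0 and the second in bits 31:16; filling the unused upper half with zero when the count is odd is an assumption made here for definiteness.

```python
def pack_phase(coefs):
    """Pack one phase of signed 16-bit coefficients into 32-bit words.

    coefs[0] is the coefficient for the newest tap and lands in bits 15:0
    of the first word; coefs[1] lands in bits 31:16, and so on.
    """
    for c in coefs:
        if not -32768 <= c <= 32767:
            raise ValueError(f"coefficient {c} does not fit in 16 bits")
    words = []
    for i in range(0, len(coefs), 2):
        low = coefs[i] & 0xFFFF
        # ASSUMPTION: pad with zero when there is no second coefficient.
        high = (coefs[i + 1] & 0xFFFF) if i + 1 < len(coefs) else 0
        words.append((high << 16) | low)
    return words


# Two coefficients share one word; a third needs a second word.
print([hex(w) for w in pack_phase([1018, 15366])])
print([hex(w) for w in pack_phase([-1, 2, 3])])
```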
An address-multiplexer is used to support the coefficient write interface as shown in
Figure 8-2. The coefficient write-address is multiplexed with the coefficient read-address
for the vertical filter to create the address for Port A on the dual-port coefficient RAM.
Consequently, coefficients must be loaded into the coefficient stores when no active video
scaling is occurring. It is only possible, therefore, to load the coefficients during the vertical
blanking period. Since this would be an impossible burden on a processor, an external
block RAM FIFO has been provided to which you load your coefficients during one frame
period, as shown in Figure 8-2. Following a latency period after the positive transition of
vblank_in, any new coefficient set is streamed into the internal coefficient store for use
by the filter in the next frame.
Figure 8-2: Coefficient Loading Mechanism, Including External FIFO
A waveform indicating the coefficient loading process is shown in Figure 8-3.
The coefficient memory interface is an asynchronous interface. A high level on the
coef_wr_en signal is used to capture the coefficients delivered on coef_data_in as
shown in Figure 8-3. An internal state-machine detects the 3rd ‘clk’ period when
coef_wr_en is stable and high. At this point, the data is registered into the FIFO. Xilinx
recommends that the high coef_wr_en pulse be no less than the equivalent of 6 ‘clk’ periods in duration. It is required that it also be low for a period no less than 6 ‘clk’ periods
between write operations.
The guidelines are as follows:
•The address coef_set_addr for all coefficients in one set must be written via the
normal register interface.
•coef_data_in delivers two coefficients per 32-bit word. The lower word (bits 15:0)
always holds the coefficient that will be applied to the latest tap (that is, spatially
speaking, the right-most or lowest). The word format is shown in Figure 8-1.
•All coefficients for one phase must be loaded sequentially via coef_data_in,
starting with coef 0 and coef 1 [coef 0 is applied to the newest (right-most or lowest)
input sample in the current filter aperture]. See Figure 8-3. For an odd number of
coefficients, the final upper 16 bits are ignored.
•All phases must be loaded sequentially starting at phase 0, and ending at phase
(max_phases-1). This must always be observed, even if a particular set of
coefficients has fewer active phases than max_phases.
•For RGB/4:4:4, when not sharing coefficients across H and V operations, for each
dimension, one bank of coefficients must be loaded into the FIFO before they can be
streamed into the coefficient memory. When sharing coefficients across H and V
operations, it is only necessary to write coefficients for the H operation. This process is
permitted to take as much time as desired by the user system. This means that worst
case, for a 12H-tap x 12V-tap 64-phase filter, you need to write 6 times per phase. If the
user has specified separate H and V coefficients, this is a total of 768 write operations
per set.
•For YC4:2:2 or YC4:2:0, when not sharing coefficients across H and V operations or
across Y and C operations, one bank of luma (Y) and chroma (C) coefficients must be
loaded into the FIFO for each dimension before they can be streamed into the
coefficient memory. When sharing coefficients across H and V operations, it is only
necessary to write coefficients for the H operation. Also, when sharing coefficients
across Y and C operations, it is only necessary to write coefficients for the Y operation.
This process is permitted to take as much time as desired by the user system. This
means that worst case, for a 12H-tap x 12V-tap 64-phase filter, you need to write 6
times per phase. If the user has specified separate H and V coefficients and separate Y
and C coefficients, this is a total of 1536 write operations per set.
•Writing a new address to coef_set_addr resets the internal state-machine that
oversees the coefficient loading procedure. An error condition is asserted if the
loading procedure has fallen short of 2 x max_phases*Max(num_h_taps, num_v_taps)
when coef_set_addr is updated.
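The write counts quoted in the guidelines above (768 and 1536) follow from a simple product. The sketch below reproduces only that arithmetic; the function name and the num_banks argument (1, 2 or 4 depending on the H/V and Y/C sharing options) are illustrative and are not part of the core or driver.

```python
def writes_per_set(num_h_taps: int, num_v_taps: int,
                   max_phases: int, num_banks: int) -> int:
    """Number of 32-bit write operations needed to load one coefficient set.

    Each write carries two coefficients, and every bank is loaded for the
    larger of the two tap counts.
    """
    max_taps = max(num_h_taps, num_v_taps)
    words_per_phase = (max_taps + 1) // 2   # two coefficients per word
    return words_per_phase * max_phases * num_banks


# 12H-tap x 12V-tap, 64-phase filter:
print(writes_per_set(12, 12, 64, 2))   # separate H and V (RGB/4:4:4)
print(writes_per_set(12, 12, 64, 4))   # separate H/V and separate Y/C
```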
Examples of Coefficient Set Generation and Loading
As mentioned, when data is fed in raster format, coefficient 0 is applied to the lowest tap in
the aperture for the Vertical filter, or to the right-most tap for the Horizontal filter.
Following are a few examples of how to generate some coefficients and translate them into
the correct format for downloading to the scaler.
Example 1: Num_h_taps = num_v_taps = 8; max_phases = 4
Table 8-1 shows a set of coefficients drawn from a sinc function.
In this example, a 32-point 1-D sinc function has been sub-sampled to generate four phases
of eight coefficients each. Sub-sampling in this way usually results in phases whose
component coefficients rarely sum to 1.0, which causes image distortion. The example
MATLAB® m-code that follows shows how to normalize the phases to unity and how to
express them as the 16-bit integers required by the hardware. For this process,
coef_width = 16. Note that this is only pseudo code. Generation of actual coefficients is
beyond the scope of this document. Refer to Answer Record 35262 and Filter Coefficient
Calculations for more information on coefficient generation for the video scaler.
% Parameters for Example 1 (coef_width as stated in the text)
num_taps = 8;
num_phases = 4;
coef_width = 16;
% Subsample a Sinc function (sinc is part of the Signal Processing Toolbox), and create 2D array
x = -(num_taps/2):1/num_phases:((num_taps/2)-1/num_phases);
coefs_2d = reshape(sinc(x), num_phases, num_taps)
format long
% Normalize each phase individually
norm_phases = zeros(num_phases, num_taps);
for i = 1:num_phases
    sum_phase = sum(coefs_2d(i,:));
    for j = 1:num_taps
        norm_phases(i, j) = coefs_2d(i, j)/sum_phase;
    end
    % Check - Normalized values should sum to 1 in each phase
    norm_sum_phase = sum(norm_phases(i,:))
end
% Translate real to integer values with precision defined by coef_width
int_phases = round(((2^(coef_width-2))*norm_phases))
This generates the 2D array of integer values shown (in hexadecimal form) in Table 8-2.
The 16-bit coefficients must be coupled into 32-bit values for delivery to the hardware. The
resulting coefficient file for download is shown in Table 8-3.
The coefficients must be downloaded in the following order:
1.Horizontal Luma (always required)
2.Horizontal Chroma (required if not sharing Y and C coefficients)
3.Vertical Luma (required if not sharing H and V coefficients)
4.Vertical Chroma (required if not sharing H and V coefficients, and also not sharing Y
and C coefficients)
Table 8-3: Example 1 Coefficient Set Download Format
Horizontal Filter Coefficients for Luma | Horizontal Filter Coefficients for Chroma
Example 2: Num_h_taps = num_v_taps = 8; max_phases = 5, 6, 7 or 8; num_h_phases = num_v_phases = 4
If the max_phases parameter is greater than the number of phases in the set being loaded,
load default coefficients into the unused locations. Example 2 is an extended version of
Example 1 to show this. Table 8-4 shows the same 4-phase coefficient set loaded into the
scaler when num_h_phases = 4, num_v_phases = 4 and max_phases is greater than
4 (max_phases = 5, 6, 7 or 8, num_h_taps = 8, num_v_taps = 8).
Note that:
1.If max_phases is not equal to an integer power of 2, then the number of phases to be
loaded is rounded up to the next integer power of 2. See Example 2 (Table 8-4). Unused
phases should be loaded with zeros.
2.The number of values loaded per phase is not rounded to the nearest power of 2. See
Example 3 (Table 8-7).
Table 8-4: Example 2 Coefficient Set Download Format
Horizontal Filter Coefficients for Luma | Horizontal Filter Coefficients for Chroma
Now consider the case where the number of taps in the Horizontal dimension is different
to that in the Vertical dimension. For this case, when loading the coefficients for the
dimension for which the number of taps is smaller, each phase of coefficients must be
padded with zeros up to the larger number of taps.
Example coefficients are shown in hexadecimal form in Table 8-5 (horizontal) and Table 8-6
(vertical).
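The padding rules from Examples 2 and 3 can be sketched in software as follows. The function names are hypothetical. Appending the zero taps at the end of each phase is an assumption: the text says only that each phase must be padded with zeros up to the larger number of taps, not where the zeros go, so check the tap ordering against the tables before relying on this.

```python
def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def pad_bank(phases, max_phases: int, max_taps: int):
    """Pad one bank of coefficients for download.

    phases: list of per-phase coefficient lists (the active phases only).
    The number of phases is padded with all-zero phases up to max_phases
    rounded up to the next power of two. Each phase is padded with zeros
    up to max_taps (ASSUMPTION: zeros are appended after the real taps).
    """
    total_phases = next_pow2(max_phases)
    if len(phases) > total_phases:
        raise ValueError("more phases supplied than the core can hold")
    padded = []
    for phase in phases:
        if len(phase) > max_taps:
            raise ValueError("phase has more coefficients than max_taps")
        padded.append(list(phase) + [0] * (max_taps - len(phase)))
    padded += [[0] * max_taps for _ in range(total_phases - len(phases))]
    return padded


# Four active 8-tap phases with max_phases = 6: eight phases are loaded,
# the last four as zeros. The coefficient values here are placeholders.
bank = pad_bank([[1] * 8 for _ in range(4)], max_phases=6, max_taps=8)
print(len(bank), bank[4])
```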
Coefficient Preloading Using a .coe File
To preload the scaler with coefficients (mandatory when in Constant mode), you must
specify, using the CORE Generator GUI or the EDK GUI, a .coe file that contains the
coefficients you want to use. It is important that the .coe file specified is in the correct
format. The coefficients specified in the .coe file become hard-coded into the hardware
during synthesis.
Generating .coe Files
Generating .coe files can be accomplished by either extracting coefficients from a file
provided with the core (refer to the next section) or developing your own set of
coefficients. Developing your own coefficients is a very complex and subjective operation,
and is beyond the scope of this document. Refer to Answer Record 35262 and Filter
Coefficient Calculations for more information on generating video scaler coefficients.
Extracting Coefficients From xscaler_coefs.c File
The pCore version of the video scaler includes a software driver. The coefficients are
included in this driver in the xscaler_coefs.c file. The pCore version of the core can be
generated by selecting "EDK pCore" in the CORE Generator GUI. Coefficients from this file
can be extracted manually; however, it is important to know the format of this file.
All coefficients required for any conversion are provided with the SW Driver. The filename
is xscaler_coefs.c. You may modify this file, and the driver code that reads the
coefficients from it, as you see fit.
The file defines 19 “bins” of coefficients. You must select which bin to use according to
your application. In the delivered driver, the file xscaler.c includes a function called
XScaler_CoeffBinOffset, which assesses the scaling requirements specified by you
(for example, input/output rectangle sizes) and calculates which bin of coefficients is
required. In this driver, the bins have been allocated as per Table 8-8. This function may be
used independently for all Horizontal, Vertical, Luma, and Chroma filter operations.
Table 8-8: Coefficient “Binning” in SW Driver (xscaler_coefs.c)
Bin # | SF = input_size/output_size | Comments
1 | SF < 1 | All up-scaling cases
1+Ceil((output_size*16)/input_size) (bins 2 to 17) | 1 < SF < 16 (all down-scaling cases) | General down-scaling coefficients. Down-scaling filter coefficients include anti-aliasing characteristics that differ according to scale-factor
18 | N/A | Unity coefficient in center tap
19 | 1920/1280 (1080/720) | Example user-specific case for HD down scaling conversion
Examples for bins 2 to 17:
• Down-scaling 1920 to 1440: use bin 13
• Down-scaling 1080 to 1000: use bin 16
• Down-scaling 1080 to 144: use bin 4
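The bin allocation in Table 8-8 for the general up-scaling and down-scaling cases can be sketched as follows. This is not the driver's XScaler_CoeffBinOffset function, whose source is not reproduced here; the function name is hypothetical, and the special-purpose bins 18 and 19 are deliberately not handled. The table does not state what happens for SF = 1 or SF >= 16, so for those inputs the sketch simply applies the down-scaling formula.

```python
def coefficient_bin(input_size: int, output_size: int) -> int:
    """Return the Table 8-8 bin number for one dimension.

    SF = input_size / output_size. Up-scaling (SF < 1) uses bin 1;
    otherwise the bin is 1 + ceil(output_size * 16 / input_size).
    """
    if input_size < 1 or output_size < 1:
        raise ValueError("sizes must be positive")
    if input_size < output_size:          # SF < 1: all up-scaling cases
        return 1
    # Integer ceiling division avoids floating-point rounding.
    return 1 + -(-(output_size * 16) // input_size)


# The three down-scaling examples from Table 8-8, plus one up-scaling case.
for src, dst in [(1920, 1440), (1080, 1000), (1080, 144), (720, 1080)]:
    print(src, dst, coefficient_bin(src, dst))
```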
Within each “bin,” four further levels of granularity can be observed. In order of
decreasing size of granularity, these levels are:
•Number of taps defined
•Number of phases defined
•Phase number (one line in file)
•Tap number (one element of each line), newest (right-most or lowest) first
For example, the first set of coefficients, defined for two taps and two phases, is given as:
// bin # 1; num_taps = 2; num_phases = 2
1018, 15366,
8192, 8192
The second set of coefficients, defined for two taps and three phases, is given immediately
afterwards as:
/* bin # 1; num_taps = 2; num_phases = 3 */
1018, 15366,
5852, 10532,
10532, 5852,
And so forth.
Format for .coe Files
The guidelines for creating a .coe file are as follows:
•Coefficients may be specified in either 16-bit binary form or signed decimal form.
•First line of a 16-bit binary file must be memory_initialization_radix=2;
•First line of a signed decimal file must be memory_initialization_radix=10;
•Second line of all .coe files must be memory_initialization_vector=
•All coefficient entries must end with a comma (",") except the final entry which must
end with a semicolon ";".
•Final entry must have a carriage return at the end after the semicolon.
•All coefficient sets must be listed consecutively, starting with set 0.
•All sets in the file must be of equal size in terms of the number of coefficient entries.
•Number of coefficient entries in all sets depends upon:
•Max_coef_sets
•Max_phases
•Max_taps (=max(num_h_taps, num_v_taps))
•User setting for "Separate Y/C coefficients"
•User setting for “Chroma_format”
•User setting for "Separate H/V coefficients"
The simplest method is to specify an intermediate value num_banks:
num_banks=4;
if (Separate H/V coefficients = 0) then
num_banks := num_banks/2;
end;
if (Separate Y/C coefficients = 0) or (chroma_format=4:4:4)
then
num_banks := num_banks/2;
end;
Consequently, the number of entries in the .coe file can be defined as:
num_coefs_in_coe_file = max_coef_sets x num_banks x max_phases x
max_taps
•Within each set, coefficient banks must be specified in the following order:
Table 8-9: Ordering of Coefficients in .coe File for Different Coefficient Sharing Options
Separate H/V Coefficients | Separate Y/C Coefficients | Bank Order in .coe File
True | True | HY, HC, VY, VC
True | False | H, V
False | True | Y, C
False | False | Single set only
•Within each bank, all phases must be listed consecutively, starting with phase 0,
followed by phase 1, etc.
•The number of phases specified (per bank) in the .coe file must be equal to
Max_Phases, even for filters that use fewer phases. Set all coefficients in unused
phases to 0 (decimal) or 0000000000000000 (16b binary).
•Within each phase, all coefficients must be listed consecutively. The first specified
coefficient for any phase represents the value applied to the newest (rightmost or
lowest) tap in the aperture.
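The sizing and formatting rules above can be checked with a short script. This is a sketch under stated assumptions: the function names are hypothetical, only the signed decimal form is produced, and the caller is expected to supply entries already ordered by set, bank, phase and tap, with unused phases and taps padded with 0.

```python
def num_banks(separate_hv, separate_yc, chroma_format="4:2:2"):
    """Intermediate value num_banks, following the rule given in the text."""
    banks = 4
    if not separate_hv:
        banks //= 2
    if not separate_yc or chroma_format == "4:4:4":
        banks //= 2
    return banks


def num_coefs_in_coe_file(max_coef_sets, max_phases, num_h_taps, num_v_taps,
                          separate_hv, separate_yc, chroma_format="4:2:2"):
    """max_coef_sets x num_banks x max_phases x max_taps."""
    max_taps = max(num_h_taps, num_v_taps)
    banks = num_banks(separate_hv, separate_yc, chroma_format)
    return max_coef_sets * banks * max_phases * max_taps


def format_coe(entries):
    """Render entries as signed-decimal .coe text.

    Every entry ends with a comma except the last, which ends with a
    semicolon followed by a final line break.
    """
    lines = ["memory_initialization_radix=10;",
             "memory_initialization_vector="]
    last = len(entries) - 1
    for index, value in enumerate(entries):
        lines.append(f"{value}{';' if index == last else ','}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 4 H taps, 3 V taps, 4 phases, 1 set, separate H/V, shared Y/C
    print(num_coefs_in_coe_file(1, 4, 4, 3, True, False))
    print(format_coe([8192, 8192]), end="")
```

For a configuration with 4 horizontal taps, 3 vertical taps, 4 phases, one set, separate H/V and shared Y/C coefficients, the count is 32 entries, so the file has 34 text lines before the final line break.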
Table 8-10 shows an example of a .coe file with the following specification:
num_h_taps = num_v_taps = 12;
max_phases = 4;
max_coef_sets = 1;
Separate H/V Coefficients = False;
Separate Y/C Coefficients = False;
Both signed decimal and 16-bit binary forms are shown.
Table 8-10: .coe File Example 1
Phase | Tap | File line-number | Line text (signed decimal form) | Line text (16-bit binary form)
Table 8-11 shows an example of a .coe file with the following specification:
num_h_taps = 12, num_v_taps = 12;
max_phases = 4;
max_coef_sets = 2;
Separate H/V Coefficients = True;
Separate Y/C Coefficients = True;
Just signed decimal form is shown. For clarity's sake, the same coefficient values have been
used for each bank. Be aware that these are not realistic coefficients. Also note that this list
includes ellipses to show continuation, and that it does not include a complete set of
coefficients.
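Table 8-11: .coe File Example 2
Set | Bank | Phase | Tap | File line-number | Line Text
N/A | N/A | N/A | N/A | 1 | memory_initialization_radix=10;
N/A | N/A | N/A | N/A | 2 | memory_initialization_vector=
0 | 0 (HY) | 0 | 0 | 3 | 0,
0 | 0 (HY) | 0 | 1 | 4 | 162,
0 | 0 (HY) | 0 | 2 | 5 | 0,
0 | 0 (HY) | 0 | 3 | 6 | -1069,
0 | 0 (HY) | 0 | … | … | …
0 | 0 (HY) | 1 | 0 | 15 | 28,
0 | 0 (HY) | 1 | 1 | 16 | 155,
0 | 0 (HY) | 1 | 2 | 17 | -186,
0 | 0 (HY) | … | … | … | …
0 | 0 (HY) | 3 | 0 | 39 | 73,
0 | 0 (HY) | 3 | 1 | 40 | 72,
0 | 0 (HY) | 3 | … | … | …
0 | 0 (HY) | 3 | 11 | 50 | 28,
0 | 1 (HC) | 0 | 0 | 51 | 0,
0 | 1 (HC) | 0 | 1 | 52 | 162,
0 | 1 (HC) | 0 | 2 | 53 | 0,
0 | … | … | … | … | …
0 | 1 (HC) | 3 | 0 | 87 | 73,
0 | 1 (HC) | 3 | 1 | 88 | 72,
0 | 1 (HC) | 3 | … | … | …
0 | 1 (HC) | 3 | 11 | 98 | 28,
0 | 2 (VY) | 0 | 0 | 99 | 0,
0 | 2 (VY) | 0 | 1 | 100 | 162,
0 | 2 (VY) | 0 | 2 | 101 | 0,
0 | … | … | … | … | …
0 | 2 (VY) | 3 | 0 | 135 | 73,
0 | 2 (VY) | 3 | 1 | 136 | 72,
0 | 2 (VY) | 3 | … | … | …
0 | 2 (VY) | 3 | 11 | 146 | 28,
0 | 3 (VC) | 0 | 0 | 147 | 0,
0 | 3 (VC) | 0 | 1 | 148 | 162,
0 | 3 (VC) | 0 | 2 | 149 | 0,
0 | … | … | … | … | …
0 | 3 (VC) | 3 | 0 | 183 | 73,
0 | 3 (VC) | 3 | 1 | 184 | 72,
0 | 3 (VC) | 3 | … | … | …
0 | 3 (VC) | 3 | 11 | 194 | 28,
1 | 0 (HY) | 0 | 0 | 195 | 0,
1 | 0 (HY) | 0 | 1 | 196 | 162,
1 | 0 (HY) | 0 | 2 | 197 | 0,
1 | 0 (HY) | … | … | … | …
1 | 0 (HY) | 3 | 11 | 242 | 28,
1 | 1 (HC) | 0 | 0 | 243 | 0,
1 | … | … | … | … | …
1 | 2 (VY) | 0 | 0 | 291 | 0,
1 | … | … | … | … | …
1 | 3 (VC) | 3 | 0 | 375 | 73,
1 | 3 (VC) | 3 | 1 | 376 | 72,
1 | 3 (VC) | 3 | … | … | …
1 | 3 (VC) | 3 | 11 | 386 | 28;
- | - | - | - | 387 | “”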
Table 8-12 shows an example of a .coe file with the following specification:
num_h_taps = 4, num_v_taps = 3;
max_phases = 4;
max_coef_sets = 1;
Separate H/V Coefficients = True;
Separate Y/C Coefficients = False;
Just signed decimal form is shown.
Table 8-12: .coe File Example 3
Bank | Phase | Tap | File line-number | Line Text | Notes
N/A | N/A | N/A | 1 | memory_initialization_radix=10; |
N/A | N/A | N/A | 2 | memory_initialization_vector= |
0 (H) | 0 | 0 | 3 | -104, |
0 (H) | 0 | 1 | 4 | 1018, |
0 (H) | 0 | 2 | 5 | 15364, |
0 (H) | 0 | 3 | 6 | 106, |
0 (H) | 1 | 0 | 7 | -240, |
0 (H) | 1 | 1 | 8 | 4793, |
0 (H) | 1 | 2 | 9 | 12022, |
0 (H) | 1 | 3 | 10 | -191, |
0 (H) | 2 | 0 | 11 | -282, |
0 (H) | 2 | 1 | 12 | 8474, |
0 (H) | 2 | 2 | 13 | 8474, |
0 (H) | 2 | 3 | 14 | -282, |
0 (H) | 3 | 0 | 15 | -191, |
0 (H) | 3 | 1 | 16 | 12022, |
0 (H) | 3 | 2 | 17 | 4793, |
0 (H) | 3 | 3 | 18 | -240, |
1 (V) | 0 | 0 | 19 | 86, |
1 (V) | 0 | 1 | 20 | 16212, |
1 (V) | 0 | 2 | 21 | 86, |
1 (V) | - | - | 22 | 0, | Padding value
1 (V) | 1 | 0 | 23 | 512, |
1 (V) | 1 | 1 | 24 | 16068, |
1 (V) | 1 | 2 | 25 | -197, |
1 (V) | - | - | 26 | 0, | Padding value
1 (V) | 2 | 0 | 27 | 1243, |
1 (V) | 2 | 1 | 28 | 15539, |
1 (V) | 2 | 2 | 29 | -398, |
1 (V) | - | - | 30 | 0, | Padding value
1 (V) | 3 | 0 | 31 | 2829, |
1 (V) | 3 | 1 | 32 | 14099, |
1 (V) | 3 | 2 | 33 | -544, |
1 (V) | - | - | 34 | 0; | Padding value
- | - | - | 35 | “” |
Coefficient Readback
For coefficient verification purposes, a feature of the video scaler allows the user to read
back coefficients in the active coefficient memory.
Dedicated connections are included to facilitate this feature:
•coef_set_bank_rd_addr(11:8): Coefficient set read-address
•coef_set_bank_rd_addr(1:0): Coefficient bank read-address. 00=HY, 01=HC, 10=VY,
11=VC
•intr_coef_mem_rdbk_rdy: Output flag indicating that the specified coefficient bank is
ready for reading
Before changing the set and bank read address, the user must set bit 3 of the control register
to 0. Using coef_set_bank_rd_addr, the user then provides the set number and bank
number of the coefficients to be read back, and activates the new bank of coefficients by
setting bit 3 of the control register to 1. A FIFO is then populated with that bank of
coefficients. Once the intr_coef_mem_rdbk_rdy interrupt has gone high, the user must
also provide, using coef_mem_rd_addr, the phase and tap number of the coefficient to be
read from that bank. The coefficient appears at coef_mem_output three clk cycles later.
Reading back coefficients does not cause image distortion, and may be executed during
normal operation.
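The address fields used during readback can be illustrated with a small packing helper. This is a sketch only: the helper names are hypothetical, and the code merely builds values from the bit positions given above (set in bits 11:8, bank in bits 1:0, readback control in bit 3 of the control register). It does not access hardware and is not part of the supplied driver.

```python
# Bank encodings from the text: 00=HY, 01=HC, 10=VY, 11=VC.
BANK_CODES = {"HY": 0b00, "HC": 0b01, "VY": 0b10, "VC": 0b11}
RDBK_CONTROL_BIT = 3  # control register bit used to activate readback


def pack_set_bank_rd_addr(coef_set, bank):
    """Build a coef_set_bank_rd_addr value (hypothetical helper)."""
    if not 0 <= coef_set <= 0xF:
        raise ValueError("coefficient set must fit in bits 11:8")
    return (coef_set << 8) | BANK_CODES[bank]


def with_readback_enable(control_value, enable):
    """Return control_value with bit 3 set (enable) or cleared (disable)."""
    mask = 1 << RDBK_CONTROL_BIT
    return (control_value | mask) if enable else (control_value & ~mask)


if __name__ == "__main__":
    control = 0x7                                     # example control value
    control = with_readback_enable(control, False)    # step 1: clear bit 3
    address = pack_set_bank_rd_addr(3, "VY")          # step 2: set 3, bank VY
    control = with_readback_enable(control, True)     # step 3: set bit 3
    print(hex(address), hex(control))
```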
Chapter 9
Performance
The target maximum clock frequencies for all scaler input clocks are shown in Table 9-1.
Table 9-1: Target Maximum Clock Frequencies
Family | Speed grade | FMax (MHz)
Virtex-5 | -1 | 225
Virtex-5 | -2 | 250
Virtex-5 | -3 | 275
Virtex-6 | -1 | 250
Virtex-6 | -2 | 280
Spartan-6 | -2 | 150
Spartan-6 | -3 | 160
Spartan-3A DSP | -4 | 150
Spartan-3A DSP | -5 | 160
It is very important to ensure that the clock rate available supports worst-case conversions.
This chapter includes detailed information and examples for worst-case scenarios.
Every user of the Xilinx Video Scaler should have a worst-case scenario in mind. The
factors that may contribute to this scenario include:
•Maximum line length to be handled in the system (into and out from the scaler)
•Maximum number of lines per frame (in and out)
•Maximum frame refresh rate
•Chroma format (4:4:4, 4:2:2, or 4:2:0)
•Clock FMax (depends upon the selected device)
These factors may contribute to decisions made for configuring the scaler and its
supporting system. For example, the user may decide to use the scaler in its dual-engine
parallel Y/C configuration to achieve the desired scale factor and frame rate. Using a
dual-engine scaler allows the scaler to process more data per clock cycle at the cost of
increased resource usage. The user may also elect to change speed grade or even device
family, depending on the findings.
The size of the scaler implementation is determined by the number of taps and number of
phases in the filter and the number of engines. The number of taps and number of phases
do not impact the clock frequency.
How do you establish whether or not the scaler will meet the application requirements?
The approach taken is to calculate the minimum clock frequency required to make the intended conversions possible.
Definitions:
Subject Image: The area of the active image that is driven into the scaler. This may or may
not be the entire image, depending on your requirements. It is of
dimensions (SubjWidth x SubjHeight).
Active Image: The entire active input image, some or all of which constitutes the Subject
Image. It is of dimensions (ActWidth x ActHeight).
FPix: The input sample rate.
F'clk': The 'clk' frequency. Data is read from the internal input line buffer,
processed, and written to the internal output buffer using the system clock.
FLineIn: The input line rate. This could be driven by the input rate or by the scaler
LineReq rate. FLineIn must represent the maximum burst frequency of the input lines.
For example, 720P exhibits an FLineIn of 45 kHz.
FFrameIn: The fixed frame refresh rate (Hz), which is the same for both input and output.
To make the calculations according to the previous definitions and assumptions, it is
necessary to distinguish between the following cases:
•Live Video mode: An input video stream feeds directly into the scaler. The user may
not hold off the input stream, and the system must be able to cope with the constant
flow of video data.
•Memory mode: The user may control the input feed using back-pressure/handshaking
by implementing an input frame buffer.
The example cases that follow illustrate how to calculate the clock frequencies required
to sustain the throughput for given usage scenarios.
Live Video Mode
If no input frame buffer is used, and the timing of the input video format drives the scaler,
then the number of 'clk' cycles available per H period becomes important. FLineIn is a
predetermined frequency in this case, often (but not necessarily) defined according to a
known broadcast video format (for example 1080i/60, 720P, CCIR601 etc.).
The critical factors may be summarized as follows:
•ProcessingOverheadPerComponent – The number of extraneous cycles needed by
the scaler to complete the generation of one component of the output line, in addition
to the actual processing cycles. This is required due to filter latency and State-Machine
initialization. For all cases in this document, this has been approximated as 50 cycles
per component per line.
•CyclesPerOutputLine – This is the number of cycles the scaler requires to generate
one output line, of multiple components. The final calculation depends upon the
chroma format and the filter configuration (YC4:2:2 only), and can be summarized as:
The CompsPerEngine and OverHeadMult values can be extracted from Table 9-2.
Table 9-2: Throughput Calculations for Different Chroma Formats
Chroma Format | NumEngines | CompsPerEngine | OverHeadMult
4:4:4 (e.g., RGB) | 3 | 1 | 1
4:2:2 High performance | 2 | 1 | 2
4:2:2 Standard performance | 1 | 2 | 3
4:2:0 | 1 | 2 | 3
NumEngines
This is the number of engines used in the implementation. For the YC4:2:2 case, a
higher number of engines uses more resources - particularly BRAM and DSP48.
CompsPerEngine
This is the largest number of full h-resolution components to be processed by this
instance of the scaler. When using YC, each chroma component constitutes 0.5 in this
respect.
OverHeadMult
For each component processed by a single engine, the
ProcessingOverheadPerComponent overhead factor must be included in the equation.
The number of times this overhead needs to be factored in depends upon the number
of components processed by the worst-case engine.
vertical scaling ratio = output_v_size/input_v_size
Given the preceding information, it is now necessary to calculate how many cycles it will
take to generate the worst-case number of output lines for any vertical aperture:
•MaxClksTakenPerVAperture – This is the number of cycles it will take to generate
MaxVHoldsPerInputAperture lines.
MaxClksTakenPerVAperture = CyclesPerOutputLine x
MaxVHoldsPerInputAperture
It is then necessary to decide the minimum 'clk' frequency required to achieve your goals
according to this calculation:
MinF'clk' = FLineIn x MaxClksTakenPerVAperture
Also useful is the reciprocal relationship that defines the number of 'clk' cycles available
before the next line is written into the input line buffer, for a predefined 'clk' frequency:
ClksAvailablePerLine = F'clk'/FLineIn
Within this number of cycles, all output lines that require the use of the current vertical
filter aperture must be completely generated. If MaxClksTakenPerVAperture <
ClksAvailablePerLine, then the desired conversion is possible using the current clock
frequency, without the use of an input frame buffer.
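The feasibility check described above can be written out as a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, and the values for cycles per output line and MaxVHoldsPerInputAperture used in the example are placeholder numbers chosen for illustration, not figures taken from this guide. Only the 45 kHz line rate for 720P comes from the definitions.

```python
def max_clks_taken_per_v_aperture(cycles_per_output_line,
                                  max_v_holds_per_input_aperture):
    """Cycles needed to generate the worst-case lines for one aperture."""
    return cycles_per_output_line * max_v_holds_per_input_aperture


def min_f_clk(f_line_in, max_clks_taken):
    """MinF'clk' = FLineIn x MaxClksTakenPerVAperture (Hz)."""
    return f_line_in * max_clks_taken


def clks_available_per_line(f_clk, f_line_in):
    """ClksAvailablePerLine = F'clk' / FLineIn."""
    return f_clk / f_line_in


def conversion_possible(f_clk, f_line_in, max_clks_taken):
    """True if the conversion fits without an input frame buffer."""
    return max_clks_taken < clks_available_per_line(f_clk, f_line_in)


if __name__ == "__main__":
    f_line_in = 45_000  # 720P line rate, from the FLineIn definition
    # Placeholder inputs, for illustration only:
    taken = max_clks_taken_per_v_aperture(1400, 2)
    print("MinF'clk' (Hz):", min_f_clk(f_line_in, taken))
    print("Fits at 150 MHz:", conversion_possible(150e6, f_line_in, taken))
```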
Some examples follow. They are estimates only, and are subject to change.
Example 1: The Unity Case
1080i/60 YC4:2:2 'passthrough'
Vertical scaling ratio = 1.00
Horizontal scaling ratio = 1.00
FLineIn = 33750
Single-engine implementation
hsf = 2^20 x (1/0.6667) = 0x180000
vsf = 2^20 x (1/0.6667) = 0x180000
When using a single-engine, this conversion will not work with or without frame
buffers (see below - Memory mode) unless using higher speed-grade Virtex-5 or
Virtex-6 devices.
Example 7: Down-scaling 1080P60 YC4:2:2 to 720P/60
67.5 kHz line rate
Vertical scale ratio = 0.6667
Horizontal scale ratio = 0.6667
FLineIn = 67500
Dual-engine implementation
This conversion will work in Virtex-5, but not in Spartan-3A DSP, since the MinF'clk' is
greater than the Spartan-3A DSP FMax but less than the Virtex-5 FMax, as shown in
Table 9-1.
Memory Mode
Using an input frame buffer allows you to stretch the processing time over the entire frame
period (utilizing the available blanking periods). New input lines may be provided as the
internal phase-accumulator dictates, instead of the input timing signals.
The critical factors may be summarized as follows:
•ProcessingOverheadPerLine – The number of extraneous cycles needed by the scaler
to complete the generation of one output line, in addition to the actual processing
cycles. This is required due to filter latency and State-Machine initialization. For all
cases in this document, this has been approximated as 50 cycles per component per
line.
•FrameProcessingOverhead – The number of extraneous cycles needed by the scaler
to complete the generation of one output frame, in addition to the actual processing
cycles. This is required mainly due to vertical filter latency. For all cases in this
document, this has been generally approximated as 10000 cycles per frame.
•CyclesPerOutputFrame – This is the number of cycles the scaler requires to generate
one output frame, of multiple components. The final calculation depends upon the
chroma format (and, for YC4:2:2 only, the filter configuration), and can be
summarized as:
hsf = 2^20 x (1/1.5) = 0x0AAAAA
vsf = 2^20 x (1/0.8) = 0x155555
This conversion is allowed in Spartan-3A DSP.
Note: Example 9 showed that the same conversion with no frame buffer is not possible in
Spartan-3A DSP.
Appendix A
Use Cases
Typical Uses
Some scenarios for scaler usage are shown in Figure A-1 through Figure A-5. In particular,
usage of the following dynamic parameter values is illustrated:
•aperture_start_line
•aperture_end_line
•aperture_start_pixel
•aperture_end_pixel
•output_h_size
•output_v_size
•hsf
•vsf
These values are very significant, and their usage is referred to throughout this
document.
Figure A-1: Format Down-scaling. Example 720p to 640x480, HSF = 2^20 x 1280/640; VSF = 2^20 x 720/480
[Figure: 1280 x 720 input. aperture_start_pixel = 0, aperture_end_pixel = 1279, aperture_start_line = 0, aperture_end_line = 719, output_h_size = 640, output_v_size = 480.]
Figure A-2: Format Up-scaling. Example 640x480 to 720p, HSF = 2^20 x 640/1280; VSF = 2^20 x 480/720
[Figure: 640 x 480 input. aperture_start_pixel = 0, aperture_end_pixel = 639, aperture_start_line = 0, aperture_end_line = 479, output_h_size = 1280, output_v_size = 720.]
Figure A-3: Zoom (Up-scaling), HSF = 2^20 x 480/1280; VSF = 2^20 x 270/720
[Figure: 480 x 270 region of a 1280 x 720 input. aperture_start_pixel = 750, aperture_end_pixel = 1229, aperture_start_line = 420, aperture_end_line = 689, output_h_size = 1280, output_v_size = 720.]
Figure A-4: Shrink (Down-scaling). Example for Picture-in-Picture (PinP), HSF = 2^20 x 1280/480; VSF = 2^20 x 720/270
[Figure: 1280 x 720 input scaled to a 480 x 270 picture within a 1280 x 720 frame. aperture_start_pixel = 0, aperture_end_pixel = 1279, aperture_start_line = 0, aperture_end_line = 719, output_h_size = 480, output_v_size = 270.]
Figure A-5: Zoom (Up-scaling) reading from External Memory, HSF = 2^20 x 480/1280; VSF = 2^20 x 270/720
[Figure: 480 x 270 region read from a 1280 x 720 image in memory. aperture_start_pixel = 0, aperture_end_pixel = 479, aperture_start_line = 0, aperture_end_line = 269, output_h_size = 1280, output_v_size = 720.]
Appendix B
Programmer Guide
Introduction
This appendix provides a description of how to program and control the data flow for the
video scaler hardware pCore. The information is sufficient for the development of a
software driver (API) for use in application software for applications such as video
conferencing and video analytics.
Note: A software driver is provided with the pCore so that you do not have to develop a
software API as described here.
Conventions
Reserved locations in the registers will be ignored by the hardware and can be written by
software with any value. Therefore the software does not need to zero or mask bits.
Unused coefficients should be set to zero. The number of taps is a compile time parameter
for the IP core and needs to be known by the programmer to be able to load the coefficient
tables correctly.
Register Definitions
Note: All registers default to 0x00000000 on power-up or software reset.
Table B-1: Video Scaler Registers Overview
Address | Name | Read/Write | Description
0x0000 | control | R/W | General control register
0x0004 | status | R | General readable status register
0x0008 | status_error | R | General readable status register for errors
0x000c | status_done | R/W | General read register for status done
Coefficient value N+1 where N is index for the coefficient
set.
Usage: Each write to this register increments an internal
counter by 2 to generate a coefficient set internal to the
video scaler. LSB aligned for coefficients less than 16 bits.
coef_value_N | 15:0 | Coefficient value N, where N is the index for the coefficient set.
Usage: Each write to this register increments an internal
counter by 2 to generate a coefficient set internal to the
video scaler. LSB aligned for coefficients less than 16 bits.
Table B-19: Coefficient Set and Bank Read Address Register
intr_coef_mem_rdbk_rdy | 6 | Mask or enable interrupt for intr_coef_mem_rdbk_rdy
intr_reg_update_done | 5 | Mask or enable interrupt for intr_reg_update_done
intr_coef_wr_error | 4 | Mask or enable interrupt for intr_coef_wr_error
intr_output_error | 3 | Mask or enable interrupt for intr_output_error
intr_input_error | 2 | Mask or enable interrupt for intr_input_error
intr_coef_fifo_rdy | 1 | Mask or enable interrupt for intr_coef_fifo_rdy
intr_output_frame_done | 0 | Mask or enable interrupt for intr_output_frame_done
Filter Coefficient Calculations
The values for the filter coefficients can be calculated with any standard digital filter tool.
MATLAB® software provides a tool box for establishing the filter coefficients once the
cutoff frequency is known from the scale factor. It should be noted that sharp cutoff
frequencies are generally not desired in image processing due to the ringing generated at
sharp transitions (artifacts). Additionally, allowing some amount of aliasing can be
subjectively preferred in side-by-side comparisons. The MATLAB software FIR1 function
can be used as a starting point for deriving coefficient values.
Xilinx provides a C-Model that generates coefficients. Contact Xilinx support for
information on how to obtain this C-Model. Refer to the Video Scaler Product Page for
information about accessing the C-Model.
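As a starting point in the spirit of the FIR1 suggestion above, a Hamming-windowed sinc low-pass filter can be computed without any toolbox. This is a rough sketch under stated assumptions and is not the algorithm used by the Xilinx C-Model or driver: the function names are hypothetical, the cutoff is taken as a fraction of the Nyquist frequency (for down-scaling, roughly output_size/input_size), only a single phase is produced, and the quantization target of 16384 is an assumption based on the example coefficient sets in Chapter 8, whose phases sum to that value.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Hamming-windowed sinc low-pass filter with unity DC gain.

    cutoff is a fraction of Nyquist, 0 < cutoff <= 1.
    """
    if num_taps < 2 or not 0.0 < cutoff <= 1.0:
        raise ValueError("need num_taps >= 2 and 0 < cutoff <= 1")
    centre = (num_taps - 1) / 2.0
    taps = []
    for k in range(num_taps):
        x = k - centre
        if x == 0:
            ideal = cutoff
        else:
            ideal = math.sin(math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def quantize(taps, unity=16384):
    """Round to integers and fold the rounding error into the centre tap."""
    q = [round(t * unity) for t in taps]
    q[len(q) // 2] += unity - sum(q)
    return q


if __name__ == "__main__":
    # Example: 11 taps for a 1080 -> 720 down-scale (cutoff about 0.667)
    print(quantize(lowpass_taps(11, 720 / 1080)))
```

A softer cutoff, or a wider transition band, can be tried by lowering the number of taps or adjusting the cutoff, in line with the remark above that sharp cutoffs tend to produce ringing.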
Video Scaler Flow Diagram
Figure B-0: Video Scaler Flow Chart
[Flow chart. Recoverable box labels: Start Scaling; Initialize Registers; Set Load Coef Bank; Load Coefs; Set Active Coef Bank; Enable Video Scaler (Control 0 = 1); New Scale Factors?; Initialize Registers (HSF, VSF, Output_h/v); New Coef Bank?; Done?; Disable Scaler (Control 0 = 0); Stop Scaling.]
System Timing Diagram
Figure B-0: System Timing Diagram
Proposed API function calls
The following functions are proposed for the L0, L1, and L2 API.
•In a zoom operation, the input image size changes on a frame basis and the output
resolution is fixed.
•Call XScaler_CalcScaleFactors and XScaler_SetScalerValues every frame to
perform the zoom function. Before beginning the zoom operation, preload the
coefficient banks you would like to use for the duration, and decide when to
transition to a new coefficient bank. For example, with 4 coefficient banks over
200 frames, switch bank every 50 frames.
•In a downsize operation, the input image size is fixed and the output resolution
changes on a frame basis.
•Call XScaler_CalcScaleFactors and XScaler_SetScalerValues every frame to
perform the downsize function. Before beginning the downsize operation, preload
the coefficient banks you would like to use for the duration, and decide when to
transition to a new coefficient bank. For example, with 4 coefficient banks over
200 frames, switch bank every 50 frames.
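The bank-switching schedule in the example above (4 banks over 200 frames) amounts to a simple index calculation. This is a sketch only: `coef_bank_for_frame` is a hypothetical helper, and an even split of frames across banks is an assumption. An application may prefer to switch banks when the scale factor crosses a threshold instead.

```python
def coef_bank_for_frame(frame_index, total_frames, num_banks):
    """Return the preloaded coefficient bank to use for a given frame.

    Frames are split evenly across banks (assumption for illustration).
    """
    if total_frames <= 0 or num_banks <= 0:
        raise ValueError("total_frames and num_banks must be positive")
    if not 0 <= frame_index < total_frames:
        raise ValueError("frame_index out of range")
    frames_per_bank = -(-total_frames // num_banks)  # integer ceiling
    return min(frame_index // frames_per_bank, num_banks - 1)


if __name__ == "__main__":
    # 4 banks over 200 frames: the bank changes every 50 frames
    for frame in (0, 49, 50, 149, 150, 199):
        print(frame, coef_bank_for_frame(frame, 200, 4))
```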
Example Settings
The following examples illustrate settings for different scale factors.
Pass Thru
Table B-27 is an example of pass-through of a 1280 x 720 resolution image.
Table B-27: Pass Through Register Settings
Address | Name | Decimal Value
0x0000 | control | 07
0x0010 | hsf | 1048576
0x0014 | vsf | 1048576
0x0018 | aperture_start_pixel | 0
0x0018 | aperture_end_pixel | 1279
0x001c | aperture_start_line | 0
0x001c | aperture_end_line | 719
0x0020 | Output_h_size | 1280
0x0020 | Output_v_size | 720
0x0024 | num_h_phases | 4
0x0024 | num_v_phases | 4
0x0028 | h_coeff_set | 0
0x0028 | v_coeff_set | 0
0x002c | start_hpa_y | 0
0x0030 | start_hpa_c | 0
0x0034 | start_vpa_y | 0
0x0038 | start_vpa_c | 0
0x003c | Coef_set_write_addr | 0
0x0040 | Coef_values | See Chapter 8, Coefficients
Down Sample by 2 in Horizontal and Vertical
Table B-28 is an example of scaling down a 1280 x 720 resolution image by a factor of
2 horizontally and vertically to 640 x 360.
Table B-28: Down Sample Register Settings
Address | Name | Decimal Value
0x0000 | control | 07
0x0010 | hsf | 2097152
0x0014 | vsf | 2097152
0x0018 | aperture_start_pixel | 0
0x0018 | aperture_end_pixel | 1279
0x001c | aperture_start_line | 0
0x001c | aperture_end_line | 719
0x0020 | Output_h_size | 640
0x0020 | Output_v_size | 360
0x0024 | num_h_phases | 4
0x0024 | num_v_phases | 4
0x0028 | h_coeff_set | 0
0x0028 | v_coeff_set | 0
0x002c | start_hpa_y | 0
0x0030 | start_hpa_c | 0
0x0034 | start_vpa_y | 0
0x0038 | start_vpa_c | 0
0x003c | Coef_set_write_addr | 0
0x0040 | Coef_values | See Chapter 8, Coefficients
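The hsf and vsf values in these examples follow from the relationship 2^20 x input size / output size, with the input size taken from the aperture registers. The sketch below reproduces that arithmetic. The function names are hypothetical, and truncating (rather than rounding) a non-integer result is an assumption, since the rounding behaviour of the supplied driver is not stated here.

```python
def scale_factor(input_size, output_size):
    """Return 2^20 x input_size / output_size as an integer (truncated)."""
    if input_size <= 0 or output_size <= 0:
        raise ValueError("sizes must be positive")
    return (input_size << 20) // output_size


def scale_factors_from_aperture(start_pixel, end_pixel, start_line, end_line,
                                output_h_size, output_v_size):
    """Compute (hsf, vsf) from aperture bounds and output dimensions."""
    input_h = end_pixel - start_pixel + 1
    input_v = end_line - start_line + 1
    return (scale_factor(input_h, output_h_size),
            scale_factor(input_v, output_v_size))


if __name__ == "__main__":
    # Pass-through of 1280 x 720
    print(scale_factors_from_aperture(0, 1279, 0, 719, 1280, 720))
    # Down-sample by 2 to 640 x 360
    print(scale_factors_from_aperture(0, 1279, 0, 719, 640, 360))
```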
Appendix C
System Level Design
Introduction
This appendix provides an example system that includes the video scaler core. Important
system level aspects when designing with the video scaler are highlighted, including:
•Video scaler usage with the VDMA/VFBC/MPMC or other memory
interface/controller
•Inclusion of the video scaler in an EDK project
•Typical usage of video scaler in conjunction with other cores
•System level distribution of video timing and genlock signals
Example System General Configuration
The system input and output is expected to be no larger than 720P (1280Hx720V), with a
maximum pixel frequency of 74.25 MHz, with equivalent clocks.
•MicroBlaze controls scale factors according to user input
•The system can upscale or downscale
•When down scaling, the full input image is scaled down and placed in the center of a
black 720P background and displayed
•When upscaling, the center of the 720P input image is cropped from memory and
upscaled to 720P, and displayed as a full 720P image on the output
•Operational clock frequencies are derived from the input clock
Figure C-1 shows a typical example of the video scaler in memory mode incorporated into
a larger system. Here are the essential details:
•The Multiport Memory Controller (MPMC) represents the memory access point for
multiple IP blocks.
•The MPMC ports are configured as Video Frame Buffer Controllers (VFBC), which
allow the user to access data in rectangular fashion, making it simple to store frames
of data, and access portions of any frame. This configuration is useful for cropping an
area in preparation for upscaling (for example). See the MPMC Data Sheet for more
information.
•The Video Direct Memory Access (VDMA) blocks simplify the VFBC interface, and
act as a SW-controllable processor peripheral. See the VDMA Data Sheet for more
information.
•The Timebase Controller is a SW-configurable timing detector and generator block,
which generates timing signals for distribution around the system. See the Timing
Controller Data Sheet for more information.
•The On-Screen Display (OSD) block aligns the data read from memory with the
timing signals and presents it as a standard-format video data stream. It also
alpha-blends multiple layers of information (e.g. text, other video data). See the OSD Data
Sheet for more information.
Control Buses
In this example, MicroBlaze is configured to use the PLB v4.6. The VDMAs sit on the PLB
bus directly. The Video Scaler, Timing Controller, and OSD use AXI4-Lite. The PLB-to-AXI
bridge facilitates the transition between PLB and AXI buses.
VDMA0 Configuration
VDMA0 is used unidirectionally, for writing input data into the memory. Normally,
this should be configured as a write-only core (C_DMA_TYPE = 0). However, in this case it is
currently configured as a bidirectional core (C_DMA_TYPE = 2) to work around an issue in the
VDMA design. The read side of this core is not connected, except for the read-side clock.
The system operates using a Genlock mechanism. A rotational 5-frame buffer is defined in
the external memory. Using the Genlock bus vdma_0_XIL_WD_MGENLOCK, VDMA0
communicates to VDMA1 which of the five frame locations is being written, to avoid R/W
collisions.
VDMA0, in the MHS file text given below, is sourced from an engineering test-pattern
generator (not included in the MHS file below). This generates a VDMA write bus that
connects directly to the VDMA write port.
Figure C-1: Simplified System Diagram
VDMA1 Configuration
VDMA1 is bidirectional, used for reading the original frames from memory, and writing
the scaled frame back to memory.
The system operates using a Genlock mechanism. A second rotational 5-frame buffer is
defined in the external memory. VDMA1 communicates to VDMA2 which frame it is
writing to, using the Genlock bus vdma_1_XIL_WD_MGENLOCK.
VDMA1, in the MHS file text below, interfaces with the video scaler via a VDMA read bus
(scaler input) and VDMA write bus (scaler output).
VDMA2 Configuration
VDMA2 is unidirectional, and is configured that way. It is used for reading the scaled
frame from memory in order to display it. It is a Genlock slave to VDMA1.
Video Scaler Configuration
The video scaler is configured as follows:
•single-engine 4:2:2
•11Hx11V-taps
•64 phases
•shared YC coefficients
The core uses a 148.5 MHz clock derived from the 74.25 MHz input clock.
MPMC Configuration
The MPMC is configured to have three VFBC ports. Each port includes a FIFO. The FIFOs
are configured to be 2048 pixels in length. This is especially important for VDMA1, which
handles video data to/from the video scaler. The video scaler arbitrates on a line-by-line
basis. It does this by analyzing the status of the rd_almost_empty and
wd_almost_full flags on the VDMA buses, before reading or writing one line, but never
analyzes these flags once a line-read or line-write operation has commenced. This is
described in detail in the main text of this user guide. The guidelines for this port are
described in the following two sections.
Scaler READ-port
•For the port that feeds data into the video scaler, ensure that there is a FIFO of a size
equal to or greater than the maximum line length anticipated to be scaled by the
scaler. Ideally, set this to the next power of 2 above the maximum input line length.
For this example, the max line length is 1280, so the FIFO has been set to 2048 pixels.
•For systems like the VFBC, which have a FIXED threshold for the ALMOST
full/empty flags, set this value to the maximum input line-length. This ensures that
the rd_almost_empty flag will not be driven low until an entire line of video data
is in the FIFO, ready for the scaler to accept.
Scaler WRITE-port
•For the port that feeds from the video scaler out to the memory, ensure that there is a
FIFO of a size equal to or greater than the maximum line length anticipated to be
output by the scaler. Ideally, set this to the next power of 2 above the maximum
output line length. For this example, the max line length is 1280, so the FIFO has been
set to 2048 pixels.
•For systems like the VFBC, which have a FIXED threshold for the ALMOST
full/empty flags, set this value to the maximum output line-length. This ensures that
the wd_almost_full flag will not be driven low until there is sufficient space in
the FIFO for an entire line of video data.
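The FIFO sizing guidance for both ports reduces to two numbers per port: a depth at the next power of 2 above the maximum line length, and a fixed almost-full/empty threshold equal to that line length. The sketch below computes both. The helper name `fifo_settings` is hypothetical, and treating "next power of 2 above" as strictly greater than the line length is an assumption.

```python
def fifo_settings(max_line_length):
    """Return (fifo_depth, almost_threshold) in pixels for one VFBC port.

    fifo_depth: next power of 2 strictly above max_line_length.
    almost_threshold: fixed almost-full/empty threshold, set to the
    maximum line length as recommended in the text.
    """
    if max_line_length <= 0:
        raise ValueError("max_line_length must be positive")
    fifo_depth = 1 << max_line_length.bit_length()
    return fifo_depth, max_line_length


if __name__ == "__main__":
    # 720P example from this appendix: maximum line length of 1280 pixels
    print(fifo_settings(1280))
```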
Cropping from Memory
Controlling the VDMA dynamically (e.g., from MicroBlaze or other processor) allows you
to request any rectangle from anywhere in the image in memory, and change the position
and dimensions of this rectangle on a frame-by-frame basis. One complication of doing this
with the VFBC is that the FIFO almost full/empty thresholds are FIXED at compile-time.
According to the guidelines above, it is necessary to set the thresholds to the maximum line
length. Yet, when cropping from memory, you will be requesting a rectangle of a smaller
width than the maximum line length. Consequently, the final lines may not be read from
memory correctly, resulting in some distortion at the bottom of the image.
To work around this issue, it is necessary, and safe, to request more lines than you want to
scale. This keeps the FIFO topped up with data. This can be achieved by setting the VDMA
Read Vsize register (address offset 0x28) to a value greater than the number of lines you
want to scale. See the VDMA Data Sheet for more information. The scaler register settings
should remain set to your desired values.
OSD Configuration
The OSD is configured for two layers. The first layer is video data read from VDMA2. The
second layer is text overlay.
EDK MHS File Text
The following is an example EDK MHS file insert for the system described.
Note: This is NOT a complete design, but provides some idea as to the construction of a video
scaler system in EDK.
PORT m_wd_frame_ptr_out = vdma_0_XIL_WD_MGENLOCK
PORT vdma_wcmd_clk = vid_in_clk
PORT vdma_wd_clk = vid_in_clk
PORT vdma_rcmd_clk = vid_in_clk
PORT vdma_rd_clk = vid_in_clk
END
BEGIN timebase
PARAMETER INSTANCE = timebase_1
PARAMETER HW_VER = 3.00.a
PARAMETER C_BASEADDR = 0xc3800000
PARAMETER C_HIGHADDR = 0xc380ffff
PARAMETER C_MAX_LINES = 1024
PARAMETER C_INTERCONNECT_S_AXI_MASTERS = plbv46_axi_bridge_0.M_AXI
BUS_INTERFACE S_AXI = axi_interconnect_0
BUS_INTERFACE XSVI_OUT = timebase_1_XSVI_OUT
PORT ce = net_vcc
PORT video_clk_in = vid_in_clk
PORT fsync_o = timebase_1_fsync
PORT IP2INTC_Irpt = timebase_1_IP2INTC_Irpt
PORT S_AXI_ACLK = clk_100_0000MHzMMCM0
END