Texas Instruments (TI) reserves the right to make changes to its products or to discontinue any semiconductor
product or service without notice, and advises its customers to obtain the latest version of relevant information
to verify, before placing orders, that the information being relied on is current.
TI warrants performance of its semiconductor products and related software to the specifications applicable at
the time of sale in accordance with TI’s standard warranty. Testing and other quality control techniques are
utilized to the extent TI deems necessary to support this warranty. Specific testing of all parameters of each
device is not necessarily performed, except those mandated by government requirements.
Certain applications using semiconductor products may involve potential risks of death, personal injury, or
severe property or environmental damage (“Critical Applications”).
TI SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED, INTENDED, AUTHORIZED, OR WARRANTED
TO BE SUITABLE FOR USE IN LIFE-SUPPORT APPLICATIONS, DEVICES OR SYSTEMS OR OTHER
CRITICAL APPLICATIONS.
Inclusion of TI products in such applications is understood to be fully at the risk of the customer. Use of TI
products in such applications requires the written approval of an appropriate TI officer. Questions concerning
potential risk applications should be directed to TI through a local SC sales office.
In order to minimize risks associated with the customer’s applications, adequate design and operating
safeguards should be provided by the customer to minimize inherent or procedural hazards.
TI assumes no liability for applications assistance, customer product design, software performance, or
infringement of patents or services described herein. Nor does TI warrant or represent that any license, either
express or implied, is granted under any patent right, copyright, mask work right, or other intellectual property
right of TI covering or relating to any combination, machine, or process in which such semiconductor products
or services might be or are used.
Copyright 1997, Texas Instruments Incorporated
iv
TVP4020 Programmers Reference ManualContents
3Dlabs is the worldwide trading name of 3Dlabs Inc. Ltd.
3Dlabs, GLINT and P
ERMEDIA
are registered trademarks of 3Dlabs Inc. Ltd.
Microsoft, Windows and Direct3D are either registered trademarks or trademarks
of Microsoft Corp. in the United States and/or other countries. OpenGL is a
registered trademark of Silicon Graphics, Inc. Macintosh and Power Macintosh
are registered trademarks and QuickDraw is a trademark of Apple Computer Inc.
All other trademarks are acknowledged and recognized.
TVP4020 is a high performance PCI/AGP graphics processor that
balances high quality 3D polygon and textured graphics acceleration,
windows acceleration and state-of-the-art MPEG1/MPEG2 playback with
a fast integrated SVGA core, integrated RAMDAC and video ports. This
document provides a high level overview of the architecture of the
TVP4020 graphics processor and is intended as an introduction for
design engineers and project managers planning the implementation of
TVP4020 based systems.
TVP4020 sets the standard for 3D and multimedia acceleration, making
it the ideal solution to meet the increasingly pervasive need for balanced
3D and multimedia acceleration - and all in a single, low cost PCI device.
This document has been written as the primary reference for
programmers and system designers who wish to develop software to
drive TVP4020. Information on programming the I/O registers can be
found in the
TVP4020 is the second generation P
TVP4020 Hardware Reference Manual.
ERMEDIA
device. Compared with
TVP4010, it provides greater flexibility, additional features and enhanced
performance. Throughout this manual the terms TVP4020 and P
ERMEDIA
are used interchangeably.
1.1
An understanding of the principles of 2D and 3D graphics programming
will be useful in reading this document.
How to use this manual
Chapter 2 gives an overview of P
ERMEDIA
.
Chapter 3 details the programming model for the chip.
Chapter 4 describes the data formats that P
ERMEDIA
supports in the
framebuffer, localbuffer and texture buffer.
Chapter 5 describes how to use P
Chapter 6 describes the initialization of P
Chapter 7 provides tips for programming P
ERMEDIA
for graphics rendering.
ERMEDIA
ERMEDIA
.
.
Chapter 8 provides examples of Delta programming.
Appendix A details the P
ERMEDIA
registers.
Appendix B gives the format used in the pseudocode examples
throughout the document.
1
IntroductionTVP4020 Programmers Reference Manual
Appendix C gives a table used to set-up common screen widths.
2
TVP4020 Programmers Reference ManualIntroduction
Appendix D describes how a Gouraud shaded triangle can be rendered
without using the Delta Unit. This is helpful in understanding how the
chip works and
also when dealing with TVP4010 legacy.
Appendix E tabulates the TVP4020 registers.
Appendix F describes the differences between TVP4010 and 2
A Glossary of technical terms follows the Appendices.
An extensive index is included.
Stretch BLTs, monochrome/color expansion and logic ops
•
8, 16, 24 and 32-bit packed framestore
•
MPEG2 compatible Video playback acceleration
•
YUV 4:4:4, YUV 4:2:2 and YUV 4:2:0 (native MPEG2 format)
•
Unlimited multiple playback windows (occluded)
•
Independent XY scaling and mirroring
•
Integrated geometry pipeline set-up processor
•
Integrated true-color 230 MHz RAMDAC
•
320x200 to 1600x1200 screen resolution
•
DPMS, DDC1 and DDC2AB+
•
Clock synthesizer and Hardware cursor
•
Multi-mode video streams
•
Simultaneous input and output video
•
Optional scaling and filtering
•
Optional color space conversion and gamma correction
•
Fast on-chip SVGA
•
Flexible multi-function SDRAM or SGRAM memory (2, 4, 6 or 8
Mbytes)
•
Microsoft PC97 and Intel GPC97 compliance
•
Comprehensive suite of optimized software drivers
•
Reference board designs and manufacturing kits
2
TVP4020 Programmers Reference ManualOverview
Bus
ce
Memory
I
e
VGA
Gra phics Hyperpipeline
2.2
2.2.1
Functional Overview
Memory Subsystem
ERMEDIA
P
provides flexible support for the memory subsystem (Fig. 2.1).
This allows the system designer a wide choice of price/performance
tradeoffs.
The same physical memory holds all data used by P
ERMEDIA
. Internally
the data types are divided into texture, localbuffer and framebuffer. The
localbuffer holds depth and stencil data; the framebuffer holds color data
for display.
Host BusSGRAM
2.2.2
Host Interface
Conceptually P
Interfa
Bypass
Figure 2.1External Interfaces
ERMEDIA
can be viewed as a register file. Control registers
nterfac
are primed with the information required for a primitive, and then to start
the chip drawing, a write is made to a Command register
ERMEDIA
P
registers can be accessed directly through the memory map.
Registers can be accessed either individually or in groups.
The chip also supports a bypass route to the memory to allow direct
read/write of pixels, and implementation of algorithms not directly
supported by P
ERMEDIA
.
3
OverviewTVP4020 Programmers Reference Manual
2.2.3
2.2.4
Task Switching
Where multiple applications wish to make simultaneous access to
ERMEDIA
P
the loading of correct state. P
, it is the responsibility of the software driving the chip to handle
ERMEDIA
has been designed to support a
number of different software architectures.
• Synchronous operation means that a new task can load its context
without waiting for current rendering to complete
• All loadable state can be read back
• A Sync command is provided to flush all rendering. This can be polled
or it can return an interrupt
SVGA
ERMEDIA
P
contains a fast VGA core. The P
ERMEDIA
SVGA is used for
DOS VGA applications and during boot time before switching to use the
Graphics Hyperpipeline. This document does not cover VGA
programming. Specific information on P
the
TVP4020 Hardware Reference Manual
ERMEDIA
’s VGA can be found in
. VGA information, such as
standard registers, is described in the “Programmer’s Guide to the EGA,
VGA and Super VGA Cards” by Richards F. Ferraro.
The following standard VGA modes are supported:
Mode
(hex)
00 0
0*
0+
01 1
1*
1+
02 2
2*
2+
03 3
3*
3+
04 440 by 258 by 84/256K1Graph320 by 200
05 540 by 258 by 84/256K bw1Graph320 by 200
06 680 by 258 by 82/256K bw1Graph640 by 200
07 7
7+
0D D40 by 258 by 816/256K8Graph320 by 200
0E E80 by 258 by 816/256K4Graph640 by 200
0F F80 by 258 by 14bw2Graph640 by 350
10 1080 by 258 by 1416/256K2Graph640 by 350
11 1180 by 308 by 162/256K1Graph640 by 480
12 1280 by 308 by 1616/256K1Graph640 by 480
13 1340 by 258 by 8256/256K1Graph320 by 200
Alpha
Format
40 by 25
40 by 25
40 by 25
40 by 25
40 by 25
40 by 25
80 by 25
80 by 25
80 by 25
80 by 25
80 by 25
80 by 25
80 by 25
80 by 25
Char SizeColorsMax
Page
8 by 8
8 by 14
9 by 16
8 by 8
8 by 14
9 by 16
8 by 8
8 by 14
9 by 16
8 by 8
8 by 14
9 by 16
This chapter describes the programming model for P
ERMEDIA
. It
describes the interface conceptually rather than detailing specific
registers and their exact usage. In-depth descriptions of how to program
ERMEDIA
P
ERMEDIA
P
RegionAddress SpaceBytes DescriptionComments
ConfigConfiguration256PCI ConfigurationPCI special
ZeroMemory128KControl Registersrelocatable
OneMemory8MMemory Region Onerelocatable
TwoMemory8MMemory Region Tworelocatable
ROMMemory64KExpansion ROMrelocatable
SVGAMemory & I/ O-SVGA Addr essesoptional & fixed
Address RangeDescriptionByte Swap
0000.0000 -> 0000.0FFF Control & StatusNo
0000.1000 -> 0000.1FFF Memory ControlNo
0000.2000 -> 0000.2FFF GP FIFO accessNo
0000.3000 -> 0000.3FFF Video ControlNo
0000.4000 -> 0000.4FFF RAMDACNo
0000.5000 -> 0000.57FF Video Str eam s General Purpose
0000.5800 -> 0000.5FFF Video Streams ControlNo
0000.6000 -> 0000.6FFF SVGA ControlNo
0000.7000 -> 0000.7FFF ReservedNo
0000.8000 ->
0000.FFFF
0001.0000 -> 0001.0FFF Control & StatusYes
0001.1000 -> 0001.1FFF Memory ControlYes
0001.2000 -> 0001.2FFF GP FIFO accessYes
0001.3000 -> 0001.3FFF Video ControlYes
0001.4000 -> 0001.4FFF RAMDACYes
0001.5000 -> 0001.57FF Video Str eam s General Purpose
0001.5800 -> 0001.5FFF Video Streams ControlNo
0001.6000 -> 0001.6FFF SVGA ControlYes
0001.7000 -> 0001.7FFF ReservedYes
for specific drawing operations can be found in later chapters.
is divided into the following memory regions:
Table 3.1Memory Regions
No
Bus
GP RegistersNo
No
Bus
6
TVP4020 Programmers Reference ManualProgramming Model
3.1
0001.8000 ->
0001.FFFF
Table 3.2Region 0 Address Map
ERMEDIA
P
as a Register file
The simplest way to view the interface to the P
Processor is as a flat block of memory-mapped registers (
file). This register file appears as part of the address map for P
When a P
ERMEDIA
host software driver is initialized it can map the
GP RegistersYes
ERMEDIA
Graphic
i.e.
a register
ERMEDIA
.
register file into its address space. Each register has an associated
address tag, giving its offset from the base of the register file (since all
registers reside on a 64-bit boundary, the tag offset is measured in
multiples of 8 bytes). The most straightforward way to load a value into a
register is to write the data to its mapped address. In reality the chip
interface comprises a 256 entry deep FIFO, and each write to a register
causes the written value and the register’s address tag to be written as a
new entry in the FIFO.
Programming P
ERMEDIA
to draw a primitive consists of writing values to
the appropriate registers followed by a write to a command register. This
last write triggers the start of drawing.
ERMEDIA
P
has approximately 200 registers. All registers are 32 bits wide
and should be 32-bit addressed. Many registers are split into bit fields,
and it should be noted that bit 0 is the least significant bit.
In future chip revisions the register file may be extended and currently
unused bits in certain registers may be assigned new meanings.
Software developers should ensure that only defined registers are
written to and that undefined bits in registers are always written as
zeros. The only exception to this rule is that in certain registers it is
convenient to allow unmasked values to be written to registers which
hold numeric data. These fields are marked as "not used" in Appendix A
and elsewhere.
Register Types
ERMEDIA
P
•Control Registers
•Command Registers
•Internal Registers
has three main types of register:
Control Registers are updated only by the host - the chip effectively uses
them as read-only registers. Examples of control registers are the
scissor clip min and max registers. Once initialized by the host, the chip
only reads these registers to determine the scissor clip extents. Most
registers are control registers.
Command Registers are those which, when written to, cause some
action to occur. Typically, the host will initialize the appropriate control
registers and then write to a command register to initiate drawing. Some
command registers such as ResetPickResult or Sync do not initiate
rendering. Apart from these, there are two types of command registers:
begin-draw and continue-draw. Begin-draw commands cause rendering
to start with those values specified by the control registers. Continuedraw commands cause drawing to continue with internal register values
as they were when the previous drawing operation completed. Making
use of continue-draw commands can significantly reduce the amount of
data that has to be loaded into P
ERMEDIA
when drawing multiple
connected objects such as polylines. Examples of command registers
include the Render and ContinueNewLine registers.
For convenience in this document we often refer to "sending a Render
command to P
ERMEDIA
" rather than saying "the Render Command
register is written to, which initiates drawing" .
Internal Registers are not accessible to host software. They are used
internally by the chip to keep track of changing values. Some control
registers have corresponding internal registers. When a begin-draw
command is sent and before rendering starts, the internal registers are
updated with the values in the corresponding control registers. If a
continue-draw command is sent then this update does not happen and
drawing continues with the current values in the internal registers. For
example, if a line is being drawn then the StartXDom and StartY control
registers specify the (x, y) coordinates of the first point in the line. When
a begin-draw command is sent these values are copied into internal
registers. As the line drawing progresses these internal registers are
updated to contain the (x, y) coordinates of the pixel being drawn. When
drawing has completed the internal registers contain the (x, y)
coordinates of the next point that would have been drawn. If a continuedraw command is now given, these final (x, y) internal values are not
modified and further drawing uses these values. If a begin-draw
command had been used the internal registers would have been reloaded from the StartXDom and StartY registers.
For the most part internal registers can be ignored. It is helpful to
appreciate that they exist in order to understand the continue-draw
commands.
Efficiency Issues and Register Types
Software developers wishing to write device drivers for P
become familiar with the different type s of registers. Some control
registers such as the StartXDom and StartY registers have to be
8
ERMEDIA
should
TVP4020 Programmers Reference ManualProgramming Model
updated for almost every primitive whereas other control registers such
as those for scissor clip or logical ops can be updated much less
frequently. Pre-loading of the appropriate control registers can reduce
the amount of data that has to be loaded into the chip for a given
primitive thus improving efficiency. In addition, as described above, the
final values in internal registers can sometimes be used for subsequent
drawing operations.
The tables in Appendix D lists the graphics registers according to their
type, name and address.
3.2
3.2.1
ERMEDIA
P
There are four ways of loading P
•The host writes a value to the mapped address of the register
•The host writes address-tag/data pairs to the FIFO.
•The host writes address-tag/data pairs to the FIFO via DMA.
•The host writes to raw memory mapped GP FIFO addresses.
I/O Interface
ERMEDIA
registers:
In cases where the host writes data values directly to the chip via the
register file, consideration has to be given to FIFO overflow (unless PCI
Disconnect is enabled). The InFIFOSpace register indicates how many
free entries remain in the FIFO. Before writing to any register, the host
must ensure that there is enough space left in the FIFO. The values in
this register can be read at any time. When using DMA, the DMA
controller will automatically e nsure that there is room in the FIFO befo re
it performs further transfers. Thus a buffer of any size up to 64K, 32 bit
words, can be passed to the DMA controller. The FIFO and DMA
controller are described in more detail below.
PCI Disconnect
The PCI bus protocol incorporates a feature known as PCI Disconnect,
which is supported by P
ERMEDIA
. PCI Disconnect is enabled by writing a
one to bit zero of the DisconnectControl register which is at offset 0x68
in PCI Region 0. Once the P
ERMEDIA
is in this mode, if the host processor
attempts to write to the full FIFO then instead of the write being lost, the
ERMEDIA
P
chip will assert PCI Disconnect which will cause the host
processor to keep retrying the write cycle until it succeeds.
This feature allows faster download of data to P
need not poll the InFIFOSpace register but should be used with care
since whenever the PCI Disconnect is asserted the bus is effectively
hogged by the host processor until such time as the P
an entry in its FIFO. In general this mode should only be used either for
operations where it is known that the P
than the host can generate it, or where there are no time critical
peripherals sharing the PCI bus.
3.2.2
3.2.3
Idle bit
In some systems, PCI Disconnect may cause interrupts to be lost if it
used too often or for too long. It is normal to only rely on this feature
when it is known that the data to be sent to P
ERMEDIA
will be absorbed
quickly enough that the disconnect will seldom be used. It also advisable
to check that the Graphics Processor is not processing a large primitive
before transferring data of this sort, and this may be done by checking
the Graphics Processor Active bit in the PCI Disconnect register.
Disconnect should not normally be enabled if this bit is set.
FIFO ControlFIFO Control
The description in section §3.1 above considered the P
ERMEDIA
to be a register file. More precisely, when a data value is written to a
register, this value and the address tag for that register are combined
and put into the FIFO as a new entry. The actual register is not updated
until P
ERMEDIA
processes this entry. In the case where P
performing a time consuming operation (
e.g.
drawing a large texture
ERMEDIA
mapped polygon), and not draining the FIFO very quickly, it is possible
for the FIFO to become full. If a write to a register is performed when the
FIFO is full no entry is put into the FIFO and that write is effectively lost.
interface
is busy
The input FIFO is 256 entries deep and each entry consists of a tag/data
pair; an address word which addresses the register to be updated,
followed by the data to be sent to the register. The InFIFOSpace register
can be read to determine how many entries are free. The value returned
by this register will never be greater than 256.
An example of loading P
ERMEDIA
registers using the FIFO is given below.
The pseudocode fills a series of rectangles. Details of the conventions
used in the pseudocode examples may be found in Appendix B.
Assume that the data to draw a single rectangle consists of 5 words
(including the Render command).
TVP4020 Programmers Reference ManualProgramming Model
Render (PERMEDIA_TRAPEZOID_PRIMITIVE);
}
The InFIFOSpace FIFO control register contains a count of the number
of entries currently free in the FIFO. The chip increments this register for
each entry it removes from the FIFO and decrements it every time the
host puts an entry in the FIFO. Before writing to the input FIFO, the user
must check that there is sufficient space by reading the InFIFOSpace
register.
The Graphics Core FIFO interface provides a port through which both
GC register addresses and data can be sent to the input FIFO. A range
of 4 Kbytes of host space is provided although all data may be sent
through one address in the range. ALL accesses go directly to the FIFO;
the range is provided to allow for data transfer schemes which force the
use of incrementing addresses.
Note that the GC registers cannot be read through this interface.
Command buffers generated to be sent to the input FIFO interface, may
be read directly by P
ERMEDIA
by using the DMA controller.
3.2.4
A data formatting scheme is provided to allow for multiple data words to
be sent with one address word where adjacent or grouped registers are
being written, or where one register is to be written many times.
Note. The FIFO interface can be accessed at 32 bit boundaries. This is
to allow a direct copy from a DMA format buffer.
The DMA Interface
Loading registers directly via the FIFO is often an inefficient way to
download data to P
a small number of entries, P
ERMEDIA
. Given that the FIFO can accommodate only
ERMEDIA
has to be frequently interrogated to
determine how much space is left. Also, consider the situation where a
given API function requires a large amount of data to be sent to
ERMEDIA
P
. If the FIFO is written directly then a return from this function is
not possible until almost all the data has been consumed by P
This may take some time depending on the types of primitives being
drawn.
To avoid these problems P
ERMEDIA
provides an on-chip DMA controller
which can be used to load data from arbitrary sized (< 64K 32-bit words)
host buffers into the FIFO. In its simplest form the host software has to
prepare a host buffer containing register address tag descriptions and
data values. It then writes the base address of this buffer to the
DMAAddress register and the count of the number of words to transfer
to the DMACount register. Writing to the DMACount register starts the
DMA transfer and the host can now perform other work. In general, if the
complete set of rendering commands required by a given call to a driver
function can be loaded into a single DMA buffer then the driver function
can return. Meanwhile, in parallel, P
ERMEDIA
is reading data from the
host buffer and loading it into its FIFO. FIFO overflow never occurs since
the DMA controller automatically waits until there is room in the FIFO
before doing any transfers.
The only restriction on the use of DMA control registers is that before
attempting to reload the DMACount register the host software must wait
until previous DMA has completed. It is valid to load the DMAAddress
register while the previous DMA is in progress since the address is
latched internally at the start of the DMA transfer. Many display driver
functions can be implemented using the following skeleton structure:
do any pre-work
DMAAddress(address of dma_buffer);
while (TRUE) {
count = *DMACount;// note this is volatile
if (count) {
while (--count)
;// wait for count to expire
}
else
break;// DMA completed
}
copy render data into DMA buffer
DMACount(number of words in DMA buffer)
return
12
Using DMA leaves the host free to return to the application, while in
parallel, P
ERMEDIA
is performing the DMA and drawing. This can increase
performance significantly over loading a FIFO directly. In addition, some
algorithms require that data be loaded multiple times (e.g. drawing the
same object across multiple clipping rectangles). Since the P
ERMEDIA
DMA only reads the buffer data, it can be downloaded many times
simply by restarting the DMA. This can be very beneficial if composing
the buffer data is a time consuming task.
A further optimization is to use a double buffered mechanism with two
DMA buffers. This allows the second buffer to be filled before waiting for
the previous DMA to complete thus further improving the parallelism
between host and P
ERMEDIA
processing.
TVP4020 Programmers Reference ManualProgramming Model
08162431
Cou n t or Ma sk
Addres s Tag
res erved
Mode
0 = Hold tag
1 = Increment tag
2 = Indexed ta g
3 = Reser ve d
do any pre-work
get free DMA buffer and mark as in use
put render data into this new buffer
DMAAddress(address of new buffer)
while (TRUE) {
count = *DMACount;// note this is volatile
if (count) {
while (--count)
;// wait for count to expire
}
else
break;// DMA completed
}
DMACount(number of words in new buffer)
mark the old buffer as free
return
In general the DMA buffer format consists of a 32-bit address tag
description word followed by one or more data words. The DMA buffer
consists of one or more sets of these formats. The following paragraphs
describe the different types of tag description words that can be used.
DMA Tag Description Format
When DMA is performed each 32-bit tag description in the DMA buffer
conforms to the following format.
reserved
Figure 3.1
DMA Tag Description Format
There are 3 different tag addressing modes for DMA: hold, increment
and indexed. The different DMA modes are provided to reduce the
amount of data which needs to be transferred, hence making better use
of the available DMA bandwidth. Each of these is described in the
following sections. Each row in the following diagrams represents a 32bit value in the DMA buffer. The address tag for each register is given in
the Graphics Register Reference Appendix D.
address-tag with Count=n-1, Mode=0
value 1
...
value n
This is commonly used for image download by setting the
SyncOnHostData bit in the Render command.. In this format the 32-bit
tag description contains a tag value and a count specifying the number
of data words following in the buffer. The DMA controller writes each of
the data words to the same address tag. For example, this is useful for
image download where pixel data is continuously written to the Color
register. The bottom 9 bits specify the register to which the data should
be written; the high-order 16 bits specify the number of data words
(minus 1) which follow in the buffer and which should be written to the
address tag (note that the 2-bit mode field for this format is zero so a
given tag value can simply be loaded into the low order 16 bits).
A special case of this format is where the top 16 bits are zero indicating
that a single data value follows the tag (
i.e.
the 32-bit tag description is
simply the address tag value itself). This allows simple DMA buffers to
be constructed which consist of tag/data pairs. For example to render a
horizontal span 10 pixels long starting from (2,5) the DMA buffer could
look like this:
address-tag with Count=n-1, Mode=1
value 1
...
value n
This format is similar to the hold format except that as each data value is
loaded the address tag is incremented (the value in the DMA buffer is
not changed; P
allows contiguous P
ERMEDIA
ERMEDIA
updates an internal copy). Thus, this mode
registers to be loaded by specifying a single
32-bit tag value followed by a data word for each register. The low-order
9 bits specify the address tag of the first register to be loaded. The 2 bit
mode field is set to 1 and the high-order 16 bits are set to the count
14
TVP4020 Programmers Reference ManualProgramming Model
(minus 1) of the number of registers to update. To enable use of this
format, the P
ERMEDIA
register file has been organized so that registers
which are frequently loaded together have adjacent address tags. For
example, the 8 AreaStipplePattern registers can be loaded as follows:
address tags are 9 bit values. For the purposes of the Indexed
DMA Format they are organized into major groups and within each
group there are up to 16 tags. The low-order 4 bits of a tag give its offset
within the group. The high-order 5 bits give the major group number.
Appendix D Register Table, lists the individual registers with their Major
Group and Offset.
8
Major GroupOffset
Figure 3.2
Indexed Format
094
This format allows up to 16 registers within a group to be loaded while
still only specifying a single address tag description word.
address tag with Mask, Mode=2
value 1
...
value n
If the Mode of the address tag description word is set to indexed mode
then the high-order 16 bits are used as a mask to indicate which
registers within the group are to be used. The bottom 4 bits of the
address tag description word are unused. The group is specified by bits
4 to 8. Each bit in the mask is used to represent a unique tag within the
group. If a bit is set then the corresponding register will be loaded. The
number of bits set in the mask determines the number of data words that
should be following the tag description word in the DMA buffer. The data
is stored in order of increasing corresponding address tag. For example,
The Mode bits are set to 2 so this is indexed mode. The Mask field
(0x0032) has 3 bits set so there are three data words following the tag
description word. Bits 1, 4 and 5 are set so the tag offsets are 1, 4 and 5.
The major group is given by the bits 4-8 which are 0x0F (in indexed
mode bits 0-3 are ignored). Thus the actual registers to update have
address tags 0x0F1, 0x0F4 and 0x0F5. These are updated with value 1,
value 2 and value 3 respectively.
DMA Example
The following pseudo-code shows the previous example of drawing a
series of rectangles but this time using the DMA controller. This example
uses a single DMA buffer and the simplest Hold Mode for the tag
description words in the buffer.
UINT32*pbuf;
DMAAddress (physical address of dma_buffer)
while (*DMACount != 0)
;// wait for DMA to complete
pbuf = dma_buffer;
*pbuf++ = PERMEDIATagdXDom;
*pbuf++ = 0;
*pbuf++ = PERMEDIATagdXSub;
*pbuf++ = 0;
*pbuf++ = PERMEDIATagdY;
*pbuf++ = 1 << 16;
for (i = 0; i < nrects; ++i) {
TVP4020 Programmers Reference ManualProgramming Model
The example assumes that a host buffer has been previously allocated
and is pointed at by “dma_buffer”. It is worth noting that significantly
less data would be required if indexed tags were used in this example.
DMA Buffer Addresses
Host software must generate the correct DMA buffer address for the
ERMEDIA
P
to P
DMA controller. Normally, this means that the address passed
ERMEDIA
must be the physical address of the DMA buffer in host
memory. The buffer must also reside at contiguous physical addresses
as accessed by P
ERMEDIA
. On a system which uses virtual memory for
the address space of a task, some method of allocating contiguous
physical memory, and mapping this into the address space of a task,
must be used.
If the virtual memory buffer maps to non-contiguous physical memory
then the buffer must be divided into sets of contiguous physical memory
pages and each of these sets transferred separately. In such a situation
the whole DMA buffer cannot be transferred in one go; the host software
must wait for each set to be transferred. Often the best way to handle
these fragmented transfers is via an interrupt handler.
DMA Interrupts
ERMEDIA
P
provides interrupt support, as an alternative means of
determining when a DMA transfer is complete. This can provide
considerable speed advantage. If enabled, the interrupt is generated
whenever the DMACount register changes from having a non-zero to
having a zero value. Since the DMACount register is decremented every
time a data item is transferred from the DMA buffer this happens when
the last data item is transferred from the DMA buffer.
To enable the DMA interrupt, the DMAInterruptEnable bit must be set in
the IntEnable register. The interrupt handler should check the DMAFlag
bit in the IntFlags register to determine that a DMA interrupt has actually
occurred. To clear the interrupt a word should be written to the IntFlags
register with the DMAFlag bit set to one.
A typical use of DMA interrupts might be as follows:
prepare DMA buffer
DMACount(n);// start a DMA transfer
prepare next DMA buffer
while (*DMACount != 0) {
mask interrupts
set DMA Interrupt Enable bit in IntEnable register
sleep on interrupt handler wake up
unmask interrupts
}
DMACount(n)// start the next DMA sequence
reset DMA Flag bit in IntFlags
send wake up to main task
}
Interrupts are complicated and depend on the facilities provided by the
host operating system. The above pseudocode only hints at the system
details.
This scheme frees the processor for other work while DMA is being
completed. Since the overhead of handling an interrupt is often quite
high for the host processor, the scheme should be tuned to allow a
period of polling before sleeping on the interrupt.
3.2.5
Output FIFO and Graphics Processor FIFO Interface
To read data back from P
ERMEDIA
an output FIFO is provided. Each entry
in this FIFO is 32-bits wide and it can hold tag or data values. Thus its
format is unlike the input FIFO whose entries are always tag/data pairs
(we can think of each entry in the input FIFO as being 41 bits wide – 9
bits for the tag and 32 bits for the data). The type of data written by
ERMEDIA
P
to the output FIFO is controlled by the FilterMode register. This
register allows filtering of output data in various categories including the
following:
•Depth: output in this category results from an image upload of the
Depth buffer.
•Stencil: output in this category results from an image upload of the
Stencil buffer.
•Color: output in this category results from an image upload of the
framebuffer.
•Synchronization: synchronization data is sent in response to a Sync
command.
The data for the FilterMode register consists of 2 bits per category. If the
least significant of these two bits is set (0x1) then output of the register
tag for that category is enabled; if the most significant bit is set (0x2)
then output of the data for that category is enabled. Both tag and data
output can be enabled at the same time. In this case the tag is written
first to the FIFO followed by the data. The FilterMode register is
described in more detail in section §5.15.
For example, to perform an image upload from the framebuffer, the
FilterMode register should have data output enabled for the Color
category. Then, the rectangular area to be uploaded should be
described to the Rasterizer. Each pixel that is read from the framebuffer
will then be placed into the output FIFO. If the output FIFO becomes full,
18
TVP4020 Programmers Reference ManualProgramming Model
then P
ERMEDIA
will block internally until space becomes available. It is
the programmer’s responsibility to read all data from the output FIFO.
For example, it is important to know how many pixels should result from
an image upload and to read exactly this many from the FIFO.
To read data from the output FIFO the OutputFIFOWords register should
first be read to determine the number of entries in the FIFO (reading
from the FIFO when it is empty returns undefined data). Then this many
32-bit data items are read from the FIFO. This procedure is repeated
until all the expected data or tag items have been read. The address of
the output FIFO is described below.
NB all expected data must be read back. P
ERMEDIA
will block if the output
FIFO becomes full. Programmers must be careful to avoid the deadlock
condition that will result if the host is waiting for space to become free in
the input FIFO while P
ERMEDIA
is waiting for the host to read data from
the output FIFO.
Graphics Processor FIFO Interface
ERMEDIA
P
has a sequence of 1K x 32 bit addresses in the PCI Region 0
address map called the Graphics Processor FIFO Interface. To read
from the output FIFO any address in this range can be read (normally a
program will choose the first address and use this as the address for the
output FIFO). All 32-bit addresses in this region perform the same
function – the range of addresses is provided for data transfer schemes
which force the use of incrementing addresses.
Writing to a location in this address range provides raw access to the
input FIFO. Again, the first address is normally chosen. Thus the same
address can be used for both input and output FIFOs. Reading gives
access to the output FIFO; writing gives access to the input FIFO.
Writing to the input FIFO by this method is different from writing to the
memory mapped register file. Since the register file has a unique
address for each register, writing to this unique address allows P
ERMEDIA
to determine the register for which the write is intended. This allows a
tag/data pair to be constructed and inserted into the input FIFO. When
writing to the raw FIFO address an address tag description must first be
written followed by the associated data. In fact, the format of the tag
descriptions and the data that follows is identical to that described above
for DMA buffers. Instead of using the P
transfer data to P
ERMEDIA
by constructing a DMA-style buffer of data and
ERMEDIA
DMA it is possible to
then copying each item in this buffer to the raw input FIFO address.
Based on the tag descriptions and data written P
ERMEDIA
constructs
tag/data pairs to enter as real FIFO entries. The DMA mechanism can
be thought of as an automatic way of writing to the raw input FIFO
address.
Note, that when writing to the raw FIFO address the FIFO full condition
must still be checked by reading the InFIFOSpace register. However,
writing tag descriptions does not cause any entries to be entered into the
FIFO – such a write simply establishes a set of tags to be paired with the
subsequent data. Thus, free space need be ensured only for actual data
items that are written (not the tag values). For example, in the simplest
case where each tag is followed by a single data item, assuming that the
FIFO is empty, then 32 writes are possible before checking again for
free space.
3.3
3.4
See the
TVP4020 Hardware Reference Manual
for more details of the
Graphics Processor FIFO Interface address range.
Interrupts
All interrupts can be individually enabled and disabled. Refer to the
TVP4020 Hardware Reference Manual
Synchronization
for more details.
There are two main cases where the host must synchronize with
ERMEDIA
P
•before reading back from P
•before directly accessing the memory via the bypass mechanism
Also the host must synchronize with P
:
ERMEDIA
registers
ERMEDIA for
framebuffer
management tasks such as double buffering, though this may be better
handled using the SuspendUntilFrameBlank command. Synchronizing
with P
ERMEDIA
implies waiting for any pending DMA to complete and
waiting for the chip to complete any processing currently being
performed. The following pseudo-code shows the general scheme:
20
TVP4020 Programmers Reference ManualProgramming Model
PERMEDIADatadata;
// wait for DMA to complete
while (*DMACount != 0) {
poll or wait for interrupt
}
while (*InFIFOSpace < 2) {
;// wait for free space in the FIFO
}
// enable sync output and send the Sync command
data.Word = 0;
data.FilterMode.Synchronization = 0x1;
FilterMode(data.Word);
Sync(0x0);
/* wait for the sync output data */
do {
while (*OutFIFOWords == 0)
;// poll waiting for data in output FIFO
} while (*OutputFIFO != Sync_tag);
3.5
Initially, we wait for DMA to complete as normal. We then have to wait
for space to become free in the FIFO (since the DMA controller actually
loads the FIFO). We need space for 2 registers: one to enable
generation of an output sync value, and the Sync command itself. The
enable flag can be set at initialization time. The output value will be
generated only when a Sync command has actually been sent, and
ERMEDIA
P
has then completed all processing.
Rather than polling, it is possible to use a Sync interrupt as mentioned in
the previous section. As well as enabling the interrupt and setting the
filter mode, the data sent in the Sync command must have the most
significant bit set in order to generate the interrupt. The interrupt is
generated when the tag or data reaches the output end of the Host Out
FIFO. Use of the Sync interrupt has to be considered carefully as
ERMEDIA
P
will generally empty the FIFO more quickly than it takes to set-
up and handle the interrupt.
Host Memory Bypass
Normally, the host will access memory indirectly via commands sent to
the P
ERMEDIA
FIFO interface. However, P
ERMEDIA
does provide the whole
memory as part of its address space so that it can be memory mapped
by an application. Access to the memory via this route is independent of
ERMEDIA
the P
FIFO.
21
Loading...
+ 293 hidden pages
You need points to download manuals.
1 point = 1 manual.
You can buy points or you can get point for every manual you upload.