The PNX8550 is a highly integrated media processor intended for deployment in
Analog, Digital and Hybrid Television receivers. The PNX8550 is targeted at the mid
to high-end TV sets, focusing on dual program analog/digital picture improved
Standard Definition capability and single program High Definition decode and display
capability. The PNX8550 can be used for 100 Hz interlaced as well as 60 Hz
progressive screens. It is fully capable of performing advanced video improvement
algorithms, such as DRC™ or Digital Natural Motion™, on Standard Definition analog
or digital sources. It includes an HD capable de-interlacer for converting interlaced
HD transmission signals to progressive output for driving wide-XGA class Plasma or
LCD displays. The PNX8550 includes a DVD Content Scramble System (CSS) to
support the DVD player function. The PNX8550 also supports VCD, S-VCD and
CD-Audio players.
The PNX8550 is responsible for the video improvement processing on analog
sources, and for all source decode functions and video improvement processing on
digital sources. It includes integrated dual program conditional access, dual program
MPEG2 transport stream de-mux, dual SD or single HD MPEG2 video decode, audio
decode and processing, graphics generation, video processing, and image
composition and display. Two 32-bit 240 MHz VLIW media processors, referred to as
the TriMedia TM3260 CPU core, carry out the advanced video improvement
processing as well as all audio operations. Fixed function hardware performs stable
core video functions, such as picture level MPEG2 decoding, scaling, image
composition and pixel post processing.
The PNX8550 includes an integrated Remote Control receiver, I2C, USB host and
UART peripherals. In addition, a software controlled General Purpose IO block
provides for connectivity to switches/indicators and arbitrary serial devices, and
includes capability to connect to MemoryStick™ or MultiMediaCard™ Flash. The
PNX8550 also provides an industry standard PCI 2.2, 32-bit wide, 33 MHz system
bus connection allowing a wide variety of low-cost PC peripherals to be gluelessly
attached. The system bus includes support for direct connection to 8 and 16-bit wide
system ROM or Flash memories and 8 and 16-bit simple M68k type slave devices.
Internal IDE control logic provides the capability for a medium performance
(5MB/sec) IDE interface using only two low-cost external TTL packages.
PNX8550
Chapter 1: Functional specification
An embedded MIPS32 processor (PR4450) running at 250 MHz is available to run
the OS. The PR4450 processor is primarily responsible for running the demand
paged graphics-intensive operating system, while the TM3260 media processors are
responsible for running all real-time media functions. All hardware resources inside
the PNX8550 are accessible by both the MIPS processor and the TM3260 CPUs. A
‘sandbox’ style system protection provision ensures that selected MIPS memory
regions and critical peripherals cannot be corrupted or inspected.
The PNX8550 provides a primary digital (YUV or RGB) output to connect to the
display specific output processor. In addition, a secondary analog video output
(CVBS or S-Video) for a VCR is available. It can operate either in analog PAL/NTSC
or digital mode.
The PNX8550 optionally allows passing a selected video stream through an external
accelerator, and bringing it back in to the primary video output stream to be merged
with graphics or other video planes. This connection provides for up to 81 Mpix/sec
pixel rate, with up to 10-bit/component accuracy.
The PNX8550 is manufactured using an advanced 0.12 µm CMOS process.
sharpening (LTI, CDS, HDP), Digital Color Transient Improvement, color features
(green enhancement, skin tone correction, blue stretch) on the video
• Running an OS, such as Linux™ or VxWorks™, creating 1 or more graphics
surfaces
• Blending up to 5 video or graphics images for output towards the primary display
(CRT, LCD or Plasma)
• Blending 1 video and 1 graphics image for output towards the VCR over an
analog S-Video or CVBS output (VCR audio requires external low-cost stereo
DAC)
• Outputting multi-channel audio across S/PDIF (Sony/Philips Digital Interface) for
decoding in a receiver
• Decoding of image files or audio content from a MemoryStick™ or
MultiMediaCard™
• Transmitting 1 audio/video program or a transport stream across an external
1394 interface
• Transmission of 1 or more programs, subject to compute power and network
bandwidth over a home network
• Decoding/execution of Remote Control commands
• Connecting to an optional serial modem or USB modem
• Booting from NOR or NAND Flash
Since the PNX8550 is a highly flexible, highly programmable system that performs
the majority of video processing in software, a wide variety of other applications are
possible. Any application that fits the constraints of the external interfaces, the media
processing power and the available memory bandwidth can in principle be
accommodated.
Two PNX8550s can be used to create a dual HD PIP or side-by-side HD hybrid TV set.
• All video/audio timing derived from a single low-cost external crystal (no VCXOs
required)
4. Compatibility with the PNX8525
With some exceptions listed below, the PNX8550 is a functional superset of the
PNX8525 and, at the system level, is backwards compatible with the PNX8525.
Applications for the PNX8525 can be ported to the PNX8550.
The list below identifies the main functional backwards compatibility issues between
the two products:
• The PNX8550 has no on-chip 1394 or SSI interface.
• The PNX8550 has two MPEG system processors, versus three on the PNX8525.
• The PNX8550 does not have the third in or out serial audio port of the PNX8525.
• Software for the PNX8550 PR4450 CPU must be recompiled because of Instruction Set
Architecture changes from the MIPS-II-compliant PR3940 on the PNX8525 to the
new MIPS32 architecture-compliant PR4450 on the PNX8550. In addition, the
TLB has changed to become MIPS32-compliant, requiring source code changes
to memory management routines.
• The PNX8550 TM3260 CPUs are backwards binary compatible with the TM3218
CPU on the PNX8525, but recompiling allows higher performance due to
additional computational units.
• MMIO addresses and MMIO register layout are not identical; this is typically
hidden behind APIs.
• The PNX8550 uses a different package, with similar (but not identical) pin
placement; for example, the PNX8525 used SDRAM while the PNX8550 uses DDR
memories.
• Everywhere else, the PNX8550 capabilities equal or exceed those of the
PNX8525.
5. Analog/Digital Standard Definition Video Improvement Capabilities
5.1 Temporal-Spatial Improvement Processing
The media processors, together with the advanced de-interlacer in the Memory Based
Scaler block of the PNX8550, can provide sophisticated temporal-spatial video
improvement processing on either external analog sources or internally decoded
MPEG2 sources. There is sufficient media processor capability and system memory
bandwidth to perform Sony’s Digital Reality Creation™ or Philips Consumer
Electronics’ Digital Natural Motion™ on a Standard Definition signal, decode 2
MPEG video sources and associated audio, drive a wide-XGA class progressive
screen and do discretionary additional processing.
In a dual tuner system, temporal-spatial video improvement processing can be
performed on the main (large) image, or on two half-resolution images, as shown in
Figure 3.
Figure 3: TV Modes and Video Improvement Processing (panel labels: Main Image, PIP, Program 1, Program 2; annotations: improved, not improved, improved but at half-resolution)
The discretionary processing margin allows future features above and beyond those
of hardware implementations of these video improvement algorithms. For example,
intelligent detection of screen regions with different types of video content
can be applied, performing different types of improvement processing depending on
the content.
The external video improvement processor interface optionally allows running one
temporal-spatially improved video stream through a proprietary external video
improvement device, and back directly into the PNX8550 primary video output.
5.2 Temporal Noise Reduction
Temporal Noise Reduction is typically applied only to analog sources, but can be
applied to any video stream in the system. The PNX8550 QTNR processor block
performs temporal noise reduction by reading two video fields from memory and
writing a filtered output image to memory.
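As an illustration only, the two-fields-in, one-field-out data flow described above can be sketched as a simple motion-gated temporal blend. The filter form, the `strength` and `motion_threshold` parameters and the function name are assumptions chosen for this example; the actual QTNR algorithm is not described here.

```python
def temporal_filter(current, previous, strength=0.5, motion_threshold=24):
    """Blend each pixel of `current` toward `previous` unless motion is likely.

    `current` and `previous` are equally sized lists of rows of 8-bit luma
    values (two fields read from memory). The return value is the filtered
    field that would be written back to memory.
    """
    if len(current) != len(previous):
        raise ValueError("fields must have the same number of lines")
    output = []
    for cur_row, prev_row in zip(current, previous):
        out_row = []
        for cur, prev in zip(cur_row, prev_row):
            if abs(cur - prev) > motion_threshold:
                # Large difference: treat as motion and pass the pixel through.
                out_row.append(cur)
            else:
                # Small difference: treat as noise and average over time.
                out_row.append(round((1 - strength) * cur + strength * prev))
        output.append(out_row)
    return output
```

For example, `temporal_filter([[100, 200]], [[110, 100]])` averages the first pixel (small difference) and passes the second through unchanged (large difference).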
6. HD Decode and Display Capabilities
One PNX8550 supports single stream HD decoding and HD display. It supports
screens of 1920x1080i or ‘wide XGA’ style progressive screens.
In HD-HD mode, all video improvement processing is available, but non-motion-compensated
de-interlacing is used instead of temporal-spatial processing.
Table 1 below shows the key video algorithms involved in converting the ATSC
transmission to the selected display type. Display resolutions smaller than those
indicated in the table are also supported. Progressive screens larger than 1280x720
are not supported.
Table 1: HD - HD Algorithms of the PNX8550
Transmission | Display | Algorithm Used
1920 x 1080i | 960H x 1080V 60 Hz I | H downscaling
1920 x 1080i | 1920H x 1080V 60 Hz I | n/a
1920 x 1080i | 1280H x 720V 60 Hz P (mode 1) | Each 1920x540 field is scaled to a 1280 x 720 frame, losing some vertical resolution.
1920 x 1080i | 1280H x 720V 60 Hz P (mode 2, highest quality) | Film detector active. In film mode, the telecine is undone to create the original progressive film images. In video mode, median filtering plus slanted edge detection/reconstruction are applied to each field pair to construct a full resolution 1920 x 1080 frame, which is then scaled back to 1280 x 720P.
1280 x 720 60 Hz P | 1280H x 720V 60 Hz P | n/a
1280 x 720 60 Hz P | 960H x 1080V 60 Hz I | Each 1280x720 frame gets scaled to a 960x540 field at vertically correct position.
1280 x 720 60 Hz P | 1920H x 1080V 60 Hz I | Each 1280x720 frame gets scaled to a 1920x540 field at vertically correct position.
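The scaling ratios implied by the conversions in Table 1 can be worked out directly from the field and frame sizes. The short sketch below does this arithmetic; the helper name `scale_factors` is hypothetical and is not part of any PNX8550 software interface.

```python
from fractions import Fraction

def scale_factors(src_w, src_h, dst_w, dst_h):
    """Return (horizontal, vertical) scaling ratios as exact fractions."""
    return Fraction(dst_w, src_w), Fraction(dst_h, src_h)

# Mode 1: each 1920x540 field is scaled to a 1280x720 frame.
mode1 = scale_factors(1920, 540, 1280, 720)

# 720p transmission on a 1080i display: each 1280x720 frame
# is scaled to a 1920x540 field.
p_to_i = scale_factors(1280, 720, 1920, 540)
```

In mode 1 the horizontal ratio is 2/3 (downscaling) and the vertical ratio is 4/3 (upscaling from 540 field lines), which is why some vertical resolution is lost compared with mode 2.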
All applications use high-definition video display in order to show the maximum
performance capability of the PNX8550. The standard definition video processing
includes the software based digital natural motion application. The PNX8550
converts the standard definition video to the high-definition video using the
combination of natural motion software and the memory based scaler hardware. The
software natural motion requires an external device to perform the back-end of the
natural motion application. The following quality tradeoffs may be made in order to
manage the memory bandwidth requirements when more than one video stream is
processed using the 225 MHz DDR-SDRAM memory:
• Using a lower number of bits per pixel for the video and graphics layers
• Using a lower number of pixels per line for graphics layers
• Using YUV420 instead of YUV422 for the memory based scaler processing
• Generating the PIP picture from the SD VCR output if possible
• Generating PIP and VCR output from the same HD video stream requires more memory
bandwidth than generating VCR output from HD and PIP from VCR output.
• Reduced natural motion quality or no natural motion if one of the video inputs is a
high-definition bit stream or video
Using a higher speed grade DDR creates additional bandwidth for more
discretionary features.
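To give a feel for the YUV420 versus YUV422 tradeoff listed above, the sketch below estimates the memory traffic of a single pass over a video stream. The workload (1920x540 fields at 60 fields/s, 8-bit components, one read per field) and the function name are assumptions for illustration; real traffic depends on how many passes each processing block makes.

```python
def stream_bandwidth_mb_s(width, height, rate_hz, bits_per_pixel):
    """Traffic in MB/s (1 MB = 10**6 bytes) for one pass over an image stream."""
    return width * height * rate_hz * bits_per_pixel / 8 / 1e6

# Average bits per pixel with 8-bit components (standard sampling arithmetic).
YUV422_BPP = 16  # Y for every pixel, U and V for every second pixel
YUV420_BPP = 12  # as 4:2:2, with chroma also halved vertically

# Hypothetical workload: 1920x540 fields at 60 fields/s, read once.
bw_422 = stream_bandwidth_mb_s(1920, 540, 60, YUV422_BPP)
bw_420 = stream_bandwidth_mb_s(1920, 540, 60, YUV420_BPP)
saving = 1 - bw_420 / bw_422
```

Under these assumptions, switching from YUV422 to YUV420 reduces the traffic of each pass by one quarter.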
6.1 Dual HD Decode/Display Using Two PNX8550s
Figure 4: Dual HD Decode/Display Application (diagram labels: 32 MB 225 MHz DDR; program 1 (TS); program 2 (TS); DV1; DV3; decode program1 HD, scale to 0.5 HD, all audio processing; DV_OUT1 (75 MHz 8 bit))
Figure 4 shows two PNX8550s performing dual-HD decode/display. The first
PNX8550 decodes program 1, scales it to one-half horizontal resolution and outputs
the images across the primary output in 10-bit 656 mode. This program 1 image will
be used as the PIP image or side-by-side half resolution image. The second
PNX8550 decodes program 2 at HD resolution for use as the main image in case of
PIP, or the half horizontal image in case of a side-by-side display.
The second PNX8550 scales program 1 and 2 as needed and runs the OS. Locally
drawn graphics and both programs are composited as needed for the primary display.
Either program 1 or 2, with optional overlay graphics is sent to the VCR. The first
PNX8550 also performs all audio and discretionary media processing, since it has the
higher bandwidth margin.
This same configuration can also be used as a superior quality dual SD TV. In that
case, the first PNX8550 performs temporal/spatial improvement processing on
program 1 (656 or MPEG source), and forwards it in 2, 3 or 4x pixel rate to the
second PNX8550. The second PNX8550 can perform additional SD to HD 2D local
improvement processing on program1, while decoding or receiving PIP program2
and compositing all results.
The functionality achieved within the PNX8550 can be divided into four major
categories: control, decode, processing, and display. The audio, video and graphics
processing are controlled by the PR4450 and the TM3260-based control software.
Decode functions take input data streams and convert those streams into memory
based structures that the PNX8550 may further process. Decode functions may be
simple, as in the case of storing 656 input video into memory, or substantially more
complex, as in the case of MPEG2.
Processing functions are those that modify an existing data structure and prepare
that structure for display functions.
Display functions take the processed data structures from memory and generate the
appropriate output stream. As in the case of the decode functions, display functions
can be relatively simple, such as an I2S audio output or very complex, as in the case
of multi-surface composited displays.
All decoded data structures are stored in memory, even when further processing is
not required. This mechanism implies that there is no direct path between input and
output data streams. The memory serves as the buffer to de-couple input and output
data streams. Based on the mode of operation, there may be multiple data structures
in memory for a given input stream. The PNX8550 uses the TM3260 CPUs and a
timestamping mechanism to determine when a specific memory data structure is to
be displayed.
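The decoupling of decode and display through memory, with timestamps deciding what is shown when, can be sketched as a small queue. This is a minimal illustration under assumed names (`FrameQueue`, `push`, `select`); it does not represent the actual TM3260 software.

```python
from collections import deque

class FrameQueue:
    """Decoded frames waiting in memory, ordered by presentation timestamp."""

    def __init__(self):
        self._frames = deque()  # items are (timestamp, buffer_id)

    def push(self, timestamp, buffer_id):
        # The decode side appends frames in presentation order.
        self._frames.append((timestamp, buffer_id))

    def select(self, now):
        """Return the buffer to show at time `now`, or None if none is due.

        Frames whose timestamp has passed are released, except the newest
        due frame, which stays queued so it can be repeated if needed.
        """
        due = None
        while self._frames and self._frames[0][0] <= now:
            due = self._frames.popleft()
        if due is None:
            return None
        self._frames.appendleft(due)
        return due[1]
```

The display side calls `select()` with the current system time once per output field or frame; if decoding falls behind, the most recent due frame is simply repeated.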
The PNX8550 implements the required decode, processing, and display functions
with a combination of fixed function hardware and TM3260 CPU software modules.
The PR4450 MIPS processor is not intended to be involved with the three primary
function types other than to control them. The PNX8550 provides a good balance
between those functions that are implemented in fixed hardware and those that are
programmed to run on a TM3260 CPU.
Table 2 illustrates how the major tasks are implemented under each of the main
functional areas, and how they map to hardware resources or software.
Table 2: Partitioning of Functions to Resources
Function | Resource | Description
Video Decoding/Acquisition
Digital video acquisition | VIP | Includes optional h-scaling or color space conversion, and conversion to a variety of memory pixel formats.
Conditional access, PID filtering, section filtering, transport stream demux | MSP | MSP output streams separate video and audio elementary streams into memory.
En/decryption for copy protection | MSP, EDMA | DES (triple and single, each in ECB or CBC); AES en/decryption
MPEG2 HD or 2SD video decoding | Software + VMPG | Media processor software performs parsing of the video elementary stream up to slice level. VMPG hardware block performs decoding below slice level.
MPEG2 audio decoding | Software |
DVD authentication and de-scrambling | DVD-CSS | Authentication and de-scrambling in hardware
Program stream demux | Software |
Single program transport stream demux | Software or MSP | Software if no conditional access or section filtering, else MSP
HD JPEG decode | Software |
MPEG4 video decoding | Software |
Audio Decoding and Improvement Processing
Audio decoding AC3, AAC, MPEG L1, L2, MP3, others | Software | Decoders for almost any audio format available
Audio processing | Software | Improvement processing and mixing
Graphics
2D graphics rendering and DMA | 2D DE |
Video Improvement Processing
Non-motion compensated de-interlacing | MBS | Median, 2-field majority select, 3-field majority select with or without EDDI postpass for edge improvement
Motion compensated de-interlacing | Software + MBS | Software provides the MBS with a motion compensated field, to which the MBS applies the chosen de-interlacing algorithm.
Motion estimation | Software | Pixel accurate and quarter pixel accurate versions available
Temporal up conversion (Natural Motion) | Software | Creates images temporally between two originals using motion vectors.
Luminance histogram measurement, other key video measurements | QTNR or MBS | QTNR can do any video measurement during temporal noise reduction on analog sources, or by reading an image from memory without producing an output. MBS can perform video measurement during a de-interlace or scaling pass.
Temporal noise reduction | QTNR | QTNR can perform temporal noise reduction on one or more streams.
Image scaling | VIP, MBS, QVCP | VIP can perform horizontal downscaling during acquisition. MBS and MBS2 can perform horizontal and vertical up- and downscaling in a single pass, optionally combined with format conversions. The MBS can also perform de-interlacing. QVCP can perform panoramic horizontal scaling during output.
Video format conversions, including color space conversion | MBS, VIP, QVCP | MBS can convert any pixel format to any other format. VIP can generate multiple video formats, QVCP can read multiple video formats.
Histogram correction, black stretch, luminance sharpening (LTI, CDS, HDP), Digital Color Transient Improvement, color features (green enhancement, skin tone correction, blue stretch) | QVCP | Performed during output to display
Proprietary video improvement processing | An external chip connected to the tunnel interface | External video enhancement chip serves as a coprocessor to the PNX8550 and reads the required data from the PNX8550 memory and writes the enhanced video pixel data to the PNX8550 memory. The video pixel data returned from the external chip can be either 8-bit or 10-bit semi planar YUV422. The PNX8550 CPUs will have access to the external chip control and status registers through the tunnel interface.
Input Color Look-Up | 2x CLUT | A CLUT may be switched into the data path for each of the 5 QVCP layers.
Surface composition with alpha blending, chroma (range) keying | QVCP |
Video and graphics scaling | QVCP | Hi-quality panoramic horizontal scaler for video, linear interpolator for graphics
Final gamma correction, contrast, brightness, white point control | QVCP | Final gamma correction after compositing
Discretionary Processing
MPEG4 video encoding | Software |
MPEG4 Simple or Advanced Simple Profile decoding | Software |
H.26L video decoding | Software |
MPEG2 video encoding | Software | 1/2 D1 and other versions available
Transrating | VLD + Software | The VLD hardware can be used to parse an MPEG2 video stream. Software composes a new MPEG2 stream including the video stream with a reduced bit rate.
DV decoding | Software | Full quality decoder available
Transcoding | VMPG + Software | Transcoding from MPEG2 uses VMPG for decoding. Transcoding from other standards uses a software decoder. Software performs the encode.
Video Conferencing.... | | A large variety of applications is available.
9. Integrated Processors
9.1 PR4450 General Purpose Processor
The PR4450 is a MIPS32 compatible general purpose CPU. It is intended for running
the demand-paged, graphics intensive operating system and user-interface, whereas
the TM3260 media processors are intended to run all real-time audio and video tasks.
The 250 MHz PR4450 processor implements a very low cost, low power and high
performance 32-bit processor ideal to be used in information appliances running
embedded operating systems such as VxWorks or Linux.
PR4450 supports both the MIPS32 and MIPS16 instruction set architecture as
defined by MIPS Technologies Inc. MIPS16 encodes the instructions in 16-bits,
enabling a substantial reduction in the memory foot print requirements and thereby
reducing the overall system cost. The PR4450 requires re-compilation of PNX8525
source code, due to the ISA changes going from MIPS-II to MIPS32.
The PR4450 delivers 1.22 MIPS/MHz (Dhrystone 2.1) for a total of 325 MIPS. It has
an estimated power consumption of less than 1 mW/MHz, which makes it an ideal core
for use in next generation information appliance system-on-silicon architectures.
Register file | 32 entry 32-bit register file for MIPS32, 11 of which are available for MIPS16 instructions
Pipeline | Synchronous 8-stage pipeline with full hardware interlock support for data dependencies
Branch Prediction | Dynamic branch prediction with a 4096-entry branch history table and a 256-entry branch address cache for optimal performance
Multiplication | MAD (Multiply-Accumulate-Divide) unit with 64-bit accumulator, executing 32-bit multiply instructions with 1 cycle repetition rate
MMU | 4 GB address space support. Separate user and kernel modes and TLB ASIDs (Address Space Identifiers) provide full memory protection support.
Timers | Three 32-bit timers are provided in coprocessor zero. These timers can also be used as event counters for performance analysis. One of the counters can also operate as a watchdog timer.
Caches |
…breakpoints for instruction and data addresses and/or data value
…leading zeros and ones and multiply-subtract instructions to the existing MIPS-II ISA.
9.2 Dual TM3260 VLIW Media Processors
The two TM3260 CPUs in the PNX8550 are a version of the TriMedia 32-bit VLIW
media processor. Each processor is a 240 MHz, 5 instructions per clock cycle, Very
Long Instruction Word (VLIW) processor, with an extensive set of multimedia
instructions. It implements a superset of the TriMedia TM1300 instruction set, and
has a superset of the TM1300 functional units. It is fully binary backwards compatible
with the TM32 CPU on the PNX8525, but has a larger Instruction Cache for improved
performance. In addition, recompiling of source code results in higher media
performance due to several additional functional units.
The TM3260 supports 32-bit integer and IEEE-compatible 32-bit floating point data
formats. It also provides a Single Instruction Multiple Data (SIMD) style operation set
for operating on dual 16-bit or quad 8-bit packed data. It has a peak floating point
compute capacity of 1.1 G operations/s, and has 880 M multiply-add/s capability on
16-bit data. Its dual access 16 kB 8-way set-associative data cache provides a CPU
local data bandwidth of 1.8 GB/s. Its 64 kB 8-way set-associative instruction cache
provides 224 bits of instructions every clock cycle, for an instruction rate of 6.2 GB/s.
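The peak figures above can be sanity-checked with simple arithmetic, assuming each figure is a per-clock width multiplied by the clock rate. That derivation is an assumption of this sketch, as is the data cache width of two 32-bit accesses per clock; the issue width, the four 16x16 multiply-adds per clock and the 224 instruction bits per clock are taken from this chapter.

```python
def peak_rates(clock_mhz):
    """Peak throughput figures as per-clock width times clock rate."""
    hz = clock_mhz * 1e6
    return {
        "ops_per_s": 5 * hz,                 # 5 operations issued per clock
        "mac16_per_s": 4 * hz,               # 4 16x16 multiply-adds per clock
        "dcache_bytes_per_s": 2 * 4 * hz,    # assumed: two 32-bit accesses per clock
        "icache_bytes_per_s": 224 / 8 * hz,  # 224 instruction bits per clock
    }

at_240 = peak_rates(240)
at_220 = peak_rates(220)
```

With these assumptions, `peak_rates(220)` reproduces the quoted figures (1.1 G operations/s, 880 M multiply-adds/s, about 1.8 GB/s and about 6.2 GB/s), while `peak_rates(240)` gives values about 9% higher. This suggests, but does not confirm, that the quoted figures were calculated for a 220 MHz clock.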
In the PNX8550 HD operating mode, the TM3260s are responsible for dual audio
decode, control of the HL MPEG2 decoder and running a film-detector. Significant
processor capacity is left over and available for discretionary functions, such as a
software MPEG2 or MPEG4 decode, etc. subject only to memory bandwidth and
CPU cycle availability. In SD operating mode, the two TM3260s are responsible for all
temporal-spatial video improvement processing.
The TM3260s have sufficient compute performance to deal with a variety of future
operating modes. A single processor can decode most compressed video
streams and associated audio at full frame rate, such as decoding a DV camcorder
image stream arriving over 1394. One processor is also capable of doing all audio
and video compression, decompression and processing necessary for bi-directional
video conferencing.
The TM3260s are responsible for all media processing and real-time processing
functions within the PNX8550. They run a small real-time operating system (pSOS),
which allows them to respond efficiently and predictably to real-time events. In some
cases, a TM3260 will handle a media processing function in conjunction with fixed
hardware, such as the HL MPEG2 decoder. Each TM3260 executes code from the
unified system DRAM memory.
The TM3260 is capable of operating in little or big-endian mode. The mode is chosen
at compilation time. The compiled binary program startup code sets the mode shortly
after CPU startup by setting the endian bit in the Program Control Status Word
(PCSW). Note that the system architecture of the PNX8550 requires that the MIPS
and TriMedia cores all operate in the same endian mode after startup.
Debug of software running on a TM3260 is performed using an interactive source
debugger on a PC. The PC talks to the TM3260 through the PNX8550 EJTAG pins,
using the system level EJTAG2.0 controller. Two TM_DBG JTAG modules on the
JTAG/boundary scan port provide an improved version of the TM1300 JTAG debug
port for legacy debug applications.
9.2.1 Prefetch
The TM3260 can prefetch the required data and instructions in hardware.
The hardware prefetch works in the background. This feature can be turned off by
resetting the prefetch bits in the global registers.
Table 4: TM3260 VLIW CPU Feature Summary
Feature | Description
ISA | TM1300 ISA extended with 22 new instructions, with 32-bit RISC style load/store/compute instruction set and an extensive set of 8, 16-bit SIMD multimedia instructions.
Instruction issue | 5 RISC or SIMD instructions every clock cycle
Data types | Boolean, 8/16/32 bit signed and unsigned integer, 32 bit IEEE floats
MMU | None, virtual = physical, full 4 GB space supported
Protection | Base, limit style protection, where CPU can be set to only use part of system DRAM, and hardware ensures no references take place outside this range.
Multipliers | Up to 2 32x32 bit integer multiplies per clock; up to 2 32 bit IEEE floating point multiplies per clock; up to 4 16x16 multiply-adds per clock; up to 8 8x8 multiplies per clock
Debug | JTAG based software debugger, including hardware breakpoints for instruction and data addresses
Register file | 128 entry 32-bit register file
Interrupts | 64 auto-vectoring interrupts, with 8 programmable priority levels
Timers | Four 32-bit timers/counters are provided. A wide selection of sources allows them to be used for performance analysis, real-time interrupt generation and/or system event counting
System interface | The TM3260 runs fully asynchronous from system DRAM, and can operate at a frequency lower than system DRAM to save power, or higher than system DRAM to gain performance
Software development environment | The TM3260 is supported by the C/C++ compiler tools available for the TM1300 family
Application software architecture | Applications use the TSSA, TriMedia Streaming Software Architecture, allowing modular development of audio, video processing functions.
10. Digital Video/Transport Stream Inputs
10.1 Backwards Compatibility
The PNX8550 digital video/transport stream router and associated PNX8550 input
and output pins are a superset of the PNX8525 (Viper1). Any board/application
designed for PNX8525 can be ported to the PNX8550, except:
• Applications using the PNX8525 on-chip 1394 will require an external 1394 link
IC.
• Applications that require simultaneous use of 3 MSP Conditional Access Units on
input streams are not supported (each of the two MSPs can do 1 input stream + 1
memory stream).
• Applications that create a partial transport stream inside a MSP and route this to
TSOUT are not supported (established practice in the PNX8525 is to first go to
memory and create the partial transport stream in software - this is still supported
in the PNX8550).
Figure 6:Transport Stream Network and VIP Input Routing Block Diagram
11.1 MPEG System Processor (MSP)
Each MSP block performs PID filtering, packet arrival time-stamping, de-scrambling,
de-multiplexing and section-filtering on up to two transport streams simultaneously,
one from its streaming input (see Figure 6) and one from system memory. The MSP
outputs de-multiplexed Elementary Stream data to system memory.
The streaming input typically carries a multi-program transport stream coming from
an external channel decoder through the DV1, DV2 or DV3 pins. The memory
transport stream capability is present to deal with encrypted transport streams
originating from elsewhere in the system, such as coming in over a TCP/IP
connection, or from a time shift hard-disk attached to a PCI-IDE controller or to the
network.
Internal de-scramblers are provided for both selected transport streams. Transport
streams may be passed through external POD (Point of deployment) or CI (Common
Interface) conditional access modules before being delivered to PNX8550. This
requires an external POD/CI interface chip, such as a CIMaX or similar.
Each MSP block contains DVB, Multi2 (M2) and DES de-scramblers. These functions
are mutually exclusive: each MSP can only perform one de-scrambling type at a
time, but the two MSPs are independent and may simultaneously perform different
de-scrambling types.
The MSPs each provide hardware support for:
• DVB or DSS packet framing
• 64 PID filters
• 64 section filters each of 16 bytes (12 bytes if section filter information is split
across packets)
• Smaller width section filters are implemented by bit masking
• 4 PTS/DTS range filters
• De-multiplexing of Section or PES data to 96 output queues in the main unified
memory
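The PID filtering and masked section filtering listed above are done in MSP hardware. Purely as an illustration of the two operations, they can be sketched in software as follows. The packet header layout (sync byte 0x47, 13-bit PID in bytes 1 and 2, 188-byte packets) is the standard MPEG-2 transport stream format; the function names and the match/mask representation are assumptions for this example and do not describe the MSP register interface.

```python
TS_PACKET_SIZE = 188
TS_SYNC_BYTE = 0x47

def packet_pid(packet):
    """Extract the 13-bit PID from a 188-byte MPEG-2 transport stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != TS_SYNC_BYTE:
        raise ValueError("not a valid transport stream packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]

def pid_filter(packets, wanted_pids, max_filters=64):
    """Keep only packets whose PID is in `wanted_pids` (at most 64 entries)."""
    if len(wanted_pids) > max_filters:
        raise ValueError("too many PID filters")
    wanted = set(wanted_pids)
    return [p for p in packets if packet_pid(p) in wanted]

def section_matches(section, match, mask):
    """Masked compare of the leading section bytes against a filter.

    Only bits set in `mask` take part in the comparison, so a filter narrower
    than 16 bytes is expressed by clearing the unused mask bits.
    """
    if len(match) != len(mask) or len(match) > 16:
        raise ValueError("match and mask must be equal length, at most 16 bytes")
    if len(section) < len(match):
        return False
    return all((s & m) == (v & m) for s, v, m in zip(section, match, mask))
```

A filter that only checks the first section byte (the table id) would use a mask of `0xFF` for that byte and `0x00` for all following bytes.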
The MSPs on the PNX8550 are a functional superset of the MSPs on the PNX8525,
providing dual-stream (1 live, 1 from memory) processing capability. They are not
backwards compatible.
Additional changes:
• Output queues increased from 48 to 96
• Programmable queue lock feature for the Memory Queue Manager
• Two small queue sizes have been changed to 2 MB and 4 MB to accommodate
HD program processing.
Note that de-multiplexing of a clear text Single Program Transport Stream is best
done in software on the TM3260s, and doesn’t require use of an MSP.
11.2 DVD Decoding
The DVD-CSS block is provided to allow integrated DVD playback capability. It
provides authentication and de-scrambling for DVDs.
A DVD drive is attached to the integrated medium-bandwidth IDE controller, and
provides its data either across the IDE interface, an external PCI-IDE bridge chip or
across a multi-bit serial interface to GPIO. The resulting system memory scrambled
program stream is de-scrambled by invoking a memory-to-memory operation on the
DVD-CSS block. The clear text program stream is then de-multiplexed by software on
the TM3260s.
11.3 Software Processing of MPEG2 Streams
The TM3260s process the audio, video, and other Elementary Stream data types
output by an MSP or by a software program or transport stream demux.
Video Elementary Streams are parsed by software on a TM3260, which then sets up
the VMPG MPEG-2 decoder to decode at field/frame-level or for a given number of
macroblocks.
Audio Elementary Streams are parsed and decoded by a TM3260.
Other Elementary Stream types are processed by the TM3260 or MIPS CPU
depending on the real-time nature of the content and the application.
Software on a TM3260 also performs clock recovery based on the transport stream
packet arrival timestamps.
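One simple way to picture timestamp-based clock recovery is to compare how far the program clock references (PCRs) carried in the stream advance against how far the local arrival timestamps advance over the same interval. The sketch below is only an illustration under stated assumptions: both values are expressed in ticks of a nominal 27 MHz clock, counter wrap-around and jitter filtering are ignored, and the function names are hypothetical.

```python
def estimate_clock_ratio(samples):
    """Estimate the sender clock rate relative to the local clock.

    `samples` is a list of (arrival_timestamp, pcr) pairs, both in ticks of a
    nominal 27 MHz clock. The result is > 1.0 when the sender's clock runs
    faster than the local one. Counter wrap-around is ignored in this sketch.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, pcr0), (t1, pcr1) = samples[0], samples[-1]
    if t1 == t0:
        raise ValueError("samples must span a non-zero local interval")
    return (pcr1 - pcr0) / (t1 - t0)

def ppm_offset(ratio):
    """Express the clock ratio as a parts-per-million frequency offset."""
    return (ratio - 1.0) * 1e6
```

A practical implementation would filter many samples rather than use only the first and last, since individual packet arrival times are affected by delivery jitter.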
11.4 VMPG - MPEG2 Decoder and VLD2
The PNX8550 contains a hardware high-level MPEG2 video decoder, VMPG. VMPG
parses and decodes a given number of macroblocks or entire field/frame. One HD
MPEG2 stream (MP@HL) in any of the 18 ATSC formats can be decoded.
Alternately, two SD MPEG2 streams (MP@ML) may be processed simultaneously.
The amount of MPEG2 processing is limited only by allocated system memory
bandwidth or by macro block processing time, which depends on the MPEG2
decoder clock rate.
VMPG provides a VLD only mode. In this mode, it outputs Run Length Pairs and
Motion Vectors of a slice to system memory, but the slice is not decoded to pixels.
This mode can be used for software processing on MPEG2 tokens, such as
transrating to a lower bit rate MPEG2 stream, or partial decoding and transcoding to
other video formats. In most cases, the VLD2 block on the PNX8550 can be used
instead, which provides the same capability without using up valuable VMPG clock
cycles.
The PNX8550 VMPG is compatible with the VMPG on the PNX8525 including the 0.5
horizontal resolution decode mode.
Second VLD Block (VLD2)
The VLD2 block performs variable length decoding: it fetches the bitstream from
external memory and writes a stream of decoded macroblock headers and a stream
of run-level pairs back to external memory. The VLD2 block supports processing of
multiple streams. Multi-stream processing is controlled by software.
The on-chip hardware image processing blocks all use the same ‘native’ pixel
formats, as shown in Table 5. This ensures that image data produced by one block
can be read by another block.
• A limited number of native pixel formats are supported by all image subsystems,
as appropriate.
• The Memory Based Scaler supports conversion from arbitrary pixel formats to
any native format, during the anti-flicker filtering operation (this operation is
usually required on graphics images anyway, hence no extra passes are
introduced).
• Hardware subsystems support all native pixel formats in both little-endian and
big-endian system operation.
• Software always sees the same component layout for a native pixel format unit,
whether it is running in little-endian or big-endian mode, i.e., for a given native
format R, G, B (or Y, U, V) and alpha are always in the same place.
• Software (on the TM3260) can be written endian-mode independent, even when
doing SIMD style vectorized computations.
The native formats of the PNX8550 include the most common indexed, packed RGB,
packed YUV and planar YUV formats used by Microsoft DirectX and Apple
Quicktime, with 100% bit layout compatibility in little and big-endian modes of
operation, respectively. This allows for easy porting of mainstream PC and Apple
software applications that create graphics or images.
TM3260 software image processing stages and encoders/decoders typically use
semi-planar or planar 4:2:0 or 4:2:2 formats as input and output.
Table 5: Native pixel formats. Only the following fragments of the table are
recoverable:
Column headings: Out; QTNR; MBS, MBS2 (In, Out); 2D Draw Engine (2); QVCP In;
VPK (3) (In/Out).
Format descriptions: units contain two horizontally adjacent pixels, no alpha
component; two arrays, one with all Ys, one with U and Vs; 3 Ys are packed in
4 bytes; 3 UV pairs are packed in 8 bytes; 6 Ys, 3 Us and 3 Vs are packed in
16 bytes.
Cell annotations: “QVCP5L only”, “x (out)”.
(1) The VIP RGB output is mutually exclusive with horizontal scaling.
(2) Shown are the 2D engine frame buffer formats where drawing, RasterOps and alpha-blending of surfaces can be
accelerated. Additionally, the 2D Drawing Engine host port supports 1 bpp monochrome font/pattern data, and 4 and 8-bit
alpha only data for host-initiated anti-aliased drawings.
(3) VPK = Vertical Peaking
12.2 Video Input Processor (VIP)
The Video Input Processors (VIP) handle incoming digital video and process it for
use by other components of the PNX8550. The VIP provides the following functions:
• Receives 10-bit YUV4:2:2 digital video data from the selected DVx video port.
The data is dithered down to the in-memory 8-bit data format. The YUV4:2:2 data
stream typically comes from devices such as the SAA 711X, which digitize PAL or
NTSC analog video.
• Performs horizontal downscaling to any resolution or upscaling by 2x.
• The upscaling feature is not available in HD video capture mode.
• Stores video data inside the video acquisition window in system memory in any of
the native pixel formats indicated in Table 5. Performs error feedback rounding to
convert the 10-bit input to the selected 8-bit format.
• Provides an internal Test Pattern Generator with NTSC, PAL, and variable format
• Acquires VBI data using a separate acquisition window from the video acquisition
window.
• ANC header decoding or window mode for VBI data extraction
• Interrupt generation for VBI or video written to memory
• Pixel frequency up to 40.5 MHz, 81 MHz input clock (SD VIP)
• Pixel frequency up to 81 MHz, 81 MHz input clock (HD VIP using 20-bit (Y,UV)
input mode)
• Color space conversion (mutually exclusive with horizontal scaling)
• Raw data mode capture of 8- or 10-bit data
The PNX8550 VIPs are backwards compatible with the PNX8525 VIP, but provide
10-bit accurate input processing.
HD Video
The HD video resolution stream is captured using the VIP1 block. The maximum pixel
frequency is 81 MHz. Both 16-bit and 20-bit YUV 422 formats are supported. Both
embedded sync and explicit sync (HREF, VEF and FREF) are supported in the HD
video capture mode.
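The error feedback rounding that the VIP applies when reducing 10-bit input to
8 bits can be illustrated with a minimal one-dimensional sketch in Python. The
function name and the exact carry scheme are assumptions made for illustration
only; the actual VIP hardware algorithm is not described in this chapter. The idea
shown is that the truncation error of each sample is carried into the next sample,
so the average level along a line is preserved.

```python
def error_feedback_round(samples_10bit):
    """Reduce 10-bit samples to 8 bits, carrying the truncation error forward.

    Hypothetical helper: models the principle only, not the VIP datapath.
    """
    out = []
    err = 0  # error carried over from the previous sample (0..3)
    for s in samples_10bit:
        v = s + err               # add the carried error to the new sample
        out.append(min(v >> 2, 255))  # keep the top 8 bits, clamp to 8-bit range
        err = v & 3               # the two dropped bits become the next carry
    return out


# A constant 10-bit level of 513 lies between the 8-bit codes 128 and 129.
print(error_feedback_round([513, 513, 513, 513]))
```

With plain truncation every sample would become 128; with error feedback one
sample in four becomes 129, which keeps the average at the original level.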
12.3 Tunnel Interface
The tunnel interface is used to connect an external chip. One or more coprocessors
in the external chip will act as on-chip coprocessors. The coprocessors in the external
chip will have read/write access to the PNX8550 DDR-SDRAM memory. The
PNX8550 CPUs will have access to the external coprocessor’s control and status
registers. The coprocessor’s interrupt signal will be connected to the PNX8550
through a GPIO signal.
The external device will read and write the required data including video data, motion
vector data and video measurement data.
The peak data transfer rate across the tunnel is about 267 MB/sec each way and the
usable DMA bandwidth in each way is about 210 MB/sec. This performance figure is
for the 200 MHz tunnel.
12.4 Quality Temporal Noise Reduction (QTNR) and Video Measurement
The QTNR block has two primary functions:
• Temporal Noise Reduction: reading two video fields from memory, “current”
(noisy) and “previous” (noise reduced), and producing a noise reduced version of
“current” in memory
– The temporal noise reduction can be performed on one video stream and the
• While doing this, or as a separate “measurement only” pass, perform video
measurements:
– Gather a histogram of luminance values (this data is used by software to
control histogram modification)
– Measure noise level inside a rectangular window
– Measure the lowest level luminance within a rectangular window (used to
control black stretch in QVCP)
– Measure UV bandwidth inside a rectangular window
– Measure the position of top and bottom black bars in the image
The QTNR block simultaneously processes the video stream for video measurement,
noise measurement and noise reduction. The noise measurement is limited to one
video stream as the QTNR block maintains the context internally. The noise reduction
and video measurement functions can be performed on one or more video streams,
time-multiplexed at field or frame level. All functions are performed at one pixel per
cycle and the maximum pixel rate is 100 Mpixel/sec.
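The two QTNR functions can be illustrated with a small Python sketch. It assumes a
simple recursive blend of the “current” field with the “previous” noise reduced
field, and a fixed-bin luminance histogram. The function names, the blend factor
`k` and the bin count are hypothetical; the filter actually implemented by the
QTNR hardware is not specified in this chapter.

```python
def temporal_nr(current, previous, k=64):
    """Blend the noisy field with the previous noise reduced field.

    k is the weight of the current field in 1/256 units (assumed parameter):
    k=256 passes the current field through, smaller k filters more strongly.
    """
    return [(k * c + (256 - k) * p + 128) >> 8 for c, p in zip(current, previous)]


def luma_histogram(field, bins=32):
    """Count 8-bit luminance values into equally sized bins."""
    hist = [0] * bins
    for y in field:
        hist[y * bins // 256] += 1
    return hist


noisy = [100, 60]
previous = [60, 60]
print(temporal_nr(noisy, previous))         # first pixel moves 1/4 of the way to 100
print(luma_histogram([0, 7, 8, 255])[:2])   # counts in the two darkest bins
```

Software would read such a histogram back to steer histogram modification, as
described in the measurement list above.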
12.5 Memory Based Scaler (MBS)
The PNX8550 contains a Memory Based Scaler that performs operations on images in
main memory. The MBS can either be controlled task by task by a TM3260, or it can
be given a list of de-interlacing and scaling tasks. It reads images from memory,
performs a transformation, and writes the result back to memory. The performance of
the MBS is typically limited only by the 125 Mpixel/sec internal processing rate or by
the allocated main memory bandwidth. The usable pixel rate will be reduced from the
maximum pixel rate if the required real-time memory bandwidth is not allocated to the
MBS block.
The PNX8550 MBS can perform:
• De-interlacing using either a median, 2-field majority select, or 3-field majority
select algorithm with an edge detect/correct post-pass (these three provide
increasing quality, at expense of increased bandwidth).
• Edge detect/correct on an input frame that has been software de-interlaced (this
provides future capabilities in case we develop a better core de-interlacer than
3-field majority select).
• Horizontal and vertical scaling (on the input image or on the result of edge detect/
correct stage).
• Linear and non-linear aspect ratio conversion
• Anti-flicker filtering
• Conversions from any input pixel format to any non-indexed pixel format,
including conversions between 4:2:0, 4:2:2 and 4:4:4, indexed to true color
conversion, color expansion/compression, de-planarization/planarization (to
convert between planar and packed pixel formats), and programmable color space
conversion.
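As an illustration of the simplest de-interlacing option in the list above, the
following Python sketch reconstructs one missing line with a three-input median.
It assumes the common formulation in which each missing pixel is the median of the
pixel above, the pixel below (both from the current field) and the co-located pixel
from the previous field. The function names are hypothetical and the exact inputs
used by the MBS median mode are not specified here.

```python
def median3(a, b, c):
    """Return the middle value of three numbers."""
    return sorted((a, b, c))[1]


def deinterlace_median_line(line_above, line_below, prev_field_line):
    """Reconstruct a missing line from its two neighbours and the previous field."""
    return [
        median3(a, b, p)
        for a, b, p in zip(line_above, line_below, prev_field_line)
    ]


# First pixel: static area, the previous-field value (15) lies between the
# neighbours and is kept. Second pixel: motion, the outlier (0) is rejected.
print(deinterlace_median_line([10, 200], [20, 210], [15, 0]))
```

The median keeps temporal detail in static areas and falls back to a spatial
neighbour where the previous field disagrees with the current one.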
Note that not all combinations of format conversion with scaling are supported; refer
to Chapter 31 MBS.
The video processing functions are based on 4 and 6-tap polyphase filters with up to
64 phases. Three 6-tap filter units are used for horizontal scaling/filtering while three
4-tap filter units are assigned to vertical scaling/filtering. For some video formats
(e.g., YUV 4:2:x) the three 4-tap filters can be combined to work as two 6-tap filters.
Supported video measurement functions during scaling or de-interlacing pass:
• Gather a histogram of luminance values (this data is used by software to control
histogram modification).
• Measure noise level inside a rectangular window.
• Measure the lowest level luminance within a rectangular window (used to control
black stretch in QVCP).
• Measure UV bandwidth inside a rectangular window.
• Measure the position of top and bottom black bars in the image.
The PNX8550 MBS is backwards compatible with the MBS of the PNX8525, but
provides the following improvements:
• The majority select de-interlacer and edge detect/correct de-interlace post-pass
• Larger line buffers that allow HD size images to be processed in a single pass
• Performs noise and video measurement (refer to Section 12.4 for measurement
function details).
– The MBS uses the same noise and video measurement functions as the QTNR, but
has no noise reduction function.
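The polyphase filtering that the MBS scaling is based on can be sketched in Python
as follows. The sketch uses a 4-tap filter with 64 phases, matching the figures
quoted in this section, but the coefficient table is an assumption: it simply
encodes linear interpolation in the two centre taps, whereas the coefficients used
by the MBS are not given here. All names are hypothetical.

```python
PHASES = 64  # sub-pixel positions per source pixel


def make_linear_coeffs():
    """One 4-tap coefficient set per phase; each set sums to PHASES."""
    return [[0, PHASES - p, p, 0] for p in range(PHASES)]


def scale_line(src, dst_len, coeffs):
    """Resample one line of 8-bit samples to dst_len samples."""
    out = []
    last = len(src) - 1
    for i in range(dst_len):
        # Source position in 1/64 pixel units: integer part picks the taps,
        # fractional part picks the coefficient set (the phase).
        pos = i * len(src) * PHASES // dst_len
        idx, phase = divmod(pos, PHASES)
        acc = 0
        for t, c in enumerate(coeffs[phase]):
            tap = min(max(idx - 1 + t, 0), last)  # clamp taps at the line edges
            acc += c * src[tap]
        out.append(min(max((acc + PHASES // 2) // PHASES, 0), 255))
    return out


coeffs = make_linear_coeffs()
print(scale_line([0, 64], 4, coeffs))       # 2x upscale
print(scale_line([10, 20, 30], 3, coeffs))  # 1:1, samples pass through unchanged
```

Replacing the table with windowed-sinc or cubic coefficients changes the filter
quality without changing the structure of the loop.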
12.6 2D and DMA Engine
A 2D rendering and DMA engine is included in the PNX8550 to perform high-speed
graphics operations. Solid fills, three operand BitBlt, lines, and monochrome data
expansion are available. Supported drawing formats include 8, 16, and 32-bits/pixel.
Monochrome data can be color expanded to any supported pixel format. Anti-aliased
lines and fonts are supported via a 16 level alpha blend bitblt.
A full 256-level alpha bitblt is available to blend source and destination images
together. Drawing is supported to any byte aligned memory location and at any image
stride. This block is compatible with the PNX8525 2D DE graphics engine.
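A three operand BitBlt combines pattern (P), source (S) and destination (D) data
with a raster operation. The Python sketch below assumes the 8-bit raster-operation
code convention familiar from PC graphics APIs, in which bit number (P<<2 | S<<1 | D)
of the code gives the result bit. Whether the PNX8550 2D engine encodes its raster
operations this way is an assumption, and the function name is hypothetical.

```python
def rop3(rop, pattern, src, dst, width=8):
    """Apply an 8-bit raster-operation code bit by bit to three operands."""
    result = 0
    for bit in range(width):
        p = (pattern >> bit) & 1
        s = (src >> bit) & 1
        d = (dst >> bit) & 1
        index = (p << 2) | (s << 1) | d      # row of the 8-entry truth table
        result |= ((rop >> index) & 1) << bit
    return result


print(hex(rop3(0xCC, 0xAA, 0x5C, 0x33)))  # 0xCC copies the source
print(hex(rop3(0x66, 0x00, 0x0F, 0x3C)))  # 0x66 is source XOR destination
print(hex(rop3(0xF0, 0xA5, 0x00, 0xFF)))  # 0xF0 copies the pattern
```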
The PNX8550 contains two QVCPs, which are responsible for combining and
displaying video and graphics images from the main memory. The primary QVCP
serves as the main display pipeline, the second one is targeted to be connected to a
record device (VCR). The primary QVCP allows composition of up to 5 layers, and
can output in 656/HD/VGA format in 10 bits per component up to 81 Mpix/sec.
The secondary QVCP allows composition of up to 2 layers and can output in 656
10-bit/component mode at up to 81 MHz (40.5 Mpix/sec). The secondary QVCP is connected
to an on-chip Digital Video Encoder, allowing direct analog CVBS or S-Video output.
In analog output mode, standard definition interlaced NTSC or PAL is supported.
The primary and secondary QVCP each contain a series of layers and mixers. The
QVCP creates a series of display data layers (pixel streams) and mixes them logically
from back to front to create the composited output picture.
In order to achieve high quality video and graphics, the QVCP performs the following
tasks:
• Fetching of the image surfaces from memory
• Per component table lookup, allowing de-indexing or gamma equalization
• Video Quality Enhancement (Luminance Transient Improvement, Color
Dependent Sharpening, Horizontal Dynamic Peaking, Histogram Modification,
Digital Color Transient Improvement, Black Stretch, Skin Tone Correction, Blue
Stretch and Green Enhancement)
• Video and Graphics horizontal upscaling
• Color space unification of all the display surfaces
• Contrast and Brightness Control
• Positioning of the various surfaces
• Merging of the image surfaces (alpha blending and pixel selection based on
chroma range keying)
• Gamma correction on the merged result
• Screen timing generation adapted to the connected display requirements (SD-TV
Two layers in the primary QVCP support the semi-planar YUV formats, one layer in
the secondary QVCP supports semi-planar YUV formats. All other layers support only
indexed, RGB and packed YUV formats. QVCP does not support planar video
formats. The indexed format is limited to two layers in QVCP5L and one layer in
QVCP2L. See Table 6.
Each QVCP contains a number of identical processing layers, a pool of processing
resources that can be switched into a layer under application control, mixers and a
post-processing stage. Refer to the QVCP block diagram for how these are
interconnected.
Each QVCP layer contains (in processing order):
• PFU - Pixel Fetch and Formatting Unit, performing color expansion, undithering
and alpha value extraction
• CKEY - Chroma Range Keying
• CUSP - Color Up-sampling
• LINT - Linear Interpolator for graphics scaling
• VCBM - Video Contrast/Brightness Matrix
• LCU/FCU - receiving later fetch coordinates and sending pixels to the mixer,
subject to clipping outside screen coordinates
The QVCP pool elements are indicated in Table 6 below:
Table 6: QVCP Pool Elements
Pool Element                                              Number in      Number in
                                                          Primary QVCP   Secondary QVCP
CLUT: Color Look-up Table                                 2 sets         1
DCTI: Digital Color Transient Improvement                 2 sets         0
HSRU: Horizontal Sample Rate Up-converter, allowing       2 sets         0
panoramic up scaling
HIST: Histogram Correction Unit, including black stretch  2 sets         0
LSHR: Luminance Sharpening (LTI, CDS, HDP)                1 set          0
CFTR: Color Features (green enhance, skin tone            2              0
correction, blue stretch)
The mixer stage combines images from back to front, also allowing mixing in of a
fixed backdrop color. The mix can be controlled by chroma range keying. Mixing
modes include per-pixel alpha blending, and inverting colors. MIX operation can be
programmed by a set of raster operations (ROP). Mixing is performed either entirely
in the RGB domain or the YUV domain, depending on the output mode of operation
of the QVCP.
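The back-to-front mixing described above can be sketched for a single output pixel
in Python. The sketch assumes 8-bit components, per-pixel alpha in the range 0 to
255, and one chroma key range shared by all layers. These choices and all names are
hypothetical simplifications; the QVCP mixer is programmed through raster
operations whose details are not covered in this chapter.

```python
def in_key_range(pixel, key_lo, key_hi):
    """True if every component lies inside the chroma key range."""
    return all(lo <= c <= hi for c, lo, hi in zip(pixel, key_lo, key_hi))


def blend(front, back, alpha):
    """Per-pixel alpha blend of two pixels, alpha=255 means opaque front."""
    return tuple(
        (alpha * f + (255 - alpha) * b + 127) // 255 for f, b in zip(front, back)
    )


def mix_pixel(backdrop, layers, key_lo, key_hi):
    """Composite layers (ordered back to front) over a fixed backdrop color."""
    out = backdrop
    for pixel, alpha in layers:
        if in_key_range(pixel, key_lo, key_hi):
            continue  # keyed pixels are transparent, the lower layers show through
        out = blend(pixel, out, alpha)
    return out


layers = [((255, 0, 0), 255), ((0, 255, 0), 128)]
print(mix_pixel((0, 0, 0), layers, (250, 0, 250), (255, 5, 255)))
```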
After mixing, post-processing optionally down samples 4:4:4 to 4:2:2 in CDNS, the
Chroma Down Sampler. Then, VBI insertion is performed (656 mode only), and the
output is formatted to one of the forms below.
• 24 or 30-bit full parallel RGB or YUV (primary QVCP only)
• 16 or 20-bit Y and U/V multiplexed data (primary QVCP only)
• 8 or 10-bit 656 (full D1, 4:2:2 YUV with embedded sync codes)
• 8 or 10-bit 4:4:4 format in 656-style with RGB or YUV
In each of the output modes, an optional H-sync, V-sync and blanking or odd/even
output can be enabled (primary QVCP only).
The primary QVCP is capable of reading semi-planar YUV422 or YUV420 with 10 bits
per pixel component. This format is provided in the Ch 6 Pixel Formats. The
10-bit component is inserted before the pedestal removal block within the prefetch
unit (PFU). The 10-bit support was added to the original “9 bits+1” format. The
parameter setting for the 10-bit support will be described in the QVCP document.
12.8 Integrated Digital Video Encoder (DENC)
The secondary QVCP is connected to an on-chip Digital Video Encoder, allowing
direct analog video output. In analog output mode, standard definition interlaced
NTSC or PAL is supported.
The encoder has two DACs. DAC1 provides CVBS or luminance for S-Video. DAC2
provides chrominance for S-Video. Internal sensors allow software to test loading on
the S-Video chroma line to decide whether to output luma or CVBS on DAC1.
Two current type analog outputs provide Y/C or CVBS for driving a traditional
composite video output or an S-Video output to a VCR.
12.9 PNX8510/11 Analog Companion Chip
In cases where the PNX8550 primary QVCP is not driving a display-specific digital
output processor, the PNX8510/11 can optionally be used to provide primary channel
analog video and dual stereo analog audio.
The PNX8510/11 has the following features:
• NTSC, PAL or SECAM Standard Definition CVBS, Y/C or SCART output
• HDTV RGB and YPbPr output with 10 bits-per-component resolution including
the generation of tri-level syncs for various standards (e.g., SMPTE 295M, 296M
and EIA 807)
• Two stereo audio output DACs
• Control over I2C or primary QVCP VBI
13. Audio Processing and Input/Output
13.1 Audio Processing
All audio processing in the PNX8550 is performed in software on the TM3260s. This
includes decoding of audio from compressed formats, sample rate conversion, mixing
and special effects processing. There is sufficient performance, if required, to
transcode received audio to multi-channel compressed audio sent over S/PDIF to an
attached receiver.
The PNX8550 supports two I2S stereo input ports. These ports are typically used in
association with the two analog video input streams. The I2S data is transferred via
DMA to the unified system memory. The stereo audio inputs support up to 32 bits/
sample at sample rates up to 96 kHz. An optional synthesized clock is available to
drive the A/D conversion process. If this clock is used, software can precisely control
the sampling rate and/or lock the audio sampling process to any time reference in the
system.
There are two dedicated 8-channel based I2S outputs. They can be used with
discrete high-quality, low-noise audio DACs, or they can be used in a dual stereo
configuration with the companion PNX8510/11 analog video/audio IC. The audio
outputs support up to 32 bits/sample at sample rates up to 96 kHz. The sampling
output rate is precisely controlled by software and can be locked to any time
reference in the system.
The PNX8550 supports a Sony Philips Digital Interface (SPDIF) output with IEC-1937
capabilities. Transmitted data is generated by the TM3260 software. This output port
can carry either stereo PCM samples from an internal audio mix, or one of the
originally received compressed audio programs including 5.1 channel AC-3, and
MPEG1 Layer1, 2 and 3. Transcoding of audio is possible, but is not included in the
normal system operation CPU and memory bandwidth compute budget. Sample rate
of transmitted audio is set by software, allowing perfect synchronization to any time
reference in the system.
The PNX8550 supports two SPDIF inputs to connect to external sources, such as a
DVD player. The incoming data is timestamped and written to unified system
memory. Data interpretation and sample rate recovery is by software on the TM3260.
The audio data received can be in a variety of formats, such as stereo PCM data, 5.1
channel AC-3 data per IEC-1937 or other. Software decoded audio can be mixed
with other audio for output on one of the audio outputs. The sample rate is
determined by the SPDIF source and cannot be software controlled.
13.3 Audio Compatibility
The Audio outputs and SPDIF input/output are 100% compatible with the PNX8525.
The I2S Audio inputs have been upgraded to support 32-bits/sample audio. This has
been done in a backwards compatible manner, except that the 8 bits/sample mode of
the PNX8525 is no longer supported.
14. Miscellaneous Functions
14.1 Enhanced DMA Controller (EDMA)
The Enhanced DMA Controller supports memory-to-memory (move) transfers from
any byte location in the PNX8550 system memory DRAM to any other byte location in
the PNX8550 system memory DRAM. The DMA Controller has the following features:
• Support for Scatter-Gather mode DMA
– Large blocks of contiguous data may be transferred to smaller buffer areas.
– Many small data buffers may be combined into a large buffer.
• The DMA Controller can compute a DVB-compliant cyclic redundancy checksum (CRC)
during a move operation.
• The DMA Controller can perform AES encryption or decryption during a move operation.
14.2 Semaphores
The SEM block provides semaphores for mutual exclusion in a multi-processor
environment. It implements a total of 16 semaphores. Each processor in the system
can request a particular semaphore. Only one processor at a time can get the
semaphore.
There is no built-in mapping of semaphores to sharable hardware system resources.
Such mapping is by software convention.
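The behaviour of the SEM block can be modelled in a few lines of Python. This is a
behavioural sketch only: the class, the method names, the processor identifiers and
the resource table are hypothetical, and the register-level request protocol of the
SEM block is not described in this chapter. The table at the end stands for the
software convention that maps semaphores to shared resources.

```python
NUM_SEMAPHORES = 16


class SemBlock:
    """Behavioural model: each semaphore is free or owned by one processor."""

    def __init__(self):
        self.owner = [None] * NUM_SEMAPHORES

    def request(self, sem, cpu):
        """Try to take a semaphore; returns True if cpu now owns it."""
        if self.owner[sem] is None:
            self.owner[sem] = cpu
        return self.owner[sem] == cpu

    def release(self, sem, cpu):
        """Free a semaphore; only the current owner can release it."""
        if self.owner[sem] == cpu:
            self.owner[sem] = None


# Mapping by software convention (example assignments, not from the hardware).
RESOURCE_TO_SEM = {"MBS": 0, "2D_ENGINE": 1}

sem = SemBlock()
mbs = RESOURCE_TO_SEM["MBS"]
print(sem.request(mbs, "TM3260_1"))  # first requester gets the semaphore
print(sem.request(mbs, "PR4450"))    # second requester is refused
sem.release(mbs, "TM3260_1")
print(sem.request(mbs, "PR4450"))    # free again, so the request succeeds
```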
14.3 Inter-Processor Communication
The PNX8550 has hardware support for Inter-Processor Communication (IPC). The
top eight interrupts of the TM3260 are allocated for software generated interrupts.
Other processors can communicate to the TM3260 through the top eight interrupt
registers.
There is one IPC hardware block (MIPC). The MIPC block has eight software
controlled interrupt bits. The interrupt signal from the MIPC block is connected to the
MIPS. Another interrupt signal from the MIPC block is also connected to all GICs. A
processor can interrupt other processors by setting an interrupt bit in the target
processor’s IPC block. The target processor will service the interrupt and clear that
interrupt in its IPC block.
15. System Memory
15.1 System DRAM
The PNX8550 has an integrated DDR controller that supports 32-bit wide Double Data
Rate (DDR) SDRAM memory at speeds up to 2x 225 MHz or PC2900 DDR SDRAM
(equivalent bandwidth to a 225 MHz 64-bit wide SDRAM). The memory interface can
support memory footprints of 16, 32, 64 or 128 MB.
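The bandwidth equivalence stated above follows from simple arithmetic, shown here
as a short Python calculation. The helper name is hypothetical, and the figures are
theoretical peak rates that ignore refresh and protocol overhead.

```python
def peak_bandwidth(clock_hz, bus_bits, transfers_per_clock):
    """Theoretical peak bandwidth in bytes per second."""
    return clock_hz * transfers_per_clock * bus_bits // 8


ddr_32bit = peak_bandwidth(225_000_000, 32, 2)  # DDR: two transfers per clock
sdr_64bit = peak_bandwidth(225_000_000, 64, 1)  # SDR: one transfer per clock

print(ddr_32bit, sdr_64bit)  # both are 1.8e9 bytes/sec
```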
The memory interface also performs the arbitration of the memory highway
guaranteeing adequate bandwidth and latency to the PR4450, the TM3260s, and
other internal resources that require memory access. A programmable list-based
memory arbitration scheme is used in order to customize the memory bandwidth
usage of various hardware blocks for a given application. CPUs in the system are
given the ability to interrupt long DMA transfers up to a programmable number of
times per interval. This allows optimal CPU performance at high DDR DMA utilization
rates while guaranteeing real-time needs of audio/video DMA peripherals.
Refer to the datasheets listed in the above table for the supported memory
parameters.
15.2 System EEPROM, ROM or Flash
EEPROM, ROM or Nand/Nor Flash memory connect to the PNX8550 by sharing
some PCI bus pins. The XIO bus created by this sharing supports 8 and 16-bit
storage devices, using only 8 separate data lines and a few control signals.
The PNX8550 provides 5 chip selects in order to support five external devices
including the system’s EEPROM, ROM or Flash device that holds all application code
and read-only data. Address range and wait states for this device are programmable.
Each chip select supports up to 64 MB of address space and five chip selects
together support 128 MB.
The PR4450 and TM3260s can execute or read from direct addressable device
types, but not from Nand Flash. Such execution is low performance and only
recommended for boot usage. After that, it is recommended to take compressed files
with code from Flash, decompress them and store them in DRAM, and execute from
DRAM. The TM3260 can be enabled or disabled to access Flash. The PR4450 can
reprogram Flash using special software. Flash can not be the target of a peripheral
DMA write. Writes require a software flash programming protocol.
Execution and direct addressed reads apply only to addressable memory types,
such as EEPROM, ROM or traditional Nor Flash, and not to a Nand-Flash file
system.
Peak page mode read performance is 33 MB/sec for 16-bit devices and 16 MB/sec
for 8-bit devices such as Intel StrataFlash (28FxxxJ3A, 32M, 64M, 128M) and ST
MLC-NOR Flash (M58LW064A, 64M). Cross-page random read accesses each take
4 to 5 PCI clock cycles depending on the access time of the device.
Flash is mostly active during system booting or with low bandwidth during system
operation in order to implement a small non-volatile Flash file system.
16. Security Provisions
The PNX8550 contains hardware provisions to ensure that critical or confidential
data, such as MIPS kernel memory or keys in conditional access devices,
can be protected from corruption or inspection by agents other than trusted agents.
Each on-chip device has control/status registers that can be given distinct access
rights for each of the masters in the system: read-only, read/writable or invisible.
The DMA hierarchy that connects devices to system DRAM has 4 programmable
ranges or ‘sandboxes.’ Each device can optionally be set to be associated with a
given sandbox. It can only write to and read from memory locations in the sandbox
range. An attempt to access outside the assigned sandbox leads to dropped writes,
zero read values and raising of an interrupt.
The control over these security mechanisms is initially by ‘boot.’ Boot can then further
restrict access or leave access to the security control mechanisms to a trusted
master (e.g., MIPS).
These mechanisms are a superset of the mechanisms in the PNX8525. Of particular
use is the ability to ensure that a TriMedia processor can not write or read the MIPS
kernel mode memory. Devices controlled by the TriMedias can be placed in a
sandbox that also ensures they can not access the MIPS kernel memory.
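The sandbox behaviour described in this section (dropped writes, zero read values
and an interrupt on any access outside the assigned range) can be modelled with the
following Python sketch. The class and method names, the byte-granular memory and
the violation log standing in for the interrupt are all assumptions made for
illustration; the sketch does not describe the actual register interface.

```python
class SandboxedMemory:
    """Behavioural model of DMA access checks against programmable ranges."""

    def __init__(self, size, ranges):
        self.mem = bytearray(size)
        self.ranges = ranges   # up to 4 (start, end) pairs, end exclusive
        self.violations = []   # stands in for the raised interrupt

    def _allowed(self, sandbox, addr):
        if sandbox is None:    # device not associated with any sandbox
            return True
        start, end = self.ranges[sandbox]
        return start <= addr < end

    def write(self, device, sandbox, addr, value):
        if not self._allowed(sandbox, addr):
            self.violations.append((device, addr, "write"))
            return             # the write is dropped
        self.mem[addr] = value

    def read(self, device, sandbox, addr):
        if not self._allowed(sandbox, addr):
            self.violations.append((device, addr, "read"))
            return 0           # reads outside the sandbox return zero
        return self.mem[addr]


# Range 0 models protected kernel memory, range 1 a media buffer area.
mem = SandboxedMemory(64, [(0, 16), (16, 32)])
mem.write("tm_device", 1, 20, 0xAB)   # inside sandbox 1, accepted
mem.write("tm_device", 1, 4, 0xFF)    # outside sandbox 1, dropped
print(mem.read("tm_device", 1, 20), mem.mem[4], mem.violations)
```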
16.1 Power Modes
• The PNX8550 system, with its programmable clocks, can be set to operate in
many different power modes. For example, to save power, the clocks to the
TM3260 CPUs and/or the PR4450 CPU can be reduced to a low frequency, and
individual unused blocks can be turned off altogether. All CPUs have power-down
mechanisms, and can be powered down awaiting an interrupt. These modes are
not managed by a hardware power mode controller, but by software using the
standard provisions of the CPUs and clock system.
• The PNX8550 can in particular go to a very low dissipation ‘hibernate’ mode.
• Hibernation is entered under the PR4450 software control. Software powers
down the TM3260 CPUs completely, turns off all video/audio and I/O blocks of
the system, turns the clocks into a state where all PLLs are off, and where the
system bus clocks are driven from the 27 MHz crystal oscillator frequency divided
by 16. Software then sets wakeup conditions by programming the interrupt mask
registers in its interrupt controller and puts the PR4450 in ‘coma mode’ and
memory in self-refresh.
• During hibernation, memory content is retained and the GPIO block remains
active (on a reduced clock). The PCI outgoing clock is reduced to 27 MHz divided
by 16. The system will not respond to incoming PCI transactions or generate
outgoing PCI transactions, but other PCI components may remain operational.
All other system activity is halted.
• Any enabled interrupt will wake the PR4450 up from ‘coma mode’. In practice,
such an interrupt can only come from a few sources:
– An external GPIO input pin edge transition, on a pin designated as monitored
or as active interrupt pin i.e., an external device signaling wakeup
– An incoming Remote Control ‘power on’ command
– One of the timers/counters in GPIO i.e., a scheduled wakeup, or programmed
number of external events
• After wakeup from ‘coma mode,’ the PR4450 can examine the full RC event or
other tentative wakeup attempts, and if the wakeup is genuine, bring the system
back to full operational mode.
17. Peripheral Interfaces
The PNX8550 supports the required peripherals to build a baseline DTV system.
Peripheral functionality beyond what is supported in the PNX8550 may be achieved
by using commercially available low-cost PCI devices. The PNX8550 has the
following on-chip peripherals:
• Four I2C interfaces (two I2C with multimaster (400 kHz) and two I2C with DMA
(3.4 MHz))
• 2 UARTs (general purpose: one 2-wire, one 4-wire including large data FIFO)
• 16 dedicated General Purpose Software I/O (GPIO) pins which serve as system
interrupt inputs, software inputs/outputs as well as support arbitrary serial
protocol formats, including MemoryStick™ and MultiMediaCard™
– Only level interrupt is supported and the level is programmable for high or low
level based interrupts.
• There are 45 pins that can double as GPIO pins if their primary function is not
used.
• 1 Universal Remote Control receiver input with RC5/RC6/RC-MM and other
serial IR or RF protocol capabilities. Capable of waking the system up from
hibernate mode (uses GPIO).
• 1 or more universally programmable RC Blaster using GPIO outputs
17.1 IDE Interface
The PNX8550 contains all logic to control two IDE drives. Two external low-cost TTL
devices are required to capture/buffer and isolate the IDE signals, since the IDE
controller shares PCI pins.
The IDE (ATA) interface pins operate in PIO-4 mode transfer with a theoretical
maximum transfer rate of 16.6 MB/s. The processors see disk accesses as
autonomous DMA. Entire data blocks are fetched from or written to system DRAM.
All IDE disk registers (eight command and one control) are indirectly accessible by
processors through the PCI-XIO registers.
The IDE interface uses an XIO select pin for IDE_ENABLE and a GPIO pin for
INTRQ.
17.2 MemoryStick and MultiMediaCard
Interface to a MemoryStick™ is accomplished using software and 3 GPIO pins. This
uses up 2 of the GPIO timestamp/sampled queue resources. Sustained file transfer
rates of 800 kB/sec have been demonstrated.
Interface to a MultiMediaCard is accomplished using software and 3 or 4 GPIO pins
(for MMC mode or SPI mode). If operation in MMC mode followed by SPI mode
is required, a total of 5 GPIO pins are used. Either mode uses 2 of the GPIO
timestamp/sampled queue resources.
Decoding of audio or picture files on flash cards is performed by software.
Flash card types other than MemoryStick™ and MultiMediaCard™ are not gluelessly
supported and require an external interface chip on USB or PCI.
17.3 GPIO
The PNX8550 has 16 dedicated GPIO pins. In addition, 45 other pins that have a
high likelihood of not being used in certain applications are designated as optional
GPIO pins that can either operate in regular mode or in GPIO mode. As an example,
the 10 data pins of the VCR output are available as fully functional GPIO in case the
on-chip Digital Video Encoder is used. Unused smartcard or regular UARTs can be
used as GPIO pins, etc. If the XIO use is limited to 8-bit devices or if XIO chip selects
are unused, additional GPIO lines are available.
The GPIO block is connected to many pins. Hence it is the ideal place to provide
useful central system functions. It performs the following major functions, each
detailed below:
• Software I/O - set a pin or pin group, enable a pin (group), inspect pin values.
• Precise timestamping of internal and external events (up to 16 simultaneous
signals)
• Signal event sequence monitoring or signal generation (up to 6 simultaneous
signals)
• Generation of CPU interrupts
• Timer/counter capability (4 timers/counters)
17.3.1 Software I/O
Each GPIO pin is a tri-state pin that can be individually enabled, disabled, written or
read by software. Pins are grouped in groups of 16; signals within a group can be
simultaneously enabled and changed or observed. Changes can use a mask to allow
certain pins to remain unchanged.
Note that this capability is useful for low/medium speed software implemented
protocols, as well as for observing switches, driving LEDs etc. It is highly
recommended to first use the powerful GPIO pins as protocol emulators, and not just
for static switches/LEDs (for which a solution such as a PCF8574 I2C parallel I/O is
well-suited).
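The masked update of a 16-pin group described in this section can be expressed as a
single bitwise operation. The Python sketch below shows the principle; the function
name is hypothetical and the actual GPIO register layout is not given in this
chapter.

```python
GROUP_MASK = 0xFFFF  # one group holds 16 pins


def masked_write(current, value, mask):
    """Change only the pins selected by mask; all other pins keep their state."""
    return ((current & ~mask) | (value & mask)) & GROUP_MASK


state = 0xAAAA
state = masked_write(state, 0x00FF, 0x0F0F)  # update pins 0-3 and 8-11 only
print(hex(state))
```

Pins 0 to 3 take the new value, pins 8 to 11 are cleared, and the remaining eight
pins are left as they were.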
17.3.2 Timestamping
The GPIO block contains 16 timestamp units, each of which can be designated to
monitor an external GPIO pin or internal system events. For a monitored event, a
timestamp unit can be set to trigger on rising edge, falling edge or either edge. When
a trigger occurs, a precise occurrence time (31-bit timestamp value, 75 nsec
resolution) is put in a register and an interrupt is generated.
This capability is particularly valuable for precise monitoring of key audio/video
events and controlling the internal software phase-locked loops that lock to broadcast
time references. It can also be used for medium speed signal analysis.
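As an illustration of how software might use two captured timestamps, the sketch below computes the elapsed time between them. It assumes that one timestamp count corresponds to one 75 ns step and that the 31-bit counter wraps around to zero; both are assumptions for the example, and the function name is hypothetical.

```python
TICK_NS = 75          # resolution from the text; assumed to equal one counter tick
TIMESTAMP_BITS = 31   # timestamp width from the text


def interval_ns(earlier: int, later: int) -> int:
    """Elapsed time in nanoseconds between two timestamps.

    The modulo handles a single wrap of the 31-bit counter between
    the two captures (assumed behavior).
    """
    ticks = (later - earlier) % (1 << TIMESTAMP_BITS)
    return ticks * TICK_NS


print(interval_ns(10, 30))            # 20 ticks
print(interval_ns(2**31 - 1, 1))      # 2 ticks across the wrap
```

Under these assumptions the counter wraps roughly every 161 seconds (2^31 ticks of 75 ns), so intervals longer than that cannot be recovered from two timestamps alone.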
17.3.3 Event Sequence Monitoring and Signal Generation
The GPIO contains 6 queue units, each capable of monitoring or generating
high-speed signals on up to 4 GPIO pins.
This capability creates a universal protocol emulator capable of emulating many
medium speed (0 - 20 Mbit/sec) protocols using software on the media processor.
Complex protocols, such as the MemoryStick™ protocol with 20 Mbit/sec peak rate
and an 800 kB/sec sustained file transfer rate, have been successfully implemented
on the PNX8525 GPIO, which also supports MultiMediaCard serial protocol. The
PNX8550 GPIO is a superset of the PNX8525 GPIO.
High speed signal analysis uses one of two modes:
• Event queue hardware samples 1 to 4 GPIO inputs using one out of a variety of
clocks in the system, including clocks input to or generated by other GPIO pins.
Samples are packed in a word and stored in a list in system memory for software
analysis. Sampling can be applied to 1, 2 or 4 pins simultaneously.
• Event queue hardware builds an in-system memory list of timestamped GPIO pin
change events, individual per monitored GPIO pin. Edge events are timestamped
with 75 ns resolution.
Signal generation uses the same 2 capabilities, but in reverse, i.e., a sampled signal
is emitted or an in-memory timestamped list of change events is output over a pin.
The event sequence monitoring mechanism can be used for many functions and is
particularly useful for interpreting Remote Control commands. Signal generation is
useful for RC Blaster applications.
The GPIO block has a total of 6 complex signal analysis/signal synthesis resources
capable of sampling or timestamped list generation/creation.
17.3.4 Interrupt Generation
A GPIO pin can be programmed to generate a level-based interrupt on a low or high
level. The interrupt line can be connected to any CPU, either through the generic
interrupt controller (GIC) block or directly to the TM3260’s vector interrupt block.
17.3.5 Timer/Counter Capabilities
The GPIO contains 4 timers/counters, each capable of:
• Counting events on external or internal GPIO signals
• Counting on a system clock
• Generating an interrupt and wrapping around when a programmed count is reached
• Optionally gating the clock by a second GPIO signal
The timers/counters are particularly useful to schedule a future interrupt in
preparation for hibernate mode. Another use is measuring the interval in which a
certain number of key audio/video events occur, e.g. audio sample input rate, etc. For
creation of pulse width modulated signals, the use of the generic GPIO signal
generation capability is recommended, rather than the use of a counter/timer.
• It allows any GPIO pin to act as low or high ‘level’ interrupt input to the system.
– Edge triggered interrupt is not supported.
• It provides separated interrupts at system level, allowing association of a given
CPU with a given GPIO task.
• It has better support for remote control, in particular ‘power-on’ detection and
low-power operating mode (see Section 17.3.8).
• It allows a lower clock of operation with associated reduced timestamp accuracy.
• The reset value of dedicated GPIO pins has been changed.
17.3.8 Remote Control Receiver/Blaster
The PNX8550 contains a dedicated hardware RC receiver input pin. This pin is
connected to the GPIO block. This is a regular GPIO input pin, except that it also has
a ‘power-on’ code detector as described below. The general signal analysis
capabilities of the GPIO block are used to interpret remote control keys.
Driver software uses GPIO pins to implement any remote control protocol, such as
the Philips RC5/RC6/RC-MM protocols or similar bit-serial remote control protocols.
The RC receiver driver uses the GPIO event sequence timestamping capability,
which can resolve edge events on signals with 75 ns accuracy when running at full
GPIO clock speed. A sequence of edge events followed by a period of inactivity
causes generation of an interrupt. Software then interprets the “character” by looking
at the in-system memory event list consisting of (time, direction of change). This
allows interpretation of arbitrary remote control protocols in software.
The GPIO block input connected to the RC input pin contains a “power-on” RC code
detector. This detector monitors the RC input pin and generates a wakeup interrupt to
the PR4450 if an event sequence passes a simple filter criteria set. The criteria
consist of a ‘high/low’ or ‘low/high’ or ‘either way’ sequence where each signal state
has a duration falling between two programmed limits. These criteria do not
guarantee that the key pressed was a ‘power-on’ key, but filter most spurious events
so that a minimum of unnecessary PR4450 wakeups occur.
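The filter criteria described above can be sketched in software as follows. This is a hedged illustration of the idea only: the event layout (time in ns, new level), the function name, and the limit values are assumptions for the example, and the real detector is a hardware function whose register interface is not described here.

```python
def passes_power_on_filter(events, first_level, limits):
    """Check a 'high/low' or 'low/high' sequence against duration limits.

    events      : list of (time_ns, new_level) edge events (hypothetical layout)
    first_level : level of the first state (1 for high/low, 0 for low/high)
    limits      : ((min1, max1), (min2, max2)) duration limits in ns
    """
    if len(events) < 3:
        return False  # need two complete signal states, i.e. three edges
    (t0, level0), (t1, level1), (t2, _) = events[:3]
    if level0 != first_level or level1 == level0:
        return False
    durations = (t1 - t0, t2 - t1)
    return all(lo <= d <= hi for d, (lo, hi) in zip(durations, limits))


# Illustrative values only: two states of about 0.9 ms each.
edges = [(0, 1), (900_000, 0), (1_800_000, 1)]
window = ((800_000, 1_000_000), (800_000, 1_000_000))
print(passes_power_on_filter(edges, 1, window))  # True
```

An ‘either way’ criterion could be expressed by accepting the sequence when the check passes for either value of `first_level`.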
Upon wakeup, software needs to examine the full event list to perform true ‘power-on’
key recognition. The ‘power-on’ detector can be used while GPIO is running at
reduced clock rates, as long as the clock is chosen to allow for error-free recognition
of the ‘power on’ character by its timestamped memory list. After the ‘power on’ key is
recognized, the system, including the GPIO clock, is brought up to full speed.
Any GPIO pins can be used for one or more RC blaster output(s). The event
synthesis capability of GPIO can likewise be used to emit an arbitrary RC event
sequence. A modulator to create an IR carrier is included in the GPIO. Any GPIO
pin(s) can hence be used as RC blaster(s) subject to GPIO queue resource
availability (6 total).
17.4 PCI2.2 and XIO16 Bus Interface Unit
The PNX8550 contains an expansion bus interface unit ‘PCI-XIO16’ that allows easy
connection of a variety of board level memory components and peripherals. The bus
interface is a single set of pins that allows simultaneous connection of 32-bit PCI
master/slave devices as well as separated address/data style 8- and 16-bit
microprocessor slave peripherals and standard (NOR) or disk-type (NAND) Flash memory.
The bus interface unit contains a built-in single-channel DMA unit that can move
blocks of data from a peripheral to or from the PNX8550 SDRAM. The DMA unit can
access the PCI as well as 8 and 16-bit wide XIO devices. The DMA unit packs XIO
device data to/from 32-bit words, so that no CPU involvement is required to pre/post
process data.
17.4.1 PCI Capabilities
The PNX8550 complies with Revision 2.2 of the PCI Bus specification and operates
as a 32-bit PCI master/target at 33 MHz.
The PNX8550 as PCI master allows any of its processors to generate single cycle
PCI transaction types, including memory cycles, I/O cycles, configuration cycles and
interrupt acknowledge cycles. As PCI target, the PNX8550 responds to memory
transactions and configuration type cycles, not to I/O cycles.
The PNX8550 can act as PCI bus arbiter for up to 3 external masters without external
logic. If the DSACK signal is used then the PCI bus arbiter supports only 2 external
masters.
The PCI clock is an input to the PNX8550, but if desired the general purpose
PNX8550 ‘PLL_OUT’ clock output, which upon reset automatically generates a 33
MHz clock, can be used as the PCI clock for the entire system.
17.4.2 Simple Peripheral Capabilities (XIO8/16)
The 16-bit microprocessor peripheral interface is a master-only interface and
provides non-multiplexed address and data lines. A total of 26 address bits are
provided, as well as a bi-directional 16-bit data bus. Five device profiles are provided,
each generating a chip select for external devices. Up to 64 MB of address space is
allowed per device profile. The interface control signals are compatible with a
Motorola 68360 bus interface and support both fixed wait-state or dynamic
completion acknowledgment.
A total of 5 pre-decoded chip select pins are available to accommodate typical
outside slave configurations with minimal or no external glue logic. Each chip select
pin has an associated programmable address range within the XIO address space.
Each chip select supports 64 MB of address space, and five chip selects together
support 128 MB. Each chip select pin can also choose to obey external DTACK
completion signaling or be set to have a preprogrammed number of wait cycles.
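The relation between the 26 address bits and the 64 MB per-device address space, and the idea of a programmable address range per chip select, can be sketched as below. Byte addressing is assumed, and the range values and function names are hypothetical examples, not PNX8550 register settings.

```python
MB = 1 << 20


def address_space_bytes(address_bits: int) -> int:
    """Addressable bytes for a given number of address bits (byte addressing assumed)."""
    return 1 << address_bits


def select_chip(offset: int, ranges):
    """Return the index of the chip select whose range contains `offset`.

    ranges: list of (base, size) pairs within the XIO address space
            (illustrative values, programmed by software in a real system).
    Returns None when no chip select matches.
    """
    for index, (base, size) in enumerate(ranges):
        if base <= offset < base + size:
            return index
    return None


print(address_space_bytes(26) // MB)            # 64
example_ranges = [(0, 8 * MB), (8 * MB, 16 * MB)]
print(select_chip(9 * MB, example_ranges))      # 1
```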
The peripheral interface derives 24 of the 26 address wires and 8 of the 16 data wires
from the PCI AD[31:0] pins. The remaining pins are XIO-specific and non-PCI shared.
During XIO transactions, the PCI signals FRAME, DEVSEL, IRDY, TRDY remain
quiescent, so that other PCI agents ignore the activity. Unused XIO pins are available
as GPIO pins.
Table 9 summarizes extension capabilities of the bus interface unit.

Table 9: PCI-XIO16 Bus Interface Unit Capabilities

External Device: External PCI master
Device Type: 32 bits, 33 MHz PCI masters
Capabilities: Arbitration built-in for up to 3 external PCI masters. Additional external
masters can be supported with external arbitration. External PCI bus masters can
perform high bandwidth, low latency DMA into and out of PNX8550 SDRAM. Large block
transfer-capable devices can sustain up to 100 MB/sec into SDRAM.

External Device: External PCI slave
Device Type: 32 bits, 33 MHz PCI targets
Capabilities: Glueless connection supported for multiple devices subject only to
capacitive loading constraints. The PR4450 can perform low-latency 8/16/32-bit memory
or I/O writes and reads to/from PCI targets. Access by TM3260s can be enabled or
disabled.

External Device: External 8-bit slave
Device Type: 8 bits wide, demuxed address/data devices on ‘XIO bus’
Capabilities: Up to 5 devices supported gluelessly or unlimited number subject to
capacitive loading rules with external address decode logic. The PR4450 can perform
8/16/32-bit reads and writes to these ‘XIO’ devices, which are automatically mapped to
8/16-bit wide transfers by the bus interface unit.

External Device: Standard Flash
Device Type: 8/16 bits wide
Capabilities: The PNX8550 provides 5 chip selects, one of which is available for a Flash
device. Address range and wait states for a Flash device are programmable. The
PR4450 and TM3260s can execute or read from Flash. Execution is low performance and
only recommended for boot usage. The MIPS CPU can reprogram Flash using special
software. Flash cannot be the target of a peripheral DMA write. Writes require a
software flash programming protocol.
Peak page mode read performance is at 33 MB/sec for 16-bit devices and 16 MB/sec for
8-bit devices such as Intel StrataFlash (28FxxxJ3A, 32M, 64M, 128M) and ST MLC-NOR
flash (M58LW064A, 64M). Cross-page random read accesses each take 4 to 5 PCI clock
cycles depending upon the access-time of the device.
Flash is mostly active during system booting or with low bandwidth during system
operation in order to implement a small non-volatile file system.

External Device: NAND Flash
Device Type: 8/16 bits wide
Capabilities: Direct execution, random access read or write from this Flash type is not
supported. Explicitly programmed I/O through special NAND Flash PCI-XIO8/16
control/status registers is used to implement a file system on this disk-like Flash
type. Using the NAND Flash XIO provisions, a peak bandwidth of 13 MB/sec and a
sustained bandwidth of 11 MB/sec can be obtained from an AM30LV0064D 8Mx8 UltraNAND
or equivalent Flash device. Maximum throughput for serial burst accesses is 33 MB/sec
for 16-bit devices such as Samsung K9F5616U0B (16 Mbits x 16).

External Device: CIMaX device
Device Type: 8-bit data, 26-bit address
Capabilities: The external logic for conditional access consists of a CIMaX device with
2 PCMCIA slot devices and glue logic (373, 245). This entire subsystem behaves as an
8-bit wide slave with an up to 26-bit address space. This subsystem interfaces
gluelessly to the XIO bus except for the possible logic needed to combine the DTACK
signaling of multiple devices.
There is a medium bandwidth of communication between CIMaX and the PNX8550, which is
not expected to be an issue w.r.t. PCI performance.

External Device: 1394 link core
Device Type: 8-bit data and 9-bit address (Philips PDI1394LXX)
Capabilities: The Philips PDI1394LXX family connects gluelessly to the XIO in 8-bit data
mode using 8-bit data and a 9-bit address with dedicated read and write strobes,
optional wait signal and a separate chip select. For systems which require high
asynchronous performance, a 1394 link device with direct PCI connection can be used.

External Device: DOCSIS devices
Capabilities: Future DOCSIS devices are expected to be PCI bus mastering devices. They
connect gluelessly.

External Device: External SRAM, ROM, EEPROM
Device Type: 8/16 bits wide
Capabilities: Counts as generic XIO slave device.

External Device: External SDRAM
Device Type: not supported
Capabilities: Not supported on PCI-XIO.

External Device: External Motorola style masters
Device Type: not supported
Capabilities: The PNX8550 PCI-XIO does NOT support external Motorola style masters. The
PNX8550 assumes that it always is the master on the XIO bus.

External Device: External 8/16 bits XIO DMA devices
Device Type: not supported
Capabilities: Not supported. Use one of the streaming DV inputs or outputs instead.
18. Endian Modes
The PNX8550 fully supports little- and big-endian software stacks.
The PNX8550 always starts its on-chip MIPS device in a fixed endianness, which is
determined by the boot script. There is a system provision for MIPS software to reset
and restart the MIPS in the opposite endianness, such that a field software Flash
upgrade can release an ‘endianness opposite boot’ operating system upgrade.
The PNX8550 on-chip peripherals and coprocessors observe the system global
endianness flag, as does the PR4450. The TM3260 endianness is set by the TM3260
program module itself and should always be set identical to system endianness.
When selecting PCI peripherals for a dual-endianness product, care must be taken to
ensure that they can operate without “CPU fixup” in either endianness. Typically,
PowerPC compatible PCI devices support both endianness types in the exact same
way as the PNX8550.
19. PNX8550 Boot
The PNX8550 boot occurs on an externally initiated hardware reset, a software reset,
or on watchdog timer timeout.
The PNX8550 uses a scripted boot, i.e., a hardware block (the Boot module) within
the PNX8550 executes a script consisting of simple commands (write a given value
at a given address, delay xxx cycles, etc.).
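The scripted boot concept can be illustrated with a small software model. This is a sketch under stated assumptions: the command encoding, the addresses and values, and the function name are all hypothetical, and the actual Boot module command format is not given in this section.

```python
def run_boot_script(script, registers=None):
    """Execute a list of simple boot commands against a register model.

    Supported commands (hypothetical encoding):
      ("write", address, value)  write a 32-bit value at an address
      ("delay", cycles)          wait for a number of cycles
    Returns the resulting register dictionary and the total delay in cycles.
    """
    registers = dict(registers or {})
    elapsed_cycles = 0
    for command in script:
        op = command[0]
        if op == "write":
            _, address, value = command
            registers[address] = value & 0xFFFFFFFF
        elif op == "delay":
            elapsed_cycles += command[1]
        else:
            raise ValueError(f"unknown boot command: {op!r}")
    return registers, elapsed_cycles


# Illustrative script; addresses and values are made up for the example.
regs, cycles = run_boot_script([
    ("write", 0x1000, 0x00000001),
    ("delay", 100),
    ("write", 0x1004, 0xDEADBEEF),
])
print(regs, cycles)
```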
Three BOOT_MODE resistor strapping pins determine which script is executed.
The PNX8550 on-chip PR4450 and TM3260 processors are capable of executing
code directly from standard Flash ROM to allow for a second-level of booting.
Remark: Direct execution from NAND Flash or “Disk” Flash is not supported.
Direct execution from Flash ROM has very limited performance. Hence, the MIPS
typically copies a Flash file to high-performance system DRAM and executes the
code from DRAM. The Flash file may contain a self-decompressing system
initialization application. Implementing a multi-stage boot process in this manner can
help to minimize the system memory cost.
The system designer may choose to customize the initial boot that is executed via
the Boot module by connecting an appropriately programmed I2C EEPROM and by
configuring the Boot module to read the initial boot commands from the EEPROM.
A standalone PNX8550 system is able to reliably update its own Flash boot image,
whether the Flash is standard NOR or NAND Flash ROM. In most systems this is
done by an extra Flash storage capacity that is used by the Flash update software to
guarantee atomicity of a boot image update under power failure. The update either
succeeds or the old boot image is retained. In some systems it may, however, be cost
effective to use a medium-sized boot I2C EEPROM instead. This boot EEPROM
holds the code to recover a corrupted Flash from some system resource, such as a
network or disk drive.
In the presence of an external host processor, the PNX8550 must execute an I2C
EEPROM boot script that loads a small amount of board level personality data. Once
this data is obtained, the PNX8550 is ready to follow the standardized PCI
enumeration and configuration protocol executed by the external host processor. In
external host configurations, a single small I2C EEPROM is required and no Flash
memory is needed. The host is responsible for configuring a list of PNX8550 internal
registers, loading an application software image into the PNX8550 SDRAM and
starting the TM3260. In the presence of an external host, the on-chip PR4450 is
generally not used.
20. Boundary Scan
The PNX8550 is compliant with the IEEE 1149.1 Boundary Scan standard. It can be
seamlessly integrated with other IEEE 1149.1 compliant devices to perform
board-level testing. The PNX8550 scan chain implementation has all I/O signal pins except
the analog signals in the boundary scan registers. The boundary scan registers can
be connected between TDI and TDO pins by executing the required instructions. The
boundary scan instructions are used to capture the signal pins data from the input
pins and also to force fixed values to the output signal pins. A detailed description of the
boundary scan usage for the PNX8550 is given in the “PNX8550 Test Block
Specification” document.
There are two new hardware blocks and one sub IP block for the second tape-out or
“RevB” of the PNX8550. The new hardware blocks are the modified MBS block
(MBS2) and the Vertical Peaking block (VPK). A contrast reserve with soft-clipping
hardware IP block is added to the QVCP-5L hardware IP block.
21.1 MBS2 Block
The MBS2 block is a derivative of the current memory based scaler (MBS) block.
Refer to Section 12.5 on page 1-25 for the description of the MBS block. The MBS2
block contains only horizontal and vertical scaler functions. All other functions,
including the video measurement, all modes of de-interlacing and the
edge-dependent de-interlacing functions, are not available in the MBS2 block.
21.1.1 Frequency of Operation and Performance
The MBS2 block operates at 145 MHz. The MBS2 block output data is optionally
directed to either an external memory or to the QVCP-5L block by setting the
appropriate control registers. The direct streaming mode to QVCP generates
proportionately more bandwidth for the video downscaling function.
21.2 Vertical Peaking Block (VPK)
The Vertical Peaking unit is intended to increase the overall vertical sharpness of the
input video signal. In Figure 8, a block diagram of the functional description of the
Vertical Peaking unit is shown.
Figure 8: VPK Block Diagram
(Diagram not reproduced. Labels recoverable from the figure: YUV in (DTL 32-bit),
MMIO, PFU (DTL), un-dither stages for 8-bit Y and 8-bit UV, programmable 9-tap
peaking filter, horizontal low-pass filter, UV delay, clip/coring/smartness control,
pedestal, CROP, PSU, INTL, FRMT, semiplanar Y out or packed YUV out (DTL 32-bit),
semiplanar UV out (DTL 32-bit).)
For luminance, vertical peaking is achieved by means of applying a programmable
vertical high-pass filter to the original data and adding the filtered versions in a
controlled manner to the original video data. Two of the three filters operate in a fixed
frequency band, having fixed filter coefficients, while the third filter is a fully
programmable 9-tap filter. The three filters have a separate gain control, which can
be programmed and adjusted on a field-by-field basis.
Furthermore, coring is provided to obtain an improved performance when noisy
signals are present. The coring threshold can be programmed on a field-by-field
basis. Finally, the smartness control realizes a gain reduction in areas where large
peaking values are expected. This significantly reduces the sensitivity for alias
creation.
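The general principle of peaking with coring (high-pass filter the signal, suppress small filter outputs, add the scaled result back to the original) can be sketched on a single column of luminance samples. This is a simplified illustration, not the VPK hardware algorithm: it assumes a single fixed 3-tap vertical high-pass filter instead of the three filters described above, uses a hard coring threshold, omits the smartness control, and all names and values are hypothetical.

```python
def vertical_peak(column, gain, coring, lo=0, hi=255):
    """Apply simplified vertical peaking to one column of luminance samples.

    column : list of luminance values, top to bottom
    gain   : scale factor applied to the high-pass output
    coring : high-pass magnitudes at or below this value are set to zero
    """
    out = []
    n = len(column)
    for i, y in enumerate(column):
        above = column[max(i - 1, 0)]       # repeat edge samples at the borders
        below = column[min(i + 1, n - 1)]
        highpass = (2 * y - above - below) / 4  # 3-tap [-1, 2, -1] / 4 filter
        if abs(highpass) <= coring:
            highpass = 0                    # coring: ignore small (noisy) detail
        peaked = y + gain * highpass
        out.append(int(min(hi, max(lo, round(peaked)))))
    return out


# A vertical step edge gets undershoot and overshoot, which reads as sharper.
print(vertical_peak([50, 50, 50, 150, 150, 150], gain=1.0, coring=2))
```

In this sketch, flat areas and small variations below the coring threshold pass through unchanged, which is the effect coring is meant to have on noisy signals.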
The Vertical Peaking unit can produce either progressive or interlaced output. In case
the output is interlaced, every other output line is omitted.
The following additional features are included in the Vertical Peaking unit:
Cropping
An output window that is cropped out of the input may be programmed.
Demo mode
The Vertical Peaking supports a demo mode, meaning that a window may be
programmed in which the vertical peaking is applied. This mode is typically used to
show the end-user the difference between processed and nonprocessed parts of an
image. The output of the Vertical Peaking block is transferred to the QVCP-5L.
Bypass
The vertical peaking unit can be optionally bypassed in order to transfer the video
stream directly from MBS2 to QVCP-5L. The HD resolution video from the MBS2 can
be bypassed through the vertical peaking unit.
21.2.1 Frequency of Operation and Performance
The Vertical Peaking block operates at 81 MHz and the maximum pixel rate of the
vertical peaking operation is 81 Mpixel/sec. The Vertical Peaking block output data is
optionally directed to the QVCP-5L block by setting the appropriate control registers.
The peaking function only supports a maximum of 720 pixels per line; however the
bypass mode supports a maximum of 1920 pixels/line.
21.3 Video Streaming Connections
21.3.1 Tunnel Interface to QVCP
The video data from an external chip through the Tunnel interface can be either stored in
memory or transferred directly to QVCP-5L. The selection is controlled by a global
control register. This control register should be used during the system initialization
time and is not meant to be used for dynamic switching.
21.3.2 MBS2 Block to QVCP
The output of the MBS2 can be either stored in an external memory or directly
transferred to QVCP-5L through the Vertical Peaking unit.
21.3.3 Vertical Peaking Block to QVCP
The Vertical Peaking output data is directly transferred to the QVCP-5L. The VPK unit
can only receive the input data from the MBS2 block.
21.4 Contrast Brightness Control with Soft Clipper (CBSC)
The contrast reserve with soft clipper block enhances the visual quality of the video
when it is displayed on a matrix display including the LCD-TV. The CBSC block performs contrast and brightness control, color space conversion, soft clipping and face
(large bright region) detection.
For weak signals, application of high (up to 2.0x) contrast gain is preferred. Without
protection, a high contrast gain can cause hard clipping, which will distort the image
quality. The Contrast Reserve algorithm will reduce the clipping artifacts both locally
and globally. Even for signals at normal level, in order to achieve strong contrast
perception, the contrast and brightness controller deliberately tries to overdrive the
display, and the soft clipper prevents artifacts by applying the following four functions:
• local reduction of color saturation
• local reduction of contrast
• global reduction of contrast and/or brightness
• global reduction of clipper characteristics from hard to soft
The local reduction is controlled by two programmable non-linear gain factor Look Up
Tables (LUTs), in which a high gain factor is set for small signals and a low gain factor
is set for large signals. The last two actions are partially based on software feedback
of the hardware face detector output. Software adjusts contrast and brightness
control parameters based on the face detector output and other measurement
results.
The CBSC block supports both YUV and RGB video input/output formats. The
supported contrast range is from 50% to 200% with sufficient quality. Color saturation
control is only available when the input is YUV.
Input and output data formats:
Input
Y/R, U/G and V/B 10 bits signed with nominal data range -256~255.
(the same as the internal QVCP data format)
Output
R/Y, G/U and B/V 10 bits signed with nominal data range -512~511.
The CBSC block is placed in front of the CDNS sub IP block within the QVCP-5L.
The PNX8550 contains three different bus interconnect networks (see Figure 1
PNX8550 System Block Diagram):
• MIPS Device Control and Status Network (MDCS)
• TriMedia Device Control and Status Network (TDCS)
• Pipelined Memory Access Network (PMAN), which is referred to as the Hub
The MDCS and TDCS are networks used exclusively for MMIO (Memory Mapped IO)
traffic. Every block requiring IO programming has a memory mapped set of registers
and the formats of these registers are designed to be software enumerable.
The PMAN Network is used by all blocks requiring access to memory via DMA.
The separation of DCS and PMAN networks ensures that MMIO traffic does not
disturb memory accesses. The blocks on the TDCS MMIO network are typically
controlled by the TriMedia processor, while the blocks on the MDCS MMIO network
are typically controlled by the MIPS CPU.
All on-chip and off-chip CPUs and devices in the PNX8550 can be set to access
memory-mapped resources. To implement protection, not all masters in the system
are allowed to access all memory-mapped resources. As an example, it is possible to
set registers such that processes on the TM3260 CPU core can not access devices
and memory ranges that are mission critical.
This chapter describes the following:
• Standard PNX8550 system memory map (internal MIPS PR4450 host CPU)
• Object visibility rules and protection mechanisms
• Registers controlling the system memory map and protection mechanisms
• Alternate system memory map (external PCI host CPU).
• Memory map view from the MIPS PR4450 CPU, TM3260 CPU, DCS-Bus masters
and external PCI devices.
Alternate PNX8550 system configurations exist, but are not supported.
2. Functional Description
2.1 Bus Architecture Block Diagram
The PNX8550 architecture block diagram is shown in Figure 1.
PNX8550
Chapter 2: Bus Architecture and System Memory Map
Figure 1: PNX8550 System Block Diagram
(Diagram not reproduced. Networks shown: MIPS Device Control and Status Network, TM
Device Control and Status Network, and Pipelined Memory Access Network with PMAN
Arbiter, PMAN Monitor, PMAN Security, DMA Gates and the DDR Controller, connected
through bridges. Block labels recoverable from the figure: PR4450, TM32_1, TM32_2,
PCI/XIO, USB Host, Scard1, SCard2, UART1, UART2, IIC1, IIC2, IIC3, IIC4, EJTAG, BOOT,
RESET, CLOCKS, CAB, GLB REG1, GLB REG2, TM DBG1, TM DBG2, DCS Ctrl, DCS Security,
GIC-MIPS, IPC-MIPS, GIC-TM32_1, GIC-TM32_2, IPC-TM32_1, IPC-TM32_2, QVCP5L, QVCP2L,
VPK, DVD/CSS, VIP1, VIP2, MBS, MBS2, QTNR, EDMA, VLD, TSDMA, VMPG, MSP1, MSP2, DENC,
SPDI-O, SPDI-I1, SPDI-I2, AI1, AI2, AO1, AO2, GPIO, Tunnel, DAC.)
2.2 Architecture
There are two MMIO networks in the PNX8550: the MIPS DCS (MDCS) network and
the TriMedia DCS (TDCS) network. The MDCS and TDCS are intended for MMIO
traffic to configure the various devices in the system. Each peripheral on the DCS
network is located as close as possible to the CPU that is expected to typically control
that peripheral.
The GPIO interrupts that may require fast service will be serviced by the TriMedia
cores. Therefore, the GPIO module is located on the TDCS network.
Global (1 and 2) register peripherals contain configuration registers for the chip. Most
of the system optimization features are programmable via the Global 2 registers.
All peripherals can be accessed from the MIPS, TriMedia cores, PCI, Boot, and
EJTAG. Due to system security issues, access to any of the peripherals can be
blocked for any of these DCS network initiators. (See Chapter 4 DCS Network &
Security.) Access between the MDCS and TDCS networks is provided by the
DCS2DCS network bridge. All other modules in the PNX8550 are not allowed to
access peripherals.
A generic “Device Transaction Level” (DTL) point-to-point initiator-target
communication protocol is used on the boundary between a peripheral and the DCS
network. MMIO communication through the DTL protocol always consists of a single
32-bit data element.
2.3 Low Power Adapter Implementation
In order to minimize the power consumed in the DCS network, system designers can
choose to use the power management features provided with the architecture. There
are several options described below. Only options 1-3 are supported by the default
implementation of the standard DCS network modules:
1. Utilize the asynchronous interface option and reduce adapter and network
controller frequency.
2. Implement a hierarchical network with some segments operating at very low
frequency.
3. Clock gating for network controller and adapters. A dcs_start_clk signal is
generated based on registered versions of dcs_cmd_req_i and
dcs_cmd_complete_t. This allows the dcs_clk to most of the network controller
and all the target adapters to be gated when there is no access in progress.
There is no access latency impact for this implementation and it still allows a very
high power saving for typical systems.
The selected power management architecture does not enforce the use of the power
management features, but allows system designers to make the choice. The
dcs_start_clk is an asynchronous signal broadcast to all chiplets where the dcs_clk
will be gated. Synchronization of dcs_start_clk will be done in the chiplet. In the
network controller there is a software enable for the dcs_start_clk signal. The default
for dcs_start_clk is logic 1.
It should be noted that there are no clock gates for the initiator adapters. There are
very few initiators and they can have separate clock inputs (e.g. use their core_clock)
for the DCS network interface. Therefore, the total power consumption of the initiator
interfaces is expected to be very small compared to the overall chip power and thus
clock gating was omitted.
The network utilization is expected to be below 0.5% in many systems, i.e.,
approximately a factor of 200 in power could be saved by enabling the interconnect
clocks only when a transfer is in progress. That will ensure the power consumption of
the DCS network will be significantly below most other components in the system.
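The factor of 200 follows directly from the utilization figure if one assumes that clock power scales linearly with the fraction of time the clock is enabled. The short sketch below shows the arithmetic; the linear model and the function name are assumptions for illustration.

```python
def clock_gating_saving_factor(utilization: float) -> float:
    """Ideal power reduction factor when the clock runs only during transfers.

    Assumes clock power is proportional to the fraction of time the clock
    is enabled (a simplification that ignores leakage and gating overhead).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return 1.0 / utilization


print(clock_gating_saving_factor(0.005))  # 0.5% utilization
```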
There are two DCS networks in the PNX8550, the MDCS and the TDCS. Each
network has its own network controller (MDCSC and TDCSC respectively), as shown
in Figure 2.
Figure 2: Device Control and Status Network
(Diagram not reproduced. Initiators 0 to N and Targets 0 to M each attach through a
DTL-to-DCS adapter. The network controller comprises Monitors, Timeout & Error
Control, Arbitration, Security and Multiplexing, with Bridge In and Bridge Out
connections.)
The DCS network controllers provide the following functions:
• Provide programmable address map:
– Optionally, address map information may be input signals to the DCS
controller.
– Each target will have a defined aperture of a given size, offset from a defined
base address register. A given bus controller may be configured to have up to
4 different base address registers shared by various targets.
– In addition, each target may have additional regions which will be steered to it.
This would be used primarily by bridges, but possibly other targets. Each of
these additional apertures are defined using a High/Low Address method with
a programmable width for each address.
• Provide programmable timeout generation.
• Capture error and timeout information:
– Initiator ID of the currently granted DCS Initiator
– 32-bit address of the currently granted DCS transaction
– Encoded target number for the currently selected DCS device
– Additional command information including cmd_mask and cmd_read
• Allow interrupt generation for any non-masked timeouts and errors.
• Allow optional selective blocking of initiators via read/write registers with
programmable power on defaults.
• Allow optional blocking of errors on 32-bit reads (needed for TriMedia).
• Include timing closure flops for all dcs*_cmd_sel_t and dcs*_cmd_req_i signals.
• Conform to signaling protocol of the DCS Network Specification
See Chapter 4, DCS Network & Security, for details regarding the DCS Network.
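The programmable address map described in the list above can be pictured with a small software model. This is only a sketch under stated assumptions: the structure layout, the field names and the function dcs_decode are hypothetical and do not correspond to actual network controller registers. Each target has one aperture at base + offset, where the base is one of up to four shared base address registers, plus one optional additional region given by a low and a high address (the high address is treated as exclusive here).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one target's entry in the DCS address map. */
typedef struct {
    unsigned base_sel;  /* which of the (up to 4) base address registers is used */
    uint32_t offset;    /* aperture offset from that base address */
    uint32_t size;      /* aperture size in bytes */
    uint32_t extra_lo;  /* additional region: low address (inclusive) */
    uint32_t extra_hi;  /* additional region: high address (exclusive, assumed) */
} dcs_target_t;

/* Returns the index of the target that claims addr, or -1 if no target does
 * (in hardware such an access would end in an error or a timeout). */
static int dcs_decode(const uint32_t base[4], const dcs_target_t *t,
                      size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t start = base[t[i].base_sel & 3u] + t[i].offset;

        /* Unsigned subtraction wraps for addr < start, so one compare is enough. */
        if (addr - start < t[i].size)
            return (int)i;

        /* Additional steered region, used mainly by bridges. */
        if (addr >= t[i].extra_lo && addr < t[i].extra_hi)
            return (int)i;
    }
    return -1;
}
```

In this model a bridge target would carry a non-empty additional region (for example the PCI range), while an ordinary target would leave extra_lo and extra_hi at zero so that only its base-relative aperture is decoded.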
The standard system memory map applies to PNX8550 configurations without an
external PCI host processor. In such systems, the PNX8550 hardware reset and boot
block script initialize on-chip registers and start the on-chip MIPS PR4450 CPU.
Initialization software on the MIPS PR4450 completes system initialization. The
resulting standard system memory map is shown in Figure 3.
Figure 3: PNX8550 Standard System Memory Map (Internal MIPS PR4450 Host) (diagram, from 0x0000 0000 upward: Local DRAM (max 128 MB) at Base10 (DRAM_BASE); PCI Bus Aperture from PCI_BASE1_LO to PCI_BASE1_HI; MMIO On-Chip Device Registers (2 MB) at Base14 (MMIO_BASE); XIO Bus Peripherals & Flash (8..128 MB) at Base18 (XIO_BASE), ending at 0x2000 0000, with Flash set against the top of this range. The address space extends to 0xFFFF FFFF.)
4.1 Apertures in the Standard System Memory Map
The PNX8550 local DRAM aperture is 16, 32, 64 or 128 MB and it is used for
accessing the DRAM.
The XIO bus aperture is not used in all systems. If used, it can be set from 8..128 MB
in size. An access to this aperture goes to external XIO peripherals attached to the
PNX8550 PCI-XIO bus, such as an IDE disk drive, ROM, Flash, SRAM memory, or
8-bit peripheral devices. Due to the sharing of wires between PCI and XIO devices,
the XIO aperture is not accessible to PCI bus masters. It is however claimed as an
aperture upon building the PCI bus memory map to prevent a local XIO peripheral
from being assigned the same address as a PCI device.
The MMIO aperture is a fixed 2 MB in size. It contains the 32-bit wide control and
status registers of all on-chip devices of the PNX8550.
The PCI bus aperture starts above DRAM and ends at the first MMIO aperture
address. An access to this aperture goes to PCI bus target devices, which may be a
range of memory or a PCI device control/status register. See Chapter 13 PCI-XIO.
An example of a typical system configuration and its standard memory map is shown
in Figure 4. This is the configuration set by all the PNX8550 built-in boot scripts.
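Figure 4: System Configuration and Standard Memory Map (Boot Script #1) (diagram: a PNX8550 with 32 MB DRAM, a PCI device and NOR Flash/ROM; signal labels include Boot Mode 001, RESET_IN and XIO_SEL0. Memory map from the bottom: 32 MB DRAM from 0x0000 0000 to 0x01FF FFFF; an unused region (mapped to PCI after boot); 2 MB MMIO starting at 0x1BE0 0000; 64 MB XIO from 0x1C00 0000 to 0x1FFF FFFF; unused up to 0xFFFF FFFF.)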
4.2 Building the Standard System Memory Map
The system memory map is built according to the following rules:
• DRAM is set to start at address 0, with a size equal to or greater than the actual
DRAM size in the system. The aperture size must be a power of 2 between 2 MB
and 128 MB.
• If used in the system, the XIO aperture is set against the top of the first 512 MB. It
must have a size equal to or greater than the actual peripheral address range.
The aperture size must be a power of 2 between 2 MB and 256 MB, and
allocated on a natural boundary i.e., a 128-MB aperture must start on an address
that is a multiple of 128 MB.
• The 2-MB MMIO aperture, with on-chip device control/status registers, is set
against XIO. It must be allocated on a 2-MB boundary.
• The area between the top of DRAM and MMIO is designated as a bridge to the
PCI bus.
The PNX8550 built-in boot scripts perform the first three of the above steps. MIPS
PR4450 software is responsible for the last step. If the size assumptions made by the
built-in boot scripts are inappropriate, a custom boot script can be used. Refer to
Chapter 7 Boot.
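The rules above can be sketched as a short calculation. This is an illustrative model only, assuming the size limits given in the rule list; the type and function names (range_t, std_map_t, build_std_map) are hypothetical and are not part of any PNX8550 boot script or driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define MB        (1024u * 1024u)
#define LOW_512MB 0x20000000u      /* end of the first 512 MB (exclusive) */
#define MMIO_SIZE (2u * MB)

typedef struct { uint32_t lo, hi; } range_t;   /* inclusive bounds */
typedef struct {
    range_t dram, pci, mmio, xio;
    bool has_xio;                  /* false: xio range is empty (lo > hi) */
} std_map_t;

static bool is_pow2(uint32_t x) { return x != 0u && (x & (x - 1u)) == 0u; }

static bool size_ok(uint32_t size, uint32_t min, uint32_t max)
{
    return is_pow2(size) && size >= min && size <= max;
}

/* Builds the standard map; xio_size == 0 means "no XIO aperture".
 * Returns false if a size violates the power-of-2 or range rules. */
static bool build_std_map(uint32_t dram_size, uint32_t xio_size, std_map_t *m)
{
    if (!size_ok(dram_size, 2u * MB, 128u * MB))
        return false;
    if (xio_size != 0u && !size_ok(xio_size, 2u * MB, 256u * MB))
        return false;

    /* XIO sits against the top of the first 512 MB. Because its size is a
     * power of 2 that divides 512 MB, the base is naturally aligned. */
    uint32_t xio_base  = LOW_512MB - xio_size;
    /* MMIO sits directly below XIO and is therefore 2-MB aligned. */
    uint32_t mmio_base = xio_base - MMIO_SIZE;

    m->has_xio = (xio_size != 0u);
    m->dram = (range_t){ 0u, dram_size - 1u };
    m->xio  = (range_t){ xio_base, LOW_512MB - 1u };
    m->mmio = (range_t){ mmio_base, xio_base - 1u };
    /* Everything between the top of DRAM and MMIO is bridged to PCI. */
    m->pci  = (range_t){ dram_size, mmio_base - 1u };
    return true;
}
```

With 32 MB of DRAM and a 64 MB XIO aperture, this calculation yields MMIO at 0x1BE0 0000 and XIO at 0x1C00 0000, which agrees with the addresses shown for Figure 4.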
4.3 Rationale for the Standard System Memory Map
The PNX8550 address decoder logic requires that all three apertures are powers of
two in size and are “naturally aligned” i.e., a 32-MB aperture must start on an address
that is a multiple of 32 MB.
It is required that MIPS PR4450 kernel mode processes can access any object in the
system. Due to the nature of the virtual to physical address translation in the on-chip
MIPS PR4450, kernel mode access is limited to the first 512 MB of physical address
space. Hence, the standard system memory map puts all objects in the lower
512 MB.
Furthermore, the MIPS PR4450 exception vectors must reside in the lower part of the
physical address space, necessitating DRAM at this address.
The MIPS PR4450 starts execution from a virtual address provided by the global
register module. This address is mapped to a physical address inside the MIPS
PR4450. This address may be in the XIO aperture, allowing the MIPS PR4450 to start
from external ROM or Flash. Note that the system has a special MIPS PR4450 boot
provision, described in Section 8.1.1, which can be used to boot the MIPS PR4450
without any directly addressable XIO to external ROM or Flash.
The choice of MMIO positioned below XIO is arbitrary.
The PCI aperture must be kept as large as possible to allow for multiple PCI devices
with attached local memories. Hence, it fills all left over space.
When changing the values in the boundary registers of an aperture, the following
steps should be taken to avoid a temporary overlap with
other apertures:
1. When moving to a higher address:
a. set the lower boundary register content to the new higher boundary.
b. set the higher boundary register content to the new higher boundary.
c. set the lower boundary register content to the new lower boundary.
2. When moving to a lower address:
a. set the higher boundary register content to the new lower boundary.
b. set the lower boundary register content to the new lower boundary.
c. set the higher boundary register content to the new higher boundary.
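The two write sequences above can be sketched as follows. This is a minimal illustration, not driver code: the types and the functions plan_aperture_move and apply_aperture_move are hypothetical, and the register pointers stand in for whichever low/high boundary register pair is being moved. A reading of the sequences, stated here as an assumption, is that for an aperture that keeps its size the intermediate states always have the low boundary at or above the high boundary, so the aperture is presumably empty while it is being moved.

```c
#include <stdint.h>

typedef enum { BOUND_LO, BOUND_HI } bound_reg_t;
typedef struct { bound_reg_t reg; uint32_t value; } bound_write_t;

/* Fills step[0..2] with the register writes, in the order given by the
 * procedure, for moving an aperture whose current low boundary is cur_lo. */
static void plan_aperture_move(uint32_t cur_lo, uint32_t new_lo,
                               uint32_t new_hi, bound_write_t step[3])
{
    if (new_lo > cur_lo) {                              /* moving higher */
        step[0] = (bound_write_t){ BOUND_LO, new_hi };
        step[1] = (bound_write_t){ BOUND_HI, new_hi };
        step[2] = (bound_write_t){ BOUND_LO, new_lo };
    } else {                                            /* moving lower */
        step[0] = (bound_write_t){ BOUND_HI, new_lo };
        step[1] = (bound_write_t){ BOUND_LO, new_lo };
        step[2] = (bound_write_t){ BOUND_HI, new_hi };
    }
}

/* Performs the three planned writes on a low/high boundary register pair. */
static void apply_aperture_move(volatile uint32_t *lo_reg,
                                volatile uint32_t *hi_reg,
                                const bound_write_t step[3])
{
    for (int i = 0; i < 3; i++) {
        if (step[i].reg == BOUND_LO)
            *lo_reg = step[i].value;
        else
            *hi_reg = step[i].value;
    }
}
```

Separating the plan from the register writes keeps the ordering easy to inspect or log before it is applied to hardware.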
5. Hardware Limitations to Object Visibility
The PNX8550 hardware imposes the following limits:
• No DMA device can access the MMIO aperture.
• Only the DMA engine within the PCI/XIO module can target the PCI or XIO
aperture.
In addition, the TM3260 CPU cores are set up such that they never map the PCI or
XIO aperture into their address space. This convention is highly recommended, as
certain PCI and XIO devices may have side effects from read operations, and
the TM3260 CPUs generate speculative reads.
Table 1: PNX8550 Hardware Limitations to Object Visibility
Master Type                 DRAM   MMIO   PCI      XIO
On-chip MIPS PR4450 CPU     yes    yes    yes      yes
On-chip TM3260 CPU cores    yes    yes    Note A   Note A
On-chip MDCS DMA devices    yes    no     yes      yes
On-chip TDCS DMA devices    yes    no     no       no
Off-chip PCI bus masters    yes    yes    yes      no
Note A: The TM3260 CPU core is not normally set to see PCI or XIO directly in its address
map. It can however use explicit MMIO transactions to the PCI-XIO block to perform any
single cycle type PCI transaction, including memory, I/O and intack. Using the same method,
it can also perform XIO bus transactions.
6. Register Descriptions
6.1 Aperture Control Registers
6.1.1 PCI, TM3260, and MIPS PR4450
The following registers relate to the PNX8550 System Memory Map and Object
Visibility. For more information, see Chapter 13 PCI-XIO, Chapter 29 TM3260 CPU
Core Processor, and Section 4, Standard System Memory Map.
Table 2: Aperture Control Registers
Offset     Symbol               Description
BASE Registers (PCI-XIO)
0x04 0050  BASE10 (DRAM_BASE)   Shadow of PCI config register that determines DRAM base address in memory map for external PCI masters
0x04 0054  BASE14 (MMIO_BASE)   Shadow of PCI config register that determines MMIO base address in memory map
0x04 0058  BASE18 (XIO_BASE)    Shadow of PCI config register that determines XIO base address in memory map
0x04 0018  PCI_base1_lo         Low address of region 1 of DCS-bus that gets bridged to PCI
0x04 001C  PCI_base1_hi         High address of region 1 of DCS-bus that gets bridged to PCI
0x04 0020  PCI_base2_lo         Low address of region 2 of DCS-bus that gets bridged to PCI
0x04 0024  PCI_base2_hi         High address of region 2 of DCS-bus that gets bridged to PCI
TM3260 CPU Aperture Control Registers (TM3260 CPU)
0x10 0034  TM32_DRAM_LO         Low address of DRAM
0x10 0038  TM32_DRAM_HI         High address of DRAM
0x10 003C  TM32_DRAM_CLIMIT     Cacheable limit (addresses above this are not cached)
6.2 Global 2 Registers
Table 3 on page 2-10 lists the Global 2 registers related to the PNX8550 System
Memory Map. Detailed information on these registers can be found in Ch12 Global
Offset     Symbol        Description
0x04 D200  DMA_GATE_LO   Internal bus DRAM low address register
0x04 D204  DMA_GATE_HI   Internal bus DRAM high address register
0x04 D208  APERTURE_WE   Enable DCS_DRAM_LO and DCS_DRAM_HI registers to be writable.
7. Alternate System Memory Map with External Host CPU
7.1 PCI Standard Boot and Memory Map Assembly
If an external host CPU (MIPS PR4450 or other) is present on the PCI bus, it has the
responsibility to enumerate and configure all PCI resident devices, including each
PNX8550. This configuration process builds an address map where apertures of all
devices (including each PNX8550) are given unique PCI addresses. The
standardized protocol that accomplishes this is described in the PCI Local Bus
The configuration process of the PNX8550 in this case is summarized below:
1. The boot block writes to the PCI_SETUP register to set the desired size for each
of the DRAM, MMIO and XIO apertures—typically equal to attached DRAM size
on a particular board; 2 MB for MMIO and up to 128 MB for XIO. The PCI-XIO
block generates a “retry” on any attempt by the host to access it until the
PCI_SETUP write is completed.
2. The host PCI BIOS reads each base address PCI configuration register of a PCI
device, in particular the lsb to determine if the requested aperture is a PCI
memory or I/O space aperture. The PNX8550 has three such base addresses,
each requesting a PCI memory space type aperture. They are Base10 (DRAM),
Base14 (MMIO) and Base18 (XIO).
3. The host writes an “all 1” value to each base address register and reads it back.
The PCI block hardware returns 0s in all “don’t care bits” and 1s in all actually
writable bits, from which the host deduces the size of the requested memory
aperture.
4. The host writes a unique address value to each base address register to set the
aperture base address of DRAM, MMIO and XIO.
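Step 3 of this process relies on the standard PCI sizing convention for 32-bit memory base address registers. The helper below is a minimal sketch of the host-side calculation; the function name pci_mem_bar_size is hypothetical, and the value passed in is whatever the host read back after writing all 1s to the register.

```c
#include <stdint.h>

/* Size in bytes requested by a 32-bit PCI memory base address register,
 * computed from the value read back after writing 0xFFFFFFFF to it.
 * Returns 0 if the register is unimplemented or requests I/O space. */
static uint32_t pci_mem_bar_size(uint32_t readback)
{
    if (readback == 0u || (readback & 1u))   /* bit 0 set: I/O space request */
        return 0u;

    /* Bits [3:0] of a memory BAR carry type and prefetch flags, not address
     * bits. The remaining 1s are the writable address bits. */
    uint32_t mask = readback & ~0xFu;

    /* The lowest writable bit gives the aperture size. */
    return ~mask + 1u;
}
```

For example, a readback of 0xFFE0 0000 corresponds to the 2 MB MMIO aperture, and 0xFE00 0000 corresponds to a 32 MB DRAM aperture.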
Note that the outcome of this host configuration process is three apertures (DRAM,
MMIO, XIO) that will most likely be adjacent to each other in the PCI address space
and may lie anywhere between 0x0 and 0xFFFF FFFF.
7.2 Internal MIPS PR4450 and External Host CPU
The address decoding logic for DRAM, XIO and on-chip MMIO devices is designed to
decode addresses relative to the base address values established by the PCI
memory map building protocol. The TM3260 CPU core is able to execute code and
load/store data at any physical address. However, the on-chip MIPS PR4450 CPU is
not able to execute in this unpredictable PCI host address map environment, since it
expects a certain physical memory layout e.g., DRAM and MMIO must be within the
first 512 MB.
Hence, the internal MIPS PR4450 CPU is not used when an external host is present.
In certain instances e.g., if the code on the external host is developed specifically for
the PNX8550, and it can guarantee “MIPS PR4450-friendly” base address values,
the internal MIPS PR4450 can still be used.
8. Memory Map Perspectives
8.1 View from MIPS PR4450
The on-chip MIPS PR4450 CPU is only active when no external host CPU is present.
If MIPS PR4450 is active, it is in the “standard system memory map” environment,
where all system resources are in the first 512 MB of the physical address space.
The chosen virtual-to-physical addressing scheme for the internal MIPS PR4450
CPU is shown in the following diagram.
Figure 5: MIPS PR4450 Address Map (diagram relating the 4 GB virtual address space to the 4 GB physical address space. Virtual segments: kuseg (user cached), 0x0000 0000 to 0x7FFF FFFF; kseg0 (kernel cached), 0x8000 0000 to 0x9FFF FFFF; kseg1 (kernel uncached), 0xA000 0000 to 0xBFFF FFFF; kseg2 (kernel cached), 0xC000 0000 to 0xFFFF FFFF, including the sw-debug area. The physical address space is divided at 0x2000 0000, 0x4000 0000 and 0xC000 0000. Other labels in the diagram: “Mapping only if TLB is OFF”, “optional DRAM shadow”, “Not Accessible”.)
Note: Shaded areas are TLB mappable.
Kseg0 and kseg1 are not mappable via the TLB. They map directly to the lower
0.5-GB where all system resources (DRAM, PCI, MMIO, XIO) reside.
The software debug area is the only part of kseg2 that is used in the PNX8550. This
SW debug area is non-cacheable and cannot be mapped with a TLB. The mapping
of this area can be enabled/disabled in the PNX8550.
User-mode processes will reside in kuseg. They will almost always be mapped via
the MIPS PR4450 TLB. The TLB allows per-page translation to any physical address
in the 4 GB physical memory space.
8.1.1 MIPS PR4450 Exception Vector Logic
The exception vectors for the MIPS PR4450 reside in kseg0 or kseg1 depending on
the value of the MIPS PR4450 internal Boot Exception Vector (BEV) bit. These
exception vectors and the corresponding virtual and physical address locations are
shown in the following table.
Note: The BEV bit is set to 1 during reset and typically set to 0 after boot of the operating
system.
Virtual Address   Physical Address
0xbfc0 0000   0x1fc0 0000   0xbfc0 0000   0x1fc0 0000
0xff20 0200
Virtual Address   Physical Address
0xbfc0 0200 or    0x1fc0 0200 or
0xff20 0200       0xff20 0200
The MIPS PR4450 CPU on the PNX8550 has the option to relocate the exception vector
base in kseg1 (when BEV=1) to a different address. The configurable bits are [28:12].
These bits can be programmed by writing into the MIPS_RESET_VECTOR register
in the global register 2 module during boot.
The boot script can use the reset remap mechanism to start the MIPS PR4450 at any
desired address e.g., from an XIO non-volatile memory or from initialized DRAM.
The memory map seen by each TM3260 CPU core contains three apertures. Each
aperture is independent. Any address is a legal base address, including
0x0000 0000. Apertures for each TM3260 CPU should not be set to overlap or
extend across the 0xFFFF FFFF limit of 32-bit addressing conventions. The
apertures for each TM3260 are shown in the following block diagram.
Figure 6: Memory Map for TM3260 CPU Core (diagram, from 0x0000 0000 up to 0xFFFF FFFF: inaccessible; DRAM Aperture from TM32_DRAM_LO to TM32_DRAM_HI, non-cacheable above TM32_DRAM_CLIMIT; inaccessible; Aperture_1 from TM32_APERT1_LO to TM32_APERT1_HI; inaccessible; 2 MB MMIO Aperture at MMIO_BASE; inaccessible.)
Each TM3260 CPU core requires all its apertures to be a multiple of 64 kB and reside
on a 64-kB boundary. They are programmed by writing to MMIO registers inside the
TM3260 CPU core, with the exception of the MMIO_BASE, which is directly taken
from the PCI Base14 register content. In the PNX8550, the TM32_DRAM_LO/HI
registers must be set to view the entire PNX8550 DRAM aperture. Protection can be
accomplished by using the TM_REGION_LO/HI registers, as described in
Section 7.2.
Each TM3260 CPU can access the DRAM aperture with all load and store
instructions. Loads and stores in the cacheable area use the data cache. Loads and
stores in the non-cacheable area bypass the data cache and go directly to memory
across the memory interface. Execution is supported from the entire DRAM aperture,
which always uses the instruction cache.
Upon either reset, the TM3260 CPU puts all registers and cache control in their defined
initial state. It does not start program execution. Program execution is started when
the code for “start” is written to the TM32_CTL MMIO register. Execution starts at the
address contained in the TM32_START_ADDR MMIO register, so the boot address
is flexible. The addresses for the exception vectors are also programmable
and do not put any constraints on the system design.
Aperture_1 is not normally used in the PNX8550. It is enabled by writing a
TM32_APERT1_HI greater than TM32_APERT1_LO. In that case, loads/stores to
such addresses cause (non-cached) accesses across the TDCS bus. In special
applications of the PNX8550, this could be used to map (part of) PCI and/or XIO
directly into the TM3260 address map.
Remark: Due to the TM3260 CPU core architecture and compiler code generator, the TM3260
performs speculative loads, i.e., it may perform loads from any location in its address map
without an explicit request. The MMIO devices inside the PNX8550 are designed to cope with
this, but not all PCI or XIO devices may be able to do so. For this reason, direct mapping of
PCI or XIO is not generally recommended.
See Ch29 TM3260 CPU Core Processor for more information.
8.3 View from PCI Bus
There are two different cases to consider for PCI:
• The PNX8550 is the PCI configuration manager. Its CPU allocates base addresses
for all PCI components in the system.
• An external CPU is the PCI configuration manager on the PCI bus and the PNX8550
is one of the components.
8.3.1 PNX8550 as PCI Configuration Manager
As PCI configuration manager, the PNX8550 builds the standard system memory
map. External PCI bus masters see the memory map, but can only access the
PNX8550 DRAM and MMIO apertures and other PCI target devices, not the XIO
aperture.
Figure 7: PNX8550 (as seen from a PCI Bus Master - #1) (diagram of the address space from 0x0000 0000 to 0xFFFF FFFF: PNX8550 DRAM at Base10 (DRAM_BASE) and PNX8550 MMIO (2 MB) at Base14 (MMIO_BASE).)
8.3.2 An External Host CPU as PCI Configuration Manager
In this case, the host CPU PCI BIOS builds the system memory map. The host CPU
assigns each PNX8550 in the system a unique Base10, Base14 and Base18 per the
procedure described in Section 6.1.
Usually, apertures of a given device end up adjacent to one another. In an x86 PC,
apertures typically end up near the high end of the 4-GB address space. Address 0 is
never used, as the first 640 kB are used by the PC. Other host CPUs may have their
own conventions. In general, no assumptions should be made about the resulting
memory layout, other than that apertures do not overlap.
Note that the PCI specification requires an aperture to be aligned at a boundary which is
the same size as the aperture, e.g., a 128-MB aperture will be located at a 128-MB
boundary. External PCI bus masters see the memory map, but can only directly
access the PNX8550 DRAM and MMIO apertures, not the XIO aperture.
See Figure 8.
It is possible for the external host CPU to do an indirect access to PNX8550 XIO by
writing to the PCI-XIO module MMIO registers to request a single cycle XIO access or
an XIO DMA transaction.
Figure 8: PNX8550 (as seen from a PCI Bus Master - #2) (diagram of the address space from 0x0000 0000 to 0xFFFF FFFF: PNX8550 DRAM at Base10 (DRAM_BASE) and PNX8550 MMIO (2 MB) at Base14 (MMIO_BASE).)
8.4 View from the MDCS and TDCS Buses
Accessibility from the MDCS and TDCS buses is shown in Figure 9 on page 2-16.
DRAM is accessible to all peripherals.
Peripherals on the MDCS bus can access DRAM, PCI, or XIO apertures. On the
TDCS bus, only the DRAM aperture is visible to the peripherals. Each TM3260 will
typically be enabled to access only its DRAM and MMIO apertures, because
XIO and PCI peripherals may be sensitive to speculative reads generated by the
TM3260s. However, both TM3260s can be enabled to also access the PCI1/2 and
XIO apertures if the speculative read feature is disabled during code compilation.
1. This aperture is the TM3260 internal view of the TM3260 registers.
Each TM3260 only sees its own registers here, i.e., PNX8550 aperture
14_0000 maps to TM1 10_0000 and PNX8550 aperture 16_0000 maps to TM2
10_0000.
2. Each TriMedia sees the lower 64 kB of the MMIO aperture as cache, i.e.,
PNX8550 aperture 13_0000 maps to TM1 00_0000 and PNX8550 aperture
15_0000 maps to TM2 00_0000.
3. The tunnel configuration sets its aperture size from 4kB to 514kB if the tunnel
is not in use.
4. DRAM access from an external PCI master uses the Base 10 register from
PCI Configuration space when the en_pci2mmi is set to 1. Any address in the
XIO range is routed to the XIO block in the PCI module. Transactions to XIO
originating on the TDCS network are forwarded across a bridge.
This document describes the design details of the DVI_HUB Memory Access
Infrastructure (aka DMA-HUB or “the HUB”) in the PNX8550. The design is based on
the Pipelined Memory Access Network (PMAN) technology specification. In addition
to the Network function, the HUB includes a generic arbiter for data flow control within
complex memory systems and the PMAN Security block. See Figure 1 and Figure 2
for more details.
The PMAN or Hub provides DMA data paths and control which link most of the
Peripheral devices with the main memory controller, allowing data to be read from or
written to main memory at a very high rate. The Arbiter controls which Peripheral
gains access to the main memory controller via the Hub, and the PMAN Security block
provides for the separation of memory areas within main memory, so as to limit
unintentional interference by a mis-programmed peripheral.
1.1 Features
The key features of the HUB are:
• Provides a hierarchical memory access network that connects Peripheral DMA
ports to a single access port of the System Memory Controller.
• Includes simple round-robin sub-arbitration for lower levels of hierarchy.
• Utilizes the IP_1010 arbiter to provide sophisticated intermediate arbitration for
upper levels of the network hierarchy.
• Includes a security mechanism to limit memory access of Peripherals to
programmable regions in system memory (dvi_msec).
• Provides data synchronization, transaction buffering and partitioning
mechanisms.
• Supports Tunnel-to-QVCP data streaming mode.
• Supports MBS2→VPK→QVCP data streaming mode.
• Includes a memory access “gate” for MMIO transactions, for debug applications
via EJTAG.
2. Functional Description
2.1 PNX8550 HUB Block
The following diagram shows the Hub as it interconnects within the PNX8550 system.
The Arbiter module is used as an arbiter between different DMA channel clusters.
Inside these clusters, traffic from related DMA channels of Peripherals is combined
by applying round-robin arbitration. (See Table 1 for the
arbitrated DMA channels.)
The arbitration engine combines Time-Division Multiple Access (TDMA), priority, and
round-robin methods; resulting in a guaranteed and high-level quality of service. The
arbitration engine ensures programmable maximum latency and programmable
minimal bandwidth to the unified resource. It also makes sure that best effort agents
are fairly granted when higher priority agents do not request the channel.
The priority table can be dynamically altered by software. Two priority tables are
implemented from which the inactive table can be changed on-the-fly. The Arbiter
hardware takes care of smooth switching between the two tables.
After reset, the Arbiter is in “boot” mode and guarantees that each requesting agent is
given a “grant” to main memory (Round Robin is the default arbitration method).
• Two-level Round Robin arbitration
– provides equal opportunities to the lower priority “best effort” or DMA write
agents
– 16 round robin slots in the first level
– 8 round robin slots in the second level
• Dynamic arbitration scheme
– Two sets of arbitration parameters can be defined. Selection can be made
dynamically via software based on system needs.
ID Mapping
Table 1 shows the mapping of each peripheral device to unique identification
numbers. It also shows the amount of subarbitration for the given peripherals. Unless
otherwise noted, the amount of buffering per DMA channel is 256 bytes.
The PMAN Security block (dvi_msec for “memory security”) will invalidate memory
access to all locations that are not inside any of a peripheral’s assigned sandbox
(memory access region). There are four sandboxes defined via a lower and upper
address limit for each sandbox. Each peripheral device can be associated with each
sandbox under software control. The sandbox access information is held in a set of
registers with a dedicated entry for each peripheral (some blocks share a common
set of registers, for example SMC1 and SMC2).
Access to these registers can be enabled/disabled for every initiator via the DCS
security module. In the PNX8550 system, the sandbox registers can only be written
by the MIPS processor.
Implementing 4 sandboxes allows a sandbox for both the MIPS and TM3260-controlled
peripherals, plus 2 additional sandboxes for special purposes. The
sandboxes adhere to the following rules:
PNX8550
Chapter 3: PMAN Hub
• Sandbox address ranges can be overlapping.
• Granularity of the address range is 64 kB.
• There are 2 registers for each sandbox:
– Sandbox Base Address
– Sandbox Top Address
• A peripheral can be assigned to several or no sandboxes.
• There is no support for an inverted sandbox (Base_Address > Top_Address).
In case a peripheral attempts an access where the address is not within the range of
any of the enabled sandboxes (or if no sandbox is enabled for this peripheral), the
following will happen:
• ‘0xdeadda7a’ is returned on reads.
• Writes will be blocked by setting all write mask bits to zero.
• An interrupt is generated. (If the interrupt is enabled, then a processor is notified
of the event.)
In the case of an access violation, the address value and ID value of the attempted
access are stored in the “Protection Error Address” and Interrupt Status registers within
the dvi_msec module. In case of multiple violators, only the address and ID of the first
violator are stored. The Protection Error Address register can be ‘re-opened’ to store
the new Address and ID values by clearing the “Memory Access protection Error”
status bit (STAT_PROT_ERR).
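The sandbox check described in this section can be modelled in a few lines. This is a behavioural sketch only: the structure, the per-peripheral enable mask and the function names are assumptions made for illustration and do not reflect the actual dvi_msec register layout. Sandbox bounds are treated as inclusive byte addresses, and an inverted sandbox (base above top) simply never matches.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SANDBOXES      4
#define BLOCKED_READ_VALUE 0xdeadda7au   /* value returned on blocked reads */

/* Hypothetical encoding: inclusive bounds of one sandbox. */
typedef struct { uint32_t base, top; } sandbox_t;

/* enable_mask: bit i set means sandbox i is enabled for this peripheral. */
static bool access_allowed(const sandbox_t sb[NUM_SANDBOXES],
                           unsigned enable_mask, uint32_t addr)
{
    for (unsigned i = 0; i < NUM_SANDBOXES; i++) {
        if (!(enable_mask & (1u << i)))
            continue;
        if (addr >= sb[i].base && addr <= sb[i].top)
            return true;       /* overlapping sandboxes are fine */
    }
    return false;              /* no enabled sandbox covers addr */
}

/* Read data as seen by the peripheral after the security check. */
static uint32_t filter_read(const sandbox_t sb[NUM_SANDBOXES],
                            unsigned enable_mask, uint32_t addr,
                            uint32_t mem_value)
{
    return access_allowed(sb, enable_mask, addr) ? mem_value
                                                 : BLOCKED_READ_VALUE;
}

/* Write mask after the security check: all bits cleared on a violation. */
static uint32_t filter_write_mask(const sandbox_t sb[NUM_SANDBOXES],
                                  unsigned enable_mask, uint32_t addr,
                                  uint32_t write_mask)
{
    return access_allowed(sb, enable_mask, addr) ? write_mask : 0u;
}
```

The interrupt and the capture of the first violating address and ID are not modelled here; they would be side effects of the path on which access_allowed returns false.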
2.2.3 DMA Gate
The DMA gate is simply a DTL-to-DTL connection between a DTL-MMIO “initiator”
and a DTL-DMA “target” within the Hub. Any 32-bit read or write transaction that is in
the range of DMA_GATE_LO to DMA_GATE_HI will cause a read or write transaction
to main memory via the DMA Gate. This is provided as a special test path by EJTAG
and is not generally used by normal operating software. See Figure 3
Address Mapping.
DMA Gate
The DMA_GATE_LO and DMA_GATE_HI registers are located in the Global
Registers. See Chapter 12
The Memory Bandwidth Monitor block contains three 40-bit counters that are capable
of counting memory transaction events as they arrive at the DDR Memory Controller.
Each counter has a configuration register that will determine the transaction type,
monitor mode, the specific memory interface (MTL) port of the DDR Memory
Controller that shall be selected for monitoring, and the Hub device ID (if the Hub is
selected for monitoring). A single control register is used to start or stop the counting
process so that the counters will start or stop together, assuring consistency in the
measurements that are obtained.
Each counter is capable of counting in units of memory clocks, number of
transactions (memory requests), bandwidth (i.e., the number of bytes transferred),
memory controller idle time (in memory clock cycles), and latency (in memory clock
cycles).
By using different combinations of units and by monitoring different memory ports,
one can obtain information that can be extremely useful in determining the
performance of the memory subsystem and/or the overall PNX8550 system.
See Section 4.2
PMAN Security Registers for more information on the programming
One of the most important purposes of the arbiter is to guarantee a high level of
quality of service to the DMA agents (peripheral devices). In technical terms this means:
• the ability to guarantee a programmable maximum latency to DMA agents
• the ability to guarantee a programmable amount of bandwidth to DMA agents
• the ability to provide equal opportunity to DMA agents
• any (complex) combination of the three mechanisms mentioned above
The arbiter does not process requests for memory access from CPUs (MIPS and
TriMedia). Typically the performance of CPUs depends directly on the access latency
to memory, and for this reason they require the lowest possible memory latency. This
is best achieved by giving the CPUs a private port on a
multi-port memory controller. Therefore, the CPUs are not connected to the arbiter
and do not route memory requests via the Hub.
To support the quality of service features mentioned above, the arbiter algorithm
combines three basic arbitration mechanisms, applied as follows:
• Time-Division Multiple Access (TDMA) arbitration to guarantee maximum latency
• priority arbitration to guarantee bandwidth to reading Soft Real Time DMA (SRT
DMA) agents
• round-robin arbitration to guarantee bandwidth to writing SRT DMA agents
• round-robin arbitration to provide equal opportunity for Best Effort (BE) DMA
agents
These three basic algorithms operate together in the arbiter as shown in Figure 4.
Figure 4: Arbitration Scheme (diagram: overall priority from highest to lowest: TDMA Timing Wheel; priority list, ordered from high to low; round robin #1; round robin #2.)
The TDMA timing wheel is implemented with 128 entries, numbered 1 to 128. The
TDMA_entries field in the NR_entries_A register (see footnote 1) will determine the actual number of
entries that are used. TDMA entries higher than this value will be ignored. If
TDMA_entries is greater than 128 then all 128 entries are used, but no more. If
TDMA_entries is set to zero then the TDMA timing wheel is not used for arbitration.
The priority list is implemented with 16 entries, numbered 1 to 16. The Priority_entries
field in the NR_entries_A register will determine the actual number of entries that are
used. If a value greater than 16 is written all 16 entries are used, but no more. If the
Priority_entries is set to zero then the priority list is not used for arbitration.
The round robin #1 list is implemented with 16 entries, numbered 1 to 16. The
round_robin1_entries field in the NR_entries_A register will determine the actual
number of entries that are used. If a value greater than 16 is written all 16 entries are
used, but no more. If round_robin1_entries is set to zero then the round robin #1 list
is not used for arbitration.
The round robin #2 list is implemented with 8 entries, numbered 1 to 8. The
round_robin2_entries field in the NR_entries_A register will determine the actual
number of entries that are used. If a value greater than 8 is written all 8 entries are
used, but no more. If round_robin2_entries is set to zero then the round robin #2 list
is not used for arbitration.
1. All references to “Set A” registers also apply equally to “Set B”.
Assuming the arbiter has been configured to include the priority list and both
round-robin lists, any arbiter decision is made through the following four steps:
1. First the DMA requests are compared against the current entry in the TDMA
timing wheel. If the agent in the current entry is requesting, this agent will be
granted.
2. If the agent in the current entry is not requesting, the DMA requests will be
compared against the agents in the priority list, and if one or more of the agents in
the priority list is requesting, the one that has the highest priority will be granted.
3. If none of the DMA requests matches the current entry in the TDMA timing wheel
or one or more entries in the priority list, the arbiter will grant the DMA agent that
has not been served for the longest time by choosing from the round robin #1 list.
Every time the arbiter provides a grant to any DMA agent, the round robin #1
arbiter checks if this agent is in its list and makes that agent the lowest priority
entry in the round robin #1 list. If a certain agent is granted because of its entry in
the TDMA timing wheel or priority list and the same agent also has an entry in the
round-robin #1 list, then in the next clock cycle this agent will have the lowest
priority in the round-robin #1 list. Also, in case there are multiple entries of the
same agent in the round-robin #1 list, the highest entry in the list gets the lowest
priority during the next cycle. The other entries of the same agent do not get the
lowest priority.
4. If none of the DMA requests matches (e.g., the current entry in the TDMA timing
wheel, or one or more entries in the priority list, or one or more entries in the first
round-robin list), the arbiter will grant the DMA agent that has not been served for
the longest time from the round robin #2 list of entries. The round-robin #2 list
operates the same way as the round-robin #1 list, but all entries in the #2 list
have a lower priority than those in the #1 list.
The TDMA wheel will only proceed to the next entry if one of the two following
situations applies:
• there is a grant at the level of the TDMA wheel
• there is no match in the complete list (TDMA, priority and both round-robin lists)
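The four decision steps and the wheel-advance rule can be expressed as a small behavioral model in C. This is a sketch under stated assumptions, not a description of the hardware implementation: the structure and function names are hypothetical, requests are modeled as a 32-bit mask (agent IDs 0 to 31), index 0 of the priority list is assumed to be the highest priority, and each round-robin list is kept ordered so that index 0 is the entry that has waited longest.

```c
#include <stdint.h>

#define NO_GRANT (-1)

/* Hypothetical software model of one arbiter configuration set. */
typedef struct {
    int tdma[128]; int tdma_n; int tdma_pos; /* timing wheel and current entry */
    int prio[16];  int prio_n;               /* index 0 = highest (assumption) */
    int rr1[16];   int rr1_n;                /* index 0 = waited longest       */
    int rr2[8];    int rr2_n;
} arbiter_model;

static int requesting(uint32_t req, int agent)
{
    return (int)((req >> agent) & 1u);
}

/* Move the first (highest) entry holding `agent` to the end of the list,
 * making it the lowest priority entry. Other entries of the same agent
 * keep their relative order. */
static void rr_demote(int *list, int n, int agent)
{
    for (int i = 0; i < n; i++) {
        if (list[i] == agent) {
            for (int j = i; j + 1 < n; j++)
                list[j] = list[j + 1];
            list[n - 1] = agent;
            return;
        }
    }
}

/* Return the first listed agent that is requesting, or NO_GRANT. */
static int first_requesting(const int *list, int n, uint32_t req)
{
    for (int i = 0; i < n; i++)
        if (requesting(req, list[i]))
            return list[i];
    return NO_GRANT;
}

/* One arbitration decision. Returns the granted agent or NO_GRANT. */
static int arbitrate(arbiter_model *a, uint32_t req)
{
    int grant = NO_GRANT;
    int tdma_hit = 0;

    /* Step 1: current TDMA wheel entry. */
    if (a->tdma_n > 0 && requesting(req, a->tdma[a->tdma_pos])) {
        grant = a->tdma[a->tdma_pos];
        tdma_hit = 1;
    }
    /* Step 2: priority list. Steps 3 and 4: round robin #1, then #2. */
    if (grant == NO_GRANT) grant = first_requesting(a->prio, a->prio_n, req);
    if (grant == NO_GRANT) grant = first_requesting(a->rr1, a->rr1_n, req);
    if (grant == NO_GRANT) grant = first_requesting(a->rr2, a->rr2_n, req);

    /* Any grant demotes the agent in both round-robin lists. */
    if (grant != NO_GRANT) {
        rr_demote(a->rr1, a->rr1_n, grant);
        rr_demote(a->rr2, a->rr2_n, grant);
    }
    /* The wheel advances only on a TDMA-level grant or on no match at all. */
    if (a->tdma_n > 0 && (tdma_hit || grant == NO_GRANT))
        a->tdma_pos = (a->tdma_pos + 1) % a->tdma_n;
    return grant;
}
```

Keeping each round-robin list physically ordered by waiting time makes "not served for the longest time" a simple front-of-list search; actual hardware may track this differently.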
All entries in the TDMA wheel, priority list and both round-robin lists are fully
programmable via the DTL MMIO interface of the arbiter. The same is true for the
number of entries in any of these four. It is also possible to set the number of entries
in the TDMA wheel, priority list and/or round-robin lists to zero. This allows the user to
use only one of the four mechanisms or any combination of them. In case all four are
set to zero for the active set of entries, the arbiter defaults to a round-robin arbitration
over all agents.
The arbitration algorithm only starts after the arbiter has been properly initialized via
the programming registers. Following the de-assertion of a hard reset, the arbiter
uses a simple counting algorithm to arbitrate between all request inputs. In this boot
mode, agents are granted in the order that they are internally wired.
Arbiter Startup Behavior
After reset is de-asserted, the arbiter is placed in boot mode. In this mode, the arbiter
sequentially grants each agent access to the memory if the agent has asserted its
request. After de-assertion of rst_an, the arbiter starts with req[0], then req[1], etc. Four
agents are checked in each clock cycle. This means that if only req[15] is asserted, it
will take four clock cycles before the arbiter grants this agent. In the first clock cycle it
will check req[0] up to req[3], in the second clock cycle req[4] up to req[7], in the third
clock cycle req[8] up to req[11], and in the fourth clock cycle req[12] up to req[15]. The
boot counter increments to the next value when all agents corresponding to that count
value have been serviced or when there is no request from the agents corresponding to
that count value.
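The req[15] example can be checked with a short calculation. The sketch below assumes that the boot counter starts at the first group after reset and that a group with no pending request takes exactly one clock cycle, as the example in the text suggests; the function names are hypothetical.

```c
/* Number of request lines examined per clock cycle in boot mode (from the text). */
#define BOOT_AGENTS_PER_CYCLE 4u

/* Boot counter value (group index) that covers request line req_index. */
static unsigned boot_group_of(unsigned req_index)
{
    return req_index / BOOT_AGENTS_PER_CYCLE;
}

/* Clock cycles before a single, lone request is granted, counting from the
 * first cycle after reset de-assertion. Assumes no other agent is requesting,
 * so every earlier group is skipped in one cycle. */
static unsigned boot_cycles_to_grant_lone_request(unsigned req_index)
{
    return boot_group_of(req_index) + 1u;
}
```

With these assumptions, a lone req[15] falls in group 3 and is granted in the fourth clock cycle, which matches the description above.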
This mode is not intended to intelligently allocate memory bandwidth. Its goal is to
simply make sure that all agent requests are granted. While in boot mode, it is
expected that the system software will set up the arbiter via the DTL MMIO port and
switch to the normal operation mode. As there are two sets of configuration registers
(A and B), software should initialize one of the sets and then select the normal
operation mode that corresponds to that set via a write to the Arbiter Control register.
If necessary, the alternate set may be configured differently and the new
configuration may be engaged by simply writing the new mode in the Arbiter Control
register.
3.2 Standard Features
3.2.1 Clock Programming
The Hub operates with the Memory Controller clock, as well as the clocks of all the
peripheral modules that connect to the Hub. There is no separate clock for the Hub.
3.2.2 Reset-Related Issues
A partial reset of the Hub data transfer buffers is not possible on a global basis. Each
peripheral device may use an Abort at the DTL-DMA interface to clear transactions
that may be pending within the data transfer buffers for that peripheral.
3.2.3 Register Programming Guidelines
The default configuration of the Arbiter is to provide Round Robin access to all
peripheral devices. This can be altered by software by programming the Arbiter. Once
the Arbiter configuration is completed, the system should be able to operate without
further change to the Arbiter; however it is possible for software to change the Arbiter
configuration on-the-fly in order to change the minimum latency or the minimum
memory bandwidth that is available to each peripheral device.
Note that the active set of configuration registers (set A or set B) cannot be read by
software once that set is activated. The inactive set may be safely written or read. If
software needs to have access to the values within the active set, then a copy of
these values should be maintained in main memory as a reference.
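One way to keep such a reference copy is to route every arbiter register write through a helper that also records the value in main memory. The following is a minimal sketch, not a vendor-supplied driver: the structure, the function names, and the way the register block is mapped are all assumptions. Offsets here are byte offsets relative to the start of the arbiter register block (0x06 4000).

```c
#include <stdint.h>

/* The arbiter block spans 0x06 4000 to 0x06 4FFC: 1024 words of 4 bytes. */
#define PMAN_ARB_WORDS 1024u

/* Hypothetical driver state: mapped registers plus a shadow copy in RAM. */
typedef struct {
    volatile uint32_t *regs;            /* mapping of the block is an assumption */
    uint32_t shadow[PMAN_ARB_WORDS];    /* last value written to each register   */
} pman_arb;

/* Write a register and remember the value. Out-of-range offsets are ignored. */
static void pman_arb_write(pman_arb *p, uint32_t byte_off, uint32_t value)
{
    uint32_t idx = byte_off / 4u;
    if (idx >= PMAN_ARB_WORDS)
        return;
    p->regs[idx] = value;
    p->shadow[idx] = value;
}

/* Read back the last written value without touching the hardware, so the
 * result is available even while the set is active. */
static uint32_t pman_arb_read_shadow(const pman_arb *p, uint32_t byte_off)
{
    uint32_t idx = byte_off / 4u;
    return idx < PMAN_ARB_WORDS ? p->shadow[idx] : 0u;
}
```

Software would then consult the shadow copy, never the hardware, for any set that may currently be active.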
The PMAN Security sandbox register settings must be initialized by software since
the default is for all peripherals to use sandbox #1. It is recommended that the
sandboxes be organized so that each processor (MIPS, TriMedia1 and TriMedia2)
and the peripherals that are associated with each processor have access to a
separate memory region. The memory regions may overlap if there is a need to share
I/O buffers with more than one processor. Note that there are separate enable bits for
“read” access and “write” access. This allows one processor and its peripherals to have
full access to a memory region, while another processor has only “read” access to a
shared buffer or data space.
The fourth sandbox may be used for special purposes, such as inter-process
communication buffers and semaphores, etc.
4. Register Descriptions
4.1 PMAN Hub Arbiter Registers
4.1.1 Register Summary
Table 2: PMAN Hub Arbiter Register Summary
Offset            Symbol               Description
0x06 4000-41FC    TDMA A               128 entries of TDMA timing wheel for set A
0x06 4200-423C    PRIORITY A           16 entries of priority list for set A
0x06 4240-427C    Reserved
0x06 4280-42BC    FIRST Round Robin A  16 entries of first round robin list for set A
0x06 42C0-42FC    Reserved
0x06 4300-431C    LAST Round Robin A   8 entries of last round robin list for set A
0x06 4320-43FC    Reserved
0x06 4400-45FC    TDMA B               128 entries of TDMA timing wheel for set B
0x06 4600-463C    PRIORITY B           16 entries of priority list for set B
0x06 4640-467C    Reserved
0x06 4680-46BC    FIRST Round Robin B  16 entries of first round robin list for set B
0x06 46C0-46FC    Reserved
0x06 4700-471C    LAST Round Robin B   8 entries of last round robin list for set B
0x06 4720-47FC    Reserved
0x06 4800         NR Entries A         Number of valid entries in arbitration lists for set A
0x06 4804         NR Entries B         Number of valid entries in arbitration lists for set B
0x06 4808-48FC    Reserved
0x06 4900         Control              Register to control operation mode of arbiter
0x06 4904         Status               Register to monitor operation mode of arbiter
0x06 4908-4FF8    Reserved
0x06 4FFC         MODULE_ID            Module ID and revision information
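The offsets in Table 2 imply one 32-bit register per list entry (for example, 128 TDMA entries occupy 0x06 4000 to 0x06 41FC) and a spacing of 0x400 between set A and set B. A header-style sketch in C follows. The macro and function names are hypothetical, and the mapping of entry 1 to the lowest offset of each list is an assumption that is not stated in the table.

```c
#include <stdint.h>

/* Offsets from Table 2 (set A list bases and common registers). */
#define PMAN_ARB_TDMA_A        0x064000u
#define PMAN_ARB_PRIORITY_A    0x064200u
#define PMAN_ARB_RR1_A         0x064280u  /* FIRST Round Robin A */
#define PMAN_ARB_RR2_A         0x064300u  /* LAST Round Robin A  */
#define PMAN_ARB_SET_STRIDE    0x000400u  /* set B base minus set A base */
#define PMAN_ARB_NR_ENTRIES_A  0x064800u
#define PMAN_ARB_NR_ENTRIES_B  0x064804u
#define PMAN_ARB_CONTROL       0x064900u
#define PMAN_ARB_STATUS        0x064904u
#define PMAN_ARB_MODULE_ID     0x064FFCu

/* Offset of one list entry.
 * list_base_a: one of the set A list bases above.
 * set:         0 for set A, 1 for set B.
 * entry:       1-based entry number, as used in the text; entry 1 is assumed
 *              to sit at the lowest offset of the list. */
static uint32_t pman_arb_entry_offset(uint32_t list_base_a, unsigned set, unsigned entry)
{
    return list_base_a + set * PMAN_ARB_SET_STRIDE + (entry - 1u) * 4u;
}
```

As a consistency check against the table, entry 128 of TDMA A resolves to 0x06 41FC and entry 8 of the last round robin list for set B resolves to 0x06 471C.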