New ROP Units with Improved Antialiasing
Compute Architecture for Graphics
Next Generation Effects using GPU Computing
Ray tracing
Over the years, the continuing and insatiable demand for high quality 3D graphics has driven NVIDIA to
create significant GPU architectural innovations. In 1999, the GeForce 256 enabled hardware transform
and lighting. In 2001, GeForce 3 introduced programmable shading. Later, GeForce FX provided full 32-bit floating point precision throughout the GPU. And in 2006, GeForce 8 introduced a powerful and
efficient unified, scalar shader design. Each GPU we designed was intended to take graphics closer to
reality, and to distinguish the PC as the most dynamic and technologically advanced gaming platform.
NVIDIA’s latest GPU, codenamed GF100¹, is the first GPU based on the Fermi architecture. GF100
implements all DirectX 11 hardware features, including tessellation and DirectCompute, among others.
GF100 brings forward a vastly improved compute architecture designed specifically to support next
generation gaming effects such as ray tracing, order-independent transparency, and fluid simulations.
Game performance and image quality receive a tremendous boost, and GF100 enables film-like
geometric realism for game characters and objects. Geometric realism is central to the GF100
architectural enhancements for graphics. In addition, PhysX simulations are much faster, and developers
can utilize GPU computing features in games most effectively with GF100.
In designing GF100, our goals were to deliver:
• Exceptional Gaming Performance
• First-rate Image Quality
• Film-like Geometric Realism
• A Revolutionary Compute Architecture for Gaming
Exceptional Gaming Performance
First and foremost, GF100 is designed for gaming performance leadership. Based on Fermi’s third
generation Streaming Multiprocessor (SM) architecture, GF100 doubles the number of CUDA cores over
the previous architecture.
The geometry pipeline is significantly revamped, with vastly improved performance in geometry shading,
stream out, and culling. The number of ROP (Raster Operations) units per ROP partition is doubled and
fillrate is greatly improved, enabling multiple displays to be driven with ease. 8xMSAA performance is
vastly improved through enhanced ROP compression. The additional ROP units also better balance
overall GPU throughput even for portions of the scene that cannot be compressed.
First-rate Image Quality
GF100 implements a new 32xCSAA (Coverage Sampling Antialiasing) mode based on eight
multisamples and 24 coverage samples. CSAA has also been extended to support alpha-to-coverage
(transparency multisampling) on all samples, enabling smoother rendering of foliage and transparent
textures. GF100 produces the highest quality antialiasing for both polygon edges and alpha textures with
minimal performance penalty. Shadow mapping performance is greatly increased with hardware
accelerated DirectX 11 four-offset Gather4.
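As a rough illustration of why alpha-to-coverage benefits from more samples, the sketch below converts a fragment's alpha value into a count of covered samples. The function names and the simple rounding rule are assumptions made for illustration; the sample patterns and dithering used by actual hardware are not described in this paper.

```python
def alpha_to_coverage(alpha: float, num_samples: int) -> int:
    """Return how many of `num_samples` a fragment with this alpha would cover."""
    clamped = min(max(alpha, 0.0), 1.0)
    # Round to the nearest whole sample; int() truncates, so add 0.5 first.
    return int(clamped * num_samples + 0.5)


def opacity_levels(num_samples: int) -> int:
    """Number of distinct opacity steps: zero covered samples up to all of them."""
    return num_samples + 1


if __name__ == "__main__":
    for samples in (8, 32):
        print(samples, "samples give", opacity_levels(samples), "opacity levels;",
              "alpha 0.3 covers", alpha_to_coverage(0.3, samples), "samples")
```

Under this simplified model, applying alpha-to-coverage to all 32 samples rather than only the eight multisamples gives four times as many opacity steps, which is consistent with the smoother foliage edges described above.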
¹ “GF” denotes that the chip is a Graphics solution based on the Fermi architecture. “100” denotes that this is the high-end part of
the “GF” family of GPUs.
Film-like Geometric Realism
While programmable shading has allowed PC games to mimic film in per-pixel effects, geometric realism
has lagged behind. The most advanced PC games today use one to two million polygons per frame. By
contrast, a typical frame in a computer generated film uses hundreds of millions of polygons. This
disparity can be partly traced to hardware—while the number of pixel shaders has grown from one to
many hundreds, the triangle setup engine has remained a singular unit, greatly affecting the relative pixel
versus geometry processing capabilities of today’s GPUs. For example, the GeForce GTX 285 has more
than 150× the shading horsepower of the GeForce FX, but less than 3× the geometry processing rate.
The outcome is such that pixels are shaded meticulously, but geometric detail is comparatively modest.
In tackling geometric realism, we looked to movies for inspiration. The intimately detailed characters in
computer-generated films are made possible by two key techniques: tessellation and displacement
mapping. Tessellation refines large triangles into collections of smaller triangles, while displacement
mapping changes their relative position. In conjunction, these two techniques allow arbitrarily complex
models to be formed from relatively simple descriptions. Some of our favorite movie characters, such as
Davy Jones from Pirates of the Caribbean, were created using these techniques.
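The refinement step can be pictured with a minimal sketch. It uses uniform midpoint subdivision, which is an assumption chosen for simplicity and is not the partitioning scheme of the DirectX 11 tessellator or of GF100's tessellation unit. All names are hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def midpoint(a: Vec3, b: Vec3) -> Vec3:
    """Point halfway between two vertices."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2)


def subdivide(tri: Triangle) -> List[Triangle]:
    """Split one triangle into four by connecting its edge midpoints."""
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def tessellate(tris: List[Triangle], levels: int) -> List[Triangle]:
    """Apply `levels` rounds of subdivision; each round quadruples the count."""
    for _ in range(levels):
        tris = [small for tri in tris for small in subdivide(tri)]
    return tris


if __name__ == "__main__":
    coarse = [((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0))]
    print(len(tessellate(coarse, 3)), "triangles after three levels")
```

Subdivision alone only adds vertices on the original flat triangle; the added detail comes from displacing those new vertices, as discussed in the displacement mapping sections of this paper.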
GF100’s entire graphics pipeline is designed to deliver high performance in tessellation and geometry
throughput. GF100 replaces the traditional geometry processing architecture at the front end of the
graphics pipeline with an entirely new distributed geometry processing architecture that is implemented
using multiple “PolyMorph Engines”. Each PolyMorph Engine includes a tessellation unit, an attribute
setup unit, and other geometry processing units. Each SM has its own dedicated PolyMorph Engine (we
provide more details on the PolyMorph Engine in the GF100 architecture sections below). Newly
generated primitives are converted to pixels by four Raster Engines that operate in parallel (compared to
a single Raster Engine in prior generation GPUs). On-chip L1 and L2 caches enable high bandwidth
transfer of primitive attributes between the SM and the tessellation unit as well as between different SMs.
Tessellation and all its supporting stages are performed in parallel on GF100, enabling breathtaking
geometry throughput.
While GF100 includes many enhancements and performance improvements over past GPU
architectures, the ability to perform parallel geometry processing is possibly the single most important
GF100 architectural improvement. The ability to deliver setup rates exceeding one primitive per clock
while maintaining correct rendering order is a significant technical achievement never before done in a
GPU.
Revolutionary Compute Architecture for Gaming
The rasterization pipeline has come a long way, but as games aspire to film quality, graphics is moving
toward advanced algorithms that require the GPU to perform general computation along with
programmable shading. G80 was the first NVIDIA GPU to include compute features. GF100 benefits
from what we learned on G80 in order to significantly improve compute features for gaming.
GF100 leverages Fermi’s revolutionary compute architecture for gaming applications. In graphics,
threads operate independently, with a predetermined pipeline, and exhibit good memory access locality.
Compute threads on the other hand often communicate with each other, work in no predetermined
fashion, and often read and write to different parts of memory. Major compute features improved on
GF100 that will be useful in games include faster context switching between graphics and PhysX,
concurrent compute kernel execution, and an enhanced caching architecture which is good for irregular
algorithms such as ray tracing and AI algorithms. We will discuss these features in more detail in
subsequent sections of this paper.
Vastly improved atomic operation performance allows threads to safely cooperate through work queues,
accelerating novel rendering algorithms. For example, fast atomic operations allow transparent objects
to be rendered without presorting (order-independent transparency), enabling developers to create levels
with complex glass environments.
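One common way to realize order-independent transparency is to collect every transparent fragment into a per-pixel list and sort each list by depth only when the pixel is resolved. The CPU-side sketch below follows that approach as an assumption; this paper does not specify the algorithm, and on a GPU the list append would rely on atomic operations. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Color = Tuple[float, float, float]
Pixel = Tuple[int, int]
Fragment = Tuple[float, Color, float]  # (depth, rgb, alpha); larger depth is farther


def build_fragment_lists(
    fragments: Iterable[Tuple[Pixel, float, Color, float]]
) -> Dict[Pixel, List[Fragment]]:
    """Append fragments to per-pixel lists in whatever order they arrive."""
    lists: Dict[Pixel, List[Fragment]] = defaultdict(list)
    for pixel, depth, rgb, alpha in fragments:
        lists[pixel].append((depth, rgb, alpha))  # stands in for an atomic append
    return lists


def resolve_pixel(frags: List[Fragment], background: Color) -> Color:
    """Sort far to near, then blend each fragment over the running color."""
    color = background
    for _, rgb, alpha in sorted(frags, key=lambda f: f[0], reverse=True):
        color = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(rgb, color))
    return color


if __name__ == "__main__":
    red_far = ((0, 0), 2.0, (1.0, 0.0, 0.0), 0.5)
    blue_near = ((0, 0), 1.0, (0.0, 0.0, 1.0), 0.5)
    black = (0.0, 0.0, 0.0)
    for order in ([red_far, blue_near], [blue_near, red_far]):
        print(resolve_pixel(build_fragment_lists(order)[(0, 0)], black))
```

Both submission orders resolve to the same color, so the application does not need to presort its transparent geometry.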
For seamless interoperation with graphics, GF100’s GigaThread engine reduces context switch time to
about 20 microseconds, making it possible to execute multiple compute and physics kernels for each
frame. For example, a game may use DirectX 11 to render the scene, switch to CUDA for selective ray
tracing, call a DirectCompute kernel for post processing, and perform fluid simulations using PhysX.
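A quick calculation shows why a context switch of roughly 20 microseconds makes several switches per frame practical. The 60 frames per second target and the count of four switches (DirectX 11 to CUDA, to DirectCompute, to PhysX, and back to graphics) are assumptions chosen for illustration.

```python
CONTEXT_SWITCH_US = 20.0  # approximate switch time quoted in the text


def frame_budget_us(fps: float) -> float:
    """Time available for one frame, in microseconds."""
    return 1_000_000.0 / fps


def switch_overhead_fraction(num_switches: int, fps: float) -> float:
    """Share of the frame budget spent purely on context switches."""
    return num_switches * CONTEXT_SWITCH_US / frame_budget_us(fps)


if __name__ == "__main__":
    fps, switches = 60.0, 4  # assumed values, not from the paper
    share = switch_overhead_fraction(switches, fps)
    print(f"{switches} switches use {share:.2%} of a {frame_budget_us(fps):.0f} us frame")
```

With these assumed numbers the four switches cost 80 microseconds, about half a percent of a 16.7 millisecond frame.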
Geometric Realism
Tessellation and Displacement Mapping Overview
While tessellation and displacement mapping are not new rendering techniques, up until now, they have
mostly been used in films. With the introduction of DirectX 11 and NVIDIA’s GF100, developers will be
able to harness these powerful techniques for gaming applications. In this section we will discuss some
of the characteristics and benefits of tessellation and displacement mapping in the context of game
development and high-quality, realtime rendering.
Game assets such as objects and characters are typically created using software modeling packages
like Mudbox, ZBrush, 3D Studio Max, Maya, or SoftImage. These packages provide tools based on
surfaces with displacement mapping to aid the artist in creating detailed characters and environments.
Today, the artist must manually create polygonal models at various levels of detail as required by the
various rendering scenarios in the game in order to maintain playable frame-rates. These models are
meshes of triangles with associated texture maps needed for proper shading. When used in a game, the
model information is sent per frame to the GPU through its host interface. Game developers tend to use
relatively simple geometric models due to the limited bandwidth of the PCI Express bus and the modest
geometry throughput of current GPUs.
Even in the best of game titles, there are geometric artifacts due to limitations of existing graphics APIs
and GPUs. The result of compromising geometric complexity can be seen in the images below. The
holster has a heavily faceted or segmented strap. The corrugated roof, which should look wavy, is in fact
a flat surface with a striped texture. Finally, like most characters in games, this person wears a hat,
carefully sidestepping the complexity of rendering hair.
Due to limitations in existing graphics APIs and GPUs, even graphically advanced games
are forced to make concessions in geometric detail.
Using GPU-based tessellation, a game developer can send a compact geometric representation of an
object or character, and the tessellator unit can produce the correct geometric complexity for the
specific scene. We’ll now go into greater detail discussing the characteristics and benefits of tessellation
in combination with displacement mapping.
Consider the character below. On the left we see the quad mesh used to model the general outline of
the figure. This representation is quite compact, even when compared to typical game assets. The
image of the character in the middle was created by finely tessellating the description on the left. The
result is a very smooth appearance, free of any of the faceting that resulted from limited geometry.
Unfortunately this character, while smooth, is no more detailed than the coarse mesh. The image on the
right was created by applying a displacement map to the smoothly tessellated character in the middle.
This character has a richness of geometric detail that you might associate with film production.
When a displacement map (left) is applied to a flat surface, the resulting surface
(right) expresses the height information encoded in the displacement map.
Benefits of Tessellation with Displacement Mapping
There are a number of benefits to using tessellation with displacement mapping. The representation is
compact, scalable and leads to efficient storage and computation. The compactness of the description
means that the memory footprint is small and little bandwidth is consumed pulling the constituent
vertices on to the GPU. Because animation is performed on the compact description, more compute
intensive, sophisticated, realistic movement is possible. The on-demand synthesis of triangles creates
the ability to match the geometric complexity and the number of triangles generated to the situation for
the specific character as it appears in a given frame.
This ability to control geometric level of detail (LOD) is very powerful. Because it is on-demand and the
data is all kept on-chip, precious memory bandwidth is preserved. Also, because one model may produce
many LODs, the same game assets may be used on a variety of platforms, from a modest notebook to a
Quad SLI system, for example.
The character can also be tailored to how it appears in the scene: if it is small, it gets little
geometry; if it is close to the screen, it is rendered with maximum detail. Additionally, scalable
assets mean that developers may be able to use the same models on multiple generations of games and
future GPUs where performance increases enable even greater detail than was possible when initially
deployed in a game. Complexity can be adjusted dynamically to target a given frame rate. Finally, models
that are rendered using tessellation with displacement mapping much more closely resemble those used
natively in the tools used by artists, freeing artists from the overhead work of creating models with
different LODs.
Displaced surfaces behave naturally with animation.
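The level-of-detail control described above can be sketched as a function that picks a tessellation factor from an edge's projected size on screen. The target of eight pixels per tessellated segment, the projection formula, and all names are assumptions for illustration. The upper clamp of 64 is the maximum tessellation factor allowed by DirectX 11.

```python
import math

MAX_TESS_FACTOR = 64.0  # DirectX 11 upper limit for tessellation factors


def tess_factor(edge_length: float, distance: float, fov_y_radians: float,
                screen_height_px: float, target_px_per_segment: float = 8.0) -> float:
    """Choose a factor so each tessellated segment spans roughly the target pixel count."""
    # Approximate on-screen length of the edge under a perspective projection.
    projected_px = (edge_length * screen_height_px
                    / (2.0 * distance * math.tan(fov_y_radians / 2.0)))
    factor = projected_px / target_px_per_segment
    # Clamp: never below one segment, never above the API limit.
    return min(max(factor, 1.0), MAX_TESS_FACTOR)


if __name__ == "__main__":
    for distance in (0.5, 10.0, 1000.0):
        factor = tess_factor(1.0, distance, math.radians(90.0), 1080.0)
        print(f"distance {distance:7.1f} -> factor {factor:.2f}")
```

A distant object falls to the minimum factor and a nearby one saturates at the maximum. Raising `target_px_per_segment` is one simple way to trade detail for frame rate.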
Displacement mapping is a very powerful modeling and rendering technique. A displacement map is a
texture that expresses height information. When applied to a model, the displacement map is used to
alter the relative position of vertices in the model. Displacement mapping allows complex geometry to
be stored in a compact map. In this way, displacement maps can be regarded as a form of geometry
compression.
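A minimal sketch of the idea, assuming a height map stored as a small 2D list and a flat surface in the z = 0 plane: each vertex is moved along its normal by the height sampled for it. The function names and the `scale` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def displace(position: Vec3, normal: Vec3, height: float, scale: float = 1.0) -> Vec3:
    """Move a vertex along its unit normal by the sampled height times a scale."""
    return tuple(p + n * height * scale for p, n in zip(position, normal))


def displace_flat_grid(height_map: Sequence[Sequence[float]],
                       scale: float = 1.0) -> List[Vec3]:
    """Build one vertex per texel on the unit square and displace it along +z."""
    rows, cols = len(height_map), len(height_map[0])
    up = (0.0, 0.0, 1.0)  # a flat surface has the same normal everywhere
    vertices = []
    for j in range(rows):
        for i in range(cols):
            x = i / (cols - 1) if cols > 1 else 0.0
            y = j / (rows - 1) if rows > 1 else 0.0
            vertices.append(displace((x, y, 0.0), up, height_map[j][i], scale))
    return vertices


if __name__ == "__main__":
    heights = [[0.0, 1.0], [0.5, 0.0]]
    for vertex in displace_flat_grid(heights):
        print(vertex)
```

The height map stores one scalar per sample while the resulting surface carries full 3D positions, which is the sense in which the map acts as a compact encoding of geometry.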
Unlike emboss maps, normal maps, and parallax maps which merely alter the appearance of pixels,
displacement maps alter the position of vertices. This enables self occlusion, accurate shadows, and
robust behavior at the edges of silhouettes.
Displacement mapping is complementary to existing bump mapping techniques. For example,
displacement maps can be used to define major surface features while finer grained techniques such as
normal mapping are used for low level details such as scratches and moles.
In addition to being a simple way to create complex geometry, displacement mapped geometry also
behaves naturally when animated. Consider the simple example to the right: the blunt spikes follow the
base shape as it is bent. Displacement mapped characters behave similarly. Consider the Imp character
on the preceding page. It is animated by manipulating the coarse control hull (left). The displacement
mapped character (right) naturally follows the animation of the underlying surface.
Finally, one of the most interesting aspects of displacement maps is the ability to easily modify them
during game play. In today’s games, spraying a metal door with bullets leaves a trail of bullet “decals”,
but the shape of the door will not be altered. With displacement mapping, the same decal textures can
be used to alter the displacement map, allowing a player to deform both the appearance and underlying
structure of game objects.
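As a sketch of how a decal could modify a displacement map at run time, the function below subtracts a small decal from a region of a height map, leaving a dent where the bullet hit. The representation (2D lists of heights), the `depth` parameter, and the function name are assumptions for illustration.

```python
from typing import List


def stamp_decal(height_map: List[List[float]], decal: List[List[float]],
                top: int, left: int, depth: float = 1.0) -> List[List[float]]:
    """Return a copy of `height_map` lowered by `depth * decal` at (top, left)."""
    result = [row[:] for row in height_map]  # leave the caller's map untouched
    for dj, decal_row in enumerate(decal):
        for di, strength in enumerate(decal_row):
            j, i = top + dj, left + di
            # Skip decal texels that fall outside the height map.
            if 0 <= j < len(result) and 0 <= i < len(result[j]):
                result[j][i] -= depth * strength
    return result


if __name__ == "__main__":
    door = [[1.0] * 4 for _ in range(4)]
    bullet = [[0.5, 1.0], [1.0, 0.5]]  # decal strength in [0, 1]
    for row in stamp_decal(door, bullet, top=1, left=1, depth=0.2):
        print(["%.2f" % h for h in row])
```

Because the displaced surface is regenerated from the map each frame, the dent changes the object's actual shape and silhouette rather than only its shading.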
Today’s games employ decals to depict altered surfaces. With displacement mapping, bullet decals can be used to alter the
underlying geometry of objects.