Nvidia GF100 Whitepaper

Whitepaper
NVIDIA GF100
World’s Fastest GPU Delivering Great Gaming
Performance with True Geometric Realism
Dedicated to the World’s PC Gamers
Table of Contents
Introducing GF100
Exceptional Gaming Performance
First-rate image quality
Film-like Geometric Realism
Revolutionary Compute Architecture for Gaming
Geometric Realism
Tessellation and Displacement Mapping Overview
Benefits of Tessellation with Displacement Mapping
GF100 Architecture In-Depth
GPC Architecture
Parallel Geometry Processing
The PolyMorph Engine
Raster Engine
Third Generation Streaming Multiprocessor
512 High Performance CUDA cores
16 Load/Store Units
Four Special Function Units
Dual Warp Scheduler
Texture Units
64 KB Configurable Shared Memory and L1 Cache
L2 Cache
New ROP Units with Improved Antialiasing
Compute Architecture for Graphics
Next Generation Effects using GPU Computing
Ray tracing
Smoothed Particle Hydrodynamics (SPH)
NVIDIA 3D Vision Surround
Bezel Correction
Conclusion
Introducing GF100
Over the years, the continuing and insatiable demand for high quality 3D graphics has driven NVIDIA to create significant GPU architectural innovations. In 1999, the GeForce 256 enabled hardware transform and lighting. In 2001, GeForce 3 introduced programmable shading. Later, GeForce FX provided full 32-bit floating point precision throughout the GPU. And in 2006, GeForce 8 introduced a powerful and efficient unified, scalar shader design. Each GPU we designed was intended to take graphics closer to reality, and to distinguish the PC as the most dynamic and technologically advanced gaming platform.
NVIDIA’s latest GPU, codenamed GF100¹, is the first GPU based on the Fermi architecture. GF100 implements all DirectX 11 hardware features, including tessellation and DirectCompute, among others. GF100 brings forward a vastly improved compute architecture designed specifically to support next generation gaming effects such as ray tracing, order-independent transparency, and fluid simulations.
Game performance and image quality receive a tremendous boost, and GF100 enables film-like geometric realism for game characters and objects. Geometric realism is central to the GF100 architectural enhancements for graphics. In addition, PhysX simulations are much faster, and developers can utilize GPU computing features in games most effectively with GF100.
In designing GF100, our goals were to deliver:
• Exceptional Gaming Performance
• First-rate image quality
• Film-like Geometric Realism
• A Revolutionary Compute Architecture for Gaming
Exceptional Gaming Performance
First and foremost, GF100 is designed for gaming performance leadership. Based on Fermi’s third generation Streaming Multiprocessor (SM) architecture, GF100 doubles the number of CUDA cores over the previous architecture.
The geometry pipeline is significantly revamped, with vastly improved performance in geometry shading, stream out, and culling. The number of ROP (Raster Operations) units per ROP partition is doubled and fillrate is greatly improved, enabling multiple displays to be driven with ease. 8xMSAA performance is vastly improved through enhanced ROP compression. The additional ROP units also better balance overall GPU throughput even for portions of the scene that cannot be compressed.
First-rate image quality
GF100 implements a new 32xCSAA (Coverage Sampling Antialiasing) mode based on eight multisamples and 24 coverage samples. CSAA has also been extended to support alpha-to-coverage (transparency multisampling) on all samples, enabling smoother rendering of foliage and transparent textures. GF100 produces the highest quality antialiasing for both polygon edges and alpha textures with minimal performance penalty. Shadow mapping performance is greatly increased with hardware accelerated DirectX 11 four-offset Gather4.
¹ “GF” denotes that the chip is a Graphics solution based on the Fermi architecture. “100” denotes that this is the high end part of the “GF” family of GPUs.
Film-like Geometric Realism
While programmable shading has allowed PC games to mimic film in per-pixel effects, geometric realism has lagged behind. The most advanced PC games today use one to two million polygons per frame. By contrast, a typical frame in a computer generated film uses hundreds of millions of polygons. This disparity can be partly traced to hardware—while the number of pixel shaders has grown from one to many hundreds, the triangle setup engine has remained a singular unit, greatly affecting the relative pixel versus geometry processing capabilities of today’s GPUs. For example, the GeForce GTX 285 has more than 150× the shading horsepower of the GeForce FX, but less than 3× the geometry processing rate. The outcome is such that pixels are shaded meticulously, but geometric detail is comparatively modest.
In tackling geometric realism, we looked to movies for inspiration. The intimately detailed characters in computer generated films are made possible by two key techniques: tessellation and displacement mapping. Tessellation refines large triangles into collections of smaller triangles, while displacement mapping changes their relative position. In conjunction, these two techniques allow arbitrarily complex models to be formed from relatively simple descriptions. Some of our favorite movie characters, such as Davy Jones from Pirates of the Caribbean, were created using these techniques.
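As a rough illustration of how tessellation refines large triangles into smaller ones, the following Python sketch applies uniform midpoint subdivision on the CPU. It is only a sketch under stated assumptions: the function names are hypothetical, and the subdivision scheme is chosen for simplicity rather than taken from the DirectX 11 tessellator or GF100 hardware.

```python
# Illustrative sketch only: uniform midpoint subdivision on the CPU.
# The scheme and all names are assumptions, not GF100's tessellation unit.

def midpoint(a, b):
    """Return the point halfway between two 3D points."""
    return tuple((x + y) / 2.0 for x, y in zip(a, b))

def subdivide(triangle):
    """Split one triangle into four smaller triangles with the same winding."""
    a, b, c = triangle
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def tessellate(triangles, levels):
    """Apply `levels` rounds of subdivision to a list of triangles."""
    for _ in range(levels):
        triangles = [small for tri in triangles for small in subdivide(tri)]
    return triangles

coarse = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
fine = tessellate(coarse, 3)
print(len(fine))  # each level quadruples the count: 4**3 triangles
```

Each round multiplies the triangle count by four, which is why a compact coarse mesh can expand into a very dense one after only a few levels.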
GF100’s entire graphics pipeline is designed to deliver high performance in tessellation and geometry throughput. GF100 replaces the traditional geometry processing architecture at the front end of the graphics pipeline with an entirely new distributed geometry processing architecture that is implemented using multiple “PolyMorph Engines”. Each PolyMorph Engine includes a tessellation unit, an attribute setup unit, and other geometry processing units. Each SM has its own dedicated PolyMorph Engine (we provide more details on the PolyMorph Engine in the GF100 architecture sections below). Newly generated primitives are converted to pixels by four Raster Engines that operate in parallel (compared to a single Raster Engine in prior generation GPUs). On-chip L1 and L2 caches enable high bandwidth transfer of primitive attributes between the SM and the tessellation unit as well as between different SMs. Tessellation and all its supporting stages are performed in parallel on GF100, enabling breathtaking geometry throughput.
While GF100 includes many enhancements and performance improvements over past GPU architectures, the ability to perform parallel geometry processing is possibly the single most important GF100 architectural improvement. The ability to deliver setup rates exceeding one primitive per clock while maintaining correct rendering order is a significant technical achievement never before done in a GPU.
Revolutionary Compute Architecture for Gaming
The rasterization pipeline has come a long way, but as games aspire to film quality, graphics is moving toward advanced algorithms that require the GPU to perform general computation along with programmable shading. G80 was the first NVIDIA GPU to include compute features. GF100 benefits from what we learned on G80 in order to significantly improve compute features for gaming.
GF100 leverages Fermi’s revolutionary compute architecture for gaming applications. In graphics, threads operate independently, with a predetermined pipeline, and exhibit good memory access locality. Compute threads on the other hand often communicate with each other, work in no predetermined fashion, and often read and write to different parts of memory. Major compute features improved on GF100 that will be useful in games include faster context switching between graphics and PhysX, concurrent compute kernel execution, and an enhanced caching architecture which is good for irregular algorithms such as ray tracing and AI algorithms. We will discuss these features in more detail in subsequent sections of this paper.
Vastly improved atomic operation performance allows threads to safely cooperate through work queues, accelerating novel rendering algorithms. For example, fast atomic operations allow transparent objects to be rendered without presorting (order independent transparency) enabling developers to create levels with complex glass environments.
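The idea behind order-independent transparency can be sketched in a few lines of Python. This is a minimal CPU-side sketch, not the algorithm used on GF100: the per-pixel fragment list and the sort-then-blend resolve step are assumptions chosen for illustration, and all names are hypothetical. On a GPU, appending fragments to a shared per-pixel list from many threads is the kind of step where fast atomic operations would matter.

```python
# Illustrative sketch: collect transparent fragments per pixel in any order,
# then resolve them once all geometry has been submitted.
# A fragment is (depth, (r, g, b), alpha); smaller depth is nearer the viewer.

def resolve_pixel(fragments, background=(0.0, 0.0, 0.0)):
    """Blend fragments back to front over the background."""
    color = background
    for _depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(rgb, color))
    return color

red_glass = (1.0, (1.0, 0.0, 0.0), 0.5)
blue_glass = (2.0, (0.0, 0.0, 1.0), 0.5)

# Submission order differs, but the resolved color is the same.
near_first = resolve_pixel([red_glass, blue_glass])
far_first = resolve_pixel([blue_glass, red_glass])
print(near_first == far_first)
```

Because sorting happens per pixel at resolve time, the application does not need to presort transparent objects before drawing them.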
For seamless interoperation with graphics, GF100’s GigaThread engine reduces context switch time to about 20 microseconds, making it possible to execute multiple compute and physics kernels for each frame. For example, a game may use DirectX 11 to render the scene, switch to CUDA for selective ray tracing, call a DirectCompute kernel for post processing, and perform fluid simulations using PhysX.
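To put the 20 microsecond figure in perspective, the short Python calculation below estimates how much of a frame is spent on context switches. The frame rate (60 frames per second) and the number of switches per frame (four, for the example workload above plus a return to rendering) are assumptions made for illustration, and the function name is hypothetical.

```python
CONTEXT_SWITCH_US = 20.0  # from the text: about 20 microseconds per switch

def switch_overhead(switches_per_frame, frames_per_second):
    """Return (total switch time in microseconds, fraction of the frame time)."""
    frame_time_us = 1_000_000.0 / frames_per_second
    total_us = switches_per_frame * CONTEXT_SWITCH_US
    return total_us, total_us / frame_time_us

# Assumed workload: DirectX 11 -> CUDA -> DirectCompute -> PhysX -> DirectX 11.
total_us, fraction = switch_overhead(switches_per_frame=4, frames_per_second=60)
print(f"{total_us:.0f} us per frame, {fraction:.2%} of the frame time")
```

Under these assumptions the switches cost about 80 microseconds, under half a percent of a 60 frames per second frame, which leaves nearly the whole frame for the kernels themselves.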
Geometric Realism
Tessellation and Displacement Mapping Overview
While tessellation and displacement mapping are not new rendering techniques, up until now, they have mostly been used in films. With the introduction of DirectX 11 and NVIDIA’s GF100, developers will be able to harness these powerful techniques for gaming applications. In this section we will discuss some of the characteristics and benefits of tessellation and displacement mapping in the context of game development and high-quality, real-time rendering.
Game assets such as objects and characters are typically created using software modeling packages like Mudbox, ZBrush, 3D Studio Max, Maya, or SoftImage. These packages provide tools based on surfaces with displacement mapping to aid the artist in creating detailed characters and environments. Today, the artist must manually create polygonal models at various levels of detail as required by the various rendering scenarios in the game in order to maintain playable frame-rates. These models are meshes of triangles with associated texture maps needed for proper shading. When used in a game, the model information is sent per frame to the GPU through its host interface. Game developers tend to use relatively simple geometric models due to the limited bandwidth of the PCI Express bus and the modest geometry throughput of current GPUs.
Even in the best of game titles, there are geometric artifacts due to limitations of existing graphics APIs and GPUs. The result of compromising geometric complexity can be seen in the images below. The holster has a heavily faceted or segmented strap. The corrugated roof, which should look wavy, is in fact a flat surface with a striped texture. Finally, like most characters in games, this person wears a hat, carefully sidestepping the complexity of rendering hair.
Due to limitations in existing graphics APIs and GPUs, even graphically advanced games are forced to make concessions in geometric detail.
Using GPU-based tessellation, a game developer can send a compact geometric representation of an object or character, and the tessellator unit can produce the correct geometric complexity for the specific scene. We’ll now go into greater detail discussing the characteristics and benefits of tessellation in combination with displacement mapping.
Consider the character below. On the left we see the quad mesh used to model the general outline of the figure. This representation is quite compact, even when compared to typical game assets. The image of the character in the middle was created by finely tessellating the description on the left. The result is a very smooth appearance, free of any of the faceting that resulted from limited geometry. Unfortunately this character, while smooth, is no more detailed than the coarse mesh. The image on the right was created by applying a displacement map to the smoothly tessellated character in the middle. This character has a richness of geometric detail that you might associate with film production.
The “Imp” © Kenneth Scott, id Software 2008
Benefits of Tessellation with Displacement Mapping
There are a number of benefits to using tessellation with displacement mapping. The representation is compact, scalable and leads to efficient storage and computation. The compactness of the description means that the memory footprint is small and little bandwidth is consumed pulling the constituent vertices onto the GPU. Because animation is performed on the compact description, more compute intensive, sophisticated, realistic movement is possible. The on-demand synthesis of triangles creates the ability to match the geometric complexity and the number of triangles generated to the situation for the specific character as it appears in a given frame.
This ability to control geometric level of detail (LOD) is very powerful. Because it is on-demand and the data is all kept on-chip, precious memory bandwidth is preserved. Also, because one model may produce many LODs, the same game assets may be used on a variety of platforms, from a modest notebook to a Quad SLI system, for example.
When a displacement map (left) is applied to a flat surface, the resulting surface (right) expresses the height information encoded in the displacement map.
Displaced surfaces behave naturally with animation.
The character can also be tailored to how it appears in the scene: if it is small, it gets little geometry; if it is close to the screen, it is rendered with maximum detail. Additionally, scalable assets mean that developers may be able to use the same models on multiple generations of games and future GPUs where performance increases enable even greater detail than was possible when initially deployed in a game. Complexity can be adjusted dynamically to target a given frame rate. Finally, models that are rendered using tessellation with displacement mapping much more closely resemble those used natively in the tools used by artists, freeing artists from the overhead work of creating models with different LODs.
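One way an application might choose geometric detail per character and adjust it toward a target frame rate is sketched below in Python. The falloff curve, the clamping limits, and the feedback rule are all assumptions made for illustration, and the function names are hypothetical; the only external fact used is that DirectX 11 caps tessellation factors at 64.

```python
# Illustrative sketch: distance-based level of detail with frame-time feedback.
# The falloff curve and the adjustment steps are assumptions, not from the paper.

MIN_FACTOR = 1.0
MAX_FACTOR = 64.0  # DirectX 11 caps tessellation factors at 64

def tess_factor(distance, detail_scale=1.0, near=1.0):
    """Pick a tessellation factor that halves each time the distance doubles."""
    factor = MAX_FACTOR * detail_scale * near / max(distance, near)
    return min(MAX_FACTOR, max(MIN_FACTOR, factor))

def adjust_detail_scale(detail_scale, frame_ms, target_ms):
    """Nudge the global detail scale toward the target frame time."""
    if frame_ms > target_ms:
        detail_scale *= 0.9    # too slow: generate fewer triangles
    elif frame_ms < 0.8 * target_ms:
        detail_scale *= 1.05   # headroom: allow more triangles
    return min(1.0, max(0.1, detail_scale))

scale = adjust_detail_scale(1.0, frame_ms=20.0, target_ms=16.7)
print(tess_factor(2.0), tess_factor(2.0, detail_scale=scale))
```

The same coarse asset is used in every case; only the factor handed to the tessellator changes, which is what lets one model serve both a distant background character and a close-up.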
Displacement mapping is a very powerful modeling and rendering technique. A displacement map is a texture that expresses height information. When applied to a model, the displacement map is used to alter the relative position of vertices in the model. Displacement mapping allows complex geometry to be stored in a compact map. In this way, displacement maps can be regarded as a form of geometry compression.
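The way a displacement map alters vertex positions can be sketched in Python as follows. This is a minimal CPU-side sketch under stated assumptions: the surface is a flat unit quad displaced along its normal (+z), the height map is a small list of rows sampled bilinearly, and the function names are hypothetical. On a GPU this work would typically be done per vertex in a shader after tessellation.

```python
# Illustrative sketch: displace a flat, tessellated unit quad along its normal (+z).

def sample_height(height_map, u, v):
    """Bilinearly sample a 2D height map (list of rows) at u, v in [0, 1]."""
    rows, cols = len(height_map), len(height_map[0])
    x, y = u * (cols - 1), v * (rows - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = height_map[y0][x0] * (1 - fx) + height_map[y0][x1] * fx
    bottom = height_map[y1][x0] * (1 - fx) + height_map[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

def displaced_grid(height_map, segments, scale=1.0):
    """Tessellate the unit quad into segments x segments cells and displace each vertex."""
    vertices = []
    for j in range(segments + 1):
        for i in range(segments + 1):
            u, v = i / segments, j / segments
            vertices.append((u, v, scale * sample_height(height_map, u, v)))
    return vertices

# A 3x3 map with a single bump in the middle.
height_map = [[0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]]
vertices = displaced_grid(height_map, segments=4)
print(len(vertices), vertices[12])
```

Nine stored height values drive the shape of 25 vertices here, and a finer grid would use the same map, which is the sense in which a displacement map acts as geometry compression.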
Unlike emboss maps, normal maps, and parallax maps, which merely alter the appearance of pixels, displacement maps alter the position of vertices. This enables self-occlusion, accurate shadows, and robust behavior at the edges of silhouettes.
Displacement mapping is complementary to existing bump mapping techniques. For example, displacement maps can be used to define major surface features while finer grained techniques such as normal mapping are used for low level details such as scratches and moles.
In addition to being a simple way to create complex geometry, displacement mapped geometry also behaves naturally when animated. Consider the simple example to the right—the blunt spikes follow the base shape as it is bent. Displacement mapped characters behave similarly. Consider the Imp character on the preceding page. It is animated by manipulating the coarse control hull (left). The displacement mapped character (right) naturally follows the animation of the underlying surface.
Finally, one of the most interesting aspects of displacement maps is the ability to easily modify them during game play. In today’s games, spraying a metal door with bullets leaves a trail of bullet “decals”, but the shape of the door will not be altered. With displacement mapping, the same decal textures can be used to alter the displacement map, allowing a player to deform both the appearance and underlying structure of game objects.
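Modifying a displacement map during game play might look like the following Python sketch, which stamps a bullet-impact profile into a height map so that the geometry itself is dented. The data layout, the dent profile, and the function name are hypothetical and chosen only for illustration; a real game would more likely update a texture on the GPU.

```python
# Illustrative sketch: stamp a bullet-impact "decal" into a height map at run time.

def stamp_dent(height_map, decal, top, left):
    """Subtract decal depths from the height map in place, clipping at the map edges."""
    rows, cols = len(height_map), len(height_map[0])
    for dy, decal_row in enumerate(decal):
        for dx, depth in enumerate(decal_row):
            y, x = top + dy, left + dx
            if 0 <= y < rows and 0 <= x < cols:
                height_map[y][x] -= depth
    return height_map

door = [[0.0] * 5 for _ in range(5)]   # flat metal door
bullet = [[0.0, 0.1, 0.0],
          [0.1, 0.3, 0.1],
          [0.0, 0.1, 0.0]]             # hypothetical dent profile

stamp_dent(door, bullet, top=1, left=1)    # hit near the center of the door
stamp_dent(door, bullet, top=-1, left=3)   # hit partly clipped by the door's edge
print(door[2][2], door[0][4])
```

Because the dents live in the displacement map rather than in a color decal, any surface tessellated and displaced from this map afterward would show them as real changes in shape.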
Today’s games employ decals to depict altered surfaces. With displacement mapping, bullet decals can be used to alter the underlying geometry of objects.