Page 1

April 4-7, 2016 | Silicon Valley
HIGH PERFORMANCE VIDEO
ENCODING WITH NVIDIA GPUS
Abhijit Patait
Eric Young
April 4th, 2016
Page 2

NVIDIA GPU Video Technologies
Video Hardware Capabilities
Video Software Overview
AGENDA
Common Use Cases for Video
Performance and Quality Tuning
New Directions
SDK Links
2
Page 3

NVIDIA GPU VIDEO TECHNOLOGIES
3
Page 4

NVIDIA VIDEO TECHNOLOGIES
• Dedicated hardware for encode & decode
• Linux, Windows, FFMPEG
4
Page 5

NVIDIA VIDEO TECHNOLOGIES EVOLUTION
Low-latency Streaming
Cloud transcoding
• Social media
GRID
• Live streaming
• Video-on-demand
5
Page 6

GPU VIDEO ENCODE
• Low power
• Low latency
• High performance and scalability
• Automatic benefit from
Benefits
Time
Frame #4
Frame #4
improvements in hardware
• Linux, Windows, C/C++, FFMPEG
support
Client
Client
Client
Client
6
Page 7

VIDEO HARDWARE CAPABILITIES
7
Page 8

NVIDIA GPU VIDEO HARDWARE
NVDEC
• Video decoder
• MPEG-2, VC-1, H.264, HEVC
• Fermi, Kepler, Maxwell, and
future GPUs
NVENC
• Video encoder
• H.264, HEVC
• Kepler, Maxwell, and future
GPUs
8
Page 9

ENCODE CAPABILITIES
KEPLER
(GK107, GK104)
H.264 only H.264 only H.264 and HEVC/H.265
Standard 4:2:0,
Planar 4:4:4 & proprietary 4:4:4
~240 fps 2-pass encoding @
720p
GRID K340/K520, K1/K2,
Quadro K5000, Tesla K10/K20,
GeForce GTX 680
NV Encode SDK 1.0-5.0 NV Encode SDK 4.0+ NV Encode SDK 5.0
Standard 4:2:0, 4:4:4 and
H.264 lossless encoding
~500 fps 2-pass encoding @
720p
Maxwell-based GRID &
Quadro products
MAXWELL GEN 1
(GM107)
MAXWELL GEN 2
(GM200, GM204, GM206)
Standard 4:2:0, 4:4:4 and
H.264 lossless encoding
~900 fps 2-pass encoding @
720p
Tesla M4, M40, M6, M60,
Quadro M4000, M5000, M6000,
GeForce GTX 960, 980, Titan X
Video Codec SDK 6.0+
9
Page 10

DECODE CAPABILITIES
KEPLER
(GK107, GK104)
MPEG-2, MPEG-4, H.264 MPEG-2, MPEG-4, H.264, HEVC
H.264: ~200 fps at 1080p;
1 stream of 4K@30
H.265: Not supported H.265: Not supported H.265: ~500 fps at 1080p
Video Codec SDK 5.0+ Video Codec SDK 5.0+ Video Codec SDK 5.0+
4096 × 4096 4096 × 4096 4096 × 4096
(GM107, GM204, GM200)
with CUDA acceleration
H.264: ~540 fps at 1080p
4 streams of 4K@30
MAXWELL 1
MAXWELL 2
(GM206)
MPEG-2, MPEG-4, H.264
HEVC/H.265 fully in hardware
H.264: ~540 fps at 1080p
4 streams of 4K@30
4 streams of 4K@30
10
Page 11

VIDEO SOFTWARE OVERVIEW
11
Page 12

NVIDIA VIDEO TECHNOLOGIES – PRE-2016
VIDEO DECODE/PLAYBACK
DXVA for Windows
VDPAU for Linux
NVCUVID VIDEO DECODING
Windows, Linux,
CUDA interoperability
NVENC SDK
Hardware encoder API
Windows, Linux
CUDA, DirectX interoperability
GRID/CAPTURE SDK, MFT
Use-case specific APIs
12
Page 13

NVIDIA VIDEO TECHNOLOGIES – 2016++
VIDEO CODEC SDK
• Flexibility
• API for encode + decode
• Windows, Linux
• CUDA, DirectX, OpenGL
interoperability
• High performance transcode
• Current: Video Codec SDK 6.0
FFMPEG SUPPORT*
• Hardware acceleration for most
popular video and audio framework
• Leverages FFmpeg’s Audio codec,
stream muxing, and RTP protocols.
• Windows, Linux
• Wide adoption
*To get access to the latest FFmpeg repository with NVENC support,
please contact your NVIDIA relationship manager.
13
Page 14

VIDEO CODEC SDK FEATURES
Video SDK = encode +
decode
Hardware assisted motion estimation for
encoders, Image stabilization
What’s New
6.0
6.0
GB inputs 6.0
estimation only
-frames
-ahead
6.0
7.0
Transcoding, Broadcast, Video production
RGB + encode
custom
d perceptual quality – Available in May 2016
14
Page 15

ROADMAP
Q2’15 Q3’15 Q4’15 Q1’16 Q2’16 Q3’16
NVENC SDK 5.0
• HEVC
• Maxwell Gen 2
• H.264 4:4:4
• H.264 lossless
GM204
Maxwell Gen 2
GM206
Video SDK 6.0
• ME-only (H.264)
• HEVC Quality+
• RGB inputs
• HEVC AQ
Future…
• Quality++
• HEVC 10-bit
• HEVC 4:4:4
• HEVC lossless
• ME-only (HEVC)
• 4K HEVC 60 fps
• 8K HEVC
Pascal
15
Page 16

COMMON USE CASES FOR VIDEO
16
Page 17

CAPTURE + ENCODE
• Capture Desktop (NvFBC) and
RenderTargets (NvIFR)
• Low Latency, low CPU overhead
• Fully offloads H.264 and HEVC with
NVENC
• High density of users per GPU
• Streaming Games and Enterprise Apps
Apps
Apps
Apps
Graphics
commands
Tesla, GRID, or Quadro GPU
3D
Framebuffer
Remote
Graphics Stack
H.264 or
raw streams
NVENC
NVIFR NVFBC
Render
Target
Front
Buffer
Network
17
Page 18

STREAM APPLICATIONS
• Streaming software
• VMware Horizon Blast Extreme
• Nice Desktop Cloud Visualization
• Capture SDK + Encode SDK
• Capture (NvFBC and NvIFR)
• Encode with NvENC (H.264 and HEVC)
• Supported in Virtualized environments
• GPU direct attached mode
• vGPU mode (shared GPU)
18
Page 19

PERFORMANCE STUDY
• VMWare Horizon Blast Extreme + GPU
• 37% better performance (fps)
• 21% lower latency
• 19% reduction in bandwidth
• 16% reduction in CPU utilization
• 18% increase in number of users
19
Page 20

LIVE VIDEO TRANSCODING
• Higher number video streams per GPU server
• 1 stream to N streams (multi-resolution)
• Fewer servers needed, higher density, lower TCO
• Requires Lower bitrate (B-Frames)
• Live Transcoding User Generated Content
• Live video broadcasts, presidential debates, concerts
• Broadcasting from mobile device
• Live game streaming events
20
Page 21

TRANSCODE FOR ARCHIVING
• High density of streams per GPU servers
• Lower TCO, lower latency
• 1 stream to N streams (multi-resolution)
• Archiving
• HQ archiving for non-live video streaming
• Quality is and low bitrate are the most important (I, B, and P support)
• Cost per stream
21
Page 22

VIDEO CONFERENCING
• Live video conferencing
• Video transcoding (1 to N streams)
• Screen sharing for meetings
• Video enhancements
• Video stabilization
• Frame rate up sampling
• High quality, low bitrate
22
Page 23

PERFORMANCE AND QUALITY TUNING
23
Page 24

RECOMMENDED SETTINGS
Remote Graphics
• NVENC has video presets for latency (I and P frames only)
NV_HW_ENC_PRESET_LOW_LATENCY_HQ
NV_HW_ENC_PARAMS_RC_2_PASS_QUALITY
• Video Bitrate settings for low latency
dwVBVBufferSize = dwAvgBitRate / (dwFrameRateNum/dwFrameRateDen)
dwVBVInitialDelay = dwVBVBufferSize
• Video Bitrate settings for higher quality
K = 4;
dwVBVBufferSize = K * dwAvgBitRate / (dwFrameRateNum/dwFrameRateDen)
dwVBVInitialDelay = dwVBVBufferSize
24
Page 25

RECOMMENDED SETTINGS
Video Transcoding
• NVENC settings for video quality (I, B, P frames)
NV_ENC_PRESET_HQ_GUID
NV_ENC_PARAMS_RC_2_PASS_QUALITY
set B frames > 0 (EncodeConfig::numB)
• Video Bitrate settings for low latency
dwVBVBufferSize = dwAvgBitRate / (dwFrameRateNum/dwFrameRateDen)
dwVBVInitialDelay = dwVBVBufferSize
• Video Bitrate settings for higher quality
K = 4;
dwVBVBufferSize = K * dwAvgBitRate / (dwFrameRateNum/dwFrameRateDen)
dwVBVInitialDelay = dwVBVBufferSize
25
Page 26

TESLA PERFORMANCE
Xeon E5 sw encode
Tesla M60 / 2xGM204
Tesla M6 / 1xGM204
Tesla M4 / 1xGM206
# NVDEC
1+1 2+2
1 2
1 1
# NVENC
STREAMS*
2
(x264)
2 x (14+14)
14+14
7
(435Mpixels/sec)
# 1080P30 HEVC
STREAMS*
0.25-0.5
(x265)
2 x (10+10)
(622+622Mpixels/sec)
10+10
(622+622Mpixels/sec)
5
(311Mpixels/sec)
*Each Maxwell NVENC can do:
• 7x h.264 1080p30 Highest Quality with B-frames
• 5x HEVC 1080p30 Highest Quality with no B-frames
26
Page 27

ENCODE PERF/QUALITY
Medium
Slow
Medium
Slow
Medium
Slow
37.0
37.2
37.4
37.6
37.8
38.0
0 100 200 300 400 500
Quality (PSNR)
Performance (FPS)
Quality vs Performance
NVENC
QSV
x264
• Quality
• = x264
• Performance
• Single NVENC is 3-4x vs x264
27
Page 28

NEW DIRECTIONS
28
Page 29

NEW USE CASES
• Standalone NVENC motion estimation mode
• Continued video quality improvements
• Adaptive GOP, Adaptive B-frames, Adaptive Quantization
• Temporal AQ
• Frame look ahead
• Video Stabilization with compute
• Use CUDA cores for image stabilization to remove video shakiness
• Algorithm is well suited for GPU architectures
• Takes advantage of texture cache
• Scales on GPUs because of high level of parallelism\
29
Page 30

DEEP LEARNING VIDEO INFERENCE
Using 3D ConvNet
• Video Analysis using pre-trained Convolution3D network (spatiotemporal signals)
• Use NVDEC to improve performance when running GPU inference
• https://research.facebook.com/blog/c3d-generic-features-for-video-analysis/
30
Page 31

SDK LINKS
31
Page 32

NVIDIA VIDEO CODEC SDK
Since Kepler dGPU have had Fixed-
Function Decoder and Encoder blocks
NVENC – NVIDIA Video Encoder
NVDEC – NVIDIA Video Decoder
Samples and documentation
https://developer.nvidia.com/nvidiavideo-codec-sdk
GM200
32
Page 33

FFMPEG + NVENC
• NVENC added 1/2015
• NVRESIZE added 8/2015
• CUDA Context sharing and Zero-Copy
• NVDEC added 1/2016
• https://developer.nvidia.com/ffmpeg
33
Page 34

April 4-7, 2016 | Silicon Valley
QUESTIONS?
Find us at GTC Hangouts
GTC Pod B - H6145A: Video and Image Processing
4/5 (Tuesday) @ 12:45 – 2pm
GTC Pod A - H6145B: Video and Image Processing
4/6 (Wednesday) @ 8:45am - 10am
Abhijit Patait apatait@nvidia.com
Eric Young eyoung@nvidia.com