Revision History
The following table shows the revision history for this document.

Section                   Revision Summary
                          Added chapter regarding the DPU targeted reference design.
02/28/2019 Version 1.0
Initial release           N/A
DPU IP Product Guide    www.xilinx.com
PG338 (v1.2) March 26, 2019
Table of Contents
Revision History
IP Facts
Example System with DPU
Licensing and Ordering Information
Chapter 2: Product Specification
Hardware Architecture
DSP with Enhanced Utilization (DPU_EU)
Register Space
DPU Performance on Different Devices
Performance of Different Models
Please Read: Important Legal Notices
IP Facts
Introduction
The Xilinx® Deep Learning Processor Unit (DPU) is a configurable engine dedicated to convolutional
neural networks. The computing parallelism can be configured according to the selected device and
application. The DPU includes a set of efficiently optimized instructions and supports most
convolutional neural networks, such as VGG, ResNet, GoogLeNet, YOLO, SSD, MobileNet, and FPN.
Features
• One slave AXI interface for accessing configuration and status registers.
• One master interface for accessing instructions.
• Supports configurable AXI master interface with 64 or 128 bits for accessing data.
• Supports individual configuration of each channel.
• Supports optional interrupt request generation.
• Some highlights of DPU functionality include:
o Configurable hardware architecture includes: B512, B800, B1024, B1152, B1600, B2304,
B3136, and B4096
o Configurable core number up to three
o Convolution and deconvolution
o Max pooling
o ReLU and Leaky ReLU
o Concat
o Elementwise
o Dilation
o Reorg
o Fully connected layer
o Batch Normalization
o Split

DPU IP Facts Table

Core Specifics
Supported Device Family      Zynq®-7000 SoC and UltraScale+™ MPSoC Family
Supported User Interfaces    Memory-mapped AXI interfaces
Resources                    See Chapter 3: DPU Configuration

Provided with Core
Design Files                 Encrypted RTL
Example Design               Verilog
Constraint File              Xilinx Design Constraints (XDC)
Supported S/W Driver         Included in PetaLinux

Tested Design Flows
Design Entry                 Vivado® Design Suite
Simulation                   N/A
Synthesis                    Vivado Synthesis

Support
Provided by Xilinx at the Xilinx Support web page

Notes:
1. Linux OS and driver support information is available from the DPU TRD or DNNDK.
2. If the requirement is on Zynq-7000 SoC, contact your local FAE.
3. For the supported versions of the tools, see the Vivado Design Suite User Guide: Release Notes,
Installation, and Licensing (UG973).
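As a minimal sketch, the configuration options listed under Features can be captured in a small checker. The class DpuConfig, its field names, and the problems method are hypothetical names introduced for illustration; they are not part of the DPU IP or DNNDK. The sketch only compares each option against the values listed above and does not model any device-specific limits.

```python
from dataclasses import dataclass
from typing import List

# Values copied from the Features list; all names below are hypothetical.
SUPPORTED_ARCHITECTURES = (
    "B512", "B800", "B1024", "B1152", "B1600", "B2304", "B3136", "B4096",
)
SUPPORTED_DATA_WIDTHS_BITS = (64, 128)  # AXI master interface for data
MAX_CORES = 3                           # "core number up to three"


@dataclass(frozen=True)
class DpuConfig:
    """One candidate DPU configuration (illustrative only)."""
    architecture: str
    num_cores: int
    data_width_bits: int

    def problems(self) -> List[str]:
        """Return a list of problems; an empty list means every option is a listed value."""
        issues = []
        if self.architecture not in SUPPORTED_ARCHITECTURES:
            issues.append(f"unknown architecture {self.architecture!r}")
        if not 1 <= self.num_cores <= MAX_CORES:
            issues.append(f"core number {self.num_cores} is outside 1..{MAX_CORES}")
        if self.data_width_bits not in SUPPORTED_DATA_WIDTHS_BITS:
            issues.append(f"data width {self.data_width_bits} is not 64 or 128 bits")
        return issues


if __name__ == "__main__":
    for config in (DpuConfig("B4096", 3, 128), DpuConfig("B2048", 4, 32)):
        print(config, config.problems())
```

A check like this could run before a design is generated, so that an unsupported choice is reported early instead of surfacing later in the tool flow.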
Chapter 1: Overview
Introduction
The Xilinx® Deep Learning Processor Unit (DPU) is a programmable engine dedicated to convolutional
neural networks. The unit contains a register configuration module, a data controller module, and a
convolution computing module. The DPU has a specialized instruction set, which enables it to work
efficiently for many convolutional neural networks. Convolutional neural networks deployed on the DPU
include VGG, ResNet, GoogLeNet, YOLO, SSD, MobileNet, and FPN.
The DPU IP can be integrated as a block in the programmable logic (PL) of the selected Zynq®-7000
SoC and Zynq UltraScale+™ MPSoC devices with direct connections to the processing system (PS). To
use the DPU, you should prepare the instructions and input image data in the specific memory address
that the DPU can access. The DPU operation also requires the application processing unit (APU) to
service interrupts to coordinate data transfer.
The top-level block diagram of DPU is shown in Figure 1.
Figure 1: Top-Level Block Diagram
[Diagram labels: Host CPU, RAM, High Speed Data Tube, DPU, High Performance Scheduler, Instruction
Fetch Unit, Global Memory Pool, Hybrid Computing Array, PE]
Development Tools
Use the Xilinx Vivado Design Suite to integrate the DPU into your own project. Vivado Design Suite
2018.2 or later is recommended. Earlier versions of Vivado can also be supported; for requests,
contact your sales representative.
Device Resources
The DPU logic resource is optimized and scalable across Xilinx UltraScale+ MPSoC and Zynq-7000
devices. For the detailed resource utilization, refer to Chapter 3: DPU Configuration.
How to Run DPU
The DPU operation depends on the driver, which is included in the Xilinx Deep Neural Network
Development Kit (DNNDK) toolchain.
You can download the free developer resources from the Xilinx website.
Refer to the DNNDK User Guide (UG1327) for an essential guide on how to run a DPU with
DNNDK tools. The basic development flow is shown in the following figure. First, use Vivado to
generate the bitstream. Then, download the bitstream to the target board and install the DPU driver.
For instructions on how to install the DPU driver and dependent libraries, refer to the DNNDK User Guide (UG1327).
Figure 2: Basic Development Flow
[Diagram labels: Hardware Platform, DPU Driver, Lib, API, Vivado, DPU, Example, Third Party, bitfile]
Example System with DPU
The figure below shows an example system block diagram with the Xilinx UltraScale+ MPSoC using a
camera input. The DPU is integrated into the system through an AXI interconnect to perform deep
learning inference tasks such as image classification, object detection, and semantic segmentation.
Figure 3: Example System with Integrated DPU
[Diagram labels: DPU, Camera, MIPI CSI2, demosaic, gamma, Color_conversion, DMA, AXI Interconnect,
Controller, DDR, ARM R5, DisplayPort, USB3.0, SATA3.1, PCIe Gen2, GigE, USB2.0, UART, SPI, Quad SPI,
NAND, SD]
DNNDK
Deep Neural Network Development Kit (DNNDK) is a full-stack deep learning toolchain for inference
with the DPU.
As shown in Figure 4, DNNDK is composed of Deep Compression Tool (DECENT), Deep Neural Network
Compiler (DNNC), Neural Network Runtime (N2Cube), and DPU Profiler.
Figure 4: DNNDK Toolchain
[Diagram labels: DECENT, DNNC, N2Cube, Profiler, OS, Host CPU, DPU]
The instructions of DPU are generated offline with DNNDK. Figure 5 illustrates the hierarchy of executing
deep learning applications on the target hardware platform with DPU.
Figure 5: Application Execution Hierarchy
[Diagram labels: User Space, Kernel Space, Hardware Platform, Deep Learning App (DPU-accelerated),
Industry-standard Libraries, Loader, Profiler, Library, Operating System, DPU Driver, Host CPU, DPU]
Licensing and Ordering Information
This IP module is provided at no additional cost under the terms of the Xilinx End User License.
Information about this and other IP modules is available at the Xilinx Intellectual Property page. For
information on pricing and availability of other Xilinx IP modules and tools, contact your local Xilinx sales
representative.
Chapter 2: Product Specification
Hardware Architecture
The detailed hardware architecture of DPU is shown in Figure 6. After start-up, the DPU fetches
instructions from the off-chip memory and parses them to operate the computing engine. The
instructions are generated by the DNNDK compiler, where substantial optimizations have been performed.
To improve efficiency, abundant on-chip memory in Xilinx® devices is used to buffer the
input, intermediate, and output data. The data is reused as much as possible to reduce the
memory bandwidth. A deep pipelined design is used for the computing engine. Like other accelerators,
the processing elements (PEs) of the computing array take full advantage of the fine-grained building
blocks in Xilinx devices, such as multipliers, adders, and accumulators.
Figure 6: DPU Hardware Architecture
[Diagram labels: Processing System (PS), Programmable Logic (PL), Off-Chip Memory, CPU (DNNDK),
Memory Controller, Bus, Instruction Scheduler, Fetcher, Decoder, Dispatcher, On-Chip Buffer Controller,
Data Mover, On-Chip BRAM, BRAM Reader/Writer, Computing Engine, Conv Engine, Misc Engine, PE]
DSP with Enhanced Utilization (DPU_EU)
In the previous DPU version, the general logic and DSP slices work in the same clock domain, although
the DSP slices can technically run at a higher frequency. To enhance the utilization of DSP slices in
the DPU, the advanced DPU_EU version was designed.
The EU in “DPU_EU” means enhanced utilization of DSP slices. The DSP Double Data Rate (DDR) technique
is used to improve the performance achieved with the device. Therefore, two input clocks are needed
for the DPU: one for general logic and the other for DSP slices. The difference between DPU and
DPU_EU is shown in Figure 7.
All references to DPU in this document refer to DPU_EU, unless otherwise specified.
Figure 7: Difference between DPU and DPU_EU
[Diagram labels: IMG ram, WGT ram, DSP48 Slice (A, D, B, A+D, M, SEL, PCIN, P, PCOUT), DLY, Async,
B0, B1, RES, RES0, RES1, OUT0, OUT1, clk 1x, clk 2x]
Port Descriptions
The DPU top-level interfaces are shown in the following figure.
Figure 8: DPU_EU IP Port
The DPU I/O signals are listed and described in Table 1.

Table 1: DPU Signal Description

Signal Name          Interface Type                        Width   I/O   Description
S_AXI                Memory mapped AXI slave interface     32      I/O   32-bit Memory mapped AXI interface for registers.
s_axi_aclk           Clock                                 1       I     AXI clock input for S_AXI.
s_axi_aresetn        Reset                                 1       I     Active-Low reset for S_AXI.
dpu_2x_clk           Clock                                 1       I     Input clock used for DSP unit in DPU. The frequency is two times that of m_axi_dpu_aclk.
dpu_2x_resetn        Reset                                 1       I     Active-Low reset for DSP unit.
m_axi_dpu_aclk       Clock                                 1       I     Input clock used for DPU general logic.
m_axi_dpu_aresetn    Reset                                 1       I     Active-Low reset for DPU general logic.
DPUx_M_AXI_INSTR     Memory mapped AXI master interface    32      I/O   32-bit Memory mapped AXI interface for instruction of DPU.
DPUx_M_AXI_DATA0     Memory mapped AXI master interface    128     I/O   128-bit Memory mapped AXI interface for DPU data fetch.
DPUx_M_AXI_DATA1     Memory mapped AXI master interface    128     I/O   128-bit Memory mapped AXI interface for DPU data fetch.
dpu_interrupt        Interrupt                             1~3     O     Active-High interrupt output from DPU. The data width is decided by the DPU number.

Notes:
1. If only input ports are needed, you can edit the ports in the block diagram and declare the port at
interface level.
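The clock and interrupt relationships in Table 1 can be expressed as a short consistency check. This is an illustrative sketch only: the function names are hypothetical, the 300 MHz value is an arbitrary example and not a recommended frequency, and the one-interrupt-bit-per-core mapping is an assumption based on the statement that the width is decided by the DPU number.

```python
MAX_DPU_CORES = 3  # Table 1 lists dpu_interrupt as 1 to 3 bits wide


def dsp_clock_hz(m_axi_dpu_aclk_hz: int) -> int:
    """Frequency required on dpu_2x_clk for a given m_axi_dpu_aclk frequency."""
    return 2 * m_axi_dpu_aclk_hz


def clocks_consistent(m_axi_dpu_aclk_hz: int, dpu_2x_clk_hz: int) -> bool:
    """True when dpu_2x_clk runs at exactly twice m_axi_dpu_aclk."""
    return dpu_2x_clk_hz == dsp_clock_hz(m_axi_dpu_aclk_hz)


def interrupt_width(num_cores: int) -> int:
    """Width of dpu_interrupt, assuming one interrupt bit per DPU core."""
    if not 1 <= num_cores <= MAX_DPU_CORES:
        raise ValueError(f"core number must be 1..{MAX_DPU_CORES}, got {num_cores}")
    return num_cores


if __name__ == "__main__":
    aclk_hz = 300_000_000  # arbitrary example value
    print("dpu_2x_clk must be", dsp_clock_hz(aclk_hz), "Hz")
    print("dpu_interrupt width for 2 cores:", interrupt_width(2))
```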
Register Space
The DPU IP implements registers in the programmable logic. Table 2 shows the DPU IP registers. These
registers are accessible from the host CPU through the S_AXI interface.
Reg_dpu_reset
The reg_dpu_reset register controls the resets of all DPU cores integrated in the DPU IP. The lower three
bits of this register control the reset of up to three DPU cores respectively. All the reset signals are
active-High. The details of reg_dpu_reset are shown in Table 2.

Table 2: Reg_dpu_reset

Register         Address Offset   Width   Type   Description
Reg_dpu_reset    0x004            32      R/W    [0] – reset of DPU core 0
                                                 [1] – reset of DPU core 1
                                                 [2] – reset of DPU core 2

Reg_dpu_isr
The reg_dpu_isr register represents the interrupt status of all DPU cores integrated in the DPU IP. The
lower three bits of this register show the interrupt status of up to three DPU cores respectively. The
details of reg_dpu_isr are shown in Table 3.

Table 3: Reg_dpu_isr

Register         Address Offset   Width   Type   Description
Reg_dpu_isr      0x608            32      R      [0] – interrupt status of DPU core 0
                                                 [1] – interrupt status of DPU core 1
                                                 [2] – interrupt status of DPU core 2
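The bit layouts in Tables 2 and 3 can be illustrated with a short sketch. It only computes and decodes 32-bit register values; it does not touch hardware, and in a running system these accesses are the job of the DPU driver. The helper names are hypothetical, while the offsets and bit positions are taken from the tables.

```python
from typing import Iterable, List

REG_DPU_RESET_OFFSET = 0x004  # R/W, bits [2:0] reset DPU cores 0..2 (active-High)
REG_DPU_ISR_OFFSET = 0x608    # R, bits [2:0] report interrupt status of cores 0..2
MAX_DPU_CORES = 3


def reset_value(cores: Iterable[int]) -> int:
    """Build the reg_dpu_reset value that asserts reset for the given core indices."""
    value = 0
    for core in cores:
        if not 0 <= core < MAX_DPU_CORES:
            raise ValueError(f"core index must be 0..{MAX_DPU_CORES - 1}, got {core}")
        value |= 1 << core
    return value


def pending_cores(isr_value: int) -> List[int]:
    """Decode a reg_dpu_isr value into the core indices whose status bit is set."""
    return [core for core in range(MAX_DPU_CORES) if (isr_value >> core) & 1]


if __name__ == "__main__":
    print(f"write {reset_value([0, 2]):#010x} to offset {REG_DPU_RESET_OFFSET:#05x}")
    print(f"status read from offset {REG_DPU_ISR_OFFSET:#05x}:", pending_cores(0b110))
```

Only the lower three bits are decoded, so any value in the upper bits of the status word is ignored by pending_cores.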