No part of this document may be reproduced or transmitted in any form or by any means without prior
written consent of Huawei Technologies Co., Ltd.
Trademarks and Permissions
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
CANN
AI CPU Custom Operator Development Guide (Inference)

About This Document
Overview
An AI CPU operator is an operation of complete compute logic that runs on AI
CPU, one of the compute engines in the Ascend AI Processor. You might need to
develop a custom AI CPU operator in the following cases:
● During neural network (NN) training or inference, if you find an unsupported
operator when converting a third-party open-source network to adapt to the
Ascend AI Processor, a custom AI CPU operator can help you streamline the
model execution process and improve the functionality commissioning
efficiency. After the functionality commissioning is passed, convert the custom
AI CPU operator into a TBE operator for performance commissioning.
● In certain scenarios, it is impossible to implement custom operators that run
on AI Core. For example, some operators require int64 data, which is
incompatible with AI Core instructions. When such an operator is not the
performance bottleneck of your network, you can develop a custom AI CPU
operator instead for Ascend AI Processor support.
At the moment, AI CPU custom operators can run only in EP standard form.
Intended Audience
This document is intended for developers who develop custom AI CPU operators.
After reading this document, you will be able to:
● Describe the principles and workflow of AI CPU operator development.
● Develop custom AI CPU operators based on the samples provided in this
document.
To better understand this document, you need to have:
2.2 Building and Running an Operator
3 Operator Development Workflow
4 Operator Development Preparations
5.4.1 Adaptation Plug-in Development (TensorFlow)
5.4.2 Adaptation Plug-in Development (Caffe)
5.5 Operator Project Building and Deployment
5.5.2 OPP Deployment
9.2 AI CPU APIs
9.2.2 Class CpuKernelContext
9.2.2.1 CpuKernelContext Constructor and Destructor
9.2.3 Class TensorShape
9.2.4 Class Tensor
9.2.5 Class AttrValue
9.2.7 Data Types
9.3.6.1 Class OperatorFactory
9.3.6.2 Class OperatorCreatorRegister
9.3.6.2.1 Constructor and Destructor
9.3.6.3 Class InferShapeFuncRegister
9.3.6.3.1 Constructor and Destructor
9.3.6.4 Class InferFormatFuncRegister
9.3.6.4.1 Constructor and Destructor
9.3.6.5 Class VerifyFuncRegister
9.3.6.5.1 Constructor and Destructor
9.3.6.6 Class InferenceContext
9.3.6.6.1 InferenceContext Constructor and Destructor
9.3.6.7 Class ShapeAndType
9.3.6.7.1 Constructor and Destructor
9.4.2 Class Operator
9.4.2.1 Constructor and Destructor
9.4.3 Class Tensor
9.4.3.1 Constructor and Destructor
9.4.4 Class TensorDesc
9.4.4.1 Constructor and Destructor
9.4.5 Class Shape
9.4.5.1 Constructor and Destructor
9.4.6 Class AttrValue
9.4.6.1 Constructor and Destructor
9.4.7 Data Type and Enumerated Value
9.5.2 Class OpRegistrationData
9.5.2.2 Constructor and Destructor
9.5.3 Class OpReceiver
9.5.3.1 Constructor and Destructor
1 Quick Start
1.1 Neural Network Introduction
1.2 Operator Basics
1.1 Neural Network Introduction
To enable computers to master knowledge like human beings, a multi-layer
connection network needs to be constructed to define a complex object. After
iterative computing and training of the network, it can extract object features.
Generally, this method is called deep learning (DL). With uninterrupted
development, deep learning has displayed its tremendous application value and is
receiving increasing attention from the industry and academia. Deep learning has
achieved remarkable progress in image, voice, natural language processing, big
data feature extraction, and ad click-through rate estimation. As a result, multiple
infrastructures, such as Caffe, MXNet, and TensorFlow, have been developed to
promote deep learning across fields.
Deep neural network research fuels rapid development of neural network models,
enabling them to complete more and more complex processing tasks in a wider
range of fields. With the rapid development of semiconductor chips and computer
technologies for decades, ever faster and more energy-efficient computing resources have
been provided for neural network models and data, such as CPUs, GPUs, TPUs,
and the latest Ascend AI Processor launched by Huawei.
Artificial neural network (ANN) may also be referred to as neural network (NN)
for short, which is an important branch of machine learning (ML). Scientists
perform mathematical modeling on the most basic neurons and build artificial
neural networks based on the certain hierarchical relationship of neurons,
enabling artificial neural networks to learn knowledge, adjust their internal
structures through learning and training, and thereby achieve various complex
computations.
1.2 Operator Basics
A deep learning algorithm consists of multiple compute units referred to as
operators (Ops). In network models, an operator describes the compute logic of
the layer, for example, the convolution layer that performs convolution and the
fully-connected (FC) layer that multiplies the input by a weight matrix.
The following introduces some basic terms about operators.
Operator Name
The name of an operator identifies the operator on a network, and therefore must
be unique on a network. The example network has operators Conv1, Pool1, and
Conv2. Conv1 and Conv2 are both of the convolution type, and each indicates a
separate convolution operation.
Figure 1-1 Example network topology
Operator Type
Each operator is of a specific type. For example, the convolution operator is of the
convolution type. A network can have different operators of the same type.
Tensor
Tensors are used to represent the input data and output data in operator
computations. TensorDesc (the tensor descriptor) describes the input data and
output data. Table 1-1 describes the attributes of the TensorDesc struct.
Table 1-1 Description of the TensorDesc attributes
- shape: Specifies the shape of a tensor, for example, (10,), (1024,1024), or
(2,3,4). For details, see Shape.
Default: none
Format: (i1, i2, ..., in), where i1 to in are positive integers.
- dtype: Specifies the data type of a tensor object.
Default: none
Value range: float16, float32, int8, int16, int32, uint8, uint16, bool
- format: Specifies the data layout format. For details, see Format.
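The three attributes above can be pictured as a simple record. The struct below is only a simplified illustration for this table, not the actual CANN TensorDesc class (see 9.4.4 Class TensorDesc for the real API):

```cpp
#include <string>
#include <vector>

// Simplified illustration of the three TensorDesc attributes from Table 1-1.
// The real CANN TensorDesc class has a richer interface; this sketch only
// mirrors the shape/dtype/format triple described in the text.
struct TensorDescSketch {
    std::vector<long> shape;  // e.g. {2, 3, 4}
    std::string dtype;        // e.g. "float16"
    std::string format;       // e.g. "NCHW"
};
```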
Format
In the deep learning framework, n-dimensional data is stored in an n-dimensional
array. For example, a feature map of a convolutional neural network is stored in
a four-dimensional array. The four dimensions are batch size (N), height (H),
width (W), and channels (C), respectively.
Data can be stored only in linear mode because the dimensions have a fixed
order. Different deep learning frameworks store feature maps in different layouts.
For example, Caffe uses the layout [Batch, Channels, Height, Width], that is, NCHW,
while TensorFlow uses the layout [Batch, Height, Width, Channels], that is, NHWC.
As shown in Figure 1-2, for an RGB image, the pixel values of each channel are
clustered in sequence as RRRGGGBBB with the NCHW layout. However, with the
NHWC layout, the pixel values are interleaved as RGBRGBRGB.
Figure 1-2 NCHW and NHWC
Shape
The shape of a tensor is described in the format of (D0, D1, ..., Dn – 1), where D0
to Dn – 1 are positive integers.
For example, the shape (3, 4) indicates a 3 x 4 matrix, where the first dimension
has three elements, and the second dimension has four elements.
The number count in the round bracket equals the dimension count of the
tensor. The first element depends on the element count in the outermost square
brackets, the second element depends on the element count in the second-level
square brackets, and so on. See the following examples.
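One consequence worth noting: the total element count of a tensor is the product of its shape entries, for example 3 x 4 = 12 for shape (3, 4). A minimal sketch (the helper name is illustrative, not a CANN API):

```cpp
#include <cstddef>
#include <vector>

// Illustrative helper: the element count of a tensor equals the product of
// its shape dimensions, e.g. shape (3, 4) -> 12 elements.
inline std::size_t ElementCount(const std::vector<std::size_t> &shape) {
    std::size_t count = 1;
    for (std::size_t dim : shape) {
        count *= dim;  // multiply the extents of all dimensions
    }
    return count;
}
```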
The tensor shape has its physical meanings:
A tensor with shape (4, 20, 20, 3) indicates four (corresponding to the 4 in the
shape) 20 x 20 pictures (corresponding to the two 20s in the shape), each pixel of
which contains the red, green, and blue color components (corresponding to the
3 in the shape).
Figure 1-3 Physical meanings of tensor shape
In programming, the shape can be simply understood as the bounds of the nested
loops over each dimension of a tensor. For example, to operate on tensor A with
shape (4, 20, 20, 3), the loop statement is as follows.
produce A {
for (i, 0, 4) {
for (j, 0, 20) {
for (p, 0, 20) {
for (q, 0, 3) {
A[((((((i*20) + j)*20) + p)*3) + q)] = a_tensor[((((((i*20) + j)*20) + p)*3) + q)]
}
}
}
}
}
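The nested loops above can be reproduced in ordinary C++. The helper below (its name is illustrative, not part of any CANN API) computes the same flattened row-major offset ((i*20 + j)*20 + p)*3 + q for a tensor with shape (4, 20, 20, 3):

```cpp
#include <array>
#include <cstddef>

// Illustrative helper: row-major flattened offset into a 4-D tensor,
// matching the index expression in the loop example above.
// shape = {N, H, W, C}; (i, j, p, q) index those four dimensions.
inline std::size_t Offset(const std::array<std::size_t, 4> &shape,
                          std::size_t i, std::size_t j,
                          std::size_t p, std::size_t q) {
    // Each step multiplies by the next dimension's extent, then adds that index.
    return ((i * shape[1] + j) * shape[2] + p) * shape[3] + q;
}
```

For example, element (1, 0, 0, 0) of a (4, 20, 20, 3) tensor lives at offset 1 x 20 x 20 x 3 = 1200.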
Axis
An axis is denoted by the index of a dimension of a tensor. For a 2D tensor with
five rows and six columns, that is, with shape (5, 6), axis 0 represents the first
dimension in the tensor, that is, the rows; axis 1 represents the second dimension
of the tensor, that is, the columns.
For example, for tensor [[[1, 2],[3, 4]], [[5, 6],[7, 8]]] with shape (2, 2, 2), axis 0
represents data in the first dimension, that is, matrices [[1, 2],[3, 4]] and [[5, 6],
[7, 8]]; axis 1 represents data in the second dimension, that is, arrays [1, 2], [3, 4],
[5, 6], and [7, 8]; and axis 2 represents data in the third dimension, that is,
numbers 1, 2, 3, 4, 5, 6, 7, and 8.
The axes of an n-dimensional tensor include 0, 1, 2, ..., and n – 1. A negative axis
is interpreted as indexing from the end.
Figure 1-4 Axis diagram
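Indexing from the end means a negative axis can be normalized by adding the tensor's dimension count, so axis -1 of a 3-D tensor is axis 2. A minimal sketch of that rule (the helper name is assumed, not a CANN API):

```cpp
#include <stdexcept>

// Illustrative helper: map an axis in [-dim_count, dim_count - 1] to its
// non-negative form in [0, dim_count - 1], as described for negative axes.
inline int NormalizeAxis(int axis, int dim_count) {
    if (axis < -dim_count || axis >= dim_count) {
        throw std::out_of_range("axis out of range");
    }
    return axis < 0 ? axis + dim_count : axis;  // e.g. -1 with dim_count 3 -> 2
}
```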
Weight
The input data is multiplied by a weight value in the compute unit. For example,
for a two-input operator, an associated weight value is allocated to each of the
inputs. Generally, data of more importance is assigned a greater weight value,
while a feature indicated by data with a zero weight can be ignored.
As shown in Figure 1-5, in the compute unit, input X1 is multiplied by its
associated weight W1, that is, X1 * W1.
Figure 1-5 Weight computation example
Bias
A bias is another linear component to be applied to the input data, in addition to
a weight. The bias is added to the product of the input and its weight.
As shown in Figure 1-6, in the compute unit, input X1 is multiplied by its
associated weight W1, and then its associated bias B1 is added, that is, X1 * W1 + B1.
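The computation X1 * W1 + B1 in the figure generalizes to a weighted sum over all inputs plus the bias. A minimal sketch (the function name is illustrative, not a CANN API):

```cpp
#include <cstddef>
#include <vector>

// Illustrative compute unit: multiply each input Xi by its weight Wi,
// accumulate the products, then add the bias, i.e. sum(Xi * Wi) + B.
inline float ComputeUnit(const std::vector<float> &x,
                         const std::vector<float> &w, float bias) {
    float acc = bias;
    for (std::size_t i = 0; i < x.size() && i < w.size(); ++i) {
        acc += x[i] * w[i];
    }
    return acc;
}
```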
2 AI CPU Introduction
Figure 2-1 System architecture
The following components are involved in building and executing AI CPU
operators:
● Graph Engine (GE): a unified IR interface provided by Huawei based on the
Ascend AI Software Stack for interfacing with different machine learning
frameworks, such as TensorFlow and PyTorch. GE implements the preparation,
partition, optimization, compilation, loading, execution, and management of
the network topology, or the graph.
●AI CPU Engine: interfaces with GE, provides the AI CPU operator information
library, and implements operator registration, operator memory allocation
calculation, subgraph optimization, and task generation.
●AI CPU Schedule: works with the Task Schedule to schedule and execute NN
models.
●AI CPU Processor: completes operator computations and provides the operator
implementation library for implementing the execution of AI CPU operators.
●Data Processor: preprocesses data of training samples in training scenarios.
2.2 Building and Running an Operator
Logical Architecture for Building and Running an Operator
A complete AI CPU operator consists of four parts: operator prototype definition,
operator adaptation plug-in of the corresponding open-source framework, operator
information library definition, and operator implementation.
Figure 2-2 shows the logical architecture of building and running a developed
operator on the Ascend AI Processor hardware platform.
Figure 2-2 Logical architecture for building and running an operator
TFAdapter is used only for training based on the TensorFlow framework.
The modules in the preceding figure are described as follows.
Operator implementation: The operator class implementation includes the
operator definition and the operator computation implementation.
Operator plug-in: In the custom operator development scenario based on a
third-party framework (such as TensorFlow or Caffe), after developing the
implementation code of the custom operator, you need to develop an
adaptation plug-in to map the third-party operator to an operator
supported by the Ascend AI Processor and register the operator
information with GE. To run a network trained on a third-party
framework, the operator plug-in information in GE is loaded and
called to parse and map the operators on the network to operators
supported by the Ascend AI Processor.
Operator prototype library: The operator prototype definition specifies the
constraints on an operator that runs on the Ascend AI Processor, mainly
reflecting the mathematical meanings of the operator. The constraints include
defining the operator inputs, outputs, and other attributes, verifying
arguments, and inferring the shape. During network execution, GE
calls the verification API of the operator prototype library to verify
operator arguments. If the verification passes, GE infers the output
shape and dtype of each node by calling the inference function of the
operator prototype library and allocates static memory for the result
tensor.
Operator information library: The operator information library mainly reflects
the restrictions on the physical implementation of operators on the Ascend AI
Processor, including the input and output names and data types. During network
execution, AI CPU Engine performs basic verification and operator
matching based on the operator information in the operator
information library.
Building an Operator
Figure 2-3 shows the workflow of building an AI CPU operator.
Figure 2-3 Building an AI CPU operator
1. Deliver a third-party network model to GE.
For TensorFlow-based online training, TF Adapter is called to generate the
source TensorFlow model, which is then delivered to GE. For AscendCL-based
model inference, the source model is directly delivered to GE.
The topology of a network model is referred to as a graph.
2. GE calls the operator plug-in to map operators in the source network model
to operators supported by the Ascend AI Processor, so that the original
TensorFlow/Caffe graph can be parsed into a graph supported by the Ascend
AI Processor.
3. GE calls the verification API of the operator prototype library to verify
operator arguments. If the verification passes, GE infers the output shape and
dtype of each node by calling the inference function of the operator
prototype library and allocates memory for the result tensor.
4. GE delivers the entire graph to AI CPU Engine. AI CPU Engine reads the
operator information library, looks up an appropriate format for the operator,
and returns the format to GE.
5. GE partitions the graph into subgraphs and delivers the subgraphs to AI CPU
Engine. AI CPU Engine optimizes the subgraphs and returns the optimized
subgraphs to GE.
6. GE builds the graph (including memory and stream allocation) and sends a
genTask request to AI CPU Engine. Then, AI CPU Engine returns the taskinfo
of the operator to GE. After the graph build process is complete, a model file
that adapts to the Ascend AI Processor is generated.
3 Operator Development Workflow

Action: Environment setup
Description: Set up the development and operating environment required for
operator development, execution, and verification.
See Also: 4 Operator Development Preparations

Action: Operator analysis
Description: Analyze the operator, specify its functionality, input, and output,
and determine the operator type and the name of the OPP file generated
after the operator is built.
See Also: 4.2 Operator Analysis

Action: Project creation
Description: Create a custom operator project.
See Also: 4.3 Project Creation
Action: Operator code implementation
Description: Implement the compute logic of the operator.
See Also: 5.1 Operator Code Implementation

Action: Operator prototype definition
Description: Implement the operator prototype definition file, which specifies
the constraints on an operator that runs on the Ascend AI Processor, mainly
reflecting the mathematical meanings of the operator. The constraints include
defining the operator inputs, outputs, attributes, and value ranges, verifying
arguments, and inferring the shape. The information defined by the prototype is
registered with the operator prototype library of GE. During offline model
conversion, GE calls the verification API of the operator prototype library to
verify operator arguments. If the verification passes, GE infers the output
shape and dtype of each node by calling the inference function of the operator
prototype library and allocates static memory for the result tensor.
See Also: 5.2 Operator Prototype Definition

Action: Operator information definition
Description: The operator information configuration file is used to register the
operator information with the operator information library, including the
OpType and input/output dtype and name. During network execution, AI CPU
Engine performs basic verification and operator matching based on the
operator information in the operator information library.

Action: Operator plug-in implementation
Description: If your custom operator is developed based on a third-party
framework (such as TensorFlow or Caffe), you need to develop a plug-in to
map the operator to one that adapts to the Ascend AI Processor.
Action: Operator project building and deployment
Description:
● Operator build: builds the operator plug-in implementation file, prototype
definition file, and information definition file into the operator plug-in
library, operator prototype library, and operator information library,
respectively.
● Operator deployment: deploys the operator implementation file, plug-in
library, prototype library, and information library to the system OPP, that is,
a corresponding directory in the opp directory.
In the command line, you can use the build script of the sample project for
one-click compilation. A custom OPP will be generated. Specify the opp
directory and execute the OPP to deploy your custom operator.
See Also: 5.5 Operator Project Building and Deployment

Action: Operator ST
Description: System Testing (ST) verifies the operator
4 Operator Development Preparations
4.1 Environment Setup
4.2 Operator Analysis
4.3 Project Creation
4.1 Environment Setup
● Before custom operator development, you need to set up the development
environment and operating environment by referring to the CANN Software
Installation Guide.
a. Select an installation scheme and install the required hardware on the
development and operating devices.
b. Deploy and install Toolkit and configure environment variables in the
development environment.
c. Install the inference software and configure environment variables in the
operating environment.
Once the development environment is set up, you can obtain the API header
files and the library files required for building and running operators.
Once the operating environment is set up, you can run the executable file
generated after the build.
AI CPU operator development depends on the AI CPU OPP. During
environment setup, make sure to install the AI CPU OPP.
● If you intend to develop custom operators in MindStudio, install MindStudio
by referring to the MindStudio User Guide.
4.2 Operator Analysis
Before developing an AI CPU operator, you need to determine the operator
function, input, output, development mode, operator type (OpType),
implementation function name, and more.
Step 1 Specify the operator function and mathematical expression.
Take the Add operator as an example. The mathematical expression of the Add
operator is as follows:
z=x+y
The Add operator adds two inputs and returns a result.
Step 2 Specify the inputs and output.
● The Add operator has two inputs, x and y, and outputs the result z.
● The operator supports multiple input data types, and the output has the
same data type as the inputs.
● The operator inputs support all shapes. The output has the same shape as the
inputs.
● The operator inputs support the following formats: NCHW, NC1HWC0,
NHWC, and ND.
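The analysis above pins down the compute logic before any framework code is written. The sketch below shows only that element-wise logic z = x + y over same-shape inputs; the surrounding AI CPU kernel class and registration described later in this guide are deliberately omitted, and the function name is illustrative:

```cpp
#include <cstddef>
#include <vector>

// Sketch of the Add operator's compute logic only: z = x + y element-wise
// over two same-shape inputs. Not the actual AI CPU kernel implementation.
template <typename T>
std::vector<T> AddCompute(const std::vector<T> &x, const std::vector<T> &y) {
    std::vector<T> z(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        z[i] = x[i] + y[i];
    }
    return z;
}
```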
Step 3 Specify the operator type (OpType) and the implementation file name.
● Name the OpType in upper camel case, indicating the separation of words
with a single capitalized letter.
● Name the operator file after the OpType, converting the name as follows:
– Convert the first uppercase letter to a lowercase letter.
Example: Abc -> abc
– Replace each uppercase letter following lowercase letters with an
underscore (_) and a lowercase letter.
Example: AbcDef -> abc_def
– Uppercase letters following a digit or an uppercase letter are regarded as
a semantic string. If there is a lowercase letter after this string, replace
the last uppercase letter in this string with an underscore (_) and a
lowercase letter, and convert the other uppercase letters into lowercase
letters. If there is no lowercase letter after the string, directly convert the
string into lowercase letters.
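The three conversion rules above can be condensed into one pass over the OpType string: insert an underscore before an uppercase letter that follows a lowercase letter, or before the last uppercase letter of an uppercase/digit run when a lowercase letter follows it. The helper below is a sketch of that reading of the rules (the function name is hypothetical, not a CANN tool):

```cpp
#include <cctype>
#include <string>

// Illustrative conversion of an upper-camel-case OpType to its operator file
// name, following the three rules above: Abc -> abc, AbcDef -> abc_def,
// ABCDef -> abc_def.
std::string OpTypeToFileName(const std::string &op_type) {
    std::string name;
    for (std::size_t i = 0; i < op_type.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(op_type[i]);
        if (std::isupper(c) && i > 0) {
            unsigned char prev = static_cast<unsigned char>(op_type[i - 1]);
            bool next_is_lower =
                (i + 1 < op_type.size()) &&
                std::islower(static_cast<unsigned char>(op_type[i + 1]));
            // Rule 2: underscore before an uppercase letter after lowercase letters.
            // Rule 3: inside an uppercase/digit run, underscore only before the
            // last uppercase letter when a lowercase letter follows it.
            if (std::islower(prev) || next_is_lower) {
                name += '_';
            }
        }
        name += static_cast<char>(std::tolower(c));
    }
    return name;
}
```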
Before developing an operator, you need to create an operator project.
MindStudio Mode
For details about how to create an operator project in MindStudio, see "Custom
Operator Development > Project Creation" in the MindStudio User Guide.
Command Line Mode
Click here to download the sample package that matches your CANN version in
use. Find the sample in the samples/cplusplus/level1_single_api/4_op_dev/1_custom_op directory.
Append your own custom operator to the sample project. The sample project
provides some AI CPU and TBE custom operator samples developed from their
Caffe and TensorFlow counterparts.
Note: If you do not have enough permission to obtain the code, contact Huawei technical
support to apply for joining the Ascend community.
The directory structure of the operator project is as follows. Develop the operator
deliverables in the corresponding directory accordingly.
├── cpukernel
│   ├── impl                // Directory of the operator implementation file
│   ├── op_info_cfg
│   │   ├── aicpu_kernel
│   │   │   ├── xx.ini      // Operator information library file