No part of this document may be reproduced or transmitted in any form or by any means without prior
written consent of Huawei Technologies Co., Ltd.
Trademarks and Permissions
and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
CANN
AI CPU Custom Operator Development Guide (Inference)

About This Document
Overview
An AI CPU operator is an operation of complete compute logic that runs on AI
CPU, one of the compute engines in the Ascend AI Processor. You might need to
develop a custom AI CPU operator in the following cases:
● During neural network (NN) training or inference, if you find an unsupported
operator when converting a third-party open-source network to adapt to the
Ascend AI Processor, a custom AI CPU operator can help you streamline the
model execution process and improve the functionality commissioning
efficiency. After the functionality commissioning is passed, convert the custom
AI CPU operator into a TBE operator for performance commissioning.
● In certain scenarios, it is impossible to implement custom operators that run
on AI Core. For example, some operators require int64 data, which is
incompatible with AI Core instructions. When such an operator is not the
performance bottleneck of your network, you can develop a custom AI CPU
operator instead for Ascend AI Processor support.
At the moment, AI CPU custom operators can run only in EP standard form.
Intended Audience
This document is intended for developers who develop custom AI CPU operators.
After reading this document, you will be able to:
● Describe the principles and workflow of AI CPU operator development.
● Develop custom AI CPU operators based on the samples provided in this
document.
To better understand this document, you need to have:
2.2 Building and Running an Operator
3 Operator Development Workflow
4 Operator Development Preparations
5.4.1 Adaptation Plug-in Development (TensorFlow)
5.4.2 Adaptation Plug-in Development (Caffe)
5.5 Operator Project Building and Deployment
5.5.2 OPP Deployment
9.2 AI CPU APIs
9.2.2 Class CpuKernelContext
9.2.2.1 CpuKernelContext Constructor and Destructor
9.2.3 Class TensorShape
9.2.4 Class Tensor
9.2.5 Class AttrValue
9.2.7 Data Types
9.3.6.1 Class OperatorFactory
9.3.6.2 Class OperatorCreatorRegister
9.3.6.2.1 Constructor and Destructor
9.3.6.3 Class InferShapeFuncRegister
9.3.6.3.1 Constructor and Destructor
9.3.6.4 Class InferFormatFuncRegister
9.3.6.4.1 Constructor and Destructor
9.3.6.5 Class VerifyFuncRegister
9.3.6.5.1 Constructor and Destructor
9.3.6.6 Class InferenceContext
9.3.6.6.1 InferenceContext Constructor and Destructor
9.3.6.7 Class ShapeAndType
9.3.6.7.1 Constructor and Destructor
9.4.2 Class Operator
9.4.2.1 Constructor and Destructor
9.4.3 Class Tensor
9.4.3.1 Constructor and Destructor
9.4.4 Class TensorDesc
9.4.4.1 Constructor and Destructor
9.4.5 Class Shape
9.4.5.1 Constructor and Destructor
9.4.6 Class AttrValue
9.4.6.1 Constructor and Destructor
9.4.7 Data Type and Enumerated Value
9.5.2 Class OpRegistrationData
9.5.2.2 Constructor and Destructor
9.5.3 Class OpReceiver
9.5.3.1 Constructor and Destructor
1 Quick Start
1.1 Neural Network Introduction
1.2 Operator Basics
1.1 Neural Network Introduction
To enable computers to master knowledge like human beings, a multi-layer
connection network needs to be constructed to define a complex object. After
iterative computing and training of the network, it can extract object features.
Generally, this method is called deep learning (DL). With uninterrupted
development, deep learning has displayed its tremendous application value and is
receiving increasing attention from the industry and academia. Deep learning has
achieved remarkable progress in image, voice, natural language processing, big
data feature extraction, and ad click-through rate estimation. As a result, multiple
infrastructures, such as Caffe, MXNet, and TensorFlow, have been developed to
promote deep learning across fields.
Deep neural network research fuels rapid development of neural network models,
enabling them to complete more and more complex processing tasks in a wider
range of fields. With the rapid development of semiconductor chips and computer
technologies for decades, ever faster and more energy-efficient computing resources have
been provided for neural network models and data, such as CPUs, GPUs, TPUs,
and the latest Ascend AI Processor launched by Huawei.
Artificial neural network (ANN) may also be referred to as neural network (NN)
for short, which is an important branch of machine learning (ML). Scientists
perform mathematical modeling on the most basic neurons and build artificial
neural networks based on the certain hierarchical relationship of neurons,
enabling artificial neural networks to learn knowledge, adjust their internal
structures through learning and training, and thereby achieve various complex
computations.
1.2 Operator Basics
A deep learning algorithm consists of multiple compute units referred to as
operators (Ops). In network models, an operator describes the compute logic of
the layer, for example, the convolution layer that performs convolution and the
fully-connected (FC) layer that multiplies the input by a weight matrix.
The following introduces some basic terms about operators.
Operator Name
The name of an operator identifies the operator on a network, and therefore must
be unique on a network. The example network has operators Conv1, Pool1, and
Conv2. Conv1 and Conv2 are both of the convolution type, and each indicates a
separate convolution operation.
Figure 1-1 Example network topology
Operator Type
Each operator is of a specific type. For example, the convolution operator is of the
convolution type. A network can have different operators of the same type.
Tensor
Tensors are used to represent the input data and output data in operator
computations. TensorDesc (the tensor descriptor) describes the input data and
output data. Table 1-1 describes the attributes of the TensorDesc struct.
Table 1-1 Description of the TensorDesc attributes
- shape: Specifies the shape of a tensor, for example, (10,), (1024,1024), or
(2,3,4). For details, see Shape.
Default: none
Format: (i1, i2, ..., in), where i1 to in are positive integers.
- dtype: Specifies the data type of a tensor object.
Default: none
Value range: float16, float32, int8, int16, int32, uint8, uint16, bool
- format: Specifies the data layout format. For details, see Format.
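The three attributes above can be pictured as a simple record. The struct below is only a simplified illustration for this table, not the actual CANN TensorDesc class (see 9.4.4 Class TensorDesc for the real API):

```cpp
#include <string>
#include <vector>

// Simplified illustration of the three TensorDesc attributes from Table 1-1.
// The real CANN TensorDesc class has a richer interface; this sketch only
// mirrors the shape/dtype/format triple described in the text.
struct TensorDescSketch {
    std::vector<long> shape;  // e.g. {2, 3, 4}
    std::string dtype;        // e.g. "float16"
    std::string format;       // e.g. "NCHW"
};
```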
Format
In the deep learning framework, n-dimensional data is stored in an n-dimensional
array. For example, a feature map of a convolutional neural network is stored in
a four-dimensional array. The four dimensions are batch size (N), height (H),
width (W), and channels (C), respectively.
Data can be stored only in linear mode because the dimensions have a fixed
order. Different deep learning frameworks store feature maps in different layouts.
For example, Caffe uses the layout [Batch, Channels, Height, Width], that is, NCHW,
while TensorFlow uses the layout [Batch, Height, Width, Channels], that is, NHWC.
As shown in Figure 1-2, for an RGB image, the pixel values of each channel are
clustered in sequence as RRRGGGBBB with the NCHW layout. However, with the
NHWC layout, the pixel values are interleaved as RGBRGBRGB.
Figure 1-2 NCHW and NHWC
Shape
The shape of a tensor is described in the format of (D0, D1, ..., Dn – 1), where D0
to Dn – 1 are positive integers.
For example, the shape (3, 4) indicates a 3 x 4 matrix, where the first dimension
has three elements, and the second dimension has four elements.
The number count in the round bracket equals the dimension count of the
tensor. The first element depends on the element count in the outermost square
brackets, the second element depends on the element count in the second-level
square brackets, and so on. See the following examples.
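One consequence worth noting: the total element count of a tensor is the product of its shape entries, for example 3 x 4 = 12 for shape (3, 4). A minimal sketch (the helper name is illustrative, not a CANN API):

```cpp
#include <cstddef>
#include <vector>

// Illustrative helper: the element count of a tensor equals the product of
// its shape dimensions, e.g. shape (3, 4) -> 12 elements.
inline std::size_t ElementCount(const std::vector<std::size_t> &shape) {
    std::size_t count = 1;
    for (std::size_t dim : shape) {
        count *= dim;  // multiply the extents of all dimensions
    }
    return count;
}
```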
The tensor shape has its physical meanings:
A tensor with shape (4, 20, 20, 3) indicates four (corresponding to the 4 in the
shape) 20 x 20 pictures (corresponding to the two 20s in the shape), each pixel of
which contains the red, green, and blue color components (corresponding to the
3 in the shape).
Figure 1-3 Physical meanings of tensor shape
In programming, the shape can be simply understood as the bounds of the nested
loops over each dimension of a tensor. For example, to operate on tensor A with
shape (4, 20, 20, 3), the loop statement is as follows.
produce A {
for (i, 0, 4) {
for (j, 0, 20) {
for (p, 0, 20) {
for (q, 0, 3) {
A[((((((i*20) + j)*20) + p)*3) + q)] = a_tensor[((((((i*20) + j)*20) + p)*3) + q)]
}
}
}
}
}
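The nested loops above can be reproduced in ordinary C++. The helper below (its name is illustrative, not part of any CANN API) computes the same flattened row-major offset ((i*20 + j)*20 + p)*3 + q for a tensor with shape (4, 20, 20, 3):

```cpp
#include <array>
#include <cstddef>

// Illustrative helper: row-major flattened offset into a 4-D tensor,
// matching the index expression in the loop example above.
// shape = {N, H, W, C}; (i, j, p, q) index those four dimensions.
inline std::size_t Offset(const std::array<std::size_t, 4> &shape,
                          std::size_t i, std::size_t j,
                          std::size_t p, std::size_t q) {
    // Each step multiplies by the next dimension's extent, then adds that index.
    return ((i * shape[1] + j) * shape[2] + p) * shape[3] + q;
}
```

For example, element (1, 0, 0, 0) of a (4, 20, 20, 3) tensor lives at offset 1 x 20 x 20 x 3 = 1200.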
Axis
An axis is denoted by the index of a dimension of a tensor. For a 2D tensor with
five rows and six columns, that is, with shape (5, 6), axis 0 represents the first
dimension in the tensor, that is, the rows; axis 1 represents the second dimension
of the tensor, that is, the columns.
For example, for tensor [[[1, 2],[3, 4]], [[5, 6],[7, 8]]] with shape (2, 2, 2), axis 0
represents data in the first dimension, that is, matrices [[1, 2],[3, 4]] and [[5, 6],
[7, 8]]; axis 1 represents data in the second dimension, that is, arrays [1, 2], [3, 4],
[5, 6], and [7, 8]; and axis 2 represents data in the third dimension, that is,
numbers 1, 2, 3, 4, 5, 6, 7, and 8.
The axes of an n-dimensional tensor include 0, 1, 2, ..., and n – 1. A negative axis
is interpreted as indexing from the end.
Figure 1-4 Axis diagram
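Indexing from the end means a negative axis can be normalized by adding the tensor's dimension count, so axis -1 of a 3-D tensor is axis 2. A minimal sketch of that rule (the helper name is assumed, not a CANN API):

```cpp
#include <stdexcept>

// Illustrative helper: map an axis in [-dim_count, dim_count - 1] to its
// non-negative form in [0, dim_count - 1], as described for negative axes.
inline int NormalizeAxis(int axis, int dim_count) {
    if (axis < -dim_count || axis >= dim_count) {
        throw std::out_of_range("axis out of range");
    }
    return axis < 0 ? axis + dim_count : axis;  // e.g. -1 with dim_count 3 -> 2
}
```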
Weight
The input data is multiplied by a weight value in the compute unit. For example,
for a two-input operator, an associated weight value is allocated to each of the
inputs. Generally, data of more importance is assigned a greater weight value,
while a feature indicated by data with a zero weight can be ignored.
As shown in Figure 1-5, in the compute unit, input X1 is multiplied by its
associated weight W1, that is, X1 * W1.
Figure 1-5 Weight computation example
Bias
A bias is another linear component to be applied to the input data, in addition to
a weight. The bias is added to the product of the input and its weight.
As shown in Figure 1-6, in the compute unit, input X1 is multiplied by its
associated weight W1, and then its associated bias B1 is added, that is, X1 * W1 + B1.
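The computation X1 * W1 + B1 in the figure generalizes to a weighted sum over all inputs plus the bias. A minimal sketch (the function name is illustrative, not a CANN API):

```cpp
#include <cstddef>
#include <vector>

// Illustrative compute unit: multiply each input Xi by its weight Wi,
// accumulate the products, then add the bias, i.e. sum(Xi * Wi) + B.
inline float ComputeUnit(const std::vector<float> &x,
                         const std::vector<float> &w, float bias) {
    float acc = bias;
    for (std::size_t i = 0; i < x.size() && i < w.size(); ++i) {
        acc += x[i] * w[i];
    }
    return acc;
}
```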
2 AI CPU Introduction
Figure 2-1 System architecture
The following components are involved in building and executing AI CPU
operators:
● Graph Engine (GE): a unified IR interface provided by Huawei based on the
Ascend AI Software Stack for interfacing with different machine learning
frameworks, such as TensorFlow and PyTorch. GE implements the preparation,
partition, optimization, compilation, loading, execution, and management of
the network topology, or the graph.
●AI CPU Engine: interfaces with GE, provides the AI CPU operator information
library, and implements operator registration, operator memory allocation
calculation, subgraph optimization, and task generation.
●AI CPU Schedule: works with the Task Schedule to schedule and execute NN
models.
●AI CPU Processor: completes operator computations and provides the operator
implementation library for implementing the execution of AI CPU operators.
●Data Processor: preprocesses data of training samples in training scenarios.
2.2 Building and Running an Operator
Logical Architecture for Building and Running an Operator
A complete AI CPU operator consists of four parts: operator prototype definition,
operator adaptation plug-in of the corresponding open-source framework, operator
information library definition, and operator implementation.
Figure 2-2 shows the logical architecture of building and running a developed
operator on the Ascend AI Processor hardware platform.
Figure 2-2 Logical architecture for building and running an operator
TFAdapter is used only for training based on the TensorFlow framework.
The modules in the preceding figure are described as follows.
Operator implementation: The operator class implementation includes the
operator definition and the operator computation implementation.
Operator plug-in: In the custom operator development scenario based on a
third-party framework (such as TensorFlow or Caffe), after developing the
implementation code of the custom operator, you need to develop an
adaptation plug-in to map the third-party operator to an operator
supported by the Ascend AI Processor and register the operator
information with GE. To run a network trained on a third-party
framework, the operator plug-in information in GE is loaded and
called to parse and map the operators on the network to operators
supported by the Ascend AI Processor.
Operator prototype library: The operator prototype definition specifies the
constraints on an operator that runs on the Ascend AI Processor, mainly
reflecting the mathematical meanings of the operator. The constraints include
defining the operator inputs, outputs, and other attributes, verifying
arguments, and inferring the shape. During network execution, GE
calls the verification API of the operator prototype library to verify
operator arguments. If the verification passes, GE infers the output
shape and dtype of each node by calling the inference function of the
operator prototype library and allocates static memory for the result
tensor.
Operator information library: The operator information library mainly reflects
the restrictions on the physical implementation of operators on the Ascend AI
Processor, including the input and output names and data types. During network
execution, AI CPU Engine performs basic verification and operator
matching based on the operator information in the operator
information library.
Building an Operator
Figure 2-3 shows the workflow of building an AI CPU operator.
Figure 2-3 Building an AI CPU operator
1. Deliver a third-party network model to GE.
For TensorFlow-based online training, TF Adapter is called to generate the
source TensorFlow model, which is then delivered to GE. For AscendCL-based
model inference, the source model is directly delivered to GE.
The topology of a network model is referred to as a graph.
2. GE calls the operator plug-in to map operators in the source network model
to operators supported by the Ascend AI Processor, so that the original
TensorFlow/Caffe graph can be parsed into a graph supported by the Ascend
AI Processor.
3. GE calls the verification API of the operator prototype library to verify
operator arguments. If the verification passes, GE infers the output shape and
dtype of each node by calling the inference function of the operator
prototype library and allocates memory for the result tensor.
4. GE delivers the entire graph to AI CPU Engine. AI CPU Engine reads the
operator information library, looks up an appropriate format for the operator,
and returns the format to GE.
5. GE partitions the graph into subgraphs and delivers the subgraphs to AI CPU
Engine. AI CPU Engine optimizes the subgraphs and returns the optimized
subgraphs to GE.
6. GE builds the graph (including memory and stream allocation) and sends a
genTask request to AI CPU Engine. Then, AI CPU Engine returns the taskinfo
of the operator to GE. After the graph build process is complete, a model file
that adapts to the Ascend AI Processor is generated.
3 Operator Development Workflow

Action: Environment setup
Description: Set up the development and operating environment required for
operator development, execution, and verification.
See Also: 4 Operator Development Preparations

Action: Operator analysis
Description: Analyze the operator, specify its functionality, input, and output,
and determine the operator type and the name of the OPP file generated
after the operator is built.
See Also: 4.2 Operator Analysis

Action: Project creation
Description: Create a custom operator project.
See Also: 4.3 Project Creation
Action: Operator code implementation
Description: Implement the compute logic of the operator.
See Also: 5.1 Operator Code Implementation

Action: Operator prototype definition
Description: Implement the operator prototype definition file, which specifies
the constraints on an operator that runs on the Ascend AI Processor, mainly
reflecting the mathematical meanings of the operator. The constraints include
defining the operator inputs, outputs, attributes, and value ranges, verifying
arguments, and inferring the shape. The information defined by the prototype is
registered with the operator prototype library of GE. During offline model
conversion, GE calls the verification API of the operator prototype library to
verify operator arguments. If the verification passes, GE infers the output
shape and dtype of each node by calling the inference function of the operator
prototype library and allocates static memory for the result tensor.
See Also: 5.2 Operator Prototype Definition

Action: Operator information definition
Description: The operator information configuration file is used to register the
operator information with the operator information library, including the
OpType and input/output dtype and name. During network execution, AI CPU
Engine performs basic verification and operator matching based on the
operator information in the operator information library.

Action: Operator plug-in implementation
Description: If your custom operator is developed based on a third-party
framework (such as TensorFlow or Caffe), you need to develop a plug-in to
map the operator to one that adapts to the Ascend AI Processor.
Action: Operator project building and deployment
Description:
● Operator build: builds the operator plug-in implementation file, prototype
definition file, and information definition file into the operator plug-in
library, operator prototype library, and operator information library,
respectively.
● Operator deployment: deploys the operator implementation file, plug-in
library, prototype library, and information library to the system OPP, that is,
a corresponding directory in the opp directory.
In the command line, you can use the build script of the sample project for
one-click compilation. A custom OPP will be generated. Specify the opp
directory and execute the OPP to deploy your custom operator.
See Also: 5.5 Operator Project Building and Deployment

Action: Operator ST
Description: System Testing (ST) verifies the operator
4 Operator Development Preparations
4.1 Environment Setup
4.2 Operator Analysis
4.3 Project Creation
4.1 Environment Setup
● Before custom operator development, you need to set up the development
environment and operating environment by referring to the CANN Software
Installation Guide.
a. Select an installation scheme and install the required hardware on the
development and operating devices.
b. Deploy and install Toolkit and configure environment variables in the
development environment.
c. Install the inference software and configure environment variables in the
operating environment.
Once the development environment is set up, you can obtain the API header
files and the library files required for building and running operators.
Once the operating environment is set up, you can run the executable file
generated after the build.
AI CPU operator development depends on the AI CPU OPP. During
environment setup, make sure to install the AI CPU OPP.
● If you intend to develop custom operators in MindStudio, install MindStudio
by referring to the MindStudio User Guide.
4.2 Operator Analysis
Before developing an AI CPU operator, you need to determine the operator
function, input, output, development mode, operator type (OpType),
implementation function name, and more.
Step 1 Specify the operator function and mathematical expression.
Take the Add operator as an example. The mathematical expression of the Add
operator is as follows:
z=x+y
The Add operator adds two inputs and returns a result.
Step 2 Specify the inputs and output.
● The Add operator has two inputs, x and y, and outputs the result z.
● The operator supports multiple input data types, and the output has the
same data type as the inputs.
● The operator inputs support all shapes. The output has the same shape as the
inputs.
● The operator inputs support the following formats: NCHW, NC1HWC0,
NHWC, and ND.
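The analysis above pins down the compute logic before any framework code is written. The sketch below shows only that element-wise logic z = x + y over same-shape inputs; the surrounding AI CPU kernel class and registration described later in this guide are deliberately omitted, and the function name is illustrative:

```cpp
#include <cstddef>
#include <vector>

// Sketch of the Add operator's compute logic only: z = x + y element-wise
// over two same-shape inputs. Not the actual AI CPU kernel implementation.
template <typename T>
std::vector<T> AddCompute(const std::vector<T> &x, const std::vector<T> &y) {
    std::vector<T> z(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        z[i] = x[i] + y[i];
    }
    return z;
}
```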
Step 3 Specify the operator type (OpType) and the implementation file name.
● Name the OpType in upper camel case, indicating the separation of words
with a single capitalized letter.
● Name the operator file after the OpType, converting the name as follows:
– Convert the first uppercase letter to a lowercase letter.
Example: Abc -> abc
– Replace each uppercase letter following lowercase letters with an
underscore (_) and a lowercase letter.
Example: AbcDef -> abc_def
– Uppercase letters following a digit or an uppercase letter are regarded as
a semantic string. If there is a lowercase letter after this string, replace
the last uppercase letter in this string with an underscore (_) and a
lowercase letter, and convert the other uppercase letters into lowercase
letters. If there is no lowercase letter after the string, directly convert the
string into lowercase letters.
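The three conversion rules above can be condensed into one pass over the OpType string: insert an underscore before an uppercase letter that follows a lowercase letter, or before the last uppercase letter of an uppercase/digit run when a lowercase letter follows it. The helper below is a sketch of that reading of the rules (the function name is hypothetical, not a CANN tool):

```cpp
#include <cctype>
#include <string>

// Illustrative conversion of an upper-camel-case OpType to its operator file
// name, following the three rules above: Abc -> abc, AbcDef -> abc_def,
// ABCDef -> abc_def.
std::string OpTypeToFileName(const std::string &op_type) {
    std::string name;
    for (std::size_t i = 0; i < op_type.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(op_type[i]);
        if (std::isupper(c) && i > 0) {
            unsigned char prev = static_cast<unsigned char>(op_type[i - 1]);
            bool next_is_lower =
                (i + 1 < op_type.size()) &&
                std::islower(static_cast<unsigned char>(op_type[i + 1]));
            // Rule 2: underscore before an uppercase letter after lowercase letters.
            // Rule 3: inside an uppercase/digit run, underscore only before the
            // last uppercase letter when a lowercase letter follows it.
            if (std::islower(prev) || next_is_lower) {
                name += '_';
            }
        }
        name += static_cast<char>(std::tolower(c));
    }
    return name;
}
```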
Before developing an operator, you need to create an operator project.
MindStudio Mode
For details about how to create an operator project in MindStudio, see "Custom
Operator Development > Project Creation" in the MindStudio User Guide.
Command Line Mode
Click here to download the sample package that matches your CANN version in
use. Find the sample in the samples/cplusplus/level1_single_api/4_op_dev/1_custom_op directory.
Append your own custom operator to the sample project. The sample project
provides some AI CPU and TBE custom operator samples developed from their
Caffe and TensorFlow counterparts.
Note: If you do not have enough permission to obtain the code, contact Huawei technical
support to apply for joining the Ascend community.
The directory structure of the operator project is as follows. Develop the operator
deliverables in the corresponding directory accordingly.
├── cpukernel
│   ├── impl                // Directory of the operator implementation file
│   ├── op_info_cfg
│   │   ├── aicpu_kernel
│   │   │   ├── xx.ini      // Operator information library file