STMicroelectronics STM32H7 User Manual

UM2611

User manual

Artificial Intelligence (AI) and computer vision function pack for STM32H7 microcontrollers

Introduction

FP-AI-VISION1 is a function pack (FP) demonstrating the capability of STM32H7 Series microcontrollers to execute a Convolutional Neural Network (CNN) efficiently for computer vision tasks. FP-AI-VISION1 contains everything needed to build a CNN-based computer vision application on STM32H7 microcontrollers.

FP-AI-VISION1 also demonstrates several memory allocation configurations for the data involved in the application. Each configuration addresses specific requirements in terms of the amount of data handled by the application. Accordingly, FP-AI-VISION1 implements examples describing how to place the different types of data efficiently in both the on-chip and external memories. These examples make it easy to understand which memory allocation best fits a given set of requirements.

This user manual describes the content of the FP-AI-VISION1 function pack and details the steps required to build a CNN-based computer vision application on STM32H7 microcontrollers.

UM2611 - Rev 3 - September 2020


1 General information

 

The FP-AI-VISION1 function pack runs on the STM32H7 microcontrollers based on the Arm® Cortex®-M7 processor.

Note:

Arm is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or elsewhere.

1.1 FP-AI-VISION1 function pack feature overview

•	Runs on the STM32H747I-DISCO board connected with the STM32F4DIS-CAM camera daughterboard
•	Includes three image classification application examples based on CNN:
	–	One food recognition application operating on color (RGB 24 bits) frame images
	–	One person presence detection application operating on color (RGB 24 bits) frame images
	–	One person presence detection application operating on grayscale (8 bits) frame images
•	Includes complete application firmware for camera capture, frame image preprocessing, inference execution and output post-processing
•	Includes examples of integration of both floating-point and 8-bit quantized C models
•	Supports several configurations for data memory placement in order to meet application requirements
•	Includes test and validation firmware in order to test, debug and validate the embedded application
•	Includes capture firmware enabling dataset collection
•	Includes support for file handling (on top of FatFS) on external microSD card


1.2 Software architecture

The top-level architecture of the FP-AI-VISION1 function pack is shown in Figure 1.

Figure 1. FP-AI-VISION1 architecture

[Layered architecture diagram]
•	Applications (food recognition, person presence detection)
•	Middleware level: STM32_AI_Runtime (Neural Network runtime library), STM32_Image (Image processing library), STM32_AI_Utilities (Optimized routines), STM32_Fs (FatFS abstraction), FatFS (Light FAT file system)
•	Drivers: Board support package (BSP), Hardware abstraction layer (HAL)
•	Hardware components: Camera sensor, STM32, LCD
•	Development boards: STM32F4DIS-CAM, STM32H747I-DISCO

1.3 Terms and definitions

Table 1 defines the acronyms used in this document.

 

Table 1. List of acronyms

Acronym		Definition
API		Application programming interface
BSP		Board support package
CNN		Convolutional Neural Network
DMA		Direct memory access
FAT		File allocation table
FatFS		Light generic FAT file system
FP		Function pack
FPS		Frames per second
HAL		Hardware abstraction layer
LCD		Liquid crystal display
MCU		Microcontroller unit
microSD		Micro secure digital
MIPS		Millions of instructions per second
NN		Neural Network
RAM		Random access memory
QVGA		Quarter VGA
SRAM		Static random access memory
VGA		Video graphics array

1.4 Overview of available documents and references

Table 2 lists the complementary references for using FP-AI-VISION1.

 

Table 2. References

ID	Description
[1]	User manual: Getting started with X-CUBE-AI Expansion Package for Artificial Intelligence (AI) (UM2526)
[2]	Reference manual: STM32H745/755 and STM32H747/757 advanced Arm®-based 32-bit MCUs (RM0399)
[3]	MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications: https://arxiv.org/pdf/1704.04861.pdf
[4]	The Food-101 Data Set: https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/
[5]	STM32CubeProgrammer software for programming STM32 products (STM32CubeProg)
[6]	Keras - The Python Deep Learning library: https://keras.io/
[7]	STM32Cube initialization code generator (STM32CubeMX)


2 Building a CNN-based computer vision application on STM32H7

Figure 2 illustrates the different steps to obtain a CNN-based computer vision application running on the STM32H7 microcontrollers.

Figure 2. CNN-based computer vision application build flow

[Build flow diagram]
•	The float CNN model is either converted directly by STM32Cube.AI into 32-bit floating-point C code, or first quantized (quantization tool) into an 8-bit model and then converted into 8-bit integer C code (network.c/h and network_data.c/h, used together with the Neural Network runtime library).
•	The generated C model (float or quantized) is combined with the image preprocessing library, the STM32H7 drivers and the main framework (main.c/h, fp_vision_app.c/h, img_preprocess.c/h), then built into the computer vision application running on STM32H7.

Starting from a floating-point CNN model (designed and trained using a framework such as Keras), the user generates optimized C code (using the STM32Cube.AI tool, [1]) and integrates it in a computer vision framework (provided as part of FP-AI-VISION1) in order to build a computer vision application on STM32H7.

Note:
For users having selected a dual-core MCU like the STM32H747 for their application but running it on the Cortex®-M7 core only: STM32CubeMX does not support the addition of packages like the STM32Cube.AI (X-CUBE-AI) to the project. As a consequence, when using STM32CubeMX along with STM32Cube.AI, a single-core MCU like the STM32H743 must be selected to be able to generate the Neural Network code for the Cortex®-M7 core.

 

 

 

 

 

 

 

 

 

The user has the possibility to select one of two options for generating the C code:
•	Either generating the floating-point C code directly from the floating-point CNN model
•	Or quantizing the floating-point CNN model to obtain an 8-bit model, and subsequently generating the corresponding quantized C code

 

 

 

 

 

 

 

 


For most CNN models, the second option reduces the memory footprint (Flash and RAM) as well as the inference time. The impact on the final output accuracy depends on the CNN model as well as on the quantization process (mainly the test dataset and the quantization algorithm).

As part of the FP-AI-VISION1 function pack, three image classification application examples are provided, including the following material:
•	One food recognition application:
	–	Floating-point Keras model (.h5 file)
	–	8-bit quantized model (.h5 file + .json file) obtained using the STM32Cube.AI (X-CUBE-AI) quantizer
	–	Generated C code in both floating-point and 8-bit quantized format
	–	Example of computer vision application integration based on C code generated by STM32Cube.AI (X-CUBE-AI)
•	Two person presence detection applications:
	–	8-bit quantized models (.tflite file) obtained using the TFLiteConverter tool(1)
	–	Generated C code in 8-bit quantized format
	–	Examples of computer vision application integration based on C code generated by STM32Cube.AI (X-CUBE-AI)

1. TensorFlow is a trademark of Google Inc.

2.1 Integration of the generated code

From a float or quantized model, the user must use the STM32Cube.AI tool (X-CUBE-AI) to generate the corresponding optimized C code.

When using the GUI version of STM32Cube.AI (X-CUBE-AI) with the user's own .ioc file, the following set of files is generated in the output directory:

•	Src\network.c and Inc\network.h: contain the description of the CNN topology
•	Src\network_data.c and Inc\network_data.h: contain the weights and biases of the CNN

Note:
For the network, the user must keep the default name, which is “network”. Otherwise, the user must rename all the functions and macros contained in files ai_interface.c and ai_interface.h. The purpose of the ai_interface.c and ai_interface.h files is to provide an abstraction interface to the NN API.

 

From that point, the user must copy and replace the above generated .c files and .h files respectively into the following directories:
•	\Projects\STM32H747I-DISCO\Applications\<app_name>\CM7\Src
•	\Projects\STM32H747I-DISCO\Applications\<app_name>\CM7\Inc
where <app_name> is any of:
	–	FoodReco_MobileNetDerivative\Float_Model
	–	FoodReco_MobileNetDerivative\Quantized_Model
	–	PersonDetection\Google_Model
	–	PersonDetection\MobileNetv2_Model

An alternative solution is to use the CLI (command-line interface) version of STM32Cube.AI (X-CUBE-AI): the generated files are then copied directly into the Src and Inc directories of the output directory provided on the command line. This solution does not require any manual copy/paste operation.


The application parameters are configured in files fp_vision_app.c and fp_vision_app.h where they can be easily adapted to the user's needs.

In file fp_vision_app.c:
•	The output_labels[] table of strings (where each string corresponds to one output class of the Neural Network model) is the only place where adaptation is absolutely required for a new application.
•	The App_Context_Init() function is in charge of initializing the different software components of the application. Some changes may be required to:
	–	adapt the camera orientation
	–	adapt the path to read input images from the microSD card when in Onboard Validation mode
	–	adapt to the NN input data range used during the training phase
	–	adapt the pixel color format of the NN input data

In file fp_vision_app.h:
•	The following two #define directives must be updated with the dimensions of the NN input tensor (see the sketch below):
	–	AI_NETWORK_WIDTH
	–	AI_NETWORK_HEIGHT
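As an illustration, a minimal sketch of these adaptations is given below. It is not the released source code: the label strings and dimensions are example values taken from the food recognition application, and the exact declaration of the output_labels[] table may differ in the actual files.

/* Hypothetical sketch of the application-specific parameters described
 * above; the released fp_vision_app.c/.h files may declare them differently. */

/* fp_vision_app.h: dimensions of the NN input tensor (example: 224 x 224) */
#define AI_NETWORK_WIDTH   224
#define AI_NETWORK_HEIGHT  224

/* fp_vision_app.c: one string per output class of the NN model,
 * in the same order as the NN output vector */
const char *output_labels[] = {
  "ApplePie",
  "Beer",
  "CaesarSalad",
  /* ... one entry per remaining class ... */
};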


3 Package content

3.1 CNN model

The FP-AI-VISION1 function pack demonstrates two CNN-based image classification applications:

•	A food-recognition application recognizing 18 types of food and drink
•	A person presence detection application identifying whether a person is present in the image or not

3.1.1 Food recognition application

The food-recognition CNN is a derivative of the MobileNet model (refer to [3]).

MobileNet is an efficient model architecture [3] suitable for mobile and embedded vision applications. This model architecture was proposed by Google®.

The MobileNet model architecture includes two simple global hyper-parameters that efficiently trade off latency against accuracy. These hyper-parameters allow the model builder to select a right-sized model for the application based on the constraints of the problem.

The food recognition model that is used in this FP has been built by adjusting these hyper-parameters for an optimal trade-off between accuracy, computational cost and memory footprint, considering the STM32H747 target constraints.

The food-recognition CNN model has been trained on a custom database of 18 types of food and drink:

Apple pie

Beer

Caesar salad

Cappuccino

Cheesecake

Chicken wings

Chocolate cake

Coke

Cupcake

Donut

French fries

Hamburger

Hot dog

Lasagna

Pizza

Risotto

Spaghetti bolognese

Steak

The food-recognition CNN expects color images of 224 × 224 pixels as input, each pixel being coded on three bytes (RGB888); the corresponding NN input buffer therefore holds 224 × 224 × 3 = 150528 bytes per frame.

The FP-AI-VISION1 function pack includes two examples based on the food recognition application: one example implementing the floating-point version of the generated code, and one example implementing the quantized version of the generated code.

3.1.2 Person presence detection application

Two person presence detection applications are provided in this package:
•	One based on a low-complexity CNN model (so-called Google_Model) working on grayscale images (8 bits per pixel) with a resolution of 96 × 96 pixels. The model is downloaded from storage.googleapis.com.
•	One based on a higher-complexity CNN model (so-called MobileNetv2_Model) working on color images (24 bits per pixel) with a resolution of 128 × 128 pixels.

The person presence detection models contain two output classes: Person and Not Person.

The FP-AI-VISION1 function pack demonstrates 8-bit quantized models.


3.2 Software

3.2.1 Folder organization

Figure 3 shows the folder organization of the FP-AI-VISION1 function pack.

Figure 3. FP-AI-VISION1 folder tree

[Folder tree of the FP-AI-VISION1 package; the folders are described below.]

Drivers

Contains all the BSP and STM32H7 HAL source code.


Middlewares

Contains five sub-folders:

ST/STM32_AI_Runtime

The lib folder contains the Neural Network runtime libraries generated by STM32Cube.AI (X-CUBE-AI) for each IDE: IAR Embedded Workbench® from IAR Systems (EWARM), MDK-ARM from Keil®, and STM32CubeIDE from STMicroelectronics.

The Inc folder contains the include files required by the runtime libraries.

These two folders do not need to be replaced when converting a new Neural Network, unless a new version of the X-CUBE-AI code generator is used.

ST/STM32_AI_Utilities

Contains optimized routines.

ST/STM32_Image

Contains a library of functions for image processing. These functions preprocess the input frame image captured by the camera so as to produce data with the size and format expected by the Neural Network during inference (an illustrative example is given after this list).

ST/STM32_Fs

Contains a library of functions for handling image files using FatFS on a microSD card.

Third_Party/FatFS

Third-party middleware providing support for the FAT file system.
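For illustration only, the sketch below shows the kind of operation that the frame preprocessing step performs: a simple nearest-neighbor resize of an RGB888 frame down to the Neural Network input resolution. This is a generic example written for this manual, not the API of the STM32_Image library; the function and parameter names are hypothetical.

#include <stdint.h>

/* Illustrative nearest-neighbor resize of an RGB888 image (3 bytes per pixel).
 * This is NOT the STM32_Image API; it only sketches the type of processing
 * performed between the camera frame and the NN input buffer. */
static void resize_rgb888_nearest(const uint8_t *src, uint32_t src_w, uint32_t src_h,
                                  uint8_t *dst, uint32_t dst_w, uint32_t dst_h)
{
  for (uint32_t y = 0; y < dst_h; y++)
  {
    uint32_t sy = (y * src_h) / dst_h;          /* nearest source row    */
    for (uint32_t x = 0; x < dst_w; x++)
    {
      uint32_t sx = (x * src_w) / dst_w;        /* nearest source column */
      const uint8_t *p = &src[(sy * src_w + sx) * 3U];
      uint8_t *q = &dst[(y * dst_w + x) * 3U];
      q[0] = p[0];                              /* R */
      q[1] = p[1];                              /* G */
      q[2] = p[2];                              /* B */
    }
  }
}

For example, a camera frame could be reduced this way (or with a higher-quality interpolation) to the 224 × 224 input resolution used by the food recognition model.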

Projects/STM32H747I-DISCO/Applications

Contains the projects and source code for the applications provided in the FP-AI-VISION1 FP. These applications run on the STM32H747 (refer to [2]), which is a dual-core microcontroller based on the Cortex®-M7 and Cortex®-M4 processors. The application code runs only on the Cortex®-M7 core.

Projects/STM32H747I-DISCO/Applications/Common

This folder contains the source code common to all applications:

ai_interface.c and ai_interface.h

Provide an abstraction of the NN API.

fp_vision_ai.c and fp_vision_ai.h

Provide the utilities that are required to adapt the representation of the NN input data, post-process the NN output data, initialize the NN, and run an inference of the NN. These files must be adapted by the user to the application parameters when integrating a new Neural Network model.

fp_vision_camera.c and fp_vision_camera.h

Provide the functions to configure and manage the camera module.

fp_vision_display.c and fp_vision_display.h

Provide the functions to configure and manage the LCD display.

fp_vision_preproc.c and fp_vision_preproc.h

Provide an abstraction layer to the image preprocessing library (located in Middlewares/ST/STM32_Image).

fp_vision_test.c and fp_vision_test.h

Provide a set of functions for testing, debugging and validating the application.

fp_vision_utils.c and fp_vision_utils.h

Provide a set of miscellaneous utilities.


Projects/STM32H747I-DISCO/Applications/FoodReco_MobileNetDerivative

This folder contains all the source code related to the food recognition application. It contains two sub-folders, one sub-folder per application example:

•	One demonstrating the integration of the float C model (32-bit float C code)
•	One demonstrating the integration of the quantized C model (8-bit integer C code)

Each sub-folder is composed as follows:

Binary

Contains the binaries for the applications:

STM32H747I-DISCO_u_v_w_x_y_z.bin

Binaries generated from the source files contained in the Float_Model/CM7 and Quantized_Model/CM7 folders.

u corresponds to the application name. For the food recognition application, the value is:

Food

v corresponds to the model type. For the food recognition application, it can be:

Std (for standard)

Opt (for optimized)

When v is Opt, it means that the binary is generated from sources that are not released as part of the FP-AI-VISION1 function pack since they are generated from a specific version of the food recognition CNN model. This specific version of the model is further optimized for a better tradeoff between accuracy and embedded constraints such as memory footprint and MIPS. Contact STMicroelectronics for information about this specific version.

w corresponds to the data representation of the model type. For the food recognition application, it can be:

Float (for float 32 bits)

Quant8 (for quantized 8 bits)

x corresponds to the configuration for the volatile data memory allocation. For the food recognition application, it can be:

Ext (for external SDRAM)

Split (for split between internal SRAM and external SDRAM)

IntMem (for internal SRAM with memory optimized)

IntFps (for internal SRAM with FPS optimized)

y corresponds to the memory allocation configurations for the non-volatile data. For the food recognition application, it can be:

IntFlash (for internal Flash memory)

QspiFlash (for external Q-SPI Flash memory)

ExtSdram (for external SDRAM)

z corresponds to the version number of the FP-AI-VISION1 release. It is expressed as Vabc where a, b and c represent the major version, minor version, and patch version numbers respectively. For the food recognition application corresponding to this user manual, the value is:

V200

For example, a binary named STM32H747I-DISCO_Food_Std_Quant8_IntMem_IntFlash_V200.bin corresponds to the standard food recognition model, 8-bit quantized, with volatile data in internal SRAM (memory optimized) and non-volatile data in internal Flash memory.


 

CM7

Contains the source code specific to the food recognition application example that is executed on the Cortex®-M7 core. There are two types of files:
•	Files that are generated by the STM32Cube.AI tool (X-CUBE-AI):
	–	network.c and network.h: contain the description of the CNN topology
	–	network_data.c and network_data.h: contain the weights and biases of the CNN
•	Files that contain the application:
	–	main.c and main.h
	–	fp_vision_app.c and fp_vision_app.h: used to configure the application-specific settings
	–	stm32h7xx_it.c and stm32h7xx_it.h: implement the interrupt handlers

CM4

This folder is empty since all the code of the food recognition application is running on the Cortex®-M7 core.

Common

Contains the source code that is common to the Cortex®-M7 and Cortex®-M4 cores.

EWARM

Contains the IAR Systems IAR Embedded Workbench® workspace and project files for the application example. It also contains the startup files for both cores.

MDK-ARM

Contains the Keil® MDK-ARM workspace and project files for the application example. It also contains the startup files for both cores.

STM32CubeIDE

Contains the STM32CubeIDE workspace and project files for the application example. It also contains the startup files for both cores.

Note:
For the EWARM, MDK-ARM and STM32CubeIDE sub-folders, each application project may contain several configurations. Each configuration corresponds to:
•	A specific data placement in the volatile memory (RAM)
•	A specific placement of the weight-and-bias table in the non-volatile memory (Flash)

Projects/STM32H747I-DISCO/Applications/PersonDetection

This folder contains the source code that is specific to the person presence detection applications. It contains two sub-folders, one sub-folder per application example:

•	One demonstrating the integration of a low-complexity model (so-called Google_Model)
•	One demonstrating the integration of a medium-complexity model (so-called MobileNetv2_Model)

The organization of these sub-folders is identical to that of the sub-folders described above for the food recognition application examples.


The Binary sub-folder contains the binaries for the applications. The binaries are named as STM32H747I-DISCO_u_v_w_x_y_z.bin where:

u corresponds to the application name. For the person presence detection applications, the value is:

Person

v corresponds to the model type. For the person presence detection applications, it can be:

Google

MobileNetV2

w corresponds to the data representation of the model type. For the person presence detection applications, the value is:

Quant8 (for quantized 8 bits)

x corresponds to the memory allocation configurations for the volatile data. For the person presence detection applications, the value is:

IntFps (for internal SRAM with FPS optimized)

y corresponds to the memory allocation configurations for the non-volatile data. For the person presence detection applications, the value is:

IntFlash (for internal Flash memory)

z corresponds to the version number of the FP-AI-VISION1 release. It is expressed as Vabc where a, b and c represent the major version, minor version, and patch version numbers respectively. For the person presence detection application corresponding to this user manual, the value is:

V200

Utilities/AI_resources/Food-Recognition

This sub-folder contains:

The original trained model (file FoodReco_MobileNet_Derivative_Float.h5) for the food recognition CNN used in the application examples. This model is used to generate:

Either directly the floating-point C code via STM32Cube.AI (X-CUBE-AI)

Or the 8-bit quantized model via the quantization process, and then subsequently the integer C code via STM32Cube.AI (X-CUBE-AI)

The files required for the quantization process (refer to Section 3.2.2 Quantization process):

config_file_foodreco_nn.json: file containing the configuration parameters for the quantization operation

test_set_generation_foodreco_nn.py: file containing the function used to prepare the test vectors used in the quantization process

The quantized model generated by the quantization tool (files FoodReco_MobileNet_Derivative_Quantized.json and FoodReco_MobileNet_Derivative_Quantized.h5)

The re-training script (refer to Section 3.2.3 Training scripts): FoodDetection.py, along with a Jupyter notebook (FoodDetection.ipynb)

A script (create_dataset.py) to convert a dataset of images into the format expected by the piece of firmware performing the validation on board (refer to Onboard Validation mode in Section 3.2.8 Embedded validation, capture and testing)

Utilities/AI_resources/PersonDetection

This sub-folder contains:

MobileNetv2_Model/README.md: describes how to retrain a new person detection image classifier from a pre-trained network using TensorFlow.

MobileNetv2_Model/create_dataset.py: Python script to create the Person20 dataset from the previously downloaded COCO dataset as described in the README.md file.

MobileNetv2_Model/train.py: Python script to create an image classifier model from a pre-trained MobileNetV2 head.


MobileNetv2_Model/quantize.py: Python script to perform post-training quantization on a Keras model using the TFLiteConverter tool from TensorFlow. Sample images are required to run the quantization operation.

3.2.2 Quantization process

The quantization process consists in quantizing the parameters (weights and biases) as well as the activations of a NN in order to obtain a quantized model whose parameters and activations are represented as 8-bit integers.

Quantizing a model reduces the memory footprint because weights, biases, and activations are stored on 8 bits instead of 32 bits in a float model. It also reduces the inference execution time thanks to the optimized use of the DSP unit of the Cortex®-M7 core.

Several quantization schemes are supported by the STM32Cube.AI (X-CUBE-AI) tool:
•	Fixed point Qm,n
•	Integer arithmetic (signed and unsigned)

Refer to the STM32Cube.AI tool (X-CUBE-AI) documentation in [1] for more information on the different quantization schemes and how to run the quantization process.
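As an illustration of the integer-arithmetic scheme (a generic affine mapping commonly used for 8-bit quantization, not necessarily the exact formulation implemented by the tool), a real value r is represented by an 8-bit integer q through a scale factor S and a zero point Z:

r ≈ S × (q − Z), with q in [−128, 127] (signed) or [0, 255] (unsigned)

S and Z are determined during the quantization process from the range of the weights or activations observed on the test dataset.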

Note:
•	Two datasets are required for the quantization operation. It is up to the user to provide his own datasets.
•	The impact of the quantization on the accuracy of the final output depends on the CNN model (that is, its topology), but also on the quantization process: the test dataset and the quantization algorithm have a significant impact on the final accuracy.

3.2.3 Training scripts

Training scripts are provided for each application.

3.2.3.1 Food recognition application

File Utilities/AI_resources/Food-Recognition/FoodDetection.ipynb contains an example script showing how to train the MobileNet derivative model used in the function pack. As the dataset used to train the model provided in the function pack is not publicly available, the training script relies on a subset of the Food-101 dataset (refer to [4]). This publicly available dataset contains images of 101 food categories, with 1000 images per category.

In order to keep the training process short, the script uses only 50 images per food category and limits the training of the model to 20 epochs. To train on the whole dataset, the max_imgs_per_class variable in the "Prepare the test and train datasets" section must be updated to np.inf.

Note:
The use of a GPU is recommended for the complete training on the whole dataset.
The Jupyter notebook is also available as a plain Python script in the Utilities/AI_resources/Food-Recognition/FoodDetection.py file.

3.2.3.2 Person presence detection application

File Utilities/AI_resources/PersonDetection/MobileNetv2_Model/train.py contains an example script showing how to retrain the MobileNetV2 model by using transfer learning. The training script relies on the Person20 dataset. Instructions on how to build the Person20 dataset from the publicly available COCO-2014 dataset can be found in Utilities/AI_resources/PersonDetection/MobileNetv2_Model/README.md, along with the Utilities/AI_resources/PersonDetection/MobileNetv2_Model/create_dataset.py Python script to filter COCO images. An example Python script to perform post-training quantization is available in Utilities/AI_resources/PersonDetection/MobileNetv2_Model/quantize.py. The post-training quantization is performed on a Keras model using the TFLiteConverter tool from TensorFlow. Sample images are required to run the quantization operation; they can be extracted from the model training set.


3.2.4 Memory requirements

When integrating a C model generated by the STM32Cube.AI (X-CUBE-AI) tool, the following memory requirements must be considered:

Volatile (RAM) memory requirement: memory space is required to allocate:

The inference working buffer (called the activation buffer in this document). This buffer is used during inference to store the temporary results of the intermediate layers within the Neural Network.

The inference input buffer (called the nn_input buffer in this document), which is used to hold the input data of the Neural Network.

Non-volatile (Flash) memory requirement: memory space is required to store the table containing the weights and biases of the network model.

In addition to the above-listed memory requirements, further requirements come into play when integrating the C model in a computer vision application:

Volatile (RAM) memory requirement: memory space is required to allocate the various buffers used across the execution of the image pipeline (camera capture, frame preprocessing).
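As a minimal sketch (not taken from the function pack sources), the two inference-related RAM buffers described above could be statically allocated using the size macros that X-CUBE-AI generates together with the C model; the exact macro names depend on the tool version and must be checked in the generated network.h file.

#include <stdint.h>
#include "network.h"   /* generated by X-CUBE-AI: buffer size macros (names are version-dependent) */

/* Activation (working) buffer: holds the temporary results of the
 * intermediate layers during an inference.
 * AI_NETWORK_DATA_ACTIVATIONS_SIZE is assumed to be provided by the
 * generated code; check the exact name in your network.h. */
static uint8_t activation_buffer[AI_NETWORK_DATA_ACTIVATIONS_SIZE];

/* nn_input buffer: holds the preprocessed frame fed to the Neural Network
 * (for example 224 x 224 x 3 bytes for the food recognition model).
 * AI_NETWORK_IN_1_SIZE_BYTES is likewise assumed to come from network.h. */
static uint8_t nn_input_buffer[AI_NETWORK_IN_1_SIZE_BYTES];

Where these buffers are placed (internal SRAM, external SDRAM, or a split between both) is precisely what the memory allocation configurations described in this document control.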
