UM2611
User manual
Artificial Intelligence (AI) and computer vision function pack for STM32H7 microcontrollers

Introduction

FP-AI-VISION1 is a function pack (FP) demonstrating the capability of STM32H7 Series microcontrollers to execute a Convolutional Neural Network (CNN) efficiently for computer vision tasks. FP-AI-VISION1 contains everything needed to build a CNN-based computer vision application on STM32H7 microcontrollers.
FP-AI-VISION1 also demonstrates several memory allocation configurations for the data involved in the application. Each configuration addresses a specific set of requirements on the amount of data handled by the application. Accordingly, FP-AI-VISION1 implements examples showing how to place the different types of data efficiently in both the on-chip and external memories. These examples enable the user to easily identify which memory allocation best fits his requirements.
This user manual describes the content of the FP-AI-VISION1 function pack and details the different steps to be carried out in order to build a CNN-based computer vision application on STM32H7 microcontrollers.

1 General information

The FP-AI-VISION1 function pack runs on the STM32H7 microcontrollers based on the Arm® Cortex®-M7 processor.
Note: Arm is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or elsewhere.

1.1 FP-AI-VISION1 function pack feature overview

Runs on the STM32H747I-DISCO board connected with the STM32F4DIS-CAM camera daughterboard
Includes three image classification application examples based on CNN:
One food recognition application operating on color (RGB 24 bits) frame images
One person presence detection application operating on color (RGB 24 bits) frame images
One person presence detection application operating on grayscale (8 bits) frame images
Includes complete application firmware for camera capture, frame image preprocessing, inference execution and output post-processing
Includes examples of integration of both floating-point and 8-bit quantized C models
Supports several configurations for data memory placement in order to meet application requirements
Includes test and validation firmware in order to test, debug and validate the embedded application
Includes capture firmware enabling dataset collection
Includes support for file handling (on top of FatFS) on external microSD™ card

1.2 Software architecture

The top-level architecture of the FP-AI-VISION1 function pack is shown in Figure 1.
Figure 1. FP-AI-VISION1 architecture
[Figure: layered software architecture. Application level: food recognition and person presence detection applications. Middleware level: STM32_AI_Runtime (Neural Network runtime library), STM32_AI_Utilities (optimized routines), STM32_Image (image processing library), STM32_Fs (FatFS abstraction), and FatFS (light FAT file system). Lower levels: board support package (BSP), drivers, and hardware abstraction layer (HAL). Hardware components: STM32 MCU, LCD, and camera sensor on the STM32H747I-DISCO and STM32F4DIS-CAM development boards.]

1.3 Terms and definitions

Table 1 presents the definitions of the acronyms that are relevant for a better contextual understanding of this document.
Table 1. List of acronyms
Acronym Definition
API Application programming interface
BSP Board support package
CNN Convolutional Neural Network
DMA Direct memory access
FAT File allocation table
FatFS Light generic FAT file system
FP Function pack
FPS Frames per second
HAL Hardware abstraction layer
LCD Liquid crystal display
MCU Microcontroller unit
microSD Micro secure digital
MIPS Millions of instructions per second
NN Neural Network
QVGA Quarter VGA
RAM Random access memory
SRAM Static random access memory
VGA Video graphics array

1.4 Overview of available documents and references

Table 2 lists the complementary references for using FP-AI-VISION1.
Table 2. References
ID Description
[1] User manual: Getting started with X-CUBE-AI Expansion Package for Artificial Intelligence (AI) (UM2526)
[2] Reference manual: STM32H745/755 and STM32H747/757 advanced Arm®-based 32-bit MCUs (RM0399)
[3] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications: https://arxiv.org/pdf/1704.04861.pdf
[4] The Food-101 Data Set: https://data.vision.ee.ethz.ch/cvl/datasets_extra/food-101/
[5] STM32CubeProgrammer software for programming STM32 products: STM32CubeProg
[6] Keras - The Python Deep Learning library: https://keras.io/
[7] STM32Cube initialization code generator: STM32CubeMX

2 Building a CNN-based computer vision application on STM32H7

Figure 2 illustrates the different steps to obtain a CNN-based computer vision application running on the
STM32H7 microcontrollers.
Figure 2. CNN-based computer vision application build flow
[Figure: build flow. Starting from a float CNN model, the STM32Cube.AI tool either generates 32-bit floating-point C code directly (network.c/h and network_data.c/h), or the model is first processed by the quantization tool to obtain a quantized model from which 8-bit integer C code is generated. The generated code is built together with the Neural Network runtime library, the image preprocessing library, the STM32H7 drivers, and the main framework (main.c/h, fp_vision_app.c/h, img_preprocess.c/h) to produce the computer vision application on STM32H7.]
Starting from a floating-point CNN model (designed and trained using a framework such as Keras), the user generates optimized C code with the STM32Cube.AI tool (refer to [1]) and integrates it into the computer vision framework provided as part of FP-AI-VISION1 in order to build his computer vision application on STM32H7.
Note: For users having selected a dual-core MCU like the STM32H747 for their application but running it on the
Cortex®-M7 core only: STM32CubeMX does not support the addition of packages like the STM32Cube.AI (X-
CUBE-AI) to the project. As a consequence, when using STM32CubeMX along with STM32Cube.AI, a single-
core MCU like the STM32H743 must be selected to be able to generate the Neural Network code for the Cortex®-M7 core.
The user has the possibility to select one of two options for generating the C code:
Either generating the floating-point C code directly from the CNN model in floating-point
Or quantizing the floating-point CNN model to obtain an 8-bit model, and subsequently generating the corresponding quantized C code
For most CNN models, the second option reduces the memory footprint (Flash and RAM) as well as the inference time. The impact on the final output accuracy depends on the CNN model as well as on the quantization process (mainly the test dataset and the quantization algorithm).
As part of the FP-AI-VISION1 function pack, three image classification application examples are provided, including the following material:
One food recognition application:
Floating-point Keras model (.h5 file)
8-bit quantized model (.h5 file + .json file) obtained using the STM32Cube.AI (X-CUBE-AI) quantizer
Generated C code in both floating-point and 8-bit quantized formats
Example of computer vision application integration based on C code generated by STM32Cube.AI (X-CUBE-AI)
Two person presence detection applications:
8-bit quantized models (.tflite file) obtained using the TFLiteConverter tool (1)
Generated C code in 8-bit quantized format
Examples of computer vision application integration based on C code generated by STM32Cube.AI (X-CUBE-AI)
1. TensorFlow™ is a trademark of Google Inc.

2.1 Integration of the generated code

From a float or quantized model, the user must use the STM32Cube.AI tool (X-CUBE-AI) to generate the corresponding optimized C code.
When using the GUI version of STM32Cube.AI (X-CUBE-AI) with the user's own .ioc file, the following set of files is generated in the output directory:
Src\network.c and Inc\network.h: contain the description of the CNN topology
Src\network_data.c and Inc\network_data.h: contain the weights and biases of the CNN
Note: For the network, the user must keep the default name, which is “network”. Otherwise, the user must rename all the functions and macros contained in files ai_interface.c and ai_interface.h. The purpose of the ai_interface.c and ai_interface.h files is to provide an abstraction interface to the NN API.
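The reason is that every symbol generated by STM32Cube.AI is prefixed with the network name. The following minimal sketch (not part of the FP-AI-VISION1 sources; it follows the typical X-CUBE-AI generated API, whose exact signatures depend on the tool version, refer to [1]) illustrates an initialization sequence built on the default name “network”. It only compiles inside a project containing the generated network.c/h and network_data.c/h files:

/* Illustrative sketch only: typical X-CUBE-AI generated API, names assumed.  */
/* All generated symbols are prefixed with the network name ("network" here). */
/* With another name, for example "foodreco", the calls become                */
/* ai_foodreco_create()/ai_foodreco_init() and the macros AI_FOODRECO_*,      */
/* which is the renaming referred to in the note above.                       */
#include "network.h"        /* generated: CNN topology and ai_network_*() API */
#include "network_data.h"   /* generated: weight-and-bias table               */

static ai_handle net = AI_HANDLE_NULL;
AI_ALIGNED(4) static ai_u8 activations[AI_NETWORK_DATA_ACTIVATIONS_SIZE];

static int nn_init(void)
{
  /* Create the network instance from the generated configuration */
  ai_error err = ai_network_create(&net, AI_NETWORK_DATA_CONFIG);
  if (err.type != AI_ERROR_NONE)
    return -1;

  /* Bind the weight table and the activation buffer, then initialize */
  const ai_network_params params = AI_NETWORK_PARAMS_INIT(
      AI_NETWORK_DATA_WEIGHTS(ai_network_data_weights_get()),
      AI_NETWORK_DATA_ACTIVATIONS(activations));

  return ai_network_init(net, &params) ? 0 : -1;
}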
From that point, the user must copy and replace the above generated .c files and .h files respectively into the following directories:
\Projects\STM32H747I-DISCO\Applications\<app_name>\CM7\Src
\Projects\STM32H747I-DISCO\Applications\<app_name>\CM7\Inc
where <app_name> is any of:
FoodReco_MobileNetDerivative\Float_Model
FoodReco_MobileNetDerivative\Quantized_Model
PersonDetection\Google_Model
PersonDetection\MobileNetv2_Model
An alternate solution is to use the CLI (command-line interface) version of STM32Cube.AI (X-CUBE-AI), so that the generated files are directly copied into the Src and Inc directories contained in the output directory provided on the command line. This solution does not require any manual copy/paste operation.
The application parameters are configured in files fp_vision_app.c and fp_vision_app.h where they can be easily adapted to the user's needs.
In file fp_vision_app.c:
The output_labels[] table of strings (where each string corresponds to one output class of the Neural Network model) is the only place where adaptation is absolutely required for a new application (see the sketch below).
The App_Context_Init() function is in charge of initializing the different software components of the application. Some changes may be required to:
adapt the camera orientation
adapt the path to read input images from the microSD™ card when in Onboard Validation mode
adapt to the NN input data range used during the training phase
adapt the pixel color format of the NN input data
In file fp_vision_app.h:
The two following #define must absolutely be updated with the dimensions of the NN input tensor (see the sketch after this list):
AI_NETWORK_WIDTH
AI_NETWORK_HEIGHT
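The following minimal sketch summarizes these two adaptations for a hypothetical three-class model with a 128 × 128 input. The class names and dimensions are placeholders, not values shipped in FP-AI-VISION1:

/* fp_vision_app.h: must match the input tensor dimensions of the new NN model */
#define AI_NETWORK_WIDTH   128   /* placeholder value */
#define AI_NETWORK_HEIGHT  128   /* placeholder value */

/* fp_vision_app.c: one label string per output class of the new NN model,     */
/* in the same order as the classes in the NN output vector                    */
const char *output_labels[] =
{
  "class_a",   /* placeholder class names */
  "class_b",
  "class_c",
};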

3 Package content

3.1 CNN model

The FP-AI-VISION1 function pack demonstrates two CNN-based image classification applications:
A food-recognition application recognizing 18 types of food and drink
A person presence detection application identifying whether a person is present in the image or not

3.1.1 Food recognition application

The food-recognition CNN is a derivative of the MobileNet model (refer to [3]).
MobileNet is an efficient model architecture [3] suitable for mobile and embedded vision applications. This model architecture was proposed by Google®.
The MobileNet model architecture includes two simple global hyper-parameters that efficiently trade off between latency and accuracy. Basically, these hyper-parameters allow the model builder to determine a right-sized model for the application based on the constraints of the problem.
The food recognition model that is used in this FP has been built by adjusting these hyper-parameters for an optimal trade-off between accuracy, computational cost and memory footprint, considering the STM32H747 target constraints.
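In the notation of [3], these two hyper-parameters are the width multiplier α (which thins the number of channels) and the resolution multiplier ρ (which reduces the input resolution). The computational cost of a depthwise separable convolution layer then scales as:

D_K × D_K × αM × ρD_F × ρD_F + αM × αN × ρD_F × ρD_F

where D_K is the kernel size, D_F the feature map size, and M and N the numbers of input and output channels.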
The food-recognition CNN model has been trained on a custom database of 18 types of food and drink:
Apple pie
Beer
Caesar salad
Cappuccino
Cheesecake
Chicken wings
Chocolate cake
Coke
Cupcake
Donut
French fries
Hamburger
Hot dog
Lasagna
Pizza
Risotto
Spaghetti bolognese
Steak
The food-recognition CNN expects color images of 224 × 224 pixels as input, each pixel being coded on three bytes (RGB888).
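This input format directly determines the amount of image data presented to the network for each inference: 224 × 224 × 3 bytes = 150 528 bytes, that is about 147 Kbytes per frame.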
The FP-AI-VISION1 function pack includes two examples based on the food recognition application: one example implementing the floating-point version of the generated code, and one example implementing the quantized version of the generated code.

3.1.2 Person presence detection application

Two person presence detection applications are provided in this package:
One based on a low-complexity CNN model (so-called Google_Model) working on grayscale images (8 bits per pixel) with a resolution of 96 × 96 pixels. The model is downloaded from storage.googleapis.com.
One based on a higher-complexity CNN model (so-called MobileNetv2_Model) working on color images (24 bits per pixel) with a resolution of 128 × 128 pixels.
The person presence detection models contain two output classes: Person and Not Person.
The FP-AI-VISION1 function pack demonstrates 8-bit quantized models.

3.2 Software

3.2.1 Folder organization

Figure 3 shows the folder organization of the FP-AI-VISION1 function pack.
Figure 3. FP-AI-VISION1 folder tree
[Figure: folder tree of the FP-AI-VISION1 package; its top-level folders are described below.]
Driver
Contains all the BSP and STM32H7 HAL source code.
Middlewares
Contains five sub-folders:
ST/STM32_AI_Runtime
The lib folder contains the Neural Network runtime libraries generated by STM32Cube.AI (X-CUBE-AI) for each IDE: IAR Embedded Workbench® from IAR Systems (EWARM), MDK-ARM from Keil®, and
STM32CubeIDE from STMicroelectronics. These libraries do not need to be replaced when converting a
new Neural Network.
The Inc folder contains the include files required by the runtime libraries.
These two folders do not need to be replaced when converting a new Neural Network, unless using a new version of the X-CUBE-AI code generator.
ST/STM32_AI_Utilities
Contains optimized routines.
ST/STM32_Image
Contains a library of functions for image processing. These functions are used to preprocess the input frame image captured by the camera. The purpose of this preprocessing is to generate the adequate data (such as size, format, and others) to be input to the Neural Network during the inference.
ST/STM32_Fs
Contains a library of functions for handling image files using FatFS on a microSD™ card.
Third_Party/FatFS
Third party middleware providing support for FAT file system.
Project/STM32H747I-DISCO/Applications
Contains the projects and source codes for the applications provided in the FP-AI-VISION1 FP. These applications run on the STM32H747 (refer to [2]), which is a dual-core microcontroller based on the Cortex®-M7 and Cortex®-M4 processors. The application code runs only on the Cortex®-M7 core.
Project/STM32H747I-DISCO/Applications/Common
This folder contains the source code common to all applications:
ai_interface.c and ai_interface.h
Provide an abstraction of the NN API.
fp_vision_ai.c and fp_vision_ai.h
Provide the utilities that are required to adapt the representation of the NN input data, post-process the NN output data, initialize the NN, and run an inference of the NN. These files must be adapted by the user to the application parameters when integrating a new Neural Network model.
fp_vision_camera.c and fp_vision_camera.h
Provide the functions to configure and manage the camera module.
fp_vision_display.c and fp_vision_display.h
Provide the functions to configure and manage the LCD display.
fp_vision_preproc.c and fp_vision_preproc.h
Provide an abstraction layer to the image preprocessing library (located in Middlewares/ST/STM32_Image).
fp_vision_test.c and fp_vision_test.h
Provide a set of functions for testing, debugging and validating the application.
fp_vision_utils.c and fp_vision_utils.h
Provide a set of miscellaneous utilities.
Project/STM32H747I-DISCO/Applications/FoodReco_MobileNetDerivative
This folder contains all the source code related to the food recognition application. It contains two sub-folders, one sub-folder per application example:
One demonstrating the integration of the float C model (32-bit float C code)
One demonstrating the integration of the quantized C model (8-bit integer C code)
Each sub-folder is composed as follows:
Binary
Contains the binaries for the applications:
STM32H747I-DISCO_u_v_w_x_y_z.bin
Binaries generated from the source files contained in the Float_Model/CM7 and Quantized_Model/CM7 folders.
u corresponds to the application name. For the food recognition application, the value is:
Food
v corresponds to the model type. For the food recognition application, it can be:
Std (for standard)
Optimized (for optimized)
When v is Opt, it means that the binary is generated from sources that are not released as part of the FP-AI-VISION1 function pack since they are generated from a specific version of the food recognition CNN model. This specific version of the model is further optimized for a better trade-off between accuracy and embedded constraints such as memory footprint and MIPS. Contact STMicroelectronics for information about this specific version.
w corresponds to the data representation of the model type. For the food recognition application, it
can be:
Float (for float 32 bits)
Quant8 (for quantized 8 bits)
x corresponds to the configuration for the volatile data memory allocation. For the food
recognition application, it can be:
Ext (for external SDRAM)
Split (for split between internal SRAM and external SDRAM)
IntMem (for internal SRAM with memory optimized)
IntFps (for internal SRAM with FPS optimized)
y corresponds to the memory allocation configurations for the non-volatile data. For the food
recognition application, it can be:
IntFlash (for internal Flash memory)
QspiFlash (for external Q-SPI Flash memory)
ExtSdram (for external SDRAM)
z corresponds to the version number of the FP-AI-VISION1 release. It is expressed as Vabc
where a, b and c represent the major version, minor version, and patch version numbers respectively. For the food recognition application corresponding to this user manual, the value is:
V200
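As an illustration of this naming scheme, a binary built from the standard float model, with volatile data placed in external SDRAM and the weight-and-bias table in internal Flash memory, would be named:
STM32H747I-DISCO_Food_Std_Float_Ext_IntFlash_V200.bin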
CM7
Contains the source code specific to the food recognition application example that is executed on the Cortex®-M7 core. There are two types of files:
Files that are generated by the STM32Cube.AI tool (X-CUBE-AI):
network.c and network.h: contain the description of the CNN topology
network_data.c and network_data.h: contain the weights and biases of the CNN
Files that contain the application:
main.c and main.h
fp_vision_app.c and fp_vision_app.h
Used to configure the application specific settings.
stm32h7xx_it.c and stm32h7xx_it.h
Implement the interrupt handlers.
CM4
This folder is empty since all the code of the food recognition application is running on the Cortex®-M7 core.
Common
Contains the source code that is common to the Cortex®-M7 and Cortex®-M4 cores.
EWARM
Contains the IAR Systems IAR Embedded Workbench® workspace and project files for the application example. It also contains the startup files for both cores.
MDK-ARM
Contains the Keil® MDK-ARM workspace and project files for the application example. It also contains the startup files for both cores.
STM32CubeIDE
Contains the STM32CubeIDE workspace and project files for the application example. It also contains the startup files for both cores.
Note: For the EWARM, MDK-ARM and STM32CubeIDE sub-folders, each application project may contain several
configurations. Each configuration corresponds to:
A specific data placement in the volatile memory (RAM)
A specific placement of the weight-and-bias table in the non-volatile memory (Flash)
Project/STM32H747I-DISCO/Applications/PersonDetection
This folder contains the source code that is specific to the person presence detection applications. It contains two sub-folders, one sub-folder per application example:
One demonstrating the integration of a low-complexity model (so-called Google_Model)
One demonstrating the integration of a medium-complexity model (so-called MobileNetv2_Model)
The organization of sub-folders is identical to the one of the sub-folders described above in the context of the food recognition application examples.
The Binary sub-folder contains the binaries for the applications. The binaries are named STM32H747I-DISCO_u_v_w_x_y_z.bin where:
u corresponds to the application name. For the person presence detection applications, the value is:
Person
v corresponds to the model type. For the person presence detection applications, it can be:
Google
MobileNetV2
w corresponds to the data representation of the model type. For the person presence detection applications, the value is:
Quant8 (for quantized 8 bits)
x corresponds to the memory allocation configurations for the volatile data. For the person presence detection applications, the value is:
IntFps (for internal SRAM with FPS optimized)
y corresponds to the memory allocation configurations for the non-volatile data. For the person presence detection applications, the value is:
IntFlash (for internal Flash memory)
z corresponds to the version number of the FP-AI-VISION1 release. It is expressed as Vabc where a, b and c represent the major version, minor version, and patch version numbers respectively. For the person presence detection application corresponding to this user manual, the value is:
V200
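For example, following this scheme, the binary for the Google_Model application example is named:
STM32H747I-DISCO_Person_Google_Quant8_IntFps_IntFlash_V200.bin
and the MobileNetv2_Model binary differs only in the v field (MobileNetV2).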
Utilities/AI_resources/Food-Recognition
This sub-folder contains:
The original trained model (file FoodReco_MobileNet_Derivative_Float.h5) for the food recognition CNN used in the application examples. This model is used to generate:
Either directly the floating-point C code via STM32Cube.AI (X-CUBE-AI)
Or the 8-bit quantized model via the quantization process, and then subsequently the integer C code
via STM32Cube.AI (X-CUBE-AI)
The files required for the quantization process (refer to Section 3.2.2 Quantization process):
config_file_foodreco_nn.json: file containing the configuration parameters for the quantization
operation
test_set_generation_foodreco_nn.py: file containing the function used to prepare the test
vectors used in the quantization process
The quantized model generated by the quantization tool (files FoodReco_MobileNet_Derivative_Quantized.json and FoodReco_MobileNet_Derivative_Quantized.h5)
The re-training script (refer to Section 3.2.3 Training scripts): FoodDetection.py along with a Jupyter™ notebook (FoodDetection.ipynb)
A script (create_dataset.py) to convert a dataset of images into the format expected by the piece of firmware performing the validation on board (refer to Onboard Validation mode in Section 3.2.8 Embedded validation, capture and testing)
Utilities/AI_resources/PersonDetection
This sub-folder contains:
MobileNetv2_Model/README.md: describes how to retrain a new person detection image classifier from a pre-trained network using TensorFlow™.
MobileNetv2_Model/create_dataset.py: Python™ script to create the Person20 dataset from the previously downloaded COCO dataset as described in the README.md file.
MobileNetv2_Model/train.py: Python™ script to create an image classifier model from a pre-trained MobileNetV2 head.
MobileNetv2_Model/quantize.py: Python™ script to perform post-training quantization on a Keras model using the TFLiteConverter tool from TensorFlow™. Sample images are required to run the
quantization operation.

3.2.2 Quantization process

The quantization process consists in quantizing the parameters (weights and biases) as well as the activations of a NN in order to obtain a quantized model whose parameters and activations are represented as 8-bit integers.
Quantizing a model reduces the memory footprint because weights, biases, and activations are stored on 8 bits instead of 32 bits in a float model. It also reduces the inference execution time by taking advantage of the optimized DSP unit of the Cortex®-M7 core.
Several quantization schemes are supported by the STM32Cube.AI (X-CUBE-AI) tool:
Fixed point Qm,n
Integer arithmetic (signed and unsigned)
Refer to the STM32Cube.AI tool (X-CUBE-AI) documentation in [1] for more information on the different quantization schemes and how to run the quantization process.
Note: Two datasets are required for the quantization operation. It is up to the user to provide his own datasets.
The impact of the quantization on the accuracy of the final output depends on the CNN model (that is its topology), but also on the quantization process: the test dataset and the quantization algorithm have a significant impact on the final accuracy.
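As a reminder of the general principle behind these schemes (a simplified illustration, not the exact specification of the STM32Cube.AI implementation, refer to [1]), an 8-bit stored value q typically represents the real value r as:
r = q × 2^(−n) for a fixed-point Qm,n representation (m integer bits, n fractional bits)
r = S × (q − Z) for integer arithmetic (S: scale factor, Z: zero point)
The scale factors and zero points (or the m,n split) are derived during the quantization process from the value ranges observed on the test dataset, which is why the test dataset influences the final accuracy.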

3.2.3 Training scripts

Training scripts are provided for each application.
3.2.3.1 Food recognition application
File Utilities/AI_resources/Food-Recognition/FoodDetection.ipynb contains an example script showing how to train the MobileNet derivative model used in the function pack. As the dataset used to train the model provided in the function pack is not publicly available, the training script relies on a subset of the Food-101 dataset (refer to [4]). This publicly available dataset contains images of 101 food categories with 1000 images per category.
In order to keep the training process short, the script uses only 50 images per food category, and limits the training of the model to 20 epochs. To achieve a training on the whole dataset, the variable
max_imgs_per_class in section Prepare the test and train datasets must be updated to np.inf.
Note: The use of the GPU is recommended for the complete training on the whole dataset.
The Jupyter™ notebook is also available as a plain Python™ script in the Utilities/AI_resources/Food-Recognition/FoodDetection.py file.
3.2.3.2 Person presence detection application
File Utilities/AI_resources/PersonDetection/MobileNetv2_Model/train.py contains an example script showing how to retrain the MobileNetV2 model by using transfer learning. The training script relies on the Person20 dataset. Instructions on how to build the Person20 dataset from the publicly available COCO-2014 dataset can be found in Utilities/AI_resources/PersonDetection/MobileNetv2_Model/README.md, along with the Utilities/AI_resources/PersonDetection/MobileNetv2_Model/create_dataset.py Python™ script to filter COCO images. An example Python™ script to perform post-training quantization is available in Utilities/AI_resources/PersonDetection/MobileNetv2_Model/quantize.py. The post-training quantization is performed on a Keras model using the TFLiteConverter tool from TensorFlow™. Sample images are required to run the quantization operation. Sample images can be extracted from the model training set.

3.2.4 Memory requirements

When integrating a C model generated by the STM32Cube.AI (X-CUBE-AI) tool, the following memory requirements must be considered:
Volatile (RAM) memory requirement: memory space is required to allocate:
The inference working buffer (called the activation buffer in this document). This buffer is used
during inference to store the temporary results of the intermediate layers within the Neural Network.
The inference input buffer (called the nn_input buffer in this document), which is used to hold the
input data of the Neural Network.
Non-volatile (Flash) memory requirement: memory space is required to store the table containing the weights and biases of the network model.
On top of the above-listed memory requirements, some more requirements come into play when integrating the C model for a computer vision application:
Volatile (RAM) memory requirement: memory space is required in order to allocate the various buffers that are used across the execution of the image pipeline (camera capture, frame pre-processing).
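The following sketch is an illustration only, not the FP-AI-VISION1 sources: the function pack places these buffers differently depending on the memory configuration, and the macro names are assumptions following the typical X-CUBE-AI generated code. It summarizes how the two inference-related RAM buffers are sized:

#include <stdint.h>

/* Placeholder fallbacks so that the sketch is self-contained; in a real     */
/* project these values come from the generated network headers and from     */
/* fp_vision_app.h (see Section 2.1).                                        */
#ifndef AI_NETWORK_DATA_ACTIVATIONS_SIZE
#define AI_NETWORK_DATA_ACTIVATIONS_SIZE  (250U * 1024U)  /* assumed value   */
#endif
#ifndef AI_NETWORK_WIDTH
#define AI_NETWORK_WIDTH   224
#define AI_NETWORK_HEIGHT  224
#endif

/* Activation buffer: scratch RAM holding the intermediate layer results     */
/* of the Neural Network during one inference.                               */
static uint8_t activation_buffer[AI_NETWORK_DATA_ACTIVATIONS_SIZE];

/* nn_input buffer: preprocessed frame fed to the network. One byte per      */
/* channel for an 8-bit quantized RGB888 model; a float C model needs        */
/* sizeof(float) bytes per channel instead.                                  */
static uint8_t nn_input_buffer[AI_NETWORK_WIDTH * AI_NETWORK_HEIGHT * 3];

/* The weight-and-bias table itself is generated as a constant array in      */
/* network_data.c and is stored in non-volatile memory (internal Flash,      */
/* external Q-SPI Flash, or external SDRAM depending on the configuration).  */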