Artificial Intelligence (AI) and computer vision function pack
for STM32H7 microcontrollers
Introduction
FP-AI-VISION1 is a function pack (FP) demonstrating the capability of STM32H7 Series microcontrollers to execute a
Convolutional Neural Network (CNN) efficiently for computer vision tasks. FP-AI-VISION1 contains everything needed
to build a CNN-based computer vision application on STM32H7 microcontrollers.
FP-AI-VISION1 also demonstrates several memory allocation configurations for the data involved in the application. Each
configuration addresses a specific requirement in terms of the amount of data handled by the application. Accordingly,
FP-AI-VISION1 implements examples showing how to place the different types of data efficiently in both the on-chip and
external memories. These examples make it easy for the user to identify the memory allocation that best fits their
requirements.
This user manual describes the content of the FP-AI-VISION1 function pack and details the different steps to be carried out in
order to build a CNN-based computer vision application on STM32H7 microcontrollers.
UM2611 - Rev 3 - September 2020
1 General information
The FP-AI-VISION1 function pack runs on the STM32H7 microcontrollers based on the Arm® Cortex®-M7
processor.
Note: Arm is a registered trademark of Arm Limited (or its subsidiaries) in the US and/or elsewhere.
1.1 FP-AI-VISION1 function pack feature overview
•Runs on the STM32H747I-DISCO board connected with the STM32F4DIS-CAM camera daughterboard
•Includes three image classification application examples based on CNN:
–One food recognition application operating on color (RGB 24 bits) frame images
–One person presence detection application operating on color (RGB 24 bits) frame images
–One person presence detection application operating on grayscale (8 bits) frame images
•Includes complete application firmware for camera capture, frame image preprocessing, inference execution
and output post-processing
•Includes examples of integration of both floating-point and 8-bit quantized C models
•Supports several configurations for data memory placement in order to meet application requirements
•Includes test and validation firmware in order to test, debug and validate the embedded application
Starting from a floating-point CNN model (designed and trained using a framework such as Keras), the user
generates optimized C code (using the STM32Cube.AI tool, [1]) and integrates it into the computer vision
framework (provided as part of FP-AI-VISION1) in order to build a computer vision application on STM32H7.
Note: For users who have selected a dual-core MCU such as the STM32H747 for their application but run it on the
Cortex®-M7 core only: STM32CubeMX does not support the addition of packages such as STM32Cube.AI (X-
CUBE-AI) to the project. As a consequence, when using STM32CubeMX along with STM32Cube.AI, a single-
core MCU such as the STM32H743 must be selected in order to generate the Neural Network code for the
Cortex®-M7 core.
The user can select one of two options for generating the C code:
•Either generating the floating-point C code directly from the floating-point CNN model
•Or quantizing the floating-point CNN model to obtain an 8-bit model, and subsequently generating the
corresponding quantized C code
For most CNN models, the second option reduces the memory footprint (Flash and RAM) as well as the
inference time. The impact on the final output accuracy depends on the CNN model as well as on the quantization
process (mainly the test dataset and the quantization algorithm).
As part of the FP-AI-VISION1 function pack, three image classification application examples are provided
including the following material:
•One food recognition application:
–Floating-point Keras model (.h5 file)
–8-bit quantized model (.h5 file + .json file) obtained using STM32Cube.AI (X-CUBE-AI) quantizer
–Generated C code in both floating point and 8-bit quantized format
–Example of computer vision application integration based on C code generated by STM32Cube.AI (X-
CUBE-AI)
•Two person presence detection applications:
–8-bit quantized models (.tflite file) obtained using the TFLiteConverter tool
–Generated C code in 8-bit quantized format
–Examples of computer vision application integration based on C code generated by STM32Cube.AI (X-
CUBE-AI)
Note: TensorFlow is a trademark of Google Inc.
2.1 Integration of the generated code
From a float or quantized model, the user must use the STM32Cube.AI tool (X-CUBE-AI) to generate the
corresponding optimized C code.
When using the GUI version of STM32Cube.AI (X-CUBE-AI) with the user's own .ioc file, the following set of
files is generated in the output directory:
•Src\network.c and Inc\network.h: contain the description of the CNN topology
•Src\network_data.c and Inc\network_data.h: contain the weights and biases of the CNN
Note: For the network, the user must keep the default name, which is “network”. Otherwise, the user must rename all
the functions and macros contained in the ai_interface.c and ai_interface.h files. The purpose of the ai_interface.c and ai_interface.h files is to provide an abstraction interface to the NN API.
From that point, the user must copy the above generated .c and .h files into the Src and Inc directories of the
application project, replacing the existing files.
An alternate solution is to use the CLI (command-line interface) version of STM32Cube.AI (X-CUBE-AI), so that
the generated files are directly copied into the Src and Inc directories contained in the output directory provided
on the command line. This solution does not require any manual copy/paste operation.
The application parameters are configured in files fp_vision_app.c and fp_vision_app.h where they can
be easily adapted to the user's needs.
In file fp_vision_app.c:
•The output_labels[] table of strings (where each string corresponds to one output class of the Neural
Network model) is the only place where adaptation is absolutely required for a new application.
•The App_Context_Init() function is in charge of initializing the different software components of the
application. Some changes may be required to:
–adapt the camera orientation
–adapt the path to read input images from the microSD™ card when in Onboard Validation mode
–adapt to the NN input data range used during the training phase
–adapt the pixel color format of the NN input data
In file fp_vision_app.h:
•The following two #define statements must be updated with the dimensions of the NN input tensor (see the example after this list):
–AI_NETWORK_WIDTH
–AI_NETWORK_HEIGHT
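As an illustration, the sketch below shows what these adaptations may look like for a hypothetical three-class model. The class names are placeholders and the 224 × 224 dimensions are those of the food recognition model described in Section 3.1.1; this sketch is not taken from the FP-AI-VISION1 sources.

/* Sketch of the adaptations described above (hypothetical example) */

/* fp_vision_app.h: dimensions of the NN input tensor (must match the model) */
#define AI_NETWORK_WIDTH   (224)
#define AI_NETWORK_HEIGHT  (224)

/* fp_vision_app.c: one string per output class of the Neural Network model */
const char *output_labels[] =
{
  "ClassA",   /* placeholder class names */
  "ClassB",
  "ClassC"
};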
3 Package content
3.1 CNN model
The FP-AI-VISION1 function pack demonstrates two CNN-based image classification applications:
•A food-recognition application recognizing 18 types of food and drink
•A person presence detection application identifying whether a person is present in the image or not
3.1.1 Food recognition application
The food-recognition CNN is a derivative of the MobileNet model (refer to [3]).
MobileNet is an efficient model architecture [3] suitable for mobile and embedded vision applications. This model
architecture was proposed by Google®.
The MobileNet model architecture includes two simple global hyper-parameters (a width multiplier and a resolution multiplier) that efficiently trade off between
latency and accuracy. These hyper-parameters allow the model builder to determine the right-sized model for the application based on the constraints of the problem.
The food recognition model that is used in this FP has been built by adjusting these hyper-parameters for an
optimal trade-off between accuracy, computational cost and memory footprint, considering the STM32H747 target
constraints.
The food-recognition CNN model has been trained on a custom database of 18 types of food and drink:
•Apple pie
•Beer
•Caesar salad
•Cappuccino
•Cheesecake
•Chicken wings
•Chocolate cake
•Coke™
•Cupcake
•Donut
•French fries
•Hamburger
•Hot dog
•Lasagna
•Pizza
•Risotto
•Spaghetti bolognese
•Steak
The food-recognition CNN expects color images of 224 × 224 pixels as input, each pixel being coded on
three bytes (RGB888).
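As an illustration, the size of the corresponding NN input buffer can be computed as follows (the macro names below are illustrative only, not part of the FP-AI-VISION1 sources):

#define NN_INPUT_WIDTH    (224U)
#define NN_INPUT_HEIGHT   (224U)
#define NN_INPUT_BPP      (3U)    /* RGB888: 3 bytes per pixel */
#define NN_INPUT_SIZE     (NN_INPUT_WIDTH * NN_INPUT_HEIGHT * NN_INPUT_BPP)   /* 150528 bytes */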
The FP-AI-VISION1 function pack includes two examples based on the food recognition application: one example
implementing the floating-point version of the generated code, and one example implementing the quantized
version of the generated code.
3.1.2 Person presence detection application
Two person presence detection applications are provided in this package:
•One based on a low-complexity CNN model (so-called Google_Model) working on grayscale images (8 bits
per pixel) with a resolution of 96 × 96 pixels. The model is downloaded from storage.googleapis.com.
•One based on a higher-complexity CNN model (so-called MobileNetv2_Model) working on color images
(24 bits per pixel) with a resolution of 128 × 128 pixels.
The person presence detection models contain two output classes: Person and Not Person.
The FP-AI-VISION1 function pack demonstrates 8-bit quantized models.
3.2 Software
3.2.1 Folder organization
Figure 3 shows the folder organization in FP-AI-VISION1 function pack.
Figure 3. FP-AI-VISION1 folder tree
Drivers
Contains all the BSP and STM32H7 HAL source code.
Middlewares
Contains five sub-folders:
•ST/STM32_AI_Runtime
The lib folder contains the Neural Network runtime libraries generated by STM32Cube.AI (X-CUBE-AI) for
each IDE: IAR Embedded Workbench® from IAR Systems (EWARM), MDK-ARM from Keil®, and
STM32CubeIDE from STMicroelectronics.
The Inc folder contains the include files required by the runtime libraries.
These two folders do not need to be replaced when converting a new Neural Network, unless a new
version of the X-CUBE-AI code generator is used.
•ST/STM32_AI_Utilities
Contains optimized routines.
•ST/STM32_Image
Contains a library of functions for image processing. These functions are used to preprocess the input frame
image captured by the camera. The purpose of this preprocessing is to generate the adequate data (such as
size, format, and others) to be input to the Neural Network during the inference.
•ST/STM32_Fs
Contains a library of functions for handling image files using FatFS on a microSD™ card.
•Third_Party/FatFS
Third-party middleware providing support for the FAT file system.
Project/STM32H747I-DISCO/Applications
Contains the projects and source code for the applications provided in the FP-AI-VISION1 FP. These
applications run on the STM32H747 (refer to [2]), which is a dual-core microcontroller based on the
Cortex®-M7 and Cortex®-M4 processors. The application code runs only on the Cortex®-M7 core.
Project/STM32H747I-DISCO/Applications/Common
This folder contains the source code common to all applications (a sketch showing how these modules typically fit together is given after the list):
•ai_interface.c and ai_interface.h
Provide an abstraction of the NN API.
•fp_vision_ai.c and fp_vision_ai.h
Provide the utilities that are required to adapt the representation of the NN input data, post-process the NN
output data, initialize the NN, and run an inference of the NN. The user must adapt these files to the
application parameters when integrating a new Neural Network model.
•fp_vision_camera.c and fp_vision_camera.h
Provide the functions to configure and manage the camera module.
•fp_vision_display.c and fp_vision_display.h
Provide the functions to configure and manage the LCD display.
•fp_vision_preproc.c and fp_vision_preproc.h
Provide an abstraction layer to the image preprocessing library (located in Middlewares/ST/STM32_Image).
•fp_vision_test.c and fp_vision_test.h
Provide a set of functions for testing, debugging and validating the application.
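As an illustration, the sketch below shows how these common modules typically fit together in the image pipeline. All function names and buffer sizes are hypothetical: they do not correspond to the actual FP-AI-VISION1 API.

#include <stdint.h>

/* Hypothetical prototypes standing in for the camera, preprocessing, AI and display modules */
extern void Camera_GetFrame(uint8_t *frame);                    /* fp_vision_camera  */
extern void Preproc_Run(const uint8_t *frame, uint8_t *nn_in);  /* fp_vision_preproc */
extern void Ai_Run(const uint8_t *nn_in, float *nn_out);        /* fp_vision_ai / ai_interface */
extern void Display_ShowResult(const float *nn_out);            /* fp_vision_display */

static uint8_t frame_buffer[320U * 240U * 2U];     /* camera frame, RGB565 (example size)  */
static uint8_t nn_input_buffer[224U * 224U * 3U];  /* NN input, RGB888 (example size)      */
static float   nn_output_buffer[18];               /* one score per output class (example) */

void vision_main_loop(void)
{
  for (;;)
  {
    Camera_GetFrame(frame_buffer);               /* 1. capture a frame                 */
    Preproc_Run(frame_buffer, nn_input_buffer);  /* 2. resize and convert pixel format */
    Ai_Run(nn_input_buffer, nn_output_buffer);   /* 3. run the NN inference            */
    Display_ShowResult(nn_output_buffer);        /* 4. post-process and display result */
  }
}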
This folder contains all the source code related to the food recognition application. It contains two sub-folders, one
sub-folder per application example:
•One demonstrating the integration of the float C model (32-bit float C code)
•One demonstrating the integration of the quantized C model (8-bit integer C code)
Each sub-folder is composed as follows:
•Binary
Contains the binaries for the applications:
–STM32H747I-DISCO_u_v_w_x_y_z.bin
Binaries generated from the source files contained in the Float_Model/CM7 and Quantized_Model/CM7 folders.
◦u corresponds to the application name. For the food recognition application, the value is:
•Food
◦v corresponds to the model type. For the food recognition application, it can be:
•Std (for standard)
•Opt (for optimized)
When v is Opt, it means that the binary is generated from sources that are not released as part of
the FP-AI-VISION1 function pack since they are generated from a specific version of the food
recognition CNN model. This specific version of the model is further optimized for a better trade-off between accuracy and embedded constraints such as memory footprint and MIPS. Contact
STMicroelectronics for information about this specific version.
◦w corresponds to the data representation of the model type. For the food recognition application, it
can be:
•Float (for float 32 bits)
•Quant8 (for quantized 8 bits)
◦x corresponds to the configuration for the volatile data memory allocation. For the food
recognition application, it can be:
•Ext (for external SDRAM)
•Split (for split between internal SRAM and external SDRAM)
•IntMem (for internal SRAM with memory optimized)
•IntFps (for internal SRAM with FPS optimized)
◦y corresponds to the memory allocation configurations for the non-volatile data. For the food
recognition application, it can be:
•IntFlash (for internal Flash memory)
•QspiFlash (for external Q-SPI Flash memory)
•ExtSdram (for external SDRAM)
◦z corresponds to the version number of the FP-AI-VISION1 release. It is expressed as Vabc
where a, b and c represent the major version, minor version, and patch version numbers
respectively. For the food recognition application corresponding to this user manual, the value is:
•V200
•CM7
Contains the source code specific to the food recognition application example that is executed on the
Cortex®-M7 core. There are two types of files:
–Files that are generated by the STM32Cube.AI tool (X-CUBE-AI):
◦network.c and network.h: contain the description of the CNN topology
◦network_data.c and network_data.h: contain the weights and biases of the CNN
–Files that contain the application:
◦main.c and main.h
◦fp_vision_app.c and fp_vision_app.h
Used to configure the application-specific settings.
◦stm32h7xx_it.c and stm32h7xx_it.h
Implement the interrupt handlers.
•CM4
This folder is empty since all the code of the food recognition application is running on the Cortex®-M7 core.
•Common
Contains the source code that is common to the Cortex®-M7 and Cortex®-M4 cores.
•EWARM
Contains the IAR Systems IAR Embedded Workbench® workspace and project files for the application
example. It also contains the startup files for both cores.
•MDK-ARM
Contains the Keil® MDK-ARM workspace and project files for the application example. It also contains the
startup files for both cores.
•STM32CubeIDE
Contains the STM32CubeIDE workspace and project files for the application example. It also contains the
startup files for both cores.
Note: For the EWARM, MDK-ARM and STM32CubeIDE sub-folders, each application project may contain several
configurations. Each configuration corresponds to:
•A specific data placement in the volatile memory (RAM)
•A specific placement of the weight-and-bias table in the non-volatile memory (Flash)
This folder contains the source code that is specific to the person presence detection applications. It contains two
sub-folders, one sub-folder per application example:
•One demonstrating the integration of a low-complexity model (so-called Google_Model)
•One demonstrating the integration of a medium-complexity model (so-called MobileNetv2_Model)
The organization of sub-folders is identical to the one of the sub-folders described above in the context of the food
recognition application examples.
The Binary sub-folder contains the binaries for the applications. The binaries are named
STM32H747I-DISCO_u_v_w_x_y_z.bin where:
•u corresponds to the application name. For the person presence detection applications, the value is:
–Person
•v corresponds to the model type. For the person presence detection applications, it can be:
–Google
–MobileNetV2
•w corresponds to the data representation of the model type. For the person presence detection applications,
the value is:
–Quant8 (for quantized 8 bits)
•x corresponds to the memory allocation configurations for the volatile data. For the person presence
detection applications, the value is:
–IntFps (for internal SRAM with FPS optimized)
•y corresponds to the memory allocation configurations for the non-volatile data. For the person presence
detection applications, the value is:
–IntFlash (for internal Flash memory)
•z corresponds to the version number of the FP-AI-VISION1 release. It is expressed as Vabc where a, b and
c represent the major version, minor version, and patch version numbers respectively. For the person
presence detection application corresponding to this user manual, the value is:
–V200
As an example, STM32H747I-DISCO_Person_Google_Quant8_IntFps_IntFlash_V200.bin is the binary of the person presence detection application based on the Google_Model.
Utilities/AI_resources/Food-Recognition
This sub-folder contains:
•The original trained model (file FoodReco_MobileNet_Derivative_Float.h5) for the food recognition
CNN used in the application examples. This model is used to generate:
–Either directly the floating-point C code via STM32Cube.AI (X-CUBE-AI)
–Or the 8-bit quantized model via the quantization process, and then subsequently the integer C code
via STM32Cube.AI (X-CUBE-AI)
•The files required for the quantization process (refer to Section 3.2.2 Quantization process):
–config_file_foodreco_nn.json: file containing the configuration parameters for the quantization
operation
–test_set_generation_foodreco_nn.py: file containing the function used to prepare the test
vectors used in the quantization process
•The quantized model generated by the quantization tool (files FoodReco_MobileNet_Derivative_Quantized.json and FoodReco_MobileNet_Derivative_Quantized.h5)
•The re-training script (FoodDetection.py) along with a Jupyter™ notebook (FoodDetection.ipynb) (refer to Section 3.2.3 Training scripts)
•A script (create_dataset.py) to convert a dataset of images into the format expected by the piece of
firmware performing the onboard validation (refer to Onboard Validation mode in Section 3.2.8 Embedded
validation, capture and testing)
Utilities/AI_resources/PersonDetection
This sub-folder contains:
•MobileNetv2_Model/README.md: describes how to retrain a new person detection image classifier from
a pre-trained network using TensorFlow™.
•MobileNetv2_Model/create_dataset.py: Python™ script to create the Person20 dataset
from the previously downloaded COCO dataset as described in the README.md file.
•MobileNetv2_Model/train.py: Python™ script to create an image classifier model from a pre-trained
MobileNetV2 head.
•MobileNetv2_Model/quantize.py: Python™ script to perform post-training quantization on a Keras
model using the TFLiteConverter tool from TensorFlow™. Sample images are required to run the
quantization operation.
3.2.2 Quantization process
The quantization process consists in quantizing the parameters (weights and biases) as well as the activations of
an NN in order to obtain a quantized model whose parameters and activations are represented on 8-bit integers.
Quantizing a model reduces the memory footprint because weights, biases, and activations are stored on 8 bits instead
of 32 bits in a float model: as an example, one million parameters require about 4 Mbytes in floating point but only
about 1 Mbyte once quantized to 8 bits. It also reduces the inference execution time thanks to the optimized DSP unit of the
Cortex®-M7 core.
Several quantization schemes are supported by the STM32Cube.AI (X-CUBE-AI) tool:
•Fixed point Qm,n
•Integer arithmetic (signed and unsigned)
Refer to the STM32Cube.AI tool (X-CUBE-AI) documentation in [1] for more information on the different
quantization schemes and how to run the quantization process.
Note: •Two datasets are required for the quantization operation. It is up to the user to provide their own datasets.
•The impact of the quantization on the accuracy of the final output depends on the CNN model (that is, its
topology), but also on the quantization process: the test dataset and the quantization algorithm have a
significant impact on the final accuracy.
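As an illustration of the integer-arithmetic scheme listed above, the sketch below shows how a single floating-point value may be quantized to an unsigned 8-bit integer. The scale and zero_point parameters are per-tensor values produced by the quantization tool; this code is a generic example, not the STM32Cube.AI implementation.

#include <stdint.h>
#include <math.h>

/* Affine quantization of one value: q = round(x / scale) + zero_point, clamped to [0, 255] */
static uint8_t quantize_uint8(float x, float scale, int32_t zero_point)
{
  int32_t q = (int32_t)lroundf(x / scale) + zero_point;
  if (q < 0)   { q = 0;   }
  if (q > 255) { q = 255; }
  return (uint8_t)q;
}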
3.2.3 Training scripts
Training scripts are provided for each application.
3.2.3.1 Food recognition application
File Utilities/AI_resources/Food-Recognition/FoodDetection.ipynb contains an example script
showing how to train the MobileNet derivative model used in the function pack. As the dataset used to train the
model provided in the function pack is not publicly available, the training script relies on a subset of the Food-101
dataset (refer to [4]). This publicly available dataset contains images of 101 food categories with 1000 images per
category.
In order to keep the training process short, the script uses only 50 images per food category and limits the
training of the model to 20 epochs. To train on the whole dataset, the variable
max_imgs_per_class in the section Prepare the test and train datasets must be set to np.inf.
Note: The use of a GPU is recommended for the complete training on the whole dataset.
The Jupyter™ notebook is also available as a plain Python™ script in the
Utilities/AI_resources/Food-Recognition/FoodDetection.py file.
3.2.3.2 Person presence detection application
File Utilities/AI_resources/PersonDetection/MobileNetv2_Model/train.py contains an
example script showing how to retrain the MobileNetV2 model by using transfer learning. The training script relies
on the Person20 dataset. Instructions on how to build the Person20 dataset from the publicly available
COCO-2014 dataset can be found in Utilities/AI_resources/PersonDetection/MobileNetv2_Model/README.md,
along with the Utilities/AI_resources/PersonDetection/MobileNetv2_Model/create_dataset.py
Python™ script to filter COCO images. An example Python™ script to perform post-training quantization is
available in Utilities/AI_resources/PersonDetection/MobileNetv2_Model/quantize.py. The
post-training quantization is performed on a Keras model using the TFLiteConverter tool from
TensorFlow™. Sample images are required to run the quantization operation; they can be extracted
from the model training set.
3.2.4 Memory requirements
When integrating a C model generated by the STM32Cube.AI (X-CUBE-AI) tool, the following memory
requirements must be considered:
•Volatile (RAM) memory requirement: memory space is required to allocate:
–The inference working buffer (called the activation buffer in this document). This buffer is used
during inference to store the temporary results of the intermediate layers within the Neural Network.
–The inference input buffer (called the nn_input buffer in this document), which is used to hold the
input data of the Neural Network.
•Non-volatile (Flash) memory requirement: memory space is required to store the table containing the
weights and biases of the network model.
In addition to the above-listed memory requirements, further requirements come into play when integrating the C
model in a computer vision application (the sketch after this list illustrates a possible buffer placement):
•Volatile (RAM) memory requirement: memory space is required in order to allocate the various buffers that
are used across the execution of the image pipeline (camera capture, frame pre-processing).
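As an illustration, the sketch below shows how such buffers may be placed in a given memory region with a GCC-based toolchain such as STM32CubeIDE. The section name, the buffer sizes, and the attribute syntax are examples only; the actual placement in FP-AI-VISION1 depends on the selected project configuration and on the linker script (the IAR and Keil toolchains use their own placement directives).

#include <stdint.h>

#define ACTIVATION_BUFFER_SIZE  (250U * 1024U)       /* example size, depends on the generated network */
#define NN_INPUT_BUFFER_SIZE    (224U * 224U * 3U)   /* example size for a 224 x 224 RGB888 input      */

/* Activation buffer placed in external SDRAM (".sdram" is an example section name) */
static uint8_t activation_buffer[ACTIVATION_BUFFER_SIZE] __attribute__((section(".sdram")));

/* nn_input buffer left in internal SRAM (default placement) */
static uint8_t nn_input_buffer[NN_INPUT_BUFFER_SIZE];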