, The Generation of
Open Ethernet logo, UFM®, Unbreakable Link®, Virtual Protocol Interconnect®, Voltaire® and Voltaire logo are registered trademarks of Mellanox Technologies, Ltd.
All other trademarks are property of their respective owners.
For the most updated list of Mellanox trademarks, visit http://www.mellanox.com/page/trademarks
NOTE:
THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT (“PRODUCT(S)”) AND ITS RELATED DOCUMENTATION ARE PROVIDED BY MELLANOX TECHNOLOGIES “AS-IS” WITH ALL FAULTS OF ANY KIND AND SOLELY FOR THE PURPOSE OF AIDING THE CUSTOMER IN TESTING APPLICATIONS THAT USE THE PRODUCTS IN DESIGNATED SOLUTIONS. THE CUSTOMER'S MANUFACTURING TEST ENVIRONMENT HAS NOT MET THE STANDARDS SET BY MELLANOX TECHNOLOGIES TO FULLY QUALIFY THE PRODUCT(S) AND/OR THE SYSTEM USING IT. THEREFORE, MELLANOX TECHNOLOGIES CANNOT AND DOES NOT GUARANTEE OR WARRANT THAT THE PRODUCTS WILL OPERATE WITH THE HIGHEST QUALITY. ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT ARE DISCLAIMED. IN NO EVENT SHALL MELLANOX BE LIABLE TO CUSTOMER OR ANY THIRD PARTIES FOR ANY DIRECT, INDIRECT, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING, BUT NOT LIMITED TO, PAYMENT FOR PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY FROM THE USE OF THE PRODUCT(S) AND RELATED DOCUMENTATION EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Doc #: MLNX-15-53886
Mellanox Technologies Confidential
Rev 1.1
Table of Contents
Document Revision History
About this Manual
Figure 4 - BlueField High Level Hardware View
Document Revision History
Rev 1.1 – September 04, 2018
Added section “2.5 Building Poky Initramfs” and moved other subsections underneath it
Rev 1.0 – August 10, 2018
First release
About this Manual
Welcome to the BlueField™ SW User Manual. This document provides information that explains the BlueField Software Distribution (BSD) and how to develop and/or customize applications, system software, and file system images for the BlueField platform.
Audience
This document is intended for software developers and DevOps engineers interested in creating and/or customizing software applications and system software for the Mellanox® BlueField SoC platform.
Document Conventions
The following lists conventions used in this document.
NOTE: Identifies important information that contains helpful suggestions.
CAUTION: Alerts you to the risk of personal injury, system damage, or loss of data.
WARNING: Warns you that failure to take or avoid a specific action might result in
personal injury or a malfunction of the hardware or software. Be aware of the hazards
involved with electrical circuitry and be familiar with standard practices for preventing
accidents before you work on any equipment.
Common Abbreviations and Acronyms
Abbreviation / Acronym Whole Word / Description
ATF Arm Trusted Firmware
BFB BlueField™ bootstream
BSD BlueField Software Distribution
eMMC Embedded Multi-Media Card
ESP EFI system partition
FS File system
FW Firmware
GDB GNU Debugger
GPT GUID Partition Table
HW Hardware
IB InfiniBand
KGDB Kernel debugger
KGDBOC Kernel debugger over console
NIC Network interface card
OCD On-chip debugger
OVS Open vSwitch
PCIe PCI Express or Peripheral Component Interconnect Express
SoC System on chip
SW Software
UEFI Unified Extensible Firmware Interface
UPVS UEFI Persistent Variable Store
VPI Virtual Protocol Interconnect
Related Documentation
For additional information, see the following documents:
• InfiniBand Architecture Specification – The InfiniBand Architecture Specification that is provided by IBTA.
• Firmware Release Notes for Mellanox adapter devices – See the Release Notes PDF file relevant to your adapter device under the docs/ folder of installed package.
• MFT User Manual – Mellanox Firmware Tools User’s Manual. See under the docs/ folder of installed package.
• MFT Release Notes – Release Notes for the Mellanox Firmware Tools. See under the docs/ folder of installed package.
• Mellanox OFED for Linux User Manual – Intended for system administrators responsible for the installation, configuration, management and maintenance of the software and hardware of VPI adapter cards.
• WinOF User Manual – Mellanox WinOF User Manual describes installation, configuration and operation of Mellanox WinOF driver.
• VMA User Manual – Mellanox VMA User Manual describes installation, configuration and operation of Mellanox VMA driver.
• BlueField 2U Reference Platform Hardware User Manual – Provides details as to the interfaces of the reference platform, specifications and hardware installation instructions.
• Mellanox BlueField Reference Platforms Bring Up Guide – This document describes a step-by-step procedure of how to bring up the BlueField Reference Platform.
• Mellanox BlueField SmartNIC Installation and Bring Up Guide – This document describes a step-by-step procedure of how to bring up the BlueField SmartNIC.
1 BlueField Software Overview
It is recommended to upgrade your BlueField product to the latest software and
firmware versions available in order to enjoy the latest features and bug fixes.
Mellanox® provides software which enables users to fully utilize the BlueField™ SoC and enjoy the rich feature-set it provides. Using the BlueField software packages, users are able to:
• Quickly and easily boot an initial Linux image on their development board
• Port existing applications to and develop new applications for BlueField
• Patch, configure, rebuild, update or otherwise customize their image
• Debug, profile, and tune their development system using open source development tools, taking advantage of the diverse and vibrant Arm ecosystem
The BlueField family of SoC devices combines an array of 64-bit Armv8 A72 cores coupled with the ConnectX® interconnect. Standard Linux distributions run on the Arm cores, allowing common open source development tools to be used. Developers should find the programming environment familiar and intuitive, which in turn allows them to quickly and efficiently design, implement and verify their control-plane and data-plane applications.
BlueField SW ships with the Mellanox BlueField Reference Platform. BlueField SW is a reference Linux distribution based on the Yocto Poky distribution and extended to include the
Mellanox OFED stack for Arm and a Linux kernel which supports NVMe-oF. This SW distribution is capable of running all customer-based Linux applications seamlessly. Yocto also
provides an SDK which contains an extremely flexible cross-build environment allowing
software targeted for the BlueField SoC to build on virtually any x86 server running any
Linux distribution.
The following are other software elements delivered with BlueField SoC:
• Arm Trusted Firmware (ATF) for BlueField
• UEFI for BlueField
• OpenBMC for BMC (ASPEED 2500) found on development board
• Hardware Diagnostics
• Mellanox OFED stack
• Mellanox MFT
1.1 Debug Tools
BlueField SoC includes hardware support for the Arm DS-5 suite as well as CoreSight™ debug & trace. As such, a wide range of commercial off-the-shelf Arm debug tools should
work seamlessly with BlueField.
The BlueField SoC also supports the ubiquitous GDB.
[Figure: software stack diagram. Kernel (Linux on Armv8): Cryptodev (TRNG + pub/private), misc platform drivers (SMBus, GPIO, mgmt net), PCIe root complex, NVDIMM driver stack, NVMe driver, ConnectX driver (mlx5), IB/RDMA core, NVMe-oF target, iSCSI. User: ASAP^2, OFED Verbs, Cryptodev, OVS, iSER, OpenSSL, SPDK, DPDK, NVMe-oF target, NVDIMM mgmt, NVML, libraries. Firmware (Armv8): UEFI with BSP, Arm Trusted Firmware (ATF) with BSP.]
1.2 BlueField Adapter/SmartNIC
The BlueField SmartNIC is shipped with the BlueField Software Distribution (BSD) pre-installed. The BlueField adapter Arm execution environment has the capability of being fully
isolated from the x86 host and uses a dedicated network management interface (separate
from the x86 host’s management interface). The Arm cores can run the Open vSwitch Database (OVSDB) or other virtual switches to create a secure solution for bare metal provisioning.
The software package also includes support for DPDK as well as applications for encryption.
1.3 BlueField-based Storage Appliance
Mellanox® BlueField™ Software provides the foundation for building a JBOF (Just a Bunch
of Flash) storage system including NVMe-oF target software, PCIe switch support,
NVDIMM-N support, and NVMe disk hot-swap support.
BlueField SW allows enabling Mellanox ConnectX® offloads such as RDMA/RoCE, T10 DIF signature offload, erasure coding offload, iSER, Storage Spaces Direct, and more.
1.4 BlueField Architecture
The BlueField architecture is a combination of two preexisting standard off-the-shelf components, Arm AArch64 processors and the Mellanox ConnectX-5 network controller, each with its own rich software ecosystem. As such, almost all of the programmer-visible software interfaces in BlueField come from existing standard interfaces for the respective components.
The Arm-related interfaces (including those related to the boot process, PCIe connectivity, and cryptographic operation acceleration) are standard Linux on Arm interfaces. These interfaces are enabled by drivers and low-level code provided by Mellanox as part of the BlueField software delivered and upstreamed to respective open source projects, such as Linux.
The ConnectX-5 network controller-related interfaces (including those for Ethernet and InfiniBand connectivity, RDMA and RoCE, and storage and network operation acceleration) are identical to the interfaces that support ConnectX-5 standalone network controller cards. These interfaces take advantage of the Mellanox OFED software stack and InfiniBand verbs-based interfaces to support software.
Figure 1 - Interfaces on BlueField
2 Installation and Initialization
Disclaimer: This section is preliminary and subject to change. Please consult the
README files in the BlueField™ Software Distribution (BSD) for the most updated
content.
The BSD consists of the following images:
• BlueField-<bluefield_version>_install-bluewhale.bfb – installation BFB file for the
version>.<yocto_version>.sh – Yocto-produced SDK in a self-installing script. This contains all cross-build tools and utilities to allow building an image targeted for the BlueField platform on an x86 server running Linux.
2.1 Unpacking BlueField Software Distribution
To unpack the BSD, run:
$ tar xvf BlueField-<bluefield_version>.tar.xz
This unpacks to a BlueField-<bluefield_version>/ subdirectory containing the following toplevel hierarchy:
• bin – contains tools to manage the process of installing runtime software. For example,
the mlx-mkbfb tool to generate the BlueField boot stream files used to provide initial boot
images.
• boot – contains the boot loader binaries built from the provided sources for each of Mellanox®’s BlueField devices. The “bl*” files in a device’s dedicated folder are taken from the Arm Trusted Firmware (ATF) and each represents a different boot phase; note that the file bl1.bin corresponds to the boot ROM burned into the SoC itself. The *.fd file is the Unified Extensible Firmware Interface (UEFI) boot image. The *.bfb file is a generated BlueField boot stream file which includes all the above boot loader components.
• distro – contains information pertinent to different Linux distributions. For example, the
distro/yocto directory contains the “meta-bluefield” layer used to build a BlueField-targeted version of the standard Yocto/Poky meta-distribution.
• sample – contains sample images which can be used to boot a BlueField™ chip to a Linux bash prompt, either to validate that the hardware is working correctly or for experimentation. See the README and README.install files in that directory for more information.
• src – contains patches for various components (e.g. ATF, UEFI, and Linux), as well as complete sources for Linux drivers and user-space components authored by Mellanox® and not yet upstreamed.
The BSD contains numerous README files in the aforementioned directories which provide more information. These README files must be consulted, particularly when upgrading your BSD release, as they contain important release-specific information.
The following README files in particular are important to consult for possible release-specific information:
• sample/README.install
• sample/README
• distro/yocto/README-bluefield
• src/atf/README
• src/atf/README-bfb
• distro/rhel/pxeboot/README
2.2 Upgrading Boot Software
This section describes how to use the BlueField alternate boot partition support feature to safely upgrade the boot software. It gives the requirements that motivate the feature and explains the software interfaces that are used to configure it.
2.2.1 BFB File Overview
[Figure: bootstream diagram. Labels recovered from the figure: BL1, Reset, BL2, BL31, BL33, ACPI Table, Name, Boot Image, Path, Kernel Arguments.]
Figure 2 - BlueField Bootstream
The default BlueField bootstream (BFB) shown above is a standard boot BFB that is stored
on the embedded Multi-Media Card (eMMC) as can be seen by the boot path that points to a
GUID partition (GPT) on the eMMC device. That path is a normal UEFI boot path and it
will be stored in the UPVS (UEFI Persistent Variable Store) EEPROM as a side effect of
booting with this BFB. That is, if you use the mlxbf-bootctl utility to write this BFB to the
eMMC boot partition, the SoC chip will read it via the boot FIFO on the RShim device by
default on the next reboot.
BFB files can be useful for many things such as installing new software on a BlueField™ SoC. For example, the installation BFB for BlueField platforms normally contains an initramfs file in the BFB chain. Using the initramfs (and Linux kernel Image also found in the BFB) you can do things like set the boot partition on the eMMC using mlxbf-bootctl or flash new HCA firmware using MFT utilities. You can also install a full root file system on the eMMC while running out of the initramfs.
The types of files possible in a BFB are listed below.
Before explaining the implementation of the solution, the BlueField boot process needs to be
expanded upon.
Figure 3 - Basic BlueField Boot Flow
The BlueField™ boot flow comprises four main phases:
• Hardware loads Arm Trusted Firmware (ATF)
• ATF loads UEFI—together ATF and UEFI make up the booter software
• UEFI loads the operating system, such as the Linux kernel
• The operating system loads applications and user data
When booting from eMMC, these stages make use of two different types of storage within the eMMC part:
• ATF and UEFI are loaded from a special area known as an eMMC boot partition. Data from a boot partition is automatically streamed from the eMMC device to the eMMC controller under hardware control during the initial boot-up. Each eMMC device has two boot partitions, and the partition which is used to stream the boot data is chosen by a non-volatile configuration register in the eMMC.
• The operating system, applications, and user data come from the remainder of the chip, known as the user area. This area is accessed via block-size reads and writes, done by a device driver or similar software routine.
2.2.3 The mlxbf-bootctl Program
Access to all the boot partition management is done via a program packaged with the BlueField software called “mlxbf-bootctl”. The binary is shipped as part of the Yocto image (under /sbin) and the sources are shipped in the “src” directory in the BlueField Runtime Distribution. A simple “make” command builds the utility. It accepts the following options:
• --device – use a device other than the default /dev/mmcblk0
• --bootstream – write the specified bootstream to the alternate partition of the device. This queries the base device (e.g. /dev/mmcblk0) for the alternate partition, and uses that information to open the appropriate boot partition device (e.g. /dev/mmcblk0boot0).
• --overwrite-current (used with “--bootstream”) – overwrite the current boot partition instead of the alternate one. (Not recommended!)
• --output (used with “--bootstream”) – specify a file to which to write the boot partition
data (creating it if necessary), rather than using an existing master device and deriving the
boot partition device.
• --watchdog-swap – arrange to start the Arm watchdog timer with a countdown of the
specified number of seconds until it triggers; also, set the boot software so that it swaps
the primary and alternate partitions at the next reset.
• --nowatchdog-swap – ensure that after the next reset, no watchdog is started, and no
swapping of boot partitions occurs.
2.2.4 Upgrading the Bootloader
In most deployments, the Arm cores of BlueField™ are expected to obtain their software
stack from an on-board eMMC device. Even in environments where the final OS kernel is
not kept on eMMC—for instance, systems which boot over a network—the initial booter
code still comes from the eMMC.
Most software stacks need to be modified or upgraded in their lifetime. Ideally, the user is
able to install the new software version on their BlueField system, test it, and then fall back
to a previous version if the new one does not work. In some environments, it is important
that this fallback operation happen automatically since there may be no physical access to the
system. In others, there may be an external agent, such as a service processor, which could
manage the process.
To satisfy the requirements listed above, the platform must do the following:
1. Provision two software partitions on the eMMC, 0 and 1. At any given time, one partition must be designated the primary partition, and the other the backup partition. The primary partition is the one booted on the next reboot or reset.
2. Allow software running on the Arm cores to declare that the primary partition is now the backup partition, and vice versa. (For the remainder of this section, this operation is referred to as “swapping the partitions” even though only the pointer is modified, and the data on the partitions does not move.)
3. Allow an external agent, such as a service processor, to swap the primary and backup partitions.
4. Allow software running on the Arm cores to reboot the system, while activating an upgrade watchdog timer. If the upgrade watchdog expires (due to the new image being broken, invalid, or corrupt), the system automatically reboots after swapping the primary and backup partitions.
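The requirements above can be illustrated with a toy model. This is only a sketch of the pointer-swap idea and does not touch any eMMC; the function names swap_partitions and boot_result are hypothetical and are not part of the BlueField software.

```shell
#!/bin/sh
# Toy model of the primary/backup boot partition pointer (no real eMMC access).
# swap_partitions and boot_result are hypothetical names used for illustration.

primary=0   # partition that will be booted on the next reset (0 or 1)

# "Swapping the partitions" only flips the pointer; no data moves.
swap_partitions() {
    primary=$((1 - primary))
}

# Simulate the first boot after an upgrade with the watchdog armed.
# $1 is "ok" if the new image comes up, anything else if it hangs.
boot_result() {
    if [ "$1" != "ok" ]; then
        # Watchdog expired: swap back so the old image boots again.
        swap_partitions
    fi
    echo "running from partition $primary"
}

swap_partitions      # upgrade: the new image on partition 1 becomes primary
boot_result hang     # the new image hangs, so the system falls back
```

In this model a failed boot ends on partition 0, the image that was running before the upgrade, while a successful boot would stay on partition 1.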
2.2.5 Updating the Boot Partition
To update the boot partition on the Arm cores, assume there is a new bootstream file called “bootstream.new” which is to be installed and validated. To update to the new bootstream, run:
This writes the new bootstream to the alternate boot partition, swaps alternate and primary so
that the new bootstream is used on the next reboot, and then reboots to use it. (You may also
use “--overwrite-current” instead of “--swap”, which just overwrites the current boot partition. But this is not recommended as there is no easy way to recover if the new booter code
does not bring the system up.)
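The update sequence described above can be sketched as follows. This is a minimal sketch rather than the manual's own listing: it assumes that “--bootstream” and “--swap” (both named in this section) can be combined on one command line, and the helper print_update_cmds is a hypothetical name. The helper only prints the commands so that they can be reviewed before anything is written to the eMMC.

```shell
#!/bin/sh
# Sketch only: prints the update sequence instead of executing it.
# Assumption: --bootstream and --swap may be given together, as the
# surrounding text implies. print_update_cmds is a hypothetical helper.

print_update_cmds() {
    bfb="$1"    # path to the new bootstream file
    echo "mlxbf-bootctl --bootstream $bfb --swap"
    echo "reboot"
}

print_update_cmds bootstream.new
```

Once the printed commands look right, they can be run as root on the Arm cores.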
2.2.6 Safely Updating with a BMC
The Arm cores notify the BMC prior to the reboot that an upgrade is about to happen. Software running on the BMC can then be implemented to watch the Arm cores after reboot. If, after some time, the BMC does not detect the Arm cores coming up properly, it can use its USB debug connection to the Arm cores to reset them. It first sets a suitable mode bit that the Arm booter responds to by switching the primary and alternate boot partitions as part of resetting into its original state.
2.2.7 Safely Updating Boot Software from the Arm Cores
Without a BMC, the Arm watchdog may be used to achieve similar results. If something
goes wrong on the next reboot and the system does not come up properly, it will reboot and
return to the original configuration. In this case, the user may run:
With these commands, the user reboots the system, and, if it hangs for 60 seconds or more, the watchdog fires and resets the chip, the booter swaps the partitions back again to the way they were before, and the system reboots back with the original boot partition data. Similarly, if the system comes up but panics and resets, the booter will again swap the boot partition back to the way it was before.
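A watchdog-protected update along these lines can be sketched as follows. This is a hedged sketch, not the manual's own listing: it assumes “--watchdog-swap” takes the timeout in seconds as its argument (per the option list in section 2.2.3) and can be combined with “--bootstream” and “--swap”. The helper print_safe_update_cmds is a hypothetical name, and it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: prints a watchdog-protected update sequence.
# Assumption: --watchdog-swap takes the countdown in seconds and can be
# combined with --bootstream and --swap. print_safe_update_cmds is a
# hypothetical helper name.

print_safe_update_cmds() {
    bfb="$1"        # path to the new bootstream file
    timeout="$2"    # watchdog countdown in seconds
    echo "mlxbf-bootctl --bootstream $bfb --swap --watchdog-swap $timeout"
    echo "reboot"
}

print_safe_update_cmds bootstream.new 60
```

The value 60 matches the timeout used in the description above; a slower-booting system may need a larger value.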
The user must ensure that Linux after the reboot is configured to boot up with the
“sbsa_gwdt” driver enabled. This is the Server Base System Architecture (SBSA) Generic
WatchDog Timer. As soon as the driver is loaded, it begins refreshing the watchdog and preventing it from firing, which allows the system to finish booting up safely. In the example
above, 60 seconds are allowed from system reset until the Linux watchdog kernel driver is
loaded. At that point, the user’s application may open /dev/watchdog explicitly, and the application would then become responsible for refreshing the watchdog frequently enough to
keep the system from rebooting.
For documentation on the Linux watchdog subsystem, see the Linux watchdog documentation (e.g. https://www.kernel.org/doc/Documentation/watchdog/watchdog-api.txt).
To disable the watchdog completely, for example, run:
# echo V > /dev/watchdog
The user may choose to incorporate other features of the Arm generic watchdog into their application code using the programming API as well.
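An application that takes over the watchdog has to keep refreshing it. The following is a minimal sketch of such a refresher, assuming the standard Linux watchdog behavior in which any write to the device refreshes the timer and a final “V” requests that the watchdog be stopped on close. The function name refresh_watchdog and the WDT_DEV variable are hypothetical; pointing WDT_DEV at a regular file allows the loop to be tried without touching real hardware.

```shell
#!/bin/sh
# Sketch of an application-side watchdog refresher.
# WDT_DEV and refresh_watchdog are hypothetical names for illustration.
WDT_DEV="${WDT_DEV:-/dev/watchdog}"

# Hold the device open on fd 3 and write to it "count" times,
# sleeping "interval" seconds between writes.
refresh_watchdog() {
    count="$1"
    interval="$2"
    exec 3>"$WDT_DEV"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '.' >&3          # any write refreshes the timer
        sleep "$interval"
        i=$((i + 1))
    done
    printf 'V' >&3              # magic close: ask the driver to stop the watchdog
    exec 3>&-
}
```

For example, `refresh_watchdog 6 10` would refresh the timer six times over roughly one minute and then release the device. The interval should be comfortably shorter than the watchdog timeout.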
Once the system has booted up, in addition to disabling or reconfiguring the watchdog itself
if the user desires, they must also clear the “swap on next reset” functionality from the booter
by running:
# mlxbf-bootctl --nowatchdog-swap
Otherwise, next time the system is reset (via reboot, external reset, etc.) it assumes a failure
or watchdog reset occurred and swaps the eMMC boot partition automatically.
The aforementioned steps can be done manually, or can be done automatically by software
running in the newly-booted system.
2.2.8 Changing the Linux Kernel or Root File System
The solutions above simply update the boot partition to hold new boot loader software (ATF
and UEFI). If the user wants to also provide a new kernel image and/or modify the root file
system, the user should partition their eMMC into multiple partitions appropriately.
For example, the user may have a single FAT partition from which UEFI can read the kernel
image file, but the new bootstream contains a UEFI bootpath pointing to an updated kernel
image. Similarly, the user may have two Linux partitions, and their upgrade procedure would
write a new filesystem into the “idle” Linux partition, then reboot with the bootstream holding kernel boot arguments which direct it to boot from the previously idle partition.
The details on how exactly to do this depend on the specifics of how and what needs to be
upgraded for the specific application, but in principle any component of the system can be
safely upgraded using this type of approach.
For more information, please refer to EDK2 user documentation on GitHub at:
NOTE: This section describes Arm Trusted Firmware (ATF) only as it relates to the BlueField™ platform. For general knowledge of what ATF is and how it works, please refer to the ATF documents from Arm. The Arm Trusted Firmware User Guide located at “docs/user-guide.rst” in the ATF sources is a good place to start.
ATF is used in Armv8 systems for booting the chip and then providing secure interfaces. It implements various Arm interface standards like PSCI (Power State Coordination Interface), SMC (Secure Monitor Call) and TBBR (Trusted Board Boot Requirements). ATF is used as the primary bootloader to load UEFI (Unified Extensible Firmware Interface) on the BlueField platform.
[Figure: hardware block diagram. Labels recovered from the figure: BlueField SoC, Arm A72 cores, boot ROM, boot FIFO, secure boot registers, boot SRAM, RShim, lifecycle fuses, key storage, MDK, eMMC (boot partition 0, boot partition 1), EEPROM (UPVS), SPI flash.]
Figure 4 - BlueField High Level Hardware View
ATF has various bootloader stages when loading:
• BL1 – BL1 is stored in the on-chip boot ROM; it is executed when the primary core is
reset. Its main functionality is to do some initial architectural and platform initialization
to the point where it can load the BL2 image, then it loads BL2 and switches execution to
it.
• BL2 – BL2 is loaded and then executed on the on-chip boot SRAM. Its main functionality is to perform the rest of the low-level architectural and platform initialization (e.g. initializing DRAM, setting up the System Address Mapping and calculating the Physical Memory Regions). It then loads the rest of the boot images (BL31, BL33). After loading the images, it traps itself back to BL1 via an SMC, which in turn switches execution to BL31.
• BL31 – BL31 is known as the EL3 Runtime Software. It is loaded to the boot RAM. Its
main functionality is to provide low-level runtime service support. After it finishes all its
runtime software initialization, it passes control to BL33.
• BL33 – BL33 is known as the Non-trusted Firmware. For this case we are using EDK2
(Tianocore) UEFI. It is in charge of loading and passing control to the OS. For more detail on this, please see the EDK2 source.
NOTE: Some users may wish to use the GRUB2 bootloader for various reasons. In that
case, UEFI would be configured to load GRUB2 instead of the Linux kernel.
2.3.1 Building ATF Images
To get the source code, directly execute the atf-56036e.patch file found in the directory /src/atf. It downloads the ATF sources from GitHub and patches them with BlueField™ platform-specific code.
Since BL1 is permanently burned into the BlueField on-chip boot ROM, the only real boot
loader images which might need to be built are BL2 and BL31 (refer to the EDK2 documentation of how to build EDK2 to use as BL33). Thus we are building the “bl2” and “bl31” targets.
Before doing any build, the environment variable CROSS_COMPILE should point to the
Arm cross-compiler which is being used. For example:
To build for the BlueField platform, we need PLAT set to “bluefield” when invoking
“make”. You also need to set the TARGET_SYSTEM according to the specific system for
which you are building (e.g. “bluewhale” if building for the BlueWhale reference platform).
Every supported system has its own subdirectory under $SOURCE/plat/mellanox/bluefield/system/. If you are using your own system, you can use the “generic” platform as a
starting point, create your own system’s subdirectory under the system directory, copy the
files over from the generic subdirectory, and modify them to suit your particular machine.
You can also pass the BUILD_BASE variable to specify where you want the files to be built.
So to perform a basic build:
NOTE: If ATF is being built in an environment where the Yocto/Poky SDK script has been run (environment-setup-aarch64-poky-linux), the user needs to set LDFLAGS to an empty string (export LDFLAGS="").
After the build finishes the needed bl2.bin and bl31.bin may be found under
$BUILD_BASE/bluefield/<target_system>/release/.
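The build described above can be sketched as follows. The variable names PLAT, TARGET_SYSTEM, BUILD_BASE and CROSS_COMPILE and the targets bl2 and bl31 come from this section; the toolchain prefix aarch64-linux-gnu- and the path /tmp/atf-build are placeholder assumptions, and print_atf_build is a hypothetical helper that prints the make command instead of running it.

```shell
#!/bin/sh
# Sketch only: prints the ATF build command instead of invoking make.
# The CROSS_COMPILE prefix and the build path below are assumptions;
# print_atf_build is a hypothetical helper name.

print_atf_build() {
    target_system="$1"   # e.g. bluewhale for the BlueWhale reference platform
    build_base="$2"      # directory that receives the build output
    echo "make PLAT=bluefield TARGET_SYSTEM=$target_system BUILD_BASE=$build_base bl2 bl31"
    echo "# expected output directory: $build_base/bluefield/$target_system/release/"
}

export CROSS_COMPILE=aarch64-linux-gnu-   # assumed cross-compiler prefix
print_atf_build bluewhale /tmp/atf-build
```

The printed make command is meant to be run from the top of the patched ATF source tree.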
2.3.2 Trusted Board Boot
The other two files in the directory (mbedtls-2.2.1.patch and gen_fuse_info.py) are related to
building ATF with trusted board boot enabled.
For more information of how to perform trusted board boot, please refer to the Secure Boot
document.
2.4 Building UEFI (EDK2)
After running the “edk2-*.patch” command in the directory “/src/edk2/” to set up a source tree for UEFI, cd into it and run “make -f /path/to/this/file”.
Customizations you may need or want to make are expanded on further below.
Note that EDK2 requires building in the source tree. Also, the EDK2 build system fails with
parallel build, so you must build with -j1.
The image built is BLUEFIELD_EFI.fd and/or BLUEFIELD_EFI_SEC.fd in the
Build/BlueField/RELEASE_GCC49/FV directory.
2.4.1 Customizable Build Options
The following are the customizable UEFI build options:
• The mode in which to build EDK2: DEBUG or RELEASE.
EDK2_MODE = RELEASE
• Any particular “defines” to use when building EDK2.
EDK2_DEFINES = \
-DSECURE_BOOT_ENABLE=TRUE \
-DFIRMWARE_VER=0.99 \
• Path to OpenSSL tarball: Set it to an already-downloaded location for the tarball, or else this makefile will download it into the source tree.
• Make sure the ARCH environment variable is NULL (unset).
DTC_PREFIX =
• Device tree source files:
DTS_FILES = bf-full.dts
DTS_DIR = ../dts
It is important to note that when you actually build EDK2, you must NOT be in an environment/bash shell where environment-setup-aarch64-poky-linux was run. Simply point GCC49_AARCH64_PREFIX to:
Customize your local.conf file. For example, change MACHINE to “bluefield”, include bluefield.conf, and set MLNX_OFED_PATH.
MACHINE ??= "bluefield"
include conf/bluefield.conf
MLNX_OFED_PATH="<path>/distro/mlnx_ofed"
Then run:
cd ..
bitbake core-image-initramfs
Note that the file system you just created is located in “poky/build/tmp/deploy/images/bluefield” while your kernel image will be in “tmp/deploy/images/bluefield”.
Common BlueField™ bitbake targets are:
• bitbake core-image-initramfs
• bitbake core-image-full
• bitbake core-image-full-sdk -c populate_sdk
You can build Yocto/Poky on most major Linux distributions. Mellanox® currently runs tests using CentOS 7.4; other distributions such as Ubuntu would also work but may require small modifications to the Yocto config files and/or recipes.
If you are not using CentOS and are having difficulties, you may want to try running CentOS in a container or on a VM first in order to get a successful build with which to compare results.
For more information, please refer to the following URL:
Certain OFED recipes require that the source RPM or tarball already be downloaded to the build system. These files are included in the BlueField Runtime Distribution. You may place these files anywhere you like and then set the appropriate variable in local.conf. For example, set:
MLNX_OFED_PATH=<your_local_path>/distro/mlnx_ofed
Several variables can be set in local.conf to copy files created outside of Yocto into the root file systems:
• MLNX_OFED_PATH – location of local OFED packages. Look in distro/mlnx_ofed for specific directories.
• MLNX_OFED_VERSION – version of MLNX OFED (e.g. “4.2-1.4.13.”). This is used
with MLNX_OFED_PATH to find files.
• MLNX_OFED_BASE_OS – base OS version of MLNX OFED (e.g. “rhel7.3”). This is
used with MLNX_OFED_PATH to find files.
• MLNX_BLUEFIELD_VERSION_PATH – if there is a “bluefield_version” file in this
location it gets copied to /etc in the root file systems created by Yocto. See the
“update_rootfs_bluefield” function in the meta-bluefield image recipes.
• MLNX_BLUEFIELD_FW_PATH – if this directory exists, the image recipes in meta-
bluefield copy the contents of this directory into /lib/firmware/mellanox on the generated
root file systems
• MLNX_BLUEFIELD_BFB_PATH – if there are any bfb files (*.bfb) located at this location,
they are copied into the root file system (/lib/firmware/mellanox). See the
“update_rootfs_bluefield” function in the meta-bluefield image recipes.
• MLNX_BLUEFIELD_EXTRA_DEV_PATH – any files in the directory specified by this
variable are copied into /opt/mlnx/extra on the full root dev file system.
2.5.3 Downloading Upstream Yocto and Building SDK
The meta-bluefield layer supports the Mellanox BlueField SoC.
To use it, edit your conf/bblayers.conf file to include this directory on the list of directories
in BBLAYERS. Similarly, you should also add the layers meta-oe, meta-python, and meta-networking from meta-openembedded, since packages in those layers are used in some of the
images included in meta-bluefield/recipes-bsp/images.
You should edit your conf/local.conf file to set MACHINE to “bluefield”. To be able to
build the same distro configurations used in the Mellanox® images (including using the same
kernel version shipped by Mellanox), you should also add:
include conf/bluefield.conf
NOTE: Mellanox is using Yocto Rocko 2.4 for this release.
2.6 Using Yocto as a Cross-compilation SDK and Root Filesystem Generator
You may download the Yocto/Poky SDK file from the same source from which you acquired the BlueField™
Runtime Distribution. Typically:
Unpacking this file into an SDK directory allows cross-compiling files which are going to
run on the BlueField SoC. This directory may be located anywhere you want.
Alternatively, you may download the upstream Yocto and build your own SDK; for more
information, see “2.5.3 Downloading Upstream Yocto and Building SDK”.
To use the SDK cross-compilation tools, you should “source” the top-level “environment-setup-aarch64-poky-linux” script to set various environment variables, including $PATH,
$CC, $CROSS_COMPILE, etc. The cross-compilation tools (compiler, assembler, linker,
etc.) are located in sysroots/x86_64-pokysdk-linux/usr/bin/aarch64-poky-linux; many other
useful tools are in the directories usr/bin, usr/sbin, bin, and sbin beneath sysroots/x86_64-pokysdk-linux. The sysroots/aarch64-poky-linux hierarchy contains a copy of a root filesystem for Arm64 so the cross-compilation tools can find headers and libraries in it.
To compile your code you should use aarch64-poky-linux-gcc, and, if necessary, the other
standard aarch64-poky-linux- tools. In general, you should take advantage of the various environment variables in your makefiles rather than relying on any specific name for the tools.
Several of the tools (notably gcc) require a “--sysroot” argument which specifies the
aarch64-poky-linux path—the $(CC) variable handles this for you.
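As a minimal sketch of relying on the environment variables rather than on fixed tool names, the shell function below checks that the setup script has been sourced before a build is attempted. The function name sdk_env_ready and the file name hello.c are hypothetical; only $CC and $CROSS_COMPILE come from the description above.

```shell
# Hypothetical helper: succeed only if the SDK environment variables
# named above ($CC and $CROSS_COMPILE) are set in the current shell.
sdk_env_ready() {
    [ -n "${CC:-}" ] && [ -n "${CROSS_COMPILE:-}" ]
}

# Typical use (paths and file names are placeholders):
#   . <sdk_dir>/environment-setup-aarch64-poky-linux
#   sdk_env_ready && $CC -o hello hello.c
```

Calling $CC unquoted is deliberate here, since the variable is expected to carry the compiler name together with its “--sysroot” argument.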
Note also that for “configure” based software, the top-level environment setup script also
sets a $CONFIG_SITE environment variable pointing to the top-level site-config-aarch64-poky-linux file which includes autoconf definitions for all the known configure variables, to
simplify cross-configuration.
2.7 RShim Host Driver
2.7.1 Building and Installing RShim Host Driver
In order to build and install the RShim host driver, run:
make -C /lib/modules/`uname -r`/build M=$PWD
make -C /lib/modules/`uname -r`/build M=$PWD modules_install
The following kernel modules are installed:
• Common modules:
• rshim.ko – RShim common code including console support
• rshim_net.ko – RShim network driver
• Different backends:
• rshim_usb.ko – RShim USB backend
• rshim_pcie.ko – RShim PCIe backend with firmware burnt
• rshim_pcie_lf.ko – RShim PCIe backend in livefish mode
2.7.2 Loading Modules
Usually rshim.ko and rshim_usb.ko (or rshim_pcie.ko) are loaded automatically after reboot.
If not, run “modprobe rshim” and “modprobe rshim_<usb | pcie | pcie_lf>” to load them
manually. The module rshim_net.ko creates an RShim network interface and can then be
loaded on demand.
NOTE: Loading multiple backends for the same board is not recommended
as it could cause potential data corruption when both backends read/write
simultaneously.
2.7.3 Device Files
Each RShim backend creates a directory named according to the format /dev/rshim<N>/
with the following files (<N> is the device ID, which could be 0, 1, etc):
• /dev/rshim<N>/boot
Boot device file used to send boot stream to the Arm side. For example:
cat install-bluewhale.bfb > /dev/rshim<N>/boot
• /dev/rshim<N>/console
Console device, which can be used by console tools to connect to the Arm side. For example:
screen /dev/rshim<N>/console
• /dev/rshim<N>/rshim
Device file used to access RShim register space. When reading/writing to this file, encode the offset as “((rshim_channel << 16) | register_offset)”.
• /dev/rshim<N>/misc
Key/value pairs used to read/write miscellaneous data. For example:
# Dump the content.
cat /dev/rshim<N>/misc
BOOT_MODE 1
SW_RESET 0
# Initiate a SW reset.
echo "SW_RESET 1" > /dev/rshim<N>/misc
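The offset encoding and the key/value format described above can be sketched with two small shell helpers. The function names rshim_offset and rshim_misc_get are hypothetical, and the channel and offset values in the usage comment are arbitrary placeholders rather than real register addresses.

```shell
# Encode an RShim register address as ((rshim_channel << 16) | register_offset).
# Arguments: channel number, register offset (decimal or 0x-prefixed hex).
rshim_offset() {
    printf '0x%x\n' $(( ($1 << 16) | $2 ))
}

# Print the value stored for a key in a key/value file such as /dev/rshim<N>/misc.
# Arguments: file path, key name (e.g. SW_RESET).
rshim_misc_get() {
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# Example (placeholder values):
#   rshim_offset 1 0x10
#   rshim_misc_get /dev/rshim0/misc BOOT_MODE
```

The second helper assumes one whitespace-separated key/value pair per line, as in the dump shown above; the actual file may contain additional fields.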
2.7.4 FAQ – What if USB and PCIe Access are Enabled?
In this case both rshim_usb.ko and rshim_pcie.ko are loaded automatically, which causes a
conflict when they write to RShim simultaneously. One solution is to create a configuration
file to pass “rshim_disable=1” to the specified kernel module.
The following is an example to disable RShim access via USB:
Multiple boards can connect to the same host machine. Each board has its own device directory (/dev/rshim<N>). The following are some guidelines on how to set up RShim networking properly in this case:
• Each target should load only one backend (usb, pcie or pcie_lf).
• Each host RShim network interface should have a different MAC and IP address, which
can be configured with ifconfig as shown below or saved in configuration:
The “printf” command sets the MAC address to 00:1a:ca:ff:ff:03 (the last six bytes of the
printf value). Either reboot the device or reload the tmfifo driver for the change to take effect.
2.8 OpenOCD on BlueField
To run OpenOCD (Open On-Chip Debugger) for BlueField:
1. Load host-side RShim drivers (assuming they have already been installed). Run:
$ sudo modprobe rshim_usb
NOTE: This is the USB use case; for PCIe, a different driver must be used.
Find the RShim device; it is usually located at /dev/rshim0/rshim.
Set the environment variable to be used by OpenOCD. Run:
$ export RSHIM_DEST=/dev/rshim0/rshim
2. Run OpenOCD:
$ sudo <install>/bin/mlx-openocd
Once started, OpenOCD runs a gdb-server in the background to accept commands from a
GDB client.
To start the GDB client:
1. Set up the cross-compiler toolchain environment. For example:
$ aarch64-poky-linux-gdb [optional_elf_image]
(gdb) target remote :3333 # Or <IP>:3333 if running from different machine
(gdb) bt
(gdb) <...normal gdb commands...>
3 Programming
This chapter is meant for application developers and expert users who wish to develop applications over BlueField™ SW.
The sample directory contains sample Linux and initramfs content which can be used to validate that the user’s hardware can boot up to the shell prompt. Typically the user would use
their distribution’s kernel and userspace filesystem contents instead (e.g. Yocto, RedHat, or
Ubuntu).
The build-images script takes the sample kernel file (“Image”) and the sample “initramfs”
file and unpacks them into partition images and disk images which are typical of what might
be burned into the eMMC device used to boot up. We create a disk image that is partitioned
to have an initial boot filesystem and an additional root filesystem. The boot filesystem holds
the image file and the initramfs; the root filesystem holds an unpacked version of the
initramfs.
The build-bfb script then allows the user to utilize these images to create several different
BlueField boot stream files which can boot the BlueField system via USB or PCI from a host
system, or can be copied to BlueField’s eMMC boot partition.
You can use the script to create boot stream files with the following properties:
build-bfb -i rshim
Provide the entire boot environment (ATF, UEFI, kernel, initramfs) in a single boot stream
file.
build-bfb mmc0
build-bfb nvme0
Load the kernel from the boot partition, then boot using the root partition. The initramfs file
on the boot partition is not used. The two variants are examples of how to configure the partition names for different devices.
build-bfb -i mmc0
build-bfb -i nvme0
Load both the kernel and the initramfs from the boot partition, and boot the kernel using the
initramfs. The root partition is not used.
build-bfb --no-gpt --root /dev/nvme0n1p1 mmc0
Load the kernel from the eMMC (configured as just a large boot partition without GPT),
then boot using an NVMe root partition. The “initramfs” file is not used.
4 UEFI Boot Option Management
The UEFI firmware provides a boot management function that can be configured by modifying architecturally defined global variables which are stored in the UPVS EEPROM. The
boot manager attempts to load and boot the OS in the order defined by the persistent variables.
The UEFI boot manager can be configured; boot entries may be added or removed from the
boot menu. The UEFI firmware can also effectively generate entries in this boot menu, according to the available network interfaces and possibly the disks attached to the system.
4.1 Boot Option
The boot option is a unique identifier for a UEFI boot entry. This identifier is assigned when
the boot entry is created, and it does not change. It also represents the boot option in several
lists, including the BootOrder array, and it is the name of the directory on disk in which the
system stores data related to the boot entry, including backup copies of the boot entry. A
UEFI boot entry ID has the format “Bootxxxx” where xxxx is a hexadecimal number that
reflects the order in which the boot entries are created.
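As an illustration of the “Bootxxxx” naming, the sketch below formats an option number as a boot entry ID. The function name boot_entry_id is hypothetical, and the use of four uppercase hexadecimal digits is an assumption based on common UEFI convention rather than something stated in this manual.

```shell
# Format a boot option number as a UEFI boot entry ID:
# the literal "Boot" followed by four hexadecimal digits.
boot_entry_id() {
    printf 'Boot%04X\n' "$1"
}

# Example:
#   boot_entry_id 0     # Boot0000
#   boot_entry_id 10    # Boot000A
```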
Besides the boot entry ID, the UEFI boot entry has the following fields:
- Description
(e.g: Yocto, CentOS, Linux from rshim)
To display the boot option already installed in the BlueField system, reboot and go to the
UEFI menu screen. To get to the UEFI menu, just hit any key when the screen rolls up after
printing the UEFI firmware version.
UEFI firmware (version 0.99-e2bbe24 built at 18:38:55 on Apr 5 2018)
Boot options are listed as soon as you select the “Boot Manager” entry.
NOTE: Boot arguments are printed in Hex mode, but you may recognize the boot
parameters printed on the side in ASCII format.
4.3 Creating, Deleting, and Modifying UEFI Boot Option
The file system supported by EFI is based on the FAT file system. An “EFI system partition”
(or ESP) is any partition formatted with one of the UEFI spec-defined variants of FAT and
given a specific GPT partition type to help the firmware read it.
Usually, the ESP is located in “FS0:”. To create a new boot entry, run:
The parameter “add” is used to add an option. The “option#” is the option number to add in
hexadecimal. The “file-path” is the path of the UEFI binary for the option. The quoted parameter is the description of the option being added.
For example, to create a boot entry to boot “Yocto Poky” as a default option, run:
Shell> bcfg boot add 0 FS0:\Image "Yocto Poky"
Where “Image” is the actual kernel image to boot from the MMC partition.
To create a boot entry for CentOS, assuming the distro is already installed and the ESP for-
The boot entry here is installed as a third boot option (option number starts from 0).
“shim.efi” is a trivial EFI application that, when run, attempts to open and execute another
application (e.g. GRUB bootloader).
To add booting parameters to the boot options, you need to create a file, and then append it
to the boot option:
Shell> edit FS0:\options.txt
Add the boot arguments as a single line to the file “FS0:\options.txt” and save the file (UCS-2). Finally, append the arguments:
Shell> bcfg boot -opt 0 FS0:\options.txt
Boot arguments here are appended to boot option #0. Do not run this command several
times. You have to remove and re-add the entry before you can change the parameters.
To modify the boot option order, for example, to move boot option #2 to boot option #0,
simply run:
Shell> bcfg boot mv 2 0
The first numeric parameter is the option to move. The second numeric parameter is the new
option number.
Finally, to remove a boot option, you may run:
Shell> bcfg boot rm 0
The numeric parameter refers to the option number to remove.
5 Installing Popular Linux Distributions on BlueField
5.1 Installing CentOS 7.x Distribution
This section provides instructions on how to install the Arm-based CentOS 7.x on a BlueField system.
5.1.1 Requirements
• Host machine running CentOS 7.x
NOTE: CentOS 6.2+ needs slight modification in the “setup.sh” script to set up the
tftp/dhcpd services.
• BlueField prebuilt packages installed under the directory BF_INST_DIR. If they are not
installed yet, get the tarball file BlueField-1.0.xxxxxx.yyyyy.tar.xz and run:
# tar Jxvf BlueField-1.0.xxxxxx.yyyyy.tar.xz -C <PATH>
Then the “BF_INST_DIR” could be found under the directory “<PATH>/BlueField-
1.0.xxxxxx.yyyyy”.
5.1.2 Host Machine Setup
1. Download the CentOS installation ISO file from the following URL:
If ConnectX interfaces are expected during the installation (rather than installing OFED
later), download the “mlnx-ofed” file from the Mellanox web at
7.4 x86_64 dd-rhel7.4-mlnx-ofed-xxx.iso.gz. Download the file and decompress it.
(This step is needed if you are performing PXE boot over the ConnectX interface.)
NOTE: UART1 (ttyAMA1) is used by default. To specify a different console use “-c
xxx”. SmartNIC uses UART0, so it takes “-c ttyAMA0”.
NOTE: The option “-k” enables automatic installation according to the kickstart file
ks.cfg.
NOTE: Once the option “-t” is specified, a nonpxe.bfb is generated which can be used
to boot the device via the RShim interface. It starts CentOS installation by skipping the
UEFI PXE process, which should be faster.
5.1.3 Basic Yocto Installation
1. Connect the UART console.
Find the device file and connect to it using minicom or screen. For example:
# screen /dev/ttyUSB0 115200
Use “yum install screen” or “yum install minicom” to install minicom/screen if not
found.
For minicom, set:
• Bps/Par/Bits – 115200 8N1
• Hardware Flow Control – No
• Software Flow Control – No
2. Power cycle the board.
3. Select an image according to the board type and push it from the host machine via the
4. Log into Linux from the UART console (root with no password). Run the following script
to flash the default image and wait until it is done.
# /opt/mlnx/scripts/bfinst --minifs
This step is needed to update the boot partition images.
5.1.4 PXE Boot
1. Reboot the board. Once the “UEFI firmware ...” message appears on the UART console,
press the “Esc” key several times to enter the UEFI boot menu.
2. Restart the dhcpd/tftp-server services. Run:
systemctl restart dhcpd; systemctl restart xinetd
3. Select the “Boot Manager” in the UART console and press Enter. Then select “EFI Net-
work” in the Boot Manager and press Enter to start the PXE boot.
4. Check the Rx/Tx statistics on host side. Run:
ifconfig tmfifo_net0
5. After some time, a list of operating systems appears. Select “Install centos/7.4 AArch64 – BlueField”
and press Enter to start the CentOS installation.
NOTE: It takes time to fetch the Linux kernel image and initrd. So please be patient and
check the Rx/Tx packet counters. The installation starts when the counters reach ~50K.
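Instead of re-running ifconfig by hand, the received-packet counter can be read from sysfs. The following is a minimal sketch: the function names rx_packets and wait_for_rx are hypothetical, the optional sysfs-root argument exists only to make the helpers easy to exercise, and the threshold of 50000 simply mirrors the approximate figure in the note above.

```shell
# Print the received-packet counter of a network interface from sysfs.
# Arguments: interface name; optional sysfs root (default /sys/class/net).
rx_packets() {
    cat "${2:-/sys/class/net}/$1/statistics/rx_packets"
}

# Block until the counter reaches the given threshold, polling every 5 seconds.
# Arguments: interface name, threshold, optional sysfs root.
wait_for_rx() {
    while [ "$(rx_packets "$1" "${3:-/sys/class/net}")" -lt "$2" ]; do
        sleep 5
    done
}

# Example (host side):
#   wait_for_rx tmfifo_net0 50000 && echo "installer should be starting"
```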
5.1.5 CentOS Installation
6. Follow the installation wizard.
5.1.6 Post-installation
1. Enable “yum install” or external network access after CentOS installation.
a. On the host side:
# modprobe rshim_net
# systemctl restart dhcpd
# echo 1 > /proc/sys/net/ipv4/ip_forward
# iptables -t nat -A POSTROUTING -o <out_intf> -j MASQUERADE
'<out_intf>' is the outgoing network interface to the network.
b. On the BlueField™ device side:
# ifdown eth0; ifup eth0
2. Install driver RPMs from source.
• If “rpmbuild” is not available, run:
# yum install rpm-build
• If development tools are not available, run:
# yum group install "Development Tools"
• If kernel-devel is not installed, run:
# yum install kernel-devel-`uname -r`
All driver source RPMs are located at <BF_INST_DIR>/distro/SRPMS. Upload them to
the target (e.g. under /opt).
Below is an example which shows how to install i2c-mlnx-1.0-0.g6af3317.src.rpm.
NOTE: If MLNX_OFED_LINUX is installed with “--add-kernel-support”, run “dracut
-f” to update drivers in initramfs after MLNX_OFED_LINUX installation.
5.1.7 Building a New bluefield_dd ISO Image
To build an updated driver disk for a different version of RHEL, you may use the sources
located in the “bluefield_dd” directory. Run the build-dd.sh script there and provide it as an
argument the path to the kernel-devel RPM, and it then builds a matching driver disk.
Typically, you would do this as a cross-build when initially booting up a BlueField™ system,
so the cross-build environment must be configured prior to running the script.
Note that for the Yocto SDK, you must “unset LDFLAGS” before running the script, since
the kernel build uses raw LDFLAGS via “ld”, rather than via “gcc” as the Yocto SDK assumes.
This generates a suitable .iso file in the current directory.
5.1.8 PXE Boot Flow
UEFI allows booting over PXE in the way that is familiar from other operating system
installations.
If the BlueField eMMC is already provisioned with a bootstream containing ATF and UEFI,
it should power on and boot up with that bootstream. If the eMMC also contains a bootable
kernel, you would need to interrupt the boot by hitting “Esc” quickly once UEFI starts up.
This takes you to the UEFI main menu on the serial console. If the eMMC is provisioned
with a bootstream but does not contain a bootable kernel, you would enter the UEFI main
menu on the serial console automatically at power on.
If the eMMC is not yet provisioned with a bootstream, you would need to boot it externally
using USB or PCIe. In that case, providing a bootstream containing only ATF and UEFI
would boot the chip and you automatically enter the UEFI main menu on the serial console.
From this point, you may navigate to the network boot option and select the primary ConnectX network interface; or select the RShim network interface, which would bridge to the
external host over USB or PCIe, and use its network interface instead.
At this stage, you should be able to boot the CentOS installation media from a PXE server in
the normal manner, loading Grub and then selecting your kernel of choice from the media.
5.1.9 Non-PXE Boot Flow
It is possible to explicitly boot the PXE boot components of CentOS directly over USB or
PCIe to avoid the requirement of a PXE server being available on the network. This is somewhat equivalent to booting a local bootable CD and then using another media source to find
all the packages to install.
The root of the CentOS install image, for example $ROOT, corresponds to the root of the
ISO image file; that is, the directory which contains EFI, EULA, GPL, LiveOS, Packages,
etc. You should create a bootstream which includes the pxeboot kernel and initramfs and use
that to boot the image. This is assuming the initrd.img file in this directory has already been
updated to include bluefield_dd.iso, as described in the previous section.
You must also determine from where to load the ISO image for the installation. In this example, that destination is http://1.2.3.4/rhel.
After the kernel image is uncompressed, it must be placed, along with the initrd.img, in a
BlueField bootstream. To do so, “cd” to the “samples” directory of the BlueField Runtime
Distribution, make sure that the “bin” directory is on your $PATH, and run:
This bootstream can then be used to boot the BlueField in the normal way. It comes up in
text mode on the console; you may also select to run the installer from a VNC client.
5.1.10 Installation Troubleshooting and FAQ
5.1.10.1 How to reset the board or NIC via the RShim interface from host side
Run the following:
# echo "SW_RESET 1" > /dev/rshim<N>/misc
Make sure to replace “rshim<N>” with the actual name (e.g. rshim0).
5.1.10.2 How to upgrade existing CentOS to some BlueField release without reinstallation
1. Follow Step 3 of “5.1.3 Basic Yocto Installation” to push the installation image via
RShim (USB or PCIe).
NOTE: Do not run the bfinst script, or else a new installation will start.
2. Run “/opt/mlnx/scripts/bfrec” to upgrade the boot partitions. This step upgrades the ATF
& UEFI images to this release.
3. Reboot into CentOS. Follow Step 2 of “5.1.6 Post-installation” to install/upgrade the
tmfifo driver and other drivers as needed from the source RPM.
5.1.10.3 Do I need to re-run the “setup.sh” script after host reboot before installing CentOS?
Either re-run the script, or mount /var/pxe/centos7 to the CentOS ISO file.
5.1.10.4 How to change the MAC address of the tmfifo network interface (Arm side)
See section “2.7.6 Permanently Changing the MAC Address of the Arm Side”.
5.1.10.5 Why CentOS (Arm side) did not get DHCP address on tmfifo interface (eth0) after re-boot
The host-side DHCP daemon must be restarted after board reboot in order to provide DHCP
service. A configuration file could accomplish this automatically.
# Create /sbin/ifup-local or add to it.
INTF=$1
if [ "$INTF" = "tmfifo_net0" ]; then
systemctl restart dhcpd
fi
5.1.10.6 How to kickstart auto-installation
Run setup.sh with the “-k” option. The default kickstart file is installed as /var/pxe/ks/ks.cfg.
It should have all packages needed for OFED. Add more packages if needed.
5.2 Running RedHat on BlueField
In general, running RedHat Enterprise Linux or CentOS on BlueField is similar to setting it
up on any other ARM64 server.
A driver disk is required to support the eMMC hardware onto which the media is typically
installed. The driver disk also supports the tmfifo networking interface that allows creating a
network interface over the USB or PCIe connection to an external host. For newer RedHat releases, or if the specific storage or networking drivers mentioned are not needed, you can
skip the driver disk.
The way to manage bootflow components with BlueField is through the grub boot manager. The
installation should create a /boot/efi VFAT partition that holds the binaries visible to UEFI
for bootup. The standard grub tools then manage the contents of that partition, and the UEFI
EEPROM persistent variables, to control the boot.
It is also possible to use the BlueField runtime distribution tools to directly configure UEFI
to load the kernel and initramfs from the UEFI VFAT boot partition if desired, but typically
using grub is preferred. In particular, you would need to explicitly copy the kernel image to
the VFAT partition whenever it is upgraded so that UEFI could access it; normally it is kept
on an XFS partition.
5.2.1 Provisioning ConnectX Firmware
Prior to installing RedHat, you should ensure that the ConnectX SPI ROM firmware has
been provisioned. If the BlueField is connected to an external host via PCIe, and is not running in Secure Boot mode, this is typically done by using the Mellanox MFT tools on the external host to provision the BlueField. If the BlueField is connected via USB or is configured
in Secure Boot mode, you must provision the SPI ROM by booting a dedicated bootstream
that allows the SPI ROM to be configured by the MFT running on the BlueField ARM cores.
There are multiple ways to access the RedHat installation media from a BlueField device for
installation.
1. You may use the primary ConnectX interfaces on the BlueField to reach the media over
the network.
2. You may configure a USB or PCIe connection to the BlueField as a network bridge to
reach the media over the network.
NOTE: Requires installing and running the RShim drivers on the host side of the USB
or PCIe connection.
3. You may connect other network or storage devices to the BlueField via PCIe and use
them to connect to or host the RedHat install media.
NOTE: This method has not been tested.
Note that, in principle, it is possible to perform the installation according to the second
method above without first provisioning the ConnectX SPI ROM, but since you need to do
that provisioning anyway, it is recommended to perform it first. In particular, the PCIe network interface available via the external host’s RShim driver is likely too slow prior to provisioning to be usable for a distribution installation.
5.2.2 Managing the Driver Disk
As discussed previously, you likely need a driver disk for RedHat installations. Mellanox
provides a number of pre-built driver disks, as well as a documented flow for building one
for any particular RedHat version. See section 5.1.7 “Building a New bluefield_dd ISO Image”
for details on how to do that.
Normally a driver disk can be placed on removable media (like a CDROM or USB stick) and
is auto-detected by the RedHat installer. However, since BlueField typically has no removable media slots, you must provide it over the network. If you are installing over
the network connection via the PCIe/USB link to an external host, though, you will not have a network connection either until the driver disk is loaded. As a result, the documented procedure modifies the default
RedHat images/pxeboot/initrd.img file to include the driver disk itself.
To create the updated initrd.img, locate the “images/pxeboot” directory in the
RedHat installation media. It contains a kernel image file (vmlinuz) and initrd.img (initial RAM disk). The “bluefield_dd/update-initrd.sh” script takes the path to the initrd.img as
an argument and adds the appropriate BlueField driver disk ISO file to the initrd.img.
When booting the installation media, make sure to include “inst.dd=/bluefield_dd.iso” on the
kernel command line, which will instruct Anaconda to use that driver disk, enabling the use
of the IP over USB/PCIe link (tmfifo) and the DesignWare eMMC (dw_mmc).
5.3 Installing the Reference Yocto Distribution
The BlueField processor should be attached to an external host via a USB connection. The
external host is required to be running Linux. This process has been tested with an external
host running CentOS 7.4.
You will need to install the provided “rshim”, “rshim_net”, and “rshim_usb” drivers on the
external host. These drivers are required to communicate with the BlueField device over the
USB interface.
The initramfs contains ConnectX®
also installed on the initramfs if you wish to update the ConnectX firmware.
With Yocto running, the default firmware images are under /lib/firmware/mellanox/. The
mst, mlxburn, and flint tools are also available which can be used to update firmware as
usual.
How to configure ConnectX firmware
Configuring ConnectX firmware can be done using the mlxconfig tool.
It is possible to configure privileges of both the internal (Arm) and the external host (for
SmartNICs) from a privileged host. According to the configured privilege, a host may or
may not perform certain operations related to the NIC (e.g. determine if a certain host is allowed to read port counters).
For more information and examples please refer to the MFT User Manual which can be
found at: https://www.mellanox.com/page/management_tools.
How to use the UEFI boot menu
Press the “ESC” key after booting to enter the UEFI boot menu and use the arrows to select
the menu option.
It could take 1-2 minutes to enter the Boot Manager depending on how many devices are installed or whether the EXPROM is programmed or not.
Once in the boot manager:
• “EFI Network xxx” entries with device path “PciRoot...” are for the ConnectX interface
• “EFI Network xxx” entries with device path “MAC(...” are for the RShim interface
Selecting the interface and pressing ENTER starts the PXE boot.
The following are several useful commands under the UEFI shell:
Shell> ls FS0: # display file
Shell> ls FS0:\EFI # display file
Shell> cls # clear screen
Shell> ifconfig -l # show interfaces
Shell> ifconfig -s eth0 dhcp # request DHCP
Shell> ifconfig -l eth0 # show one interface
Shell> tftp 192.168.100.1 grub.cfg FS0:\grub.cfg # tftp download a file
Shell> bcfg boot dump # dump boot variables
Shell> bcfg boot add 0 FS0:\EFI\centos\shim.efi "CentOS" # create an entry
How to change the default console of the install image
Upgrade to the latest stable boot partition images, see “How to upgrade the boot partition”
above.
CentOS falls into “dracut” mode during installation
This is most likely configuration related.
• If installing through the RShim interface, check whether /var/pxe/centos7 is mounted or
not. If not, either manually mount it or re-run the setup.sh script.
• Check the Linux boot message to see whether eMMC is found or not. If not, the BlueField
driver patch is missing. For local installation via RShim, run the setup.sh script with
the absolute path and check if there are any errors. For a corporate PXE server, make sure
the BlueField and ConnectX driver disk are patched into the initrd image.
How to Use the Kernel Debugger (KGDB)
The default Yocto kernel has CONFIG_KGDB and CONFIG_KGDB_SERIAL_CONSOLE
enabled. This allows the Linux kernel on BlueField to be debugged over the serial port. A
single serial port cannot be used both as a console and by KGDB at the same time. It is recommended to use the RShim for console access (/dev/rshim0/console) and the UART port
(/dev/ttyAMA0 or /dev/ttyAMA1) for KGDB. Kernel GDB over console (KGDBOC) does
not work over the RShim console. If the RShim console is not available, there are open
source packages such as KGDB demux and agent-proxy which allow a single serial port to
be shared.
There are two ways to configure KGDBOC. If the OS is already booted, then write the name
of the serial device to the KGDBOC module parameter. For example:
In order to attach GDB to the kernel, it must be stopped first. One way to do that is to send a
“g” to /proc/sysrq-trigger.
root@bluefield:~# echo g > /proc/sysrq-trigger
If you want to debug incidents that occur at boot time, that has to be configured through the
kernel boot parameters. Add “kgdboc=ttyAMA1,115200 kgdbwait” to the boot arguments to
use UART1 for debugging and force the kernel to wait for GDB to attach before booting.
Once the KGDBOC module is configured and the kernel stopped, run the Arm64 GDB on
the host machine connected to the serial port, then set the remote target to the serial device
on the host side.
Mellanox uses box.com to distribute BlueField software. Contact your sales/support
representative for a custom link to download BlueField software releases.
In this document, we assume the tarball BlueField-<version>.tar.xz is extracted at
/root. To do this, run the following command:
tar -xvf BlueField-<version>.tar.xz -C /root
/isos/aarch64/CentOS-7-aarch64-Everything.iso
A.1.2 Preparing the Host-Side Environment
Some required drivers do not compile and load if running CentOS 5.x or earlier.
Before installing the preferred OS on the BlueField SmartNIC, the host must be set up so that it is
capable of provisioning the SmartNIC. The RShim USB driver is installed on the host to communicate with the RShim device on the BlueField SoC. The RShim USB driver must be installed so
that it can push the initial bootloader and supply the OS image for PXE boot through the USB
connection.
Rev 1.2
This process only needs to be done on the host machine that provisions the
SmartNIC. It is not required on the end machine.
A.1.2.1 Setup Procedure With Installation Script
Bring-Up and Driver Installation
If the host is running CentOS 7 (or equivalent), you may run a script to complete all […]
[…] Arm. Download and extract it from […]
•“-c” flag: Specifies the default UART port for the OS to use since the BlueField SoC has two
Arm UARTs. For the SmartNIC, “ttyAMA0” is used, which is UART0.
•“-t” flag (optional): When specified and given the argument of what platform is set (SmartNIC in this
case), it generates a “nonpxe.bfb” file which contains the install kernel and rootfs.
If this file is pushed to the RShim boot device, it automatically runs the installation
process and skips the initial UEFI PXE boot operations.
•“-k” flag (optional): Kickstarts auto-installation based on a default kickstart file which is installed as /var/pxe/ks/ks.cfg (optional).
A.1.2.2 Setup Procedure Without Installation Script
If the host is running CentOS 7 or equivalent, please refer to Section 5.1.2.1 which describes a
simpler way to perform the installation using an installation script.
The following sections demonstrate CentOS 7 installation, however, installation in other environments should be relatively similar.
A.1.2.2.1 Step 1: Set up the RShim Interface
The RShim driver communicates with the RShim device on the BlueField SoC. The RShim is in
charge of many miscellaneous functions of the SoC, including resetting the Arm cores, providing
the initial bootstream, and using the TMFIFO and the RShim network to exchange network and
console data with the host.
The RShim can be reached by the host via the USB connector and the PCIe slot. It is preferable,
however, to use the USB connection.
To enable access to the RShim via the PCIe slot, a link must be opened through firmware.
This is done by adding “multi_function.rshim_pf_en = 0x1” to the [fw_boot_config]
section in the firmware's ini file.
A.1.2.2.2 Step 2: Install RShim Drivers
The RShim drivers are installed as a part of MLNX_OFED_LINUX installation process on the
host. See Installing MLNX_OFED on the Host on page 51 for further details.
•rshim is the base RShim kernel module required by all other RShim modules
•rshim_usb is a module that accesses the RShim via the USB connector
•rshim_pci is a module that accesses the RShim via the PCIe slot
•rshim_net is a module which creates a network interface to the RShim
The kernel modules do not compile on CentOS 5 or earlier.
A.1.2.2.3 Step 3: Configure RShim Net Interface
To use the RShim net interface, create a udev rule and a config file.
•To create the udev rule, run the following command:
The host should be configured to act as a TFTP server to the SmartNIC via the USB RShim network. This server provides the files required by the SmartNIC to
install the preferred OS.
Configuring the TFTP server requires a TFTP package. If it is not installed, install it via
“yum install tftp” or “apt-get install tftp”, depending on your Linux distribution.
Note: On some versions, the TFTP
“xinetd”.
1. Extract the OS image and copy the required PXE boot components:
These commands assume that you are using kernel version “4.11.0-22.el7a.aarch64”. If
you are using a different version, utilize the corresponding bluefield_dd.iso. If none is
found, compile one by running the following:
--class gnu --class os {
linux (tftp)/centos/7.4/vmlinuz ro ip=dhcp method=http://192.168.100.1/centos7 inst.dd=/bluefield_dd.iso console=hvc0
initrd (tftp)/centos/7.4/initrd.img
}
EOF
4. Start the TFTP server:
systemctl restart tftp
Depending on the system, you may need to use “service tftp restart” instead. Also, if
required, you might need to use “xinetd” instead of “tftp”.
A.1.2.2.5 Step 5: Set Up the DHCP Server
A DHCP server must be set up on the host so that the SmartNIC can get a private IP address from the host and
complete the PXE boot process. Configure the correct server names and domain names so that the
SmartNIC can connect to the network via the host later on.
# Set the domain search according to the network configuration
option domain-search "internal.tilera.com", "mtbu.labs.mlnx";
next-server 192.168.100.1;
filename "/BOOTAA64.EFI";
}
# Specify the IP address for this client.
host pxe_client {
hardware ethernet 00:1a:ca:ff:ff:01;
fixed-address 192.168.100.2;
}
EOF
It is recommended to back up the previous dhcpd.conf file before overwriting it.
A.1.2.2.6 Step 6: Set Up the HTTP Server
The TFTP server allows the PXE boot to load the initrd and kernel. The SmartNIC obtains all the
other required sources through the network, which makes it necessary to set up an HTTP server.
Setting up the HTTP server requires the HTTP package. If it is not installed, please
install it via “yum install httpd” or “apt-get install httpd”, depending on your Linux distribution.
To configure the HTTP server to serve the contents of the installation disk, run the following command:
cat >/etc/httpd/conf.d/pxeboot.conf <<EOF
Alias /centos7 /mnt
<Directory /mnt>
Options Indexes FollowSymLinks
Require ip 127.0.0.1 192.168.100.0/24
</Directory>
EOF
systemctl enable httpd
systemctl restart httpd
A.1.3 Flashing the SmartNIC Bootloader Code
Before installing an OS, flash the bootloader code first. The SmartNIC is shipped with an obsolete bootloader code, and should be updated with the following instructions.
A.1.3.1 Opening a Terminal Connection to the SmartNIC
To open a console window to the SmartNIC, a terminal application is required. The application
“minicom” is used for the flow, however, any standard terminal application can work, e.g.
“screen”.
Install minicom by running “yum install minicom” or “apt-get install minicom”.
1. On the host, type “minicom” to open minicom on the current terminal, or use “minicom -s” to
set it up.
2. Go to the settings menu by pressing “Ctrl-a + o” (the settings menu opens by default when
launching with the “-s” option). Navigate to the “Serial port setup” submenu and set the
“Serial Device” to the one connected (should be one of the /dev/ttyUSBx if using the serial UART cable).
3. Change the baud rate to 115200 8N1, and ensure that the hardware and software flow control
are set to “No”.
Figure 8: Minicom Settings – Example
4. Select “Save setup as dfl” in order not to have to set it again in the future.
A.1.3.2 Using the Initial Install Bootstream
1. On the host side, ensure that the RShim kernel modules are loaded:
modprobe rshim_usb
modprobe rshim_net
An RShim device is located under the /dev directory. If you only have one, it should be
“rshim0”:
[root@bu-lab02 ~]# ls /dev/rshim0/
boot console net rshim
The /dev/rshimX/console device can be used as a console instead of the serial-USB console. The primary bootloader does not support this device, however, UEFI and Linux
support it. In cases where the special UART adapter board is unavailable, this can be
used instead.
You can boot a SmartNIC by pushing a bootstream to it, which is done by writing a bootstream file to the /dev/rshimX/boot device. (See step 2 below.)
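Pushing a bootstream amounts to writing the file to the RShim boot device. The following is a minimal sketch of that write, assuming a single RShim device at /dev/rshim0 and a bootstream file named install.bfb. The helper name push_bootstream is hypothetical.

```shell
# Hypothetical helper: push a bootstream by writing it to the RShim boot
# device. The second argument overrides the device path (assumed default:
# /dev/rshim0/boot).
push_bootstream() {
    bfb="$1"
    bootdev="${2:-/dev/rshim0/boot}"
    if [ ! -r "$bfb" ]; then
        echo "cannot read bootstream file: $bfb" >&2
        return 1
    fi
    cat "$bfb" > "$bootdev"
}

# Usage on the host (as root):
#   push_bootstream install.bfb
```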
2. Push the initial install bootstream to the SmartNIC:
On the terminal, various boot messages appear until Linux is loaded. This is the Yocto
embedded Linux running off the kernel initramfs pushed in the bootstream.
3. At the login prompt, log in as root without a password.
Figure 9: Yocto Log
4. After Linux is loaded, in the terminal, run the /opt/mlnx/scripts/bfrec script to update the
bootloader.
A.1.4 Installing CentOS 7 on BlueField SmartNIC
If the error “no root is found” appears in the installation process, check or disable the
firewall as needed on the x86 host machine.
A.1.4.1 Full PXE Boot Installation
1. Get to the UEFI boot menu.
a. Reboot the SmartNIC by typing “reboot” on the console. A “UEFI firmware…” message
should appear and the screen clears.
b. Press ESC several times until you enter the UEFI boot menu.
Figure 10: UEFI Boot Menu
2. On the host, restart the DHCP and TFTP service:
systemctl restart dhcpd
systemctl restart tftp #might be xinetd
3. Navigate to the Boot Manager.
Figure 11: UEFI Boot Manager
4. Select EFI Network. The SmartNIC then uses the TFTP service on the host to discover all available
PXE boot options. Shortly after, a “..Fetching Netboot Image” message appears, followed by the option to install
CentOS.
Figure 12: Option to Install CentOS
5. Select CentOS download.
This process may take a few minutes as it fetches data over the USB network. Run
“ifconfig” on the host and monitor the RX/TX packets on the “tmfifo_net0” network. As long as
the counters keep increasing, the data is still being fetched.
6. Follow the installation instructions in the configuration menu. Recommended settings are
included.
These configuration inputs are not needed when the kickstart option “-k” is specified
when running the setup.sh script.
Text mode provides a limited set of installation options. It does not offer
custom partitioning for full control over the disk layout. Would you like to use
VNC mode instead?
1) Start VNC
2) Use text mode
Please make your choice from above ['q' to quit | 'c' to continue |
'r' to refresh]: 2
======================================================================================
======================================================================================
Installation:main* 2:shell 3:log 4:storage-lo> Switch tab: Alt+Tab | Help: F1
1) [x] Language settings                 2) [!] Time settings
       (English (United States))                (Timezone is not set.)
5) [!] Installation Destination          6) [x] Kdump
       (No disks selected)                      (Kdump is enabled)
7) [x] Network configuration             8) [!] Root password
       (Wired (eth0) connected)                 (Password is not set.)
9) [!] User creation
(No user will be created)
Please make your choice from above ['q' to quit | 'b' to begin
installation | 'r' to refresh]: 2
======================================================================================
======================================================================================
Time settings
Timezone: not set
NTP servers:not configured
1) Set timezone
2) Configure NTP servers
Please make your choice from above ['q' to quit | 'c' to continue |
'r' to refresh]: 1
Please select the timezone.
Use numbers or type names directly [b to region list, q to quit]: 11
======================================================================================
======================================================================================
Timezone settings
Available timezones in region US
1) Alaska      4) Eastern     6) Mountain
2) Arizona     5) Hawaii      7) Pacific
3) Central
Please select the timezone.
Use numbers or type names directly [b to region list, q to quit]: 4
======================================================================================
======================================================================================
Installation
7)
(Wired (eth0) connected) (Password is not set.)
9) [!] User creation
(No user will be created)
Please make your choice from above ['q' to quit | 'b' to begin
installation | 'r' to refresh]: 4
======================================================================================
======================================================================================
Base environment
Software selection
Base environment
1) [x] Minimal Install                   6) [ ] Server with GUI
2) [ ] Compute Node                      7) [ ] GNOME Desktop
3) [ ] Infrastructure Server             8) [ ] KDE Plasma Workspaces
4) [ ] File and Print Server             9) [ ] Development and Creative
5) [ ] Basic Web Server                         Workstation
Please make your choice from above ['q' to quit | 'c' to continue |
'r' to refresh]: 9
======================================================================================
======================================================================================
Base environment
Software selection
1) [ ] Minimal Install                   6) [ ] Server with GUI
2) [ ] Compute Node                      7) [ ] GNOME Desktop
3) [ ] Infrastructure Server             8) [ ] KDE Plasma Workspaces
4) [ ] File and Print Server             9) [x] Development and Creative
5) [ ] Basic Web Server                         Workstation
Please make your choice from above ['q' to quit | 'c' to continue | 'r' to refresh]: c
======================================================================================
======================================================================================
Installation
1) [x] Language settings                 2) [x] Time settings
       (English (United States))                (US/Eastern timezone)
5) [!] Installation Destination          6) [x] Kdump
       (No disks selected)                      (Kdump is enabled)
7) [x] Network configuration             8) [!] Root password
       (Wired (eth0) connected)                 (Password is not set.)
9) [!] User creation
(No user will be created)
Please make your choice from above ['q' to quit | 'b' to begin installation | 'r' to refresh]: 5
======================================================================================
======================================================================================
Probing storage...
Installation Destination
[x] 1) : 13.75 GiB (mmcblk0)
1 disk selected; 13.75 GiB capacity; 1007.5 KiB free ...
Please make your choice from above ['q' to quit | 'c' to continue | 'r' to refresh]: c
======================================================================================
======================================================================================
Autopartitioning Options
[ ] 1) Replace Existing Linux system(s)
[x] 2) Use All Space
[ ] 3) Use Free Space
Installation requires partitioning of your hard drive. Select what space to use for the install target.
Please make your choice from above ['q' to quit | 'c' to continue | 'r' to refresh]: c
======================================================================================
======================================================================================
Partition Scheme Options
[ ] 1) Standard Partition
[ ] 2) Btrfs
[x] 3) LVM
[ ] 4) LVM Thin Provisioning
Select a partition scheme configuration.
Please make your choice from above ['q' to quit | 'c' to continue | 'r' to refresh]: 1
======================================================================================
======================================================================================
Partition Scheme Options
[x] 1) Standard Partition
[ ] 2) Btrfs
[ ] 3) LVM
[ ] 4) LVM Thin Provisioning
Select a partition scheme configuration.
Please make your choice from above ['q' to quit | 'c' to continue | 'r' to refresh]: c
Generating updated storage configuration
Checking storage configuration...
7) [x] Network configuration 8) [!] Root password
(Wired (eth0) connected) (Password is not set.)
9) [!] User creation
(No user will be created)
Please make your choice from above ['q' to quit | 'b' to begin installation |
'r' to refresh]: 8
======================================================================================
======================================================================================
Please select new root password. You will have to type it twice.
9) [ ] User creation
(No user will be created)
Please make your choice from above ['q' to quit | 'b' to begin
installation | 'r' to refresh]: b
7. Enter “b” and press “enter” to initiate the installation process.
8. Press “Enter” to reboot into CentOS.
Figure 13: CentOS Installation Completion Screen
A.1.4.2 Non-PXE Boot Installation
When the setup script is run with the “-t” option, it generates a nonpxe.bfb file in the directory
where the script is run. This file contains the install kernel and rootfs which are usually
loaded by UEFI during the initial PXE boot stage. Thus, when this file is pushed, the host TFTP server
no longer needs to be used and UEFI automatically loads the install kernel and rootfs from
the boot FIFO. Together with the “-k” kickstart option, the host can be configured to initiate non-PXE boot and automatic CentOS installation, as long as the host HTTP and DHCP servers are
working. To kick off the installation process, run the following command on the host:
u installed CentOS 7 with the kickstart (“-k”) option.
:
A.2.1.2 Removing Pre-installed Kernel Module
There are cases where the kernel is shipped with an earlier version of the mlx5_core driver taken
from the upstream Linux code. This version does not support the BlueField Arm, but is loaded
before the MLNX_OFED driver, and therefore, needs to be removed.
To remove the kernel module from the initramfs, run the following
mount /root/MLNX_OFED_LINUX-4.2-X.X.X.X-rhel7.4alternate-aarch64.iso /mnt
3. Install MLNX_OFED.
If the kernel on the BlueField is 4.11.0-22.el7a.aarch64, run:
cd /mnt
./mlnxofedinstall --bluefield
If the kernel is different than 4.11.0-22.el7a.aarch64, run:
cd /mnt
./mlnxofedinstall --add-kernel-support --skip-repo --bluefield
This step might take longer than expected to be completed. If you are using a different package than the required one, run “yum install”.
If the date is not set correctly while installing MLNX_OFED, first set the date (e.g., date
-s 'Mon Feb 5 15:02:10 EST 2018'), then run the installation.
4. Restart openibd:
/etc/init.d/openibd restart
A.2.1.4 Setting ECPF as eSwitch Manager and Page Supplier
After installing MLNX_OFED on the Arm core, enable ECPF (Embedded CPU Physical Function) as esw_manager and assign ECPF as the page supplier for all
PFs and VFs. Perform the following:
1. Start MST (Mellanox Software Tools) driver set service:
mst start
2. Identify the MST device:
mst status -v
Output example:
MST modules:
-----------
MST PCI module is not loaded
MST PCI configuration module loaded
PCI devices:
-----------
DEVICE_TYPE MST PCI RDMA NET NUMA
BlueField(rev:0) /dev/mst/mt41682_pciconf0.1 37:00.1 mlx5_1 net-ens1f1 0
mlxconfig -d /dev/mst/mt41682_pciconf0 s INTERNAL_CPU_MODEL=1
mlxconfig -d /dev/mst/mt41682_pciconf0.1 s INTERNAL_CPU_MODEL=1
mlxconfig -d /dev/mst/mt41682_pciconf0 s ECPF_ESWITCH_MANAGER=1 ECPF_PAGE_SUPPLIER=1
mlxconfig -d /dev/mst/mt41682_pciconf0.1 s ECPF_ESWITCH_MANAGER=1 ECPF_PAGE_SUPPLIER=1
Arm:
4. Power cycle the server.
A.2.1.5 Updating SmartNIC Firmware on the Host
The below steps demonstrate how to manually update the firmware if the process fails. The firmware image can be found in the BlueField software package.
After MLNX_OFED is installed on the Arm cores, use the mlx5_core driver to use the
two Ethernet ports on the SmartNIC. If the Ethernet ports on the SmartNIC are connected to the network, there is no need to bridge the host via RShim net to access the
network.
A.2.2 Installing MLNX_OFED on the Host
MLNX_OFED should be installed on any host using the SmartNIC. This includes the host used
to provision the SmartNIC as well as the final system to which the SmartNIC is attached.
To install MLNX_OFED on the host:
mount MLNX_OFED_LINUX-4.2-X.X.X.X-rhel7.4-x86_64.iso /mnt
cd /mnt
./mlnxofedinstall
The last step of installing MLNX_OFED is to check and update the firmware. If it is
possible to flash the firmware, flash it back according to the instructions in
Section 5.2.1.3, “Installing MLNX_OFED on the SmartNIC”, on page 48.
Manually load the mlx5_core driver on the BlueField Arm before loading it on the host, as the
BlueField Arm is responsible for managing the memory. Manually blacklist the mlx5_core driver
on the host and load it only after the BlueField Arm loading process is complete. To blacklist the
driver, run:
To prevent the Linux kernel from loading the mlx5_core driver included inside of the initramfs,
open /boot/grub/grub.conf and append the following to the vmlinux line:
rdblacklist=mlx5_core
Also, change to “ONBOOT=no” in /etc/infiniband/openib.conf.
Once the BlueField Arm driver is loaded, manually load the driver via:
modprobe mlx5_core
When rebooting CentOS on the Arm-side, the host-side driver should be unloaded first.
This is done with “rmmod mlx5_ib mlx5_core ib_core mlx_compat mlxfw”. Reload the
host driver after the Arm driver is loaded.
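The blacklist command itself is not preserved in this copy of the manual. One common way to blacklist a module on the host is a “blacklist” entry in a modprobe configuration file. The following is a minimal sketch of that approach, not necessarily the exact command intended here. The helper name blacklist_module and the default file name under /etc/modprobe.d are assumptions.

```shell
# Hypothetical helper: add a "blacklist <module>" entry to a modprobe
# configuration file unless the entry is already present. The second argument
# overrides the file (assumed default: /etc/modprobe.d/blacklist-<module>.conf).
blacklist_module() {
    module="$1"
    conf="${2:-/etc/modprobe.d/blacklist-$1.conf}"
    grep -qsx "blacklist $module" "$conf" ||
        printf 'blacklist %s\n' "$module" >> "$conf"
}

# Usage on the host (as root):
#   blacklist_module mlx5_core
```

Calling the helper twice leaves a single entry in the file, so it is safe to rerun from a provisioning script.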
A.3 Running Open vSwitch on the SmartNIC
A.3.1 Installing Open vSwitch
A.3.1.1 Resolving Dependencies
Before building Open vSwitch (OVS), resolve its dependencies. Run the following command:
1. When using version 2.8.0 of the upstream OVS source, run a git clone to get it from GitHub:
git clone https://github.com/openvswitch/ovs.git
2. Check out version 2.8.0:
cd ovs
git checkout v2.8.0
A.3.1.3 Building OVS from Source
To bootstrap OVS and build the RPMs, run:
./boot.sh
./configure
make -j8 rpm-fedora
A.3.1.4 Installing OVS RPMs
1. Find the RPM under the rpm/rpmbuild/RPMS directory.
2. To install the OVS RPM, run:
cd rpm/rpmbuild/RPMS/aarch64
rpm -i openvswitch-2.8.0-1.el7.centos.aarch64.rpm
cd ../noarch
rpm -i openvswitch-selinux-policy-2.8.0-1.el7.centos.noarch.rpm
systemctl start openvswitch
The RPMs built on the initial SmartNIC can be copied and installed on all the rest of the
SmartNICs.
A.3.2 Bridging the Host to the Network Port with OVS
A.3.2.1 Connecting the Host Function to the SmartNIC Network Port
Prerequisites:
•The following network interfaces should appear on the SmartNIC:
[root@localhost ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen
1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode
DEFAULT qlen 1000
link/ether 00:1a:ca:ff:ff:01 brd ff:ff:ff:ff:ff:ff
3: enp3s0f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq switchid
0002c92dd111 state DOWN mode DEFAULT qlen 1000
link/ether 00:02:c9:2d:d1:11 brd ff:ff:ff:ff:ff:ff
4: enp3s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq switchid 0002c92dd112
state UP mode DEFAULT qlen 1000
link/ether 00:02:c9:2d:d1:12 brd ff:ff:ff:ff:ff:ff
5: rep0-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq switchid 0002c92dd111
state UP mode DEFAULT qlen 1000
link/ether ee:10:47:91:95:92 brd ff:ff:ff:ff:ff:ff
6: rep1-0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq switchid 0002c92dd112
state UP mode DEFAULT qlen 1000
link/ether 2e:c2:de:50:3e:8e brd ff:ff:ff:ff:ff:ff
The “eth0” interface is the RShim network interface. The “enp3s0f0” and “enp3s0f1” interfaces are the two outgoing Ethernet ports. “rep0-0” and “rep1-0” are the two physical representors that are connected to the host.
•The following network interfaces should appear on the host:
[root@bu-lab02 ~]# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode
DEFAULT qlen 1000
link/ether 18:03:73:b9:34:4c brd ff:ff:ff:ff:ff:ff
3: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
link/ether 66:76:45:1a:9b:53 brd ff:ff:ff:ff:ff:ff
4: p4p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 1000
link/ether 12:a8:35:c6:40:eb brd ff:ff:ff:ff:ff:ff
The “p4p1” and “p4p2” interfaces are the representors linked to “rep0-0” and “rep1-0” on the
SmartNIC.
If you do not see them, this might be because you did not manually load the host drivers
due to the blacklisting operation. To manually load them, run “modprobe mlx5_core”.
Procedure:
Procedure:
Using OVS, bridge rep1-0 with enp3s0f1 so the host can directly use the network connection on
[root@localhost ~]# ovs-vsctl show
47b2b4e7-1e13-43e6-9f11-f729429217b0
Bridge "armbr1"
Port "rep1-0"
Interface "rep1-0"
Port "enp3s0f1"
Interface "enp3s0f1"
Port "armbr1"
Interface "armbr1"
type: internal
ovs_version: "2.8.0"
VF needs to be added as well.
Make sure the interfaces are up before the bridging process. An interface is up if it is
displayed via “ifconfig” without the “-a” option. To bring up “rep1-0”, run “ifconfig
rep1-0 up”.
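The bridging commands themselves are not preserved in this copy of the manual. The following is a minimal sketch that is consistent with the “ovs-vsctl show” output above (bridge “armbr1” with ports “rep1-0” and “enp3s0f1”), not necessarily the exact original steps. The helper name ovs_bridge_commands is hypothetical. It only prints the commands so that they can be reviewed before they are applied.

```shell
# Hypothetical helper: print the ovs-vsctl commands that create a bridge and
# attach the given ports to it. Nothing is changed until the output is run.
ovs_bridge_commands() {
    bridge="$1"
    shift
    printf 'ovs-vsctl add-br %s\n' "$bridge"
    for port in "$@"; do
        printf 'ovs-vsctl add-port %s %s\n' "$bridge" "$port"
    done
}

# Usage on the SmartNIC (as root), once the openvswitch service is running:
#   ovs_bridge_commands armbr1 rep1-0 enp3s0f1 | sh
```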
5. The host is now connected to whatever is on the other side of the network port.
To enable the connection next time the SmartNIC boots, add the “openvswitch” service
to the list of services to be started at boot time. For example, with “systemctl”, run “systemctl enable openvswitch”.
A.3.2.2 Verifying the Host Connection
A.3.2.2.1 Connected Peer-to-Peer
When the SmartNIC is connected to another SmartNIC on another machine, manually assign IP
addresses with the same subnet to both ends of the connection.
1. Assuming the link is connected to p3p1 on the other host, run:
ifconfig p3p1 192.168.200.1/24 up
2. On the host to which the SmartNIC is connected, run:
ifconfig p4p2 192.168.200.2/24 up
3. Have one ping the other. This is an example of SmartNIC pinging
ping 192.168.200.1
If the ping does not work, the two ports may have been mixed up. On the other host, try
using the other port instead.
A.3.2.2.2 Connected to Network with DHCP Server
To verify that the port is directly connected to a switch which is connected to the network, bring
up the port like any regular Ethernet port.
1. Assuming the interface is called p4p2 on the host, create a file
ifcfg-p4p2 with the following content:
Running ifconfig, it is possible that the SmartNIC got its own IP address. Pinging other machines
should work as well.
In cases where the host already has a network connection, bring it down first. Assuming
the interface is “em1”, run “ifdown em1”. Having two interfaces to the same subnet
might confuse the host and generate routing issues.
A.3.3 Enabling Offloading
Offloading the OVS and TC rules to the hardware by the SmartNIC is supported.
•To offload the OVS rules to the traffic controller, run:
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
•To offload the TC rules to the hardware, run:
ethtool -K enp3s0f1 hw-tc-offload on
ethtool -K rep1-0 hw-tc-offload on
To verify that the rules are offloaded, dump the OVS and hardware rules and check if they match.
These two rules, shown by the pretty_dump.py script, correspond to the first and third OVS rules.
A.3.4 VXLAN Tunneling Offload
VXLAN tunnels are created on the Arm side and attached to the OVS. VXLAN decapsulation/
encapsulation behavior is similar to normal VXLAN behavior, including over hw_offload=true.
A.3.4.1 Configuring VXLAN Tunnel
1. Consider the enp3s0f0 to be the local VXLAN tunnel interface.
The MTU of the end points (rep0-0 in the example above) of the VXLAN tunnel must
be smaller than the MTU of the tunnel interface (enp3s0f0) by at least the size of the VXLAN headers.
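As a worked example of this MTU rule, the sketch below subtracts the encapsulation overhead from the tunnel interface MTU. It assumes VXLAN over an IPv4 underlay, where the overhead counted against the MTU is 50 bytes: outer IPv4 header (20), UDP header (8), VXLAN header (8), and the inner Ethernet header (14). The helper name vxlan_endpoint_mtu is hypothetical.

```shell
# Hypothetical helper: compute the largest end-point MTU for a given tunnel
# interface MTU. The default overhead of 50 bytes assumes an IPv4 underlay
# (20 IPv4 + 8 UDP + 8 VXLAN + 14 inner Ethernet); pass 70 for an IPv6 underlay.
vxlan_endpoint_mtu() {
    tunnel_mtu="$1"
    overhead="${2:-50}"
    echo $((tunnel_mtu - overhead))
}

# Usage on the SmartNIC, assuming enp3s0f0 keeps the default MTU of 1500:
#   ip link set dev rep0-0 mtu "$(vxlan_endpoint_mtu 1500)"
```

With a tunnel interface MTU of 1500, the helper yields 1450 for the end point.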
A.3.5 Connection Tracking
Connection tracking is performed by OVS on the Arm cores. Connection tracking on arm-ovs
currently works without HW offload, so each packet passes through OVS and is tracked by it.
Connection tracking rules are configured using OpenFlow. Rules can be matched based on various
parameters. In the following example, the OVS "in-port" parameter is used.
A.3.5.1 Querying Connection Tracking Active Flows
To query connection tracking active flows when the setup is clean:
2. Set to mark IP packets with the “trk” flag and then proceed to table 1.
Traffic through the same port but from the other direction (out_port=
Other traffic coming through any other port on the same vSwitch sho
A.4 Enabling SR-IOV on the SmartNIC
Virtual functions (VFs) cannot be probed before the Arm reconfigures itself after enabling SR-IOV. To ensure this does not happen, perform the following steps on the host side:
The complete setup process can be time consuming. Fortunately, the filesystem installed on one
BlueField can be directly used on another BlueField system. Therefore, the fastest, most efficient
way to install CentOS onto a BlueField system is to restore the eMMC image backup from
another BlueField image.
A.5.1 Backing Up the eMMC Image
Before backing up the eMMC, all of its partitions need to be unmounted to avoid data corruption.
Unmounting the root partition of the CentOS is impractical, therefore using the initial Yocto running entirely on memory is a good option.
1. If the SmartNIC is currently running CentOS, issue a shutdown command so that the kernel
unmounts the entire filesystem:
[root@localhost ~]# shutdown -h now
[ OK ] Started Show Plymouth Power Off Screen.
[ OK ] Stopped Dynamic System Tuning Daemon.
[ OK ] Stopped target Network.
Stopping LSB: Bring up/down networking...
[ OK ] Stopped LSB: Bring up/down networking.
Stopping Network Manager...
[ OK ] Stopped Network Manager.
……
Unmounting /run/user/0...
Unmounting /mnt...
Unmounting /boot/efi...
[ OK ] Deactivated swap /dev/disk/by-path/platform-PRP0001:00-part3.
[ OK ] Deactivated swap /dev/disk/by-partu...4fa-7c6a-4fd4-a795-84415d19f840.
[ OK ] Deactivated swap /dev/disk/by-id/mmc-R1J56L_0x353c1019-part3.
[ OK ] Deactivated swap /dev/mmcblk0p3.
[ OK ] Deactivated swap /dev/disk/by-uuid/...291-1ad6-4e3a-b5b4-9087950a3296.
[ OK ] Unmounted /run/user/0.
[ OK ] Unmounted /boot/efi.
Unmounting /boot...
[ OK ] Unmounted /mnt.
[14370.028599] XFS (mmcblk0p2): Unmounting Filesystem
[ OK ] Unmounted /boot.
[ OK ] Reached target Unmount All Filesystems.
[ OK ] Stopped target Local File Systems (Pre).
[ OK ] Stopped Remount Root and Kernel File Systems.
……
[14370.787777] reboot: Power down
ERROR: System Off: operation not handled.
PANIC at PC : 0x0000000000459b9c
2. On the host, push the install.bfb through the RShim boot device.
3. Once the mini Yocto has finished booting, bring up the interface which is selected to copy
over the eMMC image to the host. Any working network interface can be used. In this example, the representor interface is used as it offers a faster transfer
speed (using the RShim network interface is also a good option).
On the BlueField side:
root@bluefield:~# ifconfig rep0-0 192.168.200.2 up
On the host side:
[root@bu-lab02 ~]# ifconfig p4p1 192.168.200.1/24 up
[root@bu-lab02 ~]# ping 192.168.200.2
PING 192.168.200.2 (192.168.200.2) 56(84) bytes of data.
64 bytes from 192.168.200.2: icmp_seq=1 ttl=64 time=0.281 ms
64 bytes from 192.168.200.2: icmp_seq=2 ttl=64 time=0.073 ms
^C
--- 192.168.200.2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.073/0.177/0.281/0.104 ms
4. Check if netcat is working properly. This is the tool that is used to pipe data across networks.
a. Set up a netcat server on the host to listen to port 12345, and let it send the message “Hello
from host” to the client:
[root@bu-lab02 ~]# echo "Hello from host" | nc -l 12345
b. On BlueField, send the message “Hello from BlueField” to the server:
root@bluefield:~# echo "Hello from BlueField" | nc 192.168.200.1 12345
Hello from host
c. On the host, the nc command completes and prints out “Hello from BlueField”:
[root@bu-lab02 ~]# echo "Hello from host" | nc -l 12345
Hello from BlueField
d. This may fail since the iptables rules forbid listening to the port. If this is the case, flush
the rules by running the following:
iptables -F
Forcing nc to use IPv4 addresses might resolve the issue:
[root@bu-lab02 ~]# nc -4 -l 12345
e. Back up the eMMC image from the BlueField to the host. Set up the host to listen on port
2345, compress what it receives and store it to a file:
The “pv” command is entirely optional. It is used to monitor the progress of the backup.
The backup is finished when the total data consumed reaches 13.8G, which takes approximately
6 minutes over the representor port.
5. On BlueField, read the entire eMMC boot partition with the “dd” command and pass it to the
host:
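A minimal sketch of the two ends of this backup, under stated assumptions: the eMMC is assumed to appear as /dev/mmcblk0 (the device name seen in the shutdown log), the host address 192.168.200.1 and port 12345 are reused from the netcat check, and the output file name is hypothetical. The network legs are shown as comments so that the two helper functions can first be tried on an ordinary file.

```shell
#!/bin/sh
# BlueField side: read a block device (or any file) in 1 MiB blocks to stdout.
# dd's transfer statistics go to stderr and are discarded here.
read_device() {
    dd if="$1" bs=1M 2>/dev/null
}

# Host side: compress whatever arrives on stdin and store it in the given file.
store_image() {
    gzip -c > "$1"
}

# Assumed wiring over netcat (start the host side first):
#   Host:      nc -l 12345 | pv | store_image /root/emmc-backup.img.gz
#   BlueField: read_device /dev/mmcblk0 | nc 192.168.200.1 12345
# "pv" only reports progress and can be dropped from the pipeline.
```

Compressing on the host rather than on the BlueField keeps the Arm cores free for reading the eMMC, at the cost of sending uncompressed data over the link.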
The BlueField system cannot be using the eMMC while it is being restored, so the mini Yocto,
which runs entirely from memory, is used for the restore.
1. Push the install.bfb to boot from memory and set up the network interface to be used. This
step is the same as when backing up the eMMC image. The difference is that the image is
transferred from the host to the BlueField rather than from the BlueField to the host.
2. Set up the host to extract the image and set up a netcat server to se
Alternatively, if this is not done, UEFI stops at the boot menu at boot time, and you have to
go to the UEFI console and use the bcfg command to achieve the same effect:
In many cases, bring-up environments do not have access to external networks, so “yum
install” cannot reach its default repositories to download packages. Installing RPMs manually
is tedious, since every dependency has to be resolved by hand, which can lead to many extra
RPMs being downloaded. To address this, a local yum repository can be set up so that “yum
install” can still be used on machines with no external network access, leveraging its ability
to resolve all dependencies automatically.
Therefore, if this pro-
A.6.1 Setting Up a Yum Repository
A yum repository contains a number of RPMs and a “repodata” directory which the “createrepo”
command has generated to store the metadata of the present RPMs. Follow the below instructions
to generate the required metadata (create the “repodata” directory and files).
1. Copy all the needed RPMs to a single directory, for example:
mkdir -p /root/localrepo
cp *.rpm /root/localrepo
2. Run the createrepo command to make the directory a repository:
createrepo /root/localrepo
Once completed, a “repodata” directory is created that can be used as a repository. However,
the createrepo package itself is not on the CentOS default installation. Run the following
command to enable its usage:
yum install -y createrepo yum-utils
The aforementioned command itself needs a working yum repository before it can be run, which
defeats the purpose of using createrepo to create one. Rather than building a repository from
scratch, the better way is to use an already built repository. The best available repository is
the CentOS installation disk, so make sure you have the image which is called “everything” and
not “netinstall”, “minimal” or “dvd”.
3. Mount the CentOS-7-<arch>-everything.iso. Apart from other directories, it contains a
“repodata” directory, which makes it a yum repository, and a “Packages” directory, which
includes all the RPMs:
4. Once completed, the mount point is ready to act as a repository.
A.6.2 Yum Repository Usage
To use the created repository instead of the default one, update the locations in which CentOS
looks for yum repositories. This data is stored in /etc/yum.repos.d/.
1. Remove the existing repository data to avoid access failures when a network connection is
unavailable:
2. Create a file where the “baseurl” variable points to the repository to use. Turn off “gpgcheck”
here for simplicity:
cat > /etc/yum.repos.d/myyum.repo<<end
[myyum]
name=myyumsource
baseurl=file:///mnt/x86/
gpgcheck=0
end
3. Flush the cache information so that the system can pick up the new repo data:
yum clean all
4. When completed, the system should be able to see the yum repository:
yum repolist
The command “yum install” should work from this point, as long as the package is not a third
party package which is not included in the CentOS base repository.
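As a minimal sketch of mounting the image and setting the default repository files aside: the ISO path and the backup directory below are hypothetical, while /mnt/x86 is taken from the “baseurl” used in this section. Moving the files instead of deleting them is a choice made here so that the default repositories can be restored later. It is not necessarily what the original procedure does.

```shell
#!/bin/sh
# Move every *.repo file from a source directory into a backup directory.
# Arguments: $1 = source directory (normally /etc/yum.repos.d)
#            $2 = backup directory (created if missing)
set_aside_repo_files() {
    src_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    for f in "$src_dir"/*.repo; do
        # Skip the unexpanded pattern when no .repo files exist.
        [ -e "$f" ] || continue
        mv "$f" "$backup_dir"/ || return 1
    done
}

# Assumed usage as root (ISO path and backup directory are hypothetical):
#   mkdir -p /mnt/x86
#   mount -o loop,ro /root/CentOS-7-x86_64-Everything.iso /mnt/x86
#   set_aside_repo_files /etc/yum.repos.d /root/yum.repos.d.bak
```

Other files in the source directory are left untouched, so only the repository definitions are affected.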
A.6.3 Arm CentOS Using Repository from Connected x86 Host
The SmartNIC eMMC size is 16GB, and the CentOS-7-aarch64-Everything.iso image is 7GB.
Therefore, it is impractical to scp the image to the eMMC and mount it there. To address this, the
aarch64 CentOS image is mounted on the connected x86 host, and the host uses an HTTP service
to serve the content of the image to the CentOS running on the Arm cores.
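The Arm side of this arrangement can be sketched as a repository file whose “baseurl” is an HTTP URL instead of a local path. Everything specific below is an assumption for illustration: the repository name, the host address 192.168.100.1, port 8000, the host mount point /mnt/aarch64, and the use of Python's built-in HTTP server. The actual setup script may use a different service and address.

```shell
#!/bin/sh
# Write a yum repository file that points at a given base URL.
# Arguments: $1 = repository id, $2 = base URL.
# REPO_DIR defaults to /etc/yum.repos.d; override it to try this out safely.
write_repo_file() {
    repo_id=$1
    base_url=$2
    repo_dir=${REPO_DIR:-/etc/yum.repos.d}
    cat > "$repo_dir/$repo_id.repo" <<EOF
[$repo_id]
name=$repo_id
baseurl=$base_url
gpgcheck=0
EOF
}

# Assumed usage (addresses, port and paths are hypothetical):
#   x86 host: cd /mnt/aarch64 && python3 -m http.server 8000
#   SmartNIC: write_repo_file hostiso http://192.168.100.1:8000/
#             curl -sI http://192.168.100.1:8000/repodata/repomd.xml
#             yum clean all && yum repolist
```

Fetching repodata/repomd.xml is a quick check that the HTTP service exposes the repository metadata before yum is involved.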
This step is already done if you recently used the setup script, as it automatically mounts the
image on the host and starts the HTTP service. To verify that it works, try downloading
something on the SmartNIC: