LIMITED AND RESTRICTED RIGHTS NOTICE: If data or software is delivered pursuant to a General Services
Administration (GSA) contract, use, reproduction, or disclosure is subject to restrictions set forth in Contract No. GS-35F-
05925.
Reading instructions
• To ensure that you get correct command lines when using the copy/paste function, open this Guide with Adobe
Acrobat Reader, a free PDF viewer that can be downloaded from the official Adobe Web site.
• Replace values in angle brackets with the actual values. For example, when you see <*_USERNAME> and
<*_PASSWORD>, enter your actual username and password.
• Annotations starting with “#” between the command lines and in the configuration files are comments; you can ignore them.
Chapter 1. Overview
[Figure: typical cluster deployment, showing the management node, login node, and compute nodes connected over TCP networking (public network, nodes BMC interface, nodes eth interface), with a parallel file system attached through the high speed network]
Introduction to LiCO
Lenovo Intelligent Computing Orchestration (LiCO) is infrastructure management software for high-performance
computing (HPC) and artificial intelligence (AI). It provides features such as cluster management and
monitoring, job scheduling and management, cluster user management, account management, and file
system management.
With LiCO, users can centralize resource allocation in one supercomputing cluster and carry out HPC and AI
jobs simultaneously. Users can perform operations by logging in to the management system interface with a
browser, or by using the command line after logging in to a cluster login node through a Linux shell.
Typical cluster deployment
This Guide is based on the typical cluster deployment that contains management, login, and compute nodes.
Elements in the cluster are described in the table below.
Table 1. Description of elements in the typical cluster
Management node: Core of the HPC/AI cluster, undertaking primary functions such as cluster management, monitoring, scheduling, strategy management, and user & account management.
Compute node: Completes computing tasks.
Login node: Connects the cluster to the external network or cluster. Users must use the login node to log in and upload application data, develop compilers, and submit scheduled tasks.
Parallel file system: Provides a shared storage function. It is connected to the cluster nodes through a high-speed network. Parallel file system setup is beyond the scope of this Guide; a simple NFS setup is used instead.
Nodes BMC interface: Used to access the node’s BMC system.
Nodes eth interface: Used to manage nodes in the cluster. It can also be used to transfer computing data.
High speed network interface: Optional. Used to support the parallel file system. It can also be used to transfer computing data.
Note: LiCO also supports a cluster deployment that contains only management and compute nodes. In this
case, all LiCO modules normally installed on the login node need to be installed on the management node.
Operating environment
Cluster server:
Lenovo ThinkSystem servers
Operating system:
CentOS / Red Hat Enterprise Linux (RHEL) 8.1
Client requirements:
• Hardware: CPU of 2.0 GHz or above, memory of 8 GB or above
• Browser: Chrome (V 62.0 or higher) or Firefox (V 56.0 or higher) recommended
• Display resolution: 1280 x 800 or above
Supported servers and chassis models
LiCO can be installed on certain servers, as listed in the table below.
Table 2. Supported servers
Product code   Machine type        Product name
sd530          7X21                Lenovo ThinkSystem SD530 (0.5U)
sd650          7X58                Lenovo ThinkSystem SD650 (2 nodes per 1U tray)
sr630          7X01, 7X02          Lenovo ThinkSystem SR630 (1U)
sr645          7D2X, 7D2Y          Lenovo ThinkSystem SR645 (1U)
sr650          7X05, 7X06          Lenovo ThinkSystem SR650 (2U)
sr655          7Y00, 7Z01          Lenovo ThinkSystem SR655 (2U)
sr665          7D2V, 7D2W          Lenovo ThinkSystem SR665 (2U)
sr670          7Y36, 7Y37, 7Y38    Lenovo ThinkSystem SR670 (2U)
sr850          7X18, 7X19          Lenovo ThinkSystem SR850 (2U)
sr850p         7D2F, 7D2G, 7D2H    Lenovo ThinkSystem SR850P (2U)
sr950          7X11, 7X12, 7X13    Lenovo ThinkSystem SR950 (4U)
LiCO can be installed on certain chassis models, as listed in the table below.
Table 3. Supported chassis models
Product code   Machine type        Model name
d2             7X20                D2 Enclosure (2U)
n1200          5456, 5468, 5469    NeXtScale n1200 (6U)
Prerequisites
• Refer to the LiCO best recipe to ensure that the cluster hardware uses the proper firmware levels, drivers, and
settings.
• Refer to the OS part of the LeSI 20A_SI best recipe to install the OS security patch:
us/en/solutions/HT510293
• Unless otherwise stated in this Guide, all commands are executed on the management node.
• To enable the firewall, modify the firewall rules according to “Firewall settings” on page 49.
• It is important to regularly patch and update components and the OS to prevent security vulnerabilities.
• Additionally, it is recommended that updates known at the time of installation be applied during or
immediately after the OS deployment to the managed nodes, and before the rest of the LiCO setup steps.
# Prefix of compute node hostname. If OS has already been installed on all nodes in the
# cluster, change the configuration according to actual conditions.
compute_prefix="c"
# Compute node hostname list. If OS has already been installed on all nodes
# in the cluster, change the configuration according to actual conditions.
c_name[0]=c1
c_name[1]=c2
# Compute node IP list. If OS has already been installed on all nodes in the cluster,
# change the configuration according to actual conditions.
c_ip[0]=192.168.0.6
c_ip[1]=192.168.0.16
# Network interface card MAC address corresponding to the compute node IP. If OS has
# already been installed on all nodes in the cluster, change the configuration according
# to actual conditions.
c_mac[0]=fa:16:3e:73:ec:50
c_mac[1]=fa:16:3e:27:32:c6
# Compute node BMC address list.
c_bmc[0]=192.168.1.6
c_bmc[1]=192.168.1.16
# Total number of login nodes. If there is no login node in the cluster, the number of logins
# must be "0", and the 'l_name', 'l_ip', 'l_mac', and 'l_bmc' lines need to be removed.
num_logins="1"
# Login node hostname list. If OS has already been installed on all nodes in the cluster,
# change the configuration according to actual conditions.
l_name[0]=l1
# Login node IP list. If OS has already been installed on all nodes in the cluster,
# change the configuration according to actual conditions.
l_ip[0]=192.168.0.15
# Network interface card MAC address corresponding to the login node IP.
# If OS has already been installed on all nodes in the cluster, change the configuration
# according to actual conditions.
l_mac[0]=fa:16:3e:2c:7a:47
# Login node BMC address list.
l_bmc[0]=192.168.1.15
# Icinga API listener port
icinga_api_port=5665
Step 4. Save the changes to lico_env.local.
This Guide assumes that the BMC username and password are the same on all nodes. If they are not,
modify them accordingly during the installation.
Step 5. Run the following commands to make the configuration file take effect:
chmod 600 lico_env.local
source lico_env.local
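As a quick optional check (not part of the original procedure), you can echo a few of the variables to confirm that they are now set in the current shell; the names below come from the lico_env.local example above:
echo "${compute_prefix} ${num_logins}"
echo "${c_name[@]}"
echo "${c_ip[@]}"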
After the cluster environment is set up, configure the IP address of the public network on the login or
management node. In this way, you can log in to the LiCO Web portal from the external network.
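The exact commands depend on your network environment. As a minimal sketch, assuming the public interface is managed by NetworkManager, with placeholder values in angle brackets:
nmcli connection modify <PUBLIC_CONNECTION> ipv4.method manual ipv4.addresses <PUBLIC_IP>/<PREFIX> ipv4.gateway <GATEWAY>
nmcli connection up <PUBLIC_CONNECTION>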
Create a local repository
Different steps should be followed depending on the operating system.
For CentOS
Step 1. Run the following command to create a directory for ISO storage:
mkdir -p ${iso_path}
Step 2. Download the CentOS-8.1.1911-x86_64-dvd1.iso file and the CHECKSUM file from http://vault.centos.org/8.1.1911/isos/x86_64/
Step 3. Copy the files to ${iso_path}.
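If the management node has Internet access, Steps 2 and 3 can also be combined by downloading the files directly into ${iso_path}, for example with wget (assuming wget is installed):
wget -P ${iso_path} http://vault.centos.org/8.1.1911/isos/x86_64/CentOS-8.1.1911-x86_64-dvd1.iso
wget -P ${iso_path} http://vault.centos.org/8.1.1911/isos/x86_64/CHECKSUM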
Step 4. Run the following commands to get the checksum of the ISO file, and ensure that it is the same as the
value in the CHECKSUM file.
cd ${iso_path}
sha256sum CentOS-8.1.1911-x86_64-dvd1.iso
cd ~
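Alternatively, because the CHECKSUM file uses the BSD-style format that GNU coreutils understands, the comparison can be done automatically (a convenience, not part of the original procedure):
cd ${iso_path}
sha256sum --check --ignore-missing CHECKSUM
cd ~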
Step 5. Run the following commands to mount the image:
mkdir -p ${os_repo_dir}
mount -o loop ${iso_path}/CentOS-8.1.1911-x86_64-dvd1.iso ${os_repo_dir}
Step 6. Run the following commands to configure the local repository:
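A minimal sketch of such a repository definition, assuming the image is mounted at ${os_repo_dir} (the file name local-os.repo and the repository IDs are illustrative, not taken from this Guide):
# ${os_repo_dir} is expanded by the shell when the file is written
cat << EOF > /etc/yum.repos.d/local-os.repo
[local-baseos]
name=Local CentOS 8.1 BaseOS
baseurl=file://${os_repo_dir}/BaseOS
enabled=1
gpgcheck=0

[local-appstream]
name=Local CentOS 8.1 AppStream
baseurl=file://${os_repo_dir}/AppStream
enabled=1
gpgcheck=0
EOF
# Refresh the metadata so that the new repository is picked up
dnf makecache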
Note: The Nouveau module is an accelerated open-source driver for NVIDIA cards. This module
should be disabled before the installation of the CUDA driver.
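A common way to disable Nouveau on EL8 is to blacklist the module, rebuild the initramfs, and reboot the node (a sketch; adjust to your environment):
cat << EOF > /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
# Rebuild the initramfs so that Nouveau is not loaded at boot
dracut --force
reboot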
For RHEL
Step 1. Run the following command to prepare the OS image for the other nodes:
Note: If you cannot run these commands, check whether xCAT is successfully installed on the management
node and whether passwordless SSH is set up between the management node and the other nodes. You can copy
the id_rsa and id_rsa.pub files from the management node to the other nodes, and run these commands again.
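A sketch of setting up passwordless SSH from the management node is shown below; the node names c1, c2, and l1 are the examples from lico_env.local, so replace them with your actual hostnames:
# Generate a key pair on the management node if ~/.ssh/id_rsa does not exist yet
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Copy the public key to each compute and login node
for node in c1 c2 l1; do
  ssh-copy-id root@${node}
done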