Lenovo LiCO 6.1.0 User Manual

LiCO 6.1.0 User Guide

Eighth Edition (December 2020)

LIMITED AND RESTRICTED RIGHTS NOTICE: If data or software is delivered pursuant to a General Services Administration (GSA) contract, use, reproduction, or disclosure is subject to restrictions set forth in Contract No. GS-35F-05925.

Contents

Chapter 1. Overview. . . . . . . . . . . 1

Introduction to LiCO . . . . . . . . . . . . . . 1

Features of LiCO . . . . . . . . . . . . . . . 1

Terminology . . . . . . . . . . . . . . . . . 1

Prerequisite . . . . . . . . . . . . . . . . . 2

Operating environment . . . . . . . . . . . . . 2

Chapter 2. Basic operations. . . . . . . 3

Log out . . . . . . . . . . . . . . . . . . . 3

Get current version information. . . . . . . . . . 3

Change the password . . . . . . . . . . . . . 3

View cluster resources and job status . . . . . . . 4

Elements on the cluster overview page. . . . . . . 4

Manage files . . . . . . . . . . . . . . . . . 4

Create a folder . . . . . . . . . . . . . . 4

Rename a folder. . . . . . . . . . . . . . 5

Preview an image . . . . . . . . . . . . . 5

Archive files . . . . . . . . . . . . . . . 5

Extract an archived file . . . . . . . . . . . 5

Upload files . . . . . . . . . . . . . . . 5

Copy and paste files . . . . . . . . . . . . 5

Move files . . . . . . . . . . . . . . . . 6

Duplicate files. . . . . . . . . . . . . . . 6

Container images . . . . . . . . . . . . . . . 6

View container images . . . . . . . . . . . 6

Build a container image . . . . . . . . . . . 7

Import a container image . . . . . . . . . . 9

View a container image . . . . . . . . . . . 9

Download a container image . . . . . . . . 10

Edit a container image . . . . . . . . . . 10

Delete a container image . . . . . . . . . 10

Reupload a container image . . . . . . . . 10

Bills . . . . . . . . . . . . . . . . . . . 10

API key . . . . . . . . . . . . . . . . . . 11

Create a permanent API key . . . . . . . . 11

Create a temporary API key . . . . . . . . 12

View an API key . . . . . . . . . . . . . 12

Delete an API key . . . . . . . . . . . . 12

Change a permanent API key. . . . . . . . 12

Change a temporary key . . . . . . . . . 12

Runtime . . . . . . . . . . . . . . . . . . 13

View runtimes. . . . . . . . . . . . . . 13

Create a runtime. . . . . . . . . . . . . 13

Edit a runtime . . . . . . . . . . . . . . 13

Duplicate a runtime . . . . . . . . . . . 14

Verify a runtime . . . . . . . . . . . . . 14

Delete a runtime . . . . . . . . . . . . . 14

Git publishing . . . . . . . . . . . . . . . 14

Create a publishing task. . . . . . . . . . 15

Chapter 3. Lenovo-accelerated AI . . 17

Job submission – Train . . . . . . . . . . . . 17

Submit an Image Classification – Train job . . 17

Submit an Object Detection – Train job . . . . 18

Submit an Instance Segmentation – Train job . . . . 18

Submit a Medical Image Segmentation – Train job . . . . 19

Submit a Seq2seq – Train job . . . . . . . 19

Submit a Memory Network – Train job . . . . 19

Submit an Image GAN – Train job . . . . . . 19

Job submission – Predict . . . . . . . . . . . 20

Submit an Image Classification – Predict job . . . . 20

Submit an Object Detection – Predict job . . . 20

Submit an Instance Segmentation – Predict job . . . 21

Submit a Medical Image Segmentation – Predict job . . . . 21

Submit a Seq2seq – Predict job. . . . . . . 21

Submit a Memory Network – Predict job . . . 21

Submit an Image GAN – Predict job . . . . . 22

Deployment . . . . . . . . . . . . . . . . 22

Install LeTrain . . . . . . . . . . . . . . 22

Export the trained model . . . . . . . . . 23

Run the inference toolkit code . . . . . . . 23

Chapter 4. AI Studio . . . . . . . . . . 27

Datasets . . . . . . . . . . . . . . . . . 27

Dataset information . . . . . . . . . . . 27

Dataset operations. . . . . . . . . . . . 28

Dataset details . . . . . . . . . . . . . 28

Create a dataset. . . . . . . . . . . . . 28

Edit an image classification dataset . . . . . 29

Edit an object detection or instance segmentation dataset. . . . . . . . . . . 30

Training tasks . . . . . . . . . . . . . . . 32

Submit a task . . . . . . . . . . . . . . 33

View job information of a task . . . . . . . 35

Trained models . . . . . . . . . . . . . . . 38

View models . . . . . . . . . . . . . . 38

Delete a model . . . . . . . . . . . . . 39

Publish a model . . . . . . . . . . . . . 39

Deploy a model . . . . . . . . . . . . . 39

Test a model . . . . . . . . . . . . . . 40

Deployed services . . . . . . . . . . . . . . 40

View services . . . . . . . . . . . . . . 40

Inactivate an activated service . . . . . . . 40

Activate an inactivated service . . . . . . . 40

Use an activated service . . . . . . . . . 41

Chapter 5. Dev Tools . . . . . . . . . 43

Create a Jupyter Python/R instance . . . . . . . 43

Create a Jupyter Custom instance . . . . . . . 43

Access the log page of a Jupyter instance . . . . 43

View a Jupyter instance. . . . . . . . . . . . 44

Stop a Jupyter instance. . . . . . . . . . . . 44

Start a Jupyter instance. . . . . . . . . . . . 44

Delete a Jupyter instance . . . . . . . . . . . 44

Filter and search for Jupyter instances . . . . . . 45

Filter Jupyter instances . . . . . . . . . . 45

Search for Jupyter instances . . . . . . . . 45

Chapter 6. Job submission . . . . . . 47

Submit an industry standard AI job . . . . . . . 47

Submit a TensorFlow Single Node job . . . . 47

Submit a TensorFlow Multinode job . . . . . 48

Submit a TensorFlow2 Single Node job. . . . 48

Submit a TensorFlow2 Multinode job. . . . . 48

Submit a Caffe job . . . . . . . . . . . . 49

Submit an Intel Caffe job . . . . . . . . . 49

Submit a MXNet Single Node job . . . . . . 49

Submit a MXNet MultiNode job . . . . . . . 49

Submit a Neon job . . . . . . . . . . . . 50

Submit a Chainer Single Node Job . . . . . 50

Submit a Chainer Multinode job. . . . . . . 50

Submit a PyTorch Single Node job . . . . . 50

Submit a scikit-learn job. . . . . . . . . . 50

Submit an HPC job . . . . . . . . . . . . . 51

Submit an MPI job . . . . . . . . . . . . 51

Submit an ANSYS job . . . . . . . . . . 51

Submit a COMSOL job . . . . . . . . . . 52

Submit a Charliecloud MPI job . . . . . . . 52

Submit a Singularity MPI job . . . . . . . . 53

Submit a general job . . . . . . . . . . . . . 53

Submit a common job . . . . . . . . . . . . 53

Submit a Charliecloud job . . . . . . . . . . . 53

Submit a Singularity job. . . . . . . . . . . . 54

Chapter 7. Manage the job lifecycle . . . . . . 55

Cancel a job . . . . . . . . . . . . . . . . 55

Re-run a job . . . . . . . . . . . . . . . . 55

Copy a job . . . . . . . . . . . . . . . . . 55

Delete a job . . . . . . . . . . . . . . . . 55

Job tag . . . . . . . . . . . . . . . . . . 56

Add tags to a job . . . . . . . . . . . . 56

Clear tags for a job . . . . . . . . . . . . 56

Add the same tags to multiple jobs . . . . . 56

Clear tags for multiple jobs. . . . . . . . . 56

Filter jobs by tag. . . . . . . . . . . . . 57

Job comment . . . . . . . . . . . . . . . 57

View comments of a job. . . . . . . . . . 57

Edit comments of a job . . . . . . . . . . 57

GPU job monitoring . . . . . . . . . . . . . 58

VNC management . . . . . . . . . . . . . . 58

Chapter 8. Custom templates. . . . . 61

Create a custom template . . . . . . . . . . . 61

Edit a custom template . . . . . . . . . . . . 61

Copy a custom template . . . . . . . . . . . 61

Delete a custom template . . . . . . . . . . . 61

Publish a custom template . . . . . . . . . . 62

Chapter 9. Workflow. . . . . . . . . . 63

Create a workflow . . . . . . . . . . . . . . 63

Edit a workflow . . . . . . . . . . . . . . . 64

Copy a workflow . . . . . . . . . . . . . . 64

Run a workflow . . . . . . . . . . . . . . . 65

Rerun a workflow . . . . . . . . . . . . . . 65

Cancel a workflow. . . . . . . . . . . . . . 65

Delete a workflow . . . . . . . . . . . . . . 65

Chapter 10. Expert mode . . . . . . . 67

Submit a job using command lines . . . . . . . 67

Example of compiled job files . . . . . . . . . 67

Chapter 11. Reports . . . . . . . . . . 69

Expense reports . . . . . . . . . . . . . . 69

Chapter 12. How to run a TensorFlow program on LiCO . . . . . 71

Prepare a workspace. . . . . . . . . . . . . 71

(Optional) Prepare a container image . . . . . . 72

Submit a job . . . . . . . . . . . . . . . . 73

Monitor the job and obtain output files . . . . . . 73

Chapter 13. How to run a Caffe program on LiCO. . . . . . . . . . . . 75

Prepare a workspace. . . . . . . . . . . . . 75

(Optional) Prepare a container image . . . . . . 76

Submit a job . . . . . . . . . . . . . . . . 77

Monitor the job and obtain output files . . . . . . 77

Chapter 14. Additional information . . 79

Failed job submissions . . . . . . . . . . . . 79

Deletion of VNC sessions . . . . . . . . . . . 79

References for Slurm commands . . . . . . . . 79

Data sources for GPU monitoring . . . . . . . . 79

Transform an NGC image . . . . . . . . . . . 79

Transform a Google deep learning container . . . 81

Known issues . . . . . . . . . . . . . . . 82

Notices and trademarks . . . . . . . . . . . 83

Chapter 1. Overview

Introduction to LiCO

Lenovo Intelligent Computing Orchestration (LiCO) is infrastructure management software for high-performance computing (HPC) and artificial intelligence (AI). It provides features such as cluster management and monitoring, job scheduling and management, cluster user management, account management, and file system management.

With LiCO, users can centralize resource allocation in one supercomputing cluster and carry out HPC and AI jobs simultaneously. Users can perform operations by logging in to the management system interface with a browser, or by using command lines after logging in to a cluster login node through a Linux shell.

Features of LiCO

• Cluster resource monitoring: LiCO provides a dashboard to monitor the usage of cluster resources, including CPU, memory, storage, and network.

• Job template storage: LiCO provides multiple job templates, including HPC and AI job templates, which help users conveniently submit jobs from web pages.

• Customized templates: Users can create their own job templates to support other HPC and AI applications.

• Job management and monitoring: Users can directly view and manage the status and results of jobs. Various common schedulers and a wide range of job types are supported (including AI jobs such as TensorFlow and Caffe).

• E2E training: Users can train image classification models without coding. LiCO also provides E2E support for training, such as dataset management, topology management, and pre-trained model management.

• User management and billing: LiCO manages both local and domain users through the same interface. It supports user top-ups and chargebacks, and offers the ability to set billing groups and fees.

• Customizations: A range of customizations are available, such as enterprise job template customization, report customization, and 3D server visualization.

• Container image management: LiCO provides system container images for every supported AI framework. Users can upload private container images and run AI or HPC jobs on them.

• Expert mode: LiCO provides command line tools to submit and manage jobs. Expert users can log in to the login node through a shell and execute commands.

Terminology

• Computer cluster: a general reference to a collection of server resources including management nodes, login nodes, and computing nodes

• Job: a series of commands in sequence intended to accomplish a particular task

• Job status: the status of a job in the scheduling system, such as waiting, in queue, on hold, running, suspended, or completed

• Node status: the status of a node, such as idle, busy, or off

• Job scheduling system: the distributed program in control of receiving, distributing, executing and registering jobs, also referred to as the operation scheduler or simply scheduler

• Management node: the server in a cluster running management programs such as job scheduling, cluster management and user billing

• Login node: the server in a cluster to which users can log in via Linux and conduct operations

• Computing node: the server in a cluster for executing jobs

• User group: a set of users for which the system has defined an access control policy, so that all users in the same user group have access to the same set of cluster resources

• Billing group: a group of cluster users that are to be billed under one account, also referred to as a billing account. A billing account can be made up of a single user or multiple users.

• NGC image: a GPU-optimized container image from the NVIDIA GPU Cloud (NGC) catalog. NGC images are Docker containers that run through the NVIDIA Container Runtime for Docker (also known as nvidia-docker), which makes GPU-based applications portable across multiple machines.

Prerequisite

• LiCO currently supports Slurm as the scheduler. The commands for Slurm in this Guide are not applicable to other schedulers.

• The paths for jobs involved in this Guide do not support spaces or other special characters.
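The path restriction above can be checked before a job is submitted. The following is a minimal sketch, not LiCO's own validation: the guide does not list the exact characters that are rejected, so the allowed set (letters, digits, underscore, hyphen, period, and slash) and the function name `is_safe_job_path` are assumptions.

```python
import re

# Assumption: only letters, digits, '_', '-', '.', and '/' are treated as safe.
# The guide does not list the exact characters that LiCO rejects.
_SAFE_PATH = re.compile(r"[A-Za-z0-9_./-]+")


def is_safe_job_path(path: str) -> bool:
    """Return True if the whole path consists only of characters in the safe set."""
    return _SAFE_PATH.fullmatch(path) is not None


print(is_safe_job_path("/home/user1/jobs/train_01"))  # True
print(is_safe_job_path("/home/user1/my jobs/train"))  # False (contains a space)
```

A stricter or looser character set may be appropriate for a given cluster; adjust the pattern accordingly.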

Operating environment

Cluster server:

Lenovo ThinkSystem servers

Operating system

• CentOS / Red Hat Enterprise Linux (RHEL) 8.2

• SUSE Linux Enterprise Server (SLES) 15 SP2

Client requirements:

• Hardware: CPU of 2.0 GHz or above, memory of 8 GB or above

• Browser: Chrome (V 62.0 or higher) or Firefox (V 56.0 or higher) recommended

• Display resolution: 1280 x 800 or above

Chapter 2. Basic operations

Note: Instructions in this chapter are primarily based on the management system interface. Command line users can refer to “Submit a job using command lines” on page 67 and “References for Slurm commands” on page 79 for further instructions.

Log in

The client must have direct access to the cluster login node.

Step 1. Open a browser.

Step 2. Type the IP address for the cluster’s login node, such as https://10.220.112.21.

Step 3. Type the username and password.

Step 4. Click Log in.

Log out

Step 1. Place your cursor over the icon in the upper-right corner of the page.

Step 2. Click the icon for logging out.

Step 3. Click Confirm.

You have logged out from LiCO.

Get current version information

Step 1. Place your cursor over the icon in the upper-right corner of the home page.

Step 2. Click the icon for version information.

Step 3. Click a menu item to get the required information.

A page that shows the current version information is displayed.

Click User Agreement to get the user agreement.

Click Third party licenses to obtain the third-party licenses.

Change the password

Step 1. Place your cursor over the icon in the upper-right corner.

Step 2. Click the icon for changing the password.

A dialog is displayed for you to change the password.

Step 3. Type the current password, and then type the new password twice.

Step 4. Click OK.

View cluster resources and job status

Select Home from the left navigation pane.

The cluster overview page is displayed.

Elements on the cluster overview page

• CPU: Shows the CPU usage in the cluster, with the number of occupied CPU cores and the total number of CPU cores in the cluster.

• Memory: Shows the memory usage in the cluster, with the used memory and the total memory in the cluster.

• Storage: Shows the storage usage in the cluster, with the used storage and total storage.

• Network: Shows the upload and download rates.

• Jobs: Shows the information about jobs that the current user has submitted to the Running or Waiting queue. Switch between Running and Waiting to view the names of current jobs and their running or waiting time. Click More to go to the task details page to view more detailed information about the execution of the jobs.

• Job Status: Shows the status of all jobs submitted by the current user. Jobs can be viewed by queue or by time period. When jobs are viewed by queue, the names of all queues in the cluster are listed. Time period options include Last hour, Last 1 day, Last 7 days, and Last 30 days. In the chart, the user can display historical running or waiting jobs and view the number of jobs in the running status at a specific point in time.

• Recently Used Job Templates: Shows the job templates that the user recently used, so that they can be reused.

Manage files

Create a folder

Step 1. Select Admin ➙ Manage Files from the left navigation pane.

The Manage Files page is displayed.

Step 2. Right-click in the blank area of the Manage Files page, and select New folder from the shortcut menu.

A new folder is created, with “untitled_folder” as its default name.

Rename a folder

Step 1. Select Admin ➙ Manage Files from the left navigation pane.

Step 2. Right-click a folder you want to rename, and then select Rename.

Step 3. Type the new folder name in the text box.

Preview an image

Step 1. Select Admin ➙ Manage Files from the left navigation pane.

Step 2. Right-click an image, and then select Preview from the shortcut menu.

Archive files

Step 1. Select Admin ➙ Manage Files from the left navigation pane.

Step 2. Right-click a file or folder, place the cursor over Create archive from the shortcut menu, and then select the compression format.

Extract an archived file

Step 1. Select Admin ➙ Manage Files from the left navigation pane.

Step 2. Right-click an archived file (compressed in TAR, ZIP, or GZIP format) you want to extract, and then select Extract files from archive from the shortcut menu.
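For users who work on a login node from the command line, the archive and extract operations above have rough equivalents in Python's standard library. The sketch below is an illustration only and says nothing about how LiCO implements these menu items. The function names are hypothetical, and treating the GZIP format as a gzip-compressed TAR archive (`gztar`) is an assumption.

```python
import shutil
import tempfile
from pathlib import Path


def archive_folder(folder: Path, fmt: str = "gztar") -> Path:
    """Pack a folder into an archive next to it; fmt may be 'tar', 'zip', or 'gztar'."""
    archive = shutil.make_archive(
        str(folder), fmt, root_dir=folder.parent, base_dir=folder.name
    )
    return Path(archive)


def extract_archive(archive: Path, dest: Path) -> None:
    """Unpack a TAR, ZIP, or gzip-compressed TAR archive into dest."""
    shutil.unpack_archive(str(archive), str(dest))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "results"
        src.mkdir()
        (src / "log.txt").write_text("done\n")
        packed = archive_folder(src, "zip")          # creates results.zip
        extract_archive(packed, Path(tmp) / "restored")
        print((Path(tmp) / "restored" / "results" / "log.txt").read_text())
```

The archive format is inferred from the file extension when unpacking, so archives should keep their standard extensions (.tar, .zip, .tar.gz).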

Upload files

Step 1. Select Admin ➙ Manage Files from the left navigation pane.

Step 2. Double-click a folder to open it.

Step 3. Right-click in the blank area, and select Upload files from the shortcut menu.

Step 4. Select one or more local files to upload using either of the following methods:

• Drag and drop a file or files into the dotted dialog.

• Click Select files to upload at the bottom of the Upload files window.

Successfully uploaded files are available on the Manage Files page.

Copy and paste files

Step 1. Select Admin ➙ Manage Files from the left navigation pane.

Step 2. Right-click a file or folder, and then select Copy.

Step 3. Right-click in the blank area, and then select Paste from the shortcut menu.

The copied file or folder has been pasted in the same folder.

Notes:

• You can also use the shortcut keys Ctrl+C and Ctrl+V to copy and paste.

• You can copy a file or folder from one folder to a different folder.

• If the destination folder has a file or folder with the same name as the one being copied, a pop-up dialog is displayed. You can then choose whether to overwrite the existing file or folder with the new one.

Move files

Step 1. Select Admin ➙ Manage Files from the left navigation pane.

Step 2. Right-click a file or folder, and then select Cut.

Step 3. Select a destination folder, right-click in the blank area, and then select Paste from the shortcut menu.

The selected file or folder is moved to the destination folder.

Note: If the destination folder has a file or folder with the same name as the one being moved, a pop-up dialog is displayed. You can then choose whether to overwrite the existing file or folder with the new one.

Duplicate files

Step 1. Select Admin ➙ Manage Files from the left navigation pane.

Step 2. Right-click a file or folder, and then select Duplicate.

Container images

LiCO runs all AI jobs within a container. The system supports the Singularity container platform, with different AI job templates running on different container images.

In addition to providing a certain number of basic container images, LiCO also allows users to upload customized container images. LiCO 5.2.0 and later versions support running jobs on NGC images. To transform an NGC image to a Singularity container image, refer to “Transform an NGC image” on page 79.

View container images

Select Admin ➙ Container Images from the left navigation pane.

The Container Images page is displayed.

The parameters on the Container Images page are described as follows:

• Name: self-defined container image name

• Framework: framework to which the container image belongs

• Type: type of the container image, which can be private or system. The value private means that the container image was created by you, and the value system means that the container image was created by the system administrator.

• Version: self-defined container image version

• Tag: self-defined container image tag

• Action: action on the container image. For system container images, available actions are Browse and Download; for private container images, available actions are Edit, Browse, Delete, Download, and Reupload.
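The Action rule above amounts to a small permission table keyed by image type. A minimal sketch follows; the names `ALLOWED_ACTIONS` and `can_perform` are illustrative and are not part of LiCO.

```python
# Actions available per container image type, as listed in the Action parameter above.
ALLOWED_ACTIONS = {
    "system": {"Browse", "Download"},
    "private": {"Edit", "Browse", "Delete", "Download", "Reupload"},
}


def can_perform(image_type: str, action: str) -> bool:
    """Return True if the action is offered for the given image type."""
    return action in ALLOWED_ACTIONS.get(image_type, set())


print(can_perform("system", "Delete"))   # False
print(can_perform("private", "Delete"))  # True
```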

Build a container image

Users can build a container image from any of the following sources:

• “Build a container image from a Docker registry” on page 7

• “Build a container image from a Singularity library” on page 8

• “Build a container image from a Singularity definition file” on page 8

• “Build a container image from a system or private image” on page 9

Build a container image from a Docker registry

Step 1. Select Admin ➙ Container Images from the left navigation pane.

Step 2. Click Build above the container image list.

Step 3. Fill in the required information.

• Name: self-defined container image name.

• Workspace: working directory of the container image.

• Source: Select Docker Registry, indicating that the container image is built from Docker Hub.

• Image Path: path of the container image in Docker Hub.

• Authentication: Enter the username and password if the container image repository is a private repository.

• Advanced: This allows you to install Python libraries.

• Use HTTPS: This allows you to enable or disable HTTPS.

Note: To clear the parameters and fill them in again, click Reset.

Step 4. Click Start Build.

Note: To cancel a build that is in progress, click Cancel Build.

After the container image is built, you can import it by clicking Import in the lower right corner of the build log box. Refer to “Import a container image” on page 9.

Build a container image from a Singularity library

Step 1. Select Admin ➙ Container Images from the left navigation pane.

Step 2. Click Build above the container image list.

Step 3. Fill in the required information.

• Name: self-defined container image name.

• Workspace: working directory of the container image.

• Source: Select Singularity Library, indicating that the container image is built from the Container Library.

• Image Path: path of the container image in the Container Library.

• Advanced: This allows you to install Python libraries.

Note: To clear the parameters and fill them in again, click Reset.

Step 4. Click Start Build.

Note: To cancel a build that is in progress, click Cancel Build.

After the container image is built, you can import it by clicking Import in the lower right corner of the build log box. Refer to “Import a container image” on page 9.

Build a container image from a Singularity definition file

Step 1. Select Admin ➙ Container Images from the left navigation pane.

Step 2. Click Build above the container image list.

Step 3. Fill in the required information.

• Name: self-defined container image name.

• Workspace: working directory of the container image.

• Source: Select Singularity Definition File, indicating that the container image is built from a custom .def file.

• Definition File: path of the custom .def file.

• Authentication: This allows you to enable authentication, depending on the container image repository type in the .def file.

• Use HTTPS: This allows you to enable or disable HTTPS.

Note: To clear the parameters and fill them in again, click Reset.

Step 4. Click Start Build.

Note: To cancel a build that is in progress, click Cancel Build.

After the container image is built, you can import it by clicking Import in the lower right corner of the build log box. Refer to “Import a container image” on page 9.
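For readers who have not written a Singularity definition file before, the sketch below generates a minimal one that bootstraps from a Docker image and installs Python libraries in the %post section. It is an illustration under stated assumptions: the helper name `render_definition`, the base image `python:3.8-slim`, and the package list are examples, and the guide does not prescribe any particular definition file layout.

```python
from typing import Sequence


def render_definition(base_image: str, pip_packages: Sequence[str]) -> str:
    """Return the text of a minimal Singularity definition file.

    base_image is a Docker image reference such as 'python:3.8-slim' (example only).
    """
    lines = ["Bootstrap: docker", f"From: {base_image}", ""]
    if pip_packages:
        # %post commands run inside the image at build time.
        lines += ["%post", "    pip install " + " ".join(pip_packages), ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_definition("python:3.8-slim", ["numpy", "pandas"]))
```

The generated text can be saved as a .def file in the workspace and selected in the Definition File field.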

Build a container image from a system or private image

Step 1. Select Admin ➙ Container Images from the left navigation pane.

Step 2. Click Build above the container image list.

Step 3. Fill in the required information.

• Name: self-defined container image name.

• Workspace: working directory of the container image.

• Source: Select System Image or Private Image, indicating that the container image is built from a system or private image.

• Image Path: path of the container image.

• Advanced: This allows you to install Python libraries.

Note: To clear the parameters and fill them in again, click Reset.

Step 4. Click Start Build.

Note: To cancel a build that is in progress, click Cancel Build.

After the container image is built, you can import it by clicking Import in the lower right corner of the build log box. Refer to “Import a container image” on page 9.
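The four build sources described above correspond roughly to the build targets accepted by the `singularity build` command. The sketch below only assembles the command line and does not run it. How LiCO invokes Singularity internally is not documented here, so the mapping and the function name `build_command` are assumptions.

```python
from typing import List

# Assumed mapping from the Source field to a Singularity build target prefix.
# Definition files and existing local images are passed as plain paths.
_URI_PREFIX = {
    "Docker Registry": "docker://",
    "Singularity Library": "library://",
}


def build_command(source: str, location: str, output: str) -> List[str]:
    """Return the argument list for `singularity build OUTPUT TARGET`."""
    target = _URI_PREFIX.get(source, "") + location
    return ["singularity", "build", output, target]


print(build_command("Docker Registry", "ubuntu:20.04", "ubuntu.sif"))
print(build_command("Singularity Definition File", "custom.def", "custom.sif"))
```

Building from a definition file on the command line usually requires root privileges or the fakeroot feature, depending on how Singularity is configured on the cluster.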

Import a container image

Step 1. Select Admin ➙ Container Images from the left navigation pane.

Step 2. Click Import above the container image list.

Step 3. Fill in the required information.

• Name: self-defined container image name.

• Framework: framework to which the container image belongs.

• Source File: container image file selected, which must be a Singularity container image file. Otherwise, the container image will fail to be created.

• Save As: name of the container image file. Ensure that there is no container image file with the same name in the storage path.

• Version: self-defined container image version.

• Tags: self-defined container image tag.

• Description: any description about the imported container image.

Step 4. Click OK.

View a container image

Step 1. Select Admin ➙ Container Images from the left navigation pane.

Step 2. Find the container image whose information you want to view, and then select Action ➙ Browse.

The information is described as follows:

• Image Path: location of the container image

• Description: description of the container image. If no description has been filled in, this information is not displayed.

Download a container image

Step 1. Select Admin ➙ Container Images from the left navigation pane.

Step 2. Find the container image you want to download, and then select Action ➙ Download.

Step 3. Click Browse.

Step 4. Select a folder to save the container image.

Ensure that there is no container image file with the same name in the storage path.

Step 5. Click OK.

Edit a container image

Step 1. Select Admin ➙ Container Images from the left navigation pane.

Step 2. Find the private container image you want to edit, and then select Action ➙ Edit.

Step 3. Edit the container image information as required.

Step 4. Click OK.

Delete a container image

Step 1. Select Admin ➙ Container Images from the left navigation pane.

Step 2. Find the private container image you want to delete, and then select Action ➙ Delete.

Step 3. Click OK.

Reupload a container image

Step 1. Select Admin ➙ Container Images from the left navigation pane.

Step 2. Find the private container image you want to reupload, and then select Action ➙ Reupload.

Step 3. Click Browse and then select a new container image file you want to upload.

Step 4. Click OK.

Bills

LiCO 5.5.0 and later versions can bill users for jobs and storage instances. Users can download their daily and monthly bills generated automatically on the system.

Select Admin ➙ Bills from the left navigation pane.

The page for downloading bills is displayed.

The following types of bills are available:

• Daily Bill

Daily Bill enables users to check the billing information for every job and storage instance on a specific day.

• Monthly Bill

Monthly Bill enables users to check the billing information for every day in a specific month.
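The relationship between the two bill types can be pictured as a roll-up: a daily bill lists individual job and storage items, and a monthly bill shows one figure per day. The sketch below illustrates that arithmetic only. The line-item structure and the function name `monthly_summary` are hypothetical and do not reflect the format of the bills that LiCO generates.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Dict, Iterable, Tuple

# Hypothetical line item: (billing date, item name, cost). Not LiCO's bill format.
LineItem = Tuple[date, str, Decimal]


def monthly_summary(items: Iterable[LineItem], year: int, month: int) -> Dict[date, Decimal]:
    """Sum the cost of all line items per day for the given month."""
    totals: Dict[date, Decimal] = defaultdict(Decimal)
    for day, _name, cost in items:
        if day.year == year and day.month == month:
            totals[day] += cost
    return dict(sorted(totals.items()))
```

Decimal is used instead of float so that currency amounts are summed without rounding drift.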

API key

LiCO 5.3.0 and later versions provide some open application programming interfaces (APIs). To learn how to use these APIs, refer to the LiCO 6.1.0 OpenAPI Guide. To use these APIs, obtain an API key first.

Create a permanent API key

Step 1. Select Admin ➙ API Key from the left navigation pane.

The API Key page is displayed.

Step 2. Click Create API Key.

Step 3. Click Save.

Create a temporary API key

Step 1. Select Admin ➙ API Key from the left navigation pane.

Step 2. Click Create API Key.

Step 3. Clear the Unlimited check box, and click the box below with the prompt of Please choose a date.

Step 4. Select a date and click Save.
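The difference between a permanent and a temporary API key comes down to whether an expiration date is set. The sketch below models that idea; it is a hypothetical data model for illustration, not LiCO's implementation, and the assumption that a key remains usable through its expiration date is not confirmed by this guide.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ApiKey:
    """Hypothetical model of an API key; LiCO's real data model is not documented here."""

    value: str
    expires_on: Optional[date] = None  # None models the Unlimited check box (permanent key)

    @property
    def is_permanent(self) -> bool:
        return self.expires_on is None

    def is_valid(self, today: date) -> bool:
        # Assumption: a temporary key remains usable through its expiration date.
        return self.expires_on is None or today <= self.expires_on


permanent = ApiKey("example-key-1")
temporary = ApiKey("example-key-2", expires_on=date(2021, 1, 31))
print(permanent.is_valid(date(2030, 1, 1)))  # True
print(temporary.is_valid(date(2021, 2, 1)))  # False
```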

View an API key

Select Admin ➙ API Key from the left navigation pane.

The API Key page is displayed.

Delete an API key

Step 1. Select Admin ➙ API Key from the left navigation pane.

Step 2. Click Delete.

Step 3. Click OK.

Change a permanent API key

To change a permanent API key to a temporary one, complete the following steps:

Step 1. Select Admin ➙ API Key from the left navigation pane.

Step 2. Click Change.

Step 3. Click the box with a calendar icon, and then select a date on the displayed date selector.

Step 4. Click OK.

Change a temporary key

You can change a temporary API key to a permanent one or change its expiration time.

• “Change a temporary API key to a permanent one” on page 13

• “Change the expiration time of a temporary API key” on page 13

Change a temporary API key to a permanent one

Step 1. Select Admin ➙ API Key from the left navigation pane.

Step 2. Click Change.

Step 3. Click OK.

Change the expiration time of a temporary API key

Step 1. Select Admin ➙ API Key from the left navigation pane.

Step 2. Click Change.

Step 3. Clear the Unlimited check box, click the box with a calendar icon, and then select a date on the displayed date selector.

Step 4. Click OK.

Runtime

LiCO 5.3.0 and later versions support the runtime feature, which provides an isolated runtime environment for each job. Users customize the modules and environment variables of a runtime before using it. A runtime can be reused across jobs and edited later as needs change.

View runtimes

Select Admin ➙ Runtime from the left navigation pane.

The Runtime page is displayed.

Create a runtime

Step 1. Select Admin ➙ Runtime from the left navigation pane.

Step 2. Click Create.

Step 3. Fill in the required information.

• Name: self-defined runtime name, which is mandatory.

• Modules: OpenHPC computing modules

• Environments: environments that need to be loaded to run a job

Step 4. Click OK.

Edit a runtime

Step 1. Select Admin ➙ Runtime from the left navigation pane.

Step 2. Find the runtime you want to edit, and then select Action ➙ Edit.

Step 3. Edit the runtime name if needed.

Step 4. Click Add Modules.

Step 5. Select the required modules and click Confirm.

Step 6. Adjust the module loading order as required. To change the order of a module or delete a module, click the icon for that module.

The operations are described as follows:

• Move Up: Move up the selected module by one place.

• Move Down: Move down the selected module by one place.

• Delete: Delete the selected module.

Note: To check whether the selected modules are valid, click Verify Modules in the Edit Runtime dialog.

Step 7. Click Add Environments.

The Create Environment dialog is displayed. Fill in the required information and click Confirm.

The parameters are described as follows:

• Variable: variable name, which is mandatory.

• Value: variable value

Step 8. Click OK.
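Conceptually, a runtime is an ordered list of modules plus a set of environment variables, which is why the loading order in Step 6 matters. The sketch below renders such a runtime as shell lines and shows the Move Up operation on the list. It is an illustration only: how LiCO applies a runtime to a job is not described in this guide, the function names are hypothetical, and the module names `gnu9` and `openmpi4` are examples that may not exist on a given cluster.

```python
import shlex
from typing import Dict, List


def move_up(modules: List[str], name: str) -> List[str]:
    """Return a copy of the list with `name` moved one place earlier."""
    result = list(modules)
    i = result.index(name)
    if i > 0:
        result[i - 1], result[i] = result[i], result[i - 1]
    return result


def render_runtime(modules: List[str], env: Dict[str, str]) -> str:
    """Render shell lines: modules in loading order, then environment variables."""
    lines = [f"module load {m}" for m in modules]
    lines += [f"export {key}={shlex.quote(value)}" for key, value in env.items()]
    return "\n".join(lines)


# Example module names only; use the modules installed on your cluster.
ordered = move_up(["openmpi4", "gnu9"], "gnu9")
print(render_runtime(ordered, {"OMP_NUM_THREADS": "4"}))
```

In this example the compiler module is moved ahead of the MPI module, a common requirement when modules are organized hierarchically.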

Duplicate a runtime

Step 1. Select Admin ➙ Runtime from the left navigation pane.

Step 2. Find the runtime you want to duplicate, and then select Action ➙ Duplicate.

Step 3. Modify the runtime name and click OK.

Verify a runtime

Step 1. Select Admin ➙ Runtime from the left navigation pane.

Step 2. Find the runtime you want to verify, and then select Action ➙ Verify.

Delete a runtime

Step 1. Select Admin ➙ Runtime from the left navigation pane.

Step 2. Find the runtime you want to delete, and then select Action ➙ Delete.

Step 3. Click OK.

Git publishing

On the LiCO user portal, users can publish any directory in their workspace to a remote repository on GitLab or GitHub. LiCO 5.3.0 and later versions also support Git publishing through APIs. To learn how to use the APIs, refer to the LiCO 6.1.0 OpenAPI Guide.

Select Admin ➙ Git Publishing from the left navigation pane.

The Git Publishing page is displayed.


This page lists all the publishing tasks and shows the publishing history of the current user. Details of each publishing record are provided, such as the Local Path, Repository, Branch Name, Target Folder, Publishing Status, and Create Time.

Create a publishing task

Step 1. Select Admin ➙ Git Publishing from the left navigation pane.

Step 2. Click Create.

Step 3. Fill in the required information.

• Local Path: Click Browse and select the folder that stores the files to be published.

Note: Git Large File Storage (LFS) replaces large files such as audio samples, videos, datasets, and graphics with text pointers inside Git, while storing the file contents on a remote server like GitHub.com or GitHub Enterprise. When you transfer files with a size larger than 20 MB through Git publishing, LiCO uses the LFS protocol. To increase the upper limit for the file size, contact the administrator.

• Repository: Enter the name of a remote repository on GitLab or GitHub.

• Authentication: The authentication method depends on the transport protocol of the repository.

– SSH authentication is used if the repository name starts with git@.

– HTTPS authentication is used if the repository name starts with http or https.

• Branch Name: Enter a remote branch name. This parameter is optional and its default value is master.

• Target Folder: Specify a remote target folder to save the published files. This parameter is optional and can be left blank.

Step 4. Click Publish.

• The Publishing Status is shown as Publishing.

• To view the logs of the task, select Action ➙ View Logs.

• When the publishing task is completed, the Publishing Status is shown as Published.
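
The authentication rule and the 20 MB LFS limit described in Step 3 can be checked before a task is created. The following Python sketch is a hypothetical pre-check, not part of LiCO: all function names are assumptions, and the limit is assumed to be 20 × 1024 × 1024 bytes with the default setting.

```python
from pathlib import Path

# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
LFS_THRESHOLD_BYTES = 20 * 1024 * 1024


def authentication_for(repository: str) -> str:
    """Pick the authentication method from the repository prefix."""
    if repository.startswith("git@"):
        return "SSH"
    if repository.startswith("http"):  # covers both http and https
        return "HTTPS"
    raise ValueError(f"Unsupported repository address: {repository}")


def needs_lfs(size_bytes: int, threshold: int = LFS_THRESHOLD_BYTES) -> bool:
    """Files larger than the threshold are transferred with the LFS protocol."""
    return size_bytes > threshold


def files_needing_lfs(local_path, threshold: int = LFS_THRESHOLD_BYTES):
    """List files under local_path (relative paths) that exceed the threshold."""
    root = Path(local_path)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*")
        if path.is_file() and needs_lfs(path.stat().st_size, threshold)
    )


print(authentication_for("git@github.com:example/project.git"))      # SSH
print(authentication_for("https://gitlab.com/example/project.git"))  # HTTPS
```

The repository addresses in the example are placeholders.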


Chapter 3. Lenovo-accelerated AI

Lenovo-accelerated AI is based on the LeTrain project. LeTrain is a distributed training engine based on TensorFlow and optimized by Lenovo. Its goal is to make distributed training as easy as single-GPU training and to achieve linear scaling performance. Many popular models are built into LeTrain, so you can use them directly without coding. LiCO 5.3.0 and later versions support submitting Lenovo-accelerated AI jobs through APIs. To learn how to use the APIs, refer to the LiCO 6.1.0 OpenAPI Guide.

Select Submit Job from the left navigation pane, and then select the Lenovo Accelerated AI tab. The Lenovo Accelerated AI tab page is displayed. All the job submission tasks are performed on this page.

Job submission – Train

Submit an Image Classification – Train job

Step 1. In the Image Classification area, click Train.

Step 2. Fill in the required information.

• Job Name (required): job name. For LiCO 6.1.0 and later versions, the initial job name will be automatically created in the format of "template name + time".

• Workspace (required): working directory of the job. Job output files will be saved in this directory.

• Topology (required): neural network model.

The following network models are currently supported: alexnet_v2, cifarnet, inception_v1, inception_v2, inception_v3, inception_v4, inception_resnet_v2, lenet, resnet_v1_50, resnet_v1_101, resnet_v1_152, resnet_v1_200, resnet_v2_50, resnet_v2_101, resnet_v2_152, resnet_v2_200, vgg_a, vgg_16, vgg_19, mobilenet_v1, mobilenet_v1_025, mobilenet_v1_050, mobilenet_v1_075, nasnet_cifar, nasnet_mobile, nasnet_large

• Dataset Directory (required): training dataset path. The dataset example can be downloaded from the link next to this field.

• Dataset Example: link for downloading a dataset example.

• Train Directory (required): directory that includes the output directory for TensorFlow, summary information, and checkpoints.

• Batch Size (required): size of each batch of data imported for training or validation.

• Learning Rate (required): learning rate for training. The learning rate varies during training according to the selected learning rate policy, which you can change.

• Epoch (required): number of times the dataset is traversed during training.

– Log Cycle: log output frequency, that is, the number of traversals after which logs are output once.

– Snapshot Cycle: snapshot output frequency, that is, the number of traversals after which a snapshot is output once.

• Queue (required): name of the queue on which the job will run. You can only select a queue which you have permission to access. The queue details include:

– Queue status: UP means available.

– Available Nodes: When you put the cursor over it, the total number of nodes and the number of free nodes are displayed.

– Available Cores: When you put the cursor over it, the total number of CPU cores and the number of free CPU cores are displayed.

– Available GPU: If the queue has GPUs, when you put the cursor over it, the total number of GPUs and the number of free GPUs are displayed.

– Available Memory: UNLIMITED means no memory limits for jobs.

– Wall Time: maximum execution time of jobs on the queue. UNLIMITED means no limit.

• Nodes (required): number of nodes for training.

• Exclusive (required): indicates whether the training can use all the CPU resources. You can change the CPU core setting in non-exclusive mode.

• GPU Per Node (required): number of GPUs per node for training.

• Wall Time: maximum execution time of the job. The job is stopped when it runs longer than the configured Wall Time.

• Optimizer (required): Different optimizers have different settings.

• Weight Decay (required): weight decay ratio.

• Notify Job Completion: determines whether to send a notification when the job is completed and the notification method.

If Email is selected, an email will be sent by LiCO once a job is completed.

Step 3. Click Submit.
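
The relationship between Epoch, Log Cycle, Snapshot Cycle, and Batch Size in Step 2 can be sketched as follows. This Python example is an illustration under stated assumptions, not LeTrain code: it assumes that a cycle of N means "once every N traversals of the dataset" and that Batch Size is the number of samples consumed per training step. The function names are hypothetical.

```python
import math


def steps_per_epoch(dataset_size: int, batch_size: int) -> int:
    """Number of batches needed to traverse the dataset once."""
    if dataset_size < 1 or batch_size < 1:
        raise ValueError("dataset_size and batch_size must be positive")
    return math.ceil(dataset_size / batch_size)


def output_schedule(epochs: int, log_cycle: int, snapshot_cycle: int):
    """Return the epochs after which logs and snapshots are output."""
    if min(epochs, log_cycle, snapshot_cycle) < 1:
        raise ValueError("all values must be positive integers")
    log_epochs = [e for e in range(1, epochs + 1) if e % log_cycle == 0]
    snapshot_epochs = [e for e in range(1, epochs + 1) if e % snapshot_cycle == 0]
    return log_epochs, snapshot_epochs


log_epochs, snapshot_epochs = output_schedule(epochs=10, log_cycle=2, snapshot_cycle=5)
print(log_epochs)       # [2, 4, 6, 8, 10]
print(snapshot_epochs)  # [5, 10]
print(steps_per_epoch(dataset_size=50000, batch_size=128))  # 391
```

With these example values, a Snapshot Cycle larger than Epoch would produce no snapshot at all, which is worth checking before a long job is submitted.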

Submit an Object Detection – Train job

Step 1. In the Object Detection area, click Train.

Step 2. Fill in the required information.

• Topology (required): Neural network models currently supported include Faster R-CNN and YOLO v3.

Note: If the YOLO v3 network model is used, you are advised to set the learning rate to 1e-5 instead of the default value.

• Pre-trained Model: A pre-trained model is used to help users improve training performance.

• Use Random Seed: indicates whether to shuffle the training data set with random seeds. The default value is no.

For the description of other parameters, see “Submit an Image Classification – Train job” on page 17.

Step 3. Click Submit.

Submit an Instance Segmentation – Train job

Step 1. In the Instance Segmentation area, click Train.

Step 2. Fill in the required information.

Topology (required): The mask-rcnn neural network model is currently supported.

For the description of other parameters, see “Submit an Image Classification – Train job” on page 17.

Step 3. Click Submit.


Submit a Medical Image Segmentation – Train job

Step 1. In the Medical Image Segmentation area, click Train.

Step 2. Fill in the required information.

Topology (required): The unet neural network model is currently supported.

For the description of other parameters, see “Submit an Image Classification – Train job” on page 17.

Note: Medical image segmentation training is available only for datasets of color images.

Step 3. Click Submit.

Submit a Seq2seq – Train job

Step 1. In the Seq2seq area, click Train.

Step 2. Fill in the required information.

• Layer Number (required): number of layers in the model

• Size of Each Layer (required): size of each layer

• Vocabulary Size (required): total size of the vocabulary to be translated

For the description of other parameters, see “Submit an Image Classification – Train job” on page 17.

Step 3. Click Submit.

Submit a Memory Network – Train job

Step 1. In the Memory Network area, click Train.

Step 2. Fill in the required information.

• Feature Size (required): size of feature in the memory network

• Number of hops (required): number of hops in the memory network

• Embedding Size (required): size of embedding matrices

• Memory Size (required): maximum size of memory

For the description of other parameters, see “Submit an Image Classification – Train job” on page 17.

Step 3. Click Submit.

Submit an Image GAN – Train job

Step 1. In the Image GAN area, click Train.

Step 2. Fill in the required information.

• Input Image Height (required): height of each input image

• Input Image Width (required): width of each input image

• Output Image Height (required): height of each output image

• Output Image Width (required): width of each output image

• Samples Output Directory: the directory which stores the sample pictures for training


For the description of other parameters, see “Submit an Image Classification – Train job” on page 17.

Step 3. Click Submit.

Job submission – Predict

Submit an Image Classification – Predict job

Step 1. In the Image Classification area, click Predict.

Step 2. Fill in the required information.

• Job Name (required): job name. For LiCO 6.1.0 and later versions, the initial job name will be automatically created in the format of "template name + time".

• Workspace (required): working directory of the job. Job output files will be saved in this directory.

• Topology (required): the prediction model. It should be set to the same neural network model as the training job.

• Input Directory (required): directory that contains the images to be predicted.

• Train Directory (required): directory that contains the training checkpoints.

• Output Directory (required): the output directory. The prediction result will be saved in this directory.

• Queue (required): name of the queue on which the job will run. You can only select a queue which you have permission to access. The queue details include:

– Queue status: UP means available.

– Available Nodes: When you put the cursor over it, the total number of nodes and the number of free nodes are displayed.

– Available Cores: When you put the cursor over it, the total number of CPU cores and the number of free CPU cores are displayed.

– Available GPU: If the queue has GPUs, when you put the cursor over it, the total number of GPUs and the number of free GPUs are displayed.

– Available Memory: UNLIMITED means no memory limits for jobs.

– Wall Time: maximum execution time of jobs on the queue. UNLIMITED means no limit.

• Nodes (required): number of nodes for the job.

• Exclusive (required): indicates whether the job can use all the CPU resources. You can change the CPU core setting in non-exclusive mode.

• GPU Per Node (required): number of GPUs per node for the job.

• Wall Time: maximum execution time of the job. The job is stopped when it runs longer than the configured Wall Time.

• Notify Job Completion: determines whether to send a notification when the job is completed and the notification method.

Step 3. Click Submit.
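
Before a job is submitted, the Nodes and GPU Per Node values can be compared with the queue details listed in Step 2. The following Python sketch is a hypothetical helper, not part of LiCO: the QueueInfo class and the check_request function are assumptions, and the returned messages only indicate that a job might have to wait, because the actual behavior depends on the scheduler.

```python
from dataclasses import dataclass


@dataclass
class QueueInfo:
    """Hypothetical snapshot of the queue details shown on the page."""
    status: str      # "UP" means available
    free_nodes: int
    free_gpus: int


def check_request(queue: QueueInfo, nodes: int, gpu_per_node: int) -> list[str]:
    """Return reasons why the job might not start immediately."""
    problems = []
    if queue.status != "UP":
        problems.append(f"queue status is {queue.status}, not UP")
    if nodes > queue.free_nodes:
        problems.append(f"{nodes} nodes requested, {queue.free_nodes} free")
    total_gpus = nodes * gpu_per_node  # GPUs requested across all nodes
    if total_gpus > queue.free_gpus:
        problems.append(f"{total_gpus} GPUs requested, {queue.free_gpus} free")
    return problems


queue = QueueInfo(status="UP", free_nodes=4, free_gpus=8)
print(check_request(queue, nodes=2, gpu_per_node=4))  # []
print(check_request(queue, nodes=3, gpu_per_node=4))
```

In the second call, 3 nodes with 4 GPUs each request 12 GPUs in total, which exceeds the 8 free GPUs in the example queue.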

Submit an Object Detection – Predict job

Step 1. In the Object Detection area, click Predict.

Step 2. Fill in the required information.


For the parameter description, see “Submit an Image Classification – Predict job” on page 20.

Step 3. Click Submit.

Submit an Instance Segmentation – Predict job

Step 1. In the Instance Segmentation area, click Predict.

Step 2. Fill in the required information.

For the parameter description, see “Submit an Image Classification – Predict job” on page 20.

Step 3. Click Submit.

Submit a Medical Image Segmentation – Predict job

Step 1. In the Medical Image Segmentation area, click Predict.

Step 2. Fill in the required information.

For the parameter description, see “Submit an Image Classification – Predict job” on page 20.

Note: Medical image segmentation prediction is available only for color images.

Step 3. Click Submit.

Submit a Seq2seq – Predict job

Step 1. In the Seq2seq area, click Predict.

Step 2. Fill in the required information.

• Input File (required): the TXT file to be translated. Its format must be the same as that of the TXT file provided in the Train job you submitted.

• Layer Number (required): number of layers in the model.

• Size of Each Layer (required): size of each layer.

• Vocabulary Size (required): total size of the vocabulary to be translated.

For the description of other parameters, see “Submit an Image Classification – Predict job” on page 20.

Note: Apart from Input File, the parameter values must be exactly the same as those in the Train job.

Step 3. Click Submit.
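
The requirement in the note above can be checked by comparing the two parameter sets. The following Python sketch is a hypothetical helper, not part of LiCO. It assumes that the parameters of both jobs are available as dictionaries keyed by the field names shown on the page, and the values and file path in the example are placeholders.

```python
def mismatched_parameters(train_params, predict_params, ignore=("Input File",)):
    """Return the Predict parameters whose values differ from the Train job."""
    return sorted(
        name
        for name, value in predict_params.items()
        if name not in ignore and train_params.get(name) != value
    )


train = {"Layer Number": 2, "Size of Each Layer": 256, "Vocabulary Size": 40000}
predict = {
    "Input File": "/home/user/translate.txt",  # placeholder path
    "Layer Number": 2,
    "Size of Each Layer": 256,
    "Vocabulary Size": 30000,                  # differs from the Train job
}
mismatches = mismatched_parameters(train, predict)
print(mismatches)  # ['Vocabulary Size']
```

An empty list means that every Predict parameter other than Input File matches the Train job.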

Submit a Memory Network – Predict job

Step 1. In the Memory Network area, click Predict.

Step 2. Fill in the required information.

• Input File (required): the TXT file that contains the questions to be answered. Its format must be the same as that of the TXT file provided in the Train job you submitted. In addition, each question must end with two separate tabs.

• Feature Size (required): size of feature in the memory network.

• Number of hops (required): number of hops in the memory network.

• Embedding Size (required): size of embedding matrices.

• Memory Size (required): maximum size of memory.
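
The two-tab rule for the Input File can be checked before the job is submitted. The following Python sketch is a hypothetical check, not part of LiCO. It assumes that a question is any line that contains a question mark and that "two separate tabs" means that the line ends with two tab characters. The sample text is made up for illustration.

```python
def questions_missing_tabs(text: str) -> list[int]:
    """Return 1-based numbers of question lines that do not end with two tabs."""
    bad_lines = []
    for number, line in enumerate(text.splitlines(), start=1):
        is_question = "?" in line  # assumption: questions contain "?"
        if is_question and not line.endswith("\t\t"):
            bad_lines.append(number)
    return bad_lines


sample = (
    "1 Mary moved to the bathroom.\n"
    "2 Where is Mary?\t\t\n"
    "3 John went to the hallway.\n"
    "4 Where is John?\t\n"  # only one tab, so this line is reported
)
bad = questions_missing_tabs(sample)
print(bad)  # [4]
```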
