HP Platform LSF Command Reference Guide

Download

Platform™ LSF™ Command Reference

Version 7 Update 3

Release date: May 2008

Last modified: May 16, 2008

Comments to: doc@platform.com

Support: support@platform.com

Although the information in this document has been carefully reviewed, Platform Computing Inc. (“Platform”) does not warrant it to be free of errors or omissions. Platform reserves the right to make corrections, updates, revisions or changes to the information in this document.

UNLESS OTHERWISE EXPRESSLY STATED BY PLATFORM, THE PROGRAM DESCRIBED IN THIS DOCUMENT IS PROVIDED “AS IS” AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL PLATFORM COMPUTING BE LIABLE TO ANYONE FOR SPECIAL, COLLATERAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING WITHOUT LIMITATION ANY LOST PROFITS, DATA, OR SAVINGS, ARISING OUT OF THE USE OF OR INABILITY TO USE THIS PROGRAM.

We’d like to hear from you You can help us make this document better by telling us what you think of the content, organization,

and usefulness of the information. If you find an error, or just want to make a suggestion for improving this document, please address your comments to doc@platform.com.

Your comments should pertain only to Platform documentation. For product support, contact support@platform.com.

Document redistribution and translation

Internal redistribution You may only redistribute this document internally within your organization (for example, on an

Trademarks LSF is a registered trademark of Platform Computing Inc. in the United States and in other

Third-party license agreements

Third-party copyright notices

This document is protected by copyright and you may not redistribute or translate it into another language, in part or in whole.

intranet) provided that you continue to check the Platform Web site for updates and update your version of the documentation. You may not make it available to your organization over the Internet.

jurisdictions.

ACCELERATING INTELLIGENCE, PLATFORM COMPUTING, PLATFORM SYMPHONY, PLATFORM JOBSCHEDULER, PLATFORM ENTERPRISE GRID ORCHESTRATOR, PLATFORM EGO, and the PLATFORM and PLATFORM LSF logos are trademarks of Platform Computing Inc. in the United States and in other jurisdictions.

UNIX is a registered trademark of The Open Group in the United States and in other jurisdictions.

Microsoft is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries.

Windows is a registered trademark of Microsoft Corporation in the United States and other countries.

Other products or services mentioned in this document are identified by the trademarks or service marks of their respective owners.

http://www.platform.com/Company/third.part.license.htm

http://www.platform.com/Company/Third.Party.Copyright.htm

Contents

bacct . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

bapp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

badmin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

bbot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

bchkpnt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

bclusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

bgadd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

bgdel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

bhist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

bhosts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

bhpart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

bgmod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

bjgroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

bjobs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

bkill . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

bladmin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

blaunch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

blcollect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

blhosts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

blimits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

blinfo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

blkill . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

blparams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

blplugins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

blstat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

bltasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

blusers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

bmgroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

bmig . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

bmod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

bparams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

bpeek . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

bpost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

bqueues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

bread . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

brequeue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

bresources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

Platform LSF Command Reference 3

brestart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146

bresume . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

brlainfo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

brsvadd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

brsvdel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156

brsvmod . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

brsvs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

brun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

bsla . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

bslots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

bstatus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

bstop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175

bsub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177

bswitch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

btop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

bugroup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213

busers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

ch . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216

lsacct . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218

lsacctmrg . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

lsadmin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222

lsclusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230

lseligible . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

lsfinstall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233

lsfmon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239

lsfrestart . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240

lsfshutdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241

lsfstartup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

lsgrun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

lshosts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246

lsid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

lsinfo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251

lsload . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253

lsloadadj . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

lslogin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259

lsltasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261

lsmon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263

lspasswd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267

lsplace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268

lsrcp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270

lsrtasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273

lsrun . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275

lstcsh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277

pam . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282

patchinstall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285

pmcadmin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

4 Platform LSF Command Reference

pmcremoverc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

pmcsetrc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289

perfadmin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290

perfremoverc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292

perfsetrc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

pversions (Windows) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294

pversions (UNIX) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295

ssacct . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298

ssched . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302

taskman . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307

tspeek . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308

tssub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309

wgpasswd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311

wguser . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313

Platform LSF Command Reference 5

6 Platform LSF Command Reference

bacct

Synopsis

Displays accounting statistics about finished jobs.

bacct [-b | -l] [-d] [-e] [-w] [-x] [-app application_profile_name]

[-C time0,time1] [-D time0,time1] [-f logfile_name] [-Lp ls_project_name ...] [-m host_name ...|-M host_list_file] [-N host_name | -N host_model | -N cpu_factor] [-P project_name ...] [-q queue_name ...] [-sla service_class_name ...] [-S time0,time1] [-u user_name ... | -u all]

bacct [-b | -l] [-f logfile_name] [job_ID ...]

bacct [-U reservation_ID ... | -U all [-u user_name ... | -u all]

bacct [-h | -V

Description

Displays a summary of accounting statistics for all finished jobs (with a DONE or EXIT status) submitted by the user who invoked the command, on all hosts, projects, and queues in the LSF system. in the current LSF accounting log file:

LSB_SHAREDIR/cluster_name/logdir/lsb.acct.

CPU time is not normalized.

All times are in seconds.

Statistics not reported by can be generated by directly using

Throughput calculation

The throughput (T) of the LSF system, certain hosts, or certain queues is calculated by the formula:

T = N/(ET-BT)

where:

◆ N is the total number of jobs for which accounting statistics are reported

◆ BT is the Start time—when the first job was logged

◆ ET is the End time—when the last job was logged

]

bacct displays statistics for all jobs logged

bacct but of interest to individual system administrators

awk or perl to process the lsb.acct file.

You can use the option

-C time0,time1 to specify the Start time as time0 and the

End time as time1. In this way, you can examine throughput during a specific time period.

Jobs involved in the throughput calculation are only those being logged (that is, with a DONE or EXIT status). Jobs that are running, suspended, or that have never been dispatched after submission are not considered, because they are still in the LSF system and not logged in

lsb.acct.

Platform LSF Command Reference 7

Options

The total throughput of the LSF system can be calculated by specifying -u all without any of the can be calculated by specifying The throughput of certain queues can be calculated by specifying the

-m, -S, -D or job_ID options.

bacct does not show local pending batch jobs killed using bkill -b. bacct shows

MultiCluster jobs and local running jobs even if they are killed using

-b Brief format.

-d Displays accounting statistics for successfully completed jobs (with a DONE

-m, -q, -S, -D or job_ID options. The throughput of certain hosts

-u all without the -q, -S, -D or job_ID options.

-u all without

bkill -b.

status).

-e Displays accounting statistics for exited jobs (with an EXIT status).

-l Long format with additional detail.

-w Wide field format.

-x Displays jobs that have triggered a job exception (overrun, underrun, idle). Use

with the

-l option to show the exception status for individual jobs.

-app application_profile_name

Displays accounting information about jobs submitted to the specified application profile. You must specify an existing application profile configured in

lsb.applications.

-C time0,time1 Displays accounting statistics for jobs that completed or exited during the specified

time interval. Reads

lsb.acct and all archived log files (lsb.acct.n) unless -f is

also used.

The time format is the same as in

-D time0,time1 Displays accounting statistics for jobs dispatched during the specified time interval.

Reads

lsb.acct and all archived log files (lsb.acct.n) unless -f is also used.

The time format is the same as in

-f logfile_name Searches the specified job log file for accounting statistics. Specify either an absolute

bhist(1).

or relative path.

Useful for offline analysis.

The specified file path can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.

-Lp ls_project_name ... Displays accounting statistics for jobs belonging to the specified License Scheduler

projects. If a list of projects is specified, project names must be separated by spaces and enclosed in quotation marks (") or (’).

-M host_list_file Displays accounting statistics for jobs dispatched to the hosts listed in a file

(host_list_file) containing a list of hosts. The host list file has the following format:

◆ Multiple lines are supported

◆ Each line includes a list of hosts separated by spaces

◆ The length of each line must be less than 512 characters

8 Platform LSF Command Reference

-m host_name ...

Displays accounting statistics for jobs dispatched to the specified hosts.

If a list of hosts is specified, host names must be separated by spaces and enclosed in quotation marks (") or (’).

-N host_name | -N host_model | -N cpu_factor

Normalizes CPU time by the CPU factor of the specified host or host model, or by the specified CPU factor.

If you use

-P project_name ... Displays accounting statistics for jobs belonging to the specified projects. If a list of

bacct offline by indicating a job log file, you must specify a CPU factor.

projects is specified, project names must be separated by spaces and enclosed in quotation marks (") or (’).

-q queue_name ... Displays accounting statistics for jobs submitted to the specified queues.

If a list of queues is specified, queue names must be separated by spaces and enclosed in quotation marks (") or (’).

-S time0,time1 Displays accounting statistics for jobs submitted during the specified time interval.

Reads

lsb.acct and all archived log files (lsb.acct.n) unless -f is also used.

The time format is the same as in

-sla service_class_name

bhist(1).

Displays accounting statistics for jobs that ran under the specified service class.

If a default system service class is configured with ENABLE_DEFAULT_EGO_SLA in

lsb.params but not explicitly configured in lsb.applications,

bacct -sla service_class_name displays accounting information for the specified

default service class.

-U reservation_id ... | -U all

Displays accounting statistics for the specified advance reservation IDs, or for all reservation IDs if the keyword

all is specified.

A list of reservation IDs must be separated by spaces and enclosed in quotation marks (") or (’).

The

-U option also displays historical information about reservation modifications.

When combined with the

-U option, -u is interpreted as the user name of the

reservation creator. For example:

bacct -U all -u user2

shows all the advance reservations created by user user2.

Without the

-u option, bacct -U shows all advance reservation information about

jobs submitted by the user.

In a MultiCluster environment, advance reservation information is only logged in the execution cluster, so

bacct displays advance reservation information for local

reservations only. You cannot see information about remote reservations. You cannot specify a remote reservation ID, and the keyword

all only displays

information about reservations in the local cluster.

-u user_name ...|-u all Displays accounting statistics for jobs submitted by the specified users, or by all

users if the keyword

all is specified.

Platform LSF Command Reference 9

Default output format (SUMMARY)

If a list of users is specified, user names must be separated by spaces and enclosed in quotation marks (") or (’). You can specify both user names and user IDs in the list of users.

job_ID ... Displays accounting statistics for jobs with the specified job IDs.

If the reserved job ID 0 is used, it is ignored.

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

Default output format (SUMMARY)

Statistics on jobs. The following fields are displayed:

◆ Total number of done jobs

◆ Total number of exited jobs

◆ Total CPU time consumed

◆ Average CPU time consumed

◆ Maximum CPU time of a job

◆ Minimum CPU time of a job

◆ Total wait time in queues

◆ Average wait time in queue

◆ Maximum wait time in queue

◆ Minimum wait time in queue

◆ Average turnaround time (seconds/job)

◆ Maximum turnaround time

◆ Minimum turnaround time

◆ Average hog factor of a job (cpu time/turnaround time)

◆ Maximum hog factor of a job

◆ Minimum hog factor of a job

◆ Tota l t hro u ghp ut

◆ Beginning time: the completion or exit time of the first job selected

◆ Ending time: the completion or exit time of the last job selected

The total, average, minimum, and maximum statistics are on all specified jobs.

The wait time is the elapsed time from job submission to job dispatch.

The turnaround time is the elapsed time from job submission to job completion.

The hog factor is the amount of CPU time consumed by a job divided by its turnaround time.

The throughput is the number of completed jobs divided by the time period to finish these jobs (jobs/hour).

Brief format (-b)

In addition to the default format SUMMARY, displays the following fields:

10 Platform LSF Command Reference

U/UID

QUEUE Queue to which the job was submitted.

SUBMIT_TIME Time when the job was submitted.

CPU_T CPU time consumed by the job.

WAIT Wait t ime o f the j ob.

TURNAROUND Turnaround time of the job.

FROM Host from which the job was submitted.

EXEC_ON Host or hosts to which the job was dispatched to run.

JOB_NAME The job name assigned by the user, or the command string assigned by default at

Long format (-l)

Name of the user who submitted the job. If LSF fails to get the user name by

getpwuid(3), the user ID is displayed.

job submission with

bsub. If the job name is too long to fit in this field, then only

the latter part of the job name is displayed.

The displayed job name or job command can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.

In addition to the fields displayed by default in SUMMARY and by -b, displays the following fields:

JOBID Identifier that LSF assigned to the job.

PROJECT_NAME Project name assigned to the job.

STATUS Status that indicates the job was either successfully completed (DONE) or exited

(EXIT).

DISPAT_TIME Time when the job was dispatched to run on the execution hosts.

COMPL_TIME Time when the job exited or completed.

HOG_FACTOR Average hog factor, equal to "CPU time" / "turnaround time".

MEM Maximum resident memory usage of all processes in a job. By default, memory

usage is shown in MB. Use LSF_UNIT_FOR_LIMITS in

lsf.conf to specify a

larger unit for display (MB, GB, TB, PB, or EB).

CWD Current working directory of the job.

SWAP Maximum virtual memory usage of all processes in a job. By default, swap space is

shown in MB. Use LSF_UNIT_FOR_LIMITS in

lsf.conf to specify a larger unit

for display (MB, GB, TB, PB, or EB).

INPUT_FILE File from which the job reads its standard input (see bsub -i input_file).

OUTPUT_FILE File to which the job writes its standard output (see bsub -o output_file).

ERR_FILE File in which the job stores its standard error output (see bsub -e err_file).

EXCEPTION STATUS Possible values for the exception status of a job include:

idle

Platform LSF Command Reference 11

Advance Reservations (-U)

The job is consuming less CPU time than expected. The job idle factor (CPU time/runtime) is less than the configured JOB_IDLE threshold for the queue and a job exception has been triggered.

overrun

The job is running longer than the number of minutes specified by the JOB_OVERRUN threshold for the queue and a job exception has been triggered.

underrun

The job finished sooner than the number of minutes specified by the JOB_UNDERRUN threshold for the queue and a job exception has been triggered.

Advance Reservations (-U)

Displays the following fields:

RSVID Advance reservation ID assigned by brsvadd command

TYPE Type of reservation: user or system

CREATOR User name of the advance reservation creator, who submitted the brsvadd

command

USER User name of the advance reservation user, who submitted the job with bsub -U

NCPUS Number of CPUs reserved

RSV_HOSTS List of hosts for which processors are reserved, and the number of processors

reserved

TIME_WINDOW Time window for the reservation.

◆ A one-time reservation displays fields separated by slashes

(

month/day/hour/minute). For example:

11/12/14/0-11/12/18/0

◆ A recurring reservation displays fields separated by colons

(

day:hour:minute). For example:

5:18:0 5:20:0

Termination reasons displayed by bacct

When LSF detects that a job is terminated, bacct -l displays one of the following termination reasons. The corresponding integer value logged to the JOB_FINISH record in

◆ TERM_ADMIN: Job killed by root or LSF administrator (15)

◆ TERM_BUCKET_KILL: Job killed with bkill -b (23)

◆ TERM_CHKPNT: Job killed after checkpointing (13)

lsb.acct is given in parentheses.

◆ TERM_CWD_NOTEXIST: current working directory is not accessible or does

not exist on the execution host (25)

◆ TERM_CPULIMIT: Job killed after reaching LSF CPU usage limit (12)

◆ TERM_DEADLINE: Job killed after deadline expires (6)

◆ TERM_EXTERNAL_SIGNAL: Job killed by a signal external to LSF (17)

12 Platform LSF Command Reference

◆ TERM_FORCE_ADMIN: Job killed by root or LSF administrator without time

for cleanup (9)

◆ TERM_FORCE_OWNER: Job killed by owner without time for cleanup (8)

◆ TERM_LOAD: Job killed after load exceeds threshold (3)

◆ TERM_MEMLIMIT: Job killed after reaching LSF memory usage limit (16)

◆ TERM_OWNER: Job killed by owner (14)

◆ TERM_PREEMPT: Job killed after preemption (1)

◆ TERM_PROCESSLIMIT: Job killed after reaching LSF process limit (7)

◆ TERM_REQUEUE_ADMIN: Job killed and requeued by root or LSF

administrator (11)

◆ TERM_REQUEUE_OWNER: Job killed and requeued by owner (10)

◆ TERM_RUNLIMIT: Job killed after reaching LSF run time limit (5)

◆ TERM_SLURM: Job terminated abnormally in SLURM (node failure) (22)

◆ TERM_SWAP: Job killed after reaching LSF swap usage limit (20)

◆ TERM_THREADLIMIT: Job killed after reaching LSF thread limit (21)

◆ TERM_UNKNOWN: LSF cannot determine a termination reason—0 is logged

but TERM_UNKNOWN is not displayed (0)

◆ TERM_WINDOW: Job killed after queue run window closed (2)

◆ TERM_ZOMBIE: Job exited while LSF is not available (19)

TIP: The integer values logged to the JOB_FINISH record in lsb.acct and termination reason

keywords are mapped in lsbatch.h.

Example: Default format

bacct

Accounting information about jobs that are:

- submitted by users user1.

- accounted on all projects.

- completed normally or exited.

- executed on all hosts.

- submitted to all queues.

- accounted on all service classes.

------------------------------------------------------------------------------

SUMMARY: ( time unit: second ) Total number of done jobs: 60 Total number of exited jobs: 118 Total CPU time consumed: 1011.5 Average CPU time consumed: 5.7 Maximum CPU time of a job: 991.4 Minimum CPU time of a job: 0.0 Total wait time in queues: 134598.0 Average wait time in queue: 756.2 Maximum wait time in queue: 7069.0 Minimum wait time in queue: 0.0 Average turnaround time: 3585 (seconds/job) Maximum turnaround time: 77524 Minimum turnaround time: 6 Average hog factor of a job: 0.00 ( cpu time / turnaround time ) Maximum hog factor of a job: 0.56 Minimum hog factor of a job: 0.00 Total throughput: 0.67 (jobs/hour) during 266.18 hours Beginning time: Aug 8 15:48 Ending time: Aug 19 17:59

Platform LSF Command Reference 13

Example: Jobs that have triggered job exceptions

bacct -x -l

Accounting information about jobs that are:

- submitted by users user1,

- accounted on all projects.

- completed normally or exited

- executed on all hosts.

- submitted to all queues.

- accounted on all service classes.

------------------------------------------------------------------------------

Job <1743>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Command <sleep 30> Mon Aug 11 18:16:17: Submitted from host <hostB>, CWD <$HOME/jobs>, Output File </dev/null>; Mon Aug 11 18:17:22: Dispatched to <hostC>; Mon Aug 11 18:18:54: Completed <done>.

EXCEPTION STATUS: underrun

Accounting information about this job: CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP

0.19 65 157 done 0.0012 4M 5M

------------------------------------------------------------------------------

Job <1948>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Command <sleep 550> Tue Aug 12 14:15:03: Submitted from host <hostB>, CWD <$HOME/jobs>, Output File </dev/null>; Tue Aug 12 14:15:15: Dispatched to <hostC>; Tue Aug 12 14:25:08: Completed <done>.

EXCEPTION STATUS: overrun idle

Accounting information about this job: CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP

0.20 12 605 done 0.0003 4M 5M

------------------------------------------------------------------------------

Job <1949>, User <user1>, Project <default>, Status <DONE>, Queue <normal>, Command <sleep 400> Tue Aug 12 14:26:11: Submitted from host <hostB>, CWD <$HOME/jobs>, Output File </dev/null>; Tue Aug 12 14:26:18: Dispatched to <hostC>; Tue Aug 12 14:33:16: Completed <done>.

EXCEPTION STATUS: idle

Accounting information about this job: CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP

0.17 7 425 done 0.0004 4M 5M

Job <719[14]>, Job Name <test[14]>, User <user1>, Project <default>, Status

14 Platform LSF Command Reference

<EXIT>, Queue <normal>, Command </home/user1/job1> Mon Aug 18 20:27:44: Submitted from host <hostB>, CWD <$HOME/jobs>, Output File </dev/null>; Mon Aug 18 20:31:16: [14] dispatched to <hostA>; Mon Aug 18 20:31:18: Completed <exit>.

EXCEPTION STATUS: underrun

Accounting information about this job: CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP

0.19 212 214 exit 0.0009 2M 4M

------------------------------------------------------------------------------

SUMMARY: ( time unit: second ) Total number of done jobs: 45 Total number of exited jobs: 56 Total CPU time consumed: 1009.1 Average CPU time consumed: 10.0 Maximum CPU time of a job: 991.4 Minimum CPU time of a job: 0.1 Total wait time in queues: 116864.0 Average wait time in queue: 1157.1 Maximum wait time in queue: 7069.0 Minimum wait time in queue: 7.0 Average turnaround time: 1317 (seconds/job) Maximum turnaround time: 7070 Minimum turnaround time: 10 Average hog factor of a job: 0.01 ( cpu time / turnaround time ) Maximum hog factor of a job: 0.56 Minimum hog factor of a job: 0.00 Total throughput: 0.59 (jobs/hour) during 170.21 hours Beginning time: Aug 11 18:18 Ending time: Aug 18 20:31

Example: Advance reservation accounting information

bacct -U user1#2

Accounting for:

- advanced reservation IDs: user1#2

- advanced reservations created by user1

----------------------------------------------------------------------------RSVID TYPE CREATOR USER NCPUS RSV_HOSTS TIME_WINDOW user1#2 user user1 user1 1 hostA:1 9/16/17/36-9/16/17/38 SUMMARY: Total number of jobs: 4 Total CPU time consumed: 0.5 second Maximum memory of a job: 4.2 MB Maximum swap of a job: 5.2 MB Total duration time: 0 hour 2 minute 0 second

Example: LSF Job termination reason logging

When a job finishes, LSF reports the last job termination action it took against the

bacct -l 7265

job and logs it into

If a running job exits because of node failure, LSF sets the correct exit information in

lsb.acct, lsb.events, and the job output file.

Use

bacct -l to view job exit information logged to lsb.acct:

lsb.acct.

Accounting information about jobs that are:

- submitted by all users.

- accounted on all projects.

Platform LSF Command Reference 15

Files

- completed normally or exited

- executed on all hosts.

- submitted to all queues.

- accounted on all service classes.

------------------------------------------------------------------------------

Job <7265>, User <lsfadmin>, Project <default>, Status <EXIT>, Queue <normal>,

Command <srun sleep 100000>

Thu Sep 16 15:22:09: Submitted from host <hostA>, CWD <$HOME>;

Thu Sep 16 15:22:20: Dispatched to 4 Hosts/Processors <4*hostA>;

Thu Sep 16 15:22:20: slurm_id=21793;ncpus=4;slurm_alloc=n[13-14];

Thu Sep 16 15:23:21: Completed <exit>; TERM_RUNLIMIT: job killed after reaching

LSF run time limit.

Accounting information about this job:

Share group charged </lsfadmin>

CPU_T WAIT TURNAROUND STATUS HOG_FACTOR MEM SWAP

0.04 11 72 exit 0.0006 0K 0K

------------------------------------------------------------------------------

SUMMARY: ( time unit: second )

Total number of done jobs: 0 Total number of exited jobs: 1

Total CPU time consumed: 0.0 Average CPU time consumed: 0.0

Maximum CPU time of a job: 0.0 Minimum CPU time of a job: 0.0

Total wait time in queues: 11.0

Average wait time in queue: 11.0

Maximum wait time in queue: 11.0 Minimum wait time in queue: 11.0

Average turnaround time: 72 (seconds/job)

Maximum turnaround time: 72 Minimum turnaround time: 72

Average hog factor of a job: 0.00 ( cpu time / turnaround time )

Maximum hog factor of a job: 0.00 Minimum hog factor of a job: 0.00

Files

Reads lsb.acct, lsb.acct.n.

bapp

Synopsis

Description

Options

Displays information about application profile configuration.

bapp [-l | -w] [application_profile_name ...]

bapp [-h | -V]

Displays information about application profiles configured in lsb.applications.

Returns application name, job slot statistics, and job state statistics for all application profiles:

In MultiCluster, returns the information about all application profiles in the local cluster.

CPU time is normalized.

-w Wide format. Fields are displayed without truncation.

-l Long format with additional information.

Displays the following additional information: application profile description, application profile characteristics and statistics, parameters, resource usage limits, associated commands, and job controls.

application_profile_name ...

Displays information about the specified application profile.

-h Prints command usage to stderr and exits.

-V Prints product release version to stderr and exits.

Default output format

Displays the following fields:

APPLICATION_NAME

The name of the application profile. Application profiles are named to correspond to the type of application that usually runs within them.

NJOBS The total number of job slots held currently by jobs in the application profile. This

includes pending, running, suspended and reserved job slots. A parallel job that is running on n processors is counted as n job slots, since it takes n job slots in the application.

PEND The number of job slots used by pending jobs in the application profile.

RUN The number of job slots used by running jobs in the application profile.

SUSP The number of job slots used by suspended jobs in the application profile.

Platform LSF Command Reference 17

Long output format(-l)

In addition to the above fields, the -l option displays the following:

Description A description of the typical use of the application profile.

PA R AM E T ER S/

STATISTICS

SSUSP

The number of job slots in the application profile allocated to jobs that are suspended by LSF because of load levels or run windows.

USUSP

The number of job slots in the application profile allocated to jobs that are suspended by the job submitter or by the LSF administrator.

RSV

The number of job slots in the application profile that are reserved by LSF for pending jobs.

Per-job resource usage limits

The soft resource usage limits that are imposed on the jobs associated with the application profile. These limits are imposed on a per-job and a per-process basis.

The possible per-job limits are:

CPULIMIT

The maximum CPU time a job can use, in minutes, relative to the CPU factor of the named host. CPULIMIT is scaled by the CPU factor of the execution host so that jobs are allowed more time on slower hosts.

MEMLIMIT

The maximum running set size (RSS) of a process.

By default, the limit is shown in KB. Use LSF_UNIT_FOR_LIMITS in specify a larger unit for display (MB, GB, TB, PB, or EB).

lsf.conf to

MEMLIMIT_TYPE

A memory limit is the maximum amount of memory a job is allowed to consume. Jobs that exceed the level are killed. You can specify different types of memory limits to enforce, based on PROCESS, TASK, or JOB (or any combination of the three).

PROCESSLIMIT

The maximum number of concurrent processes allocated to a job.

PROCLIMIT

The maximum number of processors allocated to a job.

SWAPLIMIT

The swap space limit that a job may use.

By default, the limit is shown in KB. Use LSF_UNIT_FOR_LIMITS in specify a larger unit for display (MB, GB, TB, PB, or EB).

THREADLIMIT

The maximum number of concurrent threads allocated to a job.

18 Platform LSF Command Reference

lsf.conf to

Per-process resource usage limits

The possible UNIX per-process resource limits are:

CORELIMIT

The maximum size of a core file.

CHKPNT_DIR The checkpoint directory, if automatic checkpointing is enabled for the application

CHKPNT_INITPERIOD

By default, the limit is shown in KB. Use LSF_UNIT_FOR_LIMITS in specify a larger unit for display (MB, GB, TB, PB, or EB).

DATALIMIT

The maximum size of the data segment of a process, in KB. This restricts the amount of memory a process can allocate.

FILELIMIT

The maximum file size a process can create, in KB.

RUNLIMIT

The maximum wall clock time a process can use, in minutes. RUNLIMIT is scaled by the CPU factor of the execution host.

STACKLIMIT

The maximum size of the stack segment of a process. This restricts the amount of memory a process can use for local variables or recursive function calls.

By default, the limit is shown in KB. Use LSF_UNIT_FOR_LIMITS in specify a larger unit for display (MB, GB, TB, PB, or EB).

profile.

The initial checkpoint period in minutes. The periodic checkpoint does not happen until the initial period has elapsed.

lsf.conf to

CHKPNT_PERIOD The checkpoint period in minutes. The running job is checkpointed automatically

every checkpoint period.

CHKPNT_METHOD The checkpoint method.

MIG The migration threshold in minutes. A value of 0 (zero) specifies that a suspended

job should be migrated immediately.

Where a host migration threshold is also specified, and is lower than the job value, the host value is used.

PRE_EXEC The pre-execution command for the application profile. The PRE_EXEC command

runs on the execution host before the job associated with the application profile is dispatched to the execution host (or to the first host selected for a parallel batch job).

POST_EXEC The post-execution command for the application profile. The POST_EXEC

command runs on the execution host after the job finishes.

JOB_INCLUDE_POSTPROC

If JOB_INCLUDE_POSTPROC= Y, post-execution processing of the job is included as part of the job.

JOB_POSTPROC_TIMEOUT

Platform LSF Command Reference 19

badmin

Synopsis

Description

Administrative tool for LSF.

badmin subcommand

badmin [-h | -V]

IMPORTANT: This command can only be used by LSF administrators.

badmin

subcommands are supplied for

provides a set of subcommands to control and monitor LSF. If no

badmin, badmin prompts for a subcommand from

standard input.

Information about each subcommand is available through the

The

badmin subcommands include privileged and non-privileged subcommands.

help command.

Privileged subcommands can only be invoked by root or LSF administrators. Privileged subcommands are:

reconfig

mbdrestart

qopen

qclose

qact

qinact

hopen

hclose

hrestart

hshutdown

hstartup

diagnose

The configuration file lsf.sudoers(5) must be set to use the privileged command

hstartup by a non-root user.

All other commands are non-privileged commands and can be invoked by any LSF user. If the privileged commands are to be executed by the LSF administrator,

badmin must be installed, because it needs to send the request using a privileged

port.

For subcommands for which multiple hosts can be specified, do not enclose the host names in quotation marks.

Platform LSF Command Reference 21

Subcommand synopsis

ckconfig [-v]

diagnose [job_ID ... | "job_ID[index]" ...]

reconfig [-v] [-f]

mbdrestart [-C comment] [-v] [-f]

qopen [-C comment] [queue_name ... | all]

qclose [-C comment] [queue_name ... | all]

qact [-C comment] [queue_name ... | all]

qinact [-C comment] [queue_name ... | all]

qhist [-t time0,time1] [-f logfile_name] [queue_name ...]

hopen [-C comment] [host_name ... | host_group ... | all]

hclose [-C comment

hrestart [-f] [host_name ... | all]

hshutdown [-f] [host_name ... | all]

hstartup [-f] [host_name ... | all]

hhist [-t time0,time1] [-f logfile_name] [host_name ...]

mbdhist [-t time0,time1] [-f logfile_name]

hist [-t time0,time1] [-f logfile_name]

hghostadd [-C comment] host_group host_name [host_name ...]

hghostdel [-f] [-C comment] host_group host_name [host_name ...]

help [command ...] | ? [

quit

mbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]

mbdtime [-l timing_level] [-f logfile_name] [-o]

sbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]

[host_name ...]

sbdtime [-l timing_level] [-f logfile_name] [-o] [host_name ...]

schddebug [-c class_name ...] [-l debug_level] [-f logfile_name]

[-o]

schdtime [-l timing_level] [-f logfile_name] [-o]

showconf mbd | [sbd [ host_name … |

perfmon start [sample_period]| stop | view | setperiod sample_period

-h

] [host_name ... | host_group ... | all]

command ...]

all ]]

-V

22 Platform LSF Command Reference

Options

subcommand Executes the specified subcommand. See Usage section.

Usage

ckconfig [-v] Checks LSF configuration files located in the

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

LSB_CONFDIR/cluster_name/configdir directory, and checks LSF_ENVDIR/lsf.licensescheduler.

The LSB_CONFDIR variable is defined in LSF_ENVDIR or

/etc (if LSF_ENVDIR is not defined).

lsf.conf (see lsf.conf(5)) which is in

By default, check. If warning errors are found,

badmin ckconfig displays only the result of the configuration file

badmin prompts you to display detailed

messages.

-v

Verbose mode. Displays detailed messages about configuration file checking to

stderr.

diagnose [job_ID ... | "job_ID]" ...][

Displays full pending reason list if CONDENSE_PENDING_REASONS=Y is set in

lsb.params. For example:

badmin diagnose 1057

reconfig [-v] [-f] Dynamically reconfigures LSF without restarting mbatchd.

Configuration files are checked for errors and the results displayed to errors are found in the configuration files, a reconfiguration request is sent to

mbatchd and configuration files are reloaded.

With this option, replayed. To restart

mbdrestart

When you issue this command,

mbatchd and mbschd are not restarted and lsb.events is not

mbatchd and mbschd, and replay lsb.events, use badmin

mbatchd is available to service requests while

reconfiguration files are reloaded. Configuration changes made since system boot or the last reconfiguration take effect.

stderr. If no

If warning errors are found, fatal errors are found, reconfiguration is not performed, and

badmin prompts you to display detailed messages. If

badmin exits.

If you add a host to a queue or to a host group, the new host is not recognized by jobs that were submitted before you reconfigured. If you want the new host to be recognized, you must use the command

badmin mbdrestart.

Resource requirements determined by the queue no longer apply to a running job after running

badmin reconfig, For example, if you change the RES_REQ

parameter in a queue and reconfigure the cluster, the previous queue-level resource requirements for running jobs are lost.

-v

Platform LSF Command Reference 23

Usage

Verbose mode. Displays detailed messages about the status of the configuration files. Without this option, the default is to display the results of configuration file checking. All messages from the configuration file check are printed to

-f

Disables interaction and proceeds with reconfiguration if configuration files contain no fatal errors.

mbdrestart [-C comment] [-v] [-f]

Dynamically reconfigures LSF and restarts mbatchd and mbschd.

stderr.

Configuration files are checked for errors and the results printed to errors are found, configuration files are reloaded, restarted, and events in last

mbatchd. While mbatchd restarts, it is unavailable to service requests.

If warning errors are found, fatal errors are found, exits.

lsb.events is large, or many jobs are running, restarting mbatchd can take

several minutes. If you only need to reload the configuration files, use

reconfig

-C comment

Logs the text of comment as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

-v

Verbose mode. Displays detailed messages about the status of configuration files. All messages from configuration checking are printed to

-f

Disables interaction and forces reconfiguration and mbatchd restart to proceed if configuration files contain no fatal errors.

qopen [-C comment] [queue_name ... | all]

Opens specified queues, or all queues if the reserved word all is specified. If no queue is specified, the system default queue is assumed. A queue can accept batch jobs only if it is open.

stderr. If no

mbatchd and mbschd are

lsb.events are replayed to recover the running state of the

badmin prompts you to display detailed messages. If

mbatchd and mbschd restart is not performed, and badmin

badmin

stderr.

-C comment

Logs the text of comment as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

qclose [-C comment] [queue_name ... | all]

Closes specified queues, or all queues if the reserved word all is specified. If no queue is specified, the system default queue is assumed. A queue does not accept any job if it is closed.

-C comment

Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

qact [-C comment] [queue_name ... | all]

24 Platform LSF Command Reference

Activates specified queues, or all queues if the reserved word all is specified. If no queue is specified, the system default queue is assumed. Jobs in a queue can be dispatched if the queue is activated.

A queue inactivated by its run windows cannot be reactivated by this command.

-C comment

Logs the text of the comment as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

qinact [-C comment] [queue_name ... | all]

Inactivates specified queues, or all queues if the reserved word all is specified. If no queue is specified, the system default queue is assumed. No job in a queue can be dispatched if the queue is inactivated.

-C comment

Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

qhist [-t time0,time1] [-f logfile_name] [queue_name ...]

Displays historical events for specified queues, or for all queues if no queue is specified. Queue events are queue opening, closing, activating and inactivating.

-t time0,time1

Displays only those events that occurred during the period from time0 to time1. See

bhist(1) for the time format. The default is to display all queue events in the event

log file (see below).

-f logfile_name

Specify the file name of the event log file. Either an absolute or a relative path name may be specified. The default is to use the event log file currently used by the LSF system:

LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for

offline analysis.

If you specified an administrator comment with the commands

hopen [-C comment] [host_name ... | host_group ... | all]

qclose, qopen, qact, and qinact, qhist displays the comment text.

Opens batch server hosts. Specify the names of any server hosts or host groups. All batch server hosts are opened if the reserved word group is specified, the local host is assumed. A host accepts batch jobs if it is open.

IMPORTANT: If EGO-enabled SLA scheduling is configured through ENABLE_DEFAULT_EGO_SLA

in lsb.params, and a host is closed by EGO, it cannot be reopened by badmin hopen. Hosts closed by EGO have status closed_EGO in bhosts -l output.

-C comment

Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

If you open a host group, each host group member displays with the same comment string.

-C option of the queue control

all is specifie d. If no hos t or ho st

hclose [-C comment] [host_name ... | host_group ... | all]

Platform LSF Command Reference 25

Usage

Closes batch server hosts. Specify the names of any server hosts or host groups. All batch server hosts are closed if the reserved word specified, the local host is assumed. A closed host does not accept any new job, but jobs already dispatched to the host are not affected. Note that this is different from a host closed by a window; all jobs on it are suspended in that case.

-C comment

Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

If you close a h ost g roup, eac h host grou p mem ber dis play s with the sam e comm ent string.

hghostadd [-C comment] host_group host_name [host_name ...]

If dynamic host configuration is enabled, dynamically adds hosts to a host group, . After receiving the host information from the master LIM, adds the host without triggering a

reconfig.

Once the host is added to the group, it is considered to be part of that group with respect to scheduling decision making for both newly submitted jobs and for existing pending jobs.

This command fails if any of the specified host groups or host names are not valid.

all is specified. If no argument is

mbatchd dynamically

RESTRICTION: If EGO- enabled SLA scheduling is configured through ENABLE _DEFAULT_EGO_SLA

in lsb.params, you cannot use hghostadd because all host allocation is under control of Platform EGO.

-C comment

Logs the text as an administrator comment record to lsb.events. The maximum length of the comment string is 512 characters.

hghostdel [-f] [-C comment] host_group host_name [host_name ...]

Dynamically deletes hosts from a host group by triggering an mbatchd reconfig

This command fails if any of the specified host groups or host names are not valid.

CAUTION: If you want to change a dynamic host to a static host, first use the command

badmin hghostdel to remove the dynamic host from any host group that it belongs to, and then configure the host as a static host in lsf.cluster.cluster_name.

RESTRICTION: If EGO- enabled SLA scheduling is configured through ENABLE _DEFAULT_EGO_SLA

in lsb.params, you cannot use hghostdel because all host allocation is under control of Platform EGO.

hrestart [-f] [host_name ... | all]

Restarts sbatchd on the specified hosts, or on all server hosts if the reserved word

all is specified. If no host is specified, the local host is assumed. sbatchd reruns

itself from the beginning. This allows new

sbatchd binaries to be used.

-f

Disables interaction and does not ask for confirmation for restarting sbatchd.

hshutdown [-f] [host_name ... | all]

26 Platform LSF Command Reference

Shuts down sbatchd on the specified hosts, or on all batch server hosts if the reserved word

sbatchd exits upon receiving the request.

-f

Disables interaction and does not ask for confirmation for shutting down sbatchd.

hstartup [-f] [host_name ... | all]

Starts sbatchd on the specified hosts, or on all batch server hosts if the reserved word use the hosts without having to type in passwords. If no host is specified, the local host is assumed.

all is specified. If no host is specified, the local host is assumed.

all is specified. Only root and users listed in the file lsf.sudoers(5) can

all and -f options. These users must be able to use rsh or ssh on all LSF

The shell command specified by LSF_RSH in

-f

Disables interaction and does not ask for confirmation for starting sbatchd.

hhist [-t time0,time1] [-f logfile_name] [host_name ...]

Displays historical events for specified hosts, or for all hosts if no host is specified. Host events are host opening and closing.

-t time0,time1 Displays only those events that occurred during the period from time0 to time1. See

bhist(1) for the time format. The default is to display all queue events in the event

log file (see below).

-f logfile_name Specify the file name of the event log file. Either an absolute or a relative path name

may be specified. The default is to use the event log file currently used by the LSF system:

LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for

offline analysis.

If you specified an administrator comment with the commands

mbdhist [-t time0,time1] [-f logfile_name]

hclose or hopen, hhist displays the comment text.

Displays historical events for mbatchd. Events describe the starting and exiting of

mbatchd.

-t time0,time1 Displays only those events that occurred during the period from time0 to time1. See

bhist(1) for the time format. The default is to display all queue events in the event

log file (see below).

lsf.conf is used before rsh is tried.

-C option of the host control

-f logfile_name Specify the file name of the event log file. Either an absolute or a relative path name

may be specified. The default is to use the event log file currently used by the LSF system:

LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for

offline analysis.

If you specified an administrator comment with the command,

hist [-t time0,time1] [-f logfile_name]

Displays historical events for all the queues, hosts and mbatchd.

-C option of the mbdrestart

mbdhist displays the comment text.

Platform LSF Command Reference 27

Usage

-t time0,time1

Displays only those events that occurred during the period from time0 to time1. See

bhist(1) for the time format. The default is to display all queue events in the event

log file (see below).

-f logfile_name Specify the file name of the event log file. Either an absolute or a relative path name

may be specified. The default is to use the event log file currently used by the LSF system:

LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for

offline analysis.

If you specified an administrator comment with the and

mbatchd commands, hist displays the comment text.

help [command ...] | ? [command ...]

Displays the syntax and functionality of the specified commands.

quit Exits the badmin session.

mbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]

Sets message log level for mbatchd to include additional information in log files. You mu st be

See

sddebug for an explanation of options.

mbdtime [-l timing_level] [-f logfile_name] [-o]

root or the LSF administrator to use this command.

Sets timing level for mbatchd to include additional timing information in log files. You mu st b e

root or the LSF administrator to use this command. See sbdtime for

an explanation of options.

-C option of the queue, host,

sbddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o] [host_name ...]

Sets the message log level for sbatchd to include additional information in log files. You mu st be

root or the LSF administrator to use this command.

In MultiCluster, debug levels can only be set for hosts within the same cluster. For example, you cannot set debug or timing levels from a host in in

clusterB. You need to be on a host in clusterB to set up debug or timing levels

for

clusterB hosts.

If the command is used without any options, the following default values are used:

class_name=0 (no additional classes are logged)

debug_level=0 (LOG_DEBUG level in parameter LSF_LOG_MASK)

logfile_name=current LSF system log file in the LSF system log file directory, in the format daemon_name

.log.host_name

host_name=local host (host from which command was submitted)

-c class_name ...

Specifies software classes for which debug messages are to be logged.

Format of class_name is the name of a class, or a list of class names separated by spaces and enclosed in quotation marks. Classes are also listed in

Valid log classes are:

◆ LC_ADVRSV - Log advance reservation modifications

clusterA for a host

lsf.h.

◆ LC_AFS - Log AFS messages

◆ LC_AUTH - Log authentication messages

28 Platform LSF Command Reference

◆ LC_CHKPNT - Log checkpointing messages

◆ LC_COMM - Log communication messages

◆ LC_CONF - Print out all parameters in lsb.params

◆ LC_DCE - Log messages pertaining to DCE support

◆ LC_EEVENTD - Log eeventd messages

◆ LC_ELIM - Log ELIM messages

◆ LC_EXEC - Log significant steps for job execution

◆ LC_FAIR - Log fairshare policy messages

◆ LC_FILE - Log file transfer messages

◆ LC_HANG - Mark where a program might hang

◆ LC_JARRAY - Log job array messages

◆ LC_JLIMIT - Log job slot limit messages

◆ LC_LICENSE - Log license management messages (LC_LICENCE is also

supported for backward compatibility)

◆ LC_LOADINDX - Log load index messages

◆ LC_M_LOG - Log multievent logging messages

◆ LC_MPI - Log MPI messages

◆ LC_MULTI - Log messages pertaining to MultiCluster

◆ LC_PEND - Log messages related to job pending reasons

◆ LC_PERFM - Log performance messages

◆ LC_PIM - Log PIM messages

◆ LC_PREEMPT - Log preemption policy messages

◆ LC_SIGNAL - Log messages pertaining to signals

◆ LC_SYS - Log system call messages

◆ LC_TRACE - Log significant program walk steps

◆ LC_XDR - Log everything transferred by XDR

Default: 0 (no additional classes are logged)

-l debug_level

Specifies level of detail in debug messages. The higher the number, the more detail that is logged. Higher levels include all lower levels.

Possible values:

0 LOG_DEBUG level in parameter LSF_LOG_MASK in

lsf.conf.

1 LOG_DEBUG1 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2 LOG_DEBUG1, and LOG_DEBUG levels.

2 LOG_DEBUG2 level for extended logging. A higher level includes lower logging

levels. For example, LOG_DEBUG3 includes LOG_DEBUG2 LOG_DEBUG1, and LOG_DEBUG levels.

Platform LSF Command Reference 29

Usage

3 LOG_DEBUG3 level for extended logging. A higher level includes lower logging

levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.

Default: 0 (LOG_DEBUG level in parameter LSF_LOG_MASK)

-f logfile_name

Specify the name of the file into which debugging messages are to be logged. A file name with or without a full path may be specified.

If a file name without a path is specified, the file is saved in the LSF system log directory.

The name of the file that is created has the following format:

logfile_name.daemon_name.

log.host_name

On UNIX, if the specified path is not valid, the log file is created in the directory.

On Windows, if the specified path is not valid, no log file is created.

Default: current LSF system log file in the LSF system log file directory.

-o

Turns off temporary debug settings and resets them to the daemon starting state. The message log level is reset back to the value of LSF_LOG_MASK and classes are reset to the value of LSB_DEBUG_MBD, LSB_DEBUG_SBD.

The log file is also reset back to the default log file.

host_name ...

Optional. Sets debug settings on the specified host or hosts.

Lists of host names must be separated by spaces and enclosed in quotation marks.

Default: local host (host from which command was submitted)

sbdtime [-l timing_level] [-f logfile_name] [-o] [host_name ...]

Sets the timing level for sbatchd to include additional timing information in log files. You must be

root or the LSF administrator to use this command.

In MultiCluster, timing levels can only be set for hosts within the same cluster. For example, you could not set debug or timing levels from a host in clusterA for a host in clusterB. You need to be on a host in clusterB to set up debug or timing levels for clusterB hosts.

If the command is used without any options, the following default values are used:

/tmp

timing_level=no timing information is recorded

logfile_name=current LSF system log file in the LSF system log file directory, in the format daemon_name.

host_name=local host (host from which command was submitted)

-l timing_level

Specifies detail of timing information that is included in log files. Timing messages indicate the execution time of functions in the software and are logged in milliseconds.

Valid values: 1 | 2 | 3 | 4 | 5

30 Platform LSF Command Reference

log.host_name

The higher the number, the more functions in the software that are timed and whose execution time is logged. The lower numbers include more common software functions. Higher levels include all lower levels.

Default: undefined (no timing information is logged)

-f logfile_name

Specify the name of the file into which timing messages are to be logged. A file name with or without a full path may be specified.

If a file name without a path is specified, the file is saved in the LSF system log file directory.

The name of the file created has the following format:

logfile_name.daemon_name.

log.host_name

On UNIX, if the specified path is not valid, the log file is created in the directory.

On Windows, if the specified path is not valid, no log file is created.

Note: Both timing and debug messages are logged in the same files.

Default: current LSF system log file in the LSF system log file directory, in the format daemon_name.

-o

log.host_name.

Optional. Turn off temporary timing settings and reset them to the daemon starting state. The timing level is reset back to the value of the parameter for the corresponding daemon (LSB_TIME_MBD, LSB_TIME_SBD).

The log file is also reset back to the default log file.

host_name ...

Sets the timing level on the specified host or hosts.

Lists of hosts must be separated by spaces and enclosed in quotation marks.

Default: local host (host from which command was submitted)

schddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]

Sets message log level for mbschd to include additional information in log files. You must be

root or the LSF administrator to use this command.

/tmp

See

sbddebug for an explanation of options.

schdtime [-l timing_level] [-f] [-o]

Sets timing level for mbschd to include additional timing information in log files. You mu st be

See

sbdtime for an explanation of options.

showconf mbd | [sbd [ host_name … | all ]]

root or the LSF administrator to use this command.

Display all configured parameters and their values set in lsf.conf or ego.conf that affect

mbatchd and sbatchd.

In a MultiCluster environment, daemons on the local cluster.

badmin showconf only displays the parameters of

Platform LSF Command Reference 31

Usage

Running badmin showconf from a master candidate host reaches all server hosts in the cluster. Running

badmin showconf from a slave-only host may not be able to

reach other slave-only hosts.

badmin showconf only displays the values used by LSF.

For example, if you define LSF_MASTER_LIST in EGO_MASTER_LIST in

ego.conf, badmin showconf displays the value of

EGO_MASTER_LIST.

badmin showconf displays the value of EGO_MASTER_LIST from wherever it is

defined. You can define either LSF_MASTER_LIST or EGO_MASTER_LIST in

lsf.conf. LIM reads lsf.conf first, and ego.conf if EGO is enabled in the LSF

cluster. The value of LSF_MASTER_LIST is displayed only if EGO_MASTER_LIST is not defined at all in

ego.conf.

For example, if EGO is enabled in the LSF cluster, and you define LSF_MASTER_LIST in

badmin showconf displays the value of EGO_MASTER_LIST in ego.conf.

If EGO is disabled,

lsf.conf, and EGO_MASTER_LIST in ego.conf,

ego.conf not loaded, so whatever is defined in lsf.conf is

displayed.

perfmon start [sample_period] | stop | view | setperiod sample_period

Dynamically enables and controls scheduler performance metric collection.

Collecting and recording performance metric data may affect the performance of LSF. Smaller sampling periods results in the

The following metrics are collected and recorded in each sample period:

◆ The number of queries handled by mbatchd

◆ The number of queries for each of jobs, queues, and hosts. (bjobs, bqueues,

and

bhosts, as well as other daemon requests)

lsf.conf, and

lsb.streams file growing faster.

◆ The number of jobs submitted (divided into job submission requests and jobs

actually submitted)

◆ The number of jobs dispatched

◆ The number of jobs completed

◆ The numbers of jobs sent to remote cluster

◆ The numbers of jobs accepted by from cluster

start [sample_period]

Start performance metric collection dynamically and specifies an optional sampling period in seconds for performance metric collection.

If no sampling period is specified, the default period set in

SCHED_METRIC_SAMPLE_PERIOD in lsb.params is used.

stop

Stop performance metric collection dynamically.

view

Display real time performance metric information for the current sampling period

setperiod sample_period

32 Platform LSF Command Reference

bbot

Synopsis

Description

Moves a pending job relative to the last job in the queue.

bbot job_ID | "job_ID[index_list]" [position]

bbot -h | -V

Changes the queue position of a pending job or job array element, to affect the order in which jobs are considered for dispatch.

By default, LSF dispatches jobs in a queue in the order of arrival (that is, first-come, first-served), subject to availability of suitable server hosts.

The

bbot command allows users and the LSF administrator to manually change the

order in which jobs are considered for dispatch. Users can only operate on their own jobs, whereas the LSF administrator can operate on any user’s jobs.

If invoked by the LSF administrator, with the same priority submitted to the queue.

If invoked by a user, priority submitted by the user to the queue.

Pending jobs are displayed by dispatch.

A user may use fairshare policy. However, if a job scheduled using a fairshare policy is moved by the LSF administrator using unless the same job is subsequently moved by the LSF administrator using this case the job is scheduled again using the same fairshare policy.

To prevent users from changing the queue position of a pending job with configure JOB_POSITION_CONTROL_BY_ADMIN=Y in

You ca nn ot r u n queue.

Options

job_ID | "job_ID[index_list]"

Required. Job ID of the job or job array on which to operate.

For a job array, the index list, the square brackets, and the quotation marks are required. An index list is used to operate on a job array. The index list is a comma separated list whose elements have the syntax start_index[-end_index[:step]] where start_index, end_index and step are positive integers. If the step is omitted, a step of one is assumed. The job array index starts at one. The maximum job array index is 1000. All jobs in the array share the same job_ID and parameters. Each element of the array is distinguished by its array index.

bbot moves the selected job after the last job

bbot moves the selected job after the last job with the same

bjobs in the order in which they are considered for

bbot to change the dispatch order of their jobs scheduled using a

btop, the job is not subject to further fairshare scheduling

bbot; in

bbot,

lsb.params.

bbot on jobs pending in an absolute priority scheduling (APS)

34 Platform LSF Command Reference

bchkpnt

Synopsis

Description

checkpoints one or more checkpointable jobs

bchkpnt [-f] [-k] [-p minutes | -p 0]

job_ID | "job_ID[index_list]" ...

bchkpnt [-f] [-k] [-p minutes | -p 0] -J job_name

bchkpnt -h | -V

Checkpoints the most recently submitted running or suspended checkpointable job.

LSF administrators and

Jobs continue to execute after they have been checkpointed.

root can checkpoint jobs submitted by other users.

LSF invokes the checkpoint.

Only running members of a chunk job can be checkpointed. For chunk jobs in WA IT st at e,

echkpnt(8) executable found in LSF_SERVERDIR to perform the

mbatchd rejects the checkpoint request.

Options

0 (Zero). Checkpoints all of the jobs that satisfy other specified critera.

-f Forces a job to be checkpointed even if non-checkpointable conditions exist (these

conditions are OS-specific).

-k Kills a job after it has been successfully checkpointed.

-p minutes | -p 0 Enables periodic checkpointing and specifies the checkpoint period, or modifies

the checkpoint period of a checkpointed job. Specify checkpointing.

Checkpointing is a resource-intensive operation. To allow your job to make progress while still providing fault tolerance, specify a checkpoint period of 30 minutes or longer.

-J job_name Checkpoints only jobs that have the specified job name.

-m host_name | -m host_group

Checkpoints only jobs dispatched to the specified hosts.

-p 0 (zero) to disable periodic

-q queue_name

Checkpoints only jobs dispatched from the specified queue.

-u "user_name" | -u all

36 Platform LSF Command Reference

Checkpoints only jobs submitted by the specified users. The keyword all specifies all users. Ignored if a job ID other than 0 (zero) is specified. To specify a Windows user account, include the domain name in uppercase letters and use a single backslash (DOMAIN_NAME\user_name) in a Windows command line or a double backslash (DOMAIN_NAME\\user_name) in a UNIX command line.

job_ID | "job_ID[index_list]"

Checkpoints only the specified jobs.

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

Examples

bchkpnt 1234

Checkpoints the job with job ID 1234.

bchkpnt -p 120 1234

Enables periodic checkpointing or changes the checkpoint period to 120 minutes (2 hours) for a job with job ID 1234.

bchkpnt -m hostA -k -u all 0

When issued by root or the LSF administrator, checkpoints and kills all checkpointable jobs on rebooted.

hostA. This is useful when a host needs to be shut down or

bclusters

Synopsis

Description

Options

-app Displays available application profiles in remote clusters.

displays MultiCluster information

bclusters [-app]

bclusters [-h | -V]

For the job forwarding model, displays a list of MultiCluster queues together with their relationship with queues in remote clusters.

For the resource leasing model, displays remote resource provider and consumer information, resource flow information, and connection status between the local and remote cluster.

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

Output

Job Forwarding Information

Displays a list of MultiCluster queues together with their relationship with queues in remote clusters.

Information related to the job forwarding model is displayed under the heading

Forwarding Information.

LOCAL_QUEUE Name of a local MultiCluster send-jobs or receive-jobs queue.

JOB_FLOW Indicates direction of job flow.

send

The local queue is a MultiCluster send-jobs queue (SNDJOBS_TO is defined in the local queue).

recv

The local queue is a MultiCluster receive-jobs queue (RCVJOBS_FROM is defined in the local queue).

Job

REMOTE For send-jobs queues, shows the name of the receive-jobs queue in a remote cluster.

For receive-jobs queues, always “-”.

CLUSTER For send-jobs queues, shows the name of the remote cluster containing the

receive-jobs queue.

38 Platform LSF Command Reference

For receive-jobs queues, shows the name of the remote cluster that can send jobs to the local queue.

STATUS Indicates the connection status between the local queue and remote queue.

The two clusters can exchange information and the system is properly configured.

disc

Communication between the two clusters has not been established. This could occur because there are no jobs waiting to be dispatched, or because the remote master cannot be located.

reject

The remote queue rejects jobs from the send-jobs queue. The local queue and remote queue are connected and the clusters communicate, but the queue-level configuration is not correct. For example, the send-jobs queue in the submission cluster points to a receive-jobs queue that does not exist in the remote cluster.

If the job is rejected, it returns to the submission cluster.

Resource Lease Information

Displays remote resource provider and consumer information, resource flow information, and connection status between the local and remote cluster.

Information related to the resource leasing model is displayed under the heading

Resource Lease Information.

REMOTE_CLUSTER For borrowed resources, name of the remote cluster that is the provider.

For exported resources, name of the remote cluster that is the consumer.

RESOURCE_FLOW Indicates direction of resource flow.

IMPORT

Local cluster is the consumer and borrows resources from the remote cluster (HOSTS parameter in one or more local queue definitions includes remote resources).

EXPORT

Local cluster is the provider and exports resources to the remote cluster.

STATUS Indicates the connection status between the local and remote cluster.

MultiCluster jobs can run.

disc

No communication between the two clusters. This could be a temporary situation or could indicate a MultiCluster configuration error.

conn

The two clusters communicate, but the lease is not established. This should be a temporary situation.

Platform LSF Command Reference 39

Files

Remote Cluster Application Information

bcluster -app displays information related to application profile configuration

under the heading profile information is only displayed for the job forwarding model. not show local cluster application profile information.

Remote Cluster Application Information. Application

REMOTE_CLUSTER The name of the remote cluster.

APP_NAME The name of the application profile available in the remote cluster.

DESCRIPTION The description of the application profile.

Files

Reads lsb.queues and lsb.applications.

bgadd

Synopsis

Description

Options

-L limit Specifies the maximum number of concurrent jobs allowed to run under the job

creates job groups

bgadd [-L limit] [-sla service_class_name] job_group_name

bgadd [-h | -V]

Creates a job group with the job group name specified by job_group_name.

You must provide full group path name for the new job group. The last component of the path is the name of the new group to be created.

You do not need to create the parent job group before you create a sub-group under it. If no groups in the job group hierarchy exist, all groups are created with the specified hierarchy.

group (including child groups) USSUP) under the job group.

Specify a positive number between 0 and 2147483647. If the specified limit is zero (0), no jobs under the job group can run.

You cannot specify a limit for the root job group. The root job group has no job limit. Job groups added with no limits specified inherit any limits of existing parent job groups. The

-L option only limits the lowest level job group created.

-L limits the number of started jobs (RUN, SSUSP,

If a parallel job requests 2 CPUs ( slots used by the job.

By default, a job group has no job limit. Limits persist across reconfiguration.

-sla service_class_name

The name of a service class defined in lsb.serviceclasses, or the name of the SLA defined in ENABLE_DEFAULT_EGO_SLA in attached to the specified SLA.

job_group_name Full path of the job group name.

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

Examples

◆ Create a job group named risk_group under the root group /:

bgadd /risk_group

bsub -n 2), the job group limit is per job, not per

mbatchd restart or

lsb.params. The job group is

Platform LSF Command Reference 41

bgdel

Synopsis

Description

deletes job groups

bgdel [-u user_name | -u all] job_group_name | 0

bgdel -c job_group_name

bgdel [-h | -V]

Deletes a job group with the job group name specified by job_group_name and all its subgroups.

You must provide full group path name for the job group to be deleted. The job group cannot contain any jobs.

Users can only delete their own job groups. LSF administrators can delete any job groups.

Job groups can be created explicitly or implicitly:

◆ A job group is created explicitly with the bgadd command.

◆ A job group is created implicitly by the bsub -g or bmod -g command when

the specified group does not exist. Job groups are also created implicitly when a default job group is configured (DEFAULT_JOBGROUP in LSB_DEFAULT_JOBGROUP environment variable).

lsb.params or

Options

0 Delete the empty job groups. These groups can be explicit or implicit.

-u user_name Delete empty job groups owned by the specified user. Only administrators can use

this option. These groups can be explicit or implicit. If you specify a job group name, the

-u all Delete empty job groups and their sub groups for all users. Only administrators can

use this option. These groups can be explicit or implicit. If you specify a job group name, the

-c job_group_name Delete all the empty groups below the requested job_group_name including the

job_group_name itself. These groups can be explicit or implicit.

job_group_name Full path of the job group name.

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

-u option is ignored.

Example

bgdel /risk_group

Job group /risk_group is deleted.

deletes the job group /risk_group and all its subgroups.

Platform LSF Command Reference 43

bhist

Synopsis

Description

displays historical information about jobs

bhist [-a | -d | -e |-p | -r | -s] [-b | -w] [-l]

[-C start_time,end_time] [-D start_time,end_time] [-f logfile_name | -n number_logfiles | -n 0] [-S start_time,end_time] [-J job_name] [-Lp ls_project_name] [-m host_name] [-N host_name | -N host_model | -N CPU_factor] [-P project_name] [-q queue_name] [-u user_name | -u all]

bhist [-t] [-f logfile_name] [-T start_time,end_time]

bhist [-J job_name] [-N host_name | -N host_model | -N

[job_ID ... | "job_ID[index]" ...]

bhist [-h | -V]

cpu_factor]

By default:

◆ Displays information about your own pending, running and suspended jobs.

Groups information by job

◆ CPU time is not normalized

Options

◆ Searches the event log file currently used by the LSF system:

$LSB_SHAREDIR/cluster_name/logdir/lsb.events (see lsb.events(5))

◆ Displays events occurring in the past week, but this can be changed by setting

the environment variable LSB_BHIST_HOURS to an alternative number of hours

If neither

-l nor -b is present, the default is to display only the fields shown in

Output on page 48.

-a Displays information about both finished and unfinished jobs.

This option overrides

-b Brief format. Displays the information in a brief format. If used with the -s option,

-d, -p, -s, and -r.

shows the reason why each job was suspended.

-d Only displays information about finished jobs.

-e Only displays information about exited jobs.

-l Long format. Displays additional information. If used with -s, shows the reason

why each job was suspended.

If you submitted a job using the this option displays the successful

OR (||) expression to specify alternative resources,

Execution rusage string with which the job

ran.

Platform LSF Command Reference 45

Options

If you submitted a job with multiple resource requirement strings using the bsub -R option for the order, same, rusage, and select sections,

bjobs -l displays a single,

merged resource requirement string for those sections, as if they were submitted using a single

bhist -l can display job exit codes. A job with exit code 131 means that the job

-R.

exceeded a configured resource usage limit and LSF killed the job with signal 3 (131-128=3).

bhist -l can display changes to pending jobs as a result of the following bmod

options:

◆ Absolute priority scheduling (-aps | -apsn)

◆ Runtime estimate (-We | -Wen)

◆ Post-execution command (-Ep | -Epn)

◆ User limits (-ul | -uln)

◆ Current working directory (-cwd | -cwdn)

◆ Checkpoint options (-k | -kn)

◆ Migration threshold (-mig | -mign)

-p Only displays information about pending jobs.

-r Only displays information about running jobs.

-s Only displays information about suspended jobs.

-t Displays job events chronologically.

-w Wide format. Displays the information in a wide format.

-C start_time,end_time

-D start_time,end_time

Only displays jobs that completed or exited during the specified time interval. Specify the span of time for which you want to display the history. If you do not specify a start time, the start time is assumed to be the time of the first occurrence. If you do not specify an end time, the end time is assumed to be now.

Specify the times in the format "yyyy/mm/dd/HH:MM". Do not specify spaces in the time interval string.

The time interval can be specified in many ways. For more specific syntax and examples of time formats, see TIME INTERVAL FORMAT.

Only displays jobs dispatched during the specified time interval. Specify the span of time for which you want to display the history. If you do not specify a start time, the start time is assumed to be the time of the first occurrence. If you do not specify an end time, the end time is assumed to be now.

Specify the times in the format "yyyy/mm/dd/HH:MM". Do not specify spaces in the time interval string.

The time interval can be specified in many ways. For more specific syntax and examples of time formats, see TIME INTERVAL FORMAT.

-S start_time,end_time

46 Platform LSF Command Reference

-T start_time,end_time

Only displays information about jobs submitted during the specified time interval. Specify the span of time for which you want to display the history. If you do not specify a start time, the start time is assumed to be the time of the first occurrence. If you do not specify an end time, the end time is assumed to be now.

Specify the times in the format "yyyy/mm/dd/HH:MM". Do not specify spaces in the time interval string.

The time interval can be specified in many ways. For more specific syntax and examples of time formats, see TIME INTERVAL FORMAT.

Used together with -t.

Only displays information about job events within the specified time interval. Specify the span of time for which you want to display the history. If you do not specify a start time, the start time is assumed to be the time of the first occurrence. If you do not specify an end time, the end time is assumed to be now.

Specify the times in the format

yyyy/mm/dd/HH:MM. Do not specify spaces in the

time interval string.

The time interval can be specified in many ways. For more specific syntax and examples of time formats, see Time Interval Format on page 49.

-f logfile_name Searches the specified event log. Specify either an absolute or a relative path.

Useful for analysis directly on the file.

The specified file path can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.

-J job_name Only displays the jobs that have the specified job name.

The specified job name can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.

-Lp ls_project_name Only displays information about jobs belonging to the specified License Scheduler

project.

-m host_name Only displays jobs dispatched to the specified host.

-n number_logfiles | -n 0

Searches the specified number of event logs, starting with the current event log and working through the most recent consecutively numbered logs. The maximum number of logs you can search is 100. Specify 0 to specify all the event log files in

$(LSB_SHAREDIR)/cluster_name/logdir (up to a maximum of 100 files).

If you delete a file, you break the consecutive numbering, and older files are inaccessible to

bhist.

For example, if you specify 3, LSF searches

lsb.events.2. If you specify 4, LSF searches lsb.events, lsb.events.1, lsb.events.2, and lsb.events.3. However, if lsb.events.2 is missing, both

searches include only

-N host_name | -N host_model | -N cpu_factor

Normalizes CPU time by the specified CPU factor, or by the CPU factor of the specified host or host model.

lsb.events, lsb.events.1, and

lsb.events and lsb.events.1.

Platform LSF Command Reference 47

Output

If you use bhist directly on an event log, you must specify a CPU factor.

Use

lsinfo to get host model and CPU factor information.

-P project_name Only displays information about jobs belonging to the specified project.

-q queue_name Only displays information about jobs submitted to the specified queue.

-u user_name | -u all Displays information about jobs submitted by the specified user, or by all users if

the keyword domain name in uppercase letters and use a single backslash (DOMAIN_NAME\ user_name) in a Windows command line or a double backslash (DOMAIN_NAME\\user_name) in a UNIX command line.

job_ID | "job_ID[index]"

Searches all event log files and only displays information about the specified jobs. If you specify a job array, displays all elements chronologically.

all is specified. To specify a Windows user account, include the

This option overrides all other options except with

-J, only those jobs listed here that have the specified job name are displayed.

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

-J, -N, -h, and -V. When it is used

Output

Default format

Statistics of the amount of time that a job has spent in various states:

PEND The total waiting time excluding user suspended time before the job is dispatched.

PSUSP The total user suspended time of a pending job.

RUN The total run time of the job.

USUSP The total user suspended time after the job is dispatched.

SSUSP The total system suspended time after the job is dispatched.

UNKWN The total unknown time of the job (job status becomes unknown if sbatchd on the

execution host is temporarily unreachable).

TOTAL The total time that the job has spent in all states; for a finished job, it is the

turnaround time (that is, the time interval from job submission to job completion).

Long format (-l)

The -l option displays a long format listing with the following additional fields:

Project The project the job was submitted from.

Application Profile The application profile the job was submitted to.

Command The job command.

48 Platform LSF Command Reference

Detailed history includes job group modification, the date and time the job was forwarded and the name of the cluster to which the job was forwarded.

The displayed job command can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.

Initial checkpoint period

The initial checkpoint period specified at the job level, by bsub -k, or in an application profile with CHKPNT_INITPERIOD.

Checkpoint period The checkpoint period specified at the job level, by bsub -k, in the queue with

CHKPNT, or in an application profile with CHKPNT_PERIOD.

Checkpoint directory

Migration

The checkpoint directory specified at the job level, by bsub -k, in the queue with CHKPNT, or in an application profile with CHKPNT_DIR.

The migration threshold specified at the job level, by bsub -mig.

threshold

Files

Reads lsb.events

Time Interval Format

You use the time interval to define a start and end time for collecting the data to be retrieved and displayed. While you can specify both a start and an end time, you can also let one of the values default. You can specify either of the times as an absolute time, by specifying the date or time, or you can specify them relative to the current time.

Specify the time interval is follows:

start_time,end_time|start_time,|,end_time|start_time

Specify start_time or end_time in the following format:

[year/][month/][day][/hour:minute|/hour:]|.|.-relative_int

Where:

◆ year is a four-digit number representing the calendar year.

◆ month is a number from 1 to 12, where 1 is January and 12 is December.

◆ day is a number from 1 to 31, representing the day of the month.

◆ hour is an integer from 0 to 23, representing the hour of the day on a 24-hour

clock.

◆ minute is an integer from 0 to 59, representing the minute of the hour.

◆ . (period) represents the current month/day/hour:minute.

◆ .-relative_int is a number, from 1 to 31, specifying a relative start or end time

prior to now.

start_time,end_time

Platform LSF Command Reference 49

Time Interval Format

Specifies both the start and end times of the interval.

start_time,

Specifies a start time, and lets the end time default to now.

,end_time

Specifies to start with the first logged occurrence, and end at the time specified.

start_time

Starts at the beginning of the most specific time period specified, and ends at the maximum value of the time period specified. For example, of February—start February 1 at 00:00 a.m. and end at the last possible minute in February: February 28th at midnight.

Absolute Time Examples

Assume the current time is May 9 17:06 2008:

1,8 = May 1 00:00 2008 to May 8 23:59 2008

,4 = the time of the first occurrence to May 4 23:59 2008

6 = May 6 00:00 2008 to May 6 23:59 2008

2/ = Feb 1 00:00 2008 to Feb 28 23:59 2008

2/ specifies the month

/12: = May 9 12:00 2008 to May 9 12:59 2008

2/1 = Feb 1 00:00 2008 to Feb 1 23:59 2008

2/1, = Feb 1 00:00 to the current time

,. = the time of the first occurrence to the current time

,2/10: = the time of the first occurrence to May 2 10:59 2008

2001/12/31,2008/5/1 = from Dec 31, 2001 00:00:00 to May 1st 2008 23:59:59

Relative Time Examples

.-9, = April 30 17:06 2008 to the current time

,.-2/ = the time of the first occurrence to Mar 7 17:06 2008

.-9,.-2 = nine days ago to two days ago (April 30, 2008 17:06 to May 7, 2008 17:06)

50 Platform LSF Command Reference

bhosts

Synopsis

Description

displays hosts and their static and dynamic resources

bhosts [-e | -l | -w] [-x] [-X] [-R "res_req"]

[host_name | host_group] ...

bhosts [-e | -l | -w] [-X] [-R "res_req"] [cluster_name]

bhosts [-e ] -s [resource_name ...]

bhosts [-h | -V]

By default, returns the following information about all hosts: host name, host status, job state statistics, and job slot limits.

bhosts displays output for condensed host groups. These host groups are defined

CONDENSE in the HostGroup section of lsb.hosts. These host groups are

displayed as a single entry with the name as defined by

HostGroup section of lsb.hosts.

The

-l and -X options display uncondensed output.

The

-s option displays information about the numeric resources (shared or

GROUP_NAME in the

host-based) and their associated hosts.

Options

With MultiCluster, displays the information about hosts available to the local cluster. Use -e to view information about exported hosts.

-e MultiCluster only. Displays information about resources that have been exported to

another cluster.

-l Displays host information in a (long) multi-line format. In addition to the default

fields, displays information about the CPU factor, the current load, and the load thresholds.

Also displays information about the dispatch windows.

If you specified an administrator comment with the commands

-w Displays host information in wide format. Fields are displayed without truncation.

For condensed host groups, the number of hosts with the

hclose or hopen, -l displays the comment text.

-w option displays the overall status and the

ok, unavail, unreach, and busy status in the following

-C option of the host control

format:

host_group_status num_ok/num_unavail/num_unreach/num_busy

where

◆ host_group_status is the overall status of the host group. If a single host in the

host group is

ok, the overall status is also ok.

Platform LSF Command Reference 51

Options

◆ num_ok, num_unavail, num_unreach, and num_busy are the number of hosts

that are

ok, unavail, unreach, and busy, respectively.

For example, if there are five in a condensed host group

hg1 ok 5/2/1/3

ok, two unavail, one unreach, and three busy hosts

hg1, its status is displayed as the following:

If any hosts in the host group are closed, the status for the host group is displayed as

closed, with no status for the other states:

hg1 closed

-x Display hosts whose job exit rate has exceeded the threshold configured by

EXIT_RATE in configured in next time LSF checks host exceptions and invokes

Use with the

If no hosts exceed the job exit rate,

There is no exceptional host found

lsb.hosts for longer than JOB_EXIT_RATE_DURATION

lsb.params, and are still high. By default, these hosts are closed the

eadmin.

-l option to show detailed information about host exceptions.

bhosts -x displays:

-X Displays uncondensed output for host groups.

-R "res_req" Only displays information about hosts that satisfy the resource requirement

expression. For more information about resource requirements, see Administering Platform LSF. The size of the resource requirement string is limited to 512 bytes.

LSF supports ordering of resource requirements on all load indices, including external load indices, either static or dynamic.

-s [resource_name ...]

Displays information about the specified resources (shared or host-based). The resources must have numeric values. Returns the following information: the resource names, the total and reserved amounts, and the resource locations.

bhosts -s only shows consumable resources.

When LOCAL_TO is configured for a license feature in

bhosts -s shows different resource information depending on the cluster locality

of the features. For example:

From

bhosts -s

RESOURCE TOTAL RESERVED LOCATION

hspice 36.0 0.0 host1

From clusterB in siteB:

bhosts -s

RESOURCE TOTAL RESERVED LOCATION

hspice 76.0 0.0 host2

host_name ... | host_group ...

Only displays information about the specified hosts. Do not use quotes when specifying multiple hosts.

lsf.licensescheduler,

clusterA:

52 Platform LSF Command Reference

For host groups, the names of the hosts belonging to the group are displayed instead of the name of the host group. Do not use quotes when specifying multiple host groups.

cluster_name MultiCluster only. Displays information about hosts in the specified cluster.

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

Output

Host-Based Default

Displays the following fields:

HOST_NAME The name of the host. If a host has batch jobs running and the host is removed from

the configuration, the host name is displayed as

For condensed host groups, this is the name of host group.

STATUS With MultiCluster, not shown for fully exported hosts.

lost_and_found.

The current status of the host and the dispatched to hosts with an

ok status. The possible values for host status are as

sbatchd daemon. Batch jobs can only be

follows:

The host is available to accept batch jobs.

For condensed host groups, if a single host in the host group is is also shown as

If any host in the host group is not

ok.

ok, bhosts displays the first host status it

encounters as the overall status for the condensed host group. Use

ok, the overall status

bhosts -X to see

the status of individual hosts in the host group.

unavail

The host is down, or LIM and sbatchd on the host are unreachable.

unreach

LIM on the host is running but sbatchd is unreachable.

closed

The host is not allowed to accept any remote batch jobs. There are several reasons for the host to be closed (see Host-Based

unlicensed

-l Options).

The host does not have a valid LSF license.

JL/U With MultiCluster, not shown for fully exported hosts.

The maximum number of job slots that the host can process on a per user basis. If a dash (-) is displayed, there is no limit.

For condensed host groups, this is the total number of job slots that all hosts in the host group can process on a per user basis.

Platform LSF Command Reference 53

Output

The host does not allocate more than JL/U job slots for one user at the same time. These job slots are used by running jobs, as well as by suspended or pending jobs that have slots reserved for them.

For preemptive scheduling, the accounting is different. These job slots are used by running jobs and by pending jobs that have slots reserved for them (see the description of PREEMPTIVE in

lsb.queues(5) and JL/U in lsb.hosts(5)).

MAX The maximum number of job slots available. If a dash (-) is displayed, there is no

limit.

For condensed host groups, this is the total maximum number of job slots available in all hosts in the host group.

These job slots are used by running jobs, as well as by suspended or pending jobs that have slots reserved for them.

If preemptive scheduling is used, suspended jobs are not counted (see the description of PREEMPTIVE in

A host does not always have to allocate this many job slots if there are waiting jobs; the host must also satisfy its configured load conditions to accept more jobs.

lsb.queues(5) and MXJ in lsb.hosts(5)).

NJOBS The number of job slots used by jobs dispatched to the host. This includes running,

suspended, and chunk jobs.

For condensed host groups, this is the total number of job slots used by jobs dispatched to any host in the host group.

RUN The number of job slots used by jobs running on the host.

For condensed host groups, this is the total number of job slots used by jobs running on any host in the host group.

SSUSP The number of job slots used by system suspended jobs on the host.

For condensed host groups, this is the total number of job slots used by system suspended jobs on any host in the host group.

USUSP The number of job slots used by user suspended jobs on the host. Jobs can be

suspended by the user or by the LSF administrator.

For condensed host groups, this is the total number of job slots used by user suspended jobs on any host in the host group.

RSV The number of job slots used by pending jobs that have jobs slots reserved on the

host.

For condensed host groups, this is the total number of job slots used by pending jobs that have job slots reserved on any host in the host group.

Host-Based -l Option

In addition to the above fields, the -l option also displays the following:

loadSched, loadStop

54 Platform LSF Command Reference

The scheduling and suspending thresholds for the host. If a threshold is not defined, the threshold from the queue definition applies. If both the host and the queue define a threshold for a load index, the most restrictive threshold is used.

The migration threshold is the time that a job dispatched to this host can remain suspended by the system before LSF attempts to migrate the job to another host.

If the host’s operating system supports checkpoint copy, this is indicated here. With checkpoint copy, the operating system automatically copies all open files to the checkpoint directory when a process is checkpointed. Checkpoint copy is currently supported only on Cray systems.

STATUS The long format shown by the -l option gives the possible reasons for a host to be

closed:

closed_Adm

The host is closed by the LSF administrator or root (see badmin(8)). No job can be dispatched to the host, but jobs that are executing on the host are not affected.

closed_Lock

The host is locked by the LSF administrator or root (see lsadmin(8)). All batch jobs on the host are suspended by LSF.

closed_Wind

The host is closed by its dispatch windows, which are defined in the configuration file

lsb.hosts(5). Jobs already started are not affected by the dispatch windows.

closed_Full

The configured maximum number of batch job slots on the host has been reached (see MAX field below).

closed_Excl

The host is currently running an exclusive job.

closed_Busy

The host is overloaded, because some load indices go beyond the configured thresholds (see

lsb.hosts(5)). The displayed thresholds that cause the host to be

busy are preceded by an asterisk (*).

closed_LIM

LIM on the host is unreachable, but sbatchd is ok.

closed_EGO

For EGO-enabled SLA scheduling, host is closed because it has not been allocated by EGO to run LSF jobs. Hosts allocated from EGO display status

ok.

CPUF Displays the CPU normalization factor of the host (see lshosts(1)).

DISPATCH_WINDOW

Displays the dispatch windows for each host. Dispatch windows are the time windows during the week when batch jobs can be run on each host. Jobs already started are not affected by the dispatch windows. When the dispatch windows close, jobs are not suspended. Jobs already running continue to run, but no new jobs are started until the windows reopen. The default for the dispatch window is no restriction or always open (that is, twenty-four hours a day and seven days a week). For the dispatch window specification, see the description for the DISPATCH_WINDOWS keyword under the

-l option in bqueues(1).

Platform LSF Command Reference 55

Output

CURRENT LOAD Displays the total and reserved host load.

Reserved

You specify reserved resources by using bsub -R. These resources are reserved by jobs running on the host.

To ta l

The total load has different meanings depending on whether the load index is increasing or decreasing.

For increasing load indices, such as run queue lengths, CPU utilization, paging activity, logins, and disk I/O, the total load is the consumed plus the reserved amount. The total load is calculated as the sum of the current load and the reserved load. The current load is the load seen by

lsload(1).

For decreasing load indices, such as available memory, idle time, available swap space, and available space in tmp, the total load is the available amount. The total load is the difference between the current load and the reserved load. This difference is the available resource as seen by

lsload(1).

LOAD THRESHOLD Displays the scheduling threshold loadSched and the suspending threshold

loadStop. Also displays the migration threshold if defined and the checkpoint

support if the host supports checkpointing.

The format for the thresholds is the same as for batch job queues (see and

lsb.queues(5)). For an explanation of the thresholds and load indices, see the

description for the "QUEUE SCHEDULING PARAMETERS" keyword under the

-l option in bqueues(1).

THRESHOLD AND LOAD USED FOR EXCEPTIONS

Displays the configured threshold of EXIT_RATE for the host and its current load value for host exceptions.

ADMIN ACTION COMMENT

If the LSF administrator specified an administrator comment with the -C option of the

badmin host control commands hclose or hopen, the comment text is

displayed.

Resource-Based -s Option

The -s option displays the following: the amounts used for scheduling, the amounts reserved, and the associated hosts for the resources. Only resources (shared or host-based) with numeric values are displayed. See on how to configure shared resources.

The following fields are displayed:

RESOURCE The name of the resource.

bqueues(1))

lim(8), and lsf.cluster(5)

TOTAL The total amount free of a resource used for scheduling.

RESERVED The amount reserved by jobs. You specify the reserved resource using bsub -R.

LOCATION The hosts that are associated with the resource.

56 Platform LSF Command Reference

Files

bhpart

Synopsis

Description

Options

-r Displays the entire information tree associated with the host partition recursively.

host_partition_name ...

-h Prints command usage to stderr and exits.

displays information about host partitions

bhpart [-r] [host_partition_name ...]

bhpart [-h | -V]

By default, displays information about all host partitions. Host partitions are used to configure host-partition fairshare scheduling.

Displays information about the specified host partitions only.

-V Prints LSF release version to stderr and exits.

Output

The following fields are displayed for each host partition:

HOST_PARTITION_NAME

Name of the host partition.

HOSTS

Hosts or host groups that are members of the host partition. The name of a host group is appended by a slash (

USER/GROUP

Name of users or user groups who have access to the host partition (see

bugroup(1)).

Number of shares of resources assigned to each user or user group in this host partition, as configured in the file priority for when fairshare scheduling is configured at the host level.

PRIORITY

Dynamic user priority for the user or user group. Larger values represent higher priorities. Jobs belonging to the user or user group with the highest priority are considered first for dispatch.

/) (see bmgroup(1)).

lsb.hosts. The shares affect dynamic user

In general, users or user groups with larger SHARES, fewer STARTED and RESERVED, and a lower CPU_TIME and RUN_TIME have higher PRIORITY.

58 Platform LSF Command Reference

STARTED

RESERVED

CPU_TIME

RUN_TIME

Number of job slots used by running or suspended jobs owned by users or user groups in the host partition.

Number of job slots reserved by the jobs owned by users or user groups in the host partition.

Cumulative CPU time used by jobs of users or user groups executed in the host partition. Measured in seconds, to one decimal place.

LSF calculates the cumulative CPU time using the actual (not normalized) CPU time and a decay factor such that 1 hour of recently-used CPU time decays to 0.1 hours after an interval of time specified by HIST_HOURS in by default).

Wall-clock run time plus historical run time of jobs of users or user groups that are executed in the host partition. Measured in seconds.

LSF calculates the historical run time using the actual run time of finished jobs and a decay factor such that 1 hour of recently-used run time decays to 0.1 hours after an interval of time specified by HIST_HOURS in Wall-clock run time is the run time of running jobs.

lsb.params (5 hours by default).

lsb.params (5 hours

Files

bgmod

Synopsis

Description

Options

-L limit Changes the limit of job_group_name to the specified limit value. If the job group

modifies job groups

bgmod [-L limit | -Ln] job_group_name

bgmod [-h | -V]

Modifies the job group with the job group name specified by job_group_name.

Only root, LSF administrators, the job group creator, or the creator of the parent job groups can use

You must provide full group path name for the modified job group. The last component of the path is the name of the job group to be modified.

has parent job groups, the new limit cannot exceed the limits of any higher level job groups. Similarly, if the job group has child job groups, the new value must be greater than any limits on the lower level job groups.

limit specifies the maximum number of concurrent jobs allowed to run under the job group (including child groups) SSUSP, USSUP) under the job group.

bgmod to modify a job group limit.

-L limits the number of started jobs (RUN,

Specify a positive number between 0 and 2147483647. If the specified limit is zero (0), no jobs under the job group can run.

You cannot specify a limit for the root job group. The root job group has no job limit. The -L option only limits the lowest level job group specified.

If a parallel job requests 2 CPUs ( slots used by the job.

-Ln Removes the existing job limit for the job group. If the job group has parent job

groups, the job modified group automatically inherits any limits from its direct parent job group.

job_group_name Full path of the job group name.

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

Examples

The following command only modifies the limit of group

/canada/projects/test1. It does not modify limits of /canada

/canada/projects.

bgmod -L 6 /canada/projects/test1

bsub -n 2), the job group limit is per job, not per

60 Platform LSF Command Reference

bjgroup

displays information about job groups

Synopsis

bjgroup [-N] [-s [group_name]]

bjgroup [-h | -V]

Description

Displays job group information.

Options

-s Sorts job groups by group hierarchy.

For example, for job groups named displays:

bjgroup

GROUP_NAME NJOBS PEND RUN SSUSP USUSP FINISH SLA JLIMIT OWNER

/A 0 0 0 0 0 0 () 0/10 user1

/X 0 0 0 0 0 0 () 0/- user2

/A/B 0 0 0 0 0 0 () 0/5 user1

/X/Y 0 0 0 0 0 0 () 0/5 user2

For the same job groups, bjgroup -s displays:

bjgroup -s

GROUP_NAME NJOBS PEND RUN SSUSP USUSP FINISH SLA JLIMIT OWNER

/A 0 0 0 0 0 0 () 0/10 user1

/A/B 0 0 0 0 0 0 () 0/5 user1

/X 0 0 0 0 0 0 () 0/- user2

/X/Y 0 0 0 0 0 0 () 0/5 user2

Specify a job group name to show the hierarchy of a single job group:

bjgroup -s /X

GROUP_NAME NJOBS PEND RUN SSUSP USUSP FINISH SLA JLIMIT OWNER

/X 25 0 25 0 0 0 puccini 25/100 user1

/X/Y 20 0 20 0 0 0 puccini 20/30 user1

/X/Z 5 0 5 0 0 0 puccini 5/10 user2

Specify a job group name with a trailing slash character (/) to show only the root job group:

bjgroup -s /X/

GROUP_NAME NJOBS PEND RUN SSUSP USUSP FINISH SLA JLIMIT OWNER

/X 25 0 25 0 0 0 puccini 25/100 user1

/A, /A/B, /X and /X/Y, bjgroup without -s

62 Platform LSF Command Reference

Displays job group information by job slots instead of number of jobs. NSLOTS,

-N

PEND, RUN, SSUSP, USUSP, RSV are all counted in slots rather than number of jobs:

bjgroup -N

GROUP_NAME NSLOTS PEND RUN SSUSP USUSP RSV SLA OWNER

/X 25 0 25 0 0 0 puccini user1

/A/B 20 0 20 0 0 0 wagner batch

by itself shows job slot info for all job groups, and can combine with -s to sort

-N

the job groups by hierarchy:

bjgroup -N -s

GROUP_NAME NSLOTS PEND RUN SSUSP USUSP RSV SLA OWNER

/A 0 0 0 0 0 0 wagner batch

/A/B 0 0 0 0 0 0 wagner user1

/X 25 0 25 0 0 0 puccini user1

/X/Y 20 0 20 0 0 0 puccini batch

/X/Z 5 0 5 0 0 0 puccini batch

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

Default output

GROUP_NAME

NJOBS

PEND

RUN

SSUSP

USUSP

FINISH

A list of job groups is displayed with the following fields:

The name of the job group.

The current number of jobs in the job group. A parallel job is counted as 1 job, regardless of the number of job slots it uses.

The number of pending jobs in the job group.

The number of running jobs in the job group.

The number of system-suspended jobs in the job group.

The number of user-suspended jobs in the job group.

The number of jobs in the specified job group in EXITED or DONE state.

Platform LSF Command Reference 63

Job slots (-N) output

SLA

The name of the service class that the job group is attached to with

bgadd -sla service_class_name. If the job group is not attached to any service class,

empty parentheses

() are displayed in the SLA name column.

JLIMIT

The job group limit set by bgadd -L or bgmod -L. Job groups that have no configured limits or no limit usage are indicated by a dash ( displayed in a USED/LIMIT format. For example, if a limit of 5 jobs is configured and 1 job is started,

bjgroup displays the job limit under JLIMIT as 1/5.

-). Job group limits are

OWNER

The job group owner.

Example

bjgroup

GROUP_NAME NJOBS PEND RUN SSUSP USUSP FINISH SLA JLIMIT OWNER

/fund1_grp 5 4 0 1 0 0 Venezia 1/5 user1

/fund2_grp 11 2 5 0 0 4 Venezia 5/5 user1

/bond_grp 2 2 0 0 0 0 Venezia 0/- user2

/risk_grp 2 1 1 0 0 0 () 1/- user2

/admi_grp 4 4 0 0 0 0 () 0/- user2

Job slots (-N) output

NSLOTS, PEND, RUN, SSUSP, USUSP, RSV are all counted in slots rather than number of jobs. A list of job groups is displayed with the following fields:

GROUP_NAME

The name of the job group.

NSLOTS

The total number of job slots held currently by jobs in the job group. This includes pending, running, suspended and reserved job slots. A parallel job that is running on n processors is counted as n job slots, since it takes n job slots in the job group.

PEND

The number of job slots used by pending jobs in the job group.

RUN

The number of job slots used by running jobs in the job group.

SSUSP

The number of job slots used by system-suspended jobs in the job group.

USUSP

The number of job slots used by user-suspended jobs in the job group.

64 Platform LSF Command Reference

RSV

The number of job slots in the job group that are reserved by LSF for pending jobs.

SLA

The name of the service class that the job group is attached to with

bgadd -sla service_class_name. If the job group is not attached to any service class,

empty parentheses

() are displayed in the SLA name column.

OWNER

The job group owner.

Example

bjgroup -N

GROUP_NAME NSLOTS PEND RUN SSUSP USUSP RSV SLA OWNER

/X 25 0 25 0 0 0 puccini user1

/A/B 20 0 20 0 0 0 wagner batch

bjobs

Synopsis

displays information about LSF jobs

bjobs [-A] [-a] [-W] [-w | -l] [-X] [-x]

job_ID | "job_ID[index_list]" ...

bjobs [-A] [-d] [-p] [-r] [-s] [-W] [-w | -l] [-X] [-x]

-app application_profile_name] [-g job_group_name]

job_ID |"job_ID[index_list]" ...

bjobs [-w | -l | -aps] [-A] [-a] [-d] [-p] [-s] [-r] [-X] [-x]

[-m host_name] [-q queue_name] [-u user_name | -u user_group | -u all | -G user_group]

-g job_group] [-sla service_class] [-P project_name]

[ [-N host_spec] [-Lp license_project] [-app application_profile] [-J name_spec] [job_ID |"job_ID[index_list]" ...]

bjobs [-h | -V]

Description

Options

By default, displays information about your own pending, running and suspended jobs.

bjobs displays output for condensed host groups. These host groups are defined by CONDENSE in the HostGroup section of lsb.hosts. These host groups are displayed

as a single entry with the name as defined by of

lsb.hosts. The -l and -X options display uncondensed output.

If you defined LSB_SHORT_HOSTLIST=1 in

GROUP_NAME in the HostGroup section

lsf.conf, parallel jobs running in

the same condensed host group are displayed as an abbreviated list.

To display older historical information, use

-A Displays summarized information about job arrays. If you specify job arrays with

the job array ID, and also specify

-A, do not include the index list with the job array

bhist.

ID.

You ca n u s e

-a Displays information about jobs in all states, including finished jobs that finished

recently, within an interval specified by CLEAN_PERIOD in

-w to show the full array specification, if necessary.

lsb.params (the

default period is 1 hour).

66 Platform LSF Command Reference

Use -a with -x option to display all jobs that have triggered a job exception (overrun, underrun, idle).

-aps Displays absolute priority scheduling (APS) information for pending jobs in a

queue with APS_PRIORITY enabled. The APS value is calculated based on the current scheduling cycle, so jobs are not guaranteed to be dispatched in this order.

Pending jobs are ordered by APS value. Jobs with system APS values are listed first, from highest to lowest APS value. Jobs with calculated APS values are listed next ordered from high to low value. Finally, jobs not in an APS queue are listed. Jobs with equal APS values are listed in order of submission time. APS values of jobs not in an APS queue are shown with a dash (

If queues are configured with the same priority,

-).

bjobs -aps may not show jobs in

the correct expected dispatch order. Jobs may be dispatched in the order the queues are configured in

lsb.queues. You should avoid configuring queues with the same

priority.

-d Displays information about jobs that finished recently, within an interval specified

by CLEAN_PERIOD in

-l Long format. Displays detailed information for each job in a multiline format.

The

-l option displays the following additional information: project name, job

lsb.params (the default period is 1 hour).

command, current working directory on the submission host, initial checkpoint period, checkpoint directory, migration threshold, pending and suspending reasons, job status, resource usage, resource usage limits information, runtime resource usage information on the execution hosts.

Use

bjobs -A -l to display detailed information for job arrays including job array

job limit (

If JOB_IDLE is configured in the queue, use

%job_limit) if set.

bjobs -l to display job idle exception

information.

If you submitted your job with the with the

brsvadd command, bjobs -l shows the reservation ID used by the job.

If LSF_HPC_EXTENSIONS="SHORT_PIDLIST" is specified in

-U option to use advance reservations created

lsf.conf, the

output from bjobs is shortened to display only the first PID and a count of the process group IDs (PGIDs) and process IDs for the job. Without SHORT_PIDLIST, all of the process IDs (PIDs) for a job are displayed.

If you submitted a job with multiple resource requirement strings using the option for the order, same, rusage, and select sections,

bjobs -l displays a single,

bsub -R

merged resource requirement string for those sections, as if they were submitted using a single

If you submitted a job using the this option displays the

For jobs submitted to an absolute priority scheduling (APS) queue,

-R.

OR (||) expression to specify alternative resources,

Execution rusage string with which the job runs.

-l shows the

ADMIN factor value and the system APS value if they have been set by the administrator for the job:

-p Displays pending jobs, together with the pending reasons that caused each job not

to be dispatched during the last dispatch turn. The pending reason shows the number of hosts for that reason, or names the hosts if

-l is also specified.

Platform LSF Command Reference 67

Options

With MultiCluster, -l shows the names of hosts in the local cluster.

Each pending reason is associated with one or more hosts and it states the cause why these hosts are not allocated to run the job. In situations where the job requests specific hosts (using

bsub -m), users may see reasons for unrelated hosts also being

displayed, together with the reasons associated with the requested hosts.

The life cycle of a pending reason ends after the time indicated by PEND_REASON_UPDATE_INTERVAL in

lsb.params.

When the job slot limit is reached for a job array (

bsub -J "jobArray[indexList]%job_slot_limit") the following message is

displayed:

The job array has reached its job slot limit.

-r Displays running jobs.

-s Displays suspended jobs, together with the suspending reason that caused each job

to become suspended.

The suspending reason may not remain the same while the job stays suspended. For example, a job may have been suspended due to the paging rate, but after the paging rate dropped another load index could prevent the job from being resumed. The suspending reason is updated according to the load index. The reasons could be as old as the time interval specified by SBD_SLEEP_TIME in

lsb.params. So the

reasons shown may not reflect the current load situation.

-W Provides resource usage information for: PROJ_NAME, CPU_USED, MEM,

SWAP, PIDS, START_TIME, FINISH_TIME.

-w Wide format. Displays job information without truncating fields.

-X Displays uncondensed output for host groups.

-x Displays unfinished jobs that have triggered a job exception (overrun, underrun,

idle). Use with the

-l option to show the actual exception status. Use with -a to

display all jobs that have triggered a job exception.

-app application_profile_name

Displays information about jobs submitted to the specified application profile. You must specify an existing application profile.

-G user_group Only displays jobs associated with a user group submitted with bsub -G for the

specified user group. The

–G option does not display jobs from subgroups within

the specified user group.

-G option cannot be used together with the -u option. You can only specify a

The user group name. The keyword all is not supported for

-g job_group_name Displays information about jobs attached to the job group specified by

-G.

job_group_name. For example:

bjobs -g /risk_group

JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME 113 user1 PEND normal hostA myjob Jun 17 16:15 111 user2 RUN normal hostA hostA myjob Jun 14 15:13 110 user1 RUN normal hostB hostA myjob Jun 12 05:03 104 user3 RUN normal hostA hostC myjob Jun 11 13:18

68 Platform LSF Command Reference

Use -g with -sla to display job groups attached to a service class. Once a job group is attached to a service class, all jobs submitted to that group are subject to the SLA.

bjobs -l with -g displays the full path to the group to which a job is attached. For

example:

bjobs -l -g /risk_group

Job <101>, User <user1>, Project <default>, Job Group </risk_group>, Status <RUN>, Queue <normal>, Command <myjob> Tue Jun 17 16:21:49: Submitted from host <hostA>, CWD </home/user1; Tue Jun 17 16:22:01: Started on <hostA>; ...

-J job_name Displays information about the specified jobs or job arrays. Only displays jobs that

were submitted by the user running this command.

The job name can be up to 4094 characters long for UNIX and Linux or up to 255 characters for Windows.

-Lp ls_project_name Displays jobs that belong to the specified LSF License Scheduler project.

-m host_name ... | -m host_group ... | -m cluster_name ...

Only displays jobs dispatched to the specified hosts. To see the available hosts, use

bhosts.

If a host group is specified, displays jobs dispatched to all hosts in the group. To determine the available host groups, use

bmgroup.

With MultiCluster, displays jobs in the specified cluster. If a remote cluster name is specified, you see the remote job ID, even if the execution host belongs to the local cluster. To determine the available clusters, use

-N host_name |-Nhost_model |-Ncpu_factor

Displays the normalized CPU time consumed by the job. Normalizes using the CPU factor specified, or the CPU factor of the host or host model specified.

-P project_name Only displays jobs that belong to the specified project.

-q queue_name Only displays jobs in the specified queue.

The command

bqueues returns a list of queues configured in the system, and

information about the configurations of these queues.

In MultiCluster, you cannot specify remote queues.

-sla service_class_name

Displays jobs belonging to the specified service class.

bjobs also displays information about jobs assigned to a default SLA configured

with ENABLE_DEFAULT_EGO_SLA in

Use

-sla with -g to display job groups attached to a service class. Once a job

group is attached to a service class, all jobs submitted to that group are subject to the SLA.

Use

bsla to display the configuration properties of service classes configured in

lsb.serviceclasses, the default SLA configured in lsb.params, and dynamic

information about the state of each service class.

bclusters.

lsb.params.

-u user_name... | -u user_group... | -u all

Platform LSF Command Reference 69

Output

job_ID | "job_ID[index]"

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

Output

Only displays jobs that have been submitted by the specified users or user groups. The keyword

all specifies all users. To specify a Windows user account, include the

domain name in uppercase letters and use a single backslash (DOMAIN_NAME\ user_name) in a Windows command line or a double backslash

(DOMAIN_NAME\\user_name) in a UNIX command line.

The

-u option cannot be used with the -G option.

Displays information about the specified jobs or job arrays.

If you use

-A, specify job array IDs without the index list.

Pending jobs are displayed in the order in which they are considered for dispatch. Jobs in higher priority queues are displayed before those in lower priority queues. Pending jobs in the same priority queues are displayed in the order in which they were submitted but this order can be changed by using the commands

bbot. If more than one job is dispatched to a host, the jobs on that host are listed in

btop or

the order in which they are considered for scheduling on this host by their queue priorities and dispatch times. Finished jobs are displayed in the order in which they were completed.

Default Display

A listing of jobs is displayed with the following fields:

JOBID The job ID that LSF assigned to the job.

USER The user who submitted the job.

STAT The current status of the job (see JOB STATUS below).

QUEUE The name of the job queue to which the job belongs. If the queue to which the job

belongs has been removed from the configuration, the queue name is displayed as

lost_and_found. Use bhist to get the original queue name. Jobs in the lost_and_found queue remain pending until they are switched with the bswitch

command into another queue.

In a MultiCluster resource leasing environment, jobs scheduled by the consumer cluster display the remote queue name in the format queue_name@cluster_name. By default, this field truncates at 10 characters, so you might not see the cluster name unless you use

FROM_HOST The name of the host from which the job was submitted.

With MultiCluster, if the host is in a remote cluster, the cluster name and remote job ID are appended to the host name, in the format host_name@cluster_name:job_ID. By default, this field truncates at 11 characters; you might not see the cluster name and job ID unless you use

-w or -l.

70 Platform LSF Command Reference

EXEC_HOST The name of one or more hosts on which the job is executing (this field is empty if

the job has not been dispatched). If the host on which the job is running has been removed from the configuration, the host name is displayed as Use

bhist to get the original host name.

If the host is part of a condensed host group, the host name is displayed as the name of the condensed host group.

If you configure a host to belong to more than one condensed host groups using wildcards,

bjobs can display any of the host groups as execution host name.

lost_and_found.

JOB_NAME The job name assigned by the user, or the command string assigned by default at

job submission with the latter part of the job name is displayed.

The displayed job name or job command can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.

bsub. If the job name is too long to fit in this field, then only

SUBMIT_TIME The submission time of the job.

-l output

The -l option displays a long format listing with the following additional fields:

Project The project the job was submitted from.

Application Profile The application profile the job was submitted to.

Command The job command.

CWD The current working directory on the submission host.

Initial checkpoint period

The initial checkpoint period specified at the job level, by bsub -k, or in an application profile with CHKPNT_INITPERIOD.

Checkpoint period The checkpoint period specified at the job level, by bsub -k, in the queue with

CHKPNT, or in an application profile with CHKPNT_PERIOD.

Checkpoint directory

Migration

The checkpoint directory specified at the job level, by bsub -k, in the queue with CHKPNT, or in an application profile with CHKPNT_DIR.

The migration threshold specified at the job level, by bsub -mig.

threshold

Post-execute

The post-execution command specified at the job-level, by bsub -Ep.

Command

PENDING REASONS The reason the job is in the PEND or PSUSP state. The names of the hosts

associated with each reason are displayed when both specified.

-p and -l options are

SUSPENDING REASONS

The reason the job is in the USUSP or SSUSP state.

Platform LSF Command Reference 71

Output

loadSched

The load scheduling thresholds for the job.

loadStop

The load suspending thresholds for the job.

JOB STATUS Possible values for the status of a job include:

PEND

The job is pending, that is, it has not yet been started.

PSUSP

The job has been suspended, either by its owner or the LSF administrator, while pending.

RUN

The job is currently running.

USUSP

The job has been suspended, either by its owner or the LSF administrator, while running.

SSUSP

The job has been suspended by LSF. The job has been suspended by LSF due to either of the following two causes:

◆ The load conditions on the execution host or hosts have exceeded a threshold

according to the

◆ The run window of the job’s queue is closed. See bqueues(1), bhosts(1), and

lsb.queues(5).

loadStop vector defined for the host or queue.

DONE

The job has terminated with status of 0.

EXIT

The job has terminated with a non-zero status – it may have been aborted due to an error in its execution, or killed by its owner or the LSF administrator.

For example, exit code 131 means that the job exceeded a configured resource usage limit and LSF killed the job.

UNKWN

mbatchd has lost contact with the sbatchd on the host on which the job runs.

WAI T

For jobs submitted to a chunk job queue, members of a chunk job that are waiting to run.

ZOMBI

A job becomes ZOMBI if:

◆ A non-rerunnable job is killed by bkill while the sbatchd on the execution

host is unreachable and the job is shown as UNKWN.

72 Platform LSF Command Reference

◆ The host on which a rerunnable job is running is unavailable and the job has

been requeued by LSF with a new job ID, as if the job were submitted as a new job.

◆ After the execution host becomes available, LSF tries to kill the ZOMBI job.

Upon successful termination of the ZOMBI job, the job’s status is changed to EXIT.

With MultiCluster, when a job running on a remote execution cluster becomes a ZOMBI job, the execution cluster treats the job the same way as local ZOMBI jobs. In addition, it notifies the submission cluster that the job is in ZOMBI state and the submission cluster requeues the job.

RUNTIME Estimated run time for the job, specified by bsub -We or bmod -We.

RESOURCE USAGE For the MultiCluster job forwarding model, this information is not shown if

MultiCluster resource usage updating is disabled.

The values for the current usage of a job include:

CPU time

Cumulative total CPU time in seconds of all processes in a job.

IDLE_FACTOR

Job idle information (CPU time/runtime) if JOB_IDLE is configured in the queue, and the job has triggered an idle exception.

MEM

Total resident memory usage of all processes in a job. By default, memory usage is shown in MB. Use LSF_UNIT_FOR_LIMITS in

lsf.conf to specify a larger unit

for display (MB, GB, TB, PB, or EB).

SWAP

Total virtual memory usage of all processes in a job. By default, swap space is shown in MB. Use LSF_UNIT_FOR_LIMITS in

lsf.conf to specify a larger unit for

display (MB, GB, TB, PB, or EB).

NTHREAD

Number of currently active threads of a job.

PGID

Currently active process group ID in a job.

PIDs

Currently active processes in a job.

RESOURCE LIMITS The hard resource usage limits that are imposed on the jobs in the queue (see

getrlimit(2) and lsb.queues(5)). These limits are imposed on a per-job and a

per-process basis.

The possible per-job resource usage limits are:

◆ CPULIMIT

◆ PROCLIMIT

◆ MEMLIMIT

Platform LSF Command Reference 73

Output

◆ SWAPLIMIT

◆ PROCESSLIMIT

◆ THREADLIMIT

◆ OPENFILELIMIT

The possible UNIX per-process resource usage limits are:

◆ RUNLIMIT

◆ FILELIMIT

◆ DATALIMIT

◆ STACKLIMIT

◆ CORELIMIT

If a job submitted to the queue has any of these limits specified (see the lower of the corresponding job limits and queue limits are used for the job.

If no resource limit is specified, the resource is assumed to be unlimited. User shell limits that are unlimited are not displayed.

EXCEPTION STATUS Possible values for the exception status of a job include:

idle

The job is consuming less CPU time than expected. The job idle factor (CPU time/runtime) is less than the configured JOB_IDLE threshold for the queue and a job exception has been triggered.

overrun

The job is running longer than the number of minutes specified by the JOB_OVERRUN threshold for the queue and a job exception has been triggered.

underrun

The job finished sooner than the number of minutes specified by the JOB_UNDERRUN threshold for the queue and a job exception has been triggered.

Job Array Summary Information

If you use -A, displays summary information about job arrays. The following fields are displayed:

bsub(1)), then

JOBID Job ID of the job array.

ARRAY_SPEC Array specification in the format of name[index]. The array specification may be

truncated, use

-w option together with -A to show the full array specification.

OWNER Owner of the job array.

NJOBS Number of jobs in the job array.

PEND Number of pending jobs of the job array.

RUN Number of running jobs of the job array.

DONE Number of successfully completed jobs of the job array.

74 Platform LSF Command Reference

EXIT Number of unsuccessfully completed jobs of the job array.

SSUSP Number of LSF system suspended jobs of the job array.

USUSP Number of user suspended jobs of the job array.

PSUSP Number of held jobs of the job array.

Examples

bjobs -pl

Displays detailed information about all pending jobs of the invoker.

bjobs -ps

Display only pending and suspended jobs.

bjobs -u all -a

Displays all jobs of all users.

bjobs -d -q short -m hostA -u user1

Displays all the recently finished jobs submitted by user1 to the queue short, and executed on the host

bjobs 101 102 203 509

Display jobs with job_ID 101, 102, 203, and 509.

bjobs -X 101 102 203 509

hostA.

bkill

Synopsis

Description

sends signals to kill, suspend, or resume unfinished jobs

bkill [-l] [-app application_profile_name] [-g job_group_name]

[-sla service_class_name] [-J job_name] [-m host_name |

-m host_group] [-q queue_name] [-r |

-s signal_value | signal_name] [-u user_name |

-u user_group | -u all] [job_ID ... | 0 | "job_ID[index]" ...]

bkill [ -l] [-b] [-app application_profile_name] [-g job_group_name]

[-sla service_class_name] [-J job_name] [-m host_name |

-m host_group] [-q queue_name] [-u user_name |

-u user_group | -u all] [job_ID ... | 0 | "job_ID[index]" ...]

bkill [-h | -V]

By default, sends a set of signals to kill the specified jobs. On UNIX, SIGINT and SIGTERM are sent to give the job a chance to clean up before termination, then SIGKILL is sent to kill the job. The time interval between sending each signal is defined by the JOB_TERMINATE_INTERVAL parameter in

lsb.params(5).

PEND

RUN

By default, kills the last job submitted by the user running the command. You must specify a job ID or

-q without a job ID, bkill kills the last job submitted by the user running the

command. Specify job ID

-app, -g, -J, -m, -u, or -q. If you specify -app, -g, -J, -m, -u, or

0 (zero) to kill multiple jobs.

On Windows, job control messages replace the SIGINT and SIGTERM signals (but only customized applications can process them) and the

TerminateProcess()

system call is sent to kill the job.

Exit code 130 is returned when a dispatched job is killed with

Only

root and LSF administrators can run bkill -r. The -r option is ignored for

bkill.

other users.

Users can only operate on their own jobs. Only

root and LSF administrators can

operate on jobs submitted by other users.

If a signal request fails to reach the job execution host, LSF tries the operation later when the host becomes reachable. LSF retries the most recent signal request.

If a job is running in a queue with CHUNK_JOB_SIZE set,

bkill has the following

results depending on job state:

Job is removed from chunk (NJOBS -1, PEND -1)

All jobs in the chunk are suspended (NRUN -1, NSUSP +1)

76 Platform LSF Command Reference

USUSP

WAIT

Job finishes, next job in the chunk starts if one exists (NJOBS -1, PEND -1, SUSP

-1, RUN +1)

Job finishes (NJOBS-1, PEND -1)

Options

If the job cannot be killed, use

bkill -r to remove the job from the LSF system

without waiting for the job to terminate, and free the resources of the job.

0 Kills all the jobs that satisfy other options (-app. -g, -m, -q, -u, and -J).

-b Kills large numbers of jobs as soon as possible. Local pending jobs are killed

immediately and cleaned up as soon as possible, ignoring the time interval specified by CLEAN_PERIOD in

lsb.acct.

lsb.params. Jobs killed in this manner are not logged to

Other jobs, such as running jobs, are killed as soon as possible and cleaned up normally.

If the

-b option is used with the 0 subcommand, bkill kills all applicable jobs and

silently skips the jobs that cannot be killed.

bkill -b 0

Operation is in progress

The -b option is ignored if used with the -r or -s options.

-l Displays the signal names supported by bkill. This is a subset of signals supported

/bin/kill and is platform-dependent.

-r Removes a job from the LSF system without waiting for the job to terminate in the

operating system.

Only

root and LSF administrators can run bkill -r. The -r option is ignored for

other users.

Sends the same series of signals as

bkill without -r, except that the job is removed

from the system immediately, the job is marked as EXIT, and the job resources that LSF monitors are released as soon as LSF receives the first signal.

Also operates on jobs for which a cannot be reached to be acted on by

bkill command has been issued but which

sbatchd (jobs in ZOMBI state). If sbatchd

recovers before the jobs are completely removed, LSF ignores the zombi jobs killed with

bkill -r.

Use

bkill -r only on jobs that cannot be killed in the operating system, or on jobs

that cannot be otherwise removed using

The

-app application_profile_name

-r option cannot be used with the -s option.

bkill.

Operates only on jobs associated with the specified application profile. You must specify an existing application profile. If job_ID or 0 is not specified, only the most recently submitted qualifying job is operated on.

-g job_group_name Operates only on jobs in the job group specified by job_group_name.

Platform LSF Command Reference 77

Options

Use -g with -sla to kill jobs in job groups attached to a service class.

bkill does not kill jobs in lower level job groups in the path. For example, jobs are

attached to job groups

bsub -g /risk_group myjob

Job <115> is submitted to default queue <normal>.

bsub -g /risk_group/consolidate myjob2

Job <116> is submitted to default queue <normal>.

The following bkill command only kills jobs in /risk_group, not the subgroup

/risk_group/consolidate:

bkill -g /risk_group 0

Job <115> is being terminated

bkill -g /risk_group/consolidate 0

Job <116> is being terminated

-J job_name Operates only on jobs with the specified job name. The -J option is ignored if a job

ID other than 0 is specified in the job_ID option.

-m host_name | -m host_group

Operates only on jobs dispatched to the specified host or host group.

If job_ID is not specified, only the most recently submitted qualifying job is operated on. The job_ID option. See and host groups.

/risk_group and /risk_group/consolidate:

-m option is ignored if a job ID other than 0 is specified in the bhosts(1) and bmgroup(1) for more information about hosts

-q queue_name Operates only on jobs in the specified queue.

If job_ID is not specified, only the most recently submitted qualifying job is operated on.

The

See

-s signal_value | signal_name

Sends the specified signal to specified jobs. You can specify either a name, stripped of the SIG prefix (such as KILL), or a number (such as 9).

Eligible UNIX signal names are listed by

The

Use of using

bresume.

Sending the SIGSTOP signal to sequential jobs or the SIGTSTP to parallel jobs is the same as using

You cannot suspend a job that is already suspended, or resume a job that is not suspended. Using SIGSTOP or SIGTSTP on a job that is in the USUSP state has no effect and using SIGCONT on a job that is not in either the PSUSP or the USUSP state has no effect. See

-q option is ignored if a job ID other than 0 is specified in the job_ID option.

bqueues(1) for more information about queues.

bkill -l.

-s option cannot be used with the -r option.

bkill -s to suspend and resume jobs by using the appropriate signal instead

bstop or bresume. Sending the SIGCONT signal is the same as using

bstop.

bjobs(1) for more information about job states.

-sla service_class_name

Operates on jobs belonging to the specified service class.

If job_ID is not specified, only the most recently submitted job is operated on.

78 Platform LSF Command Reference

Use -sla with -g to kill jobs in job groups attached to a service class.

The

-sla option is ignored if a job ID other than 0 is specified in the job_ID option.

bsla to display the configuration properties of service classes configured in

Use

lsb.serviceclasses, the default SLA configured with

ENABLE_DEFAULT_EGO_SLA in the state of each service class.

-u user_name | -u user_group | -u all

Operates only on jobs submitted by the specified user or user group, or by all users if the reserved user name include the domain name in uppercase letters and use a single backslash (DOMAIN_NAME\user_name) in a Windows command line or a double backslash (DOMAIN_NAME\\user_name) in a UNIX command line.

If job_ID is not specified, only the most recently submitted qualifying job is operated on. The job_ID option.

job_ID ... | 0 | "job_ID[index]" ...

Operates only on jobs that are specified by job_ID or "job_ID[index]", where "job_ID[index]" specifies selected job array elements (see quotation marks must enclose the job ID and index, and index must be enclosed in square brackets.

lsb.params, and dynamic information about

all is specified. To specify a Windows user account,

-u option is ignored if a job ID other than 0 is specified in the

bjobs(1)). For job arrays,

Examples

Jobs submitted by any user can be specified here without using the

-u option. If you

use the reserved job ID 0, all the jobs that satisfy other options (that is, and

-J) are operated on; all other job IDs are ignored.

The options IDs are returned at job submission time (see the

bjobs command (see bjobs(1)).

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

bkill -s 17 -q night

-u, -q, -m and -J have no effect if a job ID other than 0 is specified. Job bsub(1)) and may be obtained with

Sends signal 17 to the last job that was submitted by the invoker to queue night.

bkill -q short -u all 0

Kills all the jobs that are in the queue short.

bkill -r 1045

Forces the removal of unkillable job 1045.

bkill -sla Tofino 0

Kill all jobs belonging to the service class named Tofino.

bkill -g /risk_group 0

Kills all jobs in the job group /risk_group.

bkill -app fluent

-m, -q, -u

Platform LSF Command Reference 79

bladmin

reconfigures the Platform LSF License Scheduler daemon (bld)

Synopsis

bladmin subcommand

bladmin [-h | -V]

Description

Use this command to reconfigure the License Scheduler daemon (bld).

You must be a License Scheduler administrator to use this command.

Subcommand List

ckconfig [-v]

reconfig [host_name ... | all]

shutdown [host_name ... | all]

blddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]

blcdebug [-l debug_level] [-f logfile_name] [-o] collector_name ...

| all

-h

-V

Platform LSF Command Reference 81

Options

ckconfig [-v] Checks LSF License Scheduler configuration in

LSF_ENVDIR/lsf.licensescheduler and lsf.conf.

By default, check. If warning errors are found,

bladmin ckconfig displays only the result of the configuration file

bladmin prompts you to use the -v option to

display detailed messages.

-v

Verbose mode. Displays detailed messages about configuration file checking to

stderr.

reconfig [host_name ... | all]

Reconfigures License Scheduler.

shutdown [host_name ... | all]

Shuts down License Scheduler.

blddebug [-c class_name ...] [-l debug_level] [-f logfile_name] [-o]

Sets the message log level for bld to include additional information in log files. You must be

If the

root or the LSF administrator to use this command.

bladmin blddebug is used without any options, the following default values

are used:

◆ class_name=0 (no additional classes are logged)

◆ debug_level=0 (LOG_DEBUG level in parameter LS_LOG_MASK)

◆ logfile_name=current LSF system log file in the LSF system log file directory, in

the format daemon_name

.log.host_name

-c class_name ...

Specifies software classes for which debug messages are to be logged.

Format of class_name is the name of a class, or a list of class names separated by spaces and enclosed in quotation marks. Classes are also listed in

lsf.h.

Valid log classes:

◆ LC_AUTH - Log authentication messages

◆ LC_COMM - Log communication messages

◆ LC_FLEX - Log everything related to FLEX_STAT or FLEX_EXEC

Macrovision APIs

◆ LC_LICENCE - Log license management messages

◆ LC_PREEMPT - Log preemption policy messages

◆ LC_TRACE - Log significant program walk steps

◆ LC_XDR - Log everything transferred by XDR

Default: 0 (no additional classes are logged)

-l debug_level

Specifies level of detail in debug messages. The higher the number, the more detail that is logged. Higher levels include all lower levels.

82 Platform LSF Command Reference

Possible values:

0 LOG_DEBUG level in parameter LS_LOG_MASK in

lsf.conf.

1 LOG_DEBUG1 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2 LOG_DEBUG1, and LOG_DEBUG levels.

2 LOG_DEBUG2 level for extended logging. A higher level includes lower logging

levels. For example, LOG_DEBUG3 includes LOG_DEBUG2 LOG_DEBUG1, and LOG_DEBUG levels.

3 LOG_DEBUG3 level for extended logging. A higher level includes lower logging

levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.

Default: 0 (LOG_DEBUG level in parameter LS_LOG_MASK)

-f logfile_name

Specifies the name of the file where debugging messages are logged. The file name can be a full path. If a file name without a path is specified, the file is saved in the LSF system log directory.

The name of the file has the following format:

logfile_name.daemon_name.

On UNIX, if the specified path is not valid, the log file is created in the

log.host_name

/tmp

directory.

On Windows, if the specified path is not valid, no log file is created.

Default: current LSF system log file in the LSF system log file directory.

-o

Turns off temporary debug settings and resets them to the daemon starting state. The message log level is reset back to the value of LS_LOG_MASK and classes are reset to the value of LSB_DEBUG_BLD. The log file is also reset back to the default log file.

blcdebug [-l debug_level] [-f logfile_name] [-o] collector_name | all

Sets the message log level for blcollect to include additional information in log files. You must be

If the

bladmin blcdebug is used without any options, the following default values

root or the LSF administrator to use this command.

are used:

◆ debug_level=0 (LOG_DEBUG level in parameter LS_LOG_MASK)

◆ logfile_name=current LSF system log file in the LSF system log file directory, in

the format daemon_name

◆ collector_name=default

-l debug_level

.log.host_name

Specifies level of detail in debug messages. The higher the number, the more detail that is logged. Higher levels include all lower levels.

Possible values:

0 LOG_DEBUG level in parameter LS_LOG_MASK in

lsf.conf.

Platform LSF Command Reference 83

blaunch

Synopsis

Description

launches parallel tasks on a set of hosts

blaunch [-n] [-u host_file | -z host_name ... | host_name]

command [argument ...]

blaunch [-h | -V]

IMPORTANT: You cannot run blaunch directly from the command line.

RESTRICTION: The command blaunch does not work with user account mapping. Do not run

blaunch on a user account mapping host.

Most MPI implementations and many distributed applications use rsh and ssh as their task launching mechanism. The replacement for

rsh and ssh as a transparent method for launching parallel

blaunch command provides a drop-in

applications within LSF.

blaunch supports the following core command line options as rsh and ssh:

◆ rsh host_name command

Options

-u host_file Executes the task on all hosts listed in the host_file.

host_name The name of the host where remote tasks are to be launched.

-z host_name ... Executes the task on all specified hosts.

◆ ssh host_name command

All other

blaunch transparently connects directly to the RES/SBD on the remote host, and

rsh and ssh options are silently ignored.

subsequently creates and tracks the remote tasks, and provides the connection back to LSF. You do not need to insert

blaunch only works under LSF. It can only be used to launch tasks on remote hosts

pam, taskstarter or any other wrapper.

that are part of a job allocation. It cannot be used as a standalone command.

blaunch is not supported on Windows.

When no host names are specified, LSF allocates all hosts listed in the environment variable LSB_MCPU_HOSTS.

-n Standard input is taken from /dev/null.

Specify the path to a file that contains a list of host names. Each host name must listed on a separator line in the host list file.

This option is exclusive of the

-z option.

Platform LSF Command Reference 85

Diagnostics

Whereas the host name value for rsh and ssh is a single host name, you can use the

-z option to specify a space-delimited list of hosts where tasks are started in

parallel.

Specify a list of hosts on which to execute the task. If multiple host names are specified, the host names must be enclosed by quotation marks ( separated by white space.

" or ') and

command [argument ...]

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

Diagnostics

blcollect

Synopsis

Description

license information collection daemon that collects license usage information

blcollect -c collector_name -m host_name [...] -p

license_scheduler_port [-i lmstat_interval | -D lmstat_path]

blcollect [-h | -V]

Periodically collects license usage information from Macrovision FLEXnet. It queries FLEXnet for license usage information from the FLEXnet command, and passes the information to the License Scheduler daemon (

blcollect daemon improves performance by allowing you to distribute license

information queries on multiple hosts.

By default, license information is collected from FLEXnet on one host. Use

blcollect to distribute the license collection on multiple hosts.

lmstat

bld). The

For each service domain configuration in name for but you can specify one collector to serve multiple service domains. You can choose any collector name you want, but must use that exact name when you run

blcollect.

blcollect to use. You can only specify one collector per service domain,

lsf.licensescheduler, specify one

Options

-c Required. Specify the collector name you set in lsf.licensescheduler. You must

use the collector name ( the configuration file.

-m Required. Specifies a space-separated list of hosts to which license information is

sent. The hosts do not need to be running License Scheduler or a FLEXnet. Use fully qualified host names.

-p Required. You must specify the License Scheduler listening port, which is set in

lsf.licensescheduler and has a default value of 9581.

-i lmstat_interval Optional. The frequency in seconds of the calls that License Scheduler makes to

lmstat to collect license usage information from FLEXnet.

The default interval is 60 seconds.

-D lmstat_path Optional. Location of the FLEXnet command lmstat.

-h Prints command usage to stderr and exits.

LIC_COLLECT) you define in the ServiceDomain section of

blhosts

Synopsis

Description

Options

Output

displays the names of all the hosts running the License Scheduler daemon (bld)

blhosts [-h | -V]

Displays a list of hosts running the License Scheduler daemon. This includes the License Scheduler master host and all the candidate License Scheduler hosts running

-h Prints command usage to stderr and exits.

-V Prints release version to stderr and exits.

bld.

blimits

Synopsis

Description

displays information about resource allocation limits of running jobs

blimits [-w] [-n limit_name ...]

[-m host_name | -m host_group | -m cluster_name ...] [-P project_name ...] [-q queue_name ...] [-u user_name | -u user_group ...]

blimits -c

blimits -h | -V

Displays current usage of resource allocation limits configured in Limit sections in

lsb.resources:

◆ Configured limit policy name

◆ Users (-u option)

◆ Queues (-q option)

◆ Hosts (-m option)

◆ Project names (-P option)

◆ Limits (SLOTS, MEM, TMP, SWP, JOBS)

◆ Limit configuration (-c option). This is the same as bresources with no

options.

Resources that have no configured limits or no limit usage are indicated by a dash (-). Limits are displayed in a USED/LIMIT format. For example, if a limit of 10 slots is configured and 3 slots are in use, then

blimits displays the limit for SLOTS as

3/10.

Note that if there are no jobs running against resource allocation limits, LSF indicates that there is no information to be displayed:

No resource usage found.

If limits MEM, SWP, or TMP are configured as percentages, both the limit and the amount used are displayed in MB. For example,

lshosts displays maxmem of 249

MB, and MEM is limited to 10% of available memory. If 10 MB out of 25 MB are used,

blimits displays the limit for MEM as 10/25 (10 MB USED from a 25 MB

LIMIT).

Limits are displayed for both the vertical tabular format and the horizontal format for Limit sections. If a vertical format Limit section has no name,

blimits displays

NONAMEnnn under the NAME column for these limits, where the unnamed limits are numbered in the order the vertical-format Limit sections appear in the

lsb.resources file.

If a resource consumer is configured as

all, the limit usage for that consumer is

indicated by a dash (-)

Platform LSF Command Reference 89

Options

PER_HOST slot limits are not displayed. The bhosts commands displays these as MXJ limits.

In MultiCluster,

blimits returns the information about all limits in the local

cluster.

Limit names and policies are set up by the LSF administrator. See

lsb.resources(5) for more information.

Options

-c Displays all resource configurations in lsb.resources. This is the same as

bresources with no options.

-w Displays resource allocation limits information in a wide format. Fields are

displayed without truncation.

-n limit_name ... Displays resource allocation limits the specified named Limit sections. If a list of

limit sections is specified, Limit section names must be separated by spaces and enclosed in quotation marks (") or (’).

-m host_name | -m host_group | -m cluster_name ...

Displays resource allocation limits for the specified hosts. Do not use quotes when specifying multiple hosts.

To see the available hosts, use

For host groups:

◆ If the limits are configured with HOSTS, the name of the host group is

displayed.

bhosts.

◆ If the limits are configured with PER_HOST, the names of the hosts belonging

to the group are displayed instead of the name of the host group.

TIP: PER_HOST slot limits are not displayed. The bhosts command displays these as MXJ limits.

For a list of host groups see bmgroup(1).

In MultiCluster, if a cluster name is specified, displays resource allocation limits in the specified cluster.

-P project_name ... Displays resource allocation limits for the specified projects.

If a list of projects is specified, project names must be separated by spaces and enclosed in quotation marks (") or (’).

-q queue_name ... Displays resource allocation limits for the specified queues.

The command

bqueues returns a list of queues configured in the system, and

information about the configurations of these queues.

In MultiCluster, you cannot specify remote queues.

-u user_name | -u user_group ...

Displays resource allocation limits for the specified users.

If a list of users is specified, user names must be separated by spaces and enclosed in quotation marks (") or (’). You can specify both user names and user IDs in the list of users.

90 Platform LSF Command Reference

If a user group is specified, displays the resource allocation limits that include that group in their configuration. For a list of user groups see

-h Prints command usage to stderr and exits.

-V Prints LSF release version to stderr and exits.

bugroup(1)).

Output

Configured limits and resource usage for built-in resources (slots, mem, tmp, and swp load indices, and running and suspended job limits) are displayed as INTERNAL RESOURCE LIMITS separately from custom external resources, which are shown as EXTERNAL RESOURCE LIMITS.

Resource Consumers

blimits displays the following fields for resource consumers:

NAME The name of the limit policy as specified by the Limit section NAME parameter.

USERS List of user names or user groups on which the displayed limits are enforced, as

specified by the Limit section parameters USERS or PER_USER.

User group names have a slash (/) added at the end of the group name. See

bugroup(1).

QUEUES The name of the queue to which the limits apply, as specified by the Limit section

parameters QUEUES or PER_QUEUES.

If the queue has been removed from the configuration, the queue name is displayed as

lost_and_found. Use bhist to get the original queue name. Jobs in the

lost_and_found queue remain pending until they are switched with the bswitch

command into another queue.

-w or -l.

HOSTS List of hosts and host groups on which the displayed limits are enforced, as specified

by the Limit section parameters HOSTS or PER_HOSTS.

Host group names have a slash (/) added at the end of the group name. See

bmgroup(1).

TIP: PER_HOST slot limits are not displayed. The bhosts command displays these as MXJ limits.

PROJECTS List of project names on which limits are enforced., as specified by the Limit section

parameters PROJECTS or PER_PROJECT.

Resource Limits

blimits displays resource allocation limits for the following resources:

SLOTS Number of slots currently used and maximum number of slots configured for the

limit policy, as specified by the Limit section SLOTS parameter.

Platform LSF Command Reference 91

Example

MEM Amount of memory currently used and maximum configured for the limit policy,

as specified by the Limit section MEM parameter.

TMP Amount of tmp space currently used and maximum amount of tmp space

configured for the limit policy, as specified by the Limit section TMP parameter.

SWP Amount of swap space currently used and maximum amount of swap space

configured for the limit policy, as specified by the Limit section SWP parameter.

JOBS Number of currently running and suspended jobs and the maximum number of

jobs configured for the limit policy, as specified by the Limit section JOBS parameter.

Example

The following command displays limit configuration and dynamic usage information for project

blimits -P proj1

INTERNAL RESOURCE LIMITS:

NAME USERS QUEUES HOSTS PROJECTS SLOTS MEM TMP SWP JOBS limit1 user1 - hostA proj1 2/6 - - - NONAME022 - - hostB proj1 proj2 1/3 - - - -

proj1:

EXTERNAL RESOURCE LIMITS:

NAME USERS QUEUES HOSTS PROJECTS tmp1 limit1 user1 - hostA proj1 1/1

blinfo

Synopsis

Description

displays static License Scheduler configuration information

blinfo -Lp | -p | -D | -G | -P

blinfo [-a [-t token_name | "token_name ..."]] [-o alpha | total]

[-g "feature_group ..."]

blinfo -A [-t token_name | "token_name ..."] [-o alpha | total ]

[-g "feature_group ..."]

blinfo -C [-t token_name | "token_name ..."] [-o alpha | total]

[-g "feature_group ..."]

blinfo [-t token_name | "token_name ..."] [-o alpha | total]

[-g "feature_group ..."]

blinfo [ -h | -V ]

Displays different license configuration information, depending on the option selected.

By default, displays information about the distribution of licenses managed by License Scheduler.

Options

-A When LOCAL_TO is configured for a feature in lsf.licensescheduler, shows

the feature allocation by cluster locality.

You can optionally provide license token names.

-a Shows all information, including information about non-shared licenses

(NON_SHARED_DISTRIBUTION) and workload distribution (WORKLOAD_DISTRIBUTION).

You can optionally provide license token names.

blinfo -a does not display NON_SHARED information for hierarchical project

group scheduling policies. Use

-C When LOCAL_TO is configured for a feature in lsf.licensescheduler, shows

the cluster locality information for the features.

You can optionally provide license token names.

-D Lists the License Scheduler service domains and the corresponding FLEXnet

license server hosts.

-G Lists the hierarchical configuration information.

If PRIORITY is defined in the this option also shows the priorities of each project.

blinfo -G to see hierarchical group configuration.

ProjectGroup Section of lsf.licensescheduler,

Platform LSF Command Reference 93

Output

-g feature_group ...

When FEATURE_GROUP is configured for a group of license features in

lsf.licensescheduler, shows only information about the features configured in

the FEATURE_LIST of specified feature groups. You can specify more than one feature group at one time.

When you specify feature names with

-t, features in the feature list defined by -t

and feature groups are both displayed.

Feature groups listed with

-g but not defined in lsf.licensescheduler are

ignored.

-Lp Lists the active projects managed by License Scheduler.

-Lp only displays projects associated with configured features.

If PRIORITY is defined in the

Projects Section of lsf.licensescheduler, this

option also lists the priorities of each project.

-o alpha | total Sorts license feature information alphabetically, by total licenses, or by available

licenses.

◆ alpha: Features are listed in descending alphabetical order.

◆ total: Features are sorted by the descending order of the sum of licenses that are

allocated to LSF workload from all the service domains configured to supply licenses to the feature. Licenses borrowed by non-LSF workload are not included in this amount.

-P When LS_FEATURE_PERCENTAGE=Y, lists the license ownership in percentage.

-p Displays values of lsf.licensescheduler configuration parameters and

lsf.conf parameters related to License Scheduler. This is useful for

troubleshooting.

-t token_name |"token_name ..."

Only shows information about specified license tokens. Use spaces to separate multiple names, and enclose them in quotation marks.

-h Prints command usage to stderr and exits.

-V Prints the License Scheduler release version to stderr and exits.

Output

Default output

Displays the following fields:

FEATURE The license name. This becomes the license token name.

When LOCAL_TO is configured for a feature in shows the cluster locality information for the license features.

SERVICE_DOMAIN The name of the service domain that provided the license.

TOTAL The total number of licenses managed by FLEXnet. This number comes from

FLEXnet.

lsf.licensescheduler, blinfo

94 Platform LSF Command Reference

DISTRIBUTION The distribution of the licenses among license projects in the format [project_name,

percentage[ project is entitled to use when there is competition for licenses. The percentage is calculated from the share specified in the configuration file.

/number_licenses_owned]]. This determines how many licenses a

Allocation output (-A)

FEATURE The license name. This becomes the license token name.

When LOCAL_TO is configured for a feature in shows the cluster locality information for the license features.

PROJECT The License Scheduler project name.

ALLOCATION

The percentage of shares assigned to each cluster for a feature and a project.

All output (-a)

Same as Default Output with NON_SHARED_DISTRIBUTION.

NON-SHARED_DISTRIBUTION

This column is displayed directly under DISTRIBUTION with the -a option. If there are non-shared licenses, then the non-shared license information is output in the following format: [project_name, number_licenses_non_shared]

If there are no non-shared licenses, then the following license information is output

- (dash)

Cluster locality output (-C)

NAME The license feature token name.

When LOCAL_TO is configured for a feature in shows the cluster locality information for the license features.

lsf.licensescheduler, blinfo

FLEX_NAME The actual FLEXnet feature name—the name used by FLEXnet to identify the type

of license. May be different from the License Scheduler token name if a different FLEX_NAME is specified in

lsf.licensescheduler.

CLUSTER_NAME The name of the cluster the feature is assigned to.

FEATURE The license feature name. This becomes the license token name.

When LOCAL_TO is configured for a feature in shows the cluster locality information for the license features.

lsf.licensescheduler, blinfo

SERVICE_DOMAIN The service domain name.

Service Domain Output (-D)

SERVICE_DOMAIN The service domain name.

Platform LSF Command Reference 95

Output

LIC_SERVERS Names of FLEXnet license server hosts that belong the to service domain. Each host

name is enclosed in parentheses, as shown:

(port_number@host_name)

Redundant hosts (that share the same FLEXnet license file) are grouped together as shown:

(port_number@host_name port_number@host_name port_number@host_name)

Hierarchical Output (-G)

The following fields describe the values of their corresponding configuration fields in the

ProjectGroup Section of lsf.licensescheduler.

GROUP The project names in the hierarchical grouping and its relationships. Each entry

specifies the name of the hierarchical group and its members. The entry is enclosed in parentheses as shown:

(group (member ...))

SHARES The shares assigned to the hierarchical group member projects.

OWNERSHIP The number of licenses that each project owns.

LIMITS The maximum number of licenses that the hierarchical group member project can

use at any one time.

NON_SHARED The number of licenses that the hierarchical group member projects use exclusively.

PRIORITY The priority of the project if it is different from the default behavior. A larger

number indicates a higher priority.

DESCRIPTION The description of the project group.

Project Output (-Lp)

List of active License Scheduler projects.

-Lp only displays projects associated with configured features.

PROJECT The project name.

PRIORITY The priority of the project if it is different from the default behavior. A larger

number indicates a higher priority.

DESCRIPTION The description of the project.

Parameters Output (-p)

ADMIN The License Scheduler administrator. Defined in lsf.licensescheduler.

DISTRIBUTION_POLICY_VIOLATION_ACTION

This parameter includes

◆ The interval (a multiple of LM_STAT_INVERVAL periods) at which License

Scheduler checks for distribution policy violations, and

96 Platform LSF Command Reference

◆ The directory path and command that License Scheduler runs when reporting

a violation

Defined in

lsf.licensescheduler.

EXT_FILTER_PORT TCP listening port used by all external plug-ins to communicate with License

Scheduler hosts. Defined in

lsf.licensescheduler.

FLX_LICENSE_FILE Path to the file that contains the license keys FLEXnet.Ext.Filter and

FLEXnet.Usage.Snapshot to enable the FLEXnet APIs. Defined in

lsf.licensescheduler.

HOSTS License Scheduler candidate hosts. Defined in lsf.licensescheduler.

LM_REMOVE_INTERVAL

Minimum time a job must have a license checked out before lmremove can remove the license. Defined in

lsf.licensescheduler.

LM_STAT_INTERVAL Time interval between calls that License Scheduler makes to collect license usage

information from FLEXnet license management. Defined in

lsf.licensescheduler.

LS_MAX_TASKMAN_SESSIONS

Maximum number of taskman jobs that run simultaneously. Defined in

lsf.licensescheduler.

LSF_LIC_SCHED_HOSTS

List of hosts that are candidate LSF License Scheduler hosts. Defined in lsf.conf.

LSF_LIC_SCHED_PREEMPT_REQUEUE

Specifies whether to requeue or suspend a job whose license is preempted by LSF License Scheduler. Defined in

lsf.conf.

LSF_LIC_SCHED_PREEMPT_SLOT_RELEASE

Specifies whether to release the slot of a job that is suspended when its license is preempted by LSF License Scheduler. Defined in

lsf.conf.

LSF_LIC_SCHED_PREEMPT_STOP

Specifies whether to use job controls to stop a job that is preempted. Defined in

lsf.conf.

LSF_LICENSE_FILE Location of the LSF license file, which includes License Scheduler keys. Defined in

lsf.conf.

PORT TCP listening port used by License Scheduler. Defined in lsf.licensescheduler.

Platform LSF Command Reference 97

Examples

blinfo -a displays both NON_SHARED_DISTRIBUTION and

WORKLOAD_DISTRIBUTION information:

blinfo -a

FEATURE SERVICE_DOMAIN TOTAL DISTRIBUTION g1 LS 3 [p1, 50.0%] [p2, 50.0% / 2] NON_SHARED_DISTRIBUTION [p2, 2] WORKLOAD_DISTRIBUTION [LSF 66.7%, NON_LSF 33.3%]

blinfo -a

NON_SHARED_DISTRIBUTION is not defined:

blinfo -a

FEATURE SERVICE_DOMAIN TOTAL DISTRIBUTION g1 LS 0 [p1, 50.0%] [p2, 50.0%] WORKLOAD_DISTRIBUTION [LSF 66.7%, NON_LSF 33.3%] g2 LS 0 [p1, 50.0%] [p2, 50.0%] g33 WS 0 [p1, 50.0%] [p2, 50.0%]

blinfo -a

WORKLOAD_DISTRIBUTION is not defined:

blinfo -a

FEATURE SERVICE_DOMAIN TOTAL DISTRIBUTION g1 LS 3 [p1, 50.0%] [p2, 50.0% / 2] NON_SHARED_DISTRIBUTION [p2, 2]

does not display NON_SHARED_DISTRIBUTION, if the

does not display WORKLOAD_DISTRIBUTION, if the

Files

blkill

Synopsis

Description

Options

task_ID Task ID of the task you want to kill.

-t seconds Specify how many seconds to delay before killing the task. A value of 0 means to kill

terminates an interactive License Scheduler task

blkill [-t seconds] task_ID

blkill [-h | -V]

Terminates a running or waiting interactive task in License Scheduler.

Users can kill their own tasks. You must be a License Scheduler administrator to terminate another user’s task.

By default,

the task immediately (do not give the user any time to save work).

-h Prints command usage to stderr and exits.

blkill notifies the user and waits 30 seconds before killing the task.

-V Prints License Scheduler release version to stderr and exits.

Platform LSF Command Reference 99

blparams

displays information about configurable License Scheduler parameters defined in the files

Synopsis

blparams [-h | -V]

Description

Displays the following parameter values:

ADMIN

The License Scheduler administrator. Defined in lsf.licensescheduler.

DISTRIBUTION_POLICY_VIOLATION_ACTION

This parameter includes

◆ The interval (a multiple of LM_STAT_INVERVAL periods) at which License

lsf.licensescheduler and lsf.conf

Scheduler checks for distribution policy violations, and

◆ The directory path and command that License Scheduler runs when reporting

a violation

Defined in

EXT_FILTER_PORT

TCP listening port used by all external plugins to communicate with License Scheduler hosts. Defined in

FLX_LICENSE_FILE

Path to the file that contains the license keys FLEXnet.Ext.Filter and FLEXnet.Usage.Snapshot to enable the FLEXnet APIs. Defined in

lsf.licensescheduler.

HOSTS

License Scheduler candidate hosts. Defined in lsf.licensescheduler.

LM_REMOVE_INTERVAL

Minimum time a job must have a license checked out before lmremove can remove the license. Defined in

LM_STAT_INTERVAL

Time interval between calls that License Scheduler makes to collect license usage information from FLEXnet license management. Defined in

lsf.licensescheduler.

LS_DEBUG_BLD

Sets the debugging log class for the LSF License Schedulerbld daemon. Defined in

lsf.licensescheduler.

100 Platform LSF Command Reference

HP Platform LSF Command Reference Guide

Specifications and Main Features

Frequently Asked Questions

User Manual

bacct

Synopsis

Description

Throughput calculation

Options

Default output format (SUMMARY)

Brief format (-b)

Long format (-l)

Advance Reservations (-U)

Termination reasons displayed by bacct

Example: Default format

Example: Jobs that have triggered job exceptions

Example: Advance reservation accounting information

Example: LSF Job termination reason logging

Files

See also

bapp

Synopsis

Description

Options

Default output format

Long output format(-l)

See also

badmin

Synopsis

Description

Subcommand synopsis

Options

Usage

See also

bbot

Synopsis

Description

Options

See also

bchkpnt

Synopsis

Description

Options

Examples

See also

bclusters

Synopsis

Description

Options

Output

Files

See also

bgadd

Synopsis

Description

Options

Examples

See also

bgdel

Synopsis

Description

Options

Example

See also

bhist

Synopsis

Description

Options

Output

Files

See also

Time Interval Format

bhosts

Synopsis

Description

Options

Output

Files

See also

bhpart