Although the information in this document has been carefully reviewed, Platform Computing Inc.
(“Platform”) does not warrant it to be free of errors or omissions. Platform reserves the right to make
corrections, updates, revisions or changes to the information in this document.
UNLESS OTHERWISE EXPRESSLY STATED BY PLATFORM, THE PROGRAM DESCRIBED IN
THIS DOCUMENT IS PROVIDED “AS IS” AND WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO
EVENT WILL PLATFORM COMPUTING BE LIABLE TO ANYONE FOR SPECIAL,
COLLATERAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING WITHOUT
LIMITATION ANY LOST PROFITS, DATA, OR SAVINGS, ARISING OUT OF THE USE OF OR
INABILITY TO USE THIS PROGRAM.
We’d like to hear from you
You can help us make this document better by telling us what you think of the content, organization,
and usefulness of the information. If you find an error, or just want to make a suggestion for improving
this document, please address your comments to doc@platform.com.
Your comments should pertain only to Platform documentation. For product support, contact
support@platform.com.
Document redistribution and translation
This document is protected by copyright and you may not redistribute or translate it into another
language, in part or in whole.
Internal redistribution
You may only redistribute this document internally within your organization (for example, on an
intranet) provided that you continue to check the Platform Web site for updates and update your
version of the documentation. You may not make it available to your organization over the Internet.
Trademarks
LSF is a registered trademark of Platform Computing Inc. in the United States and in other
jurisdictions.
ACCELERATING INTELLIGENCE, PLATFORM COMPUTING, PLATFORM SYMPHONY,
PLATFORM JOBSCHEDULER, PLATFORM ENTERPRISE GRID ORCHESTRATOR, PLATFORM
EGO, and the PLATFORM and PLATFORM LSF logos are trademarks of Platform Computing Inc. in
the United States and in other jurisdictions.
UNIX is a registered trademark of The Open Group in the United States and in other jurisdictions.
Microsoft is either a registered trademark or a trademark of Microsoft Corporation in the United
States and/or other countries.
Windows is a registered trademark of Microsoft Corporation in the United States and other countries.
Other products or services mentioned in this document are identified by the trademarks or service
marks of their respective owners.
Third-party license agreements
Third-party copyright notices
Displays a summary of accounting statistics for all finished jobs (with a DONE or
EXIT status) submitted by the user who invoked the command, on all hosts,
projects, and queues in the LSF system. bacct displays statistics for all jobs logged
in the current LSF accounting log file:
LSB_SHAREDIR/cluster_name/logdir/lsb.acct.
CPU time is not normalized.
All times are in seconds.
Statistics not reported by bacct but of interest to individual system administrators
can be generated by directly using awk or perl to process the lsb.acct file.
Throughput calculation
The throughput (T) of the LSF system, certain hosts, or certain queues is calculated
by the formula:
T = N/(ET - BT)
where:
◆N is the total number of jobs for which accounting statistics are reported
◆BT is the Start time, when the first job was logged
◆ET is the End time, when the last job was logged
You can use the option -C time0,time1 to specify the Start time as time0 and the
End time as time1. In this way, you can examine throughput during a specific time
period.
Jobs involved in the throughput calculation are only those being logged (that is,
with a DONE or EXIT status). Jobs that are running, suspended, or that have never
been dispatched after submission are not considered, because they are still in the
LSF system and not logged in lsb.acct.
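The formula can be checked numerically. In the sketch below, the job count and log timestamps are hypothetical, and the result is converted to jobs/hour, the unit bacct reports:

```shell
# Hypothetical values: 4 logged jobs (DONE or EXIT), first logged at
# epoch second 1000 (BT), last logged at epoch second 8200 (ET).
N=4
BT=1000
ET=8200
# T = N / (ET - BT); divide the denominator by 3600 to get jobs/hour
T=$(awk -v n="$N" -v bt="$BT" -v et="$ET" \
      'BEGIN { printf "%.2f", n / ((et - bt) / 3600) }')
echo "$T jobs/hour"   # 2.00 jobs/hour
```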
Platform LSF Command Reference 7
The total throughput of the LSF system can be calculated by specifying -u all
without any of the -m, -q, -S, -D or job_ID options. The throughput of certain hosts
can be calculated by specifying -u all without the -q, -S, -D or job_ID options.
The throughput of certain queues can be calculated by specifying -u all without
the -m, -S, -D or job_ID options.
bacct does not show local pending batch jobs killed using bkill -b. bacct shows
MultiCluster jobs and local running jobs even if they are killed using bkill -b.
Options
-b Brief format.
-d Displays accounting statistics for successfully completed jobs (with a DONE
status).
-e Displays accounting statistics for exited jobs (with an EXIT status).
-l Long format with additional detail.
-w Wide field format.
-x Displays jobs that have triggered a job exception (overrun, underrun, idle). Use
with the
-l option to show the exception status for individual jobs.
-app application_profile_name
Displays accounting information about jobs submitted to the specified application
profile. You must specify an existing application profile configured in
lsb.applications.
-C time0,time1 Displays accounting statistics for jobs that completed or exited during the specified
time interval. Reads lsb.acct and all archived log files (lsb.acct.n) unless -f is
also used.
The time format is the same as in bhist(1).
-D time0,time1 Displays accounting statistics for jobs dispatched during the specified time interval.
Reads lsb.acct and all archived log files (lsb.acct.n) unless -f is also used.
The time format is the same as in bhist(1).
-f logfile_name Searches the specified job log file for accounting statistics. Specify either an absolute
or relative path. Useful for offline analysis.
The specified file path can contain up to 4094 characters for UNIX, or up to 255
characters for Windows.
-Lp ls_project_name ... Displays accounting statistics for jobs belonging to the specified License Scheduler
projects. If a list of projects is specified, project names must be separated by spaces
and enclosed in quotation marks (") or (’).
-M host_list_file Displays accounting statistics for jobs dispatched to the hosts listed in a file
(host_list_file) containing a list of hosts. The host list file has the following format:
◆Multiple lines are supported
◆Each line includes a list of hosts separated by spaces
◆The length of each line must be less than 512 characters
-m host_name ...
Displays accounting statistics for jobs dispatched to the specified hosts.
If a list of hosts is specified, host names must be separated by spaces and enclosed
in quotation marks (") or (’).
-N host_name | -N host_model | -N cpu_factor
Normalizes CPU time by the CPU factor of the specified host or host model, or by
the specified CPU factor. If you use bacct offline by indicating a job log file, you
must specify a CPU factor.
-P project_name ... Displays accounting statistics for jobs belonging to the specified projects. If a list of
projects is specified, project names must be separated by spaces and enclosed in
quotation marks (") or (’).
-q queue_name ... Displays accounting statistics for jobs submitted to the specified queues.
If a list of queues is specified, queue names must be separated by spaces and
enclosed in quotation marks (") or (’).
-S time0,time1 Displays accounting statistics for jobs submitted during the specified time interval.
Reads lsb.acct and all archived log files (lsb.acct.n) unless -f is also used.
The time format is the same as in bhist(1).
-sla service_class_name
Displays accounting statistics for jobs that ran under the specified service class.
If a default system service class is configured with ENABLE_DEFAULT_EGO_SLA
in lsb.params but not explicitly configured in lsb.applications,
bacct -sla service_class_name displays accounting information for the specified
default service class.
-U reservation_id ... | -U all
Displays accounting statistics for the specified advance reservation IDs, or for all
reservation IDs if the keyword
all is specified.
A list of reservation IDs must be separated by spaces and enclosed in quotation
marks (") or (’).
The
-U option also displays historical information about reservation modifications.
When combined with the
-U option, -u is interpreted as the user name of the
reservation creator. For example:
bacct -U all -u user2
shows all the advance reservations created by user user2.
Without the
-u option, bacct -U shows all advance reservation information about
jobs submitted by the user.
In a MultiCluster environment, advance reservation information is only logged in
the execution cluster, so
bacct displays advance reservation information for local
reservations only. You cannot see information about remote reservations. You
cannot specify a remote reservation ID, and the keyword
all only displays
information about reservations in the local cluster.
-u user_name ... | -u all Displays accounting statistics for jobs submitted by the specified users, or by all
users if the keyword all is specified.
If a list of users is specified, user names must be separated by spaces and enclosed
in quotation marks (") or (’). You can specify both user names and user IDs in the
list of users.
job_ID ... Displays accounting statistics for jobs with the specified job IDs.
If the reserved job ID 0 is used, it is ignored.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Default output format (SUMMARY)
Statistics on jobs. The following fields are displayed:
◆Total number of done jobs
◆Total number of exited jobs
◆Total CPU time consumed
◆Average CPU time consumed
◆Maximum CPU time of a job
◆Minimum CPU time of a job
◆Total wait time in queues
◆Average wait time in queue
◆Maximum wait time in queue
◆Minimum wait time in queue
◆Average turnaround time (seconds/job)
◆Maximum turnaround time
◆Minimum turnaround time
◆Average hog factor of a job (cpu time/turnaround time)
◆Maximum hog factor of a job
◆Minimum hog factor of a job
◆Total throughput
◆Beginning time: the completion or exit time of the first job selected
◆Ending time: the completion or exit time of the last job selected
The total, average, minimum, and maximum statistics are on all specified jobs.
The wait time is the elapsed time from job submission to job dispatch.
The turnaround time is the elapsed time from job submission to job completion.
The hog factor is the amount of CPU time consumed by a job divided by its
turnaround time.
The throughput is the number of completed jobs divided by the time period to
finish these jobs (jobs/hour).
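These definitions can be sketched with invented numbers. Each line below is one hypothetical job giving CPU seconds, submission, dispatch, and completion times; wait, turnaround, and hog factor follow directly from the definitions above:

```shell
# Three hypothetical jobs, fields: cpu_seconds submit dispatch end (epoch s).
stats=$(printf '%s\n' "10 0 5 105" "40 0 20 220" "5 100 110 150" |
  awk '{
    wait = $3 - $2              # wait time: submission to dispatch
    turn = $4 - $2              # turnaround: submission to completion
    printf "job %d: wait=%d turnaround=%d hog=%.2f\n", NR, wait, turn, $1 / turn
    tw += wait; tt += turn
  }
  END { printf "avg wait=%.1f avg turnaround=%.1f\n", tw / NR, tt / NR }')
echo "$stats"
```

For these invented jobs the averages printed are a wait of 11.7 seconds and a turnaround of 125.0 seconds.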
Brief format (-b)
In addition to the default format SUMMARY, displays the following fields:
U/UID Name of the user who submitted the job. If LSF fails to get the user name by
getpwuid(3), the user ID is displayed.
QUEUE Queue to which the job was submitted.
SUBMIT_TIME Time when the job was submitted.
CPU_T CPU time consumed by the job.
WAIT Wait time of the job.
TURNAROUND Turnaround time of the job.
FROM Host from which the job was submitted.
EXEC_ON Host or hosts to which the job was dispatched to run.
JOB_NAME The job name assigned by the user, or the command string assigned by default at
job submission with bsub. If the job name is too long to fit in this field, then only
the latter part of the job name is displayed.
The displayed job name or job command can contain up to 4094 characters for
UNIX, or up to 255 characters for Windows.
Long format (-l)
In addition to the fields displayed by default in SUMMARY and by -b, displays the
following fields:
JOBID Identifier that LSF assigned to the job.
PROJECT_NAME Project name assigned to the job.
STATUS Status that indicates the job was either successfully completed (DONE) or exited
(EXIT).
DISPAT_TIME Time when the job was dispatched to run on the execution hosts.
COMPL_TIME Time when the job exited or completed.
HOG_FACTOR Average hog factor, equal to "CPU time" / "turnaround time".
MEM Maximum resident memory usage of all processes in a job. By default, memory
usage is shown in MB. Use LSF_UNIT_FOR_LIMITS in
lsf.conf to specify a
larger unit for display (MB, GB, TB, PB, or EB).
CWD Current working directory of the job.
SWAP Maximum virtual memory usage of all processes in a job. By default, swap space is
shown in MB. Use LSF_UNIT_FOR_LIMITS in
lsf.conf to specify a larger unit
for display (MB, GB, TB, PB, or EB).
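The display unit is controlled by a single cluster-wide parameter. For example, to report MEM and SWAP in gigabytes, lsf.conf would contain (a configuration sketch):

```conf
LSF_UNIT_FOR_LIMITS=GB
```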
INPUT_FILE File from which the job reads its standard input (see bsub -i input_file).
OUTPUT_FILE File to which the job writes its standard output (see bsub -o output_file).
ERR_FILE File in which the job stores its standard error output (see bsub -e err_file).
EXCEPTION STATUS Possible values for the exception status of a job include:
idle
The job is consuming less CPU time than expected. The job idle factor
(CPU time/runtime) is less than the configured JOB_IDLE threshold for the queue
and a job exception has been triggered.
overrun
The job is running longer than the number of minutes specified by the
JOB_OVERRUN threshold for the queue and a job exception has been triggered.
underrun
The job finished sooner than the number of minutes specified by the
JOB_UNDERRUN threshold for the queue and a job exception has been triggered.
Advance Reservations (-U)
Displays the following fields:
RSVID Advance reservation ID assigned by brsvadd command
TYPE Type of reservation: user or system
CREATOR User name of the advance reservation creator, who submitted the brsvadd
command
USER User name of the advance reservation user, who submitted the job with bsub -U
NCPUS Number of CPUs reserved
RSV_HOSTS List of hosts for which processors are reserved, and the number of processors
reserved
TIME_WINDOW Time window for the reservation.
◆A one-time reservation displays fields separated by slashes
(month/day/hour/minute). For example:
11/12/14/0-11/12/18/0
◆A recurring reservation displays fields separated by colons
(day:hour:minute). For example:
5:18:0 5:20:0
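A script can split a one-time window such as the sample above into its start and end fields; the window string is copied from the example, and the parsing is a sketch:

```shell
# Split "start-end", then month/day/hour/minute, from a one-time window.
win="11/12/14/0-11/12/18/0"
parsed=$(echo "$win" | awk -F'-' '{
  split($1, s, "/"); split($2, e, "/")
  printf "start: month=%s day=%s hour=%s min=%s; ", s[1], s[2], s[3], s[4]
  printf "end: month=%s day=%s hour=%s min=%s", e[1], e[2], e[3], e[4]
}')
echo "$parsed"
```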
Termination reasons displayed by bacct
When LSF detects that a job is terminated, bacct -l displays one of the following
termination reasons. The corresponding integer value logged to the JOB_FINISH
record in lsb.acct is given in parentheses.
◆TERM_ADMIN: Job killed by root or LSF administrator (15)
◆TERM_BUCKET_KILL: Job killed with bkill -b (23)
◆TERM_CHKPNT: Job killed after checkpointing (13)
◆TERM_CWD_NOTEXIST: Current working directory is not accessible or does
not exist on the execution host (25)
◆TERM_CPULIMIT: Job killed after reaching LSF CPU usage limit (12)
◆TERM_DEADLINE: Job killed after deadline expires (6)
◆TERM_EXTERNAL_SIGNAL: Job killed by a signal external to LSF (17)
◆TERM_FORCE_ADMIN: Job killed by root or LSF administrator without time
for cleanup (9)
◆TERM_FORCE_OWNER: Job killed by owner without time for cleanup (8)
◆TERM_LOAD: Job killed after load exceeds threshold (3)
◆TERM_MEMLIMIT: Job killed after reaching LSF memory usage limit (16)
◆TERM_OWNER: Job killed by owner (14)
◆TERM_PREEMPT: Job killed after preemption (1)
◆TERM_PROCESSLIMIT: Job killed after reaching LSF process limit (7)
◆TERM_REQUEUE_ADMIN: Job killed and requeued by root or LSF
administrator (11)
◆TERM_REQUEUE_OWNER: Job killed and requeued by owner (10)
◆TERM_RUNLIMIT: Job killed after reaching LSF run time limit (5)
◆TERM_SLURM: Job terminated abnormally in SLURM (node failure) (22)
◆TERM_SWAP: Job killed after reaching LSF swap usage limit (20)
◆TERM_THREADLIMIT: Job killed after reaching LSF thread limit (21)
◆TERM_UNKNOWN: LSF cannot determine a termination reason—0 is logged
but TERM_UNKNOWN is not displayed (0)
◆TERM_WINDOW: Job killed after queue run window closed (2)
◆TERM_ZOMBIE: Job exited while LSF is not available (19)
TIP: The integer value shown in parentheses after each termination reason is the value logged
to the JOB_FINISH record in lsb.acct.
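For scripts that post-process lsb.acct directly (for example with awk or perl, as noted earlier), the parenthesized integers can be mapped back to their TERM_* names. The sketch below covers only a few of the codes listed above:

```shell
# Map a few JOB_FINISH termination-code integers (from the list above)
# to their TERM_* names; a sketch for lsb.acct post-processing scripts.
term_name() {
  case "$1" in
    0)  echo TERM_UNKNOWN ;;
    5)  echo TERM_RUNLIMIT ;;
    9)  echo TERM_FORCE_ADMIN ;;
    14) echo TERM_OWNER ;;
    16) echo TERM_MEMLIMIT ;;
    *)  echo "unlisted ($1)" ;;
  esac
}
term_name 16   # TERM_MEMLIMIT
```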
SUMMARY: ( time unit: second )
Total number of done jobs: 60 Total number of exited jobs: 118
Total CPU time consumed: 1011.5 Average CPU time consumed: 5.7
Maximum CPU time of a job: 991.4 Minimum CPU time of a job: 0.0
Total wait time in queues: 134598.0
Average wait time in queue: 756.2
Maximum wait time in queue: 7069.0 Minimum wait time in queue: 0.0
Average turnaround time: 3585 (seconds/job)
Maximum turnaround time: 77524 Minimum turnaround time: 6
Average hog factor of a job: 0.00 ( cpu time / turnaround time )
Maximum hog factor of a job: 0.56 Minimum hog factor of a job: 0.00
Total throughput: 0.67 (jobs/hour) during 266.18 hours
Beginning time: Aug 8 15:48 Ending time: Aug 19 17:59
SUMMARY: ( time unit: second )
Total number of done jobs: 45 Total number of exited jobs: 56
Total CPU time consumed: 1009.1 Average CPU time consumed: 10.0
Maximum CPU time of a job: 991.4 Minimum CPU time of a job: 0.1
Total wait time in queues: 116864.0
Average wait time in queue: 1157.1
Maximum wait time in queue: 7069.0 Minimum wait time in queue: 7.0
Average turnaround time: 1317 (seconds/job)
Maximum turnaround time: 7070 Minimum turnaround time: 10
Average hog factor of a job: 0.01 ( cpu time / turnaround time )
Maximum hog factor of a job: 0.56 Minimum hog factor of a job: 0.00
Total throughput: 0.59 (jobs/hour) during 170.21 hours
Beginning time: Aug 11 18:18 Ending time: Aug 18 20:31
Example: Advance reservation accounting information
bacct -U user1#2
Accounting for:
- advanced reservation IDs: user1#2
- advanced reservations created by user1
----------------------------------------------------------------------------
RSVID TYPE CREATOR USER NCPUS RSV_HOSTS TIME_WINDOW
user1#2 user user1 user1 1 hostA:1 9/16/17/36-9/16/17/38
SUMMARY:
Total number of jobs: 4
Total CPU time consumed: 0.5 second
Maximum memory of a job: 4.2 MB
Maximum swap of a job: 5.2 MB
Total duration time: 0 hour 2 minute 0 second
Example: LSF Job termination reason logging
When a job finishes, LSF reports the last job termination action it took against the
job and logs it into lsb.acct.
If a running job exits because of node failure, LSF sets the correct exit information
in lsb.acct, lsb.events, and the job output file.
Use bacct -l to view job exit information logged to lsb.acct:
bacct -l 7265
Displays information about application profile configuration.
bapp [-l | -w] [application_profile_name ...]
bapp [-h | -V]
Displays information about application profiles configured in lsb.applications.
Returns application name, job slot statistics, and job state statistics for all
application profiles.
In MultiCluster, returns the information about all application profiles in the local
cluster.
CPU time is normalized.
-w Wide format. Fields are displayed without truncation.
-l Long format with additional information.
Displays the following additional information: application profile description,
application profile characteristics and statistics, parameters, resource usage limits,
associated commands, and job controls.
application_profile_name ...
Displays information about the specified application profile.
-h Prints command usage to stderr and exits.
-V Prints product release version to stderr and exits.
Default output format
Displays the following fields:
APPLICATION_NAME
The name of the application profile. Application profiles are named to correspond
to the type of application that usually runs within them.
NJOBS The total number of job slots held currently by jobs in the application profile. This
includes pending, running, suspended and reserved job slots. A parallel job that is
running on n processors is counted as n job slots, since it takes n job slots in the
application.
PEND The number of job slots used by pending jobs in the application profile.
RUN The number of job slots used by running jobs in the application profile.
SUSP The number of job slots used by suspended jobs in the application profile.
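As an arithmetic sketch of the NJOBS definition above (the slot counts are hypothetical, and the suspended slots are broken out into SSUSP and USUSP as in the -l display):

```shell
# Hypothetical slot counts per state in one application profile.
PEND=3; RUN=5; SSUSP=1; USUSP=0; RSV=2
# NJOBS aggregates pending, running, suspended, and reserved slots;
# a parallel job on n processors contributes n slots to these counts.
NJOBS=$((PEND + RUN + SSUSP + USUSP + RSV))
echo "NJOBS=$NJOBS"   # NJOBS=11
```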
Long output format (-l)
In addition to the above fields, the -l option displays the following:
Description A description of the typical use of the application profile.
PARAMETERS/STATISTICS
SSUSP
The number of job slots in the application profile allocated to jobs that are
suspended by LSF because of load levels or run windows.
USUSP
The number of job slots in the application profile allocated to jobs that are
suspended by the job submitter or by the LSF administrator.
RSV
The number of job slots in the application profile that are reserved by LSF for
pending jobs.
Per-job resource usage limits
The soft resource usage limits that are imposed on the jobs associated with the
application profile. These limits are imposed on a per-job and a per-process basis.
The possible per-job limits are:
CPULIMIT
The maximum CPU time a job can use, in minutes, relative to the CPU factor of the
named host. CPULIMIT is scaled by the CPU factor of the execution host so that
jobs are allowed more time on slower hosts.
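To illustrate the scaling just described (the CPU factors and limit value below are hypothetical, and the normalization details are simplified): a host with half the CPU factor allows twice the wall CPU time for the same CPULIMIT.

```shell
# Hypothetical normalized CPULIMIT of 60 minutes; the effective limit
# scales inversely with the execution host's CPU factor (simplified).
limits=$(awk 'BEGIN {
  climit = 60                     # CPULIMIT in normalized minutes
  printf "fast(1.0)=%.0f min slow(0.5)=%.0f min", climit / 1.0, climit / 0.5
}')
echo "$limits"   # fast(1.0)=60 min slow(0.5)=120 min
```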
MEMLIMIT
The maximum running set size (RSS) of a process.
By default, the limit is shown in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to
specify a larger unit for display (MB, GB, TB, PB, or EB).
MEMLIMIT_TYPE
A memory limit is the maximum amount of memory a job is allowed to consume.
Jobs that exceed the level are killed. You can specify different types of memory
limits to enforce, based on PROCESS, TASK, or JOB (or any combination of the
three).
PROCESSLIMIT
The maximum number of concurrent processes allocated to a job.
PROCLIMIT
The maximum number of processors allocated to a job.
SWAPLIMIT
The swap space limit that a job may use.
By default, the limit is shown in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to
specify a larger unit for display (MB, GB, TB, PB, or EB).
THREADLIMIT
The maximum number of concurrent threads allocated to a job.
Per-process resource usage limits
The possible UNIX per-process resource limits are:
CORELIMIT
The maximum size of a core file.
By default, the limit is shown in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to
specify a larger unit for display (MB, GB, TB, PB, or EB).
DATALIMIT
The maximum size of the data segment of a process, in KB. This restricts the
amount of memory a process can allocate.
FILELIMIT
The maximum file size a process can create, in KB.
RUNLIMIT
The maximum wall clock time a process can use, in minutes. RUNLIMIT is scaled
by the CPU factor of the execution host.
STACKLIMIT
The maximum size of the stack segment of a process. This restricts the amount of
memory a process can use for local variables or recursive function calls.
By default, the limit is shown in KB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to
specify a larger unit for display (MB, GB, TB, PB, or EB).
CHKPNT_DIR The checkpoint directory, if automatic checkpointing is enabled for the application
profile.
CHKPNT_INITPERIOD
The initial checkpoint period in minutes. The periodic checkpoint does not happen
until the initial period has elapsed.
CHKPNT_PERIOD The checkpoint period in minutes. The running job is checkpointed automatically
every checkpoint period.
CHKPNT_METHOD The checkpoint method.
MIG The migration threshold in minutes. A value of 0 (zero) specifies that a suspended
job should be migrated immediately.
Where a host migration threshold is also specified, and is lower than the job value,
the host value is used.
PRE_EXEC The pre-execution command for the application profile. The PRE_EXEC command
runs on the execution host before the job associated with the application profile is
dispatched to the execution host (or to the first host selected for a parallel batch
job).
POST_EXEC The post-execution command for the application profile. The POST_EXEC
command runs on the execution host after the job finishes.
JOB_INCLUDE_POSTPROC
If JOB_INCLUDE_POSTPROC= Y, post-execution processing of the job is included as
part of the job.
JOB_POSTPROC_TIMEOUT
Timeout in minutes for job post-execution processing. If post-execution processing
takes longer than the timeout, sbatchd reports that post-execution has failed
(POST_ERR status), and kills the process group of the job’s post-execution
processes.
REQUEUE_EXIT_VALUES
Jobs that exit with these values are automatically requeued.
RES_REQ Resource requirements of the application profile. Only the hosts that satisfy these
resource requirements can be used by the application profile.
JOB_STARTER An executable file that runs immediately prior to the batch job, taking the batch job
file as an input argument. All jobs submitted to the application profile are run via
the job starter, which is generally used to create a specific execution environment
before processing the jobs themselves.
CHUNK_JOB_SIZE Chunk jobs only. Specifies the maximum number of jobs allowed to be dispatched
together in a chunk job. All of the jobs in the chunk are scheduled and dispatched
as a unit rather than individually.
RERUNNABLE If the RERUNNABLE field displays yes, jobs in the application profile are
automatically restarted or rerun if the execution host becomes unavailable.
However, a job in the application profile is not restarted if you use bmod to remove
the rerunnable option from the job.
RESUME_CONTROL The configured actions for the resume job control.
The configured actions are displayed in the format [action_type, command] where
action_type is RESUME.
SUSPEND_CONTROL
The configured actions for the suspend job control.
The configured actions are displayed in the format [action_type, command] where action_type is SUSPEND.
TERMINATE_CONTROL
The configured actions for terminate job control.
The configured actions are displayed in the format [action_type, command] where action_type is TERMINATE.
IMPORTANT: This command can only be used by LSF administrators.
badmin provides a set of subcommands to control and monitor LSF. If no
subcommands are supplied for badmin, badmin prompts for a subcommand from
standard input.
Information about each subcommand is available through the help command.
The badmin subcommands include privileged and non-privileged subcommands.
Privileged subcommands can only be invoked by root or LSF administrators.
Privileged subcommands are:
reconfig
mbdrestart
qopen
qclose
qact
qinact
hopen
hclose
hrestart
hshutdown
hstartup
diagnose
The configuration file lsf.sudoers(5) must be set to use the privileged command
hstartup by a non-root user.
All other commands are non-privileged commands and can be invoked by any LSF
user. If the privileged commands are to be executed by the LSF administrator,
badmin must be installed, because it needs to send the request using a privileged
port.
For subcommands for which multiple hosts can be specified, do not enclose the
host names in quotation marks.
subcommand Executes the specified subcommand. See Usage section.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Usage
ckconfig [-v] Checks LSF configuration files located in the
LSB_CONFDIR/cluster_name/configdir directory, and checks
LSF_ENVDIR/lsf.licensescheduler.
The LSB_CONFDIR variable is defined in lsf.conf (see lsf.conf(5)) which is in
LSF_ENVDIR or /etc (if LSF_ENVDIR is not defined).
By default, badmin ckconfig displays only the result of the configuration file
check. If warning errors are found, badmin prompts you to display detailed
messages.
-v
Verbose mode. Displays detailed messages about configuration file checking to
stderr.
diagnose [job_ID ... | "job_ID[index]" ...]
Displays full pending reason list if CONDENSE_PENDING_REASONS=Y is set in
lsb.params. For example:
badmin diagnose 1057
reconfig [-v] [-f] Dynamically reconfigures LSF without restarting mbatchd.
Configuration files are checked for errors and the results displayed to stderr. If no
errors are found in the configuration files, a reconfiguration request is sent to
mbatchd and configuration files are reloaded.
With this option, mbatchd and mbschd are not restarted and lsb.events is not
replayed. To restart mbatchd and mbschd, and replay lsb.events, use badmin
mbdrestart.
When you issue this command, mbatchd is available to service requests while
reconfiguration files are reloaded. Configuration changes made since system boot
or the last reconfiguration take effect.
If warning errors are found, badmin prompts you to display detailed messages. If
fatal errors are found, reconfiguration is not performed, and badmin exits.
If you add a host to a queue or to a host group, the new host is not recognized by
jobs that were submitted before you reconfigured. If you want the new host to be
recognized, you must use the command badmin mbdrestart.
Resource requirements determined by the queue no longer apply to a running job
after running badmin reconfig. For example, if you change the RES_REQ
parameter in a queue and reconfigure the cluster, the previous queue-level resource
requirements for running jobs are lost.
-v
Verbose mode. Displays detailed messages about the status of the configuration
files. Without this option, the default is to display the results of configuration file
checking. All messages from the configuration file check are printed to stderr.
-f
Disables interaction and proceeds with reconfiguration if configuration files
contain no fatal errors.
mbdrestart [-C comment] [-v] [-f]
Dynamically reconfigures LSF and restarts mbatchd and mbschd.
Configuration files are checked for errors and the results printed to stderr. If no
errors are found, configuration files are reloaded, mbatchd and mbschd are
restarted, and events in lsb.events are replayed to recover the running state of the
last mbatchd. While mbatchd restarts, it is unavailable to service requests.
If warning errors are found, badmin prompts you to display detailed messages. If
fatal errors are found, mbatchd and mbschd restart is not performed, and badmin
exits.
If lsb.events is large, or many jobs are running, restarting mbatchd can take
several minutes. If you only need to reload the configuration files, use badmin
reconfig.
-C comment
Logs the text of comment as an administrator comment record to lsb.events. The
maximum length of the comment string is 512 characters.
-v
Verbose mode. Displays detailed messages about the status of configuration files.
All messages from configuration checking are printed to stderr.
-f
Disables interaction and forces reconfiguration and mbatchd restart to proceed if
configuration files contain no fatal errors.
qopen [-C comment] [queue_name ... | all]
Opens specified queues, or all queues if the reserved word all is specified. If no
queue is specified, the system default queue is assumed. A queue can accept batch
jobs only if it is open.
-C comment
Logs the text of comment as an administrator comment record to lsb.events. The
maximum length of the comment string is 512 characters.
qclose [-C comment] [queue_name ... | all]
Closes specified queues, or all queues if the reserved word all is specified. If no
queue is specified, the system default queue is assumed. A queue does not accept
any job if it is closed.
-C comment
Logs the text as an administrator comment record to lsb.events. The maximum
length of the comment string is 512 characters.
qact [-C comment] [queue_name ... | all]
Activates specified queues, or all queues if the reserved word all is specified. If no
queue is specified, the system default queue is assumed. Jobs in a queue can be
dispatched if the queue is activated.
A queue inactivated by its run windows cannot be reactivated by this command.
-C comment
Logs the text of the comment as an administrator comment record to lsb.events.
The maximum length of the comment string is 512 characters.
qinact [-C comment] [queue_name ... | all]
Inactivates specified queues, or all queues if the reserved word all is specified. If
no queue is specified, the system default queue is assumed. No job in a queue can
be dispatched if the queue is inactivated.
-C comment
Logs the text as an administrator comment record to lsb.events. The maximum
length of the comment string is 512 characters.
qhist [-t time0,time1] [-f logfile_name] [queue_name ... | all]
Displays historical events for specified queues, or for all queues if no queue is specified. Queue events are queue opening, closing, activating, and inactivating.
-t time0,time1
Displays only those events that occurred during the period from time0 to time1. See
bhist(1) for the time format. The default is to display all queue events in the event
log file (see below).
-f logfile_name
Specify the file name of the event log file. Either an absolute or a relative path name
may be specified. The default is to use the event log file currently used by the LSF
system:
LSB_SHAREDIR/cluster_name/logdir/lsb.events. Option -f is useful for
offline analysis.
If you specified an administrator comment with the -C option of the queue control commands qclose, qopen, qact, and qinact, qhist displays the comment text.
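For example, to review queue events (and any -C comments) for a given window, an invocation might look like the following; the queue name and the time interval are illustrative values in the bhist(1) time format:

```shell
#!/bin/sh
# Hedged sketch: list queue open/close/activate/inactivate events
# recorded between two illustrative times for queue "normal".
if command -v badmin >/dev/null 2>&1; then
    badmin qhist -t 2008/5/1/00:00,2008/5/9/17:00 normal
    result="queried"
else
    result="no-lsf"    # not an LSF host
fi
echo "$result"
```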
hopen [-C comment] [host_name ... | host_group ... | all]
Opens batch server hosts. Specify the names of any server hosts or host groups. All batch server hosts are opened if the reserved word all is specified. If no host or host group is specified, the local host is assumed. A host accepts batch jobs if it is open.
IMPORTANT: If EGO-enabled SLA scheduling is configured through ENABLE_DEFAULT_EGO_SLA
in lsb.params, and a host is closed by EGO, it cannot be reopened by badmin hopen. Hosts
closed by EGO have status closed_EGO in bhosts -l output.
-C comment
Logs the text as an administrator comment record to lsb.events. The maximum
length of the comment string is 512 characters.
If you open a host group, each host group member displays with the same comment
string.
hclose [-C comment] [host_name ... | host_group ... | all]
Closes batch server hosts. Specify the names of any server hosts or host groups. All batch server hosts are closed if the reserved word all is specified. If no argument is specified, the local host is assumed. A closed host does not accept any new job, but jobs already dispatched to the host are not affected. Note that this is different from a host closed by a dispatch window; in that case, all jobs on the host are suspended.
-C comment
Logs the text as an administrator comment record to lsb.events. The maximum
length of the comment string is 512 characters.
If you close a host group, each host group member displays with the same comment string.
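A sketch of draining a host for a reboot (the host name hostA and the comments are illustrative; badmin requires LSF administrator rights):

```shell
#!/bin/sh
# Hedged sketch: stop dispatching to a host, then reopen it after a reboot.
# Jobs already dispatched keep running while the host is closed.
if command -v badmin >/dev/null 2>&1; then
    badmin hclose -C "kernel upgrade" hostA
    badmin hopen  -C "upgrade complete" hostA
    result="reopened"
else
    result="no-lsf"    # not an LSF host
fi
echo "$result"
```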
hghostadd [-C comment] host_group host_name ...
If dynamic host configuration is enabled, dynamically adds hosts to a host group. After receiving the host information from the master LIM, mbatchd dynamically adds the host without triggering a reconfig.
Once the host is added to the group, it is considered to be part of that group with respect to scheduling decision making, for both newly submitted jobs and existing pending jobs.
This command fails if any of the specified host groups or host names are not valid.
RESTRICTION: If EGO-enabled SLA scheduling is configured through ENABLE_DEFAULT_EGO_SLA in lsb.params, you cannot use hghostadd because all host allocation is under control of Platform EGO.
-C comment
Logs the text as an administrator comment record to lsb.events. The maximum
length of the comment string is 512 characters.
hghostdel [-C comment] host_group host_name ...
Dynamically deletes hosts from a host group by triggering an mbatchd reconfig.
This command fails if any of the specified host groups or host names are not valid.
CAUTION: If you want to change a dynamic host to a static host, first use the command
badmin hghostdel to remove the dynamic host from any host group that it belongs to, and
then configure the host as a static host in lsf.cluster.cluster_name.
RESTRICTION: If EGO-enabled SLA scheduling is configured through ENABLE_DEFAULT_EGO_SLA in lsb.params, you cannot use hghostdel because all host allocation is under control of Platform EGO.
hrestart [-f] [host_name ... | all]
Restarts sbatchd on the specified hosts, or on all server hosts if the reserved word
all is specified. If no host is specified, the local host is assumed. sbatchd reruns
itself from the beginning. This allows new
sbatchd binaries to be used.
-f
Disables interaction and does not ask for confirmation for restarting sbatchd.
hshutdown [-f] [host_name ... | all]
Shuts down sbatchd on the specified hosts, or on all batch server hosts if the reserved word all is specified. If no host is specified, the local host is assumed. sbatchd exits upon receiving the request.
-f
Disables interaction and does not ask for confirmation for shutting down sbatchd.
hstartup [-f] [host_name ... | all]
Starts sbatchd on the specified hosts, or on all batch server hosts if the reserved word all is specified. If no host is specified, the local host is assumed. Only root and users listed in the file lsf.sudoers(5) can use the all and -f options. These users must be able to use rsh or ssh on all LSF hosts without having to type in passwords. The shell command specified by LSF_RSH in lsf.conf is used before rsh is tried.
-f
Disables interaction and does not ask for confirmation for starting sbatchd.
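The three subcommands above give a simple daemon lifecycle. A hedged sketch (host names are illustrative, and hstartup on remote hosts requires the rsh/ssh setup described above):

```shell
#!/bin/sh
# Hedged sketch: restart sbatchd cluster-wide to pick up new binaries,
# or stop and later start sbatchd on a single host.
if command -v badmin >/dev/null 2>&1; then
    badmin hrestart -f all     # restart sbatchd on all server hosts, no prompt
    badmin hshutdown hostA     # stop sbatchd on hostA
    badmin hstartup hostA      # start it again ("all" needs root/lsf.sudoers rights)
    result="cycled"
else
    result="no-lsf"            # not an LSF host
fi
echo "$result"
```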
◆LC_EXEC - Log significant steps for job execution
◆LC_FAIR - Log fairshare policy messages
◆LC_FILE - Log file transfer messages
◆LC_HANG - Mark where a program might hang
◆LC_JARRAY - Log job array messages
◆LC_JLIMIT - Log job slot limit messages
◆LC_LICENSE - Log license management messages (LC_LICENCE is also
supported for backward compatibility)
◆LC_LOADINDX - Log load index messages
◆LC_M_LOG - Log multievent logging messages
◆LC_MPI - Log MPI messages
◆LC_MULTI - Log messages pertaining to MultiCluster
◆LC_PEND - Log messages related to job pending reasons
◆LC_PERFM - Log performance messages
◆LC_PIM - Log PIM messages
◆LC_PREEMPT - Log preemption policy messages
◆LC_SIGNAL - Log messages pertaining to signals
◆LC_SYS - Log system call messages
◆LC_TRACE - Log significant program walk steps
◆LC_XDR - Log everything transferred by XDR
Default: 0 (no additional classes are logged)
-l debug_level
Specifies level of detail in debug messages. The higher the number, the more detail
that is logged. Higher levels include all lower levels.
Possible values:
0 LOG_DEBUG level in parameter LSF_LOG_MASK in
lsf.conf.
1 LOG_DEBUG1 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.
2 LOG_DEBUG2 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.
3 LOG_DEBUG3 level for extended logging. A higher level includes lower logging
levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and
LOG_DEBUG levels.
Default: 0 (LOG_DEBUG level in parameter LSF_LOG_MASK)
-f logfile_name
Specify the name of the file into which debugging messages are to be logged. A file
name with or without a full path may be specified.
If a file name without a path is specified, the file is saved in the LSF system log
directory.
The name of the file that is created has the following format:
logfile_name.daemon_name.log.host_name
On UNIX, if the specified path is not valid, the log file is created in the /tmp directory.
On Windows, if the specified path is not valid, no log file is created.
Default: current LSF system log file in the LSF system log file directory.
-o
Turns off temporary debug settings and resets them to the daemon starting state.
The message log level is reset back to the value of LSF_LOG_MASK and classes are
reset to the value of LSB_DEBUG_MBD, LSB_DEBUG_SBD.
The log file is also reset back to the default log file.
host_name ...
Optional. Sets debug settings on the specified host or hosts.
Lists of host names must be separated by spaces and enclosed in quotation marks.
Default: local host (host from which command was submitted)
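These options belong to the badmin daemon debug subcommands. The subcommand header falls outside this excerpt, so the name sbddebug below is an assumption, as are the host, log file, and class choices:

```shell
#!/bin/sh
# Hedged sketch: temporarily raise sbatchd logging detail on one host,
# then reset it. Subcommand name, classes, and host are assumptions.
if command -v badmin >/dev/null 2>&1; then
    badmin sbddebug -c "LC_TRACE LC_EXEC" -l 2 -f sbatchd.debug hostA
    badmin sbddebug -o hostA    # reset to the LSF_LOG_MASK / LSB_DEBUG_SBD values
    result="reset"
else
    result="no-lsf"             # not an LSF host
fi
echo "$result"
```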
Sets the timing level for sbatchd to include additional timing information in log
files. You must be
root or the LSF administrator to use this command.
In MultiCluster, timing levels can only be set for hosts within the same cluster. For
example, you could not set debug or timing levels from a host in clusterA for a host
in clusterB. You need to be on a host in clusterB to set up debug or timing levels for
clusterB hosts.
If the command is used without any options, the following default values are used:
timing_level=no timing information is recorded
logfile_name=current LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name
host_name=local host (host from which command was submitted)
-l timing_level
Specifies detail of timing information that is included in log files. Timing messages
indicate the execution time of functions in the software and are logged in
milliseconds.
Valid values: 1 | 2 | 3 | 4 | 5
The higher the number, the more functions in the software that are timed and
whose execution time is logged. The lower numbers include more common
software functions. Higher levels include all lower levels.
Default: undefined (no timing information is logged)
-f logfile_name
Specify the name of the file into which timing messages are to be logged. A file
name with or without a full path may be specified.
If a file name without a path is specified, the file is saved in the LSF system log file
directory.
The name of the file created has the following format:
logfile_name.daemon_name.log.host_name
On UNIX, if the specified path is not valid, the log file is created in the /tmp directory.
On Windows, if the specified path is not valid, no log file is created.
Note: Both timing and debug messages are logged in the same files.
Default: current LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name.
-o
Optional. Turns off temporary timing settings and resets them to the daemon starting
state. The timing level is reset back to the value of the parameter for the
corresponding daemon (LSB_TIME_MBD, LSB_TIME_SBD).
The log file is also reset back to the default log file.
host_name ...
Sets the timing level on the specified host or hosts.
Lists of hosts must be separated by spaces and enclosed in quotation marks.
Default: local host (host from which command was submitted)
Moves a pending job relative to the last job in the queue.
Synopsis
bbot job_ID | "job_ID[index_list]" [position]
bbot -h | -V
Description
Changes the queue position of a pending job or job array element, to affect the order in which jobs are considered for dispatch.
By default, LSF dispatches jobs in a queue in the order of arrival (that is, first-come, first-served), subject to availability of suitable server hosts.
The bbot command allows users and the LSF administrator to manually change the order in which jobs are considered for dispatch. Users can only operate on their own jobs, whereas the LSF administrator can operate on any user's jobs.
If invoked by the LSF administrator, bbot moves the selected job after the last job with the same priority submitted to the queue.
If invoked by a user, bbot moves the selected job after the last job with the same priority submitted by the user to the queue.
Pending jobs are displayed by bjobs in the order in which they are considered for dispatch.
A user may use bbot to change the dispatch order of their jobs scheduled using a fairshare policy. However, if a job scheduled using a fairshare policy is moved by the LSF administrator using btop, the job is not subject to further fairshare scheduling unless the same job is subsequently moved by the LSF administrator using bbot; in this case the job is scheduled again using the same fairshare policy.
To prevent users from changing the queue position of a pending job with bbot, configure JOB_POSITION_CONTROL_BY_ADMIN=Y in lsb.params.
You cannot run bbot on jobs pending in an absolute priority scheduling (APS) queue.
Options
job_ID | "job_ID[index_list]"
Required. Job ID of the job or job array on which to operate.
For a job array, the index list, the square brackets, and the quotation marks are required. An index list is used to operate on a job array. The index list is a comma-separated list whose elements have the syntax start_index[-end_index[:step]], where start_index, end_index, and step are positive integers. If the step is omitted, a step of one is assumed. The job array index starts at one. The maximum job array index is 1000. All jobs in the array share the same job_ID and parameters. Each element of the array is distinguished by its array index.
position
Optional. The position argument can be specified to indicate where in the queue the job is to be placed. position is a positive number that indicates the target position of the job from the end of the queue. The positions are relative to only the applicable jobs in the queue, depending on whether the invoker is a regular user or the LSF administrator. The default value of 1 means the position is after all other jobs with the same priority.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
See also
bjobs(1), bswitch(1), btop(1), JOB_POSITION_CONTROL_BY_ADMIN in lsb.params
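A hedged sketch of typical invocations (the job IDs and the target position are illustrative):

```shell
#!/bin/sh
# Hedged sketch: demote pending jobs in the dispatch order.
if command -v bbot >/dev/null 2>&1; then
    bbot 1234            # move job 1234 after the last applicable job
    bbot "1234[5]" 3     # move array element 5 to third position from the end
    result="moved"
else
    result="no-lsf"      # not an LSF host
fi
echo "$result"
```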
Checkpoints the most recently submitted running or suspended checkpointable job.
LSF administrators and root can checkpoint jobs submitted by other users.
Jobs continue to execute after they have been checkpointed.
LSF invokes the echkpnt(8) executable found in LSF_SERVERDIR to perform the checkpoint.
Only running members of a chunk job can be checkpointed. For chunk jobs in WAIT state, mbatchd rejects the checkpoint request.
Options
0 (Zero). Checkpoints all of the jobs that satisfy other specified criteria.
-f Forces a job to be checkpointed even if non-checkpointable conditions exist (these
conditions are OS-specific).
-k Kills a job after it has been successfully checkpointed.
-p minutes | -p 0 Enables periodic checkpointing and specifies the checkpoint period, or modifies the checkpoint period of a checkpointed job. Specify -p 0 (zero) to disable periodic checkpointing.
Checkpointing is a resource-intensive operation. To allow your job to make progress while still providing fault tolerance, specify a checkpoint period of 30 minutes or longer.
-J job_name Checkpoints only jobs that have the specified job name.
-m host_name | -m host_group
Checkpoints only jobs dispatched to the specified hosts.
-q queue_name
Checkpoints only jobs dispatched from the specified queue.
-u "user_name" | -u all
Checkpoints only jobs submitted by the specified users. The keyword all specifies
all users. Ignored if a job ID other than 0 (zero) is specified. To specify a Windows
user account, include the domain name in uppercase letters and use a single
backslash (DOMAIN_NAME\user_name) in a Windows command line or a double
backslash (DOMAIN_NAME\\user_name) in a UNIX command line.
job_ID | "job_ID[index_list]"
Checkpoints only the specified jobs.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Examples
bchkpnt 1234
Checkpoints the job with job ID 1234.
bchkpnt -p 120 1234
Enables periodic checkpointing or changes the checkpoint period to 120 minutes (2
hours) for a job with job ID 1234.
bchkpnt -m hostA -k -u all 0
When issued by root or the LSF administrator, checkpoints and kills all checkpointable jobs on hostA. This is useful when a host needs to be shut down or rebooted.
bclusters
displays MultiCluster information
Synopsis
bclusters [-app]
bclusters [-h | -V]
Description
For the job forwarding model, displays a list of MultiCluster queues together with their relationship with queues in remote clusters.
For the resource leasing model, displays remote resource provider and consumer information, resource flow information, and connection status between the local and remote cluster.
Options
-app Displays available application profiles in remote clusters.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Output
Job Forwarding Information
Displays a list of MultiCluster queues together with their relationship with queues
in remote clusters.
Information related to the job forwarding model is displayed under the heading
Forwarding Information.
LOCAL_QUEUE Name of a local MultiCluster send-jobs or receive-jobs queue.
JOB_FLOW Indicates direction of job flow.
send
The local queue is a MultiCluster send-jobs queue (SNDJOBS_TO is defined in the
local queue).
recv
The local queue is a MultiCluster receive-jobs queue (RCVJOBS_FROM is defined
in the local queue).
REMOTE For send-jobs queues, shows the name of the receive-jobs queue in a remote cluster.
For receive-jobs queues, always “-”.
CLUSTER For send-jobs queues, shows the name of the remote cluster containing the
receive-jobs queue.
For receive-jobs queues, shows the name of the remote cluster that can send jobs to
the local queue.
STATUS Indicates the connection status between the local queue and remote queue.
ok
The two clusters can exchange information and the system is properly configured.
disc
Communication between the two clusters has not been established. This could
occur because there are no jobs waiting to be dispatched, or because the remote
master cannot be located.
reject
The remote queue rejects jobs from the send-jobs queue. The local queue and
remote queue are connected and the clusters communicate, but the queue-level
configuration is not correct. For example, the send-jobs queue in the submission
cluster points to a receive-jobs queue that does not exist in the remote cluster.
If the job is rejected, it returns to the submission cluster.
Resource Lease Information
Displays remote resource provider and consumer information, resource flow
information, and connection status between the local and remote cluster.
Information related to the resource leasing model is displayed under the heading
Resource Lease Information.
REMOTE_CLUSTER For borrowed resources, name of the remote cluster that is the provider.
For exported resources, name of the remote cluster that is the consumer.
RESOURCE_FLOW Indicates direction of resource flow.
IMPORT
Local cluster is the consumer and borrows resources from the remote cluster
(HOSTS parameter in one or more local queue definitions includes remote
resources).
EXPORT
Local cluster is the provider and exports resources to the remote cluster.
STATUS Indicates the connection status between the local and remote cluster.
ok
MultiCluster jobs can run.
disc
No communication between the two clusters. This could be a temporary situation
or could indicate a MultiCluster configuration error.
conn
The two clusters communicate, but the lease is not established. This should be a
temporary situation.
Remote Cluster Application Information
bclusters -app displays information related to application profile configuration under the heading Remote Cluster Application Information. Application profile information is only displayed for the job forwarding model. bclusters -app does not show local cluster application profile information.
Creates a job group with the job group name specified by job_group_name.
You must provide the full group path name for the new job group. The last component of the path is the name of the new group to be created.
You do not need to create the parent job group before you create a sub-group under it. If no groups in the job group hierarchy exist, all groups are created with the specified hierarchy.
-L limit
Limits the number of started jobs (RUN, SSUSP, USUSP) under the job group (including child groups).
Specify a positive number between 0 and 2147483647. If the specified limit is zero (0), no jobs under the job group can run.
You cannot specify a limit for the root job group. The root job group has no job limit. Job groups added with no limits specified inherit any limits of existing parent job groups. The -L option only limits the lowest level job group created.
If a parallel job requests 2 CPUs (bsub -n 2), the job group limit is per job, not per slots used by the job.
By default, a job group has no job limit. Limits persist across mbatchd restart or reconfiguration.
-sla service_class_name
The name of a service class defined in lsb.serviceclasses, or the name of the SLA defined in ENABLE_DEFAULT_EGO_SLA in lsb.params. The job group is attached to the specified SLA.
job_group_name Full path of the job group name.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Examples
◆Create a job group named risk_group under the root group /:
bgadd /risk_group
◆Create a job group named portfolio1 under job group /risk_group:
bgadd /risk_group/portfolio1
See also
bgdel, bjgroup
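A hedged sketch combining the options above (the group names, the limit, and the service class name are illustrative assumptions):

```shell
#!/bin/sh
# Hedged sketch: create a limited job group, and a sub-group attached to an SLA.
if command -v bgadd >/dev/null 2>&1; then
    bgadd -L 10 /risk_group                    # at most 10 started jobs under /risk_group
    bgadd -sla Kyuquot /risk_group/portfolio1  # SLA name "Kyuquot" is an assumption
    result="created"
else
    result="no-lsf"                            # not an LSF host
fi
echo "$result"
```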
bgdel
deletes job groups
Synopsis
bgdel [-u user_name | -u all] job_group_name | 0
bgdel -c job_group_name
bgdel [-h | -V]
Description
Deletes a job group with the job group name specified by job_group_name and all
its subgroups.
You must provide full group path name for the job group to be deleted. The job
group cannot contain any jobs.
Users can only delete their own job groups. LSF administrators can delete any job
groups.
Job groups can be created explicitly or implicitly:
◆A job group is created explicitly with the bgadd command.
◆A job group is created implicitly by the bsub -g or bmod -g command when
the specified group does not exist. Job groups are also created implicitly when
a default job group is configured (DEFAULT_JOBGROUP in lsb.params or the
LSB_DEFAULT_JOBGROUP environment variable).
Options
0 Delete the empty job groups. These groups can be explicit or implicit.
-u user_name Delete empty job groups owned by the specified user. Only administrators can use this option. These groups can be explicit or implicit. If you specify a job group name, the -u option is ignored.
-u all Delete empty job groups and their sub groups for all users. Only administrators can use this option. These groups can be explicit or implicit. If you specify a job group name, the -u option is ignored.
-c job_group_name Delete all the empty groups below the requested job_group_name, including job_group_name itself. These groups can be explicit or implicit.
job_group_name Full path of the job group name.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Example
bgdel /risk_group
Deletes the job group /risk_group and all its subgroups.
◆Displays information about your own pending, running and suspended jobs.
Groups information by job
◆CPU time is not normalized
Options
◆Searches the event log file currently used by the LSF system:
$LSB_SHAREDIR/cluster_name/logdir/lsb.events (see lsb.events(5))
◆Displays events occurring in the past week, but this can be changed by setting
the environment variable LSB_BHIST_HOURS to an alternative number of
hours
If neither
-l nor -b is present, the default is to display only the fields shown in
Output on page 48.
-a Displays information about both finished and unfinished jobs. This option overrides -d, -p, -s, and -r.
-b Brief format. Displays the information in a brief format. If used with the -s option, shows the reason why each job was suspended.
-d Only displays information about finished jobs.
-e Only displays information about exited jobs.
-l Long format. Displays additional information. If used with -s, shows the reason
why each job was suspended.
If you submitted a job using the OR (||) expression to specify alternative resources, this option displays the successful Execution rusage string with which the job ran.
If you submitted a job with multiple resource requirement strings using the bsub -R option for the order, same, rusage, and select sections, bjobs -l displays a single, merged resource requirement string for those sections, as if they were submitted using a single -R.
bhist -l can display job exit codes. A job with exit code 131 means that the job exceeded a configured resource usage limit and LSF killed the job with signal 3 (131-128=3).
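The exit-code arithmetic above generalizes: an exit code greater than 128 encodes the terminating signal number as exit_code - 128. A minimal sketch:

```shell
#!/bin/sh
# Exit codes above 128 encode the terminating signal number.
exit_code=131
signal=$((exit_code - 128))
echo "$signal"    # prints 3 (SIGQUIT on most systems)
```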
bhist -l can display changes to pending jobs as a result of the following bmod
options:
◆Absolute priority scheduling (-aps | -apsn)
◆Runtime estimate (-We | -Wen)
◆Post-execution command (-Ep | -Epn)
◆User limits (-ul | -uln)
◆Current working directory (-cwd | -cwdn)
◆Checkpoint options (-k | -kn)
◆Migration threshold (-mig | -mign)
-p Only displays information about pending jobs.
-r Only displays information about running jobs.
-s Only displays information about suspended jobs.
-t Displays job events chronologically.
-w Wide format. Displays the information in a wide format.
-C start_time,end_time
Only displays jobs that completed or exited during the specified time interval. Specify the span of time for which you want to display the history. If you do not specify a start time, the start time is assumed to be the time of the first occurrence. If you do not specify an end time, the end time is assumed to be now.
Specify the times in the format "yyyy/mm/dd/HH:MM". Do not specify spaces in the time interval string.
The time interval can be specified in many ways. For more specific syntax and examples of time formats, see TIME INTERVAL FORMAT.
-D start_time,end_time
Only displays jobs dispatched during the specified time interval. Specify the span of time for which you want to display the history. If you do not specify a start time, the start time is assumed to be the time of the first occurrence. If you do not specify an end time, the end time is assumed to be now.
Specify the times in the format "yyyy/mm/dd/HH:MM". Do not specify spaces in the time interval string.
The time interval can be specified in many ways. For more specific syntax and examples of time formats, see TIME INTERVAL FORMAT.
-S start_time,end_time
Only displays information about jobs submitted during the specified time interval. Specify the span of time for which you want to display the history. If you do not specify a start time, the start time is assumed to be the time of the first occurrence. If you do not specify an end time, the end time is assumed to be now.
Specify the times in the format "yyyy/mm/dd/HH:MM". Do not specify spaces in the time interval string.
The time interval can be specified in many ways. For more specific syntax and examples of time formats, see TIME INTERVAL FORMAT.
-T start_time,end_time
Used together with -t. Only displays information about job events within the specified time interval. Specify the span of time for which you want to display the history. If you do not specify a start time, the start time is assumed to be the time of the first occurrence. If you do not specify an end time, the end time is assumed to be now.
Specify the times in the format yyyy/mm/dd/HH:MM. Do not specify spaces in the time interval string.
The time interval can be specified in many ways. For more specific syntax and examples of time formats, see Time Interval Format on page 49.
-f logfile_name Searches the specified event log. Specify either an absolute or a relative path.
Useful for analysis directly on the file.
The specified file path can contain up to 4094 characters for UNIX, or up to 255
characters for Windows.
-J job_name Only displays the jobs that have the specified job name.
The specified job name can contain up to 4094 characters for UNIX, or up to 255
characters for Windows.
-Lp ls_project_name Only displays information about jobs belonging to the specified License Scheduler
project.
-m host_name Only displays jobs dispatched to the specified host.
-n number_logfiles | -n 0
Searches the specified number of event logs, starting with the current event log and
working through the most recent consecutively numbered logs. The maximum
number of logs you can search is 100. Specify 0 to specify all the event log files in
$(LSB_SHAREDIR)/cluster_name/logdir (up to a maximum of 100 files).
If you delete a file, you break the consecutive numbering, and older files are inaccessible to bhist.
For example, if you specify 3, LSF searches lsb.events, lsb.events.1, and lsb.events.2. If you specify 4, LSF searches lsb.events, lsb.events.1, lsb.events.2, and lsb.events.3. However, if lsb.events.2 is missing, both searches include only lsb.events and lsb.events.1.
-N host_name | -N host_model | -N cpu_factor
Normalizes CPU time by the specified CPU factor, or by the CPU factor of the specified host or host model.
If you use bhist directly on an event log, you must specify a CPU factor.
Use lsinfo to get host model and CPU factor information.
-P project_name Only displays information about jobs belonging to the specified project.
-q queue_name Only displays information about jobs submitted to the specified queue.
-u user_name | -u all Displays information about jobs submitted by the specified user, or by all users if the keyword all is specified. To specify a Windows user account, include the domain name in uppercase letters and use a single backslash (DOMAIN_NAME\user_name) in a Windows command line or a double backslash (DOMAIN_NAME\\user_name) in a UNIX command line.
job_ID | "job_ID[index]"
Searches all event log files and only displays information about the specified jobs. If you specify a job array, displays all elements chronologically.
This option overrides all other options except -J, -N, -h, and -V. When it is used with -J, only those jobs listed here that have the specified job name are displayed.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
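The doubled backslash in the -u option's UNIX form exists because the shell itself consumes one level of escaping. A minimal sketch showing what the command actually receives:

```shell
#!/bin/sh
# In double quotes, the shell collapses \\ to \, so a command sees
# DOMAIN_NAME\user_name when you type DOMAIN_NAME\\user_name.
arg="DOMAIN_NAME\\user_name"
printf '%s\n' "$arg"    # prints DOMAIN_NAME\user_name
```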
Output
Default format
Statistics of the amount of time that a job has spent in various states:
PEND The total waiting time, excluding user suspended time, before the job is dispatched.
PSUSP The total user suspended time of a pending job.
RUN The total run time of the job.
USUSP The total user suspended time after the job is dispatched.
SSUSP The total system suspended time after the job is dispatched.
UNKWN The total unknown time of the job (job status becomes unknown if sbatchd on the
execution host is temporarily unreachable).
TOTAL The total time that the job has spent in all states; for a finished job, it is the
turnaround time (that is, the time interval from job submission to job completion).
Long format (-l)
The -l option displays a long format listing with the following additional fields:
Project The project the job was submitted from.
Application Profile The application profile the job was submitted to.
Command The job command.
Detailed history includes job group modification, the date and time the job was
forwarded and the name of the cluster to which the job was forwarded.
The displayed job command can contain up to 4094 characters for UNIX, or up to
255 characters for Windows.
Initial checkpoint period The initial checkpoint period specified at the job level, by bsub -k, or in an application profile with CHKPNT_INITPERIOD.
Checkpoint period The checkpoint period specified at the job level, by bsub -k, in the queue with CHKPNT, or in an application profile with CHKPNT_PERIOD.
Checkpoint directory The checkpoint directory specified at the job level, by bsub -k, in the queue with CHKPNT, or in an application profile with CHKPNT_DIR.
Migration The migration threshold specified at the job level, by bsub -mig.
Time Interval Format
You use the time interval to define a start and end time for collecting the data to be retrieved and displayed. While you can specify both a start and an end time, you can also let one of the values default. You can specify either of the times as an absolute time, by specifying the date or time, or you can specify them relative to the current time.
◆year is a four-digit number representing the calendar year.
◆month is a number from 1 to 12, where 1 is January and 12 is December.
◆day is a number from 1 to 31, representing the day of the month.
◆hour is an integer from 0 to 23, representing the hour of the day on a 24-hour clock.
◆minute is an integer from 0 to 59, representing the minute of the hour.
◆. (period) represents the current month/day/hour:minute.
◆.-relative_int is a number, from 1 to 31, specifying a relative start or end time prior to now.
start_time,end_time
Specifies both the start and end times of the interval.
start_time,
Specifies a start time, and lets the end time default to now.
,end_time
Specifies to start with the first logged occurrence, and end at the time specified.
start_time
Starts at the beginning of the most specific time period specified, and ends at the
maximum value of the time period specified. For example,
of February—start February 1 at 00:00 a.m. and end at the last possible minute in
February: February 28th at midnight.
Absolute Time Examples
Assume the current time is May 9 17:06 2008:
1,8 = May 1 00:00 2008 to May 8 23:59 2008
,4 = the time of the first occurrence to May 4 23:59 2008
6 = May 6 00:00 2008 to May 6 23:59 2008
2/ = Feb 1 00:00 2008 to Feb 28 23:59 2008 (2/ specifies the month)
/12: = May 9 12:00 2008 to May 9 12:59 2008
2/1 = Feb 1 00:00 2008 to Feb 1 23:59 2008
2/1, = Feb 1 00:00 to the current time
,. = the time of the first occurrence to the current time
,2/10: = the time of the first occurrence to May 2 10:59 2008
2001/12/31,2008/5/1 = from Dec 31, 2001 00:00:00 to May 1st 2008 23:59:59
Relative Time Examples
.-9, = April 30 17:06 2008 to the current time
,.-2/ = the time of the first occurrence to Mar 7 17:06 2008
.-9,.-2 = nine days ago to two days ago (April 30, 2008 17:06 to May 7, 2008 17:06)
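The relative-time rule can be sketched in Python. This is an illustration only, not part of LSF: resolve_relative is a hypothetical helper that treats the count as days, as in the .-9 example above (the trailing-slash month form such as .-2/ is not handled here).

```python
from datetime import datetime, timedelta

def resolve_relative(spec: str, now: datetime) -> datetime:
    """Resolve '.' (now) or '.-relative_int' (that many days before now),
    mirroring the relative-time markers described above."""
    if spec == ".":
        return now
    if spec.startswith(".-"):
        return now - timedelta(days=int(spec[2:]))
    raise ValueError(f"not a relative spec: {spec}")

# "Assume the current time is May 9 17:06 2008"
now = datetime(2008, 5, 9, 17, 6)
print(resolve_relative(".-9", now))   # 2008-04-30 17:06:00, matching ".-9,"
```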
bhosts
displays hosts and their static and dynamic resources
Synopsis
Description
◆ host_group_status is the overall status of the host group. If a single host in the
host group is ok, the overall status is also ok.
◆ num_ok, num_unavail, num_unreach, and num_busy are the number of hosts
that are ok, unavail, unreach, and busy, respectively.
For example, if there are five ok, two unavail, one unreach, and three busy hosts
in a condensed host group hg1, its status is displayed as the following:
hg1 ok 5/2/1/3
If any hosts in the host group are closed, the status for the host group is displayed
as closed, with no status for the other states:
hg1 closed
Options
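The condensed-status line can be illustrated with a short Python sketch. condensed_status is a hypothetical helper that mimics the formatting described above; bhosts computes this internally:

```python
from collections import Counter

def condensed_status(group: str, host_statuses: list) -> str:
    """Build the condensed host-group line: overall status followed by
    num_ok/num_unavail/num_unreach/num_busy. If any host is closed,
    only 'closed' is shown, per the text above."""
    if any(s == "closed" for s in host_statuses):
        return f"{group} closed"
    c = Counter(host_statuses)
    counts = "/".join(str(c[s]) for s in ("ok", "unavail", "unreach", "busy"))
    # a single ok host makes the overall status ok; otherwise show the
    # first status encountered
    overall = "ok" if c["ok"] else host_statuses[0]
    return f"{group} {overall} {counts}"

hosts = ["ok"] * 5 + ["unavail"] * 2 + ["unreach"] + ["busy"] * 3
print(condensed_status("hg1", hosts))   # hg1 ok 5/2/1/3
```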
-x Display hosts whose job exit rate has exceeded the threshold configured by
EXIT_RATE in lsb.hosts for longer than JOB_EXIT_RATE_DURATION
configured in lsb.params, and are still high. By default, these hosts are closed the
next time LSF checks host exceptions and invokes eadmin.
Use with the -l option to show detailed information about host exceptions.
If no hosts exceed the job exit rate, bhosts -x displays:
There is no exceptional host found
-X Displays uncondensed output for host groups.
-R "res_req" Only displays information about hosts that satisfy the resource requirement
expression. For more information about resource requirements, see Administering
Platform LSF. The size of the resource requirement string is limited to 512 bytes.
LSF supports ordering of resource requirements on all load indices, including
external load indices, either static or dynamic.
-s [resource_name ...]
Displays information about the specified resources (shared or host-based). The
resources must have numeric values. Returns the following information: the
resource names, the total and reserved amounts, and the resource locations.
bhosts -s only shows consumable resources.
When LOCAL_TO is configured for a license feature in lsf.licensescheduler,
bhosts -s shows different resource information depending on the cluster locality
of the features. For example:
From clusterA:
bhosts -s
RESOURCE TOTAL RESERVED LOCATION
hspice 36.0 0.0 host1
From clusterB in siteB:
bhosts -s
RESOURCE TOTAL RESERVED LOCATION
hspice 76.0 0.0 host2
host_name ... | host_group ...
Only displays information about the specified hosts. Do not use quotes when
specifying multiple hosts.
For host groups, the names of the hosts belonging to the group are displayed instead
of the name of the host group. Do not use quotes when specifying multiple host
groups.
cluster_name MultiCluster only. Displays information about hosts in the specified cluster.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Output
Host-Based Default
Displays the following fields:
HOST_NAME The name of the host. If a host has batch jobs running and the host is removed from
the configuration, the host name is displayed as lost_and_found.
For condensed host groups, this is the name of the host group.
STATUS With MultiCluster, not shown for fully exported hosts.
The current status of the host and the sbatchd daemon. Batch jobs can only be
dispatched to hosts with an ok status. The possible values for host status are as
follows:
ok
The host is available to accept batch jobs.
For condensed host groups, if a single host in the host group is ok, the overall status
is also shown as ok.
If any host in the host group is not ok, bhosts displays the first host status it
encounters as the overall status for the condensed host group. Use bhosts -X to see
the status of individual hosts in the host group.
unavail
The host is down, or LIM and sbatchd on the host are unreachable.
unreach
LIM on the host is running but sbatchd is unreachable.
closed
The host is not allowed to accept any remote batch jobs. There are several reasons
for the host to be closed (see Host-Based -l Options).
unlicensed
The host does not have a valid LSF license.
JL/U With MultiCluster, not shown for fully exported hosts.
The maximum number of job slots that the host can process on a per user basis. If
a dash (-) is displayed, there is no limit.
For condensed host groups, this is the total number of job slots that all hosts in the
host group can process on a per user basis.
The host does not allocate more than JL/U job slots for one user at the same time.
These job slots are used by running jobs, as well as by suspended or pending jobs
that have slots reserved for them.
For preemptive scheduling, the accounting is different. These job slots are used by
running jobs and by pending jobs that have slots reserved for them (see the
description of PREEMPTIVE in
lsb.queues(5) and JL/U in lsb.hosts(5)).
MAX The maximum number of job slots available. If a dash (-) is displayed, there is no
limit.
For condensed host groups, this is the total maximum number of job slots available
in all hosts in the host group.
These job slots are used by running jobs, as well as by suspended or pending jobs
that have slots reserved for them.
If preemptive scheduling is used, suspended jobs are not counted (see the
description of PREEMPTIVE in lsb.queues(5) and MXJ in lsb.hosts(5)).
A host does not always have to allocate this many job slots if there are waiting jobs;
the host must also satisfy its configured load conditions to accept more jobs.
NJOBS The number of job slots used by jobs dispatched to the host. This includes running,
suspended, and chunk jobs.
For condensed host groups, this is the total number of job slots used by jobs
dispatched to any host in the host group.
RUN The number of job slots used by jobs running on the host.
For condensed host groups, this is the total number of job slots used by jobs
running on any host in the host group.
SSUSP The number of job slots used by system suspended jobs on the host.
For condensed host groups, this is the total number of job slots used by system
suspended jobs on any host in the host group.
USUSP The number of job slots used by user suspended jobs on the host. Jobs can be
suspended by the user or by the LSF administrator.
For condensed host groups, this is the total number of job slots used by user
suspended jobs on any host in the host group.
RSV The number of job slots used by pending jobs that have job slots reserved on the
host.
For condensed host groups, this is the total number of job slots used by pending
jobs that have job slots reserved on any host in the host group.
Host-Based -l Option
In addition to the above fields, the -l option also displays the following:
loadSched, loadStop
The scheduling and suspending thresholds for the host. If a threshold is not
defined, the threshold from the queue definition applies. If both the host and the
queue define a threshold for a load index, the most restrictive threshold is used.
The migration threshold is the time that a job dispatched to this host can remain
suspended by the system before LSF attempts to migrate the job to another host.
If the host’s operating system supports checkpoint copy, this is indicated here. With
checkpoint copy, the operating system automatically copies all open files to the
checkpoint directory when a process is checkpointed. Checkpoint copy is currently
supported only on Cray systems.
STATUS The long format shown by the -l option gives the possible reasons for a host to be
closed:
closed_Adm
The host is closed by the LSF administrator or root (see badmin(8)). No job can be
dispatched to the host, but jobs that are executing on the host are not affected.
closed_Lock
The host is locked by the LSF administrator or root (see lsadmin(8)). All batch
jobs on the host are suspended by LSF.
closed_Wind
The host is closed by its dispatch windows, which are defined in the configuration
file
lsb.hosts(5). Jobs already started are not affected by the dispatch windows.
closed_Full
The configured maximum number of batch job slots on the host has been reached
(see MAX field below).
closed_Excl
The host is currently running an exclusive job.
closed_Busy
The host is overloaded, because some load indices go beyond the configured
thresholds (see
lsb.hosts(5)). The displayed thresholds that cause the host to be
busy are preceded by an asterisk (*).
closed_LIM
LIM on the host is unreachable, but sbatchd is ok.
closed_EGO
For EGO-enabled SLA scheduling, host is closed because it has not been allocated
by EGO to run LSF jobs. Hosts allocated from EGO display status
ok.
CPUF Displays the CPU normalization factor of the host (see lshosts(1)).
DISPATCH_WINDOW
Displays the dispatch windows for each host. Dispatch windows are the time
windows during the week when batch jobs can be run on each host. Jobs already
started are not affected by the dispatch windows. When the dispatch windows close,
jobs are not suspended. Jobs already running continue to run, but no new jobs are
started until the windows reopen. The default for the dispatch window is no
restriction or always open (that is, twenty-four hours a day and seven days a week).
For the dispatch window specification, see the description for the
DISPATCH_WINDOWS keyword under the
-l option in bqueues(1).
CURRENT LOAD Displays the total and reserved host load.
Reserved
You specify reserved resources by using bsub -R. These resources are reserved by
jobs running on the host.
Total
The total load has different meanings depending on whether the load index is
increasing or decreasing.
For increasing load indices, such as run queue lengths, CPU utilization, paging
activity, logins, and disk I/O, the total load is the consumed plus the reserved
amount. The total load is calculated as the sum of the current load and the reserved
load. The current load is the load seen by lsload(1).
For decreasing load indices, such as available memory, idle time, available swap
space, and available space in tmp, the total load is the available amount. The total
load is the difference between the current load and the reserved load. This
difference is the available resource as seen by lsload(1).
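The two total-load rules above reduce to a sum or a difference. A minimal Python sketch, purely illustrative (total_load is not an LSF function):

```python
def total_load(current: float, reserved: float, increasing: bool) -> float:
    """Total load per the text above: current + reserved for increasing
    indices (run queue length, CPU utilization, paging, logins, disk I/O);
    current - reserved for decreasing indices (mem, swp, tmp, idle time)."""
    return current + reserved if increasing else current - reserved

print(total_load(0.5, 1.0, increasing=True))      # e.g. a run-queue index: 1.5
print(total_load(900.0, 100.0, increasing=False)) # e.g. available memory: 800.0
```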
LOAD THRESHOLD Displays the scheduling threshold loadSched and the suspending threshold
loadStop. Also displays the migration threshold if defined and the checkpoint
support if the host supports checkpointing.
The format for the thresholds is the same as for batch job queues (see bqueues(1)
and lsb.queues(5)). For an explanation of the thresholds and load indices, see the
description for the "QUEUE SCHEDULING PARAMETERS" keyword under the
-l option in bqueues(1).
THRESHOLD AND LOAD USED FOR EXCEPTIONS
Displays the configured threshold of EXIT_RATE for the host and its current load
value for host exceptions.
ADMIN ACTION COMMENT
If the LSF administrator specified an administrator comment with the -C option of
the
badmin host control commands hclose or hopen, the comment text is
displayed.
Resource-Based -s Option
The -s option displays the following: the amounts used for scheduling, the amounts
reserved, and the associated hosts for the resources. Only resources (shared or
host-based) with numeric values are displayed. See lim(8) and lsf.cluster(5)
on how to configure shared resources.
The following fields are displayed:
RESOURCE The name of the resource.
TOTAL The total amount free of a resource used for scheduling.
RESERVED The amount reserved by jobs. You specify the reserved resource using bsub -R.
LOCATION The hosts that are associated with the resource.
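The four-column -s listing is easy to consume from a script. A Python sketch, under the assumption that columns are whitespace-separated with one location string per line (parse_bhosts_s is a hypothetical helper, not an LSF API):

```python
def parse_bhosts_s(output: str):
    """Parse a 'bhosts -s' style listing (RESOURCE TOTAL RESERVED LOCATION)
    into a list of dicts."""
    rows = []
    for line in output.strip().splitlines()[1:]:   # skip the header row
        resource, total, reserved, location = line.split(None, 3)
        rows.append({"resource": resource, "total": float(total),
                     "reserved": float(reserved), "location": location})
    return rows

sample = """RESOURCE TOTAL RESERVED LOCATION
hspice 36.0 0.0 host1"""
print(parse_bhosts_s(sample))
```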
56 Platform LSF Command Reference
Files
Reads lsb.hosts.
See also
lsb.hosts, bqueues, lshosts, badmin, lsadmin
bhpart
bhpart
displays information about host partitions
Synopsis
bhpart [-r] [host_partition_name ...]
bhpart [-h | -V]
Description
By default, displays information about all host partitions. Host partitions are used
to configure host-partition fairshare scheduling.
Options
-r Displays the entire information tree associated with the host partition recursively.
host_partition_name ...
Displays information about the specified host partitions only.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Output
The following fields are displayed for each host partition:
HOST_PARTITION_NAME
Name of the host partition.
HOSTS
Hosts or host groups that are members of the host partition. The name of a host
group is appended by a slash (/) (see bmgroup(1)).
USER/GROUP
Name of users or user groups who have access to the host partition (see
bugroup(1)).
SHARES
Number of shares of resources assigned to each user or user group in this host
partition, as configured in the file lsb.hosts. The shares affect dynamic user
priority when fairshare scheduling is configured at the host level.
PRIORITY
Dynamic user priority for the user or user group. Larger values represent higher
priorities. Jobs belonging to the user or user group with the highest priority are
considered first for dispatch.
In general, users or user groups with larger SHARES, fewer STARTED and
RESERVED, and a lower CPU_TIME and RUN_TIME have higher PRIORITY.
STARTED
Number of job slots used by running or suspended jobs owned by users or user
groups in the host partition.
RESERVED
Number of job slots reserved by the jobs owned by users or user groups in the host
partition.
CPU_TIME
Cumulative CPU time used by jobs of users or user groups executed in the host
partition. Measured in seconds, to one decimal place.
LSF calculates the cumulative CPU time using the actual (not normalized) CPU
time and a decay factor such that 1 hour of recently-used CPU time decays to 0.1
hours after an interval of time specified by HIST_HOURS in lsb.params (5 hours
by default).
RUN_TIME
Wall-clock run time plus historical run time of jobs of users or user groups that are
executed in the host partition. Measured in seconds.
LSF calculates the historical run time using the actual run time of finished jobs and
a decay factor such that 1 hour of recently-used run time decays to 0.1 hours after
an interval of time specified by HIST_HOURS in lsb.params (5 hours by default).
Wall-clock run time is the run time of running jobs.
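The HIST_HOURS decay described for CPU_TIME and RUN_TIME can be sketched in Python. The exponential curve below is an assumption for illustration; the text only fixes the single point at which 1 hour of use decays to 0.1 hours after HIST_HOURS:

```python
def decayed_hours(used_hours: float, hours_ago: float,
                  hist_hours: float = 5.0) -> float:
    """Historical-time decay sketch: usage decays by a factor of 10 every
    hist_hours (HIST_HOURS in lsb.params, default 5 hours). Assumes a
    smooth exponential between the stated decay points."""
    return used_hours * 0.1 ** (hours_ago / hist_hours)

print(decayed_hours(1.0, 5.0))    # 1 hour of use, 5 hours ago -> about 0.1
```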
Files
Reads lsb.hosts.
See also
bugroup(1), bmgroup(1), lsb.hosts(5)
bgmod
bgmod
modifies job groups
Synopsis
bgmod [-L limit | -Ln] job_group_name
bgmod [-h | -V]
Description
Modifies the job group with the job group name specified by job_group_name.
Only root, LSF administrators, the job group creator, or the creator of the parent job
groups can use bgmod to modify a job group limit.
You must provide the full group path name for the modified job group. The last
component of the path is the name of the job group to be modified.
Options
-L limit Changes the limit of job_group_name to the specified limit value. If the job group
has parent job groups, the new limit cannot exceed the limits of any higher level job
groups. Similarly, if the job group has child job groups, the new value must be
greater than any limits on the lower level job groups.
limit specifies the maximum number of concurrent jobs allowed to run under the
job group (including child groups). -L limits the number of started jobs (RUN,
SSUSP, USUSP) under the job group.
Specify a positive number between 0 and 2147483647. If the specified limit is zero
(0), no jobs under the job group can run.
You cannot specify a limit for the root job group. The root job group has no job
limit. The -L option only limits the lowest level job group specified.
If a parallel job requests 2 CPUs (bsub -n 2), the job group limit is per job, not per
slots used by the job.
-Ln Removes the existing job limit for the job group. If the job group has parent job
groups, the modified job group automatically inherits any limits from its direct
parent job group.
job_group_name Full path of the job group name.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
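The parent/child constraints on -L can be sketched as follows. valid_new_limit is a hypothetical helper written from the rules stated above, not an LSF API:

```python
def valid_new_limit(new_limit: int, parent_limits: list,
                    child_limits: list) -> bool:
    """bgmod -L rules per the text above: the new limit cannot exceed the
    limit of any higher-level job group, and must be greater than any
    limit on a lower-level job group. Limits range from 0 to 2147483647."""
    if not 0 <= new_limit <= 2147483647:
        return False
    return (all(new_limit <= p for p in parent_limits)
            and all(new_limit > c for c in child_limits))

print(valid_new_limit(6, parent_limits=[10], child_limits=[3]))   # True
print(valid_new_limit(12, parent_limits=[10], child_limits=[3]))  # False: exceeds parent
```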
Examples
The following command only modifies the limit of group
/canada/projects/test1. It does not modify limits of /canada or
/canada/projects:
bgmod -L 6 /canada/projects/test1
To modify limits of /canada or /canada/projects, you must specify the exact
group name:
bgmod -L 6 /canada
or
bgmod -L 6 /canada/projects
See also
bgadd, bgdel, bjgroup
bjgroup
bjgroup
displays information about job groups
Synopsis
bjgroup [-N] [-s [group_name]]
bjgroup [-h | -V]
Description
Displays job group information.
Options
-s Sorts job groups by group hierarchy.
For example, for job groups named /A, /A/B, /X and /X/Y, bjgroup without -s
displays:
bjgroup
GROUP_NAME NJOBS PEND RUN SSUSP USUSP FINISH SLA JLIMIT OWNER
/A 0 0 0 0 0 0 () 0/10 user1
/X 0 0 0 0 0 0 () 0/- user2
/A/B 0 0 0 0 0 0 () 0/5 user1
/X/Y 0 0 0 0 0 0 () 0/5 user2
For the same job groups, bjgroup -s displays:
bjgroup -s
GROUP_NAME NJOBS PEND RUN SSUSP USUSP FINISH SLA JLIMIT OWNER
/A 0 0 0 0 0 0 () 0/10 user1
/A/B 0 0 0 0 0 0 () 0/5 user1
/X 0 0 0 0 0 0 () 0/- user2
/X/Y 0 0 0 0 0 0 () 0/5 user2
Specify a job group name to show the hierarchy of a single job group:
bjgroup -s /X
GROUP_NAME NJOBS PEND RUN SSUSP USUSP FINISH SLA JLIMIT OWNER
/X 25 0 25 0 0 0 puccini 25/100 user1
/X/Y 20 0 20 0 0 0 puccini 20/30 user1
/X/Z 5 0 5 0 0 0 puccini 5/10 user2
Specify a job group name with a trailing slash character (/) to show only the root
job group:
bjgroup -s /X/
GROUP_NAME NJOBS PEND RUN SSUSP USUSP FINISH SLA JLIMIT OWNER
/X 25 0 25 0 0 0 puccini 25/100 user1
-N Displays job group information by job slots instead of number of jobs. NSLOTS,
PEND, RUN, SSUSP, USUSP, RSV are all counted in slots rather than number of
jobs:
bjgroup -N
GROUP_NAME NSLOTS PEND RUN SSUSP USUSP RSV SLA OWNER
/X 25 0 25 0 0 0 puccini user1
/A/B 20 0 20 0 0 0 wagner batch
-N by itself shows job slot info for all job groups, and can combine with -s to sort
the job groups by hierarchy:
bjgroup -N -s
GROUP_NAME NSLOTS PEND RUN SSUSP USUSP RSV SLA OWNER
/A 0 0 0 0 0 0 wagner batch
/A/B 0 0 0 0 0 0 wagner user1
/X 25 0 25 0 0 0 puccini user1
/X/Y 20 0 20 0 0 0 puccini batch
/X/Z 5 0 5 0 0 0 puccini batch
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Default output
A list of job groups is displayed with the following fields:
GROUP_NAME
The name of the job group.
NJOBS
The current number of jobs in the job group. A parallel job is counted as 1 job,
regardless of the number of job slots it uses.
PEND
The number of pending jobs in the job group.
RUN
The number of running jobs in the job group.
SSUSP
The number of system-suspended jobs in the job group.
USUSP
The number of user-suspended jobs in the job group.
FINISH
The number of jobs in the specified job group in EXITED or DONE state.
SLA
The name of the service class that the job group is attached to with
bgadd -sla service_class_name. If the job group is not attached to any service class,
empty parentheses () are displayed in the SLA name column.
JLIMIT
The job group limit set by bgadd -L or bgmod -L. Job groups that have no
configured limits or no limit usage are indicated by a dash (-). Job group limits are
displayed in a USED/LIMIT format. For example, if a limit of 5 jobs is configured
and 1 job is started, bjgroup displays the job limit under JLIMIT as 1/5.
OWNER
The job group owner.
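The USED/LIMIT formatting of the JLIMIT column can be sketched in a couple of lines. fmt_jlimit is a hypothetical helper, written only to illustrate the convention described above:

```python
def fmt_jlimit(used: int, limit=None) -> str:
    """JLIMIT column sketch: USED/LIMIT, with '-' standing in for a
    job group that has no configured limit."""
    return f"{used}/{limit if limit is not None else '-'}"

print(fmt_jlimit(1, 5))   # 1/5  (limit of 5 configured, 1 job started)
print(fmt_jlimit(0))      # 0/-  (no configured limit)
```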
Example
bjgroup
GROUP_NAME NJOBS PEND RUN SSUSP USUSP FINISH SLA JLIMIT OWNER
/fund1_grp 5 4 0 1 0 0 Venezia 1/5 user1
/fund2_grp 11 2 5 0 0 4 Venezia 5/5 user1
/bond_grp 2 2 0 0 0 0 Venezia 0/- user2
/risk_grp 2 1 1 0 0 0 () 1/- user2
/admi_grp 4 4 0 0 0 0 () 0/- user2
Job slots (-N) output
NSLOTS, PEND, RUN, SSUSP, USUSP, RSV are all counted in slots rather than
number of jobs. A list of job groups is displayed with the following fields:
GROUP_NAME
The name of the job group.
NSLOTS
The total number of job slots held currently by jobs in the job group. This includes
pending, running, suspended and reserved job slots. A parallel job that is running
on n processors is counted as n job slots, since it takes n job slots in the job group.
PEND
The number of job slots used by pending jobs in the job group.
RUN
The number of job slots used by running jobs in the job group.
SSUSP
The number of job slots used by system-suspended jobs in the job group.
USUSP
The number of job slots used by user-suspended jobs in the job group.
RSV
The number of job slots in the job group that are reserved by LSF for pending jobs.
SLA
The name of the service class that the job group is attached to with
bgadd -sla service_class_name. If the job group is not attached to any service class,
empty parentheses
() are displayed in the SLA name column.
OWNER
The job group owner.
Example
bjgroup -N
GROUP_NAME NSLOTS PEND RUN SSUSP USUSP RSV SLA OWNER
By default, displays information about your own pending, running and suspended
jobs.
bjobs displays output for condensed host groups. These host groups are defined by
CONDENSE in the HostGroup section of lsb.hosts. These host groups are displayed
as a single entry with the name as defined by GROUP_NAME in the HostGroup section
of lsb.hosts. The -l and -X options display uncondensed output.
If you defined LSB_SHORT_HOSTLIST=1 in lsf.conf, parallel jobs running in
the same condensed host group are displayed as an abbreviated list.
To display older historical information, use bhist.
-A Displays summarized information about job arrays. If you specify job arrays with
the job array ID, and also specify -A, do not include the index list with the job array
ID.
You can use -w to show the full array specification, if necessary.
-a Displays information about jobs in all states, including jobs that finished recently,
within an interval specified by CLEAN_PERIOD in lsb.params (the default
period is 1 hour).
Use -a with the -x option to display all jobs that have triggered a job exception
(overrun, underrun, idle).
-aps Displays absolute priority scheduling (APS) information for pending jobs in a
queue with APS_PRIORITY enabled. The APS value is calculated based on the
current scheduling cycle, so jobs are not guaranteed to be dispatched in this order.
Pending jobs are ordered by APS value. Jobs with system APS values are listed first,
from highest to lowest APS value. Jobs with calculated APS values are listed next,
ordered from high to low value. Finally, jobs not in an APS queue are listed. Jobs
with equal APS values are listed in order of submission time. APS values of jobs not
in an APS queue are shown with a dash (-).
If queues are configured with the same priority, bjobs -aps may not show jobs in
the correct expected dispatch order. Jobs may be dispatched in the order the queues
are configured in lsb.queues. You should avoid configuring queues with the same
priority.
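The three-tier ordering just described maps naturally onto a sort key. A Python sketch; the job dicts and aps_sort_key are hypothetical, not LSF data structures:

```python
def aps_sort_key(job):
    """Ordering per the text above: system APS values first (highest
    first), then calculated APS values (highest first), then jobs with
    no APS value; ties broken by submission time."""
    if job.get("system_aps") is not None:
        return (0, -job["system_aps"], job["submit_time"])
    if job.get("aps") is not None:
        return (1, -job["aps"], job["submit_time"])
    return (2, 0, job["submit_time"])

jobs = [
    {"id": 1, "aps": 50.0, "submit_time": 10},
    {"id": 2, "system_aps": 10.0, "submit_time": 30},
    {"id": 3, "submit_time": 5},            # not in an APS queue
    {"id": 4, "aps": 80.0, "submit_time": 20},
]
print([j["id"] for j in sorted(jobs, key=aps_sort_key)])  # [2, 4, 1, 3]
```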
-d Displays information about jobs that finished recently, within an interval specified
by CLEAN_PERIOD in lsb.params (the default period is 1 hour).
-l Long format. Displays detailed information for each job in a multiline format.
The -l option displays the following additional information: project name, job
command, current working directory on the submission host, initial checkpoint
period, checkpoint directory, migration threshold, pending and suspending
reasons, job status, resource usage, resource usage limits information, and runtime
resource usage information on the execution hosts.
Use bjobs -A -l to display detailed information for job arrays including job array
job limit (%job_limit) if set.
If JOB_IDLE is configured in the queue, use bjobs -l to display job idle exception
information.
If you submitted your job with the -U option to use advance reservations created
with the brsvadd command, bjobs -l shows the reservation ID used by the job.
If LSF_HPC_EXTENSIONS="SHORT_PIDLIST" is specified in lsf.conf, the
output from bjobs is shortened to display only the first PID and a count of the
process group IDs (PGIDs) and process IDs for the job. Without SHORT_PIDLIST,
all of the process IDs (PIDs) for a job are displayed.
If you submitted a job with multiple resource requirement strings using the bsub -R
option for the order, same, rusage, and select sections, bjobs -l displays a single,
merged resource requirement string for those sections, as if they were submitted
using a single -R.
If you submitted a job using the OR (||) expression to specify alternative resources,
this option displays the Execution rusage string with which the job runs.
For jobs submitted to an absolute priority scheduling (APS) queue, -l shows the
ADMIN factor value and the system APS value if they have been set by the
administrator for the job.
-p Displays pending jobs, together with the pending reasons that caused each job not
to be dispatched during the last dispatch turn. The pending reason shows the
number of hosts for that reason, or names the hosts if -l is also specified.
With MultiCluster, -l shows the names of hosts in the local cluster.
Each pending reason is associated with one or more hosts and it states the cause
why these hosts are not allocated to run the job. In situations where the job requests
specific hosts (using bsub -m), users may see reasons for unrelated hosts also being
displayed, together with the reasons associated with the requested hosts.
The life cycle of a pending reason ends after the time indicated by
PEND_REASON_UPDATE_INTERVAL in lsb.params.
When the job slot limit is reached for a job array
(bsub -J "jobArray[indexList]%job_slot_limit") the following message is
displayed:
The job array has reached its job slot limit.
-r Displays running jobs.
-s Displays suspended jobs, together with the suspending reason that caused each job
to become suspended.
The suspending reason may not remain the same while the job stays suspended. For
example, a job may have been suspended due to the paging rate, but after the paging
rate dropped another load index could prevent the job from being resumed. The
suspending reason is updated according to the load index. The reasons could be as
old as the time interval specified by SBD_SLEEP_TIME in
lsb.params. So the
reasons shown may not reflect the current load situation.
-W Provides resource usage information for: PROJ_NAME, CPU_USED, MEM,
SWAP, PIDS, START_TIME, FINISH_TIME.
-w Wide format. Displays job information without truncating fields.
-X Displays uncondensed output for host groups.
-x Displays unfinished jobs that have triggered a job exception (overrun, underrun,
idle). Use with the
-l option to show the actual exception status. Use with -a to
display all jobs that have triggered a job exception.
-app application_profile_name
Displays information about jobs submitted to the specified application profile. You
must specify an existing application profile.
-G user_group Only displays jobs associated with a user group submitted with bsub -G for the
specified user group. The -G option does not display jobs from subgroups within
the specified user group.
The -G option cannot be used together with the -u option. You can only specify a
user group name. The keyword all is not supported for -G.
-g job_group_name Displays information about jobs attached to the job group specified by
job_group_name. For example:
bjobs -g /risk_group
JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
113 user1 PEND normal hostA myjob Jun 17 16:15
111 user2 RUN normal hostA hostA myjob Jun 14 15:13
110 user1 RUN normal hostB hostA myjob Jun 12 05:03
104 user3 RUN normal hostA hostC myjob Jun 11 13:18
Use -g with -sla to display job groups attached to a service class. Once a job group
is attached to a service class, all jobs submitted to that group are subject to the SLA.
bjobs -l with -g displays the full path to the group to which a job is attached. For
example:
bjobs -l -g /risk_group
Job <101>, User <user1>, Project <default>, Job Group </risk_group>, Status <RUN>, Queue
<normal>, Command <myjob>
Tue Jun 17 16:21:49: Submitted from host <hostA>, CWD </home/user1;
Tue Jun 17 16:22:01: Started on <hostA>;
...
-J job_name Displays information about the specified jobs or job arrays. Only displays jobs that
were submitted by the user running this command.
The job name can be up to 4094 characters long for UNIX and Linux or up to 255
characters for Windows.
-Lp ls_project_name Displays jobs that belong to the specified LSF License Scheduler project.
Only displays jobs dispatched to the specified hosts. To see the available hosts, use
bhosts.
If a host group is specified, displays jobs dispatched to all hosts in the group. To
determine the available host groups, use bmgroup.
With MultiCluster, displays jobs in the specified cluster. If a remote cluster name is
specified, you see the remote job ID, even if the execution host belongs to the local
cluster. To determine the available clusters, use bclusters.
-N host_name | -N host_model | -N cpu_factor
Displays the normalized CPU time consumed by the job. Normalizes using the
CPU factor specified, or the CPU factor of the host or host model specified.
-P project_name Only displays jobs that belong to the specified project.
-q queue_name Only displays jobs in the specified queue.
The command bqueues returns a list of queues configured in the system, and
information about the configurations of these queues.
In MultiCluster, you cannot specify remote queues.
-sla service_class_name
Displays jobs belonging to the specified service class.
bjobs also displays information about jobs assigned to a default SLA configured
with ENABLE_DEFAULT_EGO_SLA in lsb.params.
Use -sla with -g to display job groups attached to a service class. Once a job
group is attached to a service class, all jobs submitted to that group are subject to
the SLA.
Use bsla to display the configuration properties of service classes configured in
lsb.serviceclasses, the default SLA configured in lsb.params, and dynamic
information about the state of each service class.
-u user_name ... | -u user_group ... | -u all
Only displays jobs that have been submitted by the specified users or user groups. The keyword all specifies all users. To specify a Windows user account, include the domain name in uppercase letters and use a single backslash (DOMAIN_NAME\user_name) in a Windows command line or a double backslash (DOMAIN_NAME\\user_name) in a UNIX command line.
The -u option cannot be used with the -G option.
job_ID | "job_ID[index]"
Displays information about the specified jobs or job arrays.
If you use -A, specify job array IDs without the index list.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Platform LSF Command Reference 69
Output
Pending jobs are displayed in the order in which they are considered for dispatch. Jobs in higher priority queues are displayed before those in lower priority queues. Pending jobs in the same priority queues are displayed in the order in which they were submitted, but this order can be changed by using the commands btop or bbot. If more than one job is dispatched to a host, the jobs on that host are listed in the order in which they are considered for scheduling on this host by their queue priorities and dispatch times. Finished jobs are displayed in the order in which they were completed.
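The backslash convention for Windows account names noted above can be tried in any UNIX shell; the account name here is hypothetical:

```shell
# Hypothetical Windows account name as passed to bjobs -u from a UNIX shell.
# Inside double quotes the shell collapses "\\" to a single backslash, so the
# command itself receives DOMAIN_NAME\user_name.
account="MYDOMAIN\\jsmith"
printf '%s\n' "$account"
```

From a Windows command line no shell escaping happens, which is why a single backslash suffices there.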
Default Display
A listing of jobs is displayed with the following fields:
JOBID The job ID that LSF assigned to the job.
USER The user who submitted the job.
STAT The current status of the job (see JOB STATUS below).
QUEUE The name of the job queue to which the job belongs. If the queue to which the job belongs has been removed from the configuration, the queue name is displayed as lost_and_found. Use bhist to get the original queue name. Jobs in the lost_and_found queue remain pending until they are switched with the bswitch command into another queue.
In a MultiCluster resource leasing environment, jobs scheduled by the consumer cluster display the remote queue name in the format queue_name@cluster_name. By default, this field truncates at 10 characters, so you might not see the cluster name unless you use -w or -l.
FROM_HOST The name of the host from which the job was submitted.
With MultiCluster, if the host is in a remote cluster, the cluster name and remote job ID are appended to the host name, in the format host_name@cluster_name:job_ID. By default, this field truncates at 11 characters; you might not see the cluster name and job ID unless you use -w or -l.
EXEC_HOST The name of one or more hosts on which the job is executing (this field is empty if the job has not been dispatched). If the host on which the job is running has been removed from the configuration, the host name is displayed as lost_and_found. Use bhist to get the original host name.
If the host is part of a condensed host group, the host name is displayed as the name of the condensed host group.
If you configure a host to belong to more than one condensed host group using wildcards, bjobs can display any of the host groups as the execution host name.
JOB_NAME The job name assigned by the user, or the command string assigned by default at job submission with bsub. If the job name is too long to fit in this field, then only the latter part of the job name is displayed.
The displayed job name or job command can contain up to 4094 characters for UNIX, or up to 255 characters for Windows.
SUBMIT_TIME The submission time of the job.
-l output
The -l option displays a long format listing with the following additional fields:
Project The project the job was submitted from.
Application Profile The application profile the job was submitted to.
Command The job command.
CWD The current working directory on the submission host.
Initial checkpoint period The initial checkpoint period specified at the job level, by bsub -k, or in an application profile with CHKPNT_INITPERIOD.
Checkpoint period The checkpoint period specified at the job level, by bsub -k, in the queue with CHKPNT, or in an application profile with CHKPNT_PERIOD.
Checkpoint directory The checkpoint directory specified at the job level, by bsub -k, in the queue with CHKPNT, or in an application profile with CHKPNT_DIR.
Migration threshold The migration threshold specified at the job level, by bsub -mig.
Post-execute Command The post-execution command specified at the job level, by bsub -Ep.
PENDING REASONS The reason the job is in the PEND or PSUSP state. The names of the hosts associated with each reason are displayed when both the -p and -l options are specified.
SUSPENDING REASONS The reason the job is in the USUSP or SSUSP state.
loadSched The load scheduling thresholds for the job.
loadStop The load suspending thresholds for the job.
JOB STATUS Possible values for the status of a job include:
PEND
The job is pending, that is, it has not yet been started.
PSUSP
The job has been suspended, either by its owner or the LSF administrator, while pending.
RUN
The job is currently running.
USUSP
The job has been suspended, either by its owner or the LSF administrator, while running.
SSUSP
The job has been suspended by LSF due to either of the following two causes:
◆The load conditions on the execution host or hosts have exceeded a threshold according to the loadStop vector defined for the host or queue.
◆The run window of the job’s queue is closed. See bqueues(1), bhosts(1), and lsb.queues(5).
DONE
The job has terminated with status of 0.
EXIT
The job has terminated with a non-zero status – it may have been aborted due to an
error in its execution, or killed by its owner or the LSF administrator.
For example, exit code 131 means that the job exceeded a configured resource usage
limit and LSF killed the job.
UNKWN
mbatchd has lost contact with the sbatchd on the host on which the job runs.
WAIT
For jobs submitted to a chunk job queue, members of a chunk job that are waiting
to run.
ZOMBI
A job becomes ZOMBI if:
◆A non-rerunnable job is killed by bkill while the sbatchd on the execution
host is unreachable and the job is shown as UNKWN.
◆The host on which a rerunnable job is running is unavailable and the job has
been requeued by LSF with a new job ID, as if the job were submitted as a new
job.
◆After the execution host becomes available, LSF tries to kill the ZOMBI job.
Upon successful termination of the ZOMBI job, the job’s status is changed to
EXIT.
With MultiCluster, when a job running on a remote execution cluster becomes
a ZOMBI job, the execution cluster treats the job the same way as local ZOMBI
jobs. In addition, it notifies the submission cluster that the job is in ZOMBI
state and the submission cluster requeues the job.
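Exit codes such as the 131 mentioned above follow the usual UNIX shell convention of 128 plus the terminating signal number; a small decoder illustrates the convention (a generic sketch, not part of LSF):

```shell
# Decode a job exit code: values above 128 follow the common shell
# convention of 128 + signal number (e.g. 130 = 128 + SIGINT).
decode_exit() {
  if [ "$1" -gt 128 ]; then
    echo "terminated by signal $(( $1 - 128 ))"
  else
    echo "exited with status $1"
  fi
}
decode_exit 131
```

For the example above, decode_exit 131 prints "terminated by signal 3".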
RUNTIME Estimated run time for the job, specified by bsub -We or bmod -We.
RESOURCE USAGE For the MultiCluster job forwarding model, this information is not shown if
MultiCluster resource usage updating is disabled.
The values for the current usage of a job include:
CPU time
Cumulative total CPU time in seconds of all processes in a job.
IDLE_FACTOR
Job idle information (CPU time/runtime) if JOB_IDLE is configured in the queue,
and the job has triggered an idle exception.
MEM
Total resident memory usage of all processes in a job. By default, memory usage is shown in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for display (MB, GB, TB, PB, or EB).
SWAP
Total virtual memory usage of all processes in a job. By default, swap space is shown in MB. Use LSF_UNIT_FOR_LIMITS in lsf.conf to specify a larger unit for display (MB, GB, TB, PB, or EB).
NTHREAD
Number of currently active threads of a job.
PGID
Currently active process group ID in a job.
PIDs
Currently active processes in a job.
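As a sketch, the unit used for the MEM and SWAP fields above could be changed cluster-wide with a single line in lsf.conf (the parameter name comes from the text above; the value shown is only an example):

```shell
# Fragment of lsf.conf (example): report memory and swap in GB instead of MB.
LSF_UNIT_FOR_LIMITS=GB
```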
RESOURCE LIMITS The hard resource usage limits that are imposed on the jobs in the queue (see getrlimit(2) and lsb.queues(5)). These limits are imposed on a per-job and a per-process basis.
The possible per-job resource usage limits are:
◆CPULIMIT
◆PROCLIMIT
◆MEMLIMIT
◆SWAPLIMIT
◆PROCESSLIMIT
◆THREADLIMIT
◆OPENFILELIMIT
The possible UNIX per-process resource usage limits are:
◆RUNLIMIT
◆FILELIMIT
◆DATALIMIT
◆STACKLIMIT
◆CORELIMIT
If a job submitted to the queue has any of these limits specified (see bsub(1)), then the lower of the corresponding job limits and queue limits are used for the job.
If no resource limit is specified, the resource is assumed to be unlimited. User shell limits that are unlimited are not displayed.
EXCEPTION STATUS Possible values for the exception status of a job include:
idle
The job is consuming less CPU time than expected. The job idle factor
(CPU time/runtime) is less than the configured JOB_IDLE threshold for the queue
and a job exception has been triggered.
overrun
The job is running longer than the number of minutes specified by the
JOB_OVERRUN threshold for the queue and a job exception has been triggered.
underrun
The job finished sooner than the number of minutes specified by the
JOB_UNDERRUN threshold for the queue and a job exception has been triggered.
Job Array Summary Information
If you use -A, displays summary information about job arrays. The following fields are displayed:
JOBID Job ID of the job array.
ARRAY_SPEC Array specification in the format name[index]. The array specification may be truncated; use the -w option together with -A to show the full array specification.
OWNER Owner of the job array.
NJOBS Number of jobs in the job array.
PEND Number of pending jobs of the job array.
RUN Number of running jobs of the job array.
DONE Number of successfully completed jobs of the job array.
EXIT Number of unsuccessfully completed jobs of the job array.
SSUSP Number of LSF system suspended jobs of the job array.
USUSP Number of user suspended jobs of the job array.
PSUSP Number of held jobs of the job array.
Examples
bjobs -pl
Displays detailed information about all pending jobs of the invoker.
bjobs -ps
Displays only pending and suspended jobs.
bjobs -u all -a
Displays all jobs of all users.
bjobs -d -q short -m hostA -u user1
Displays all the recently finished jobs submitted by user1 to the queue short, and executed on the host hostA.
bjobs 101 102 203 509
Displays jobs with job ID 101, 102, 203, and 509.
bjobs -X 101 102 203 509
Displays jobs with job ID 101, 102, 203, and 509 as uncondensed output even if these jobs belong to hosts in condensed host groups.
bjobs -sla Uclulet
Displays all jobs belonging to the service class Uclulet.
bjobs -app fluent
Displays all jobs belonging to the application profile fluent.
See also
bkill
By default, sends a set of signals to kill the specified jobs. On UNIX, SIGINT and SIGTERM are sent to give the job a chance to clean up before termination, then SIGKILL is sent to kill the job. The time interval between sending each signal is defined by the JOB_TERMINATE_INTERVAL parameter in lsb.params(5).
On Windows, job control messages replace the SIGINT and SIGTERM signals (but only customized applications can process them) and the TerminateProcess() system call is sent to kill the job.
By default, kills the last job submitted by the user running the command. You must specify a job ID or -app, -g, -J, -m, -u, or -q. If you specify -app, -g, -J, -m, -u, or -q without a job ID, bkill kills the last job submitted by the user running the command. Specify job ID 0 (zero) to kill multiple jobs.
Exit code 130 is returned when a dispatched job is killed with bkill.
Only root and LSF administrators can run bkill -r. The -r option is ignored for other users.
Users can only operate on their own jobs. Only root and LSF administrators can operate on jobs submitted by other users.
If a signal request fails to reach the job execution host, LSF tries the operation later when the host becomes reachable. LSF retries the most recent signal request.
If a job is running in a queue with CHUNK_JOB_SIZE set, bkill has the following results depending on job state:
PEND
Job is removed from chunk (NJOBS -1, PEND -1)
RUN
All jobs in the chunk are suspended (NRUN -1, NSUSP +1)
USUSP
Job finishes, next job in the chunk starts if one exists (NJOBS -1, PEND -1, SUSP -1, RUN +1)
WAIT
Job finishes (NJOBS -1, PEND -1)
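The SIGINT, SIGTERM, SIGKILL escalation described above can be sketched as a plain shell loop. This is a simplified stand-in, not bkill itself, and the interval argument stands in for JOB_TERMINATE_INTERVAL:

```shell
# Simplified sketch of the bkill signal escalation (not the real
# implementation): send SIGINT, then SIGTERM, then SIGKILL, pausing
# between signals, and stop as soon as the target process is gone.
graceful_kill() {
  pid=$1
  interval=${2:-1}   # stand-in for JOB_TERMINATE_INTERVAL (seconds)
  for sig in INT TERM KILL; do
    kill -s "$sig" "$pid" 2>/dev/null || return 0  # process already gone
    sleep "$interval"
    kill -0 "$pid" 2>/dev/null || return 0         # process confirmed dead
  done
}
```

For example, graceful_kill 12345 10 would escalate against PID 12345 with a 10-second pause between signals.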
Options
If the job cannot be killed, use bkill -r to remove the job from the LSF system without waiting for the job to terminate, and free the resources of the job.
0 Kills all the jobs that satisfy other options (-app, -g, -m, -q, -u, and -J).
-b Kills large numbers of jobs as soon as possible. Local pending jobs are killed immediately and cleaned up as soon as possible, ignoring the time interval specified by CLEAN_PERIOD in lsb.params. Jobs killed in this manner are not logged to lsb.acct.
Other jobs, such as running jobs, are killed as soon as possible and cleaned up normally.
If the -b option is used with the 0 subcommand, bkill kills all applicable jobs and silently skips the jobs that cannot be killed.
bkill -b 0
Operation is in progress
The -b option is ignored if used with the -r or -s options.
-l Displays the signal names supported by bkill. This is a subset of signals supported by /bin/kill and is platform-dependent.
-r Removes a job from the LSF system without waiting for the job to terminate in the operating system.
Only root and LSF administrators can run bkill -r. The -r option is ignored for other users.
Sends the same series of signals as bkill without -r, except that the job is removed from the system immediately, the job is marked as EXIT, and the job resources that LSF monitors are released as soon as LSF receives the first signal.
Also operates on jobs for which a bkill command has been issued but which cannot be reached to be acted on by sbatchd (jobs in ZOMBI state). If sbatchd recovers before the jobs are completely removed, LSF ignores the zombi jobs killed with bkill -r.
Use bkill -r only on jobs that cannot be killed in the operating system, or on jobs that cannot be otherwise removed using bkill.
The -r option cannot be used with the -s option.
-app application_profile_name
Operates only on jobs associated with the specified application profile. You must specify an existing application profile. If job_ID or 0 is not specified, only the most recently submitted qualifying job is operated on.
-g job_group_name Operates only on jobs in the job group specified by job_group_name.
Use -g with -sla to kill jobs in job groups attached to a service class.
bkill does not kill jobs in lower level job groups in the path. For example, jobs are attached to job groups /risk_group and /risk_group/consolidate:
bsub -g /risk_group myjob
Job <115> is submitted to default queue <normal>.
bsub -g /risk_group/consolidate myjob2
Job <116> is submitted to default queue <normal>.
The following bkill command only kills jobs in /risk_group, not the subgroup /risk_group/consolidate:
bkill -g /risk_group 0
Job <115> is being terminated
bkill -g /risk_group/consolidate 0
Job <116> is being terminated
-J job_name Operates only on jobs with the specified job name. The -J option is ignored if a job ID other than 0 is specified in the job_ID option.
-m host_name | -m host_group
Operates only on jobs dispatched to the specified host or host group.
If job_ID is not specified, only the most recently submitted qualifying job is operated on. The -m option is ignored if a job ID other than 0 is specified in the job_ID option. See bhosts(1) and bmgroup(1) for more information about hosts and host groups.
-q queue_name Operates only on jobs in the specified queue.
If job_ID is not specified, only the most recently submitted qualifying job is operated on.
The -q option is ignored if a job ID other than 0 is specified in the job_ID option.
See bqueues(1) for more information about queues.
-s signal_value | signal_name
Sends the specified signal to specified jobs. You can specify either a name, stripped of the SIG prefix (such as KILL), or a number (such as 9).
Eligible UNIX signal names are listed by bkill -l.
The -s option cannot be used with the -r option.
Use bkill -s to suspend and resume jobs by using the appropriate signal instead of using bstop or bresume. Sending the SIGCONT signal is the same as using bresume. Sending the SIGSTOP signal to sequential jobs or the SIGTSTP signal to parallel jobs is the same as using bstop.
You cannot suspend a job that is already suspended, or resume a job that is not suspended. Using SIGSTOP or SIGTSTP on a job that is in the USUSP state has no effect, and using SIGCONT on a job that is not in either the PSUSP or the USUSP state has no effect. See bjobs(1) for more information about job states.
-sla service_class_name
Operates on jobs belonging to the specified service class.
If job_ID is not specified, only the most recently submitted job is operated on.
Use -sla with -g to kill jobs in job groups attached to a service class.
The -sla option is ignored if a job ID other than 0 is specified in the job_ID option.
Use bsla to display the configuration properties of service classes configured in lsb.serviceclasses, the default SLA configured with ENABLE_DEFAULT_EGO_SLA in lsb.params, and dynamic information about the state of each service class.
-u user_name | -u user_group | -u all
Operates only on jobs submitted by the specified user or user group, or by all users if the reserved user name all is specified. To specify a Windows user account, include the domain name in uppercase letters and use a single backslash (DOMAIN_NAME\user_name) in a Windows command line or a double backslash (DOMAIN_NAME\\user_name) in a UNIX command line.
If job_ID is not specified, only the most recently submitted qualifying job is operated on. The -u option is ignored if a job ID other than 0 is specified in the job_ID option.
job_ID ... | 0 | "job_ID[index]" ...
Operates only on jobs that are specified by job_ID or "job_ID[index]", where "job_ID[index]" specifies selected job array elements (see bjobs(1)). For job arrays, quotation marks must enclose the job ID and index, and the index must be enclosed in square brackets.
Jobs submitted by any user can be specified here without using the -u option. If you use the reserved job ID 0, all the jobs that satisfy other options (that is, -m, -q, -u and -J) are operated on; all other job IDs are ignored.
The options -u, -q, -m and -J have no effect if a job ID other than 0 is specified. Job IDs are returned at job submission time (see bsub(1)) and may be obtained with the bjobs command (see bjobs(1)).
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Examples
bkill -s 17 -q night
Sends signal 17 to the last job that was submitted by the invoker to queue night.
bkill -q short -u all 0
Kills all the jobs that are in the queue short.
bkill -r 1045
Forces the removal of unkillable job 1045.
bkill -sla Tofino 0
Kill all jobs belonging to the service class named Tofino.
bkill -g /risk_group 0
Kills all jobs in the job group /risk_group.
bkill -app fluent
Kills the most recently submitted job associated with the application profile fluent for the current user.
bkill -app fluent 0
Kills all jobs associated with the application profile fluent for the current user.
See also
Sets the message log level for bld to include additional information in log files. You must be root or the LSF administrator to use this command.
If bladmin blddebug is used without any options, the following default values are used:
◆class_name=0 (no additional classes are logged)
◆debug_level=0 (LOG_DEBUG level in parameter LS_LOG_MASK)
◆logfile_name=current LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name
-c class_name ...
Specifies software classes for which debug messages are to be logged.
Format of class_name is the name of a class, or a list of class names separated by spaces and enclosed in quotation marks. Classes are also listed in lsf.h.
Valid log classes:
◆LC_AUTH - Log authentication messages
◆LC_COMM - Log communication messages
◆LC_FLEX - Log everything related to FLEX_STAT or FLEX_EXEC
Macrovision APIs
◆LC_LICENCE - Log license management messages
◆LC_PREEMPT - Log preemption policy messages
◆LC_TRACE - Log significant program walk steps
◆LC_XDR - Log everything transferred by XDR
Default: 0 (no additional classes are logged)
-l debug_level
Specifies level of detail in debug messages. The higher the number, the more detail
that is logged. Higher levels include all lower levels.
Possible values:
0 LOG_DEBUG level in parameter LS_LOG_MASK in lsf.conf.
1 LOG_DEBUG1 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.
2 LOG_DEBUG2 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.
3 LOG_DEBUG3 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.
Default: 0 (LOG_DEBUG level in parameter LS_LOG_MASK)
-f logfile_name
Specifies the name of the file where debugging messages are logged. The file name can be a full path. If a file name without a path is specified, the file is saved in the LSF system log directory.
The name of the file has the following format: logfile_name.daemon_name.log.host_name
On UNIX, if the specified path is not valid, the log file is created in the /tmp directory.
On Windows, if the specified path is not valid, no log file is created.
Default: current LSF system log file in the LSF system log file directory.
-o
Turns off temporary debug settings and resets them to the daemon starting state.
The message log level is reset back to the value of LS_LOG_MASK and classes are
reset to the value of LSB_DEBUG_BLD. The log file is also reset back to the default
log file.
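The level semantics in the table above (each level includes all lower levels) amount to a simple threshold test, sketched here outside of LSF:

```shell
# Sketch of the "higher level includes lower levels" rule for debug_level:
# a message at message_level is emitted when the configured level is at
# least that high.
should_log() {  # usage: should_log configured_level message_level
  [ "$2" -le "$1" ]
}
should_log 2 1 && echo "LOG_DEBUG1 message shown at level 2"
should_log 1 3 || echo "LOG_DEBUG3 message hidden at level 1"
```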
blcdebug [-l debug_level] [-f logfile_name] [-o] collector_name | all
Sets the message log level for blcollect to include additional information in log files. You must be root or the LSF administrator to use this command.
If bladmin blcdebug is used without any options, the following default values are used:
◆debug_level=0 (LOG_DEBUG level in parameter LS_LOG_MASK)
◆logfile_name=current LSF system log file in the LSF system log file directory, in the format daemon_name.log.host_name
◆collector_name=default
-l debug_level
Specifies level of detail in debug messages. The higher the number, the more detail that is logged. Higher levels include all lower levels.
Possible values:
0 LOG_DEBUG level in parameter LS_LOG_MASK in lsf.conf.
1 LOG_DEBUG1 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.
2 LOG_DEBUG2 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.
3 LOG_DEBUG3 level for extended logging. A higher level includes lower logging levels. For example, LOG_DEBUG3 includes LOG_DEBUG2, LOG_DEBUG1, and LOG_DEBUG levels.
Default: 0 (LOG_DEBUG level in parameter LS_LOG_MASK)
-f logfile_name
Specifies the name of the file where debugging messages are logged. The file name can be a full path. If a file name without a path is specified, the file is saved in the LSF system log directory.
The name of the file has the following format: logfile_name.daemon_name.log.host_name
On UNIX, if the specified path is not valid, the log file is created in the /tmp directory.
On Windows, if the specified path is not valid, no log file is created.
Default: current LSF system log file in the LSF system log file directory.
-o
Turns off temporary debug settings and resets them to the daemon starting state. The message log level is reset back to the value of LS_LOG_MASK and classes are reset to the value of LSB_DEBUG_BLD. The log file is also reset back to the default log file.
If a collector name is not specified, the default is to restore the original log mask and log file directory for the default collector.
collector_name ... | all
Specifies the collector names separated by blanks. all means all the collectors.
blaunch
IMPORTANT: You cannot run blaunch directly from the command line.
RESTRICTION: The command blaunch does not work with user account mapping. Do not run blaunch on a user account mapping host.
Most MPI implementations and many distributed applications use rsh and ssh as their task launching mechanism. The blaunch command provides a drop-in replacement for rsh and ssh as a transparent method for launching parallel applications within LSF.
blaunch supports the following core command line options as rsh and ssh:
◆rsh host_name command
◆ssh host_name command
All other rsh and ssh options are silently ignored.
blaunch transparently connects directly to the RES/SBD on the remote host, and subsequently creates and tracks the remote tasks, and provides the connection back to LSF. You do not need to insert pam, taskstarter or any other wrapper.
blaunch only works under LSF. It can only be used to launch tasks on remote hosts that are part of a job allocation. It cannot be used as a standalone command.
blaunch is not supported on Windows.
When no host names are specified, LSF allocates all hosts listed in the environment variable LSB_MCPU_HOSTS.
Options
host_name The name of the host where remote tasks are to be launched.
-n Standard input is taken from /dev/null.
-u host_file Executes the task on all hosts listed in the host_file.
Specify the path to a file that contains a list of host names. Each host name must be listed on a separate line in the host list file.
This option is exclusive of the -z option.
-z host_name ... Executes the task on all specified hosts.
Whereas the host name value for rsh and ssh is a single host name, you can use the -z option to specify a space-delimited list of hosts where tasks are started in parallel.
Specify a list of hosts on which to execute the task. If multiple host names are specified, the host names must be enclosed by quotation marks (" or ') and separated by white space.
This option is exclusive of the -u option.
command [argument ...]
Specify the command to execute. This must be the last argument on the command line.
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Diagnostics
Exit status is 0 if all commands are executed correctly.
See also
lsb_getalloc(3), lsb_launch(3)
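A host list file for the -u option above is plain text with one host per line; the file name and host names here are made up for illustration:

```shell
# Create a hypothetical host list file; inside an LSF job allocation it
# could then be used as: blaunch -u ./hosts.txt mytask
cat > ./hosts.txt <<'EOF'
hostA
hostB
EOF
```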
blcollect
Synopsis
Description
license information collection daemon that collects license usage information
Periodically collects license usage information from Macrovision FLEXnet. It queries FLEXnet for license usage information from the FLEXnet lmstat command, and passes the information to the License Scheduler daemon (bld). The blcollect daemon improves performance by allowing you to distribute license information queries on multiple hosts.
By default, license information is collected from FLEXnet on one host. Use blcollect to distribute the license collection on multiple hosts.
For each service domain configuration in lsf.licensescheduler, specify one collector name for blcollect to use. You can only specify one collector per service domain, but you can specify one collector to serve multiple service domains. You can choose any collector name you want, but must use that exact name when you run blcollect.
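Putting the options below together, a typical invocation might look like the following. The collector name, hosts, and port are examples only and must match lsf.licensescheduler; the command line is built as a string here because blcollect itself only runs where License Scheduler is installed:

```shell
# Hypothetical blcollect command line (collector name, hosts, and port
# are examples, not real values).
cmd='blcollect -c mycollector -m "hostD hostE" -p 9581 -i 60'
echo "$cmd"
```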
Options
-c Required. Specify the collector name you set in lsf.licensescheduler. You must use the collector name (LIC_COLLECT) you define in the ServiceDomain section of the configuration file.
-m Required. Specifies a space-separated list of hosts to which license information is sent. The hosts do not need to be running License Scheduler or FLEXnet. Use fully qualified host names.
-p Required. You must specify the License Scheduler listening port, which is set in lsf.licensescheduler and has a default value of 9581.
-i lmstat_interval Optional. The frequency in seconds of the calls that License Scheduler makes to lmstat to collect license usage information from FLEXnet.
The default interval is 60 seconds.
-D lmstat_path Optional. Location of the FLEXnet command lmstat.
-h Prints command usage to stderr and exits.
-V Prints release version to stderr and exits.
See also
lsf.licensescheduler
blhosts
displays the names of all the hosts running the License Scheduler daemon (bld)
Synopsis
blhosts [-h | -V]
Description
Displays a list of hosts running the License Scheduler daemon. This includes the License Scheduler master host and all the candidate License Scheduler hosts running bld.
Options
-h Prints command usage to stderr and exits.
-V Prints release version to stderr and exits.
Output
Prints out the names of all the hosts running the License Scheduler daemon (bld). For example, the following sample output shows the License Scheduler master host and the two candidate License Scheduler hosts that bld is running on:
master: host1.domain1.com
slave: host2.domain1 host3.domain1
See also
blinfo, blstat, bladmin
blimits
displays information about resource allocation limits of running jobs
Synopsis
Description
Displays current usage of resource allocation limits configured in Limit sections in lsb.resources:
◆Configured limit policy name
◆Users (-u option)
◆Queues (-q option)
◆Hosts (-m option)
◆Project names (-P option)
◆Limits (SLOTS, MEM, TMP, SWP, JOBS)
◆Limit configuration (-c option). This is the same as bresources with no
options.
Resources that have no configured limits or no limit usage are indicated by a dash (-). Limits are displayed in a USED/LIMIT format. For example, if a limit of 10 slots is configured and 3 slots are in use, then blimits displays the limit for SLOTS as 3/10.
Note that if there are no jobs running against resource allocation limits, LSF
indicates that there is no information to be displayed:
No resource usage found.
If limits MEM, SWP, or TMP are configured as percentages, both the limit and the amount used are displayed in MB. For example, lshosts displays maxmem of 249 MB, and MEM is limited to 10% of available memory. If 10 MB out of 25 MB are used, blimits displays the limit for MEM as 10/25 (10 MB USED from a 25 MB LIMIT).
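The 10/25 figure above is just the configured percentage applied to the host's maxmem; the arithmetic can be checked directly with the values from the example:

```shell
# 10% of a 249 MB maxmem, rounded to the nearest MB, gives the 25 MB
# limit shown by blimits in the example.
maxmem=249
pct=10
limit=$(( (maxmem * pct + 50) / 100 ))   # integer rounding
echo "$limit MB"
```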
Limits are displayed for both the vertical tabular format and the horizontal format for Limit sections. If a vertical format Limit section has no name, blimits displays NONAMEnnn under the NAME column for these limits, where the unnamed limits are numbered in the order the vertical-format Limit sections appear in the lsb.resources file.
If a resource consumer is configured as all, the limit usage for that consumer is indicated by a dash (-).
PER_HOST slot limits are not displayed. The bhosts command displays these as MXJ limits.
In MultiCluster, blimits returns information about all limits in the local cluster.
Limit names and policies are set up by the LSF administrator. See lsb.resources(5) for more information.
Options
-c Displays all resource configurations in lsb.resources. This is the same as
bresources with no options.
-w Displays resource allocation limits information in a wide format. Fields are
displayed without truncation.
-n limit_name ... Displays resource allocation limits for the specified named Limit sections. If a list of Limit sections is specified, Limit section names must be separated by spaces and enclosed in quotation marks (") or (').
-m host_name ... | -m host_group ... | -m cluster_name ...
Displays resource allocation limits for the specified hosts. Do not use quotes when specifying multiple hosts.
To see the available hosts, use bhosts.
For host groups:
◆If the limits are configured with HOSTS, the name of the host group is displayed.
◆If the limits are configured with PER_HOST, the names of the hosts belonging to the group are displayed instead of the name of the host group.
TIP: PER_HOST slot limits are not displayed. The bhosts command displays these as MXJ limits.
For a list of host groups see bmgroup(1).
In MultiCluster, if a cluster name is specified, displays resource allocation limits in
the specified cluster.
-P project_name ... Displays resource allocation limits for the specified projects. If a list of projects is specified, project names must be separated by spaces and enclosed in quotation marks (") or (').
-q queue_name ... Displays resource allocation limits for the specified queues.
The command bqueues returns a list of queues configured in the system, and information about the configurations of these queues.
In MultiCluster, you cannot specify remote queues.
-u user_name | -u user_group ...
Displays resource allocation limits for the specified users.
If a list of users is specified, user names must be separated by spaces and enclosed in quotation marks (") or ('). You can specify both user names and user IDs in the list of users.
If a user group is specified, displays the resource allocation limits that include that group in their configuration. For a list of user groups see bugroup(1).
-h Prints command usage to stderr and exits.
-V Prints LSF release version to stderr and exits.
Output
Configured limits and resource usage for built-in resources (slots, mem, tmp, and
swp load indices, and running and suspended job limits) are displayed as
INTERNAL RESOURCE LIMITS separately from custom external resources,
which are shown as EXTERNAL RESOURCE LIMITS.
Resource Consumers
blimits displays the following fields for resource consumers:
NAME The name of the limit policy as specified by the Limit section NAME parameter.
USERS List of user names or user groups on which the displayed limits are enforced, as specified by the Limit section parameters USERS or PER_USER.
User group names have a slash (/) added at the end of the group name. See bugroup(1).
QUEUES The name of the queue to which the limits apply, as specified by the Limit section parameters QUEUES or PER_QUEUE.
If the queue has been removed from the configuration, the queue name is displayed as lost_and_found. Use bhist to get the original queue name. Jobs in the lost_and_found queue remain pending until they are switched with the bswitch command into another queue.
In a MultiCluster resource leasing environment, jobs scheduled by the consumer cluster display the remote queue name in the format queue_name@cluster_name. By default, this field truncates at 10 characters, so you might not see the cluster name unless you use -w or -l.
HOSTS List of hosts and host groups on which the displayed limits are enforced, as specified by the Limit section parameters HOSTS or PER_HOST.
Host group names have a slash (/) added at the end of the group name. See bmgroup(1).
TIP: PER_HOST slot limits are not displayed. The bhosts command displays these as MXJ limits.
PROJECTS List of project names on which limits are enforced, as specified by the Limit section parameters PROJECTS or PER_PROJECT.
Resource Limits
blimits displays resource allocation limits for the following resources:
SLOTS Number of slots currently used and the maximum number of slots configured for the limit policy, as specified by the Limit section SLOTS parameter.
MEM Amount of memory currently used and the maximum amount configured for the limit policy, as specified by the Limit section MEM parameter.
TMP Amount of tmp space currently used and the maximum amount of tmp space configured for the limit policy, as specified by the Limit section TMP parameter.
SWP Amount of swap space currently used and the maximum amount of swap space configured for the limit policy, as specified by the Limit section SWP parameter.
JOBS Number of currently running and suspended jobs and the maximum number of jobs configured for the limit policy, as specified by the Limit section JOBS parameter.
Example
The following command displays limit configuration and dynamic usage information for project proj1:
blimits -P proj1
INTERNAL RESOURCE LIMITS:
NAME       USERS  QUEUES  HOSTS  PROJECTS     SLOTS  MEM  TMP  SWP  JOBS
limit1     user1  -       hostA  proj1        2/6    -    -    -    -
NONAME022  -      -       hostB  proj1 proj2  1/3    -    -    -    -
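Output like the example above could come from Limit sections along the following lines in lsb.resources (a sketch only; the exact configuration behind the example is not shown in this reference). The second, unnamed vertical-format section is what blimits reports as NONAME022:

```
Begin Limit
NAME     = limit1
USERS    = user1
HOSTS    = hostA
PROJECTS = proj1
SLOTS    = 6
End Limit

Begin Limit
HOSTS   PROJECTS        SLOTS
hostB   (proj1 proj2)   3
End Limit
```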
blinfo
Displays different license configuration information, depending on the option selected.
By default, displays information about the distribution of licenses managed by License Scheduler.
Options
-A When LOCAL_TO is configured for a feature in lsf.licensescheduler, shows
the feature allocation by cluster locality.
You can optionally provide license token names.
-a Shows all information, including information about non-shared licenses
(NON_SHARED_DISTRIBUTION) and workload distribution
(WORKLOAD_DISTRIBUTION).
You can optionally provide license token names.
blinfo -a does not display NON_SHARED information for hierarchical project group scheduling policies. Use blinfo -G to see hierarchical group configuration.
-C When LOCAL_TO is configured for a feature in lsf.licensescheduler, shows the cluster locality information for the features.
You can optionally provide license token names.
-D Lists the License Scheduler service domains and the corresponding FLEXnet license server hosts.
-G Lists the hierarchical configuration information.
If PRIORITY is defined in the ProjectGroup section of lsf.licensescheduler, this option also shows the priorities of each project.
-g feature_group ...
When FEATURE_GROUP is configured for a group of license features in lsf.licensescheduler, shows only information about the features configured in the FEATURE_LIST of the specified feature groups. You can specify more than one feature group at a time.
When you specify feature names with -t, features in the feature list defined by -t and feature groups are both displayed.
Feature groups listed with -g but not defined in lsf.licensescheduler are ignored.
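A feature group is configured in a FeatureGroup section of lsf.licensescheduler, roughly as in this sketch (the group and feature names here are hypothetical):

```
Begin FeatureGroup
NAME = simulators
FEATURE_LIST = verilog_sim vhdl_sim
End FeatureGroup
```

With such a section, blinfo -g simulators would restrict output to the verilog_sim and vhdl_sim features.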
-Lp Lists the active projects managed by License Scheduler.
-Lp only displays projects associated with configured features.
If PRIORITY is defined in the Projects section of lsf.licensescheduler, this option also lists the priorities of each project.
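For instance, a Projects section with priorities might look like the following sketch (the project names and priority values are hypothetical):

```
Begin Projects
PROJECTS    PRIORITY
proj_a      3
proj_b      1
End Projects
```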
-o alpha | total Sorts license feature information alphabetically or by total licenses.
◆alpha: Features are listed in descending alphabetical order.
◆total: Features are sorted by the descending order of the sum of licenses that are
allocated to LSF workload from all the service domains configured to supply
licenses to the feature. Licenses borrowed by non-LSF workload are not
included in this amount.
-P When LS_FEATURE_PERCENTAGE=Y, lists the license ownership in percentage.
-p Displays values of lsf.licensescheduler configuration parameters and
lsf.conf parameters related to License Scheduler. This is useful for
troubleshooting.
-t token_name |"token_name ..."
Only shows information about specified license tokens. Use spaces to separate
multiple names, and enclose them in quotation marks.
-h Prints command usage to stderr and exits.
-V Prints the License Scheduler release version to stderr and exits.
Output
Default output
Displays the following fields:
FEATURE The license name. This becomes the license token name.
When LOCAL_TO is configured for a feature in lsf.licensescheduler, blinfo shows the cluster locality information for the license features.
SERVICE_DOMAIN The name of the service domain that provided the license.
TOTAL The total number of licenses managed by FLEXnet. This number comes from FLEXnet.
DISTRIBUTION The distribution of the licenses among license projects, in the format [project_name, percentage[/number_licenses_owned]]. This determines how many licenses a project is entitled to use when there is competition for licenses. The percentage is calculated from the share specified in the configuration file.
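The distribution shown in this field comes from the DISTRIBUTION parameter of a Feature section in lsf.licensescheduler. A sketch (the service domain, project names, shares, and owned count below are hypothetical):

```
Begin Feature
NAME = verilog_sim
DISTRIBUTION = LanServer1(proj_a 1 proj_b 2/5)
End Feature
```

With shares of 1 and 2, the percentages reported by blinfo would be roughly 33.3% for proj_a and 66.7% for proj_b, with proj_b additionally owning 5 licenses.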
Allocation output (-A)
FEATURE The license name. This becomes the license token name.
When LOCAL_TO is configured for a feature in lsf.licensescheduler, blinfo shows the cluster locality information for the license features.
PROJECT The License Scheduler project name.
ALLOCATION The percentage of shares assigned to each cluster for a feature and a project.
All output (-a)
Same as the default output, with the addition of NON_SHARED_DISTRIBUTION.
NON_SHARED_DISTRIBUTION
This column is displayed directly under DISTRIBUTION with the -a option. If there are non-shared licenses, the non-shared license information is output in the following format: [project_name, number_licenses_non_shared]
If there are no non-shared licenses, a dash (-) is displayed.
Cluster locality output (-C)
NAME The license feature token name.
When LOCAL_TO is configured for a feature in lsf.licensescheduler, blinfo shows the cluster locality information for the license features.
FLEX_NAME The actual FLEXnet feature name, which is the name used by FLEXnet to identify the type of license. It may be different from the License Scheduler token name if a different FLEX_NAME is specified in lsf.licensescheduler.
CLUSTER_NAME The name of the cluster the feature is assigned to.
FEATURE The license feature name. This becomes the license token name.
When LOCAL_TO is configured for a feature in lsf.licensescheduler, blinfo shows the cluster locality information for the license features.
SERVICE_DOMAIN The service domain name.
Service domain output (-D)
SERVICE_DOMAIN The service domain name.
LIC_SERVERS Names of the FLEXnet license server hosts that belong to the service domain. Each host name is enclosed in parentheses, as shown:
(port_number@host_name)
Redundant hosts (that share the same FLEXnet license file) are grouped together as shown: