5-5   Submitting a Non-MPI Parallel Job....................................................................................56
5-6   Submitting a Non-MPI Parallel Job to Run One Task per Node........................................56
5-7   Submitting an MPI Job......................................................................................................57
5-8   Submitting an MPI Job with the LSF-SLURM External Scheduler Option.......................57
5-9   Using the External Scheduler to Submit a Job to Run on Specific Nodes.........................59
5-10  Using the External Scheduler to Submit a Job to Run One Task per Node......................59
5-11  Using the External Scheduler to Submit a Job That Excludes One or More Nodes.........59
5-12  Using the External Scheduler to Launch a Command in Parallel on Ten Nodes.............59
5-13  Using the External Scheduler to Constrain Launching to Nodes with a Given Feature...60
5-14  Submitting a Job Script......................................................................................................60
5-15  Submitting a Batch Script with the LSF-SLURM External Scheduler Option...................61
5-16  Submitting a Batch Job Script That Uses a Subset of the Allocation.................................61
5-17  Submitting a Batch Job Script That Uses the srun --overcommit Option..........................61
5-18  Environment Variables Available in a Batch Job Script....................................................62
8-1   The vtjacobic Example Program........................................................................................86
8-2   C Example – Running the vtjacobic Example Program....................................................86
9-1   Simple Launch of a Serial Program....................................................................................92
9-2   Displaying Queued Jobs by Their JobIDs..........................................................................93
9-3   Reporting on Failed Jobs in the Queue..............................................................................93
9-4   Terminating a Job by Its JobID..........................................................................................93
9-5   Cancelling All Pending Jobs..............................................................................................93
9-6   Sending a Signal to a Job...................................................................................................93
9-7   Using the sinfo Command (No Options)...........................................................................93
9-8   Reporting Reasons for Downed, Drained, and Draining Nodes.......................................94
10-1  Examples of LSF-HPC Job Launch....................................................................................97
10-2  Examples of Launching LSF-HPC Jobs Without the srun Command...............................98
10-3  Job Allocation Information for a Running Job.................................................................108
10-4  Job Allocation Information for a Finished Job.................................................................108
10-5  Using the bjobs Command (Short Output)......................................................................109
10-6  Using the bjobs Command (Long Output)......................................................................109
10-7  Using the bhist Command (Short Output)......................................................................109
10-8  Using the bhist Command (Long Output)......................................................................110
10-9  Launching an Interactive MPI Job...................................................................................112
10-10 Launching an Interactive MPI Job on All Cores in the Allocation..................................113
About This Document
This document provides information about using the features and functions of the HP XC System
Software. It describes how the HP XC user and programming environments differ from standard
Linux® system environments. In addition, this manual focuses on building and running
applications in the HP XC environment and is intended to guide an application developer to
take maximum advantage of HP XC features and functions by providing an understanding of
the underlying mechanisms of the HP XC programming environment.
An HP XC system is integrated with several open source software components. Some open source
software components are being used for underlying technology, and their deployment is
transparent. Some open source software components require user-level documentation specific
to HP XC systems, and that kind of information is included in this document, when required.
HP relies on the documentation provided by the open source developers to supply the information
you need to use their product. For links to open source software documentation for products
that are integrated with the HP XC system, see “Supplementary Software Products” (page 18).
Documentation for third-party hardware and software components that are supported on the
HP XC system is supplied by the third-party vendor. However, information about the operation
of third-party software is included in this document if the functionality of the third-party
component differs from standard behavior when used in the XC environment. In this case, HP
XC documentation supersedes information supplied by the third-party vendor. For links to
related third-party Web sites, see “Supplementary Software Products” (page 18).
Standard Linux® administrative tasks or the functions provided by standard Linux tools and
commands are documented in commercially available Linux reference manuals and on various
Web sites. For more information about obtaining documentation for standard Linux administrative
tasks and associated topics, see the list of Web sites and additional publications provided in
“Related Software Products and Additional Publications” (page 19).
Intended Audience
This document is intended for experienced Linux users who run applications developed by
others, and for experienced system or application developers who develop, build, and run
application code on an HP XC system.
This document assumes that the user understands, and has experience with, multiprocessor
systems and the Message Passing Interface (MPI), and is familiar with HP XC architecture and
concepts.
New and Changed Information in This Edition
• Chapter 7 contains updated information on the xcxclus and xcxperf commands.
• A new section on Unified Parallel C was added.
• A description was added of the ovp utility --opts=--queue option, which allows you to specify
the LSF queue for performance health tests.
• A note was added describing which of the ovp utility's performance health tests apply to standard
LSF and which apply to LSF-HPC integrated with SLURM.
Typographic Conventions
This document uses the following typographical conventions:
%, $, or #              A percent sign represents the C shell system prompt. A dollar sign represents the system prompt for the Korn, POSIX, and Bourne shells. A number sign represents the superuser prompt.
audit(5)                A manpage. The manpage name is audit, and it is located in Section 5.
Command                 A command name or qualified command phrase.
Computer output         Text displayed by the computer.
Ctrl+x                  A key sequence. A sequence such as Ctrl+x indicates that you must hold down the key labeled Ctrl while you press another key or mouse button.
ENVIRONMENT VARIABLE    The name of an environment variable, for example, PATH.
[ERROR NAME]            The name of an error, usually returned in the errno variable.
Key                     The name of a keyboard key. Return and Enter both refer to the same key.
Term                    The defined use of an important word or phrase.
User input              Commands and other text that you type.
Variable                The name of a placeholder in a command, function, or other syntax display that you replace with an actual value.
[ ]                     The contents are optional in syntax. If the contents are a list separated by |, you can choose one of the items.
{ }                     The contents are required in syntax. If the contents are a list separated by |, you must choose one of the items.
. . .                   The preceding element can be repeated an arbitrary number of times.
|                       Separates items in a list of choices.
WARNING                 A warning calls attention to important information that if not understood or followed will result in personal injury or nonrecoverable system problems.
CAUTION                 A caution calls attention to important information that if not understood or followed will result in data loss, data corruption, or damage to hardware or software.
IMPORTANT               This alert provides essential information to explain a concept or to complete a task.
NOTE                    A note contains additional information to emphasize or supplement important points of the main text.
HP XC and Related HP Products Information
The HP XC System Software Documentation Set, the Master Firmware List, and HP XC HowTo
documents are available at this HP Technical Documentation Web site:
http://www.docs.hp.com/en/linuxhpc.html
The HP XC System Software Documentation Set includes the following core documents:
HP XC System Software Release Notes
    Describes important, last-minute information about firmware, software, or hardware that might affect the system. This document is not shipped on the HP XC documentation CD. It is available only on line.
HP XC Hardware Preparation Guide
    Describes hardware preparation tasks specific to HP XC that are required to prepare each supported hardware model for installation and configuration, including required node and switch connections.
HP XC System Software Installation Guide
    Provides step-by-step instructions for installing the HP XC System Software on the head node and configuring the system.
HP XC System Software Administration Guide
    Provides an overview of the HP XC system administrative environment, cluster administration tasks, node maintenance tasks, LSF® administration tasks, and troubleshooting procedures.
HP XC System Software User's Guide
    Provides an overview of managing the HP XC user environment with modules, managing jobs with LSF, and describes how to build, run, debug, and troubleshoot serial and parallel applications on an HP XC system.
QuickSpecs for HP XC System Software
    Provides a product overview, hardware requirements, software requirements, software licensing information, ordering information, and information about commercially available software that has been qualified to interoperate with the HP XC System Software. The QuickSpecs are located on line:
    http://www.hp.com/go/clusters
See the following sources for information about related HP products.
HP XC Program Development Environment
The Program Development Environment home page provides pointers to tools that have been
tested in the HP XC program development environment (for example, TotalView® and other tools).
HP-MPI
HP Message Passing Interface (HP-MPI) is an implementation of the MPI standard that has been
integrated in HP XC systems. The home page and documentation are located at the following Web
site:
http://www.hp.com/go/mpi
HP Serviceguard
HP Serviceguard is a service availability tool supported on an HP XC system. HP Serviceguard
enables some system services to continue if a hardware or software failure occurs. The HP
Serviceguard documentation is available at the following Web site:
http://www.docs.hp.com/en/ha.html
HP Scalable Visualization Array
The HP Scalable Visualization Array (SVA) is a scalable visualization solution that is integrated
with the HP XC System Software. The SVA documentation is available at the following Web site:
http://www.docs.hp.com/en/linuxhpc.html
HP Cluster Platform
The cluster platform documentation describes site requirements, shows you how to set up the
servers and additional devices, and provides procedures to operate and manage the hardware.
These documents are available at the following Web site:
http://www.docs.hp.com/en/linuxhpc.html
HP Integrity and HP ProLiant Servers
Documentation for HP Integrity and HP ProLiant servers is available at the following Web site:
http://www.docs.hp.com/en/hw.html
Related Information
This section provides useful links to third-party, open source, and other related software products.
Supplementary Software Products
This section provides links to third-party and open source
software products that are integrated into the HP XC System Software core technology. In the
HP XC documentation, except where necessary, references to third-party and open source
software components are generic, and the HP XC adjective is not added to any reference to a
third-party or open source command or product name. For example, the SLURM srun command
is simply referred to as the srun command.
The location of each Web site or link to a particular topic listed in this section is subject to change
without notice by the site provider.
•http://www.platform.com
Home page for Platform Computing Corporation, the developer of the Load Sharing Facility
(LSF). LSF-HPC with SLURM, the batch system resource manager used on an HP XC system,
is tightly integrated with the HP XC and SLURM software. Documentation specific to
LSF-HPC with SLURM is provided in the HP XC documentation set.
Standard LSF is also available as an alternative resource management system (instead of
LSF-HPC with SLURM) for HP XC. This is the version of LSF that is widely discussed on
the Platform Web site.
For your convenience, the following Platform Computing Corporation LSF documents are
shipped on the HP XC documentation CD in PDF format:
—Administering Platform LSF
—Administration Primer
—Platform LSF Reference
—Quick Reference Card
—Running Jobs with Platform LSF
LSF procedures and information supplied in the HP XC documentation, particularly the
documentation relating to the LSF-HPC integration with SLURM, supersede the information
supplied in the LSF manuals from Platform Computing Corporation.
The Platform Computing Corporation LSF manpages are installed by default. lsf_diff(7)
supplied by HP describes LSF command differences when using LSF-HPC with SLURM on
an HP XC system.
The following documents in the HP XC System Software Documentation Set provide
information about administering and using LSF on an HP XC system:
—HP XC System Software Administration Guide
—HP XC System Software User's Guide
•http://www.llnl.gov/LCdocs/slurm/
Documentation for the Simple Linux Utility for Resource Management (SLURM), which is
integrated with LSF to manage job and compute resources on an HP XC system.
•http://www.nagios.org/
Home page for Nagios®, a system and network monitoring application that is integrated
into an HP XC system to provide monitoring capabilities. Nagios watches specified hosts
and services and issues alerts when problems occur and when problems are resolved.
•http://oss.oetiker.ch/rrdtool
Home page of RRDtool, a round-robin database tool and graphing system. In the HP XC
system, RRDtool is used with Nagios to provide a graphical view of system status.
•http://supermon.sourceforge.net/
Home page for Supermon, a high-speed cluster monitoring system that emphasizes low
perturbation, high sampling rates, and an extensible data protocol and programming
interface. Supermon works in conjunction with Nagios to provide HP XC system monitoring.
•http://www.llnl.gov/linux/pdsh/
Home page for the parallel distributed shell (pdsh), which executes commands across HP
XC client nodes in parallel.
•http://www.balabit.com/products/syslog_ng/
Home page for syslog-ng, a logging tool that replaces the traditional syslog functionality.
The syslog-ng tool is a flexible and scalable audit trail processing tool. It provides a
centralized, securely stored log of all devices on the network.
•http://systemimager.org
Home page for SystemImager®, which is the underlying technology that distributes the
golden image to all nodes and distributes configuration changes throughout the system.
•http://linuxvirtualserver.org
Home page for the Linux Virtual Server (LVS), the load balancer running on the Linux
operating system that distributes login requests on the HP XC system.
•http://www.macrovision.com
Home page for Macrovision®, developer of the FLEXlm™ license management utility, which
is used for HP XC license management.
•http://sourceforge.net/projects/modules/
Web site for Modules, which provide for easy dynamic modification of a user's environment
through modulefiles, which typically instruct the module command to alter or set shell
environment variables.
•http://dev.mysql.com/
Home page for MySQL AB, developer of the MySQL database. This Web site contains a link
to the MySQL documentation, particularly the MySQL Reference Manual.
Related Software Products and Additional Publications
This section provides pointers to Web
sites for related software products and provides references to useful third-party publications.
The location of each Web site or link to a particular topic is subject to change without notice by
the site provider.
Linux Web Sites
•http://www.redhat.com
Home page for Red Hat®, distributors of Red Hat Enterprise Linux Advanced Server, a
Linux distribution with which the HP XC operating environment is compatible.
•http://www.linux.org/docs/index.html
This Web site for the Linux Documentation Project (LDP) contains guides that describe
aspects of working with Linux, from creating your own Linux system from scratch to bash
script writing. This site also includes links to Linux HowTo documents, frequently asked
questions (FAQs), and manpages.
•http://www.linuxheadquarters.com
Web site providing documents and tutorials for the Linux user. Documents contain
instructions on installing and using applications for Linux, configuring hardware, and a
variety of other topics.
•http://www.gnu.org
Home page for the GNU Project. This site provides online software and information for
many programs and utilities that are commonly used on GNU/Linux systems. Online
information includes guides for using the bash shell, emacs, make, cc, gdb, and more.
MPI Web Sites
•http://www.mpi-forum.org
Contains the official MPI standards documents, errata, and archives of the MPI Forum. The
MPI Forum is an open group with representatives from many organizations that define and
maintain the MPI standard.
•http://www-unix.mcs.anl.gov/mpi/
A comprehensive site containing general information, such as the specification and FAQs,
and pointers to other resources, including tutorials, implementations, and other MPI-related
sites.
Additional Publications
For more information about standard Linux system administration or other related software
topics, consider using one of the following publications, which must be purchased separately:
•Linux Administration Unleashed, by Thomas Schenk, et al.
•Linux Administration Handbook, by Evi Nemeth, Garth Snyder, Trent R. Hein, et al.
•Managing NFS and NIS, by Hal Stern, Mike Eisler, and Ricardo Labiaga (O'Reilly)
•MySQL, by Paul DuBois
•MySQL Cookbook, by Paul DuBois
•High Performance MySQL, by Jeremy Zawodny and Derek J. Balling (O'Reilly)
•Perl Cookbook, Second Edition, by Tom Christiansen and Nathan Torkington
•Perl in a Nutshell: A Desktop Quick Reference, by Ellen Siever, et al.
Manpages
Manpages provide online reference and command information from the command line. Manpages
are supplied with the HP XC system for standard HP XC components, Linux user commands,
LSF commands, and other software components that are distributed with the HP XC system.
Manpages for third-party software components might be provided as a part of the deliverables
for that component.
Using discover(8) as an example, you can use either one of the following commands to display a
manpage:
$ man discover
$ man 8 discover
If you are not sure about a command you need to use, enter the man command with the -k option
to obtain a list of commands that are related to a keyword. For example:
$ man -k keyword
HP Encourages Your Comments
HP encourages comments concerning this document. We are committed to providing
documentation that meets your needs. Send any errors found, suggestions for improvement, or
compliments to:
feedback@fc.hp.com
Include the document title, manufacturing part number, and any comment, error found, or
suggestion for improvement you have concerning this document.
1 Overview of the User Environment
The HP XC system is a collection of computer nodes, networks, storage, and software built into
a cluster that works together. It is designed to maximize workload and I/O performance, and to
provide the efficient management of large, complex, and dynamic workloads.
This chapter addresses the following topics:
•“System Architecture” (page 23)
•“User Environment” (page 27)
•“Application Development Environment” (page 28)
•“Run-Time Environment” (page 29)
•“Components, Tools, Compilers, Libraries, and Debuggers” (page 31)
1.1 System Architecture
The HP XC architecture is designed as a clustered system with single system traits. From a user
perspective, this architecture achieves a single system view, providing capabilities such as the
following:
•Single user login
•Single file system namespace
•Integrated view of system resources
•Integrated program development environment
•Integrated job submission environment
1.1.1 HP XC System Software
The HP XC System Software enables the nodes in the platform to run cohesively to achieve a
single system view. You can determine the version of the HP XC System Software from the
/etc/hptc-release file.
$ cat /etc/hptc-release
HP XC V#.# RCx PKn date
Where:
#.#     Is the version of the HP XC System Software.
RCx     Is the release candidate of the version.
PKnn    Indicates there was a cumulative patch kit for the HP XC System Software installed. For example, if PK02 appears in the output, it indicates that both cumulative patch kits PK01 and PK02 have been installed. This field is blank if no patch kits are installed.
date    Is the date (in yyyymmdd format) the software was released.
1.1.2 Operating System
The HP XC system is a high-performance compute cluster that runs HP XC Linux for High
Performance Computing Version 1.0 (HPC Linux) as its software base. Any serial or thread-parallel
applications, or applications built shared with HP-MPI that run correctly on Red Hat Enterprise
Linux Advanced Server Version 3.0 or Version 4.0, also run correctly on HPC Linux.
1.1.3 Node Platforms
The HP XC System Software is available on several platforms. You can determine the platform
by examining the top few fields of the /proc/cpuinfo file, for example, by using the head
command:
$ head /proc/cpuinfo
Table 1-1 presents the representative output for each of the platforms. This output may differ
according to changes in models and so on.
Table 1-1 Determining the Node Platform
Each platform is paired with partial output of /proc/cpuinfo:

CP3000
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 15
    model : 3
    model name : Intel(R) Xeon(TM)

CP4000
    processor : 0
    vendor_id : AuthenticAMD
    cpu family : 15
    model : 5
    model name : AMD Opteron(tm)

CP6000
    processor : 0
    vendor : GenuineIntel
    arch : IA-64
    family : Itanium 2
    model : 1

CP3000BL (Blade-only XC systems)
    processor : 0
    vendor_id : GenuineIntel
    cpu family : 15
    model : 6
    model name : Intel(R) Xeon(TM) CPU 3.73GHz

Note: The /proc/cpuinfo file is dynamic.
1.1.4 Node Specialization
The HP XC system is implemented as a sea-of-nodes. Each node in the system contains the same
software image on its local disk. There are two types of nodes in the system — a head node and
client nodes.
head node       The node that is installed with the HP XC system software first — it is used to
                generate other HP XC (client) nodes. The head node is generally of interest
                only to the administrator of the HP XC system.
client nodes    All the other nodes that make up the system. They are replicated from
                the head node and are usually given one or more specialized roles to perform
                various system functions, such as logging into the system or running jobs.
The HP XC system allows for the specialization of client nodes to enable efficient and flexible
distribution of the workload. Nodes can be assigned one or more specialized roles that determine
how a particular node is used and what system services it provides. Of the many different roles
that can be assigned to a client node, the following roles contain services that are of special interest
to the general user:
login role      The role most visible to users is on nodes that have the login role. Nodes
                with the login role are where you log in and interact with the system to
                perform various tasks. For example, once logged in to a node with the login role,
                you can execute commands, build applications, or submit jobs to compute
                nodes for execution. There can be one or several nodes with the login role
                in an HP XC system, depending upon cluster size and requirements. Nodes
                with the login role are a part of the Linux Virtual Server ring, which
                distributes login requests from users. A node with the login role is referred
                to as a login node in this manual.
compute role    The compute role is assigned to nodes where jobs are to be distributed and
                run. Although all nodes in the HP XC system are capable of carrying out
                computations, the nodes with the compute role are the primary nodes used
                to run jobs. Nodes with the compute role become a part of the resource pool
                used by LSF-HPC and SLURM, which manage and distribute the job
                workload. Jobs that are submitted to compute nodes must be launched from
                nodes with the login role. Nodes with the compute role are referred to as
                compute nodes in this manual.
1.1.5 Storage and I/O
The HP XC system supports both shared (global) and private (local) disks and file systems.
Shared file systems can be mounted on all the other nodes by means of Lustre™ or NFS. This
gives users a single view of all the shared data on disks attached to the HP XC system.
SAN Storage
The HP XC system uses the HP StorageWorks Scalable File Share (HP StorageWorks SFS), which
is based on Lustre technology and uses the Lustre File System from Cluster File Systems, Inc.
This is a turnkey Lustre system from HP. It supplies access to Lustre file systems through Lustre
client/server protocols over various system interconnects. The HP XC system is a client to the
HP StorageWorks SFS server.
Local Storage
Local storage for each node holds the operating system, a copy of the HP XC System Software,
and temporary space that can be used by jobs running on the node.
HP XC file systems are described in detail in “File System”.
1.1.6 File System
Each node of the HP XC system has its own local copy of all the HP XC System Software files
including the Linux distribution; it also has its own local user files. Every node can also import
files from NFS or Lustre file servers. HP XC System Software uses NFS 3, including both client
and server functionality. HP XC System Software also enables Lustre client services for
high-performance and high-availability file I/O. These Lustre client services require the separate
installation of Lustre software, provided with the HP StorageWorks Scalable File Share (SFS).
NFS files can be shared exclusively among the nodes of the HP XC System or can be shared
between the HP XC and external systems. External NFS files can be shared with any node having
a direct external network connection. It is also possible to set up NFS to import external files to
HP XC nodes without external network connections, by routing through a node with an external
network connection. Your system administrator can choose to use either the HP XC administrative
network or the HP XC system interconnect for NFS operations. The HP XC system interconnect
can potentially offer higher performance, but at a potential cost to the performance
of application communications.
For high-performance or high-availability file I/O, the Lustre file system is available on HP XC.
The Lustre file system uses POSIX-compliant syntax and semantics. The HP XC System Software
includes kernel modifications required for Lustre client services, which enable the operation of
the separately installable Lustre client software. The Lustre file server product used on HP XC
is the HP StorageWorks Scalable File Share (SFS), which fully supports the HP XC System
Software.
The SFS also includes HP XC Lustre client software. The SFS can be integrated with the HP XC
so that Lustre I/O is performed over the same high-speed system interconnect fabric used by
the HP XC. So, for example, if the HP XC system interconnect is based on a Quadrics® QsNet
II® switch, then the SFS will serve files over ports on that switch. The file operations are able to
proceed at the full bandwidth of the HP XC system interconnect because these operations are
implemented directly over the low-level communications libraries. Further optimizations of file
I/O can be achieved at the application level using special file system commands – implemented
as ioctls – which allow a program to interrogate the attributes of the file system, modify the
stripe size and other attributes of new (zero-length) files, and so on. Some of these optimizations
are implicit in the HP-MPI I/O library, which implements the MPI-2 file I/O standard.
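Where the Lustre client utilities are installed, the lfs command is the usual command-line wrapper around these ioctls. The following is a minimal sketch; the file names are placeholders and the exact option syntax varies by Lustre release:
$ lfs getstripe /scratch/myfile          # report the file's stripe count, stripe size, and OST layout
$ lfs setstripe -c 4 /scratch/newfile    # create a new zero-length file striped across four OSTs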
File System Layout
In an HP XC system, the basic file system layout is the same as that of the Red Hat Advanced
Server 3.0 Linux file system.
The HP XC file system is structured to separate cluster-specific files, base operating system files,
and user-installed software files. This allows for flexibility and ease of potential upgrades of the
system software and keeps software from conflicting with user installed software. Files are
segregated into the following types and locations:
•Software specific to HP XC is located in /opt/hptc
•HP XC configuration data is located in /opt/hptc/etc
•Clusterwide directory structure (file system) is located in /hptc_cluster
Be aware of the following information about the HP XC file system layout:
•Open source software that by default would be installed under the /usr/local directory
is instead installed in the /opt/hptc directory.
•Software installed in the /opt/hptc directory is not intended to be updated by users.
•Software packages are installed in directories under the /opt/hptc directory under their
own names. The exception to this is third-party software, which usually goes in /opt/r.
•Four directories under the /opt/hptc directory contain symbolic links to files included in
the packages:
—/opt/hptc/bin
—/opt/hptc/sbin
—/opt/hptc/lib
—/opt/hptc/man
Each package directory should have a directory corresponding to each of these directories
in which every file has a symbolic link created in the /opt/hptc/ directory.
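For example, you can list one of these directories to see the symbolic links that point back into the individual package directories (the head command simply limits the output):
$ ls -l /opt/hptc/bin | head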
1.1.7 System Interconnect Network
The HP XC system interconnect provides high-speed connectivity for parallel applications. The
system interconnect network provides a high-speed communications path used primarily for
user file service and for communications within user applications that are distributed among
nodes of the system. The system interconnect network is a private network within the HP XC.
Typically, every node in the HP XC is connected to the system interconnect.
Table 1-2 indicates the types of system interconnects that are used on HP XC systems.
Table 1-2 HP XC System Interconnects
                        CP6000    CP4000    CP3000
Quadrics QsNet II®      X         X
Myrinet®                          X         X
Gigabit Ethernet®       X         X         X
InfiniBand®             X         X         X
Additional information on supported system interconnects is provided in the HP XC Hardware Preparation Guide.
1.1.8 Network Address Translation (NAT)
The HP XC system uses Network Address Translation (NAT) to enable nodes in the HP XC
system that do not have direct external network connections to open outbound network
connections to external network resources.
1.2 Determining System Configuration Information
You can determine various system configuration parameters with a few commands, including:
• The version of the HP XC System Software
• The kernel version
• The installed RPMs
• The amount of free and used memory in megabytes
• The disk partitions and their sizes
• The swap usage summary by device
• The cache information (not available on all systems)
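The following standard Linux commands are one way to display this information. This is an illustrative sketch rather than a definitive list, and the source of cache information varies by platform:
$ cat /etc/hptc-release              # HP XC System Software version
$ uname -r                           # kernel version
$ rpm -qa                            # installed RPMs
$ free -m                            # free and used memory in megabytes
$ cat /proc/partitions               # disk partitions and their sizes
$ swapon -s                          # swap usage summary by device
$ grep "cache size" /proc/cpuinfo    # cache information (not available on all systems)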
1.3 User Environment
This section introduces some general information about logging in, configuring, and using the
HP XC environment.
1.3.1 LVS
The HP XC system uses the Linux Virtual Server (LVS) to present a single host name for user
logins. LVS is a highly scalable virtual server built on a system of real servers. By using LVS, the
architecture of the HP XC system is transparent to end users, and they see only a single virtual
server. This eliminates the need for users to know how the system is configured in order to
successfully log in and use the system. Any changes in the system configuration are transparent
to end users. LVS also provides load balancing across login nodes, which distributes login requests
to different servers.
1.3.2 Modules
The HP XC system provides the Modules Package (not to be confused with Linux kernel modules)
to configure and modify the user environment. The Modules Package enables dynamic
modification of a user’s environment by means of modulefiles. Modulefiles provide a convenient
means for users to tailor their working environment as necessary. One of the key features of
modules is to allow multiple versions of the same software to be used in a controlled manner.
A modulefile contains information to configure the shell for an application. Typically, a modulefile
contains instructions that alter or set shell environment variables, such as PATH and MANPATH,
to enable access to various installed software. Many users on a system can share modulefiles,
and users may have their own collection to supplement or replace the shared modulefiles.
Modulefiles can be loaded into your environment automatically when you log in to the
system, or any time you need to alter the environment. The HP XC system does not preload
modulefiles.
See Chapter 3 “Configuring Your Environment with Modulefiles” for more information.
1.3.3 Commands
The HP XC user environment includes standard Linux commands, LSF commands, SLURM
commands, HP-MPI commands, and modules commands. This section provides a brief overview
of these command sets.
Linux commands      You can use standard Linux user commands and tools on the HP XC
                    system. Standard Linux commands are not described in this document,
                    but you can access Linux command descriptions in Linux
                    documentation and manpages. Run the Linux man command with the
                    Linux command name to display the corresponding manpage.
LSF commands        HP XC supports LSF-HPC and the use of standard LSF commands,
                    some of which operate differently in the HP XC environment from
                    standard LSF behavior. The use of LSF-HPC commands in the HP XC
                    environment is described in Chapter 10 “Using LSF-HPC”, and in the
                    HP XC lsf_diff manpage. Information about standard LSF
                    commands is available in Platform Computing Corporation LSF
                    documentation, and in the LSF manpages. For your convenience, the
                    HP XC Documentation CD contains XC LSF manuals from Platform
                    Computing. LSF manpages are available on the HP XC system.
SLURM commands      HP XC uses the Simple Linux Utility for Resource Management
                    (SLURM) for system resource management and job scheduling.
                    Standard SLURM commands are available through the command line.
                    SLURM functionality is described in Chapter 9 “Using SLURM”.
                    Descriptions of SLURM commands are available in the SLURM
                    manpages. Invoke the man command with the SLURM command name
                    to access them.
HP-MPI commands     You can run standard HP-MPI commands from the command line.
                    Descriptions of HP-MPI commands are available in the HP-MPI
                    documentation, which is supplied with the HP XC system software.
Modules commands    The HP XC system uses standard Modules commands to load and
                    unload modulefiles, which are used to configure and modify the user
                    environment. Modules commands are described in “Overview of
                    Modules”.
1.4 Application Development Environment
The HP XC system provides an environment that enables developing, building, and running
applications using multiple nodes with multiple cores. These applications can range from parallel
applications using many cores to serial applications using a single core.
1.4.1 Parallel Applications
The HP XC parallel application development environment allows parallel application processes
to be started and stopped together on a large number of application processors, along with the
I/O and process control structures to manage these kinds of applications.
Full details and examples of how to build, run, debug, and troubleshoot parallel applications
are provided in “Developing Parallel Applications”.
1.4.2 Serial Applications
You can build and run serial applications under the HP XC development environment. A serial
application is a command or application that does not use any form of parallelism.
Full details and examples of how to build, run, debug, and troubleshoot serial applications are
provided in “Building Serial Applications”.
1.5 Run-Time Environment
This section describes LSF-HPC, SLURM, and HP-MPI, and how these components work together
to provide the HP XC run-time environment. LSF-HPC focuses on scheduling (and managing
the workload) and SLURM provides efficient and scalable resource management of the compute
nodes.
Another HP XC environment option is standard LSF, without interaction with the SLURM
resource manager.
1.5.1 SLURM
Simple Linux Utility for Resource Management (SLURM) is a resource management system that
is integrated into the HP XC system. SLURM is suitable for use on large and small Linux clusters.
It was developed by Lawrence Livermore National Laboratory and Linux NetworX. As a resource
manager, SLURM allocates exclusive or unrestricted access to resources (application and compute
nodes) for users to perform work, and provides a framework to start, execute and monitor work
(normally a parallel job) on the set of allocated nodes.
A SLURM system consists of two daemons, one configuration file, and a set of commands and
APIs. The central controller daemon, slurmctld, maintains the global state and directs
operations. A slurmd daemon is deployed to each computing node and responds to job-related
requests, such as launching jobs, signalling, and terminating jobs. End users and system software
(such as LSF-HPC) communicate with SLURM by means of commands or APIs — for example,
allocating resources, launching parallel jobs on allocated resources, and terminating running
jobs.
SLURM groups compute nodes (the nodes where jobs are run) together into “partitions”. The
HP XC system can have one or several partitions. When HP XC is installed, a single partition of
compute nodes is created by default for LSF-HPC batch jobs. The system administrator has the
option of creating additional partitions. For example, another partition could be created for
interactive jobs.
1.5.2 Load Sharing Facility (LSF-HPC)
The Load Sharing Facility for High Performance Computing (LSF-HPC) from Platform Computing
Corporation is a batch system resource manager that has been integrated with SLURM for use
on the HP XC system. LSF-HPC for SLURM is included with the HP XC System Software, and
is an integral part of the HP XC environment. LSF-HPC interacts with SLURM to obtain and
allocate available resources, and to launch and control all the jobs submitted to LSF-HPC. LSF-HPC
accepts, queues, schedules, dispatches, and controls all the batch jobs that users submit, according
to policies and configurations established by the HP XC site administrator. On an HP XC system,
LSF-HPC for SLURM is installed and runs on one HP XC node, known as the LSF execution
host.
A complete description of LSF-HPC is provided in Chapter 10 “Using LSF-HPC”. In addition,
for your convenience, the HP XC Documentation CD contains LSF manuals from Platform
Computing.
1.5.3 Standard LSF
Standard LSF is also available on the HP XC system. The information for using standard LSF is
documented in the LSF manuals from Platform Computing. For your convenience, the HP XC
documentation CD contains these manuals.
1.5.4 How LSF-HPC and SLURM Interact
In the HP XC environment, LSF-HPC cooperates with SLURM to combine the powerful scheduling
functionality of LSF-HPC with the scalable parallel job launching capabilities of SLURM. LSF-HPC
acts primarily as a workload scheduler on top of the SLURM system, providing policy and
topology-based scheduling for end users. SLURM provides an execution and monitoring layer
for LSF-HPC. LSF-HPC uses SLURM to detect system topology information, make scheduling
decisions, and launch jobs on allocated resources.
When a job is submitted to LSF-HPC, LSF-HPC schedules the job based on job resource
requirements. LSF-HPC communicates with SLURM to allocate the required HP XC compute
nodes for the job from the SLURM lsf partition. LSF-HPC provides node-level scheduling for
parallel jobs, and core-level scheduling for serial jobs. Because of node-level scheduling, a parallel
job may be allocated more cores than it requested, depending on its resource request; the srun
or mpirun -srun launch commands within the job still honor the original request. LSF-HPC
always tries to pack multiple serial jobs on the same node, with one core per job. Parallel jobs
and serial jobs cannot coexist on the same node.
After the LSF-HPC scheduler allocates the SLURM resources for a job, the SLURM allocation
information is recorded with the job. You can view this information with the bjobs and bhist
commands.
When LSF-HPC starts a job, it sets the SLURM_JOBID and SLURM_NPROCS environment variables
in the job environment. SLURM_JOBID associates the LSF-HPC job with SLURM's allocated
resources. The SLURM_NPROCS environment variable is set to the originally requested number
of cores. LSF-HPC dispatches the job from the LSF execution host, which is the same node on
which LSF-HPC daemons run. The LSF-HPC JOB_STARTER script, which is configured for all
queues, uses the srun command to launch a user job on the first node in the allocation. Your job
can contain additional srun or mpirun commands to launch tasks to all nodes in the allocation.
While a job is running, all resource limits supported by LSF-HPC are enforced, including core limit,
CPU time limit, data limit, file size limit, memory limit and stack limit. When you terminate a
job, LSF-HPC uses the SLURM scancel command to propagate the signal to the entire job.
After a job finishes, LSF-HPC releases all allocated resources.
A detailed description, along with an example and illustration, of how LSF-HPC and SLURM
cooperate to launch and manage jobs is provided in “How LSF-HPC and SLURM Launch and
Manage a Job”. It is highly recommended that you review this information.
In summary, and in general:
LSF-HPC    Determines WHEN and WHERE the job will run. LSF-HPC communicates with
           SLURM to determine WHICH resources are available, and SELECTS the appropriate
           set of nodes for the job.
SLURM      Allocates nodes for jobs as determined by LSF-HPC. It CONTROLS task/rank
           distribution within the allocated nodes. SLURM also starts the executables on each
           host as requested by the HP-MPI mpirun command.
HP-MPI     Determines HOW the job runs. It is part of the application, so it performs
           communication. HP-MPI can also pinpoint the processor on which each rank runs.
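As an illustrative sketch of this division of labor (the executable name and core count are hypothetical), the following command submits an interactive job through LSF-HPC, which obtains a SLURM allocation of compute nodes, and then uses HP-MPI to launch the ranks across that allocation:
$ bsub -n 8 -I mpirun -srun ./my_mpi_app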
1.5.5 HP-MPI
HP-MPI is a high-performance implementation of the Message Passing Interface (MPI) standard
and is included with the HP XC system. HP-MPI uses SLURM to launch jobs on an HP XC system
— however, it manages the global MPI exchange so that all processes can communicate with
each other.
See the HP-MPI documentation for more information.
1.6 Components, Tools, Compilers, Libraries, and Debuggers
This section provides a brief overview of some of the common tools, compilers, libraries, and
debuggers available for use on HP XC.
An HP XC system is integrated with several open source software components. HP XC
incorporates a Linux operating system and its standard commands and tools, and maintains the
Linux ABI. In addition, HP XC incorporates LSF-HPC and SLURM to launch and manage jobs,
and includes HP-MPI for high-performance, parallel, message-passing applications.
You can use most standard open source compilers and tools on an HP XC system; however, you
must purchase them separately. Several open source and commercially available software
packages have been tested with the HP XC Software. The following lists some of the software
packages that have been tested for use with HP XC. This list provides a sample of what is available
on HP XC and is not intended as a complete list. Some packages listed are actually included as
part of the HPC Linux distribution and as such can be used as part of the HP XC development
environment. The tested software packages include, but are not limited to, the following:
•Intel Fortran 95, C, C++ Compiler Version 7.1, 8.0, 9.0, 9.1 including OpenMP, for Itanium®
(includes the idb debugger)
•gcc version 3.2.3 (included in the HP XC distribution)
•g77 version 3.2.3 (included in the HP XC distribution)
•Portland Group PGI Fortran90, C, C++ Version 5.1, including OpenMP, for CP4000
•Quadrics SHMEM, as part of QsNet II user libraries, on Itanium systems connected with
the Quadrics QsNet II switch (included in the HP XC distribution)
•Etnus TotalView debugger Version 6.4
•gdb (part of the HP XC Linux distribution)
•Intel MKL V6.1 on Itanium
•AMD Math Core Library Version 2.0 on CP4000
•valgrind 2.0.0 (http://valgrind.kde.org) in 32-bit mode only
•oprofile 0.7.1 (http://oprofile.sourceforge.net)
•PAPI 3.2 (http://icl.cs.utk.edu/papi)
•Intel Visual Analyzer/Tracer (formerly the Pallas Vampir and Vampirtrace performance analyzer)
on Itanium
•GNU make, including distributed parallel make (included in the HP XC distribution)
Other standard tools and libraries are available, and you can most likely use them on HP XC
as you would on any other standard Linux system.
IMPORTANT: Software that is not described in the HP XC QuickSpecs or in the HP XC
documentation set has not been tested with the HP XC System Software. HP does not warrant
the operation of any software that is not specifically identified in the product specification (the
HP XC QuickSpecs) and is not documented in the HP XC documentation set.
2 Using the System
This chapter describes the tasks and commands that the general user must know to use the
system. It addresses the following topics:
•“Logging In to the System” (page 33)
•“Overview of Launching and Managing Jobs” (page 33)
•“Performing Other Common User Tasks” (page 35)
•“Getting System Help and Information” (page 36)
2.1 Logging In to the System
Logging in to an HP XC system is similar to logging in to any standard Linux system. You can
only log in on nodes that have the login role; those nodes are enabled for logins.
2.1.1 LVS Login Routing
The HP XC system uses the Linux Virtual Server (LVS) facility to present a set of login nodes
with a single cluster name. When you log in to the system, LVS automatically routes your login
request to an available login node on the system. LVS load balances login sessions across the
login nodes and improves the availability of login access. When you log in to the HP XC system,
you do not have to know specific node names to log in, only the HP XC system's cluster name.
2.1.2 Using the Secure Shell to Log In
Secure Shell (ssh) is the preferred method for accessing the HP XC system.
Typically, you access the HP XC system using the ssh command to get a login shell or to execute
commands. For example:
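The user name and cluster alias in the following illustrative commands are placeholders; substitute your own account name and your system's LVS cluster name:
$ ssh username@xc-cluster-name            # log in and get a shell
$ ssh username@xc-cluster-name uptime     # run a single command remotely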
The ssh service also allows file transfer using the scp or sftp commands over the same port as
ssh.
The typical r* UNIX commands, such as rlogin, rsh, and rcp, are not installed on an HP XC
system by default because of their inherent insecurity. The ssh command transfers all login and
password information in an encrypted form instead of the plaintext form used by the r* UNIX
commands (as well as telnet and ftp).
If you want to use ssh without password prompting, you must set up ssh authentication keys.
See ssh(1) for information about using ssh authentication keys.
The Secure Shell is further discussed in “Enabling Remote Execution with OpenSSH” (page 117).
You can bypass the need to enter a login and password each time you log in by updating the
ssh keys. The ssh_create_shared_keys command, used on a one-time basis, updates the
appropriate ssh key files in your $HOME/.ssh directory, provided:
•You have a user account on the HP XC system.
•Your home directory is mounted on all the nodes to which you log in or on which you launch
jobs.
Ensure that you deny write permission on your home directory to the group and others; otherwise,
the ssh command does not use your authentication keys. The following command accomplishes that for you:
$ chmod go-w $HOME
2.2 Overview of Launching and Managing Jobs
This section provides a brief description of some of the many ways to launch jobs, manage jobs,
and get information about jobs on an HP XC system. This section is intended only as a quick
overview about some basic ways of running and managing jobs. Full information and details
about the HP XC job launch environment are provided in the SLURM section (“Using SLURM”) and
the LSF-HPC section (“Using LSF-HPC”) of this document.
2.2.1 Introduction
As described in “Run-Time Environment” (page 29), SLURM and LSF-HPC cooperate to run
and manage jobs on the HP XC system, combining LSF-HPC's powerful and flexible scheduling
functionality with SLURM's scalable parallel job-launching capabilities.
SLURM is the low-level resource manager and job launcher, and performs core allocation for
jobs. LSF-HPC gathers information about the cluster from SLURM. When a job is ready to be
launched, LSF-HPC creates a SLURM node allocation and dispatches the job to that allocation.
Although you can launch jobs directly using SLURM, HP recommends that you use LSF-HPC
to take advantage of its scheduling and job management capabilities. You can add SLURM
options to the LSF-HPC job launch command line to further define job launch requirements. Use
the HP-MPI mpirun command and its optionswithin LSF-HPC to launch jobs that require MPI's
high-performance message-passing capabilities.
When the HP XC system is installed, a SLURM partition of nodes is created to contain LSF-HPC
jobs. This partition is called the lsf partition.
When a job is submitted to LSF-HPC, the LSF-HPC scheduler prioritizes the job and waits until
the required resources (compute nodes from the lsf partition) are available.
When the requested resources are available for the job, LSF-HPC creates a SLURM allocation of
nodes on behalf of the user, sets the SLURM JobID for the allocation, and dispatches the job
with the LSF-HPC JOB_STARTER script to the first allocated node.
A detailed explanation of how SLURM and LSF-HPC interact to launch and manage jobs is
provided in “How LSF-HPC and SLURM Launch and Manage a Job” (page 102).
2.2.2 Getting Information About Queues
The LSF bqueues command lists the configured job queues in LSF-HPC. By default, bqueues
returns the following information about all queues:
•Queue name
•Queue priority
•Queue status
•Job slot statistics
•Job state statistics
To get information about queues, enter the bqueues command as follows:
$ bqueues
For more information about using this command and a sample of its output, see “Examining
System Queues” (page 106).
2.2.3 Getting Information About Resources
The LSF bhosts, lshosts, and lsload commands are quick ways to get information about
system resources. LSF-HPC daemons run on only one node in the HP XC system, so the bhosts
and lshosts commands will list one host — which represents all the resources of the HP XC
system. The total number of cores for that host should be equal to the total number of cores
assigned to the SLURM lsf partition.
•The LSF bhosts command provides a summary of the jobs on the system and information
about the current state of LSF-HPC.
$ bhosts
For more information about using this command and a sample of its output, see “Examining
System Core Status” (page 105)
•The LSF lshosts command displays machine-specific information for the LSF execution
host node.
$ lshosts
For more information about using this command and a sample of its output, see “Getting
Information About the LSF Execution Host Node” (page 105).
•The LSF lsload command displays load information for the LSF execution host node.
$ lsload
For more information about using this command and a sample of its output, see “Getting
Host Load Information” (page 106).
2.2.4 Getting Information About System Partitions
You can view information about system partitions with the SLURM sinfo command. The sinfo
command reports the state of all partitions and nodes managed by SLURM and provides a wide
variety of filtering, sorting, and formatting options. The sinfo command displays a summary
of available partition and node (not job) information, such as partition names, nodes per partition,
and cores per node.
$ sinfo
For more information about using the sinfo command and a sample of its output, see “Getting
Information About the lsf Partition” (page 106) .
2.2.5 Launching Jobs
To launch a job on an HP XC system, use the LSF bsub command. The bsub command submits
batch jobs or interactive batch jobs to an LSF-HPC queue for execution.
See “Submitting Jobs” (page 53) for full information about launching jobs.
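For example, the following interactive submission (a minimal sketch; the core count and the command to run are illustrative) launches the hostname command on four cores through LSF-HPC and SLURM:
$ bsub -n 4 -I srun hostname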
2.2.6 Getting Information About Your Jobs
Use theLSF bjobs and bhist commands to obtain information about your running or completed
jobs:
•Use the bjobs command to examine the status of a running job (“Examining the Status of
a Job” (page 108)).
•Use the bhist command to obtain brief or full information about finished jobs (“Viewing
the Historical Information for a Job” (page 109)).
You can view the components of the actual SLURM allocation command with the LSF bjobs
-l and bhist -l commands.
2.2.7 Stopping and Suspending Jobs
You can suspend or stop your jobs with the bstop and bkill commands:
•Use the bstop command to stop or suspend an LSF-HPC job.
•Use the bkill command to kill an LSF-HPC job.
2.2.8 Resuming Suspended Jobs
Use the LSF bresume command to resume a stopped or suspended job.
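For example, given a hypothetical LSF-HPC job ID of 1234:
$ bstop 1234      # suspend the job
$ bresume 1234    # resume the suspended job
$ bkill 1234      # kill the job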
2.3 Performing Other Common User Tasks
This section contains general information about using the HP XC system.
2.3.1 Determining the LSF Cluster Name and the LSF Execution Host
The lsid command returns the LSF cluster name, the LSF-HPC version, and the name of the
LSF execution host:
$ lsid
Platform LSF HPC version number for SLURM, date and time stamp
Copyright 1992-2005 Platform Computing Corporation
My cluster name is hptclsf
My master name is lsfhost.localdomain
In this example, hptclsf is the LSF cluster name, and lsfhost.localdomain is the name of
the virtual IP address used by the node where LSF-HPC is installed and running (LSF execution
host).
2.4 Getting System Help and Information
In addition to the hardcopy documentation described in the preface of this document (“About
This Document”), the HP XC system also provides system help and information in the form of
online manpages.
Manpages provide online reference and command information from the system command line.
Manpages are supplied with the HP XC system for standard HP XC components, Linux user
commands, LSF commands, SLURM commands, and other software components that are
distributed with the HP XC system, such as HP-MPI. Manpages for third-party vendor software
components may be provided as a part of the deliverables for that software component.
To access manpages, type the man command with the name of a command. For example:
$ man sinfo
This command accesses the manpage for the SLURM sinfo command.
If you are unsure of the name of the command that you need to reference, you can enter the man
command with the -k option and a keyword, to get a list of commands that are related to the
keyword you entered. For example:
$ man -k keyword
The MANPATH environment variable is a list of directories that the man command uses to find a
manpage. You may need to add to your MANPATH variable to identify the location of manpages
for additional software packages.
3 Configuring Your Environment with Modulefiles
The HP XC system supports the use of Modules software to make it easier to configure and
modify your environment. Modules software enables dynamic modification of your
environment by the use of modulefiles.
This chapter addresses the following topics:
•“Overview of Modules” (page 37)
•“Supplied Modulefiles” (page 38)
•“Modulefiles Automatically Loaded on the System” (page 40)
•“Viewing Available Modulefiles” (page 40)
•“Viewing Loaded Modulefiles” (page 40)
•“Loading a Modulefile” (page 40)
•“Unloading a Modulefile” (page 41)
•“Viewing Modulefile-Specific Help” (page 41)
•“Modulefile Conflicts” (page 41)
•“Creating a Modulefile” (page 42)
3.1 Overview of Modules
A modulefile contains information to configure the shell for an application. Typically, a modulefile
contains instructions that alter or set shell environment variables, such as PATH and MANPATH,
to enable access to various installed software.
Modules enable multiple versions of the same software to be used in your environment in a
controlled manner. For example, two different versions of the Intel C compiler can be installed
on the system at the same time – the version used is based upon which Intel C compiler modulefile
is loaded.
The HP XC software provides a number of modulefiles. You can also create your own modulefiles.
Modulefiles can be shared by many users on a system, and users can have their own collections
of modulefiles to supplement or replace the shared modulefiles.
For further information about the Modules software supplied with the HP XC system, see the
Modules Web site at the following URL:
http://sourceforge.net/projects/modules/
A modulefile does not provide configuration of your environment until it is explicitly loaded.
That is, the specific modulefile for a software product or application must be loaded in your
environment (with the module load command) before the configuration information in the
modulefile is effective.
You or your system administrator can configure your environment so that any desired modulefiles
are automatically loaded for you when you log in to the system. You can also load a modulefile
yourself, as described in “Loading a Modulefile” (page 40).
The Modules software is initialized when you log in to the HP XC system. It provides access to
the commands that you can use to display information about modulefiles, load or unload
modulefiles, or view a list of available modulefiles.
Modulefiles do not affect packages other than their intended package. For example, a modulefile
for a compiler will not adjust MPI_CC (the environment variable used by HP-MPI to control
which compiler to use). A modulefile for a compiler simply makes it easier to access that particular
compiler; it does not try to determine how the compiler will be used.
Similarly, a modulefile for HP-MPI will not try to adjust LD_LIBRARY_PATH to correspond to
the compiler that the mpicc command uses. The modulefile for MPI simply makes it easier to
access the mpi* scripts and libraries. You can specify the compiler it uses through a variety of
mechanisms long after the modulefile is loaded.
The previous scenarios were chosen in particular because the HP-MPI mpicc command uses
heuristics to try to find a suitable compiler when MPI_CC or other default-overriding mechanisms
are not in effect. It is possible that mpicc will choose a compiler inconsistent with the most
recently loaded compiler module. This could cause inconsistencies in the use of shared objects.
If you have multiple compilers (perhaps with incompatible shared objects) installed, it is probably
wise to set MPI_CC (and others) explicitly to the commands made available by the compiler's
modulefile.
The contents of the modulefiles in the modulefiles_hptc RPM use the vendor-intended
location of the installed software. In many cases, this is under the /opt directory, but in a few
cases (for example, the PGI compilers and the TotalView debugger) this is under the /usr
directory.
If you install a software package other than in the intended place, you must create or edit an
appropriate modulefile under the /opt/modules/modulefiles directory.
For the packages that install by default into the /usr directory (currently the PGI compilers and
TotalView), their corresponding modulefiles will try their vendor-intended location under the
/usr directory. If they do not find that directory, the packages will also search under the /opt
directory. Therefore, you do not need to make any changes to the modulefiles if you want to
install third-party software consistently as the vendor intended or consistently under the /opt
directory.
If the package is the stable product intended to be used by the site, editing an existing modulefile
is appropriate. While each modulefile has its unique characteristics, they all set some variables
describing the top-level directory, and editing to adjust the string should be sufficient. You may
need to repeat the adjustment if you update the modulefiles_hptc RPM or otherwise rebuild
your system.
If the package is a variant, for example, a beta version of a compiler, first copy the default
modulefile to a well-named copy, then edit the copy. You need root access to modify the
modulefiles, which is generally needed to install packages in either the /opt or /usr directories.
If you download a package into a private directory, you can create a private modulefiles directory.
You can then copy the corresponding default modulefile from under the
/opt/modules/modulefiles directory into a private modulefiles directory, edit the file, and
then register the directory with the module use command.
3.2 Supplied Modulefiles
The HP XC system provides the Modules Package (not to be confused with Linux kernel modules)
to configure and modify the user environment. The Modules Package enables dynamic
modification of a user's environment by means of modulefiles.
The HP XC system supplies the modulefiles listed in Table 3-1.
Table 3-1 Supplied Modulefiles

Modulefile                 Sets the HP XC User Environment to Use:
gcc/3.4/default            GCC C compiler, Version 3.4.
gcc/4.0/default            GCC compiler, Version 4.0.
gcc/4.1/default            GCC compiler, Version 4.1.
hpcpi                      HP-CPI profiling tools.
hptc                       HPTC loader.
icc/8.0/default            Intel C/C++ Version 8.0 compilers.
icc/8.1/default            Intel C/C++ Version 8.1 compilers.
icc/9.0/default            Intel C/C++ Version 9.0 compilers.
icc/9.1/default            Intel C/C++ Version 9.1 compilers.
idb/7.3/default            Intel IDB debugger.
idb/9.0/default            Intel IDB debugger.
idb/9.1/default            Intel IDB debugger.
ifort/8.0/default          Intel Fortran Version 8.0 compilers.
ifort/8.1/default          Intel Fortran Version 8.1 compilers.
ifort/9.0/default          Intel Fortran Version 9.0 compilers.
ifort/9.1/default          Intel Fortran Version 9.1 compilers.
imkl/8.0 (default)         Intel Math Kernel Library.
intel/7.1                  Intel Version 7.1 compilers.
intel/8.0                  Intel Version 8.0 compilers.
intel/8.1                  Intel Version 8.1 compilers.
intel/9.0                  Intel Version 9.0 compilers.
intel/9.1 (default)        Intel Version 9.1 compilers.
mlib/intel/7.1             HP Math Library for Intel 7.1 compilers.
mlib/intel/8.0             HP Math Library for Intel 8.0 compilers.
mlib/intel/8.1             HP Math Library for Intel 8.1 compilers.
mlib/intel/9.0             HP Math Library for Intel 9.0 compilers.
mpi/hp/default             HP-MPI.
mpi/quadrics/default       Quadrics MPI.
pathscale/2.2/default      Pathscale Version 2.2 compilers.
pathscale/2.4/default      Pathscale Version 2.4 compilers.
pathscale/2.5/default      Pathscale Version 2.5 compilers.
pbspro/default             PBS Pro batch queuing.
pgi/5.1/default            PGI Version 5.1 compilers.
pgi/5.2/default            PGI Version 5.2 compilers.
pgi/6.0/default            PGI Version 6.0 compilers.
pgi/6.1/default            PGI Version 6.1 compilers.
pgi/default (default)      PGI Version 6.0 compilers.
totalview/default          TotalView debugger.
xtools                     Graphical performance utilities (see Chapter 7 (page 73)).
See “Viewing Available Modulefiles” for information on how to find which modules are available
on your system. See “Viewing Loaded Modulefiles” to determine which of those modules are
currently loaded.
Each module supplies its own online help. See “Viewing Modulefile-Specific Help” for information
on how to view it.
3.3 Modulefiles Automatically Loaded on the System
The HP XC system does not load any modulefiles into your environment by default. However,
there may be modulefiles designated by your system administrator that are automatically loaded.
“Viewing Loaded Modulefiles” describes how you can determine what modulefiles are currently
loaded on your system.
You can also automatically load your own modules by creating a login script and designating
the modulefiles to be loaded in the script. You can also add or remove modules from your current
environment on a per-module basis, as described in “Loading a Modulefile”.
3.4 Viewing Available Modulefiles
Available modulefiles are modulefiles that have been provided with the HP XC system software
and are available for you to load. A modulefile must be loaded before it provides changes to
your environment, as described in the introduction to this section. You can view the modulefiles
that are available on the system by issuing the module avail command:
$ module avail
3.5 Viewing Loaded Modulefiles
A loaded modulefile is a modulefile that has been explicitly loaded in your environment by the
module load command. To view the modulefiles that are currently loaded in your environment,
issue the module list command:
$ module list
3.6 Loading a Modulefile
You can load a modulefile into your environment to enable easier access to software that you
want to use by executing the module load command. You can load a modulefile for the current
session, or you can set up your environment to load the modulefile whenever you log in to the
system.
When loading a modulefile, note that certain modulefiles cannot be loaded while other modulefiles
are currently loaded. For example, this can happen with different versions of the same software.
If a modulefile you are attempting to load conflicts with a currently loaded modulefile, the
modulefile will not be loaded and an error message will be displayed.
If you encounter a modulefile conflict when loading a modulefile, you must unload the conflicting
modulefile before you load the new modulefile. See “Modulefile Conflicts” (page 41) for further
information about modulefile conflicts.
3.6.1 Loading a Modulefile for the Current Session
You can load a modulefile for your current login session as needed. To do this, issue the module
load command as shown in the following example, which illustrates the TotalView modulefile
being loaded:
$ module load totalview
Loading a modulefile in this manner affects your environment for the current session only.
3.6.2 Automatically Loading a Modulefile at Login
If you frequently use one or more modulefiles that are not loaded when you log in to the system,
you can set up your environment to automatically load those modulefiles for you. A method for
doing this is to modify your shell startup script to include instructions to load the modulefile
automatically.
For example, if you wanted to automatically load the TotalView modulefile when you log in,
edit your shell startup script to include the following instructions. This example uses bash as
the login shell. Edit the ~/.bashrc file as follows:
# if the 'module' command is defined, $MODULESHOME
# will be set
if [ -n "$MODULESHOME" ]; then
module load totalview
fi
From now on, whenever you log in, the TotalView modulefile is automatically loaded in your
environment.
3.7 Unloading a Modulefile
In certain cases, you may find it necessary to unload a particular modulefile before you can load
another modulefile in to your environment, to avoid modulefile conflicts. See “Modulefile
Conflicts” for information about modulefile conflicts.
You can unload a modulefile by using the module unload command, as shown in the following
example:
$ module unload ifort/8.0
Unloading a modulefile that is loaded by default makes it inactive for the current session only
— it will be reloaded the next time you log in.
3.8 Viewing Modulefile-Specific Help
You can view help information for any of the modulefiles on the HP XC system. For example,
to access modulefile-specific help information for TotalView, issue the module help command
as follows:
$ module help totalview
----------- Module Specific Help for 'totalview/default' -----------------
This loads the TotalView environment.
Version <default>
Modifies: MANPATH, PATH, TVDSVRLAUNCHCMD
NOTE: The term <default> is displayed whenever the modulefile originator does not use
version numbers to distinguish the modulefiles.
3.9 Modulefile Conflicts
Some modulefiles should not be loaded while certain other modulefiles are currently loaded.
This is especially true of modulefiles for different versions of the same software. For example,
the Intel C/C++ Version 8.0 compiler modulefile should not be loaded while the Intel C/C++ Version 8.1 compiler modulefile is loaded. A modulefile conflict occurs in this
situation.
The system displays an error message when you attempt to load a modulefile that conflicts with
one or more currently loaded modulefiles. For example:
$ module load ifort/8.0
ifort/8.0(19):ERROR:150: Module 'ifort/8.0' conflicts with the
currently loaded module(s) 'ifort/8.1'
In this example, a user attempted to load the ifort/8.0 modulefile. After the user issued the
command to load the modulefile, an error message occurred, indicating a conflict between this
modulefile and the ifort/8.1 modulefile, which is already loaded.
When a modulefile conflict occurs, unload the conflicting modulefile before loading the new
modulefile. In the previous example, you should unload the ifort/8.1 modulefile before
loading the ifort/8.0 modulefile. For information about unloading a modulefile, see
“Unloading a Modulefile” (page 41).
Note:
To avoid problems, HP recommends that you always unload one version of a modulefile before
loading another version.
3.10 Creating a Modulefile
If you download or install a software package into a private directory, you can create your own
(private) modulefile for products that you install by using the following general steps:
1. Create a private modulefiles directory.
2. Copy an existing modulefile (to use as a template), or copy the software's corresponding
default modulefile from under /opt/modules/modulefiles, into the private modulefiles
directory.
3. Edit and modify the modulefile accordingly.
4. Register the private directory with the module use command (see the example following this list).
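For example, a minimal sketch of these steps, using a hypothetical package name mytool and a
private directory named ~/modulefiles (the template modulefile chosen here is illustrative):
$ mkdir ~/modulefiles
$ cp /opt/modules/modulefiles/totalview/default ~/modulefiles/mytool
$ vi ~/modulefiles/mytool       # adjust the top-level directory and version strings
$ module use ~/modulefiles
$ module load mytool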
To install a variant of a product or package already on the system, copy the existing modulefile
for that product to an appropriate name, and edit it accordingly to accommodate the
newly-installed product variant.
To install some other product or package, look at the manpages for modulefiles, examine
the existing modulefiles, and create a new modulefile for the product being installed, using
the existing modulefiles as a template. To view the modules manpages, load the modules modulefile
and then display the modulefile manpage:
$ module load modules
$ man modulefile
Read the manpages for modules so that you know how to create a directory for your private
modulefiles and how to use the module use <dirname> command to use your private
modules.
4 Developing Applications
This chapter discusses topics associated with developing applications in the HP XC environment.
Before reading this chapter, you should read and understand Chapter 1 “Overview of the
User Environment” and Chapter 2 “Using the System”.
This chapter addresses the following topics:
•“Application Development Environment Overview” (page 43)
•“Compilers” (page 44)
•“Examining Nodes and Partitions Before Running Jobs” (page 45)
•“Interrupting a Job” (page 45)
•“Setting Debugging Options” (page 45)
•“Developing Serial Applications” (page 45)
•“Developing Parallel Applications” (page 46)
•“Developing Libraries” (page 50)
4.1 Application Development Environment Overview
The HP XC system provides an application development environment that enables developing,
building, and running applications using multiple nodes with multiple cores. These applications
can be parallel applications using many cores, or serial applications using a single core.
The HP XC system is made up of nodes that are assigned one or more roles. Nodes with the login
role (login nodes) and nodes with the compute role (compute nodes) are important to the
application developer:
•Compute nodes run user applications.
•Login nodes are where you log in and interact with the system to perform such tasks as
executing commands, compiling and linking applications, and launching applications. A
login node can also execute single-core applications and commands, just as on any other
standard Linux system.
Applications are launched from login nodes, and then distributed and run on one or more
compute nodes.
The HP XC environment uses the LSF-HPC batch job scheduler to launch and manage parallel
and serial applications. When a job is submitted, LSF-HPC places the job in a queue and allows
it to run when the necessary resources become available. When a job is completed, LSF-HPC
returns job output, job information, and any errors. In addition to batch jobs, LSF-HPC can also
run interactive batch jobs and interactive jobs. An LSF-HPC interactive batch job is a batch job
that allows you to interact with the application, yet still take advantage of LSF-HPC scheduling
policies and features. An LSF-HPC interactive job is run without using the batch processing
features of LSF-HPC, but is dispatched immediately by LSF-HPC on the LSF execution host
node. LSF-HPC is described in detail in Chapter 10 “Using LSF-HPC”.
Regardless of whether an application is parallel or serial, or whether it is run interactively or as
a batch job, the general steps to developing an HP XC application are as follows:
1.Build the code by compiling and linking with the correct compiler. Note that compiler
selection, and set up of appropriate parameters for specific compilers, is made easier by the
use of modules.
2.Launch the application with the bsub, srun, or mpirun command.
The build and launch commands are executed from the node to which you are logged in.
HP Unified Parallel C Support
HP XC System Software provides support for the HP Unified Parallel C (UPC) application
development environment.
HP UPC is a parallel extension of the C programming language, which runs on both common
types of multiprocessor systems: those with a common global address space (such as SMP) and
those with distributed memory. UPC provides a simple shared memory model for parallel
programming, allowing data to be shared or distributed among a number of communicating
processors. Constructs are provided in the language to permit simple declaration of shared data,
distribution of shared data across threads, and synchronization of access to shared data across threads. This
model promises significantly easier coding of parallel applications and maximum performance
across shared memory, distributed memory, and hybrid systems.
See the following Web page for more information about HP UPC:
http://www.hp.com/go/upc
4.2 Compilers
You can use compilers acquired from other vendors on an HP XC system. For example, you can
use the Intel C/C++ and Fortran compilers for the 64-bit architecture, and the Portland Group
C/C++ and Fortran compilers on the CP4000 platform.
Intel, PGI, and Pathscale compilers are not supplied with the HP XC system.
You can use other compilers and libraries on the HP XC system as on any other system, provided
they contain single-core routines and have no dependencies on another message-passing system.
Table 4-1 displays the compiler commands for Standard Linux, Intel, and PGI compilers for the
C, C++, and Fortran languages.
Table 4-1 Compiler Commands

Type              C        C++      Fortran            Notes
Standard Linux    gcc      g++      g77                All HP XC platforms. The HP XC System Software supplies these compilers by default.
Intel             icc      icc      ifort              Version 9.1 compilers. For use on the Intel 64-bit platform.
Intel             icc      icc      ifort              Version 9.0 compilers. For use on the Intel 64-bit platform.
Intel             icc      icc      ifort              Version 8.0 compilers. For use on the Intel 64-bit platform.
Intel             ecc      ecc      efc                Version 7.1 compilers. For use on the Intel 64-bit platform. These compilers can be used, but Intel may not support them much longer.
PGI               pgcc     pgCC     pgf95, pgf77       For use on the CP4000 platform.
Pathscale         pathcc   pathCC   pathf95, pathf90   For use on the CP4000 platform. pathf90 is provided for backward compatibility.

4.2.1 MPI Compiler
The HP XC System Software includes MPI. The MPI library on the HP XC system is HP-MPI
Version 2.2.5.1.
4.3 Examining Nodes and Partitions Before Running Jobs
Before launching an application, you can determine the availability and status of the system's
nodes and partitions. Node and partition information is useful to have before launching a job so
that you can launch the job to properly match the resources that are available on the system.
When invoked with no options, the SLURM sinfo command returns information about node
availability and partitions, along with other information:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 4 down* n[12-15]
slurm* up infinite 2 idle n[10-11]
The previous sinfo output shows that there are two partitions on the system:
•One for LSF jobs
•One for SLURM jobs
The asterisk in the PARTITION column indicates the default partition. An asterisk in the STATE
column indicates nodes that are currently not responding.
See Chapter 9 “Using SLURM” for information about using the sinfo command. The SLURM
sinfo manpage also provides detailed information about the sinfo command.
4.4 Interrupting a Job
A job launched by the srun command can be interrupted by sending a signal to the command
by issuing one or more Ctrl/C key sequences. Signals sent to the srun command are
automatically forwarded to the tasks that it is controlling.
The Ctrl/C key sequence will report the state of all tasks associated with the srun command.
If the Ctrl/C key sequence is entered twice within one second, the associated SIGINT signal
will be sent to all tasks. If a third Ctrl/C key sequence is entered, the job will be terminated
without waiting for remote tasks to exit.
The Ctrl/Z key sequence is ignored.
4.5 Setting Debugging Options
In general, the debugging information for your application that is needed by most debuggers
can be produced by supplying the -g switch to the compiler. For more specific information about
debugging options, see the documentation and manpages associated with your compiler.
4.6 Developing Serial Applications
This section describes how to build and run serial applications in the HP XC environment. The
following topics are covered:
•“Serial Application Build Environment” (page 46) describes the serial application
programming model.
•“Building Serial Applications” (page 46) discusses how to build serial applications.
For further information about developing serial applications, see the following sections:
•“Debugging Serial Applications” (page 67) describes how to debug serial applications.
•“Launching Jobs with the srun Command” (page 91) describes how to launch applications
with the srun command.
•“Building and Running a Serial Application” (page 125) provides examples of serial
applications.
4.6.1 Serial Application Build Environment
You can build and run serial applications in the HP XC programming environment. A serial
application is a command or application that does not use any form of parallelism.
An example of a serial application is a standard Linux command, such as the ls or hostname
command. A serial application is basically a single-core application that has no communication
library calls such as MPI.
4.6.2 Building Serial Applications
This section discusses how to build serial applications on an HP XC system. Compiling, linking,
and running serial applications are discussed.
To build a serial application, you must be logged in to an HP XC node with the login role. Serial
applications are compiled and linked by invoking compilers and linkers.
You launch a serial application either by submitting it to LSF-HPC with the bsub command, or
by invoking the srun command to run it. The process is similar to launching a parallel
application, except that only one compute node core is used. To run on a compute node
processor, the serial application and any required dynamic libraries must be accessible from that
node. A serial application can also be tested locally by running it on the login node.
4.6.2.1 Compiling and Linking Serial Applications
Serial applications are compiled and linked by invoking compile and link drivers.
You can change compilers by using modules. For information about using modules, see “Overview
of Modules” (page 37).
As an alternative to using dynamic libraries, serial applications can also be linked to static libraries.
Often the -static option is used to do this.
For examples of building serial applications with the GNU C, GNU Fortran, and Intel C/C++,
and Intel Fortran compilers, see “Building and Running a Serial Application” (page 125).
4.7 Developing Parallel Applications
This section describes how to build and run parallel applications.
For further information about developing parallel applications in the HP XC environment, see
the following:
•“Launching Jobs with the srun Command” (page 91)
•“Debugging Parallel Applications” (page 67)
•Chapter “Advanced Topics” (page 117)
4.7.1 Parallel Application Build Environment
This section discusses the parallel application build environment on an HP XC system.
The HP XC parallel application environment allows parallel application processes to be started
and stopped together on a large number of application cores, along with the I/O and process
control structures to manage these kinds of applications.
The HP XC system software uses the HP-MPI distributed memory programming model for
building and running parallel applications. In addition to using HP-MPI for parallel application
development, OpenMP and Pthreads can be used in conjunction with HP-MPI or separately
under the HP XC system software. This section discusses these development tools as they relate
to the HP XC system.
4.7.1.1 Modulefiles
The basics of your working environment are set up automatically by your system administrator
during the installation of HP XC. However, your application development environment can be
modified by means of modulefiles, as described in “Overview of Modules”.
There are modulefiles available that you can load yourself to further tailor your environment to
your specific application development requirements. For example, the TotalView module is
available for debugging applications. “Overview of Modules” provides instructions on how to
list what modulefiles are available for you to load, and how to load a modulefile.
If you encounter problems accessing tools or commands (and associated manpages), ensure that
required modules are loaded on your system. If necessary, load required modules yourself, as
described in “Overview of Modules”. Otherwise, contact your system administrator.
4.7.1.2 HP-MPI
With the HP XC system software, you can use the HP-MPI distributed memory programming
model for building and running parallel applications. In this model, all data is private to each
process. All intercore communication within a parallel application is performed through calls
to the HP-MPI message passing library. Even though support for applications that use a
shared-memory programming model is not available at this time, individual cores within an
application node can be used in the same application as separate HP-MPI tasks. Applications
that are based on MPI, and currently run on Linux (or Linux compatible) systems, can be easily
migrated to an HP XC cluster.
See the HP-MPI documentation for more information.
4.7.1.3 OpenMP
The OpenMP specification is a set of compiler directives that can be used to specify
shared-memory parallelism in Fortran and C/C++ programs. Both Intel and Portland Group
Fortran and C/C++ compilers support OpenMP.
Although OpenMP is designed for use on shared-memory architectures, OpenMP can be used
on an HP XC system within a node.
OpenMP can be used alone, or in conjunction with HP-MPI. For information about compiling
programs using OpenMP, see the OpenMP documentation.
4.7.1.4 Pthreads
POSIX Threads (Pthreads) is a standard library that programmers can use to develop portable
threaded applications. Pthreads can be used in conjunction with HP-MPI on the HP XC system.
Compilers from GNU, Intel and PGI provide a -pthread switch to allow compilation with the
Pthread library.
Packages that link against Pthreads, such as MKL, require that the application is linked using
the -pthread option. The Pthread option is invoked with the following compiler-specific
switches:
GNU      -pthread
Intel    -pthread
PGI      -lpgthread
For example:
$ mpicc object1.o ... -pthread -o myapp.exe
4.7.1.5 Quadrics SHMEM
The Quadrics implementation of SHMEM runs on HP XC systems with Quadrics switches.
SHMEM is a collection of high-performance routines (that support a distributed-memory model)
for data passing between parallel executables.
To compile programs that use SHMEM, it is necessary to include the shmem.h file and to use
the SHMEM and Elan libraries. For example:
$ gcc -o shping shping.c -lshmem -lelan
4.7.1.6 MPI Library
The MPI library supports MPI 1.2 as described in the 1997 release of MPI: A Message Passing
Interface Standard. Users should note that the MPI specification describes the application
programming interface, but does not specify the contents of the MPI header files, mpi.h and
mpif.h, that are included in the source code. Therefore, an MPI application must be recompiled
using the proper header files for the MPI library to which it is to be linked.
Parallel applications that use MPI for communication must include the HP XC infrastructure
libraries. MPI applications must be built with mpicc, mpic++, mpif77, or mpif90 utilities.
When an MPI application is launched, the user environment, including any MPI environment
variables that have been set, is passed to the application.
MPI profiling support is included in the HP XC MPI library, so you do not need to link with a
separate library to access the PMPI_xxx() versions of the MPI routines.
The HP XC cluster comes with a modulefile for HP-MPI. The mpi modulefile is used to set up
the necessary environment to use HP-MPI, such as the values of the search paths for header and
library files.
4.7.1.7 Intel Fortran and C/C++ Compilers
You can use Intel Fortran compilers (Version 7.x and greater) on the HP XC cluster. However,
the HP XC cluster does not supply a copy of Intel compilers. Intel compilers must be obtained
directly from the vendor. See the Intel documentation for information about using these compilers.
4.7.1.8 PGI Fortran and C/C++ Compilers
You can use PGI Fortran 95, Fortran 77, and C/C++ compilers on the HP XC cluster. However,
the HP XC cluster does not supply a copy of PGI compilers. PGI compilers must be obtained
directly from the vendor. For information about using the PGI compilers, see the PGI
documentation.
4.7.1.9 GNU C and C++ Compilers
You can use the GNU C and C++ compilers on the HP XC cluster. The HP XC cluster supplies
copies of the GNU C and C++ compilers.
4.7.1.10 Pathscale Compilers
You can use the Pathscale EKOPath Version 2.1 Compiler Suite on the CP4000 platform only.
See the following Web site for more information:
http://www.pathscale.com/ekopath.html.
4.7.1.11 GNU Parallel Make
The GNU parallel Make command is used whenever the make command is invoked. GNU
parallel Make provides the ability to do a parallel Make; however, all compiling takes place on
the login node. Therefore, whether a parallel make improves build time depends upon how
many cores are on the login node and the load on the login node.
Information about using the GNU parallel Make is provided in “Using the GNU Parallel Make
Capability”.
For further information about using GNU parallel Make, see the make manpage. For additional
sources of GNU information, see the references provided in the front of this manual, located in
“About This Document”.
4.7.1.12 MKL Library
MKL is a math library that references Pthreads and, in enabled environments, can use multiple
threads. MKL can be linked in a single-threaded manner with your application by specifying
the following in the link command:
•On the CP3000 and CP4000 platforms (as appropriate):
4.7.1.13 ACML Library
You can use the AMD Core Math Library (ACML library) on the CP4000 platform.
4.7.1.14 Other Libraries
Other libraries can be used as they would on any other system. However, they must contain
single-core routines and have no dependencies on another message-passing system.
4.7.2 Building Parallel Applications
This section describes how to build MPI and non-MPI parallel applications on an HP XC system.
4.7.2.1 Compiling and Linking Non-MPI Applications
If you are building non-MPI applications, such as an OpenMP application for example, you can
compile and link them on an HP XC as you normally would, with standard header files and
switches.
4.7.2.2 Compiling and Linking HP-MPI Applications
This section provides some general information about how to build an HP-MPI application in
the HP XC environment.
Compiling and linking an MPI application on an HP XC system is performed by invoking the
HP-MPI compiler utilities. HP-MPI compiler utilities are scripts supplied by HP-MPI to make it
easier to invoke a compiler with the appropriate libraries and search paths. The HP-MPI compiler
utilities add all the necessary path locations and library specifications to the compile and link
steps that are required to build an HP XC parallel application. It is highly recommended that you
use the HP-MPI compiler utilities to compile and link your MPI application on an HP XC cluster,
rather than invoke a compiler directly.
The mpicc, mpic++, mpif90, and mpif77 MPI compiler commands are used to invoke the
HP-MPI compiler utilities that compile and link an MPI application. The mpicc and mpic++
commands invoke the drivers of the C and C++ compilers. The mpif77 and mpif90 commands
invoke the drivers of the Fortran 77 and Fortran 90 compilers.
Before you can compile and link an MPI program using the MPI compiler commands, the MPI
compiler utilities module (mpi) must be loaded by using the module load mpi command, or
you must arrange for them to be in your $PATH search list. The use of modules is described in
“Overview of Modules”.
4.7.2.3 Examples of Compiling and Linking HP-MPI Applications
The following examples show how to compile and link your application code by invoking a
compiler utility.
If you have not already loaded the mpi compiler utilities module, load it now as follows:
$ module load mpi
To compile and link a C application using the mpicc command:
$ mpicc -o mycode hello.c
To compile and link a Fortran application using the mpif90 command:
$ mpif90 -o mycode hello.f
In the previous examples, the HP-MPI commands invoke compiler utilities that call the C and
Fortran compilers with the appropriate libraries and search paths specified to build the parallel
application from the hello.c and hello.f source files. The -o option specifies that the resulting program is named mycode.
4.8 Developing Libraries
This section discusses developing shared and archive libraries for HP XC applications. Building
a library generally consists of two phases:
•Compiling sources to objects
•Assembling the objects into a library
—Using the ar archive tool for archive (.a) libraries
—Using the linker (possibly indirectly by means of a compiler) for shared (.so) libraries.
For sufficiently small shared objects, it is often possible to combine the two steps.
A common technique is to build the archive library first, and then build the shared library from
the archive library (using the linker's -whole-archive switch).
For libraries that do not use HP-MPI, it is recommended that the sources be compiled with the
standard compilers (such as gcc), just as they would be on other UNIX-like platforms.
For libraries that do use HP-MPI, it is possible to use the HP-MPI compiler utilities (such as
mpicc) to compile the sources to objects. For example:
$ mpicc -c -g foo.c
To assemble an archive library, use the ar archive tool as you would on other UNIX-like platforms.
To assemble a shared library, use the linker (possibly indirectly by means of a compiler) as you
would on other UNIX-like platforms.
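For example, a minimal sketch that builds both forms of a library from hypothetical sources
foo.c and bar.c (the library name libmystuff is also illustrative):
$ mpicc -c -fPIC -g foo.c bar.c                # compile sources to position-independent objects
$ ar rv libmystuff.a foo.o bar.o               # assemble the archive (.a) library
$ mpicc -shared -o libmystuff.so foo.o bar.o   # assemble the shared (.so) library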
Once the library is built, it can be used to build applications, just as other libraries are used, for
both serial applications (with the standard compilers) and parallel applications (with the HP-MPI
compiler utilities).
Note that for shared libraries it is necessary to set LD_LIBRARY_PATH to include the directory
containing the shared library, just as you would on other UNIX-like platforms.
4.8.1 Designing Libraries for the CP4000 Platform
This section discusses the issues surrounding the design of libraries for CP4000 platform on the
HP XC system.
A user designing a library for use on an HP XC CP4000 platform can supply a 32-bit library
and/or a 64-bit library. HP recommends both, to provide flexibility and to make it easy to get
the 64-bit advantages locally, but be able to take the 32-bit variant to an x86-class machine or run
a 32-bit variant imported from an x86-class machine.
It is the library designer's responsibility to make sure 32-bit and 64-bit object files do not collide
during the build process. This can be done by "cleaning" object files from the directories between
builds, or (as is more common) maintaining separate directories for the different types of objects.
Separate directories also make it easy to maintain production versions distinct from debuggable
versions.
Different compilers have different ways to select 32-bit or 64-bit compilations and links. Consult
the documentation for the compiler for this information.
For released libraries, dynamic and archive, the usual custom is to have a ../lib directory that
contains the libraries. This, by itself, will work if the 32-bit and 64-bit libraries have different
names. However, HP recommends an alternative method. The dynamic linker, during its attempt
to load libraries, will suffix candidate directories with the machine type. The HP XC system on
the CP4000 platform uses i686 for 32-bit binaries and x86_64 for 64-bit binaries. HP recommends
structuring directories to reflect this behavior. Therefore, if your released directory structure has
a form similar to Figure 4-1, then ensure that the LD_LIBRARY_PATH has /opt/mypackage/lib
in it, which will then be able to handle both 32-bit and 64-bit binaries that have linked against
libmystuff.so.
Figure 4-1 Library Directory Structure
If you have an existing pattern using different names, HP recommends introducing links with
the above names. An example of this is shown in Figure 4-2.
Linking an application using the library (dynamic or archive) requires you to specify the
appropriate subdirectory, depending on whether the application is 32-bit or 64-bit.
For example, to build a 32-bit application, you might enter:
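A minimal sketch, assuming a hypothetical 32-bit application source myapp.c that links against
the libmystuff.so library under the directory structure described above (the 32-bit switch varies
by compiler; -m32 is the GNU form):
$ gcc -m32 -o myapp myapp.c -L/opt/mypackage/lib/i686 -lmystuff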
NOTE: There is no shortcut as there is for the dynamic loader.
5 Submitting Jobs
This chapter describes how to submit jobs on the HP XC system; it addresses the following topics:
•“Overview of Job Submission” (page 53)
•“Submitting a Serial Job Using LSF-HPC” (page 53)
•“Submitting a Parallel Job” (page 55)
•“Submitting a Parallel Job That Uses the HP-MPI Message Passing Interface” (page 56)
•“Submitting a Batch Job or Job Script” (page 60)
•“Submitting a Job from a Host Other Than an HP XC Host” (page 65)
•“Running Preexecution Programs” (page 65)
5.1 Overview of Job Submission
On an HP XC system, a job is submitted to LSF-HPC, which places the job in a queue and allows
it to run when the necessary resources become available. The LSF bsub command is the primary
method for submitting jobs on the HP XC system.
The format of the bsub command depends on the type of the job, as listed here:
•Serial job; that is, a job that runs on a single core
•Non-MPI parallel job
•HP-MPI parallel job
•A batch job script
The remaining sections describe how to submit a job for each of these job types.
The examples in this section submit the hostname command or a variation of a "hello, world" program. Most examples are run as interactive jobs to display the output.
The examples in this chapter are run on an HP XC system configuration in which
lsfhost.localdomain is the virtual IP name of the LSF execution host and nodes n[1-16]
are compute nodes in the lsf partition. All nodes contain 2 cores, providing 32 cores for use by
LSF-HPC jobs.
5.2 Submitting a Serial Job Using LSF-HPC
There are various methods for submitting a serial job on the HP XC system:
•Using the LSF bsub command alone.
•Using the LSF bsub command in conjunction with the SLURM srun command.
•Using the LSF bsub command in conjunction with the SLURM srun command and the
LSF-SLURM External Scheduler.
•Using the SLURM srun command alone.
These methods are explored in this section.
For most instances, the recommended method for submitting a job is to use the LSF bsub
command in conjunction with the SLURM srun command.
5.2.1 Submitting a Serial Job with the LSF bsub Command
Use the bsub command to submit a serial job on the LSF execution host using the following
format:
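A general form, consistent with the parameters described below, is:
bsub [bsub-options] [srun [srun-options]] jobname [job-options]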
Use the bsub command's -I option to launch an interactive serial job.
The srun command is only necessary to launch the job on the allocated node if the HP XC JOB_STARTER script is not configured to run a job on the compute nodes in the lsf partition.
The jobname parameter can be the name of an executable or a batch script. If jobname is an executable,
the job is launched on the LSF execution host node. If jobname is a batch script (containing srun
commands), the job is launched on an LSF-HPC node allocation (compute nodes). The LSF-HPC node
allocation is created by the -n num-procs parameter, which specifies the number of cores the job
requests.
The SLURM srun job launch command is only needed if the LSF-HPC JOB_STARTER script is
not configured for the intended queue, but it can be used regardless of whether or not the script
is configured. You can use the bqueues command to confirm whether or not the JOB_STARTER
script exists; see bqueues(1) for information on the bqueues command.
Example 5-1 shows the invocation of the bsub command, after which a job can be entered from the
standard input.
Example 5-1 Submitting a Job from the Standard Input
bsub
Example 5-2 shows the submission and launch of a serial interactive job and its output.
Example 5-2 Submitting a Serial Job Using LSF-HPC
$ bsub -I srun hostname
Job <20> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
Example 5-3 shows the submission of an interactive serial job without the SLURM srun command
and the job output.
Example 5-3 Submitting an Interactive Serial Job Using LSF-HPC only
$ bsub -I hostname
Job <73> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
Example 5-4 uses the LSF-SLURM External Scheduler to submit a job to run on four cores on
two specific compute nodes.
Example 5-4 Submitting an Interactive Serial Job Using LSF-HPC and the LSF-SLURM External
Scheduler
$ bsub -I -n4 -ext "SLURM[nodelist=n[14,16]]" srun hostname
Job <9> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n14
n14
n16
n16
5.2.2 Submitting a Serial Job Through SLURM Only
This section describes how to build a simple hello world application, called hw_hostname,
execute it on the login node, and launch it with the SLURM srun command.
The following is the C source code for this program; the file name is hw_hostname.c.
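The source listing is not reproduced here; a minimal sketch, consistent with the output shown in the
examples that follow, is:
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char name[256];

    /* print the name of the host this process runs on */
    if (gethostname(name, sizeof(name)) != 0)
        return 1;
    printf("%s says Hello!\n", name);
    return 0;
}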
The following is the command line used to compile this program:
$ cc hw_hostname.c -o hw_hostname
NOTE: The following invocations of the sample hw_hostname program are run on a SLURM
non-root default partition, which is not the default SLURM partition for the HP XC system
software.
When run on the login node, it shows the name of the login node, n16 in this case:
$ ./hw_hostname
n16 says Hello!
When you use the srun command to submit this program, it runs on one of the compute nodes.
In this instance, it runs on node n13:
$ srun ./hw_hostname
n13 says Hello!
Submitting the same program again with the srun command may run this program on another
node, as shown here:
$ srun ./hw_hostname
n12 says Hello!
The srun command can also be used to replicate the program on several cores. Although it is not generally
useful, it illustrates the point. Here, the same program is run on 4 cores on 2 nodes.
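The exact invocation is not shown here; one illustrative form, requesting four tasks, is:
$ srun -n4 ./hw_hostname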
The output for this command could also have been 1 core on each of 4 compute nodes in the
SLURM allocation.
5.3 Submitting a Parallel Job
When submitting a parallel job, you can specify the use of HP-MPI. You can also opt to schedule
the job by using SLURM. Depending on which submission method you choose, read the
appropriate sections, as follows:
•“Submitting a Non-MPI Parallel Job” (page 55)
•“Submitting a Parallel Job That Uses the HP-MPI Message Passing Interface” (page 56)
•“Submitting a Parallel Job Using the SLURM External Scheduler” (page 57)
5.3.1 Submitting a Non-MPI Parallel Job
Use the following format of the LSF bsub command to submit a parallel job that does not make
use of HP-MPI to an LSF-HPC node allocation (compute nodes). An LSF-HPC node allocation
is created by the -n num-procs parameter, which specifies the minimum number of cores the
job requests.
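A general form, consistent with the parameters described below, is:
bsub -n num-procs [bsub-options] srun [srun-options] jobname [job-options]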
The -n num-procs parameter, which is required for parallel jobs, specifies the number of cores
requested for the job. The num-procs parameter may be expressed as minprocs[,maxprocs]
where minprocs specifies the minimum number of cores and the optional value maxprocs
specifies the maximum number of cores.
The SLURM srun command is required to run jobs on an LSF-HPC node allocation. The srun
command is the user job launched by the LSF bsub command. SLURM launches the jobname
in parallel on the reserved cores in the lsf partition.
The jobname parameter is the name of an executable file or command to be run in parallel.
Example 5-5 illustrates a non-MPI parallel job submission. The job output shows that the job
“srun hostname” was launched from the LSF execution host lsfhost.localdomain, and
that it ran on 4 cores from the compute nodes n1 and n2.
Example 5-5 Submitting a Non-MPI Parallel Job
$ bsub -n4 -I srun hostname
Job <21> is submitted to default queue <normal>
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n1
n2
n2
You can use the LSF-SLURM external scheduler to specify additional SLURM options on the
command line. As shown in Example 5-6, it can be used to submit a job to run one task per
compute node (on SMP nodes):
Example 5-6 Submitting a Non-MPI Parallel Job to Run One Task per Node
$ bsub -n4 -ext "SLURM[nodes=4]" -I srun hostname
Job <22> is submitted to default queue <normal>
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n2
n3
n4
5.3.2 Submitting a Parallel Job That Uses the HP-MPI Message Passing Interface
Use the following format of the LSF bsub command to submit a parallel job that makes use of
HP-MPI:
bsub -n num-procs [bsub-options] mpijob
The bsub command submits the job to LSF-HPC.
The -n num-procs parameter, which is required for parallel jobs, specifies the number of cores
requested for the job.
See the mpirun(1) manpage for more information on this command.
The mpirun command's -srun option is required if the MPI_USESRUN environment variable
is not set or if you want to use additional srun options to execute your job.
The srun command, used by the mpirun command to launch the MPI tasks in parallel in the
lsf partition, determines the number of tasks to launch from the SLURM_NPROCS environment
variable that was set by LSF-HPC; this environment variable is equivalent to the number provided
by the -n option of the bsub command.
Any additional SLURM srun options are job specific, not allocation-specific.
The mpi-jobname is the executable file to be run. The mpi-jobname must be compiled with
the appropriate HP-MPI compilation utility. For more information, see the section titled Compiling applications in the HP-MPI User's Guide.
Example 5-7 shows an MPI job that runs a hello world program on 4 cores on 2 compute
nodes.
Example 5-7 Submitting an MPI Job
$ bsub -n4 -I mpirun -srun ./hello_world
Job <24> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
Hello world!
Hello world! I'm 1 of 4 on host1
Hello world! I'm 3 of 4 on host2
Hello world! I'm 0 of 4 on host1
Hello world! I'm 2 of 4 on host2
You can use the LSF-SLURM External Scheduler option to add capabilities at the job level and
queue level by including several SLURM options in the command line. For example, you can
use it to submit a job to run one task per node, or to submit a job to run on specific nodes.
“LSF-SLURM External Scheduler” discusses this option.
Example 5-8 shows an MPI job that uses the LSF-SLURM External Scheduler option to run the
same hello world program on each of 4 compute nodes.
Example 5-8 Submitting an MPI Job with the LSF-SLURM External Scheduler Option
$ bsub -n4 -ext "SLURM[nodes=4]" -I mpirun -srun ./hello_world
Job <27> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
Hello world!
Hello world! I'm 1 of 4 on host1
Hello world! I'm 2 of 4 on host2
Hello world! I'm 3 of 4 on host3
Hello world! I'm 4 of 4 on host4
Some preprocessing may need to be done:
•If the MPI job requires the use of an appfile
•If there is another reason that prohibits the use of the srun command as the task launcher
This preprocessing should determine the node host names to which mpirun's standard task
launcher should launch the tasks. In such scenarios, you need to write a batch script; there are
several methods available for determining the nodes in an allocation. One method is to use the
SLURM_JOBID environment variable with the squeue command to query the nodes. Another
method is to use the LSF-HPC environment variables such as LSB_HOSTS and LSB_MCPU_HOSTS,
which are prepared by the HP XC job starter script.
5.3.3 Submitting a Parallel Job Using the SLURM External Scheduler
The LSF-SLURM external scheduler option provides additional capabilities at the job level and
queue level by allowing the inclusion of several SLURM options in the LSF-HPC command line.
With LSF-HPC integrated with SLURM, you can use the LSF-SLURM External Scheduler to
specify SLURM options that specify the minimum number of nodes required for the job, specific
nodes for the job, and so on.
Note:
The SLURM external scheduler is a plug-in developed by Platform Computing Corporation for
LSF-HPC; it is not actually part of SLURM. This plug-in communicates with SLURM to gather
resource information and request allocations of nodes, but it is integrated with the LSF-HPC
scheduler.
The format of this option is shown here:
-ext "SLURM[slurm-arguments]"
The bsub command format to submit a parallel job to an LSF-HPC allocation of compute nodes
using the external scheduler option is as follows:
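A general form, consistent with the parameters and examples that follow, is:
bsub -n num-procs -ext "SLURM[slurm-arguments]" [bsub-options] srun [srun-options] jobname [job-options]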
The slurm-arguments parameter can be one or more of the following srun options, separated
by semicolons, as described in Table 5-1.
Table 5-1 Arguments for the SLURM External Scheduler

SLURM Arguments          Function
nodes=min[-max]          Specifies the minimum and maximum number of nodes allocated to the job. The job allocation will contain at least the minimum number of nodes.
mincpus=ncpus            Specifies the minimum number of cores per node. The default value is 1.
mem=value                Specifies a minimum amount of real memory in megabytes of each node.
tmp=value                Specifies a minimum amount of temporary disk space in megabytes of each node.
constraint=feature       Specifies a list of constraints. The list may include multiple features separated by “&” or “|”. “&” represents ANDed, “|” represents ORed.
nodelist=list of nodes   Requests a specific list of nodes. The job will at least contain these nodes. The list may be specified as a comma-separated list of nodes or a range of nodes.
exclude=list of nodes    Requests that a specific list of hosts not be included in the resources allocated to this job. The list may be specified as a comma-separated list of nodes or a range of nodes.
contiguous=yes           Requests a mandatory contiguous range of nodes.
The Platform Computing LSF-HPC documentation provides more information on general external
scheduler support.
Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution
host and nodes n[1-10] are compute nodes in the lsf partition. All nodes contain two cores,
providing 20 cores for use by LSF-HPC jobs.
Example 5-9 shows one way to submit a parallel job to run on a specific node or nodes.
Example 5-9 Using the External Scheduler to Submit a Job to Run on Specific Nodes
$ bsub -n4 -ext "SLURM[nodelist=n6,n8]" -I srun hostname
Job <70> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n6
n6
n8
n8
In the previous example, the job output shows that the job was launched from the LSF execution
host lsfhost.localdomain, and it ran on four cores on the specified nodes, n6 and n8.
Example 5-10 shows one way to submit a parallel job to run one task per node.
Example 5-10 Using the External Scheduler to Submit a Job to Run One Task per Node
$ bsub -n4 -ext "SLURM[nodes=4]" -I srun hostname
Job <71> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n2
n3
n4
In the previous example, the job output shows that the job was launched from the LSF execution
host lsfhost.localdomain, and it ran on four cores on four different nodes (one task per
node).
Example 5-11 shows one way to submit a parallel job to avoid running on a particular node.
Example 5-11 Using the External Scheduler to Submit a Job That Excludes One or More Nodes
$ bsub -n4 -ext "SLURM[nodes=4; exclude=n3]" -I srun hostname
Job <72> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n2
n4
n5
This example runs the job exactly the same as in Example 5-10 “Using the External Scheduler to
Submit a Job to Run One Task per Node”, but additionally requests that node n3 is not to be
used to run the job. Note that this command could have been written to exclude additional nodes.
Example 5-12 launches the hostname command once on nodes n1 through n10 (n[1-10]):
Example 5-12 Using the External Scheduler to Launch a Command in Parallel on Ten Nodes
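The original listing is not reproduced here; one plausible form of the command, following the pattern
of the preceding examples, is:
$ bsub -n10 -ext "SLURM[nodes=10]" -I srun hostname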
5.4 Submitting a Batch Job or Job Script
Use the following bsub command format to submit a batch job or job script:
bsub -n num-procs [bsub-options] script-name
The -n num-procs parameter, which is required for parallel jobs, specifies the number of cores
the job requests.
The script-name argument is the name of the batch job or script. The script can contain one
or more srun or mpirun commands.
The script will execute once on the first allocated node, and the srun or mpirun commands
within the script will be run on the allocated compute nodes.
In Example 5-14, a simple script named myscript.sh, which contains two srun commands,
is displayed then submitted.
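The script listing is not reproduced here; a plausible sketch, consistent with the output that follows
and assuming the hello_world program used in the earlier MPI examples, is:
#!/bin/sh
# myscript.sh -- illustrative reconstruction, not the original listing
srun hostname
mpirun -srun ./hello_world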
$ bsub -I -n4 myscript.sh
Job <29> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n2
n2
n4
n4
Hello world! I'm 0 of 4 on n2
Hello world! I'm 1 of 4 on n2
Hello world! I'm 2 of 4 on n4
Hello world! I'm 3 of 4 on n4
Example 5-15 runs the same script but uses the LSF-SLURM External Scheduler option to specify
different resources (here, 4 compute nodes).
Example 5-15 Submitting a Batch Script with the LSF-SLURM External Scheduler Option
$ bsub -n4 -ext "SLURM[nodes=4]" -I ./myscript.sh
Job <79> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n2
n3
n4
Hello world! I'm 0 of 4 on n1
Hello world! I'm 1 of 4 on n2
Hello world! I'm 2 of 4 on n3
Hello world! I'm 3 of 4 on n4
Example 5-16 and Example 5-17 show how the jobs inside the script can be manipulated within
the allocation.
Example 5-16 Submitting a Batch Job Script That Uses a Subset of the Allocation
$ bsub -n4 -ext "SLURM[nodes=4]" -I ./myscript.sh
Job <80> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n2
Hello world! I'm 0 of 2 on n1
Hello world! I'm 1 of 2 on n2
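A hedged sketch of how the srun commands inside myscript.sh could be limited to a subset of the four-node allocation, producing output like that shown above, is:
#!/bin/sh
# Use only 2 of the 4 allocated cores (a subset of the allocation)
srun -n2 hostname
srun -n2 ./hello_world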
Example 5-17 Submitting a Batch job Script That Uses the srun --overcommit Option
$ bsub -n4 -I ./myscript.sh
Job <81> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n1
n1
n1
n2
n2
n2
n2
Hello world! I'm 0 of 8 on n1
Hello world! I'm 1 of 8 on n1
Hello world! I'm 2 of 8 on n1
Hello world! I'm 3 of 8 on n1
Hello world! I'm 4 of 8 on n2
Hello world! I'm 5 of 8 on n2
Hello world! I'm 6 of 8 on n2
Hello world! I'm 7 of 8 on n2
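A hedged sketch of the --overcommit variant of the script, which oversubscribes the four allocated cores with eight tasks as shown above, might be:
#!/bin/sh
# Run 8 tasks on the 4 allocated cores by overcommitting
srun -n8 --overcommit hostname
srun -n8 --overcommit ./hello_world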
Example 5-18 shows some of the environment variables that are available in a batch script. The
LSB_HOSTS and LSB_MCPU_HOSTS environment variables are defined in the Platform LSF Reference.
The SLURM_JOBID and SLURM_NPROCS environment variables are defined in the SLURM Reference Manual.
Example 5-18 Environment Variables Available in a Batch Job Script
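A minimal sketch of a batch job script that prints these variables, assuming it is saved as a hypothetical file named envscript.sh and submitted with a command such as bsub -n4 -I ./envscript.sh, follows:
#!/bin/sh
# LSF environment variables describing the allocated hosts
echo "LSB_HOSTS=$LSB_HOSTS"
echo "LSB_MCPU_HOSTS=$LSB_MCPU_HOSTS"
# SLURM environment variables describing the allocation created by LSF-HPC
echo "SLURM_JOBID=$SLURM_JOBID"
echo "SLURM_NPROCS=$SLURM_NPROCS"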
5.5 Submitting Multiple MPI Jobs Across the Same Set of Nodes
There are two ways to run multiple MPI jobs across the same set of nodes at the same time; they
are:
•Using a script
•Using a Makefile
The following sections show these methods. In both methods, the jobs submitted are parallel
jobs using the HP-MPI message passing interface and use the ping_pong_ring program, which
is delivered with the HP-MPI software.
5.5.1 Using a Script to Submit Multiple Jobs
You can write a script that consists of multiple commands that launch jobs. In this example, the
ping_pong_ring command is run first in the background then again in the foreground:
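A hedged sketch of such a script, assuming a hypothetical file name run_two.sh and a ping_pong_ring binary built from the HP-MPI examples, follows:
#!/bin/sh
# Both launches share the SLURM allocation created by LSF-HPC for this job
# First copy of ping_pong_ring runs in the background
mpirun -srun ./ping_pong_ring &
# Second copy runs in the foreground
mpirun -srun ./ping_pong_ring
# Wait for the background copy to finish before the script exits
wait
It could be submitted with a command such as bsub -I -n4 -ext "SLURM[nodes=4]" ./run_two.sh.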
5.6 Submitting a Job from a Host Other Than an HP XC Host
To submit a job from a host other than an HP XC host to the HP XC system, use the LSF -R
option, and the HP XC host type SLINUX64 (defined in lsf.shared) in the job submission
resource requirement string. The necessary resource requirement string to submit a job from a
host other than an HP XC host is specified as follows:
-R "type=SLINUX64"
The following example shows this resource requirement string in an LSF command:
$ bsub -R "type=SLINUX64" -n4 -I srun hostname
5.7 Running Preexecution Programs
A preexecution program is a program that performs setup tasks that an application needs.
It may create directories, input files, and so on.
Though LSF-HPC daemons run only on a node with the resource manager role, batch jobs can run
on any compute node that satisfies the scheduling and allocation requirements.
Where preexecution commands run depends on the type of job:
•For interactive jobs, preexecution commands run on the node where the sbatchd daemon
runs, typically, the resource manager node.
•For normalbatch jobs, preexecution commandsrun on the first node of the SLURM allocation.
Before starting a preexecution program, LSF-HPC sets the SLURM_JOBID environment variable.
To enable srun to launch preexecution on the first allocated node and other allocated nodes,
your preexecution program should pick up the SLURM_JOBID environment variable. The
SLURM_JOBID has the information LSF-HPC needs to run the job on the nodes required by your
preexecution program.
The following items provide the information you need to run the preexecution program on the
resource manager node, on the first allocated node, or on all the allocated nodes:
• To run a preexecution program on the resource manager node: This is the default behavior.
Run the preexecution program normally. Your preexecution program does not need to make
use of the SLURM_JOBID environment variable.
• To run a preexecution program on the first allocated node: Use the SLURM srun -N 1
command. For example:
$ /opt/hptc/bin/srun -N 1 my_pre_exec
• To run a preexecution program on all allocated nodes: Use the SLURM srun command
directly without node options. For example:
$ /opt/hptc/bin/srun my_pre_exec
NOTE:Do not use the srun -b command (for SLURM batch mode) inside preexecution
programs. The srun -b command returns immediately after a SLURM batch job is submitted.
This can cause the preexecution program to exit with success while the real task is still running
in batch mode.
See the SLURM srun(1) manpage for more information about the srun command, its options,
and the SLURM_JOBID environment variable.
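As a hedged illustration of the guidance above, a preexecution program that creates a scratch directory on every allocated node might look like the following sketch; the script content and the directory path are hypothetical:
#!/bin/sh
# SLURM_JOBID is set by LSF-HPC before the preexecution program starts,
# so this srun runs within the job's existing allocation on all allocated nodes.
/opt/hptc/bin/srun mkdir -p /tmp/scratch.$SLURM_JOBID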
6 Debugging Applications
This chapter describes how to debug serial and parallel applications in the HP XC development
environment. In general, effective debugging of applications requires the applications to be
compiled with debug symbols, typically the -g switch. Some compilers allow -g with
optimization.
This chapter addresses the following topics:
•“Debugging Serial Applications” (page 67)
•“Debugging Parallel Applications” (page 67)
6.1 Debugging Serial Applications
You can debug a serial application on an HP XC system with the same method you would use
to debug a serial application on a conventional Linux operating system. For information about
debugging serial programs, see the standard Linux documentation.
The following serial debuggers are available for use in the HP XC environment for local
debugging:
•The gdb utility is provided with the standard Linux distribution; it performs line-mode
debugging of a single process.
•The idb utility is generally available with the Intel compiler suite.
•The pgdbg utility is generally available with the PGI compilers.
For information about using these debuggers, see the standard Linux documentation and the
documentation that is available with the specific debugger that you are using.
6.2 Debugging Parallel Applications
The following parallel debuggers are recommended for use in the HP XC environment:
TotalView and DDT.
TotalView
TotalView is a full-featured GUI debugger for debugging parallel applications from Etnus, Inc.
It is specifically designed to meet the requirements of parallel applications running on many
cores. The use of TotalView in the HP XC environment is described in “Debugging with
TotalView”. You can obtain additional information about TotalView from the TotalView
documentation and the TotalView Web site at:
http://www.etnus.com
Note:
TotalView is not included with the HP XC software and is not supported. If you have any
problems installing or using TotalView, contact Etnus, Inc.
DDT
DDT (Distributed Debugging Tool) is a parallel debugger from Streamline Computing. DDT is
a comprehensive graphical debugger designed for debugging parallel code. It gives users a
common interface for most compilers, languages and MPI distributions. For information about
using DDT, see the Streamline Computing documentation and the Streamline Computing Web
site:
6.2.1 Debugging with TotalView
TotalView is a full-featured GUI debugger specifically designed to meet the
requirements of parallel applications running on many cores.
You can purchase the TotalView debugger from Etnus, Inc. for use on the HP XC cluster.
TotalView is not included with the HP XC software, and HP does not provide technical
support for it. Contact Etnus, Inc. for any issues with TotalView.
This section provides only minimum instructions to get you started using TotalView. Instructions
for installing TotalView are included in the HP XC System Software Installation Guide. Read the
TotalView documentation for full information about using TotalView; the TotalView
documentation set is available directly from Etnus, Inc. at the following URL:
http://www.etnus.com
6.2.1.1 SSH and TotalView
As discussed in “Using the Secure Shell to Log In” and “Enabling Remote Execution with
OpenSSH”, HP XC systems use the OpenSSH package in place of traditional commands like rsh
to provide more secure communication between nodes in the cluster. When run in a parallel
environment, TotalView expects to be able to use the rsh command to communicate with other
nodes, but the default HP XC configuration disallows this.
Set the TVDSVRLAUNCHCMD environment variable to specify an alternate command for TotalView
to use in place of rsh. When using the TotalView Modulefile, as described in “Setting Up
TotalView”, this variable is automatically set to /usr/bin/ssh -o BatchMode=yes. If you
manage your environment independently of the provided modulefiles, set this variable manually.
6.2.1.2 Setting Up TotalView
TotalView must be set up as described here:
1.Determine whether TotalView is installed. Use the which or whereis command to do so.
2.Determine whether environment variables have been defined for TotalView. You can use
the echo command for this.
3.Set the DISPLAY environment variable of the system that hosts TotalView to display on
your local system.
Also, run the xhost command locally to accept data from the system that hosts TotalView;
see the X(7X) manpage for more information.
4.Edit the PATH environment variable to include the location of the TotalView executable.
Also add the location of the TotalView manpages to the MANPATH environment variable.
The following list summarizes some suggestions:
•Edit your login file or profile file to include the following commands:
module load mpi
module load totalview
•Set the PATH and MANPATH environment variables in your shell initialization file or
login file.
•Have your system administrator set up your environment so that the TotalView
modulefile loads automatically when you log in to the system.
•Adjust your environment manually before invoking TotalView.
See “Overview of Modules” for information on modulefiles.
Your administrator may have already installed TotalView and set up the environment for
you. In this case, skip the steps in this section and proceed to “Setting TotalView Preferences”,
which describes using TotalView for the first time.
6.2.1.3 Using TotalView with SLURM
Use the following commands to allocate the nodes you need before you debug an application
with SLURM, as shown here:
$ srun -Nx -A
$ mpirun -tv -srun application
These commands allocate x nodes and run TotalView to debug the program named
application.
Be sure to exit from the SLURM allocation created with the srun command when you are done.
6.2.1.4 Using TotalView with LSF-HPC
HP recommends the use of xterm when debugging an application with LSF-HPC. You also need
to allocate the nodes you will need.
You may need to verify the full path name of the xterm and mpirun commands:
First run a bsub command to allocate the nodes you will need and to launch an xterm window:
$ bsub -nx -ext "SLURM[nodes=x]" \
-Is /usr/bin/xterm
Enter an mpirun -tv command in the xterm window to start TotalView on the application
you want to debug:
$ mpirun -tv -srun application
6.2.1.5 Setting TotalView Preferences
You should set TotalView preferences the first time you invoke it. For example, you need to tell
TotalView how to launch TotalView processes on all the cores.
The TotalView preferences are maintained in a preferences file named
.totalview/preferences in your home directory, so you only need to set these preferences
once.
Follow these steps to set the TotalView preferences:
1.Invoke the TotalView debugger:
$ totalview
TotalView's main control window (called the TotalView Root Window) appears.
2.Select Preferences from the File pull-down menu.
A Preferences window opens.
3.Select (that is, click on) the Launch Strings tab in the Preferences window.
4.Ensure that the Enable single debug server launch button is selected in the Launch Strings tab.
5.In the Launch Strings table, in the area immediately to the right of Command:, verify
that the default command launch string shown is as follows:
You may be able to obtain this setting by selecting the Defaults button; otherwise, you
need to enter this command launch string.
6.Select the Bulk Launch tab in the Preferences window.
Make sure that Enable debug server bulk launch is not selected.
7.Select the OK button at the bottom-left of the Preferences window to save these changes.
8.Exit TotalView by selecting Exit from the File pull-down menu.
The TotalView launch preferences are configured and saved. You can change this configuration
at any time.
6.2.1.6 Debugging an Application
This section describes how to use TotalView to debug an application.
1.Compile the application to be debugged. For example:
$ mpicc -g -o Psimple simple.c -lm
Use the -g option to enable debugging information.
2.Run the application in TotalView:
$ mpirun -tv -srun -n2 ./Psimple
3.The TotalView main control window, called the TotalView root window, opens. It displays
the following message in the window header:
Etnus TotalView Version#
4.The TotalView process window opens.
This window contains multiple panes that provide various debugging functions and
debugging information. The name of the application launcher that is being used (either
srun or mpirun) is displayed in the title bar.
5.Set the search path if you are invoking TotalView from a directory that does not contain the
executable file and the source code. If TotalView is invoked from the same directory, you
can skip to Step 6.
Set the search path as follows:
a.Select the File pull-down menu of the TotalView process window.
b.Select Search Path from the list that appears.
TotalView, by default, will now search for source and binaries (including symbol files) in
the following places and in the following order:
•Current working directory
•Directories in File⇒Search Path
•Directories specified in your PATH environment variable
6.Select the Go button in the TotalView process window. A pop-up window appears, asking
if you want to stop the job:
Process srun is a parallel job.
Do you want to stop the job now?
7.Select Yes in this pop-up window. The TotalView root window appears and displays a line
for each process being debugged.
If you are running Fortran code, another pop-up window may appear with the following
warning:
Sourcefile initfdte.f was not found, using assembler mode.
Select OK to close this pop-up window. You can safely ignore this warning.
8.You can now set a breakpoint somewhere in your code. The method to do this may vary
slightly between versions of TotalView. For TotalView Version 6.0, the basic process is as
follows:
a.Select At Location in the Action Point pull-down menu of the TotalView process
window.
b.Enter the name of the location where you want to set a breakpoint.
c.Select OK.
9.Select the Go button to run the application and go to the breakpoint.
Continue debugging as you would on any system. If you are not familiar with TotalView, you
can select Help in the right-hand corner of the process window for additional information.
6.2.1.7 Debugging Running Applications
As an alternative to the method described in “Debugging an Application”, it is also possible to
"attach" an instance of TotalView to an application which is already running.
1.Compile a long-running application as in “Debugging an Application”:
$ mpicc -g -o Psimple simple.c -lm
2.Run the application:
$ mpirun -srun -n2 Psimple
3.Start TotalView:
$ totalview
4.Select Unattached in the TotalView Root Window to display a list of running processes.
Double-click on the srun process to attach to it.
5.The TotalView Process Window appears, displaying information on the srun process.
Select Attached in the TotalView Root Window.
6.Double-click one of the remote srun processes to display it in the TotalView Process Window.
7.Now you should be able to set breakpoints to debug the application.
6.2.1.8 Exiting TotalView
Make sure your job has completed before exiting TotalView. This may require that you wait a
few seconds from the time your job has completed until srun has completely exited.
If you exit TotalView before your job is completed, use the squeue command to ensure that
your job is not still on the system.
$ squeue
If it is still there, use the following command to remove all of your jobs:
$ scancel --user username
To cancel individual jobs, see the scancel manpage for information about selective job
cancellation.
7 Monitoring Node Activity
This chapter describes the optional utilities that provide performance information about the set
of nodes associated with your jobs. It addresses the following topics:
•“Installing the Node Activity Monitoring Software” (page 73)
•“Using the xcxclus Utility to Monitor Nodes” (page 73)
•“Plotting the Data from the xcxclus Datafiles” (page 76)
•“Using the xcxperf Utility to Display Node Performance” (page 77)
•“Plotting the Node Performance Data” (page 79)
•“Running Performance Health Tests” (page 80)
7.1 Installing the Node Activity Monitoring Software
The node activity monitoring software is optional and therefore not installed by default with the
HP XC System Software.
Have your system administrator install the following xtools RPMs using the instructions in
the HP XC System Software Administration Guide:
•xtools-common-version.platform.rpm
•xtools-xc_clients-version.platform.rpm
where:
version      Identifies the RPM version.
platform     Refers to the platform:
             • ia64 denotes the CP6000 platform
             • x86_64 denotes the CP3000 and CP4000 platforms
The RPMs are located on the DVD at /mountpoint/HPC/extras/RPMS/, where mountpoint
is the directory where the DVD was mounted.
7.2 Using the xcxclus Utility to Monitor Nodes
The xcxclus utility is a graphic utility that enables you to monitor a number of nodes
simultaneously. The utility displays an array of icons. Each icon represents a node in the job
allocation and provides the following information:
•Ethernet activity
•Core utilization
•Memory utilization
•Interconnect I/O activity (for InfiniBand and Quadrics Elan interconnects only)
•Disk I/O activity (for nodes with attached disks)
When you invoke the xcxclus utility from the command line, it determines which nodes your
current jobs are using and opens a window that displays utilization metrics for those nodes. If
you are not running any jobs, the xcxclus utility terminates.
For your convenience, the data is color-coded with respect to the percent of utilization. For
example, red indicates 90% to 100% utilization; purple represents 0% to 10% utilization. The
color legend is displayed at the bottom of the window.
Figure 7-1 shows the xcxclus display for a job that uses nodes n6 and n7.
Figure 7-1 The xcxclus Utility Display
The icons show most node utilization statistics as a percentage of the total resource utilization.
For example, Figure 7-1 indicates that the CPU cores are almost fully utilized, at 94 and
95 percent of available CPU time. These values are rounded to the nearest integer.
Selecting (that is, clicking on) an icon automatically invokes another utility, xcxperf, described
in “Using the xcxperf Utility to Display Node Performance” (page 77). It displays a graph of the
corresponding node's performance.
Figure 7-2 shows a node from Figure 7-1 with numeric designators for the individual portions.
Figure 7-2 The xcxclus Utility Display Icon
The following describes the format of the node icons:
1. The node designator is on the upper left of the icon.
2. The left portion of the icon represents the Ethernet connection or connections.
In this illustration, two Ethernet connections are used. The data for eth0 is above the data
for eth1.
3. The center portion of the icon displays the core usage data for each core in the node.
4. The right portion of the icon displays memory statistics.
5. The arrows at the top of the icon represent Interconnect activity.
NOTE: In Figure 7-2 (page 74), the values in the arrows for Interconnect activity are in
terms of percent of utilization. You have the option of displaying either megabytes per
second (rounded up to the next integer) or a percent of utilization.
6. The arrows at the bottom of the icon represent file I/O activity.
NOTE: In Figure 7-2 (page 74), the values in the arrows for disk I/O activity are in terms
of megabytes per second, rounded up to the next integer.
Pointing to the icon with the mouse pointer opens a balloon that displays both the purpose of
that portion of the icon and detailed information corresponding to that area of the icon. This
information is also represented in the status bar at the bottom of the window. Figure 7-3 is an
example of this balloon display for Disk I/O writes on node n6.
Figure 7-3 Balloon Display in the xcxclus Utility
The xcxclus utility can generate a data file. The data for each monitored node is updated each
second. You can use this data file either to replay the data or to plot graphs of various node
performance statistics. For more information on plotting graphs with the xcxclus utility, see
“Plotting the Data from the xcxclus Datafiles” (page 76).
When you specify the -output option and a prefix, the xcxclus utility generates a data file
for all allocated nodes. The following example runs the xcxclus utility and stores the output
in a file named test.xclus:
$ /opt/xtools/bin/xcxclus -output test
The following describes the menu options at the top of the xcxclus display window:
File menu:
• Exit: Stops the xcxclus utility.
Options menu:
• Refresh...: Opens a dialog box that enables you to set the refresh rate.
• Modify Key...: Opens a dialog box that enables you to change the utilization-to-color
mapping for all components.
• CPU Info: Enables you to display core utilization in terms of user or system statistics,
or both.
• Ethernet: Enables you to specify the data in terms of bandwidth or utilization.
• Sys. Memory: Enables you to specify the data in terms of the memory used for the
application or the total memory used.
• Interconnect: Enables you to specify the data in terms of bandwidth or utilization.
View menu:
• Zoom: Enables you to show the xcxclus display at 50%, 75%, 100%, or 125% of its size.
• Key: Turns off the display of the color key at the bottom of the display.
• Node's: Enables you to designate the full node name or the node number.
• Values: Turns off the display of the utilization values in the node icons.
• Hold: Enables you to suspend the display until this button is released.
When you invoke the xcxclus utility and specify the data file you previously generated, the
utility plays back the graphical display of the performance metrics stored in the data file. The
following command plays back the display from the previous example:
$ /opt/xtools/bin/xcxclus test
For additional information on this utility, see xcxclus(1).
7.3 Plotting the Data from the xcxclus Datafiles
Use the xcxclus utility to graph the data. Figure 7-4 is a representation of the output from the
xcxclus utility for the DB.xclus data file. This data file is created when the xcxclus utility
is run with the -output DB option.
The following invokes the xcxclus utility to plot data:
$ /opt/xtools/bin/xcxclus -plot pfile DB
Do you wish to view the created plots now using the command
$ .../bin/gnuplot pfile (y/[n])? y
Then hit <ENTER> to continue...
Node n6 is one of the nodes in the job allocation. You need to specify a plot file (pfile in this
example) to contain the plot data.
The data for the first node in the allocation is plotted and displayed. Each time you press Enter,
the plotted data for the next node in the allocation is displayed. Continue pressing Enter until
all the node data is displayed.
Figure 7-4 Plotting the Data from the xcxclus Utility
The xcxclus utility uses the GNUplot open source plotting program.
See xcxclus(1) for more information on plotting cluster data.
7.4 Using the xcxperf Utility to Display Node Performance
The xcxperf utility provides a graphic display of node performance for a variety of metrics.
You can invoke the xcxperf utility with either of these methods:
•Enter the xcxperf command on the command line.
If you enter the xcxperf command without any options, it displays the
performance metrics for the current node if you have an allocation on that node.
Use the -node option to specify another node in your allocation, for example, enter the
following to display the performance metrics for node n6:
$ xcxperf -node n6
•Select a node icon in the xcxclus display.
The xcxperf utility displays a dynamic graph showing the performance metrics for the node.
As shown in Figure 7-5, the statistics are shown in a color-coded histogram. The pull-down
menus at the top of the window describe the color coding for a specific metric.
When you specify the -output (also expressed as -o) option and a prefix, the xcxperf utility
generates a data file for the node it monitors; this data file differs from those generated by the
xcxclus utility. The following example runs the xcxperf utility and stores the output in a file
named test.xperf:
$ xcxperf -o test
Figure 7-5 The xcxperf Utility Display
Specifying the data file prefix when you invoke the xcxperf utility from the command line
plays back the display according to the recorded data. The following command line plays back
the test.xperf data file:
$ xcxperf test
The following describes the menu options at the top of the xcxperf display window:
File menu:
• Exit: Stops the xcxperf utility.
Options menu:
• Hide/Show: Enables you to hide or show the CPU, Disk, Ethernet, Interrupts, Socket, or
Memory data.
• System Information: Opens an information dialog box that displays system information. See
Figure 7-6 (page 79) for an example of this dialog box.
• CPU: Displays the color map for core utilization.
• Disk: Displays the color map for disk utilization data.
• Ethernet: Displays the color map for the Ethernet data.
• Interrupts: Displays the color map for the data on interrupts.
• Context Switch: Displays the color map for the data on context switches.
• Socket: Displays the color map for the data on sockets.
• Memory: Displays the color map for the memory data.
• Swap: Displays the color map for the swap data.
Figure 7-6 (page 79) is an example of the dialog box that you can open to list system information.
Figure 7-6 xcxperf System Information Dialog Box
For additional information about this utility, see xcxperf(1).
7.5 Plotting the Node Performance Data
The xcxperf utility enables you to graph the data from the datafiles generated with its -output
option. Figure 7-7 is a representation of the plotted output for the DB.n6.xperf data file, which
was created and plotted with the following commands:
$ /opt/xtools/bin/xcxperf DB.n6
$
$ /opt/xtools/bin/xcxperf -plot pfile DB.n6
Do you wish to view the created plots now using the command
$.../bin/gnuplot pfile.xperf.gnuplot (y/[n])? y
Node n6 is one of the nodes in the job allocation. You need to specify a plot file (pfile in this
example) to contain the plot data.
Figure 7-7 (page 80) is an example of the plotted output from the xcxperf utility.
Figure 7-7 Plotting Node Data from the xcxperf Utility
The xcxperf utility plots the performance data and displays as much data as can be shown
on one screen. Each time you press Enter, the xcxperf utility displays the next page of
performance data. Continue pressing Enter until all the node's performance data is displayed.
See xcxperf(1) for more information on plotting node data.
7.6 Running Performance Health Tests
You can run the ovp command to generate reports on the performance health of the nodes. Use
the following format to run a specific performance health test:
ovp [options] [-verify=perf_health/test]
Where:
options    Specifies additional command-line options for the test. The ovp --help
           perf_health command lists the command-line options for each test.
           NOTE: Use the --opts= option to pass these options.
           The following options apply to all the tests:
           • The --nnodes=n option runs the test on n compute nodes.
           • The --nodelist=nodelist option specifies the compute nodes to use.
             NOTE: The --nodelist=nodelist option is particularly useful for
             determining problematic nodes.
             If you use this option and the --nnodes=n option, the --nnodes=n option is
             ignored.
           • The --queue LSF_queue option specifies the LSF queue for the performance
             health tests.
test       Indicates the test to perform. The following tests are available:
           cpu                     Tests CPU core performance using the Linpack
                                   benchmark.
           cpu_usage               Tests CPU core usage. All CPU cores should be
                                   idle during the test. This test reports a node if it
                                   is using more than 10% (by default) of its CPU
                                   cores. The head node is excluded from this test.
           memory                  Uses the streams benchmark to test memory
                                   performance.
           memory_usage            Tests memory usage. This test reports a node that
                                   uses more than 25 percent (by default) of its
                                   memory.
           network_stress          Tests network performance under stress using the
                                   Pallas benchmark's Alltoall, Allgather, and
                                   Allreduce tests. These tests should be performed
                                   on a large number of nodes for the most accurate
                                   results. The default value for the number of nodes
                                   is 4, which is the minimum value that should be
                                   used. The --all_group option allows you to
                                   select the node grouping size.
           network_bidirectional   Tests network performance between pairs of
                                   nodes using the Pallas benchmark's Exchange
                                   test.
           network_unidirectional  Tests network performance between pairs of
                                   nodes using the HP-MPI ping_pong_ring test.
NOTE: Except for the network_stress and network_bidirectional tests,
these tests apply only to systems on which LSF-HPC integrated with SLURM is installed.
The network_stress and network_bidirectional tests also function under
Standard LSF.
You can list the available tests with the ovp -l command:
$ ovp -l
Test list for perf_health:
cpu_usage
memory_usage
cpu
memory
network_stress
network_bidirectional
network_unidirectional
By default, the ovp command reports whether the nodes passed or failed the given test. Use the ovp
--verbose option to display additional information.
The results of the test are written to a file in your home directory. The file name has the form
ovp_node_date[rx].log, where node is the node from which the command was launched
and date is a date stamp in the form mmddyy. Subsequent runs of the test are identified
with an r (for run) and a number; this prevents a log file from overwriting a previous one. The
--chdir= option enables you to designate a directory different from your home directory as
the execution directory.
The following is an example of the CPU usage test:
$ ovp --verify=perf_health/cpu_usage
XC CLUSTER VERIFICATION PROCEDURE
date time
Verify perf_health:
Testing cpu_usage ...
+++ PASSED +++
This verification has completed successfully.
A total of 1 test was run.
Details of this verification have been recorded in:
HOME_DIRECTORY/ovp_n16_mmddyy.log
The following example runs the same test with the --verbose option to show additional
output:
$ ovp --verbose --verify=perf_health/cpu_usage
The headnode is excluded from the cpu usage test.
Number of nodes allocated for this test is 14
Job <102> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>>
All nodes have cpu usage less than 10%.
+++ PASSED +++
This verification has completed successfully.
A total of 1 test was run.
Details of this verification have been recorded in:
HOME_DIRECTORY/ovp_n16_mmddyy.log
The following example tests the memory of nodes n11, n12, n13, n14, and n15. The --keep
option preserves the test data in a temporary directory.
Specified nodelist is n[11-15]
Number of nodes allocated for this test is 5
Job <103> is submitted to default queue <interactive>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>>
Detailed streams results for each node can be found in
Details of this verification have been recorded in:
HOME_DIRECTORY/ovp_n16_mmddyyr1.log
8 Tuning Applications
This chapter discusses how to tune applications in the HP XC environment.
8.1 Using the Intel Trace Collector and Intel Trace Analyzer
This section describes how to use the Intel Trace Collector (ITC) and Intel Trace Analyzer (ITA)
with HP-MPI on an HP XC system. The Intel Trace Collector/Analyzer were formerly known as
VampirTrace and Vampir, respectively. This section addresses the following topics:
•“Building a Program — Intel Trace Collector and HP-MPI” (page 85)
•“Running a Program – Intel Trace Collector and HP-MPI” (page 86)
•“Using the Intel Trace Collector and Intel Trace Analyzer” (page 85)
•“Visualizing Data – Intel Trace Analyzer and HP-MPI” (page 89)
The following Intel Trace Collector/Analyzer documentation is available on the HP XC system
after Intel Trace Collector/Analyzer is installed:
•Intel Trace Collector Users Guide — located on the HP XC system at:
8.1.1 Building a Program — Intel Trace Collector and HP-MPI
HP-MPI is MPICH compatible if you use the HP-MPI MPICH scripts which are located at:
/opt/hpmpi/bin. The HP-MPI MPICH scripts are:
•mpicc is replaced by mpicc.mpich
•mpif77 is replaced by mpif77.mpich
•mpirun is replaced by mpirun.mpich
In summary, mpixx becomes mpixx.mpich.
For further information, see the Intel Trace Collector Users Guide. This document is located on the
HP XC system at the following location:
<install-path-name>/ITC/doc/Intel_Trace_Collector_Users_Guide.pdf.
Example 8-1 The vtjacobic Example Program
For the purposes of this example, the examples directory under /opt/IntelTrace/ITC is
copied to the user's home directory and renamed to examples_directory.
When Intel compilers are used, add the -static-libcxa option to the link line.
Otherwise, the following type of error occurs at run time:
$ mpirun.mpich -np 2 ~/examples_directory/vtjacobic
~/examples_directory/vtjacobic:
error while loading shared libraries:
libcprts.so.6: cannot open shared object file:
No such file or directory
MPI Application rank 0 exited before MPI_Init() with status 127
mpirun exits with status: 127
In “The vtjacobic Example Program”, ~/examples_directory/vtjacobic represents the
file specification for the vtjacobic example program.
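As a hedged illustration only, a compile-and-link line for this example using the MPICH-compatible wrapper and the -static-libcxa option might resemble the following; the source file name is assumed, and the Intel Trace Collector libraries themselves are added by the makefile in the examples directory and are omitted here:
$ mpicc.mpich -g -static-libcxa -o vtjacobic vtjacobic.c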
8.1.2 Running a Program – Intel Trace Collector and HP-MPI
Assuming that you have built your program using the -static-libcxa option, as discussed
in “Building a Program — Intel Trace Collector and HP-MPI”, you can launch the program with
the mpirun.mpich command, as shown in the following example.
Example 8-2 shows how to run the vtjacobic example program discussed in “Building a
Program — Intel Trace Collector and HP-MPI”.
Example 8-2 C Example – Running the vtjacobic Example Program
In Example 8-2, ~/examples_directory/vtjacobic represents the file specification for the
vtjacobic example program.
8.2 The Intel Trace Collector and Analyzer with HP-MPI on HP XC
NOTE: The Intel Trace Collector (ITC) was formerly known as VampirTrace. The Intel Trace
Analyzer was formerly known as Vampir.
8.2.1 Installation Kit
The following are installation-related notes:
There are two installation kits for the Intel Trace Collector:
•ITC-IA64-LIN-MPICH-PRODUCT.4.0.2.1.tar.gz
•ITA-IA64-LIN-AS21-PRODUCT.4.0.2.1.tar.gz
The Intel Trace Collector is installed in the /opt/IntelTrace/ITC directory.
The Intel Trace Analyzer is installed in the /opt/IntelTrace/ITA directory.
The license file is located in the /opt/IntelTrace/ directory so both tools can find it.
8.2.2 HP-MPI and the Intel Trace Collector
This section contains information about building a program with ITC.
HP-MPI is MPICH-compatible if you use the following HP-MPI MPICH scripts, which are located
in the /opt/mpi/bin directory:
•mpicc is replaced by mpicc.mpich
•mpif77 is replaced by mpif77.mpich
•mpirun is replaced by mpirun.mpich
In summary: mpiXX becomes mpiXX.mpich
As an example, the examples directory under /opt/IntelTrace/ITC was copied to a home
directory and renamed to ITC_examples_xc6000. The GNU makefile now resembles
the following:
When you use the Intel compilers, add -static-libcxa to the link line; otherwise, the following
errors are generated at runtime:
[n1]/nis.home/sballe/xc_PDE_work/ITC_examples_xc6000
>mpirun.mpich -np 2 ~/xc_PDE_work/ITC_examples_xc6000/vtjacobic
warning: this is a development version of HP-MPI for internal R&D use only
/nis.home/sballe/xc_PDE_work/ITC_examples_xc6000/vtjacobic:
error while loading shared libraries: libcprts.so.6:
cannot open shared object file: No such file or directory MPI Application
rank 0 exited before MPI_Init() with status 127 mpirun exits with
status: 127 [n1]/nis.home/sballe/xc_PDE_work/ITC_examples_xc6000 >
For specific information on the error Error While Loading Shared Libraries:
libcprts.so.6: cannot open shared object, see the following URL:
Running a Program
Ensure that the program was linked with the -static-libcxa flag when you use mpirun.mpich to launch a C or
Fortran program.
The following is a C example called vtjacobic:
# mpirun.mpich -np 2 ~/xc_PDE_work/ITC_examples_xc6000/vtjacobic
warning: this is a development version of HP-MPI for internal R&D use only
/nis.home/user_name/xc_PDE_work/ITC_examples_xc6000/vtjacobic: 100 iterations in
[0] Intel Trace Collector INFO: Writing tracefile vtjacobic.stf in
/nis.home/user_name/xc_PDE_work/ITC_examples_xc6000
mpirun exits with status: 0
The following is a Fortran example called vtjacobif:
# mpirun.mpich -np 2 ~/xc_PDE_work/ITC_examples_xc6000/vtjacobif
warning: this is a development version of HP-MPI for internal R&D use only
2 Difference is 0.390625000000000
4 Difference is 0.123413085937500
6 Difference is 6.341743469238281E-002
8 Difference is 3.945139702409506E-002
10 Difference is 2.718273504797253E-002
12 Difference is 1.992697400677912E-002
14 Difference is 1.520584760276139E-002
16 Difference is 1.192225932086809E-002
18 Difference is 9.527200632430437E-003
20 Difference is 7.718816241067778E-003
22 Difference is 6.318016878920021E-003
24 Difference is 5.211741863535576E-003
26 Difference is 4.324933536667125E-003
28 Difference is 3.605700997191797E-003
30 Difference is 3.016967266492488E-003
32 Difference is 2.531507385360385E-003
34 Difference is 2.128864474525351E-003
36 Difference is 1.793360334698915E-003
38 Difference is 1.512772311960036E-003
40 Difference is 1.277430422527046E-003
42 Difference is 1.079587281504879E-003
44 Difference is 9.129693228356968E-004
46 Difference is 7.724509294615510E-004
48 Difference is 6.538134083283944E-004
50 Difference is 5.535635456297787E-004
52 Difference is 4.687947140887371E-004
54 Difference is 3.970788908277634E-004
56 Difference is 3.363815146174385E-004
58 Difference is 2.849935053418584E-004
60 Difference is 2.414763917353628E-004
62 Difference is 2.046176064805586E-004
64 Difference is 1.733937801556939E-004
66 Difference is 1.469404085386082E-004
68 Difference is 1.245266549586746E-004
70 Difference is 1.055343296682637E-004
72 Difference is 8.944029434752290E-005
74 Difference is 7.580169395426893E-005
76 Difference is 6.424353519703476E-005
78 Difference is 5.444822123484475E-005
80 Difference is 4.614672291984789E-005
82 Difference is 3.911112299221254E-005
84 Difference is 3.314831465581266E-005
86 Difference is 2.809467246160129E-005
88 Difference is 2.381154327036583E-005
90 Difference is 2.018142964565221E-005
92 Difference is 1.710475838933507E-005
94 Difference is 1.449714388058985E-005
96 Difference is 1.228707004052045E-005
98 Difference is 1.041392661369357E-005
[0] Intel Trace Collector INFO: Writing tracefile vtjacobif.stf in
/nis.home/user_name/xc_PDE_work/ITC_examples_xc6000
mpirun exits with status: 0
Running a Program Across Nodes (Using LSF)
The following is an example that uses the LSF bsub command to run the program named
vtjacobic across four nodes:
# bsub -n4 -I mpirun.mpich -np 2 ./vtjacobic
The license file and the ITC directory need to be distributed across the nodes.
8.3 Visualizing Data – Intel Trace Analyzer and HP-MPI
The Intel Trace Analyzer is used in a straightforward manner on an HP XC system, as described
in its standard documentation. For more information, see the Intel Trace Analyzer Users Guide.
This document is located on the HP XC system at the following location:
9 Using SLURM
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource
management and job scheduling.
This chapter addresses the following topics:
•“Introduction to SLURM” (page 91)
•“SLURM Utilities” (page 91)
•“Launching Jobs with the srun Command” (page 91)
•“Monitoring Jobs with the squeue Command” (page 92)
•“Terminating Jobs with the scancel Command” (page 93)
•“Getting System Information with the sinfo Command” (page 93)
•“Job Accounting” (page 94)
•“Fault Tolerance” (page 94)
•“Security” (page 94)
9.1 Introduction to SLURM
SLURM is a reliable, efficient, open source, fault-tolerant job and compute resource manager
with features that make it suitable for large-scale, high performance computing environments.
SLURM can report on machine status and perform partition management, job management, and job
scheduling.
The SLURM Reference Manual is available on the HP XC Documentation CD-ROM and from the
following Web site:
http://www.llnl.gov/LCdocs/slurm/.
SLURM manpages are also available online on the HP XC system.
As a system resource manager, SLURM has the following key functions:
•Allocate exclusive and/or non-exclusive access to resources (compute nodes) to users for
some duration of time so they can perform work
•Provide a framework for starting, executing, and monitoring work (normally a parallel job)
on the set of allocated nodes
•Arbitrate conflicting requests for resources by managing a queue of pending work
“How LSF-HPC and SLURM Interact” describes the interaction between SLURM and LSF-HPC.
9.2 SLURM Utilities
You interact with SLURM through its command line utilities. The basic utilities are listed here:
•srun
•squeue
•scancel
•sinfo
•scontrol
For more information on any of these utilities, see the SLURM Reference Manual or the
corresponding manpage.
9.3 Launching Jobs with the srun Command
The srun command submits and controls jobs that run under SLURM management. The srun
command is used to submit interactive and batch jobs for execution, allocate resources, and
initiate job steps.
The srun command handles both serial and parallel jobs.
The srun command has a significant number of options to control the execution of your
application closely. However, you can use it for a simple launch of a serial program, as
Example 9-1 shows.
Example 9-1 Simple Launch of a Serial Program
$ srun hostname
n1
9.3.1 The srun Roles and Modes
The srun command submits jobs to run under SLURM management. The srun command can
perform many roles in launching and managing your job. The srun command operates in several
distinct usage modes to accommodate the roles it performs.
9.3.1.1 The srun Roles
The options of the srun command allow you to control a SLURM job by:
•Specifying the parallel environment for your job when you submit it, such as the number
of nodes to use, partition, distribution of processes among nodes, and maximum time.
•Controlling the behavior of your parallel job as it runs, such as by redirecting or labeling its
output, sending it signals, or specifying its reporting verbosity.
9.3.1.2 The srun Modes
The srun command has five distinct modes in which it can be used:
•Simple mode
•Batch mode
•Allocate mode
•Attach mode
•Batch (with LSF-HPC) mode
The SLURM Reference Manual describes the Simple, Batch, Allocate, and Attach modes.
You can submit a script to LSF-HPC that contains (simple) srun commands to execute parallel
jobs later. In this case, LSF-HPC takes the place of the srun -b option for indirect, across-machine
job-queue management.
9.3.2 Using the srun Command with HP-MPI
The srun command can be used as an option in an HP-MPI launch command. See
Chapter 5: Submitting Jobs for information about using srun with HP-MPI.
9.3.3 Using the srun Command with LSF-HPC
The srun command can be used in an LSF-HPC launch command. See Chapter 10:
Using LSF-HPC for information about using srun with LSF-HPC.
9.4 Monitoring Jobs with the squeue Command
The squeue command displays the queue of running and waiting jobs (or "job steps"), including
the JobID (used for scancel) and the nodes assigned to each running job. It has a wide variety
of filtering, sorting, and formatting options. By default, it reports the running jobs in priority
order and then the pending jobs in priority order.
Example 9-2 reports on job 12345 and job 12346:
Example 9-2 Displaying Queued Jobs by Their JobIDs
$ squeue --jobs 12345,12346
JOBID PARTITION NAME USER ST TIME_USED NODES NODELIST(REASON)
12345 debug job1 jody R 0:21 4 n[9-12]
12346 debug job2 jody PD 0:00 8
The squeue command can report on jobs in the job queue according to their state; possible states
are: pending, running, completing, completed, failed, timeout, and node_fail.
Example 9-3 uses the squeue command to report on failed jobs.
Example 9-3 Reporting on Failed Jobs in the Queue
$ squeue --state=FAILED
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
59 amt1 hostname root F 0:00 0
9.5 Terminating Jobs with the scancel Command
The scancel command cancels a pending or running job or job step. It can also be used to send
a specified signal to all processes on all nodes associated with a job. Only job owners or
administrators can cancel jobs.
Example 9-4 terminates job #415 and all its jobsteps.
Example 9-4 Terminating a Job by Its JobID
$ scancel 415
Example 9-5 cancels all pending jobs.
Example 9-5 Cancelling All Pending Jobs
$ scancel --state=PENDING
Example 9-6 sends the TERM signal to terminate jobsteps 421.2 and 421.3.
Example 9-6 Sending a Signal to a Job
$ scancel --signal=TERM 421.2 421.3
9.6 Getting System Information with the sinfo Command
The sinfo command reports the state of partitions and nodes managed by SLURM. It has a
wide variety of filtering, sorting, and formatting options. The sinfo command displays a
summary of available partition and node (not job) information, such as partition names,
nodes/partition, and cores/node.
Example 9-7 Using the sinfo Command (No Options)
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 3 down* n[0,5,8]
lsf up infinite 14 idle n[1-4,6-7,9-16]
The node STATE codes in these examples may be followed by an asterisk character (*); this
indicates that the reported node is not responding. See the sinfo(1) manpage for a complete listing
and description of STATE codes.
Example 9-8 Reporting Reasons for Downed, Drained, and Draining Nodes
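A typical way to produce such a report is the sinfo command with its -R option, which lists the reason recorded for each downed, drained, or draining node; a minimal sketch:
$ sinfo -R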
9.7 Job Accounting
HP XC System Software provides an extension to SLURM for job accounting. The sacct command
displays job accounting data in a variety of forms for your analysis. Job accounting data is stored
in a log file; the sacct command filters that log file to report on your jobs, jobsteps, status, and
errors. See your system administrator if job accounting is not configured on your system.
By default, only the superuser is allowed to access the job accounting data. To grant all system
users read access to this data, the superuser must change the permission of the jobacct.log
file, as follows:
# chmod a+r /hptc_cluster/slurm/job/jobacct.log
You can find detailed information on the sacct command and job accounting data in the sacct(1)
manpage.
9.8 Fault Tolerance
SLURM can handle a variety of failure modes without terminating workloads, including crashes
of the node running the SLURM controller. User jobs may be configured to continue execution
despite the failure of one or more nodes on which they are executing. The command controlling
a job may detach and reattach from the parallel tasks at any time. Nodes allocated to a job are
available for reuse as soon as the job(s) allocated to that node terminate. If some nodes fail to
complete job termination in a timely fashion because of hardware or software problems, only
the scheduling of those tardy nodes will be affected.
9.9 Security
SLURM has a simple security model:
•Any user of the system can submit jobs to execute. Any user can cancel his or her own jobs.
Any user can view SLURM configuration and state information.
•Only privileged users can modify the SLURM configuration, cancel other users' jobs, or
perform other restricted activities. Privileged users in SLURM include root users and
SlurmUser (as defined in the SLURM configuration file).
If permission to modify SLURM configuration is required by others, set-uid programs may
be used to grant specific permissions to specific users.
SLURM accomplishes security by means of communication authentication, job authentication,
and user authorization.
For further information about SLURM security features, see the SLURM documentation.
10 Using LSF-HPC
The Load Sharing Facility (LSF-HPC) from Platform Computing Corporation is a batch system
resource manager used on the HP XC system.
On an HP XC system, a job is submitted to LSF-HPC, which places the job in a queue and allows
it to run when the necessary resources become available. In addition to launching jobs, LSF-HPC
provides extensive job management and information capabilities. LSF-HPC schedules, launches,
controls, and tracks jobs that are submitted to it according to the policies established by the HP
XC site administrator.
Two types of LSF are available for installation on the HP XC:
•LSF-HPC
This product is the popular batch system produced by Platform Computing Corporation
that has become an industry standard. For full information about LSF-HPC, see the standard
LSF-HPC documentation set, which is described in the “Related Software Products and
Additional Publications” section of this manual. LSF-HPC manpages are also available
online on the HP XC system.
•LSF-HPC integrated with SLURM (LSF-HPC)
This product is the High Performance LSF product from Platform Computing Corporation
that has been integrated with SLURM to take advantage of SLURM's scalable, efficient
resource management and parallel job support.
The intended primary use of the HP XC system determined which of these LSF-HPC products
was installed.
This chapter introduces you to LSF-HPC integrated with SLURM in the HP XC environment. It
provides an overview of how LSF-HPC integrated with SLURM works, and discusses some of
the features and differences of LSF-HPC compared to LSF-HPC integrated with SLURM on an
HP XC system.
This chapter also contains an important discussion of how LSF-HPC and SLURM work together
to provide the HP XC job management environment. A description of SLURM is provided in
Chapter 9 (page 91).
This chapter addresses the following topics:
•“Information for LSF-HPC” (page 95)
•“Overview of LSF-HPC Integrated with SLURM” (page 96)
•“Differences Between LSF-HPC and LSF-HPC Integrated with SLURM” (page 98)
•“Job Terminology” (page 99)
•“Using LSF-HPC Integrated with SLURM in the HP XC Environment” (page 101)
•“Submitting Jobs” (page 101)
•“How LSF-HPC and SLURM Launch and Manage a Job” (page 102)
•“Determining the LSF Execution Host” (page 104)
•“ Determining Available System Resources” (page 104)
•“Getting Information About Jobs” (page 107)
•“Translating SLURM and LSF-HPC JOBIDs” (page 110)
•“Working Interactively Within an Allocation” (page 111)
•“LSF-HPC Equivalents of SLURM srun Options” (page 114)
10.1 Information for LSF-HPC
The information for LSF-HPC is provided in the LSF documentation. This documentation is on
the HP XC installation disk and manpages are online.
LSF-HPC is installed and configured on all nodes of the HP XC system by default. Nodes without
the compute role are closed with '0' job slots available for use.
The LSF environment is set up automatically for the user on login; LSF commands and their
manpages are readily accessible:
•The bhosts command is useful for viewing LSF batch host information.
•The bjobs command provides information on batch jobs.
10.2 Overview of LSF-HPC Integrated with SLURM
LSF-HPC was integrated with SLURM for the HP XC system to merge the scalable and efficient
resource management of SLURM with the extensive scheduling capabilities of LSF-HPC. In this
integration:
•SLURM manages the compute resources.
•LSF-HPC performs the job management.
SLURM extends the parallel capabilities of LSF-HPC with its own fast parallel launcher (which
is integrated with HP-MPI), full parallel I/O and signal support, and parallel job accounting
capabilities.
Managing the compute resources of the HP XC system with SLURM means that the LSF-HPC
daemons run only on one HP XC node and can present the HP XC system as a single LSF-HPC
host. As a result:
•All the nodes are configured as LSF-HPC Client Hosts; every node is able to access LSF-HPC.
You can submit jobs from any node in the HP XC system.
•The lshosts and bhosts commands only list one host that represents all the resources of
the HP XC system.
LSF-HPC integrated with SLURM obtains resource information about the HP XC system. This
information is consolidated and key information such as the total number of cores and the
maximum memory available on all nodes becomes the characteristics of the single HP XC “LSF
Execution Host”. Additional resource information from SLURM, such as pre-configured node
“features”, are noted and processed during scheduling through the external SLURM scheduler
for LSF-HPC.
Integrating LSF-HPC with SLURM on HP XC systems provides you with a parallel launch
command to distribute and manage parallel tasks efficiently. The SLURM srun command offers
much flexibility for requesting resources across an HP XC system; for example, you can:
•Request contiguous nodes
•Execute only one task per node
•Request nodes with specific features
This flexibility is preserved in LSF-HPC through the external SLURM scheduler; this is discussed
in more detail in the section titled “LSF-SLURM External Scheduler” (page 102).
A SLURM partition named lsf is used to manage LSF-HPC jobs. Thus:
•You can view information about this partition with the sinfo command.
•The total number of cores listed by the lshosts and bhosts commands for that host should
be equal to the total number of cores assigned to the SLURM lsf partition.
When a job is submitted and the resources are available, LSF-HPC creates a properly-sized
SLURM allocation and adds several standard LSF environment variables to the environment in
which the job is to be run. The following two environment variables are also added:
SLURM_JOBID     This environment variable is created so that subsequent srun commands
                make use of the SLURM allocation created by LSF-HPC for the job. This
                variable can be used by a job script to query information about the SLURM
                allocation, as shown here:
                $ squeue --jobs $SLURM_JOBID
                “Translating SLURM and LSF-HPC JOBIDs” describes the relationship
                between the SLURM_JOBID and the LSF-HPC JOBID.
SLURM_NPROCS    This environment variable passes along the total number of tasks requested
                with the bsub -n command to all subsequent srun commands. User scripts
                can override this value with the srun -n command, but the new value must
                be less than or equal to the original number of requested tasks.
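As a hedged illustration of this override, a batch script submitted with bsub -n8 might launch one step on the full request and a second, smaller step; the program names are hypothetical:
#!/bin/sh
# First step uses the full task count passed down from bsub -n
srun -n $SLURM_NPROCS ./full_step
# Second step overrides with a smaller count (must not exceed SLURM_NPROCS)
srun -n2 ./small_step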
LSF-HPC regards the entire HP XC system as a “SLURM machine.” LSF-HPC gathers resource
information from SLURM and creates SLURM allocations for each job. As a consequence, every
LSF job has a corresponding SLURM JOBID.
For a parallel job, LSF-HPC allocates multiple nodes for the job, but LSF-HPC always runs the
batch script (or user command) on the first node. The batch script or the user command must
start its tasks in parallel. The srun command is the SLURM “parallel launcher” command.
HP-MPI uses the srun command through the mpirun -srun option.
Example 10-1 Examples of LSF-HPC Job Launch
The following individual examples are run on a 4-node cluster with 2 cores per node:
[lsfadmin@n16 ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf up infinite 4 idle n[13-16]
This command line requests 4 cores, but runs the hostname command on the first node:
[lsfadmin@xc19n16 ~]$ bsub -n4 -I hostname
Job <110> is submitted to default queue <interactive>.
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
n13
The following command line requests 4 cores and uses the srun command to run the hostname
command on all four cores:
[lsfadmin@n16 ~]$ bsub -n4 -I srun hostname
Job <111> is submitted to default queue <interactive>.
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
n13
n13
n14
n14
The following command line requests 4 cores across all 4 nodes and runs the hostname command
on each node:
[lsfadmin@n16 ~]$ bsub -n4 -I -ext "SLURM[nodes=4]" srun hostname
Job <112> is submitted to default queue <interactive>.
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
n13
n14
n15
n16
It is possible to set your SSH keys to avoid password prompting so that you can use SSH-based
parallel launchers like the pdsh and mpirun commands. Use the LSB_HOSTS environment
variable to pass the list of allocated nodes to the launcher.
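A minimal sketch of setting up password-less SSH for your account, assuming the default key
locations and a home directory that is shared across the HP XC nodes, is:
$ ssh-keygen -t rsa        # press Enter at the prompts to accept the defaults and an empty passphrase
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys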
Example 10-2 Examples of Launching LSF-HPC Jobs Without the srun Command
The following bsub command line invokes the bash shell to run the hostname command with
the pdsh command:
[lsfadmin@n16 ~]$ bsub -n4 -I -ext "SLURM[nodes=4]" /bin/bash -c 'pdsh -w "$LSB_HOSTS" hostname'
Job <118> is submitted to default queue <interactive>.
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
n15: n15
n14: n14
n16: n16
n13: n13
The following command line uses the mpirun command to launch the hello_world example
program on one core on each of four nodes:
[lsfadmin@n16 ~]$ bsub -n4 -I -ext "SLURM[nodes=4]" mpirun -lsb_hosts ./hello_world
Job <119> is submitted to default queue <interactive>.
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
Hello world! I'm 0 of 4 on n13
Hello world! I'm 1 of 4 on n14
Hello world! I'm 2 of 4 on n15
Hello world! I'm 3 of 4 on n16
The differences described in HP XC System Software documentation take precedence over
descriptions in the LSF documentation from Platform Computing Corporation. See “Differences
Between LSF-HPC and LSF-HPC Integrated with SLURM” and the lsf_diff(1) manpage for more
information on the subtle differences between LSF-HPC and LSF-HPC integrated with SLURM.
10.3 Differences Between LSF-HPC and LSF-HPC Integrated with SLURM
LSF-HPC integrated with SLURM for the HP XC environment supports all the standard features
and functions that LSF-HPC supports, except for those items described in this section, in “Using
LSF-HPC Integrated with SLURM in the HP XC Environment”, and in the HP XC release notes
for LSF-HPC.
•By LSF-HPC standards, the HP XC system is a single host. Therefore, all LSF-HPC “per-host”
configuration and “per-host” options apply to the entire HP XC system.
LSF-HPC integrated with SLURM knows only about the HP XC compute nodes through
SLURM, so any preference or resource request that is intended for the HP XC compute nodes
must go through LSF-HPC's external SLURM scheduler. See the “LSF-SLURM External
Scheduler” (page 102) for more details.
•LSF-HPC requires LSF daemons on every node. These daemons allow LSF-HPC to extract
detailed information from each node, which you can display or use for scheduling. This
information includes CPU load, number of users, free memory, and so on.
When LSF-HPC is integrated with SLURM, it runs daemons only on one node in the HP XC
system. Therefore, it relies on SLURM for static resource information (that is, number of
CPUs, total physical memory, and any assigned SLURM “features”), and bases its scheduling
on that static information.
•LSF-HPC integrated with SLURM does not collect the following information from each node
in the HP XC system:
— maxswap
— ndisks
— r15s
— r1m
— r15m
— ut
— pg
— io
— tmp
— swp
— mem load
The lshosts and lsload commands display “-” for each of these items.
•LSF-HPC integrated with SLURM only runs daemons on one node within the HP XC system.
This node hosts an HP XC LSF Alias, which is an IP address and corresponding host name
specifically established for LSF-HPC integrated with SLURM on HP XC to use. The HP XC
system is known by this HP XC LSF Alias within LSF.
Various LSF-HPC commands, such as lsid, lshosts, and bhosts, display the HP XC LSF
Alias in their output. The default value of the HP XC LSF Alias, lsfhost.localdomain,
is shown in the following examples:
$ lsid
Platform LSF HPC version number for SLURM, date stamp
Copyright 1992-2005 Platform Computing Corporation
My cluster name is hptclsf
My master name is lsfhost.localdomain
$ lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
lsfhost.loc SLINUX6 Opteron8 60.0 8 2007M - Yes (slurm)
$ bhosts
HOST_NAME STATUS JL/U MAX NJOBS RUN SSUSP USUSP RSV
lsfhost.localdomai ok - 8 0 0 0 0 0
•All HP XC nodes are dynamically configured as “LSF Floating Client Hosts” so that you
can execute LSF-HPC commands from any HP XC node. When you execute an LSF-HPC
command from an HP XC node, an entry in the output of the lshosts command acknowledges
that the node is licensed to run LSF-HPC commands.
In the following example, node n15 is configured as an LSF Client Host, not the LSF execution
host. This is shown in the output when the lshosts command is run on that node:
the values for type and model are UNKNOWN, and the value for server is No.
$ lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
lsfhost.loc SLINUX6 Opteron8 60.0 8 2007M - Yes (slurm)
$ ssh n15 lshosts
HOST_NAME type model cpuf ncpus maxmem maxswp server RESOURCES
lsfhost.loc SLINUX6 Opteron8 60.0 8 2007M - Yes (slurm)
n15 UNKNOWN UNKNOWN_ 1.0 - - - No ()
•The job-level run-time limits enforced by LSF-HPC are not supported when LSF-HPC is
integrated with SLURM.
•LSF-HPC integrated with SLURM does not support parallel or SLURM-based interactive
jobs in PTY mode (bsub -Is and bsub -Ip). However, after LSF-HPC dispatches a user
job on the HP XC system, you can use the srun or ssh command to access the job resources
directly, as sketched after this list. For more information, see “Working Interactively Within an Allocation”.
•LSF-HPC integrated with SLURM does not support user-account mapping and
system-account mapping.
•LSF-HPC integrated with SLURM does not support chunk jobs. If a job is submitted to a
chunk queue, the job pends.
•LSF-HPC integrated with SLURM does not support topology-aware advanced reservation
scheduling.
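As a sketch of the interactive-access workaround mentioned in the PTY item above (the SLURM
JOBID value 150 and the node name n13 are hypothetical, and the srun --jobid option is assumed
to be available in your SLURM version), you might do the following after LSF-HPC dispatches
a job:
$ bsub -n4 sleep 300         # LSF-HPC dispatches the job and creates a SLURM allocation
$ squeue                     # note the SLURM JOBID of the allocation, for example 150
$ srun --jobid=150 hostname  # run a command on the nodes in that allocation
$ ssh n13                    # or log in directly to one of the allocated nodes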
10.4 Job Terminology
The following terms are used to describe jobs submitted to LSF-HPC integrated with SLURM:
Batch job
A job submitted to LSF-HPC or SLURM that runs without any
I/O connection back to the terminal from which the job was
submitted. This job may run immediately, or it may run
sometime in the future, depending on resource availability and
batch system scheduling policies.
Batch job submissions typically provide instructions on I/O
management, such as files from which to read input and
filenames to collect output.
By default, LSF-HPC jobs are batch jobs. The output is e-mailed
to the user, which requires that e-mail be set up properly.
SLURM batch jobs are submitted with the srun -b command.
By default, the output is written to
$CWD/slurm-SLURMjobID.out from the node on which the
batch job was launched.
Use Ctrl-C at any time to terminate the job.
Interactive batch job
A job submitted to LSF-HPC or SLURM that maintains I/O
connections with the terminal from which the job was
submitted. The job is also subject to resource availability and
scheduling policies, so it may pause before starting. Once it is
running, the job's output is displayed on the terminal, and the
user can provide input if the job allows it.
By default, SLURM jobs are interactive. Interactive LSF-HPC
jobs are submitted with the bsub -I command.
Use Ctrl-C at any time to terminate the job.
Serial job
A job that requests only one slot and does not specify any of
the following constraints:
•mem
•tmp
•mincpus
•nodes
Serial jobs are allocated a single CPU on a shared node with
the minimal capacities that satisfy the other allocation criteria.
LSF-HPC always tries to run multiple serial jobs on the same
node, one CPU per job. Parallel jobs and serial jobs cannot run
on the same node.
Pseudo-parallel job
A job that requests only one slot but specifies any of these
constraints:
•mem
•tmp
•nodes=1
•mincpus > 1
Pseudo-parallel jobs are allocated one node for their exclusive
use.
NOTE: Do NOT rely on this feature to provide node-level
allocation for small jobs in job scripts. Use the SLURM[nodes]
specification instead, along with the mem, tmp, and mincpus
allocation options.
LSF-HPC considers this job type a parallel job because the
job requests explicit node resources. LSF-HPC does not monitor
these additional resources, so it cannot schedule any other jobs
to the node without risking resource contention. Therefore, the
entire node is allocated to the job for its exclusive use.
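As a sketch of the approach recommended in the NOTE above (the mincpus and mem keywords
inside SLURM[] and the values shown are assumptions for illustration only), such a job might
be submitted as follows:
$ bsub -n1 -ext "SLURM[nodes=1;mincpus=2;mem=1024]" ./my_small_job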