UNIX® is a registered trademark of The Open Group.
Linux® is a U.S. registered trademark of Linus Torvalds.
LSF, Platform Computing, and the LSF and Platform Computing logos are trademarks or registered trademarks of Platform Computing Corporation.
Intel®, the Intel logo, Itanium®, Xeon™, and Pentium® are trademarks or registered trademarks of Intel Corporation in the United States and other countries.
TotalView® is a registered trademark of Etnus, Inc.
Quadrics® is a registered trademark of Quadrics, Ltd.
Myrinet® and Myricom® are registered trademarks of Myricom, Inc.
Red Hat® is a registered trademark of Red Hat Inc.
Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.
The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
About This Document
This manual provides information about using the features and functions of the HP XC
System Software and describes how the HP XC user and programming environments differ
from standard Linux® system environments. In addition, this manual focuses on building
and running applications in the HP XC environment and is intended to guide an application
developer to take maximum advantage of HP XC features and functions by providing an
understanding of the underlying mechanisms of the HP XC programming environment.
An HP XC system is integrated with several open source software components. Some open
source software components are being used for underlying technology, and their deployment
is transparent. Some open source software components require HP XC-specific user-level
documentation, and that kind of information is included in this document, if required.
HP relies on the documentation provided by the open source developers to supply the
information you need to use their product. For links to open source software documentation for
products that are integrated with your XC system, see Supplementary Information.
Documentation for third-party hardware and software components that are supported on the HP
XC system is supplied by the third-party vendor. However, information about the operation
of third-party software is included in this document if the functionality of the third-party
component differs from standard behavior when used in the XC environment. In this case, HP
XC documentation supersedes information supplied by the third-party vendor. For links to
related third-party Web sites, see Supplementary Information.
Standard Linux® administrative tasks or the functions provided by standard Linux tools
and commands are documented in commercially available Linux reference manuals and on
various Web sites. For more information about obtaining documentation for standard Linux
administrative tasks and associated topics, see the list of Web sites and additional publications
provided in Related Information.
Intended Audience
This manual is intended for experienced Linux users who run applications developed by others,
and for experienced system or application developers who develop, build, and run application
code on an HP XC system.
This manual assumes that the user understands, and has experience with, multiprocessor systems
and the Message Passing Interface (MPI), and is familiar with HP XC architecture and concepts.
Document Organization
This document is organized as follows:
•Chapter 1 provides an overview of the HP XC user, programming, and run-time environment.
•Chapter 2 describes how to perform common user tasks on the HP XC system.
•Chapter 3 describes how to build and run applications on the HP XC system.
•Chapter 4 describes how to debug applications on the HP XC system.
•Chapter 5 describes how to better tune applications for the HP XC system.
•Chapter 6 describes how to use SLURM on the HP XC system.
•Chapter 7 describes how to use LSF® on the HP XC system.
•Chapter 8 describes how to use HP-MPI on the HP XC system.
•Chapter 9 describes how to use MLIB on the HP XC system.
•Appendix A provides examples of HP XC applications.
•The Glossary provides definitions of the terms used in this manual.
HP XC Information
The HP XC System Software Documentation Set includes the following core documents. All
XC documents, except the HP XC System Software Release Notes, are shipped on the XC
documentation CD. All XC documents, including the HP XC System Software Release Notes,
are available online at the following URL:

HP XC System Software Release Notes
Contains important, last-minute information about firmware, software, or hardware that might affect your system. This document is only available online.

HP XC Hardware Preparation Guide
Describes tasks specific to HP XC that are required to prepare each supported cluster platform for installation and configuration, including the specific placement of nodes in the switches.

HP XC System Software Installation Guide
Provides step-by-step instructions for installing the HP XC System Software on the head node and configuring the system.

HP XC System Software Administration Guide
Provides an overview of the HP XC system administration environment and describes cluster administration tasks, node maintenance tasks, LSF® administration tasks, and troubleshooting procedures.

HP XC System Software User's Guide
Provides an overview of managing the HP XC user environment with modules, managing jobs with LSF, and how to build, run, debug, and troubleshoot serial and parallel applications on an HP XC system.
The following documents are also provided by HP for use with your HP XC system:

Linux Administration Handbook
A third-party Linux reference manual, Linux Administration Handbook, is shipped with the
HP XC System Software Documentation Set. This manual was authored by Evi Nemeth, Garth
Snyder, Trent R. Hein, et al. (NJ: Prentice Hall, 2002).

QuickSpecs for HP XC System Software
Provides a product overview, hardware requirements, software requirements, software licensing
information, ordering information, and information about commercially available software that
has been qualified to interoperate with the HP XC System Software.
The QuickSpecs are located at the following URL:
The following URL provides pointers to tools that have been tested in the HP XC program
development environment (for example, TotalView® and other debuggers, compilers, and
so on):
HP Message Passing Interface
HP Message Passing Interface (MPI) is an implementation of the MPI standard for HP systems.
The home page is located at the following URL:
http://www.hp.com/go/mpi
HP Mathematical Library
The HP math libraries (MLIB) support application developers who are looking for ways to
speed up development of new applications and shorten the execution time of long-running
technical applications. The home page is located at the following URL:
http://www.hp.com/go/mlib
HP Cluster Platform Documents
The cluster platform documents describe site requirements, show you how to physically set up
the servers and additional devices, and provide procedures to operate and manage the hardware.
These documents are shipped with your hardware.
Documentation for the HP Integrity and HP ProLiant servers is available at the following URL:
http://www.docs.hp.com/
For More Information
The HP Web site has information on this product. You can access the HP Web site at the
following URL:
http://www.hp.com
Supplementary Information
This section contains links to third-party and open source components that are integrated
into the HP XC System Software core technology. In the XC documentation, except where
necessary, references to third-party and open source software components are generic, and the
XC adjective is not added to any reference to a third-party or open source command or product
name. For example, the SLURM srun command is simply referred to as the srun command.
The location of each Web site or link to a particular topic listed in this section is subject to
change without notice by the site provider.
•http://www.platform.com
Home page for Platform Computing, the developer of the Load Sharing Facility (LSF).
LSF, the batch system resource manager used on an XC system, is tightly integrated with
the HP XC and SLURM software.
For your convenience, the following Platform LSF documents are shipped on the HP
XC documentation CD in PDF format. The Platform LSF documents are also available
on the XC Web site.
-Administering Platform LSF
-Administration Primer
-Platform LSF Reference
-Quick Reference Card
-Running Jobs with Platform LSF
•http://www.llnl.gov/LCdocs/slurm/
Home page for the Simple Linux Utility for Resource Management (SLURM), which is
integrated with LSF to manage job and compute resources on an XC system.
•http://www.nagios.org/
Home page for Nagios®, a system and network monitoring application. Nagios watches
specified hosts and services and issues alerts when problems occur and when problems are
resolved. Nagios provides the monitoring capabilities on an XC system.
•http://supermon.sourceforge.net/
Home page for Supermon, a high-speed cluster monitoring system that emphasizes low
perturbation, high sampling rates, and an extensible data protocol and programming
interface. Supermon works in conjunction with Nagios to provide XC system monitoring.
•http://www.llnl.gov/linux/pdsh/
Home page for the parallel distributed shell (pdsh), which executes commands across XC
client nodes in parallel.
•http://www.systemimager.org
Home page for SystemImager®, which is the underlying technology that is used to install
the XC software, distribute the golden image, and distribute configuration changes.
•http://www.etnus.com
Home page for Etnus, Inc., maker of the TotalView parallel debugger.
•http://www.macrovision.com
Home page for Macrovision®, developer of the FLEXlm™ license management utility,
which is used for HP XC license management.
•http://sourceforge.net/projects/modules/
Home page for Modules, which provide for easy dynamic modification of a user's
environment through modulefiles, which typically instruct the module command to alter or
set shell environment variables.
•http://dev.mysql.com/
Home page for MySQL AB, developer of the MySQL database. This Web site contains a
link to the MySQL documentation, particularly the MySQL Reference Manual.
Manpages
Manpages provide online reference and command information from the command line.
Manpages are supplied with the HP XC system for standard HP XC components, Linux user
commands, LSF commands, and other software components that are distributed with the HP
XC system.
Manpages for third-party vendor software components may be provided as a part of the
deliverables for that component.
Using the discover(8) manpage as an example, you can use either of the following commands
to display a manpage:
$ man discover
$ man 8 discover
If you are not sure about a command you need to use, enter the man command with the -k
option to obtain a list of commands that are related to the keyword. For example:
# man -k keyword
Related Information
This section provides pointers to the Web sites for related software products and provides
references to useful third-party publications. The location of each Web site or link to a particular
topic is subject to change without notice by the site provider.
Related Linux Web Sites
•http://www.redhat.com
Home page for Red Hat®, distributors of Red Hat Enterprise Linux Advanced Server, a
Linux distribution with which the HP XC operating environment is compatible.
•http://www.linux.org/docs/index.html
Home page for the Linux Documentation Project (LDP). This Web site contains guides
covering various aspects of working with Linux, from creating your own Linux system from
scratch to bash script writing. This site also includes links to Linux HowTo documents,
frequently asked questions (FAQs), and manpages.
•http://www.linuxheadquarters.com
Web site providing documents and tutorials for the Linux user. Documents contain
instructions on installing and using applications for Linux, configuring hardware, and
a variety of other topics.
•http://linuxvirtualserver.org
Home page for the Linux Virtual Server (LVS), the load balancer running on the Linux
operating system that distributes login requests on the XC system.
•http://www.gnu.org
Home page for the GNU Project. This site provides online software and information for
many programs and utilities that are commonly used on GNU/Linux systems. Online
information includes guides for using the bash shell, emacs, make, cc, gdb, and more.
Related MPI Web Sites
•http://www.mpi-forum.org
Contains the official MPI standards documents, errata, and archives of the MPI Forum.
The MPI Forum is an open group with representatives from many organizations that define
and maintain the MPI standard.
•http://www-unix.mcs.anl.gov/mpi/
A comprehensive site containing general information, such as the specification and FAQs,
and pointers to a variety of other resources, including tutorials, implementations, and
other MPI-related sites.
Web site for general Intel software development information.
•http://www.pgroup.com/
Home page for The Portland Group™, supplier of the PGI® compiler.
Additional Publications
For more information about standard Linux system administration or other related software
topics, refer to the following documents, which must be purchased separately:
•Linux Administration Unleashed, by Thomas Schenk, et al.
•Managing NFS and NIS, by Hal Stern, Mike Eisler, and Ricardo Labiaga (O'Reilly)
•MySQL, by Paul DuBois
•MySQL Cookbook, by Paul DuBois
•High Performance MySQL, by Jeremy Zawodny and Derek J. Balling (O'Reilly)
•Perl Cookbook, Second Edition, by Tom Christiansen and Nathan Torkington
•Perl in a Nutshell: A Desktop Quick Reference, by Ellen Siever, et al.
Typographical Conventions
Italic font
Italic (slanted) font indicates the name of a variable that you can replace in a command example or information in a display that represents several possible values. Document titles are shown in italic font. For example: Linux Administration Handbook.

Courier font
Courier font represents text that is displayed by the computer. Courier font also represents literal items, such as command names, file names, routines, directory names, path names, signals, messages, and programming language structures.

Bold text
In command and interactive examples, bold text represents the literal text that you enter. For example:
# cd /opt/hptc/config/sbin
In text paragraphs, bold text indicates a new term or a term that is defined in the glossary.

$ and #
In command examples, a dollar sign ($) represents the system prompt for the bash shell and also shows that a user is in non-root mode. A pound sign (#) indicates that the user is in root or superuser mode.

[ ]
In command syntax and examples, brackets ([ ]) indicate that the contents are optional. If the contents are separated by a pipe character ( | ), you must choose one of the items.

{ }
In command syntax and examples, braces ({ }) indicate that the contents are required. If the contents are separated by a pipe character ( | ), you must choose one of the items.

...
In command syntax and examples, horizontal ellipsis points ( ... ) indicate that the preceding element can be repeated as many times as necessary.

.
.
.
In programming examples, screen displays, and command output, vertical ellipsis points indicate an omission of information that does not alter the meaning or affect the user if it is not shown.

|
In command syntax and examples, a pipe character ( | ) separates items in a list of choices.

discover(8)
A cross-reference to a manpage includes the appropriate section number in parentheses. For example, discover(8) indicates that you can find information on the discover command in Section 8 of the manpages.

Ctrl/x
In interactive command examples, this symbol indicates that you hold down the first named key while pressing the key or button that follows the slash ( / ). When it occurs in the body of text, the action of pressing two or more keys is shown without the box. For example: Press Ctrl/x to exit the application.

Enter
The name of a keyboard key. Enter and Return both refer to the same key.

Note
A note calls attention to information that is important to understand before continuing.

Caution
A caution calls attention to important information that if not understood or followed will result in data loss, data corruption, or a system malfunction.

Warning
A warning calls attention to important information that if not understood or followed will result in personal injury or nonrecoverable system problems.
HP Encourages Your Comments
HP welcomes your comments on this document. Please provide your comments and suggestions
at the following URL:
http://docs.hp.com/en/feedback.html
1 Overview of the User Environment
The HP XC system is a collection of computer nodes, networks, storage, and software built into
a cluster that work together to present a single system. It is designed to maximize workload
and I/O performance, and provide efficient management of large, complex, and dynamic
workloads. The HP XC system provides a set of integrated and supported user features, tools,
and components, which are described in this chapter.
This chapter briefly describes the components of the HP XC environment. The following
topics are covered in this chapter:
•System architecture (Section 1.1)
•User environment (Section 1.2)
•Application development environment (Section 1.3)
1.1 System Architecture
The HP XC architecture is designed as a clustered system with single system traits. From a
user perspective, this architecture achieves a single system view, providing capabilities such as
single user login, a single file system namespace, an integrated view of system resources, an
integrated program development environment, and an integrated job submission environment.
1.1.1 Operating System
The HP XC system is a high-performance compute cluster that runs HP XC Linux for High
Performance Computing Version 1.0 (HPC Linux) as its software base. Any applications that
run correctly using Red Hat Enterprise Linux Advanced Server Version 3.0 will also run
correctly using HPC Linux.
1.1.2 Node Specialization
The HP XC system is implemented as a sea-of-nodes. Each node in the system contains the
same software image on its local disk. There are two physical types of nodes in the system:
a head node and client nodes.

head node
The node that is installed with the HP XC system software first; it is used to generate other
HP XC (client) nodes. The head node is generally of interest only to the administrator of
the HP XC system.

client nodes
All the other nodes that make up the system. They are replicated from the head node and
are usually given one or more specialized roles to perform various system functions, such
as logging into the system or running jobs.
The HP XC system allows for the specialization of client nodes to enable efficient and flexible
distribution of the workload. Nodes can be assigned one or more specialized roles that
determine how a particular node is used and what system services it provides. Of the many
different roles that can be assigned to a client node, the following roles contain services that are
of special interest to the general user:
login role
The role most visible to users is on nodes that have the login role. Nodes with the login role
are where you log in and interact with the system to perform various tasks. For example, once
logged in to a node with the login role, you can execute commands, build applications, or
submit jobs to compute nodes for execution. There can be one or several nodes with the login
role in an HP XC system, depending upon cluster size and requirements. Nodes with the login
role are a part of the Linux Virtual Server ring, which distributes login requests from users. A
node with the login role is referred to as a login node in this manual.

compute role
The compute role is assigned to nodes where jobs are to be distributed and run. Although all
nodes in the HP XC system are capable of carrying out computations, the nodes with the
compute role are the primary nodes used to run jobs. Nodes with the compute role become a
part of the resource pool used by LSF-HPC and SLURM, which manage and distribute the job
workload. Jobs that are submitted to compute nodes must be launched from nodes with the
login role. Nodes with the compute role are referred to as compute nodes in this manual.

1.1.3 Storage and I/O
The HP XC system supports both shared (global) and private (local) disks and file systems.
Shared file systems can be mounted on all the other nodes by means of Lustre or NFS. This
gives users a single view of all the shared data on disks attached to the HP XC system.
SAN Storage
HP XC uses the HP StorageWorks Scalable File Share (HP StorageWorks SFS), which is based
on Lustre technology and uses the Lustre File System from Cluster File Systems, Inc. This is a
turnkey Lustre system that is delivered and supported by HP. It supplies access to Lustre file
systems through Lustre client-server protocols over various system interconnects. The HP XC
system is a client to the HP StorageWorks SFS server.
Local Storage
Local storage for each node holds the operating system, a copy of the HP XC system software,
and temporary space that can be used by jobs running on the node.
HP XC file systems are described in detail in Section 1.1.4.
1.1.4 File System
Each node of the HP XC system has its own local copy of all the HP XC System Software files
including the Linux distribution and also has its own local user files. Every node may also import
files from NFS or Lustre file servers. HP XC System Software supports NFS 3 including both
client and server functionality. HP XC System Software also enables Lustre client services for
high-performance and high-availability file I/O. These Lustre client services require the separate
installation of Lustre software, provided with the HP StorageWorks Scalable File Share (SFS).
In the case of NFS files, these can be shared purely between the nodes of the HP XC system,
or alternatively can be shared between the HP XC and external systems. External NFS files
can be shared with any node having a direct external network connection. It is also possible to
set up NFS to import external files to HP XC nodes without external network connections, by
routing through a node with an external network connection. Your system administrator can
choose to use either the HP XC administrative network or the HP XC system interconnect for
NFS operations. The HP XC system interconnect can potentially offer higher performance, but
only at the potential expense of the performance of application communications.
For high-performance or high-availability file I/O, the Lustre file system is available on HP
XC. The Lustre file system uses POSIX-compliant syntax and semantics. The HP XC System
Software includes the kernel modifications required for Lustre client services, which enables the
operation of the separately installable Lustre client software. The Lustre file server product used
on HP XC is the HP StorageWorks Scalable File Share (SFS), which fully supports the HP XC.
The SFS includes HP XC Lustre client software. The SFS can be integrated with the HP XC so
that Lustre I/O is performed over the same high-speed system interconnect fabric used by the
HP XC. So, for example, if the HP XC system interconnect is based on a Quadrics QsNet II
switch, then the SFS will serve files over ports on that switch. The file operations are able to
proceed at the full bandwidth of the HP XC system interconnect because these operations are
implemented directly over the low-level communications libraries. Further optimizations of file
I/O can be achieved at the application level using special file system commands, implemented
as ioctls, which allow a program to interrogate the attributes of the file system, modify the
stripe size and other attributes of new (zero-length) files, and so on. Some of these optimizations
are implicit in the HP-MPI I/O library, which implements the MPI-2 file I/O standard.
File System Layout
In an HP XC system, the basic file system layout is the same as that of the Red Hat Advanced
Server 3.0 Linux file system.
The HP XC file system is structured to separate cluster-specific files, base operating system
files, and user-installed software files. This allows for flexibility and ease of potential upgrades
of the system software, as well as keeping system software from conflicting with user-installed software.
Files are segregated into the following types and locations:
•HP XC-specific software is located in /opt/hptc
•HP XC configuration data is located in /opt/hptc/etc
•Clusterwide directory structure (file system) is located in /hptc_cluster
You should be aware of the following information about the HP XC file system layout:
•Open source software that by default would be installed under the /usr/local directory
is instead installed in the /opt/hptc directory.
•Software installed in the /opt/hptc directory is not intended to be updated by users.
•Software packages are installed in directories under the /opt/hptc directory under their
own names. The exception to this is third-party software, which usually goes in /opt/r.
•There are four directories under the /opt/hptc directory that contain symbolic links
to files included in the packages (see the sketch after this list):
-/opt/hptc/bin
-/opt/hptc/sbin
-/opt/hptc/lib
-/opt/hptc/man
Each package directory should have a directory corresponding to each of these directories,
where every file has a symbolic link created in the /opt/hptc/ directory.
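As a minimal sketch of this linking convention, consider a hypothetical package named mypkg installed under /opt/hptc/mypkg; the package and tool names here are illustrative only, not actual HP XC packages:

$ ls -l /opt/hptc/bin/mypkg_tool
lrwxrwxrwx ... /opt/hptc/bin/mypkg_tool -> /opt/hptc/mypkg/bin/mypkg_tool

This arrangement keeps each package self-contained in its own directory while still placing its commands, libraries, and manpages on the standard search paths.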
1.1.5 System Interconnect Network
The HP XC system interconnect provides high-speed connectivity for parallel applications. The
system interconnect network provides a high-speed communications path used primarily for
user file service and for communications within user applications that are distributed among
nodes of the system. The system interconnect network is a private network within the HP XC.
Typically, every node in the HP XC is connected to the system interconnect.
The HP XC system interconnect can be based on Gigabit Ethernet, Myrinet, Quadrics QsNet II,
or InfiniBand switches, depending on the cluster platform. The types of system interconnects
that are used on HP XC systems are:
•Myricom Myrinet on HP Cluster Platform 4000 (ProLiant/Opteron servers), also referred to
as XC4000 in this manual.
•Quadrics QsNet II on HP Cluster Platform 6000 (Integrity servers), also referred to as
XC6000 in this manual.
•Gigabit Ethernet on both XC4000 and XC6000
•InfiniBand on XC4000
1.1.6 Network Address Translation (NAT)
The HP XC system uses Network Address Translation (NAT) to allow nodes in the HP XC
system that do not have direct external network connections to open outbound network
connections to external network resources.
1.2 User Environment
This section introduces some basic general information about logging in, configuring, and using
the HP XC environment.
1.2.1 LVS
The HP XC system uses the Linux Virtual Server (LVS) to present a single host name for user
logins. LVS is a highly scalable virtual server built on a system of real servers. By using LVS,
the architecture of the HP XC system is transparent to end users, and they see only a single
virtual server. This eliminates the need for users to know how the system is configured in
order to successfully log in and use the system. Any changes in the system configuration are
transparent to end users. LVS also provides load balancing across login nodes, which distributes
login requests to different servers.
1.2.2 Modules
The HP XC system provides the Modules Package (not to be confused with Linux kernel
modules) to configure and modify the user environment. The Modules Package enables
dynamic modification of a user's environment by means of modulefiles. Modulefiles provide
a convenient means for users to tailor their working environment as necessary. One of the
key features of modules is to allow multiple versions of the same software to be used in
a controlled manner.
A modulefile contains information to configure the shell for an application. Typically, a
modulefile contains instructions that alter or set shell environment variables, such as PATH
and MANPATH, to enable access to various installed software. Modulefiles may be shared by
many users on a system, and users may have their own collection to supplement or replace the
shared modulefiles.
Modulefiles can be loaded into your environment automatically when you log in to the
system, or any time you need to alter the environment. The HP XC system does not preload
modulefiles.
1.2.3 Commands
The HP XC user environment includes standard Linux commands, LSF commands, SLURM
commands, HP-MPI commands, and modules commands. This section provides a brief
overview of these command sets.
1.2.3.1 Linux Commands
The HP XC system supports the use of standard Linux user commands and tools. Standard
Linux commands are not described in this document. You can access descriptions of Linux
commands in Linux documentation and manpages. Linux manpages are available by invoking
the Linux man command with the Linux command name.
1.2.3.2 LSF Commands
HP XC supports LSF-HPC and the use of standard LSF commands, some of which operate
differently in the HP XC environment from standard LSF behavior. The use of LSF commands
in the HP XC environment is described in Chapter 7, and in the HP XC lsf_diff
manpage. Information about standard LSF commands is available in Platform Computing
Corporation LSF documentation, and in the LSF manpages. For your convenience, the HP
XC documentation CD contains LSF manuals from Platform Computing. LSF manpages
are available on the HP XC system.
1.2.3.3 SLURM Commands
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource
management and job scheduling, and supports the use of standard SLURM commands. SLURM
functionality is described in Chapter 6. Descriptions of SLURM commands are available in the
SLURM manpages by invoking the man command with the SLURM command name.
1.2.3.4 HP-MPI Commands
HP XC supports the HP-MPI software and the use of standard HP-MPI commands. Descriptions
of HP-MPI commands are available in the HP-MPI documentation, which is supplied with the
HP XC system software. HP-MPI manpages are also available by invoking the man command
with the HP-MPI command name. HP-MPI functionality is described in Chapter 8.
1.2.3.5 Modules Commands
The HP XC system supports the use of standard Modules commands to load and unload
modulefiles that are used to configure and modify the user environment. Modules commands
are described in Section 2.2.
1.3 Application Development Environment
The HP XC system provides an environment that enables developing, building, and running
applications using multiple nodes with multiple processors. These applications can range from
parallel applications using many processors to serial applications using a single processor.
1.3.1 Parallel Applications
The HP XC parallel application development environment allows parallel application processes
to be started and stopped together on a large number of application processors, along with the
I/O and process control structures to manage these kinds of applications.
Full details and examples of how to build, run, debug, and troubleshoot parallel applications
are provided in Section 3.7.
1.3.2 Serial Applications
The HP XC serial application development environment supports building and running serial
applications. A serial application is a command or application that does not use any form
of parallelism.
Full details and examples of how to build, run, debug, and troubleshoot serial applications are
provided in Section 3.6.2.
1.4 Run-Time Environment
In the HP XC environment, LSF-HPC, SLURM, and HP-MPI work together to provide a
powerful, flexible, extensive run-time environment. This section describes LSF-HPC, SLURM,
and HP-MPI, and how these components work together to provide the HP XC run-time
environment.
1.4.1 SLURM
SLURM (Simple Linux Utility for Resource Management) is a resource management system
that is integrated into the HP XC system. SLURM is suitable for use on large and small
Linux clusters. It was developed by Lawrence Livermore National Laboratory and Linux NetworX.
As a resource manager, SLURM allocates exclusive or non-exclusive access to resources
(application/compute nodes) for users to perform work, and provides a framework to start,
execute, and monitor work (normally a parallel job) on the set of allocated nodes. A SLURM
system consists of two daemons, one configuration file, and a set of commands and APIs. The
central controller daemon, slurmctld, maintains the global state and directs operations. A
slurmd daemon is deployed to each computing node and responds to job-related requests,
such as launching jobs, signalling, and terminating jobs. End users and system software (such
as LSF-HPC) communicate with SLURM by means of commands or APIs; for example,
allocating resources, launching parallel jobs on allocated resources, and killing running jobs.
SLURM groups compute nodes (the nodes where jobs are run) together into partitions. The
HP XC system can have one or several partitions. When HP XC is installed, a single partition
of compute nodes is created by default for LSF batch jobs. The system administrator has the
option of creating additional partitions. For example, another partition could be created for
interactive jobs.
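As a brief illustration of the SLURM command interface, the following sketch shows srun launching work on allocated compute nodes. The node and task counts are arbitrary, and my_app is a placeholder for a real program:

$ srun -N4 hostname
$ srun -n8 ./my_app

The first command runs hostname on four nodes; the second launches eight tasks of the hypothetical my_app program. SLURM commands are covered in detail in Chapter 6.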
1.4.2 Load Sharing Facility (LSF-HPC)
The Load Sharing Facility for High Performance Computing (LSF-HPC) from Platform
Computing Corporation is a batch system resource manager that has been integrated with
SLURM for use on the HP XC system. LSF-HPC for SLURM is included with the HP XC
System Software, and is an integral part of the HP XC environment. LSF-HPC interacts with
SLURM to obtain and allocate available resources, and to launch and control all the jobs
submitted to LSF-HPC. LSF-HPC accepts, queues, schedules, dispatches, and controls all the
batch jobs that users submit, according to policies and configurations established by the HP
XC site administrator. On an HP XC system, LSF-HPC for SLURM is installed and runs on
one HP XC node, known as the LSF-HPC execution host.
A complete description of LSF-HPC is provided in Chapter 7. In addition, for your convenience,
the HP XC documentation CD contains LSF Version 6.0 manuals from Platform Computing.
1.4.3 How LSF-HPC and SLURM Interact
In the HP XC environment, LSF-HPC cooperates with SLURM to combine LSF-HPC's
powerful scheduling functionality with SLURM's scalable parallel job launching capabilities.
LSF-HPC acts primarily as a workload scheduler on top of the SLURM system, providing
policy and topology-based scheduling for end users. SLURM provides an execution and
monitoring layer for LSF-HPC. LSF-HPC uses SLURM to detect system topology information,
make scheduling decisions, and launch jobs on allocated resources.
When a job is submitted to LSF-HPC, LSF-HPC schedules the job based on job resource
requirements and communicates with SLURM to allocate the required HP XC compute nodes
for the job from the SLURM lsf partition. LSF-HPC provides node-level scheduling for
parallel jobs, and CPU-level scheduling for serial jobs. Because of node-level scheduling, a
parallel job may be allocated more CPUs than it requested, depending on its resource request;
the srun or mpirun -srun launch commands within the job still honor the original CPU
request. LSF-HPC always tries to pack multiple serial jobs on the same node, with one CPU per
job. Parallel jobs and serial jobs cannot coexist on the same node.
After the LSF-HPC scheduler allocates the SLURM resources for a job, the SLURM allocation
information is recorded with the job. You can view this information with the bjobs and
bhist commands.
When LSF-HPC starts a job, it sets the SLURM_JOBID and SLURM_NPROCS environment
variables in the job environment. SLURM_JOBID associates the LSF-HPC job with SLURM's
allocated resources. The SLURM_NPROCS environment variable is set to the originally
requested number of processors. LSF-HPC dispatches the job from the LSF-HPC execution
host, which is the same node on which LSF-HPC daemons run. The LSF-HPC JOB_STARTER
script, which is configured for all queues, uses the srun command to launch a user job on the
first node in the allocation. Your job can contain additional srun or mpirun commands to
launch tasks to all nodes in the allocation.
While a job is running, all LSF-HPC-supported resource limits are enforced, including core
limit, cputime limit, data limit, file size limit, memory limit, and stack limit. When you kill a
job, LSF-HPC uses the SLURM scancel command to propagate the signal to the entire job.
After a job finishes, LSF-HPC releases all allocated resources.
A detailed description, along with an example and illustration, of how LSF-HPC and SLURM
cooperate to launch and manage jobs is provided in Section 7.1.4. It is highly recommended
that you review this information.
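As a sketch of this interaction from the user's side, the commands below submit a parallel job through LSF-HPC and then inspect the job, including the SLURM allocation recorded for it. The script name and job ID are hypothetical:

$ bsub -n8 ./myjob.sh
$ bjobs -l 123

Here bsub -n requests the number of processors and bjobs -l displays detailed job information for the given job ID, as described in Chapter 7.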
1.4.4 HP-MPI
HP-MPI is a high-performance implementation of the Message Passing Interface (MPI)
standard and is included with the HP XC system. HP-MPI uses SLURM to launch jobs on the
HP XC system; however, it manages the global MPI exchange so that all processes can
communicate with each other.
HP-MPI complies fully with the MPI-1.2 standard. HP-MPI also complies with the MPI-2
standard, with some restrictions. HP-MPI provides an application programming interface and
software libraries to support parallel, message-passing applications that are efficient, portable,
and flexible. HP-MPI version 2.1 is included in this release of HP XC.
HP-MPI 2.1 for HP XC is supported on XC4000 and XC6000 clusters, and includes support for
the system interconnects described in Section 1.1.5.
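As a brief illustration, an MPI application is typically launched under LSF-HPC with the mpirun -srun option mentioned in Section 1.4.3. The executable name and processor count here are placeholders:

$ bsub -n4 -I mpirun -srun ./hello_mpi

The -I option runs the job interactively; mpirun -srun lets SLURM launch the MPI ranks across the allocated nodes. HP-MPI usage is described in detail in Chapter 8.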
1.5 Components, Tools, Compilers, Libraries, and Debuggers
This section provides a brief overview of some of the common tools, compilers, libraries,
and debuggers supported for use on HP XC.
An HP XC system is integrated with several open source software components. HP XC
incorporates the Linux operating system, and its standard commands and tools, and does not
diminish the Linux ABI in any way. In addition, HP XC incorporates LSF and SLURM to
launch and manage jobs, and includes HP-MPI for high-performance, parallel, message-passing
applications, and the HP MLIB math library for intensive computations.
Most standard open source compilers and tools can be used on an HP XC system; however,
they must be purchased separately. Several open source and commercially available software
packages have been tested with the HP XC Software. The following list shows some of the
software packages that have been tested for use with HP XC. This list provides an example
of what is available on HP XC and is not intended as a complete list. Note that some of the
packages listed are actually included as part of the HPC Linux distribution and as such are
supported as part of the HP XC. The tested software packages include, but are not limited to,
the following:
•Intel Fortran 95, C, C++ Compiler Version 7.1 and 8.0, including OpenMP, for Itanium
(includes ldb debugger)
•gcc version 3.2.3 (included in the HP XC distribution)
•g77 version 3.2.3 (included in the HP XC distribution)
•Portland Group PGI Fortran90, C, C++ Version 5.1, including OpenMP, for XC4000
•Quadrics SHMEM, as part of QsNet II user libraries, on Itanium systems connected with
the Quadrics QsNet II switch (included in the HP XC distribution)
•Etnus TotalView debugger Version 6.4
•gdb (part of the HP XC Linux distribution)
•Intel MKL V6.0 on Itanium
•AMD Math Core Library Version 2.0 on XC4000
•valgrind 2.0.0 (http://valgrind.kde.org) in 32-bit mode only
•oprofile 0.7.1 (http://oprofile.sourceforge.net)
•PAPI 3.2 (http://icl.cs.utk.edu/papi)
•Intel Visual Analyzer/Tracer (formerly Pallas Vampir and Vampirtrace performance
analyzer) on Itanium
•GNU make, including distributed parallel make (included in the HP XC distribution)
Other standard tools and libraries are available and can most likely be used on HP XC as they
would on any other standard Linux system. It should be noted, however, that software that is
not described in HP XC documentation may not have been tested with HP XC and may not
function in a standard manner.
2 Using the System
This chapter describes tasks and commands that the general user must know to use the system.
It contains the following topics:
•Logging in to the system (Section 2.1)
•Setting up the user environment (Section 2.2)
•Launching and managing jobs (Section 2.3)
•Performing some common user tasks (Section 2.4)
•Getting help (Section 2.5)
2.1 Logging in to the System
Logging in to an HP XC sy stem is similar to logging in to any standard Linux system. Logins
are performed on nodes that have the login role. Secure Shell (ssh) is the preferred method
for accessing the HP XC system.
2.1.1 LVS Login Routing
The HP XC system uses the Linux Virtual Server (LVS) facility to present a set of login nodes
with a single cluster name. When you log in to the system, LVS automatically routes your login
request to an available login node on the system. LVS load balances login sessions across the
login nodes and improves the availability of login access. When you log in to the HP XC system,
you do not have to know specific node names to log in, only the HP XC system's cluster name.
2.1.2 Using ssh to Log In
To log in to an HP XC system, you must use Secure Shell (ssh). Typically, you access the HP
XC system using the ssh command to get a login shell or to execute commands. For example,
where xc is a placeholder for your system's cluster name:

$ ssh xc

The ssh service also allows file transfer using the scp or sftp commands over the same
port as ssh.
The typical r* UNIX commands, such as rlogin, rsh, and rcp, are not installed on an HP
XC system by default because of their inherent insecurity. The ssh command transfers all login
and password information in an encrypted form instead of the plaintext form used by the r*
UNIX commands (as well as telnet and ftp).
If you want to use ssh without password prompting, you must set up ssh authentication keys.
Refer to the ssh(1) manpage for information about using ssh authentication keys.
ssh is further discussed in Section 10.1.
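One common way to set up key-based authentication, assuming your home directory is shared across the login nodes, is sketched below; consult the ssh(1) and ssh-keygen(1) manpages for the authoritative procedure:

$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys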
2.2 Configuring Your Environment with Modulefiles
The HP XC system supports the use of Modules software to make it easier to configure and
modify your environment. Modules software enables dynamic modification of your
environment by the use of modulefiles. A modulefile contains information to configure the
shell for an application. Typically, a modulefile contains instructions that alter or set shell
environment variables, such as PATH and MANPATH, to enable access to various installed
software.
One of the key features of using modules is to allow multiple versions of the same software to
be used in your environment in a controlled manner. For example, two different versions of the
Intel C compiler can be installed on the system at the same time; the version used is based
upon which Intel C compiler modulefile is loaded.
The HP XC software provides a number of modulefiles. You can also create your own
modulefiles. Modulefiles may be shared by many users on a system, and users may have their
own collection of modulefiles to supplement or replace the shared modulefiles.
The following topics are addressed in the corresponding sections:
•Section 2.2.1 provides additional information on modulefiles.
•Section 2.2.2 discusses what modules are supplied.
•Section 2.2.3 discusses what modules are loaded by default.
•Section 2.2.4 discusses how to determine what modules are available.
•Section 2.2.5 discusses how to determine which modules are loaded.
•Section 2.2.6 discusses how to load a module.
•Section 2.2.7 discusses how to unload a module.
•Section 2.2.8 discusses module conflicts.
•Section 2.2.9 discusses creating your own module.
For further information about the Modules software supplied with the HP XC system, see
the Modules Web site at the following URL:
http://sourceforge.net/projects/modules/
2.2.1 Notes on Modulefiles
A modulefile does not provide configuration of your environment until it is explicitly loaded.
That is, the specific modulefile for a software product or application must be loaded in your
environment (with the module load command) before the configuration information in
the modulefile is effective.
You or your system administrator can configure your environment so that any desired
modulefiles are automatically loaded for you when you log in to the system. You can also load a
modulefile yourself, as described in Section 2.2.6.
The Modules software is initialized when you log in to the HP XC system. It provides access
to the commands that allow you to display information about modulefiles, load or unload
modulefiles, or view a list of available modulefiles.
Modulefiles do not affect packages other than their intended package. For example, a modulefile
for a compiler will not adjust MPI_CC (the environment variable used by HP MPI to control
which compiler to use). A modulefile for a compiler simply makes it easier to access that
particular compiler; it does not try to determine how the compiler will be used.
Similarly, a modulefile for HP MPI will not try to adjust LD_LIBRARY_PATH to correspond to
the compiler that the mpicc command uses. The modulefile for MPI simply makes it easier to
access the mpi** scripts and libraries. You can specify the compiler it uses through a variety of
mechanisms long after the modulefile is loaded.
The previous scenarios were chosen in particular because the HP MPI mpicc command
uses heuristics to try to find a suitable compiler when MPI_CC or other default-overriding
mechanisms are not in effect. It is possible that mpicc will choose a compiler inconsistent
with the most recently loaded compiler module. This could cause inconsistencies in the use
of shared objects. If you have multiple compilers (perhaps with incompatible shared objects)
installed, it is probably wise to set MPI_CC (and others) explicitly to the commands made
available by the compiler's modulefile.
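For instance, after loading an Intel compiler modulefile, you might pin HP MPI to that compiler explicitly. The Fortran variable shown here is one example of the "and others" above; check your HP-MPI documentation for the variables your version honors:

$ export MPI_CC=icc
$ export MPI_F90=ifort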
The contents of the modulefiles in the modulefiles_hptc RPM use the vendor-intended
location of the installed software. In many cases, this is under the /opt directory, but in a few
cases (for example, the PGI compilers and TotalView) this is under the /usr directory.
If you install a software package in a place other than the intended location, you must create or edit an
appropriate modulefile under the /opt/modules/modulefiles directory.
For the packages that install by default into the /usr directory (currently the PGI compilers and
TotalView), their corresponding modulefiles will try their vendor-intended location under the
/usr directory. If they do not find that directory, the packages will also search under the /opt
directory. Therefore, no changes to the modulefiles are needed if you want to install third-party
software consistently as the vendor intended or consistently under the /opt directory.
If the package is the stable product intended to be used by the site, editing an existing modulefile
is appropriate. While each modulefile has its unique characteristics, they all set some variables
describing the top-level directory, and editing to adjust the string should be sufficient. You may
need to repeat the adjustment if you update the modulefiles_hptc RPM or otherwise
rebuild your system.
If the package is a variant, for example, a beta version of a compiler, first copy the default
modulefile to a well-named copy, then edit the copy. You need root access to modify the
modulefiles, which is generally needed to install packages in either the /opt or /usr
directories.
If a user downloads a package into a private directory, the user can create a private modulefiles
directory. The user can then copy the corresponding default modulefile from under the
/opt/modules/modulefiles directory into a private modulefiles directory, edit the file,
and then register the directory with the module use command.
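The following sketch shows that workflow; the directory and modulefile names are illustrative only, and the copied file would be edited before use:

$ mkdir -p ~/modulefiles
$ cp /opt/modules/modulefiles/mypkg ~/modulefiles/mypkg-beta
$ module use ~/modulefiles
$ module load mypkg-beta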
2.2.2 Supplied Modulefiles
The HP XC system provides the Modules Package (not to be confused with Linux kernel
modules) to configure and modify the user environment. The Modules Package enables
dynamic modification of a user's environment by means of modulefiles.
A modulefile contains information that alters or sets shell environment variables, such as PATH
and MANPATH. Modulefiles provide a convenient means for users to tailor their working
environment. Modulefiles can be loaded automatically when the user logs in to the system or
any time a user needs to alter the environment.
The HP XC System Software provides a number of modulefiles. In addition, users can also
create and load their own modulefiles to modify their environment further.
The HP XC system supplies the modulefiles listed in Table 2-1.
•To use Intel C/C++ Version 8.0 compilers.
•To use Intel C/C++ Version 8.1 compilers.
•To use Intel Fortran Version 8.0 compilers.
•To use Intel Fortran Version 8.1 compilers.
•For Intel Version 7.1 compilers.
•For Intel Version 8.0 compilers.
•For Intel Version 8.1 compilers.
•For MLIB and Intel Version 7.1 compilers.
•For MLIB and Intel Version 8.0 compilers.
•For MLIB and PGI Version 5.1 compilers.
•For HP-MPI.
•For PGI Version 5.1 compilers.
•For PGI Version 5.2 compilers.
•To use the Intel IDB debugger.
•For the TotalView debugger.
2.2.3 Modulefiles Automatically Loaded on the System
The HP XC system does not load any modulefiles into your environment by default. However,
there may be modulefiles designated by your system administrator that are automatically
loaded. Section 2.2.5 describes how you can determine what modulefiles are currently loaded
on your system.
Users can also automatically load their own modules by creating a login script and designating
the modulefiles to be loaded in the script. Users can also add or remove modules from their
current environment on a per-module basis as described in Section 2.2.6.
2.2.4 Viewing Available Modulefiles
Available modulefiles are modulefiles that have been provided with the HP XC system software
and are available for you to load. A modulefile must be loaded before it provides changes to
your environment, as described in the introduction to this section. You can view the modulefiles
that are available on the system by issuing the module avail command:
$ module avail
2.2.5 Viewing Loaded Modulefiles
A loaded modulefile is a modulefile that has been explicitly loaded in your environment by
the module load command. To view the modulefiles that are currently loaded in your
environment, issue the module list command:
$ module list
2.2.6 Loading a Modulefile
You can load a modulefile into your environment to enable easier access to software that you
want to use by executing the module load command. You can load a modulefile for the
current session, or you can set up your environment to load the modulefile whenever you
log in to the system.
When loading a modulefile, note that certain modulefiles cannot be loaded while other
modulefiles are currently loaded. For example, this can happen with different versions of the
same software. If a modulefile you are attempting to load conflicts with a currently loaded
modulefile, the modulefile will not be loaded and an error message will be displayed.
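For example, to replace one version of a conflicting modulefile with another, you can unload the first before loading the second. The modulefile names here are hypothetical; use the names reported by module avail on your system:

$ module load compiler/8.0
$ module unload compiler/8.0
$ module load compiler/8.1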