
HP XC System Software
User’s Guide
Part Number: AA-RWJVB-TE
June 2005
Product Version: HP XC System Software Version 2.1
This document provides information about the HP XC user and programming environment.
Hewlett-Packard Company
Palo Alto, California
© Copyright 2003–2005 Hewlett-Packard Development Company, L.P.
UNIX® is a registered trademark of The Open Group.
Linux® is a U.S. registered trademark of Linus Torvalds.
LSF, Platform Computing, and the LSF and Platform Computing logos are trademarks or registered trademarks of Platform Computing Corporation.
Intel®, the Intel logo, Itanium®, Xeon™, and Pentium® are trademarks or registered trademarks of Intel Corporation in the United States and other countries.
TotalView® is a registered trademark of Etnus, Inc.
Quadrics® is a registered trademark of Quadrics, Ltd.
Myrinet® and Myricom® are registered trademarks of Myricom, Inc.
Red Hat® is a registered trademark of Red Hat, Inc.
Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license.
The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Contents

About This Document
1 Overview of the User Environment
1.1 System Architecture
1.1.1 Operating System
1.1.2 Node Specialization
1.1.3 Storage and I/O
1.1.4 File System
1.1.5 System Interconnect Network
1.1.6 Network Address Translation (NAT)
1.2 User Environment
1.2.1 LVS
1.2.2 Modules
1.2.3 Commands
1.2.3.1 Linux Commands
1.2.3.2 LSF Commands
1.2.3.3 SLURM Commands
1.2.3.4 HP-MPI Commands
1.2.3.5 Modules Commands
1.3 Application Development Environment
1.3.1 Parallel Applications
1.3.2 Serial Applications
1.4 Run-Time Environment
1.4.1 SLURM
1.4.2 Load Sharing Facility (LSF-HPC)
1.4.3 How LSF-HPC and SLURM Interact
1.4.4 HP-MPI
1.5 Components, Tools, Compilers, Libraries, and Debuggers
2 Using the System
2.1 Logging in to the System
2.1.1 LVS Login Routing
2.1.2 Using ssh to Log In
2.2 Configuring Your Environment with Modulefiles
2.2.1 Notes on Modulefiles
2.2.2 Supplied Modulefiles
2.2.3 Modulefiles Automatically Loaded on the System
2.2.4 Viewing Available Modulefiles
2.2.5 Viewing Loaded Modulefiles
2.2.6 Loading a Modulefile
2.2.6.1 Loading a Modulefile for the Current Session
2.2.6.2 Automatically Loading a Modulefile at Login
2.2.7 Unloading a Modulefile
2.2.8 Modulefile Conflicts
2.2.9 Creating a Modulefile
2.2.10 Viewing Modulefile-Specific Help
2.3 Launching and Managing Jobs Quick Start
2.3.1 Introduction
2.3.2 Getting Information About Queues
2.3.3 Getting Information About Resources
2.3.4 Getting Information About the System’s Partitions
2.3.5 Launching Jobs
2.3.5.1 Submitting a Serial Job
2.3.5.2 Submitting a Non-MPI Parallel Job
2.3.5.3 Submitting an MPI Job
2.3.5.4 Submitting a Batch Job or Job Script
2.3.6 Getting Information About Your Jobs
2.3.7 Stopping and Suspending Jobs
2.3.8 Resuming Suspended Jobs
2.4 Performing Other Common User Tasks
2.4.1 Determining the LSF Cluster Name and LSF Execution Host
2.4.2 Installing Third-Party Software
2.5 Getting System Help and Information
3 Developing Applications
3.1 Overview
3.2 Using Compilers
3.2.1 Standard Linux Compilers
3.2.2 Intel Compilers
3.2.3 PGI Compilers
3.2.4 Pathscale Compilers
3.2.5 MPI Compiler
3.3 Checking Nodes and Partitions Before Running Jobs
3.4 Interrupting a Job
3.5 Setting Debugging Options
3.6 Developing Serial Applications
3.6.1 Serial Application Build Environment
3.6.1.1 Using MLIB in Serial Applications
3.6.2 Building Serial Applications
3.6.2.1 Compiling and Linking Serial Applications
3.7 Developing Parallel Applications
3.7.1 Parallel Application Build Environment
3.7.1.1 Modulefiles
3.7.1.2 HP-MPI
3.7.1.3 OpenMP
3.7.1.4 Pthreads
3.7.1.5 Quadrics SHMEM
3.7.1.6 MLIB Math Library
3.7.1.7 MPI Library
3.7.1.8 Intel Fortran and C/C++ Compilers
3.7.1.9 PGI Fortran and C/C++ Compilers
3.7.1.10 GNU C and C++ Compilers
3.7.1.11 GNU Parallel Make
3.7.1.12 MKL Library
3.7.1.13 ACML Library
3.7.1.14 Other Libraries
3.7.1.15 Reserved Symbols and Names
3.7.2 Building Parallel Applications
3.7.2.1 Compiling and Linking Non-MPI Applications
3.7.2.2 Compiling and Linking HP-MPI Applications
3.7.2.3 Examples of Compiling and Linking HP-MPI Applications
3.8 Developing Libraries
3.8.1 Designing Libraries for XC4000
3.9 Advanced Topics
3.9.1 Using the GNU Parallel Make Capability
3.9.1.1 Example Procedure 1
3.9.1.2 Example Procedure 2
3.9.1.3 Example Procedure 3
3.9.2 Local Disks on Compute Nodes
3.9.3 I/O Performance Considerations
3.9.3.1 Shared File View
3.9.3.2 Private File View
3.9.4 Communication Between Nodes
4 Debugging Applications
4.1 Debugging Serial Applications
4.2 Debugging Parallel Applications
4.2.1 Debugging with TotalView
4.2.1.1 SSH and TotalView
4.2.1.2 Setting Up TotalView
4.2.1.3 Using TotalView with SLURM
4.2.1.4 Using TotalView with LSF-HPC
4.2.1.5 Starting TotalView for the First Time
4.2.1.6 Debugging an Application
4.2.1.7 Debugging Running Applications
4.2.1.8 Exiting TotalView
5 Tuning Applications
5.1 Using the Intel Trace Collector/Analyzer
5.1.1 Building a Program — Intel Trace Collector and HP-MPI
5.1.2 Running a Program — Intel Trace Collector and HP-MPI
5.1.3 Visualizing Data — Intel Trace Analyzer and HP-MPI
6 Using SLURM
6.1 Introduction
6.2 SLURM Commands
6.3 Accessing the SLURM Manpages
6.4 Launching Jobs with the srun Command
6.4.1 The srun Roles and Modes
6.4.1.1 srun Roles
6.4.1.2 srun Modes
6.4.2 srun Signal Handling
6.4.3 srun Run-Mode Options
6.4.4 srun Resource-Allocation Options
6.4.5 srun Control Options
6.4.5.1 Node Management Options
6.4.5.2 Working Features Options
6.4.5.3 Resource Control Options
6.4.5.4 Help Options
6.4.6 srun I/O Options
6.4.6.1 I/O Commands
6.4.6.2 I/O Redirection Alternatives
6.4.7 srun Constraint Options
6.4.8 srun Environment Variables
6.4.9 Using srun with HP-MPI
6.4.10 Using srun with LSF
6.5 Monitoring Jobs with the squeue Command
6.6 Killing Jobs with the scancel Command
6.7 Getting System Information with the sinfo Command
6.8 Job Accounting
6.9 Fault Tolerance
6.10 Security
7 Using LSF
7.1 Introduction to LSF in the HP XC Environment
7.1.1 Overview of LSF
7.1.2 Topology Support
7.1.3 Notes on LSF-HPC
7.1.4 How LSF and SLURM Launch and Manage a Job
7.1.5 Differences Between LSF on HP XC and Standard LSF
7.1.6 Notes About Using LSF in the HP XC Environment
7.1.6.1 Job Startup and Job Control
7.1.6.2 Preemption Support
7.2 Determining Execution Host
7.3 Determining Available System Resources
7.3.1 Getting Status of LSF
7.3.2 Getting Information About LSF-HPC Execution Host Node
7.3.3 Getting Host Load Information
7.3.4 Checking LSF System Queues
7.3.5 Getting Information About the lsf Partition
7.4 Submitting Jobs
7.4.1 Summary of the LSF bsub Command Format
7.4.2 LSF-SLURM External Scheduler
7.4.3 Submitting a Serial Job
7.4.4 Submitting a Job in Parallel
7.4.5 Submitting an HP-MPI Job
7.4.6 Submitting a Batch Job or Job Script
7.4.6.1 Examples
7.4.7 Submitting a Job from a Non-HP XC Host
7.5 Getting Information About Jobs
7.5.1 Getting Job Allocation Information
7.5.1.1 Job Allocation Information for a Running Job
7.5.1.2 Job Allocation Information for a Finished Job
7.5.2 Checking Status of a Job
7.5.3 Viewing a Job’s Historical Information
7.6 Working Interactively Within an LSF-HPC Allocation
7.6.1 Submitting an Interactive Job to Launch the xterm Program
7.6.2 Submitting an Interactive Job to Launch a Shell
7.7 LSF Equivalents of SLURM srun Options
8 Using HP-MPI
8.1 Overview
8.2 HP-MPI Directory Structure
8.3 Compiling and Running Applications
8.3.1 Setting Environment Variables
8.3.2 Building and Running an Example Application
8.3.2.1 Example Application hello_world
8.3.2.2 Building and Running hello_world
8.3.3 Using srun with HP-MPI
8.3.3.1 Launching MPI Jobs
8.3.3.2 Creating Subshells and Launching Jobsteps
8.3.3.3 System Interconnect Selection
8.3.4 Using LSF and HP-MPI
8.3.5 MPI Versioning
8.4 System Interconnect Support
8.4.1 HP-MPI Performance on HP XC with Multiple System Interconnects
8.4.2 Global Environment Variable Settings on the mpirun Command Line
8.5 32-Bit Builds on XC4000
8.6 Truncated Messages
8.7 Allowing Windows to Use Exclusive Locks
8.8 The mpirun Command Options
8.9 Environment Variables
8.9.1 MPIRUN_OPTIONS
8.9.2 MPIRUN_SYSTEM_OPTIONS
8.9.3 MPI_IC_ORDER
8.9.4 MPI_PHYSICAL_MEMORY
8.9.5 MPI_PIN_PERCENTAGE
8.9.6 MPI_PAGE_ALIGN_MEM
8.9.7 MPI_MAX_WINDOW
8.9.8 MPI_ELANLOCK
8.9.9 MPI_USE_LIBELAN
8.9.10 MPI_USE_LIBELAN_SUB
8.10 MPICH Object Compatibility
8.11 HP-MPI Documentation and Manpages
8.12 Additional Information, Known Problems, and Work-arounds
9 Using HP MLIB
9.1 Overview
9.1.1 Intel Compiler Notes
9.1.2 MLIB and Module Files
9.2 HP MLIB for the HP XC6000 Platform
9.2.1 Platform Support
9.2.2 Library Support
9.2.3 MPI Parallelism
9.2.4 Modulefiles and MLIB
9.2.5 Using Intel Compilers with HP MLIB
9.2.6 Compiling and Linking
9.2.6.1 Linking VECLIB
9.2.6.2 Linking LAPACK
9.2.6.3 Linking ScaLAPACK
9.2.6.4 Linking SuperLU_DIST
9.2.7 Licensing
9.2.8 MLIB Manpages
9.3 HP MLIB for the HP XC4000 Platform
9.3.1 Platform Support
9.3.2 Library Support
9.3.3 MPI Parallelism
9.3.4 Modulefiles and MLIB
9.3.5 Compiling and Linking
9.3.5.1 Linking VECLIB
9.3.5.2 Linking LAPACK
9.3.5.3 Linking ScaLAPACK
9.3.5.4 Linking SuperLU_DIST
9.3.6 Licensing
9.3.7 MLIB Manpages
10 Advanced Topics
10.1 Enabling Remote Execution with OpenSSH
10.2 Running an X Terminal Session from a Remote Node
A Examples
A.1 Building and Running a Serial Application
A.2 Launching a Serial Interactive Shell Through LSF
A.3 Running LSF Jobs with a SLURM Allocation Request
A.3.1 Example 1. Two Processors on Any Two Nodes
A.3.2 Example 2. Four Processors on Two Specific Nodes
A.4 Launching a Parallel Interactive Shell Through LSF
A.5 Submitting a Simple Job Script with LSF
A.6 Submitting an Interactive Job with LSF
A.7 Submitting an HP-MPI Job with LSF
A.8 Using a Resource Requirements String in an LSF Command
Glossary
Index
Examples
2-1 Submitting a Serial Job
2-2 Submitting a Non-MPI Parallel Job
2-3 Submitting a Non-MPI Parallel Job to Run One Task per Node
2-4 Running an MPI Job with LSF
2-5 Running an MPI Job with LSF Using the External Scheduler Option
2-6 Submitting a Job Script
3-1 Directory Structure
3-2 Recommended Directory Structure
6-1 Simple Launch of a Serial Program
6-2 Displaying Queued Jobs by Their JobIDs
6-3 Reporting on Failed Jobs in the Queue
6-4 Killing a Job by Its JobID
6-5 Cancelling All Pending Jobs
6-6 Sending a Signal to a Job
6-7 Using the sinfo Command (No Options)
6-8 Reporting Reasons for Downed, Drained, and Draining Nodes
7-1 Comparison of Queues and the Configuration of the Job Starter Script
7-2 Using the External Scheduler to Submit a Job to Run on Specific Nodes
7-3 Using the External Scheduler to Submit a Job to Run One Task per Node
7-4 Using the External Scheduler to Submit a Job That Excludes One or More Nodes
7-5 Submitting an Interactive Serial Job
7-6 Submitting an HP-MPI Job
7-7 Submitting an HP-MPI Job with a Specific Topology Request
7-8 Submitting a Batch Job Script
7-9 Submitting a Batch Script with a Specific Topology Request
7-10 Submitting a Batch Job Script That Uses a Subset of the Allocation
7-11 Submitting a Batch Job Script That Uses the srun --overcommit Option
7-12 Useful Environment Variables Available in a Batch Job Script
7-13 Using the bjobs Command (Short Output)
7-14 Using the bjobs Command (Long Output)
7-15 Using the bhist Command (Short Output)
7-16 Using the bhist Command (Long Output)
7-17 View Your Environment
7-18 View Your Allocation in SLURM
7-19 View Your Running Job in LSF
7-20 View Job Details in LSF
7-21 Running Jobs from an xterm Window
7-22 Submitting an Interactive Shell Program
7-23 Submitting an Interactive Shell Program on the LSF Execution Host
8-1 Performing System Interconnect Selection
8-2 Using TCP/IP over Gigabit Ethernet
8-3 Using TCP/IP over Elan4
8-4 Allocating and Attaching Processors
8-5 Allocating 12 Processors on 6 Nodes
Figures
4-1 TotalView Root Window
4-2 TotalView Preferences Window
4-3 TotalView Process Window Example
4-4 Unattached Window
4-5 Attached Window
7-1 How LSF-HPC and SLURM Launch and Manage a Job
Tables
2-1 Supplied Modulefiles
3-1 Intel Compiler Commands
3-2 PGI Compiler Commands
6-1 SLURM Commands
7-1 Output Provided by the bhist Command
7-2 LSF Equivalents of SLURM srun Options
8-1 Organization of the /opt/hpmpi Directory
8-2 HP-MPI Manpage Categories
About This Document
This manual provides information about using the features and functions of the HP XC System Software and describes how the HP XC user and programming environments differ from standard Linux® system environments. In addition, this manual focuses on building and running applications in the HP XC environment and is intended to guide an application developer to take maximum advantage of HP XC features and functions by providing an understanding of the underlying mechanisms of the HP XC programming environment.
An HP XC system is integrated with several open source software components. Some open source software components are being used for underlying technology, and their deployment is transparent. Some open source software components require HP XC-specific user-level documentation, and that kind of information is included in this document, if required.
HP relies on the documentation provided by the open source developers to supply the information you need to use their product. For links to open source software documentation for products that are integrated with your XC system, see Supplementary Information.
Documentation for third-party hardware and software components that are supported on the HP XC system is supplied by the third-party vendor. However, information about the operation of third-party software is included in this document if the functionality of the third-party component differs from standard behavior when used in the XC environment. In this case, HP XC documentation supersedes information supplied by the third-party vendor. For links to related third-party Web sites, see Supplementary Information.
Standard Linux® administrative tasks or the functions provided by standard Linux tools and commands are documented in commercially available Linux reference manuals and on various Web sites. For more information about obtaining documentation for standard Linux administrative tasks and associated topics, see the list of Web sites and additional publications provided in Related Information.

Intended Audience

This manual is intended for experienced Linux users who run applications developed by others, and for experienced system or application developers who develop, build, and run application code on an HP XC system.
This manual assumes that the user understands, and has experience with, multiprocessor systems and the Message Passing Interface (MPI), and is familiar with HP XC architecture and concepts.

Document Organization

This document is organized as follows:
• Chapter 1 provides an overview of the HP XC user, programming, and run-time environment.
• Chapter 2 describes how to perform common user tasks on the HP XC system.
• Chapter 3 describes how to build and run applications on the HP XC system.
• Chapter 4 describes how to debug applications on the HP XC system.
• Chapter 5 describes how to better tune applications for the HP XC system.
• Chapter 6 describes how to use SLURM on the HP XC system.
• Chapter 7 describes how to use LSF® on the HP XC system.
• Chapter 8 describes how to use HP-MPI on the HP XC system.
• Chapter 9 describes how to use MLIB on the HP XC system.
• Appendix A provides examples of HP XC applications.
• The Glossary provides definitions of the terms used in this manual.

HP XC Information

The HP XC System Software Documentation Set includes the following core documents. All XC documents, except the HP XC System Software Release Notes, are shipped on the XC documentation CD. All XC documents, including the HP XC System Software Release Notes, are available on line at the following URL:
http://www.hp.com/techservers/clusters/xc_clusters.html
HP XC System Software Release Notes
Contains important, last-minute information about firmware, software, or hardware that might affect your system. This document is only available on line.
HP XC Hardware Preparation Guide
Describes tasks specific to HP XC that are required to prepare each supported cluster platform for installation and configuration, including the specific placement of nodes in the switches.
HP XC System Software Installation Guide
Provides step-by-step instructions for installing the HP XC System Software on the head node and configuring the system.
HP XC System Software Administration Guide
Provides an overview of the HP XC system administration environment and describes cluster administration tasks, node maintenance tasks, LSF® administration tasks, and troubleshooting procedures.
HP XC System Software User’s Guide
Provides an overview of managing the HP XC user environment with modules, managing jobs with LSF, and how to build, run, debug, and troubleshoot serial and parallel applications on an HP XC system.
The following documents are also provided by HP for use with your HP XC system:
Linux Administration Handbook
A third-party Linux reference manual, Linux Administration Handbook, is shipped with the HP XC System Software Documentation Set. This manual was authored by Evi Nemeth, Garth Snyder, Trent R. Hein, et al. (NJ: Prentice Hall, 2002).
QuickSpecs for HP XC System Software
Provides a product overview, hardware requirements, software requirements, software licensing information, ordering information, and information about commercially available software that has been qualified to interoperate with the HP XC System Software.
The QuickSpecs are located at the following URL:
http://www.hp.com/techservers/clusters/xc_clusters.html
HP XC Program Development Environment
The following URL provides pointers to tools that have been tested in the HP XC program development environment (for example, TotalView® and other debuggers, compilers, and so on):
ftp://ftp.compaq.com/pub/products/xc/pde/index.html
HP Message Passing Interface
HP Message Passing Interface (MPI) is an implementation of the MPI standard for HP systems. The home page is located at the following URL:
http://www.hp.com/go/mpi
HP Mathematical Library
The HP math libraries (MLIB) support application developers who are looking for ways to speed up development of new applications and shorten the execution time of long-running technical applications. The home page is located at the following URL:
http://www.hp.com/go/mlib
HP Cluster Platform Documents
The cluster platform documents describe site requirements, show you how to physically set up the servers and additional devices, and provide procedures to operate and manage the hardware. These documents are shipped with your hardware.
Documentation for the HP Integrity and HP ProLiant servers is available at the following URL:
http://www.docs.hp.com/

For More Information

The HP Web site has information on this product. You can access the HP Web site at the following URL:
http://www.hp.com

Supplementary Information

This section contains links to third-party and open source components that are integrated into the HP XC System Software core technology. In the XC documentation, except where necessary, references to third-party and open source software components are generic, and the XC adjective is not added to any reference to a third-party or open source command or product name. For example, the SLURM srun command is simply referred to as the srun command.
The location of each Web site or link to a particular topic listed in this section is subject to change without notice by the site provider.
http://www.platform.com
Home page for Platform Computing, the developer of the Load Sharing Facility (LSF). LSF, the batch system resource manager used on an XC system, is tightly integrated with the HP XC and SLURM software.
For your convenience, the following Platform LSF documents are shipped on the HP XC documentation CD in PDF format. The Platform LSF documents are also available on the XC Web site.
- Administering Platform LSF
- Administration Primer
- Platform LSF Reference
- Quick Reference Card
- Running Jobs with Platform LSF
http://www.llnl.gov/LCdocs/slurm/
Home page for the Simple Linux Utility for Resource Management (SLURM), which is integrated with LSF to manage job and compute resources on an XC system.
http://www.nagios.org/
Home page for Nagios®, a system and network monitoring application. Nagios watches specified hosts and services and issues alerts when problems occur and when problems are resolved. Nagios provides the monitoring capabilities on an XC system.
http://supermon.sourceforge.net/
Home page for Supermon, a high-speed cluster monitoring system that emphasizes low perturbation, high sampling rates, and an extensible data protocol and programming interface. Supermon works in conjunction with Nagios to provide XC system monitoring.
http://www.llnl.gov/linux/pdsh/
Home page for the parallel distributed shell (pdsh), which executes commands across XC client nodes in parallel.
http://www.balabit.com/products/syslog_ng/
Home page for syslog-ng©, a logging tool that replaces the traditional syslog functionality. The syslog-ng tool is a flexible and scalable audit trail processing tool, and it provides a centralized, securely stored log of all devices on your network.
http://systemimager.org
Home page for SystemImager®, which is the underlying technology that is used to install the XC software, distribute the golden image, and distribute configuration changes.
http://www.etnus.com
Home page for Etnus, Inc., maker of the TotalView parallel debugger.
http://www.macrovision.com
Home page for Macrovision®, developer of the FLEXlm™ license management utility, which is used for HP XC license management.
http://sourceforge.net/projects/modules/
Home page for Modules, which provide for easy dynamic modification of a user’s environment through modulefiles, which typically instruct the module command to alter or set shell environment variables.
http://dev.mysql.com/
Home page for MySQL AB, developer of the MySQL database. This Web site contains a link to the MySQL documentation, particularly the MySQL Reference Manual.

Manpages

Manpages provide online reference and command information from the command line. Manpages are supplied with the HP XC system for standard HP XC components, Linux user commands, LSF commands, and other software components that are distributed with the HP XC system.
Manpages for third-party vendor software components may be provided as a part of the deliverables for that component.
Using the discover(8) manpage as an example, you can use either of the following commands to display a manpage:
$ man discover
$ man 8 discover
If you are not sure about a command you need to use, enter the man command with the -k option to obtain a list of commands that are related to the keyword. For example:
# man -k keyword

Related Information

This section provides pointers to the Web sites for related software products and provides references to useful third-party publications. The location of each Web site or link to a particular topic is subject to change without notice by the site provider.
Related Linux Web Sites
http://www.redhat.com
Home page for Red Hat®, distributors of Red Hat Enterprise Linux Advanced Server, a Linux distribution with which the HP XC operating environment is compatible.
http://www.linux.org/docs/index.html
Home page for the Linux Documentation Project (LDP). This Web site contains guides covering various aspects of working with Linux, from creating your own Linux system from scratch to bash script writing. This site also includes links to Linux HowTo documents, frequently asked questions (FAQs), and manpages.
http://www.linuxheadquarters.com
Web site providing documents and tutorials for the Linux user. Documents contain instructions on installing and using applications for Linux, configuring hardware, and a variety of other topics.
http://linuxvirtualserver.org
Home page for the Linux Virtual Server (LVS), the load balancer running on the Linux operating system that distributes login requests on the XC system.
http://www.gnu.org
Home page for the GNU Project. This site provides online software and information for many programs and utilities that are commonly used on GNU/Linux systems. Online information includes guides for using the bash shell, emacs, make, cc, gdb, and more.
Related MPI Web Sites
http://www.mpi-forum.org
Contains the official MPI standards documents, errata, and archives of the MPI Forum. The MPI Forum is an open group with representatives from many organizations that define and maintain the MPI standard.
http://www-unix.mcs.anl.gov/mpi/
A comprehensive site containing general information, such as the specification and FAQs, and pointers to a variety of other resources, including tutorials, implementations, and other MPI-related sites.
Related Compiler Web Sites
http://www.intel.com/software/products/compilers/index.htm
Web site for Intel® compilers.
http://support.intel.com/support/performancetools/
Web site for general Intel software development information.
http://www.pgroup.com/
Home page for The Portland Group™, supplier of the PGI® compiler.
Additional Publications
For more information about standard Linux system administration or other related software topics, refer to the following documents, which must be purchased separately:
Linux Administration Unleashed, by Thomas Schenk, et al.
Managing NFS and NIS, by Hal Stern, Mike Eisler, and Ricardo Labiaga (O’Reilly)
MySQL, by Paul DuBois
MySQL Cookbook, by Paul DuBois
High Performance MySQL, by Jeremy Zawodny and Derek J. Balling (O’Reilly)
Perl Cookbook, Second Edition, by Tom Christiansen and Nathan Torkington
Perl in A Nutshell: A Desktop Quick Reference, by Ellen Siever, et al.

Typographical Conventions

Italic font
Italic (slanted) font indicates the name of a variable that you can replace in a command example or information in a display that represents several possible values. Document titles are shown in Italic font. For example: Linux Administration Handbook.
Courier font
Courier font represents text that is displayed by the computer. Courier font also represents literal items, such as command names, file names, routines, directory names, path names, signals, messages, and programming language structures.
Bold text
In command and interactive examples, bold text represents the literal text that you enter. For example:
# cd /opt/hptc/config/sbin
In text paragraphs, bold text indicates a new term or a term that is defined in the glossary.
$ and #
In command examples, a dollar sign ($) represents the system prompt for the bash shell and also shows that a user is in non-root mode. A pound sign (#) indicates that the user is in root or superuser mode.
[ ]
In command syntax and examples, brackets ([ ]) indicate that the contents are optional. If the contents are separated by a pipe character ( | ), you must choose one of the items.
{ }
In command syntax and examples, braces ({ }) indicate that the contents are required. If the contents are separated by a pipe character ( | ), you must choose one of the items.
...
In command syntax and examples, horizontal ellipsis points ( … ) indicate that the preceding element can be repeated as many times as necessary.
.
.
.
In programming examples, screen displays, and command output, vertical ellipsis points indicate an omission of information that does not alter the meaning or affect the user if it is not shown.
|
In command syntax and examples, a pipe character ( | ) separates items in a list of choices.
discover(8)
A cross-reference to a manpage includes the appropriate section number in parentheses. For example, discover(8) indicates that you can find information on the discover command in Section 8 of the manpages.
Ctrl/x
In interactive command examples, this symbol indicates that you hold down the first named key while pressing the key or button that follows the slash ( / ). When it occurs in the body of text, the action of pressing two or more keys is shown without the box. For example:
Press Ctrl/x to exit the application.
Enter
The name of a keyboard key. Enter and Return both refer to the same key.
Note
A note calls attention to information that is important to understand before continuing.
Caution
A caution calls attention to important information that if not understood or followed will result in data loss, data corruption, or a system malfunction.
Warning
A warning calls attention to important information that if not understood or followed will result in personal injury or nonrecoverable system problems.

HP Encourages Your Comments

HP welcomes your comments on this document. Please provide your comments and suggestions at the following URL:
http://docs.hp.com/en/feedback.html

1 Overview of the User Environment

The HP XC system is a collection of computer nodes, networks, storage, and software built into a cluster that work together to present a single system. It is designed to maximize workload and I/O performance, and provide efficient management of large, complex, and dynamic workloads. The HP XC system provides a set of integrated and supported user features, tools, and components which are described in this chapter.
This chapter briefly describes the components of the HP XC environment. The following topics are covered in this chapter:
System architecture (Section 1.1)
User environment (Section 1.2)
Application development environment (Section 1.3)
Run-time environment (Section 1.4)
Supported tools, compilers, libraries (Section 1.5)

1.1 System Architecture

The HP XC architecture is designed as a clustered system with single system traits. From a user perspective, this architecture achieves a single system view, providing capabilities such as single user login, a single file system namespace, an integrated view of system resources, an integrated program development environment, and an integrated job submission environment.
1.1.1 Operating System
The HP XC system is a high-performance compute cluster that runs HP XC Linux for High Performance Computing Version 1.0 (HPC Linux) as its software base. Any applications that run correctly using Red Hat Enterprise Linux Advanced Server Version 3.0 will also run correctly using HPC Linux.
1.1.2 Node Specialization
The HP XC system is implemented as a sea-of-nodes. Each node in the system contains the same software image on its local disk. There are two physical types of nodes in the system — a head node and client nodes.
head node
The node that is installed with the HP XC system software first — it is used to generate other HP XC (client) nodes. The head node is generally of interest only to the administrator of the HP XC system.
client nodes
All the other nodes that make up the system. They are replicated from the head node and are usually given one or more specialized roles to perform various system functions, such as logging into the system or running jobs.
The HP XC system allows for the specialization of client nodes to enable efficient and flexible distribution of the workload. Nodes can be assigned one or more specialized roles that determine how a particular node is used and what system services it provides. Of the many
different roles that can be assigned to a client node, the following roles contain services that are of special interest to the general user:
login role
The role most visible to users is on nodes that have the login role. Nodes with the login role are where you log in and interact with the system to perform various tasks. For example, once logged in to a node with the login role, you can execute commands, build applications, or submit jobs to compute nodes for execution. There can be one or several nodes with the login role in an HP XC system, depending upon cluster size and requirements. Nodes with the login role are a part of the Linux Virtual Server ring, which distributes login requests from users. A node with the login role is referred to as a login node in this manual.
compute role
The compute role is assigned to nodes where jobs are to be distributed and run. Although all nodes in the HP XC system are capable of carrying out computations, the nodes with the compute role are the primary nodes used to run jobs. Nodes with the compute role become a part of the resource pool used by LSF-HPC and SLURM, which manage and distribute the job workload. Jobs that are submitted to compute nodes must be launched from nodes with the login role. Nodes with the compute role are referred to as compute nodes in this manual.
1.1.3 Storage and I/O
The HP XC system supports both shared (global) and private (local) disks and file systems. Shared file systems can be mounted on all the other nodes by means of Lustre or NFS. This gives users a single view of all the shared data on disks attached to the HP XC system.
SAN Storage
HP XC uses the HP StorageWorks Scalable File Share (HP StorageWorks SFS), which is based on Lustre technology and uses the Lustre File System from Cluster File Systems, Inc. This is a turnkey Lustre system that is delivered and supported by HP. It supplies access to Lustre file systems through Lustre client-server protocols over various system interconnects. The HP XC system is a client to the HP StorageWorks SFS server.
Local Storage
Local storage for each node holds the operating system, a copy of the HP XC system software, and temporary space that can be used by jobs running on the node.
HP XC file systems are described in detail in Section 1.1.4.
1.1.4 File System
Each node of the HP XC system has its own local copy of all the HP XC System Software files including the Linux distribution and also has its own local user files. Every node may also import files from NFS or Lustre file servers. HP XC System Software supports NFS 3 including both client and server functionality. HP XC System Software also enables Lustre client services for high-performance and high-availability file I/O. These Lustre client services require the separate installation of Lustre software, provided with the HP StorageWorks Scalable File Share (SFS).
In the case of NFS files, these can be shared purely between the nodes of the HP XC System, or alternatively can be shared between the HP XC and external systems. External NFS files can be shared with any node having a direct external network connection. It is also possible to set up NFS to import external files to HP XC nodes without external network connections, by routing through a node with an external network connection. Your system administrator can
choose to use either the HP XC Administrative Network, or the XC system interconnect, for NFS operations. The HP XC system interconnect can potentially offer higher performance, but only at the potential expense of the performance of application communications.
For high-performance or high-availability file I/O, the Lustre file system is available on HP XC. The Lustre file system uses POSIX-compliant syntax and semantics. The HP XC System Software includes kernel modifications required for Lustre client services, which enables the operation of the separately installable Lustre client software. The Lustre file server product used on HP XC is the HP StorageWorks Scalable File Share (SFS), which fully supports the HP XC.
The SFS includes HP XC Lustre client software. The SFS can be integrated with the HP XC so that Lustre I/O is performed over the same high-speed system interconnect fabric used by the HP XC. So, for example, if the HP XC system interconnect is based on a Quadrics QsNet II switch, then the SFS will serve files over ports on that switch. The file operations are able to proceed at the full bandwidth of the HP XC system interconnect because these operations are implemented directly over the low-level communications libraries. Further optimizations of file I/O can be achieved at the application level using special file system commands – implemented as ioctls – which allow a program to interrogate the attributes of the file system, modify the stripe size and other attributes of new (zero-length) files, and so on. Some of these optimizations are implicit in the HP-MPI I/O library, which implements the MPI-2 file I/O standard.
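For example, from the command line you can inspect the striping attributes of a file stored on a Lustre file system with the Lustre lfs utility, as sketched below. The lfs command is delivered with the separately installed Lustre client software rather than with the HP XC base software, and the file name shown is only a placeholder:
$ lfs getstripe /scratch/myfile
The same utility also provides a setstripe subcommand that sets the stripe size and stripe count of a new (zero-length) file before any data is written to it.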
File System Layout
In an HP XC system, the basic file system layout is the same as that of the Red Hat Advanced Server 3.0 Linux file system.
The HP XC file system is structured to separate cluster-specific files, base operating system files, and user-installed software files. This allows for flexibility and ease of potential upgrades of the system software as well as keeping software from conflicting with user-installed software. Files are segregated into the following types and locations:
HP XC-specific software is located in /opt/hptc
HP XC configuration data is located in /opt/hptc/etc
Clusterwide directory structure (file system) is located in /hptc_cluster
You should be aware of the following information about the HP XC file system layout:
Open source software that by default would be installed under the /usr/local directory
is instead installed in the /opt/hptc directory.
Software installed in the /opt/hptc directory is not intended to be updated by users.
Software packages are installed in directories under the /opt/hptc directory under their
own names. The exception to this is 3rd-party software, which usually goes in /opt/r.
There are four directories under the /opt/hptc directory that contain symbolic links
to files included in the packages:
- /opt/hptc/bin
- /opt/hptc/sbin
- /opt/hptc/lib
- /opt/hptc/man
Each package directory should have a directory corresponding to each of these directories where every file has a symbolic link created in the /opt/hptc/ directory.
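For example, you can see which packages are installed and how their commands are exposed through these link directories by listing them directly; the output is illustrative only and varies from system to system:
$ ls /opt/hptc
$ ls -l /opt/hptc/bin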
1.1.5 System Interconnect Network
The HP XC system interconnect provides high-speed connectivity for parallel applications. The system interconnect network provides a high-speed communications path used primarily for user file service and for communications within user applications that are distributed among
nodes of the system. The system interconnect network is a private network within the HP XC. Typically, every node in the HP XC is connected to the system interconnect.
The HP XC system interconnect can be based on several switch technologies. The types of system interconnects that are used on HP XC systems are:
• Myricom Myrinet on HP Cluster Platform 4000 (ProLiant/Opteron servers), also referred to as XC4000 in this manual.
• Quadrics QsNet II on HP Cluster Platform 6000 (Integrity servers), also referred to as XC6000 in this manual.
• Gigabit Ethernet on both XC4000 and XC6000
• InfiniBand on XC4000
1.1.6 Network Address Translation (NAT)
The HP XC system uses Network Address Translation (NAT) to allow nodes in the HP XC system that do not have direct external network connections to open outbound network connections to external network resources.

1.2 User Environment

This section introduces some basic general information about logging in, configuring, and using the HP XC environment.
1.2.1 LVS
The HP XC system uses the Linux Virtual Server (LVS) to present a single host name for user logins. LVS is a highly scalable virtual server built on a system of real servers. By using LVS, the architecture of the HP XC system is transparent to end users, and they see only a single virtual server. This eliminates the need for users to know how the system is configured in order to successfully log in and use the system. Any changes in the system configuration are transparent to end users. LVS also provides load balancing across login nodes, which distributes login requests to different servers.
1.2.2 Modules
The HP XC system provides the Modules Package (not to be confused with Linux kernel modules) to configure and modify the user environment. The Modules Package enables dynamic modification of a user's environment by means of modulefiles. Modulefiles provide a convenient means for users to tailor their working environment as necessary. One of the key features of modules is to allow multiple versions of the same software to be used in a controlled manner.
A modulefile contains information to configure the shell for an application. Typically, a modulefile contains instructions that alter or set shell environment variables, such as PATH and MANPATH, to enable access to various installed software. Modulefiles may be shared by many users on a system, and users may have their own collection to supplement or replace the shared modulefiles.
Modulefiles can be loaded into your environment automatically when you log in to the system, or at any time you need to alter the environment. The HP XC system does not preload modulefiles.
1.2.3 Commands
The HP XC user environment includes standard Linux commands, LSF commands, SLURM commands, HP-MPI commands, and modules commands. This section provides a brief overview of these command sets.
1.2.3.1 Linux Commands
The HP XC system supports the use of standard Linux user commands and tools. Standard Linux commands are not described in this document. You can access descriptions of Linux commands in Linux documentation and manpages. Linux manpages are available by invoking the Linux man command with the Linux command name.
1.2.3.2 LSF Commands
HP XC supports LSF-HPC and the use of standard LSF commands, some of which operate differently in the HP XC environment from standard LSF behavior. The use of LSF commands in the HP XC environment is described in Chapter 7 and in the HP XC lsf_diff manpage. Information about standard LSF commands is available in Platform Computing Corporation LSF documentation and in the LSF manpages. For your convenience, the HP XC documentation CD contains LSF manuals from Platform Computing. LSF manpages are available on the HP XC system.
1.2.3.3 SLURM Commands
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling, and supports the use of standard SLURM commands. SLURM functionality is described in Chapter 6. Descriptions of SLURM commands are available in the SLURM manpages by invoking the man command with the SLURM command name.
1.2.3.4 HP-MPI Commands
HP XC supports the HP-MPI software and the use of standard HP-MPI commands. Descriptions of HP-MPI commands are available in the HP-MPI documentation, which is supplied with the HP XC system software. HP-MPI manpages are also available by invoking the man command with the HP-MPI command name. HP-MPI functionality is described in Chapter 8.
1.2.3.5 Modules Commands
The HP XC system supports the use of standard Modules commands to load and unload modulefiles that are used to configure and modify the user environment. Modules commands are described in Section 2.2.

1.3 Application Development Environment

The HP XC system provides an environment that enables developing, building, and running applications using multiple nodes with multiple processors. These applications can range from parallel applications using many processors to serial applications using a single processor.
1.3.1 Parallel Applications
The HP XC parallel application development environment allows parallel application processes to be started and stopped together on a large number of application processors, along with the I/O and process control structures to manage these kinds of applications.
Full details and examples of how to build, run, debug, and troubleshoot parallel applications are provided in Section 3.7.
1.3.2 Serial Applications
The HP XC serial application development environment supports building and running serial applications. A serial application is a command or application that does not use any form of parallelism.
Full details and examples of how to build, run, debug, and troubleshoot serial applications are provided in Section 3.6.2.

1.4 Run-Time Environment

In the HP XC environment, LSF-HPC, SLURM, and HP-MPI work together to provide a powerful, flexible, extensive run-time environment. This section describes LSF-HPC, SLURM, and HP-MPI, and how these components work together to provide the HP XC run-time environment.
1.4.1 SLURM
SLURM (Simple Linux Utility for Resource Management) is a resource management system that is integrated into the HP XC system. SLURM is suitable for use on large and small Linux clusters. It was developed by Lawrence Livermore National Laboratory and Linux NetworX. As a resource manager, SLURM allocates exclusive or non-exclusive access to resources (application/compute nodes) for users to perform work, and provides a framework to start, execute, and monitor work (normally a parallel job) on the set of allocated nodes. A SLURM system consists of two daemons, one configuration file, and a set of commands and APIs. The central controller daemon, slurmctld, maintains the global state and directs operations. A slurmd daemon is deployed to each compute node and responds to job-related requests, such as launching jobs, signaling, and terminating jobs. End users and system software (such as LSF-HPC) communicate with SLURM by means of commands or APIs — for example, to allocate resources, launch parallel jobs on allocated resources, and kill running jobs.
SLURM groups compute nodes (the nodes where jobs are run) together into partitions. The HP XC system can have one or several partitions. When HP XC is installed, a single partition of compute nodes is created by default for LSF batch jobs. The system administrator has the option of creating additional partitions. For example, another partition could be created for interactive jobs.
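For example, a quick way to see which compute nodes currently belong to the partition used for LSF jobs is the SLURM sinfo command with its partition filter. This is only a sketch; the partition name shown (lsf) is the default described above, and the output depends on your site's configuration:

$ sinfo -p lsf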
1.4.2 Load Sharing Facility (LSF-HPC)
The Load Sharing Facility for High Performance Computing (LSF-HPC) from Platform Computing Corporation is a batch system resource manager that has been integrated with SLURM for use on the HP XC system. LSF-HPC for SLURM is included with the HP XC System Software, and is an integral part of the HP XC environment. LSF-HPC interacts with SLURM to obtain and allocate available resources, and to launch and control all the jobs submitted to LSF-HPC. LSF-HPC accepts, queues, schedules, dispatches, and controls all the batch jobs that users submit, according to policies and configurations established by the HP XC site administrator. On an HP XC system, LSF-HPC for SLURM is installed and runs on one HP XC node, known as the LSF-HPC execution host.
A complete description of LSF-HPC is provided in Chapter 7. In addition, for your convenience, the HP XC documentation CD contains LSF Version 6.0 manuals from Platform Computing.
1.4.3 How LSF-HPC and SLURM Interact
In the HP XC environment, LSF-HPC cooperates with SLURM to combine LSF-HPC's powerful scheduling functionality with SLURM's scalable parallel job launching capabilities. LSF-HPC acts primarily as a workload scheduler on top of the SLURM system, providing policy and topology-based scheduling for end users. SLURM provides an execution and monitoring layer for LSF-HPC. LSF-HPC uses SLURM to detect system topology information, make scheduling decisions, and launch jobs on allocated resources.
When a job is submitted to LSF-HPC, LSF-HPC schedules the job based on job resource requirements and communicates with SLURM to allocate the required HP XC compute nodes for the job from the SLURM lsf partition. LSF-HPC provides node-level scheduling for parallel jobs, and CPU-level scheduling for serial jobs. Because of node-level scheduling, a parallel job may be allocated more CPUs than it requested, depending on its resource request; the srun or mpirun -srun launch commands within the job still honor the original CPU request. LSF-HPC always tries to pack multiple serial jobs on the same node, with one CPU per job. Parallel jobs and serial jobs cannot coexist on the same node.
After the LSF-HPC scheduler allocates the SLURM resources for a job, the SLURM allocation information is recorded with the job. You can view this information with the bjobs and bhist commands.
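For example, assuming an LSF-HPC job with ID 24 (the job ID here is purely illustrative), the recorded allocation details can be inspected as follows:

$ bjobs -l 24
$ bhist -l 24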
When LSF-HPC starts a job, it sets the SLURM_JOBID and SLURM_NPROCS environment variables in the job environment. SLURM_JOBID associates the LSF-HPC job with SLURM's allocated resources. The SLURM_NPROCS environment variable is set to the originally requested number of processors. LSF-HPC dispatches the job from the LSF-HPC execution host, which is the same node on which the LSF-HPC daemons run. The LSF-HPC JOB_STARTER script, which is configured for all queues, uses the srun command to launch a user job on the first node in the allocation. Your job can contain additional srun or mpirun commands to launch tasks on all nodes in the allocation.
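As a quick check of these variables, the following sketch submits a small interactive job that prints them. It assumes a queue that allows interactive jobs and the JOB_STARTER configuration described above:

$ bsub -I srun printenv SLURM_JOBID SLURM_NPROCS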
While a job is running, all LSF-HPC-supported resource limits are enforced, including core limit, cputime limit, data limit, file size limit, memory limit, and stack limit. When you kill a job, LSF-HPC uses the SLURM scancel command to propagate the signal to the entire job.
After a job finishes, LSF-HPC releases all allocated resources.
A detailed description, along with an example and illustration, of how LSF-HPC and SLURM cooperate to launch and manage jobs is provided in Section 7.1.4. It is highly recommended that you review this information.
1.4.4 HP-MPI
HP-MPI is a high-performance implementation of the Message Passing Interface standard and is included with the HP XC system. HP-MPI uses SLURM to launch jobs on an HP XC system — however, it manages the global MPI exchange so that all processes can communicate with each other.
HP-MPI complies fully with the MPI-1.2 standard. HP-MPI also complies with the MPI-2 standard, with some restrictions. HP-MPI provides an application programming interface and software libraries to support parallel, message-passing applications that are efficient, portable, and flexible. HP-MPI Version 2.1 is included in this release of HP XC.
HP-MPI 2.1 for HP XC is supported on XC4000 and XC6000 clusters, and includes support for the following system interconnects:
• XC4000 Clusters — Myrinet, Gigabit Ethernet, TCP/IP, InfiniBand
• XC6000 Clusters — Quadrics Elan4, Gigabit Ethernet, TCP/IP

1.5 Components, Tools, Compilers, Libraries, and Debuggers

This section provides a brief overview of some of the common tools, compilers, libraries, and debuggers supported for use on HP XC.
An HP XC system is integrated with several open source software components. HP XC incorporates the Linux operating system and its standard commands and tools, and does not diminish the Linux ABI in any way. In addition, HP XC incorporates LSF and SLURM to launch and manage jobs, and includes HP-MPI for high-performance, parallel, message-passing applications, and the HP MLIB math library for intensive computations.
Most standard open source compilers and tools can be used on an HP XC system; however, they must be purchased separately. Several open source and commercially available software packages have been tested with the HP XC Software. The following list shows some of the software packages that have been tested for use with HP XC. This list provides an example of what is available on HP XC and is not intended as a complete list. Note that some of the packages listed are included as part of the HP XC Linux distribution and as such are supported as part of the HP XC. The tested software packages include, but are not limited to, the following:
• Intel Fortran 95, C, C++ Compiler Version 7.1 and 8.0, including OpenMP, for Itanium (includes the idb debugger)
• gcc version 3.2.3 (included in the HP XC distribution)
• g77 version 3.2.3 (included in the HP XC distribution)
• Portland Group PGI Fortran 90, C, C++ Version 5.1, including OpenMP, for XC4000
• Quadrics SHMEM, as part of QsNet II user libraries, on Itanium systems connected with the Quadrics QsNet II switch (included in the HP XC distribution)
• Etnus TotalView debugger Version 6.4
• gdb (part of the HP XC Linux distribution)
• Intel MKL V6.0 on Itanium
• AMD Core Math Library Version 2.0 on XC4000
• valgrind 2.0.0 (http://valgrind.kde.org) in 32-bit mode only
• oprofile 0.7.1 (http://oprofile.sourceforge.net)
• PAPI 3.2 (http://icl.cs.utk.edu/papi)
• Intel Visual Analyzer/Tracer (formerly Pallas Vampir and Vampirtrace performance analyzer) on Itanium
• GNU make, including distributed parallel make (included in the HP XC distribution)
Other standard tools and libraries are available and can most likely be used on HP XC as they would on any other standard Linux system. It should be noted, however, that software that is not described in HP XC documentation may not have been tested with HP XC and may not function in a standard manner.
2

Using the System

This chapter describes tasks and commands that the general user must know to use the system. It contains the following topics:
Logging in to the system (Section 2.1)
Setting up the user environment (Section 2.2)
Launching and managing jobs (Section 2.3)
Performing some common user tasks (Section 2.4)
Getting help (Section 2.5)

2.1 Logging in to the System

Logging in to an HP XC system is similar to logging in to any standard Linux system. Logins are performed on nodes that have the login role. Secure Shell (ssh) is the preferred method for accessing the HP XC system.
2.1.1 LVS Login Routing
The HP XC system uses the Linux Virtual Server (LVS) facility to present a set of login nodes with a single cluster name. When you log in to the system, LVS automatically routes your login request to an available login node on the system. LVS load balances login sessions across the login nodes and improves the availability of login access. When you log in to the HP XC system, you do not have to know specific node names to log in, only the HP XC system's cluster name.
2.1.2 Using ssh to Log In
To log in to an HP XC system, you must use Secure Shell (ssh). Typically, you access the HP XC system using the ssh command to get a login shell or to execute commands. For example:
$ ssh user-name@system-name
user-name@system-name's password:
The ssh service also allows file transfer using the scp or sftp commands over the same port as ssh.
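For example, a file can be copied to your home directory on the HP XC system as follows (a sketch only; user-name and system-name are as in the ssh example above, and mydata.dat is a placeholder file name):

$ scp mydata.dat user-name@system-name: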
The typical r* UNIX commands, such as rlogin, rsh, and rcp, are not installed on an HP XC system by default because of their inherent insecurity. The ssh command transfers all login and password information in an encrypted form instead of the plaintext form used by the r* UNIX commands (as well as telnet and ftp).
If you want to use ssh without password prompting, you must set up ssh authentication keys. Refer to the ssh(1) manpage for information about using ssh authentication keys.
ssh is further discussed in Section 10.1.

2.2 Configuring Your Environment with Modulefiles

The HP XC system supports the use of Modules software to make it easier to configure and modify your environment. Modules software enables dynamic modification of your environment by the use of modulefiles. A modulefile contains information to configure the shell for an application. Typically, a modulefile contains instructions that alter or set shell environment variables, such as PATH and MANPATH, to enable access to various installed software.
One of the key features of using modules is to allow multiple versions of the same software to be used in your environment in a controlled manner. For example, two different versions of the Intel C compiler can be installed on the system at the same time – the version used is based upon which Intel C compiler modulefile is loaded.
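For instance, switching between the two Intel compiler versions listed in Table 2-1 might look like the following sketch, assuming both modulefiles are installed on your system:

$ module unload intel/7.1
$ module load intel/8.0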
The HP XC software provides a number of modulefiles. You can also create your own modulefiles. Modulefiles may be shared by many users on a system, and users may have their own collection of modulefiles to supplement or replace the shared modulefiles.
The following topics are addressed in the corresponding sections:
Section 2.2.1 provides additional information on modulefiles.
Section 2.2.2 discusses what modules are supplied.
Section 2.2.3 discusses what modules are loaded by default.
Section 2.2.4 discusses how to determine what modules are available.
Section 2.2.5 discusses how to determine which modules are loaded.
Section 2.2.6 discusses how to load a module.
Section 2.2.7 discusses how to unload a module.
Section 2.2.8 discusses module conflicts.
Section 2.2.9 discusses creating your own module.
For further information about the Modules software supplied with the HP XC system, see the Modules Web site at the following URL:
http://sourceforge.net/projects/modules/
2.2.1 Notes on Modulefiles
A modulefile does not provide configuration of your environment until it is explicitly loaded. That is, the specific modulefile for a software product or application must be loaded in your environment (with the module load command) before the configuration information in the modulefile is effective.
You or your system administrator can configure your environment so that any desired modulefiles are automatically loaded for you when you log in to the system. You can also load a modulefile yourself, as described in Section 2.2.6.
The Modules software is initialized when you log in to the HP XC system. It provides access to the commands that allow you to display information about modulefiles, load or unload modulefiles, or view a list of available modulefiles.
Modulefiles do not affect packages other than their intended package. For example, a modulefile for a compiler will not adjust MPI_CC (the environment variable used by HP MPI to control which compiler to use). A modulefile for a compiler simply makes it easier to access that particular compiler; it does not try to determine how the compiler will be used.
Similarly, a modulefile for HP MPI will not try to adjust LD_LIBRARY_PATH to correspond to the compiler that the mpicc command uses. The modulefile for MPI simply makes it easier to access the mpi** scripts and libraries. You can specify the compiler it uses through a variety of mechanisms long after the modulefile is loaded.
The previous scenarios were chosen in particular because the HP MPI mpicc command uses heuristics to try to find a suitable compiler when MPI_CC or other default-overriding mechanisms are not in effect. It is possible that mpicc will choose a compiler inconsistent with the most recently loaded compiler module. This could cause inconsistencies in the use of shared objects. If you have multiple compilers (perhaps with incompatible shared objects) installed, it is probably wise to set MPI_CC (and others) explicitly to the commands made available by the compiler's modulefile.
The contents of the modulefiles in the modulefiles_hptc RPM use the vendor-intended location of the installed software. In many cases, this is under the /opt directory, but in a few cases (for example, the PGI compilers and TotalView) this is under the /usr directory.
If you install a software package in a location other than its intended place, you must create or edit an appropriate modulefile under the /opt/modules/modulefiles directory.
For the packages that install by default into the /usr directory (currently the PGI compilers and TotalView), their corresponding modulefiles will try their vendor-intended location under the /usr directory. If they do not find that directory, the modulefiles will also search under the /opt directory. Therefore, no changes to the modulefiles are needed if you install third-party software consistently as the vendor intended or consistently under the /opt directory.
If the package is the stable product intended to be used by the site, editing an existing modulefile is appropriate. While each modulefile has its unique characteristics, they all set some variables describing the top-level directory, and editing to adjust the string should be sufficient. You may need to repeat the adjustment if you update the modulefiles_hptc RPM or otherwise rebuild your system.
If the package is a variant, for example, a beta version of a compiler, first copy the default modulefile to a well-named copy, then edit the copy. You need root access to modify the modulefiles, which is generally needed to install packages in either the /opt or /usr directories.
If a user downloads a package into a private directory, the user can create a private modulefiles directory. The user can then copy the corresponding default modulefile from under the /opt/modules/modulefiles directory into a private modulefiles directory, edit the file, and then register the directory with the module use command.
2.2.2 Supplied Modulefiles
The HP XC system provides the Modules Package (not to be confused with Linux kernel modules) to configure and modify the user environment. The Modules Package enables dynamic modification of a user's environment by means of modulefiles.
A modulefile contains information that alters or sets shell environment variables, such as PATH and MANPATH. Modulefiles provide a convenient means for users to tailor their working environment. Modulefiles can be loaded automatically when the user logs in to the system or any time a user needs to alter the environment.
The HP XC System Software provides a number of modulefiles. In addition, users can also create and load their own modulefiles to modify their environment further.
The HP XC system supplies the modulefiles listed in Table 2-1.

Table 2-1: Supplied Modulefiles

Modulefile           Sets the HP XC User Environment:
icc/8.0              To use Intel C/C++ Version 8.0 compilers.
icc/8.1              To use Intel C/C++ Version 8.1 compilers.
ifort/8.0            To use Intel Fortran Version 8.0 compilers.
ifort/8.1            To use Intel Fortran Version 8.1 compilers.
intel/7.1            For Intel Version 7.1 compilers.
intel/8.0            For Intel Version 8.0 compilers.
intel/8.1            For Intel Version 8.1 compilers.
mlib/intel/7.1       For MLIB and Intel Version 7.1 compilers.
mlib/intel/8.0       For MLIB and Intel Version 8.0 compilers.
mlib/pgi/5.1         For MLIB and PGI Version 5.1 compilers.
mpi/hp               For HP-MPI.
pgi/5.1              For PGI Version 5.1 compilers.
pgi/5.2              For PGI Version 5.2 compilers.
idb/7.3              To use the Intel IDB debugger.
totalview/default    For the TotalView debugger.
2.2.3 Modulefiles Automatically Loaded on the System
The HP XC system does not load any modulefiles into your environment by default. However, there may be modulefiles designated by your system administrator that are automatically loaded. Section 2.2.5 describes how you can determine what modulefiles are currently loaded on your system.
Users can also automatically load their own modules by creating a login script and designating the modulefiles to be loaded in the script. Users can also add or remove modules from their current environment on a per-module basis as described in Section 2.2.6.
2.2.4 Viewing Available Modulefiles
Available modulefiles are modulefiles that have been provided with the HP XC system software and are available for you to load. A modulefile must be loaded before it provides changes to your environment, as described in the introduction to this section. You can view the modulefiles that are available on the system by issuing the module avail command:
$ module avail
2.2.5 Viewing Loaded Modulefiles
A loaded modulefile is a modulefile that has been explicitly loaded in your environment by the module load command. To view the modulefiles that are currently loaded in your environment, issue the module list command:
$ module list
2.2.6 Loading a Modulefile
You can load a modulefile into your environment to enable easier access to software that you want to use by executing the module load command. You can load a modulefile for the current session, or you can set up your environment to load the modulefile whenever you log in to the system.
When loading a modulefile, note that certain modulefiles cannot be loaded while other modulefiles are currently loaded. For example, this can happen with different versions of the same software. If a modulefile you are attempting to load conflicts with a currently loaded modulefile, the modulefile will not be loaded and an error message will be displayed.
If you encounter a modulefile conflict when loading a modulefile, you must unload the conflicting modulefile before you load the new modulefile. Refer to Section 2.2.8 for further information about modulefile conflicts.
2.2.6.1 Loading a Modulefile for the Current Session
You can load a modulefile for your current login session as needed. To do this, issue the module load command as shown in the following example, which illustrates the TotalView modulefile being loaded:
$ module load totalview
Loading a modulefile in this manner affects your environment for the current session only.
2.2.6.2 Automatically Loading a Modulefile at Login
If you frequently use one or more modulefiles that are not loaded when you log in to the system, you can set up your environment to automatically load those modulefiles for you. A method for doing this is to modify your shell startup script to include instructions to load the modulefile automatically.
For example, if you wanted to automatically load the TotalView modulefile when you log in, edit your shell startup script to include the following instructions. This example uses bash as the login shell. Edit the ~/.bashrc file as follows:
# if the 'module' command is defined, $MODULESHOME
# will be set
if [ -n "$MODULESHOME" ]; then
    module load totalview
fi
From now on, whenever you log in, the TotalView modulefile is automatically loaded in your environment.
2.2.7 Unloading a Modulefile
In certain cases, you may find it necessary to unload a particular modulefile before you can load another modulefile into your environment to avoid modulefile conflicts. Refer to Section 2.2.8 for information about modulefile conflicts.
You can unload a modulefile by using the module unload command, as shown in the following example:
$ module unload ifort/8.0
Unloading a modulefile that is loaded by default makes it inactive for the current session only — it will be reloaded the next time you log in.
2.2.8 Modulefile Conflicts
Some modulefiles should not be loaded while certain other modulefiles are currently loaded. This is especially true of modulefiles for different versions of the same software. For example, the Intel C/C++ Version 8.0 compiler modulefile should not be loaded while the Intel C/C++ Version 8.1 compiler modulefile is loaded. A modulefile conflict occurs in this situation.
The system will display an error message when you attempt to load a modulefile that conflicts with one or more currently loaded modulefiles. For example:
$ module load ifort/8.0
ifort/8.0(19):ERROR:150: Module 'ifort/8.0' conflicts with the currently loaded module(s) 'ifort/8.1'
ifort/8.0(19):ERROR:102: Tcl command execution failed: conflict ifort/8.1
In this example, the user attempted to load the ifort/8.0 modulefile, but after issuing the command to load the modulefile, an error message occurred indicating a conflict between this modulefile and the ifort/8.1 modulefile, which is already loaded.
When a modulefile conflict occurs, unload the conflicting modulefile(s) before loading the new modulefile. In the above example, you should unload the ifort/8.1 modulefile before loading the ifort/8.0 modulefile. For information about unloading a modulefile, refer to Section 2.2.7.
_________________________ Note _________________________
To avoid problems, it is recommended that you always unload one version of a modulefile before loading another version.
2.2.9 Creating a Modulefile
If you download or install a software package into a private directory, you can create your own (private) modulefile for products that you install by using the following general steps:
1. Create a private modulefiles directory.
2. Copy an existing modulefile (to use as a template), or copy the software’s corresponding default modulefile from under /opt/modules/modulefiles, into the private modulefiles directory.
3. Edit and modify the modulefile accordingly.
4. Register the private directory with the module use command.
A user installing a variant of a product/package already on the system should copy the existing modulefile for that product to an appropriate name, and edit it accordingly to accommodate the newly-installed product variant.
A user installing a random product/package should look at the manpages for modulefiles, examine the existing modulefiles, and create a new modulefile for the product being installed using existing modulefiles as a template. To view modules manpages, load the modules modulefile and then display the modulefile manpage:
$ module load modules
$ man modulefile
Users should also read the manpages for modules so that they know how to create a directory for their private modulefiles and how to use the module use <dirname> command to use their private modules.
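The following sketch pulls these steps together for a hypothetical package installed in a private directory. All of the directory, package, and path names shown here are illustrative and are not part of the HP XC distribution:

$ mkdir -p ~/private_modulefiles
$ cat > ~/private_modulefiles/mytool << 'EOF'
#%Module1.0
## Hypothetical modulefile for a privately installed package
prepend-path PATH    /home/jdoe/mytool-1.0/bin
prepend-path MANPATH /home/jdoe/mytool-1.0/man
EOF
$ module use ~/private_modulefiles
$ module load mytool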
2.2.10 Viewing Modulefile-Specific Help
You can view help information for any of the modulefiles on the HP XC system. For example, to access modulefile-specific help information for TotalView, issue the module help command as follows:
$ module help totalview
----------- Module Specific Help for ’totalview’ -----------------
This loads the TotalView environment.
Version 6.0

2.3 Launching and Managing Jobs Quick Start

This section provides a brief description of some of the many ways to launch jobs, manage jobs, and get information about jobs on an HP XC system. This section is intended only as a quick overview about some basic ways of running and managing jobs. Full information and details about the HP XC job launch environment are provided in the SLURM chapter (Chapter 6) and the LSF chapter (Chapter 7) of this manual.
2.3.1 Introduction
As described in Section 1.4, SLURM and LSF cooperate to run and manage jobs on the HP XC system, combining LSF's powerful and flexible scheduling functionality with SLURM's scalable parallel job launching capabilities.
SLURM is the low-level resource manager and job launcher, and performs processor allocation for jobs. LSF gathers information about the cluster from SLURM — when a job is ready to be launched, LSF creates a SLURM node allocation and dispatches the job to that allocation.
Although jobs can be launched directly using SLURM, it is recommended that you use LSF to take advantage of its scheduling and job management capabilities. SLURM options can be added to the LSF job launch command line to further define job launch requirements. The HP-MPI mpirun command and its options can be used within LSF to launch jobs that require MPI's high-performance message-passing capabilities.
When the HP XC system is installed, a SLURM partition of nodes is created to contain LSF jobs. This partition is called the lsf partition.
When a job is submitted to LSF, the LSF scheduler prioritizes the job and waits until the required resources (compute nodes from the lsf partition) are available. When the requested resources are available for the job, LSF-HPC creates a SLURM allocation of nodes on behalf of the user, sets the SLURM JobID for the allocation, and dispatches the job with the LSF-HPC JOB_STARTER script to the first allocated node.
A detailed explanation of how SLURM and LSF interact to launch and manage jobs is provided in Section 7.1.4.
2.3.2 Getting Information About Queues
The LSF bqueues command lists the configured job queues in LSF. By default, bqueues returns the following information about all queues: queue name, queue priority, queue status, job slot statistics, and job state statistics.
To get information about queues, enter the bqueues command as follows:
$ bqueues
Refer to Section 7.3.4 for more information about using this command and a sample of its output.
2.3.3 Getting Information About Resources
The LSF bhosts, lshosts, and lsload commands are quick ways to get information about system resources. LSF daemons run on only one node in the HP XC system, so the bhosts and lshosts commands will list one host — which represents all the resources of the HP XC system. The total number of processors for that host should be equal to the total number of processors assigned to the SLURM lsf partition.
• The LSF bhosts command provides a summary of the jobs on the system and information about the current state of LSF.
$ bhosts
Refer to Section 7.3.1 for more information about using this command and a sample of its output.
• The LSF lshosts command displays machine-specific information for the LSF execution host node.
$ lshosts
Refer to Section 7.3.2 for more information about using this command and a sample of its output.
• The LSF lsload command displays load information for the LSF execution host node.
$ lsload
Refer to Section 7.3.3 for more information about using this command and a sample of its output.
2.3.4 Getting Information About the System’s Partitions
Information about the system's partitions can be viewed with the SLURM sinfo command. The sinfo command reports the state of all partitions and nodes managed by SLURM and provides a wide variety of filtering, sorting, and formatting options. sinfo displays a summary of available partition and node (not job) information, such as partition names, nodes per partition, and CPUs per node.
$ sinfo
Refer to Section 7.3.5 for more information about using the sinfo command and a sample of its output.
2.3.5 Launching Jobs
To launch a job on an HP XC system, use the LSF bsub command. The bsub command submits batch jobs or interactive batch jobs to an LSF queue for execution.
This section provides some brief examples of how to launch some typical serial and parallel jobs. Refer to Section 7.4 for full information about launching jobs with the bsub command.
2.3.5.1 Submitting a Serial Job
Submitting serial jobs is discussed in detail in Section 7.4.3. The command format to submit a serial job is:
bsub [bsub-options] [srun [srun-options]] executable [executable-options]
Use the LSF bsub command to submit a job on the LSF-HPC execution host. The SLURM srun job launch command is only needed if the LSF-HPC JOB_STARTER script is not configured for the intended queue (but can be used regardless of whether or not the script is configured). You can use the bqueues command to confirm whether or not the JOB_STARTER script exists. The executable parameter is the name of an executable file or command. The use of srun is discussed in detail in Chapter 6.
Consider an HP XC configuration where lsfhost.localdomain is the LSF execution host, and nodes n[1-10] are compute nodes in the LSF partition. All nodes contain two processors, providing 20 processors for use by LSF jobs. The following example shows one way to submit a serial job on this system:
Example 2-1: Submitting a Serial Job
$ bsub -I srun hostname
Job <20> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
2.3.5.2 Submitting a Non-MPI Parallel Job
Submitting non-MPI parallel jobs is discussed in detail in Section 7.4.4. The LSF bsub command format to submit a simple non-MPI parallel job is:
bsub -n num-procs [bsub-options] srun [srun-options] executable [executable-options]
The bsub command submits the job to LSF-HPC. The -n num-procs parameter specifies the number of processors requested for the job. This parameter is required for parallel jobs. The inclusion of the SLURM srun command is required in the LSF-HPC command line to distribute the tasks on the allocated compute nodes in the LSF partition. The executable parameter is the name of an executable file or command.
Consider an HP XC configuration where lsfhost.localdomain is the LSF-HPC execution host and nodes n[1-10] are compute nodes in the SLURM lsf partition. All nodes contain two processors, providing 20 processors for use by LSF-HPC jobs. The following example shows one way to submit a non-MPI parallel job on this system:
Example 2-2: Submitting a Non-MPI Parallel Job
$ bsub -n4 -I srun hostname
Job <21> is submitted to default queue <normal>
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n1
n2
n2
In the above example, the job output shows that the job “srun hostname” was launched from the LSF execution host lsfhost.localdomain, and that it ran on four processors from the allotted nodes n1 and n2.
Refer to Section 7.4.4 for an explanation of the options used in this command, and for full information about submitting a parallel job.
Using SLURM Options with the LSF External Scheduler
An important option that can be included in submitting parallel jobs is LSF-HPC's external scheduler option. The LSF-HPC external SLURM scheduler provides additional capabilities at the job and queue levels by allowing the inclusion of several SLURM options in the LSF-HPC command line. For example, it can be used to submit a job to run one task per node, or to submit a job to run on only specified nodes.
The format for this option is:
-ext "SLURM[slurm-arguments]"
The slurm-arguments can consist of one or more srun allocation options (in long format). Refer to Section 7.4.2 for additional information about using the LSF-HPC external scheduler.
The Platform Computing LSF documentation provides more information on general external scheduler support. Also see the lsf_diff(1) manpage for information on the specific srun options available in the external SLURM scheduler.
The following example uses the external SLURM scheduler to submit one task per node (on SMP nodes):
Example 2-3: Submitting a Non-MPI Parallel Job to Run One Task per Node
$ bsub -n4 -ext "SLURM[nodes=4]" -I srun hostname
Job <22> is submitted to default queue <normal>
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n2
n3
n4
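Other srun allocation options can be passed in the same way. For instance, the following sketch requests specific nodes by name; the node names are illustrative, and you should consult the lsf_diff(1) manpage for the options supported on your system:

$ bsub -n2 -ext "SLURM[nodelist=n[1-2]]" -I srun hostname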
2.3.5.3 Submitting an MPI Job
Submitting MPI jobs is discussed in detail in Section 7.4.5. The bsub command format to submit a job to HP-MPI by means of the mpirun command is:
bsub -n num-procs [bsub-options] mpirun [mpirun-options] [-srun [srun-options]] mpi-jobname [job-options]
The -srun option is required by the mpirun command to run jobs in the LSF partition. The -n num-procs parameter specifies the number of processors the job requests; it is required for parallel jobs. Any SLURM srun options that are included are job specific, not allocation-specific.
Using SLURM Options in MPI Jobs with the LSF External Scheduler
An important option that can be included in submitting HP-MPI jobs is LSF’s external scheduler option. The LSF external scheduler provides additional capabilities at the job level and queue level by allowing the inclusion of several SLURM options in the LSF command line. For example, it can be used to submit a job to run one task per node, or to submit a job to run on specific nodes. This option is discussed in detail in Section 7.4.2. An example of its use is provided in this section.
Consider an HP XC configuration where lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the LSF partition. All nodes contain two processors, providing 20 processors for use by LSF jobs.
Example 2-4: Running an MPI Job with LSF
$ bsub -n4 -I mpirun -srun ./hello_world
Job <24> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
Hello world!
Hello world! I'm 1 of 4 on host1
Hello world! I'm 3 of 4 on host2
Hello world! I'm 0 of 4 on host1
Hello world! I'm 2 of 4 on host2
Example 2-5: Running an MPI Job with LSF Using the External Scheduler Option

$ bsub -n4 -ext "SLURM[nodes=4]" -I mpirun -srun ./hello_world
Job <27> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
Hello world!
Hello world! I'm 1 of 4 on host1
Hello world! I'm 2 of 4 on host2
Hello world! I'm 3 of 4 on host3
Hello world! I'm 4 of 4 on host4
2.3.5.4 Submitting a Batch Job or Job Script
Submitting batch jobs is discussed in detail in Section 7.4.6. The bsub command format to submit a batch job or job script is:
bsub -n num-procs [bsub-options] script-name
The -n num-procs option specifies the number of processors the job requests; it is required for parallel jobs.
The script-name argument is the name of the batch job or script. The script can contain one or more srun or mpirun commands.
The script will execute once on the first allocated node, and the srun or mpirun commands within the script will be run on the allocated compute nodes.
Consider an HP XC configuration where lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the LSF partition. All nodes contain two processors, providing 20 processors for use by LSF jobs.
In this example, the following simple script myjobscript.sh is launched. Two srun commands are specified within the script.
Example 2-6: Submitting a Job Script
#!/bin/sh
srun hostname
mpirun -srun hellompi
The following command submits this script:
$ bsub -I -n4 myjobscript.sh
The -n4 option specifies that four processors are required. The -I option specifies that the job is interactive and directs output to the terminal screen. The following output is displayed:
Job <29> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n2
n2
n4
n4
Hello world! I'm 0 of 4 on n2
Hello world! I'm 1 of 4 on n2
Hello world! I'm 2 of 4 on n4
Hello world! I'm 3 of 4 on n4
2.3.6 Getting Information About Your Jobs
You can obtain information about your running or completed jobs with the bjobs and bhist commands.
bjobs    Checks the status of a running job (Section 7.5.2)
bhist    Gets brief or full information about finished jobs (Section 7.5.3)
The components of the actual SLURM allocation command can be seen with the bjobs -l and bhist -l LSF commands.
2.3.7 Stopping and Suspending Jobs
You can suspend or stop your jobs with the bstop and bkill commands.
You can use the LSF bstop command to stop or suspend an LSF job.
You can use the LSF bkill command to kill an LSF job.
2.3.8 Resuming Suspended Jobs
You can use the LSF bresume command to resume a stopped or suspended job.
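Putting the previous two sections together, a typical sequence might look like the following sketch (the job ID 30 is illustrative only):

$ bstop 30
$ bresume 30
$ bkill 30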

2.4 Performing Other Common User Tasks

This section contains general information and assorted topics about using the HP XC system.
2.4.1 Determining the LSF Cluster Name and LSF Execution Host
The lsid command returns the LSF cluster name, the LSF version, and the name of the LSF execution host.
$ lsid
Platform LSF HPC 6.0 for SLURM, Sep 23 2004
Copyright 1992-2004 Platform Computing Corporation
My cluster name is hptclsf
My master name is lsfhost.localdomain
In this example, hptclsf is the LSF cluster name, and lsfhost.localdomain is the node where LSF is installed and runs (the LSF execution host).
2.4.2 Installing Third-Party Software
If you intend to download or install a third-party software package, contact your system administrator to ensure that you perform the proper installation and setup requirements on the HP XC system. For example, if you install a third-party software package into a private directory, you should create a modulefile for it. Refer to Section 2.2.9 for information about creating a modulefile.

2.5 Getting System Help and Information

In addition to the hardcopy documentation described in the preface of this manual (About This Document), the HP XC system also provides system help and information in the form of online manpages.
Manpages provide online reference and command information from the system command line. Manpages are supplied with the HP XC system for standard HP XC components, Linux user commands, LSF commands, SLURM commands, and other software components that are
distributed with the HP XC cluster, such as HP-MPI. Manpages for third-party vendor software components may be provided as a part of the deliverables for that software component.
To access manpages, type the man command with the name of a command. For example:
$ man sinfo
This command accesses the manpage for the SLURM sinfo command.
If you are unsure of which command you need to reference, you can enter the man command with the -k option and a keyword to get a list of commands that are related to the keyword you entered. For example:
$ man -k keyword
3

Developing Applications

This chapter discusses topics associated with developing applications in the HP XC environment. Before reading this chapter, you should read and understand Chapter 1 and Chapter 2.
This chapter discusses the following topics:
HP XC application development environment overview (Section 3.1)
Using compilers (Section 3.2)
Checking nodes and partitions before running jobs (Section 3.3)
Interrupting a job (Section 3.4)
Setting debugging options (Section 3.5)
Developing serial applications (Section 3.6)
Developing parallel applications (Section 3.7)
Developing libraries (Section 3.8)
Advanced topics (Section 3.9)

3.1 Overview

The HP XC cluster provides an application development environment that enables developing, building, and running applications using multiple nodes with multiple processors. These applications can be parallel applications using many processors, or serial applications using a single processor.
The HP XC cluster is made up of nodes that are assigned one or more various roles. Of importance to the application developer are the nodes that have the compute role and the login role (compute nodes and login nodes). Compute nodes run user applications. Login nodes are where you log in and interact with the system to perform various tasks, such as executing commands, compiling and linking applications, and launching applications. A login node can also execute single-processor applications and commands, just as on any other standard Linux system. Applications are launched from login nodes, and then distributed and run on one or more compute nodes.
The HP XC environment uses the LSF batch job scheduler to launch and manage parallel and serial applications. When a job is submitted, LSF places the job in a queue and allows it to run when the necessary resources become available. When a job is completed, LSF returns job output, job information, and any errors. In addition to batch jobs, LSF can also run interactive batch jobs and interactive jobs. An LSF interactive batch job is a batch job that allows you to interact with the application, yet still take advantage of LSF scheduling policies and features. An LSF interactive job is run without using LSF's batch processing features, but is dispatched immediately by LSF on the LSF execution host node. LSF is described in detail in Chapter 7.
Regardless of whether an application is parallel or serial, or whether it is run interactively or as a batch job, the general steps to developing an HP XC application are as follows:
1. Build the code by compiling and linking with the correct compiler. Note that compiler selection, and set up of appropriate parameters for specific compilers, is made easier by the use of modules.
2. Launch the application with the bsub, srun, or mpirun command. (A brief end-to-end sketch follows below.)
The build and launch commands are executed from the node to which you are logged in.
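As an illustration of these two steps, the following sketch builds a small MPI program and launches it through LSF. The source and program names are placeholders, and it is assumed that the HP-MPI modulefile is loaded so that the mpicc wrapper and the mpirun command are on your path:

$ mpicc -o hello_world hello_world.c
$ bsub -n4 -I mpirun -srun ./hello_world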

3.2 Using Compilers

You can use compilers acquired from other vendors on an HP XC system. For example, HP XC supports Intel C/C++ and Fortran compilers for the 64-bit architecture, and Portland Group C/C++ and Fortran compilers for the XC4000 platform.
You can use other compilers and libraries on the HP XC system as on any other system, provided they contain single-processor routines and have no dependencies on another message-passing system.
3.2.1 Standard Linux Compilers
The standard Linux C compiler (gcc), C++ compiler (g++), and Fortran 77 compiler (g77) and libraries are supported on the HP XC system. They perform as described in the Linux documentation.
The HP XC System Software supplies these compilers by default.
3.2.2 Intel Compilers
The information in this section pertains to using Intel compilers in the HP XC environment on the 64-bit architecture platform. Intel compilers are not supplied with the HP XC system. The following Intel compilers are supported for this version of HP XC:
Intel Fortran and C/C++ Version 8.0 compilers
Intel Fortran and C/C++ Version 7.1 compilers
Use one of the commands listed in Table 3-1 to compile programs with Intel compilers, depending upon what Intel compilers are installed on your system.

Table 3-1: Intel Compiler Commands

Compiler                  Command
Fortran V8.1, V9.0 beta   ifort
C/C++ V8.1, V9.0 beta     icc
Fortran V7.1*             efc
C/C++ V7.1*               ecc
* These compilers can be used, but will not be supported by Intel much longer.
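For example, once the appropriate Intel compiler modulefile is loaded, a build might look like the following sketch (the source and output file names are placeholders):

$ module load intel/8.0
$ icc -o myprog myprog.c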
3.2.3 PGI Compilers
The information in this section pertains to using PGI compilers in the HP XC environment on the XC4000 platform. The following PGI compilers are supported for this version of HP XC:
PGI Fortran 95 and C/C++ compilers
PGI Fortran 77 and C/C++ compilers
Use one of the commands listed in Table 3-2 to compile programs with PGI compilers, depending upon what PGI compilers are installed on your system.

Table 3-2: PGI Compiler Commands

Compiler     Command
Fortran 95   pgf95
C            pgcc
Fortran 77   pgf77
C++          pgCC
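Similarly, with a PGI compiler modulefile loaded, a Fortran build might look like this sketch (the file names are placeholders):

$ module load pgi/5.2
$ pgf95 -o myprog myprog.f90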
3.2.4 Pathscale Compilers
Compilers in the Pathscale EKOPath Version 2.1 Compiler Suite are supported on HP XC4000 systems only. See the following Web site for more information: http://www.pathscale.com/ekopath.html.
3.2.5 MPI Compiler
The HP XC System Software includes MPI. The MPI library on the HP XC system supports HP MPI 2.1.

3.3 Checking Nodes and Partitions Before Running Jobs

Before launching an application, you can determine the availability and status of the system's nodes and partitions. Node and partition information is useful to have before launching a job so that you can launch the job to properly match the resources that are available on the system.
When invoked with no options, the SLURM sinfo command returns information about node availability and partitions, along with other information:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE  NODELIST
lsf       up    infinite      4 down*  n[12-15]
slurm*    up    infinite      2 idle   n[10-11]
The previous sinfo output shows that there are two partitions on the system:
• one for LSF-HPC jobs
• one for SLURM jobs
The asterisk in the PARTITION column indicates the default partition. An asterisk in the STATE column indicates nodes that are currently not responding.
Refer to Chapter 6 for information about using the sinfo command. The SLURM sinfo manpage also provides detailed information about the sinfo command.

3.4 Interrupting a Job

A job launched by the srun command can be interrupted by sending a signal to the command by issuing one or more Ctrl/C key sequences. Signals sent to the srun command are automatically forwarded to the tasks that it is controlling.
The Ctrl/C key sequence will report the state of all tasks associated with the srun command. If the Ctrl/C key sequence is entered twice within one second, the associated SIGINT signal will be sent to all tasks. If a third Ctrl/C key sequence is entered, the job will be terminated without waiting for remote tasks to exit.
The Ctrl/Z key sequence is ignored.

3.5 Setting Debugging Options

In general, the debugging information for your application that is needed by most debuggers can be produced by supplying the -g switch to the compiler. For more specific information about debugging options, refer to the documentation and manpages associated with your compiler.
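For example, using the GNU C compiler as an illustration (the file names are placeholders), a debug build looks like this:

$ gcc -g -o myprog myprog.c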

3.6 Developing Serial Applications

This section describes how to build and run serial applications in the HP XC environment. The following topics are covered:
Section 3.6.1 describes the serial application programming model.
Section 3.6.2 discusses how to build serial applications.
For further information about developing serial applications, refer to the following sections:
Section 4.1 describes how to debug serial applications.
Section 6.4 describes how to launch applications with the srun command.
Section A.1 provides examples of serial applications.
3.6.1 Serial Application Build Environment
The HP XC programming environment supports building and running serial applications. A serial application is a command or application that does not use any form of parallelism.
An example of a serial application is a standard Linux command, such as the ls or hostname command. A serial application is basically a single-processor application that has no communication library calls such as MPI.
3.6.1.1 Using MLIB in Serial Applications
Information about using HP MLIB in serial applications is provided in Chapter 9. The HP MLIB Product Overview and HP MLIB User's Guide are available on the HP XC Documentation CD.
3.6.2 Building Serial Applications
This section discusses how to build serial applications on an HP XC system. Compiling, linking, and running serial applications are discussed.
To build a serial application, you must be logged in to an HP XC node with the login role. Serial applications are compiled and linked by invoking compilers and linkers directly.
You launch a serial application either by submitting it to LSF with the bsub command, or by invoking the srun command to run it. The process is similar to launching a parallel application, except that only one compute node processor is used. To run on a compute node processor, the serial application and any required dynamic libraries must be accessible from that node. A serial application can also be tested locally by running it on the login node.
3.6.2.1 Compiling and Linking Serial Applications
Serial applications are compiled and linked by invoking compile and link drivers directly. You can change compilers by using modules. Refer to Section 2.2 for information about using modules.
As an alternative to using dynamic libraries, serial applications can also be linked to static libraries. Often, the -static option is used to do this.
Refer to Section A.1 for examples of building serial applications with the GNU C, GNU Fortran, Intel C/C++, and Intel Fortran compilers.
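For illustration, a statically linked build of a hypothetical serial program with the GNU C compiler might look like this (the file names are placeholders):

$ gcc -static -o myserial myserial.c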

3.7 Developing Parallel Applications

This section describes how to build and run parallel applications. The following topics are discussed:
The parallel build environment (Section 3.7.1)
Building parallel applications (Section 3.7.2)
For further information about developing parallel applications in the HP XC environment, refer to the following sections:
Launching applications with the srun command (Section 6.4)
Advanced topics related to developing parallel applications (Section 3.9)
Debugging parallel applications (Section 4.2)
3.7.1 Parallel Application Build Environment
This section discusses the parallel application build environment on an HP XC system. The HP XC parallel application environment allows parallel application processes to be started and stopped together on a large number of application processors, along with the I/O and process control structures to manage these kinds of applications.
HP XC supports the HP-MPI distributed memory programming model for building and running parallel applications. In addition to using HP-MPI for parallel application development, HP XC supports the use of OpenMP and Pthreads. OpenMP and Pthreads can be used in conjunction with HP-MPI or separately. This section discusses these development tools as they relate to the HP XC system.
3.7.1.1 Modulefiles
The basics of your working environment are set up automatically by your system administrator during the installation of HP XC. However, your application development environment can be modified by means of modulefiles, as described in Section 2.2.
There are modulefiles available that you can load yourself to further tailor your environment to your specific application development requirements. For example, the TotalView module is available for debugging applications. Section 2.2 provides instructions on how to list the modulefiles that are available for you to load, and how to load a modulefile.
If you encounter problems accessing tools or commands (and associated manpages), check to ensure that the required modules are loaded on your system. If necessary, load the required modules yourself, as described in Section 2.2. Otherwise, contact your administrator.
3.7.1.2 HP-MPI
HP XC supports the HP-MPI distributed memory programming model for building and running parallel applications. In this model, all data is private to each process. All interprocessor communication within a parallel application is performed through calls to the HP-MPI message passing library. Even though support for applications that use a shared-memory programming model is not available at this time, individual processors within an application node can be used in the same application as separate HP-MPI tasks. Applications that are MPI-based, and currently run on Linux (or Linux compatible) systems, can be easily migrated to an HP XC cluster. For information about using HP-MPI in the HP XC environment, refer to Chapter 8.
3.7.1.3 OpenMP
The OpenMP specification is a set of compiler directives that can be used to specify shared-memory parallelism in Fortran and C/C++ programs. Both the Intel and Portland Group Fortran and C/C++ compilers support OpenMP.
Although OpenMP is designed for use on shared-memory architectures, OpenMP can be used on an HP XC system within a node.
OpenMP can be used alone, or in conjunction with HP-MPI. For information about compiling programs using OpenMP, refer to the OpenMP documentation.
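As an illustration only, the following minimal OpenMP program (a hypothetical file omp_hello.c) can be compiled with the Intel compiler's -openmp switch; the PGI compilers use -mp, so consult your compiler documentation for the exact option:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    /* Each thread in the parallel region reports its thread number. */
    #pragma omp parallel
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}

$ icc -openmp -o omp_hello omp_hello.c
$ srun -n1 -N1 ./omp_hello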
3.7.1.4 Pthreads
POSIX Threads (Pthreads) is a standard library that programmers can use to develop portable threaded applications. Pthreads can be used in conjunction with HP-MPI on the HP XC system.
Compilers from GNU, Intel and PGI provide a -pthread switch to allow compilation with the Pthread library.
Packages that link against Pthreads, such as MKL and MLIB, require that the application is linked using the -pthread option. The Pthread option is invoked with the following compiler-specific switches:
GNU      -pthread
Intel    -pthread
PGI      -lpgthread
For example:
$ mpicc object1.o ... -pthread -o myapp.exe
3.7.1.5 Quadrics SHMEM
The Quadrics implementation of SHMEM is supported on HP XC systems with Quadrics switches. SHMEM is a collection of high-performance routines (that support a distributed-memory model) for data passing between parallel executables.
To compile programs that use SHMEM, it is necessary to include the shmem.h file and to use the SHMEM and Elan libraries. For example:
$ gcc -o shping shping.c -lshmem -lelan
3.7.1.6 MLIB Math Library
The HP MLIB mathematical library is included in the HP XC System Software and is installed by default.
HP MLIB contains mathematical software and computational kernels for engineering and scientific applications involving linear equations, least squares, eigenvalue problems, singular value decomposition, vector and matrix computations, convolutions, and Fourier Transforms. This release of HP XC MLIB has four components: VECLIB, LAPACK, ScaLAPACK, and SuperLU_DIST.
You must install the Intel compilers for the 64-bit architecture, or PGI compilers for XC4000, in order to use the HP XC MLIB mathematical library. See your system administrator if the required compilers are not installed on your system.
Information about using HP MLIB is provided in Chapter 9. An HP XC MLIB Product Overview and User’s Guide are available on the HP XC Documentation CD-ROM.
3.7.1.7 MPI Library
The MPI library supports MPI 1.2 as described in the 1997 release of MPI: A Message Passing Interface Standard. Users should note that the MPI specification describes the application programming interface, but does not specify the contents of the MPI header files, mpi.h and mpif.h, that are included in the source code. Therefore, an MPI application must be recompiled using the proper header files for the MPI library to which it is to be linked.
Parallel applications that use MPI for communication must include the HP XC infrastructure libraries. MPI applications must be built with the mpicc, mpic++, mpif77, or mpif90 utilities.
When an MPI application is launched, the user environment, including any MPI environment variables that have been set, is passed to the application.
MPI profiling support is included in the HP XC MPI library, so you do not need to link with a separate library to access the PMPI_xxx() versions of the MPI routines.
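For example, the standard MPI profiling interface allows an application to interpose its own wrapper for an MPI routine and then call the underlying implementation through its PMPI_xxx() entry point. The following is a minimal sketch of such a wrapper for MPI_Send (illustrative only, not HP XC-specific code):

#include <mpi.h>
#include <stdio.h>

/* Wrapper intercepts MPI_Send, logs the call, and then invokes the
   real routine through its profiling entry point, PMPI_Send. */
int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    int rank;
    PMPI_Comm_rank(comm, &rank);
    printf("rank %d: MPI_Send of %d elements to rank %d\n", rank, count, dest);
    return PMPI_Send(buf, count, datatype, dest, tag, comm);
}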
The HP XC cluster comes with a modulefile for HP-MPI. The mpi modulefile is used to set up the necessary environment to use HP-MPI, such as the values of the search paths for header and library files.
Refer to Chapter 8 for information and examples that show how to build and run an HP-MPI application.
3.7.1.8 Intel Fortran and C/C++ Compilers
Intel Fortran compilers (Version 7.x and greater) are supported on the HP XC cluster. However, the HP XC cluster does not supply a copy of the Intel compilers. Intel compilers must be obtained directly from the vendor. Refer to Intel documentation for information about using these compilers.
3.7.1.9 PGI Fortran and C/C++ Compilers
PGI Fortran 95, Fortran 77, and C/C++ compilers are supported on the HP XC cluster. However, the HP XC cluster does not supply a copy of PGI compilers. PGI compilers must be obtained directly from the vendor. Refer to PGI documentation for information about using these compilers.
3.7.1.10 GNU C and C++ Compilers
The GNU C and C++ compilers are supported on the HP XC cluster. The HP XC cluster supplies copies of the GNU C and C++ compilers.
3.7.1.11 GNU Parallel Make
The GNU parallel Make command is used whenever the make command is invoked. GNU parallel Make provides the ability to do a parallel Make; however, all compiling takes place on the login node. Therefore, whether a parallel make improves build time depends upon how many processors are on the login node and the load on the login node.
Information about using the GNU parallel Make is provided in Section 3.9.1. For further information about using GNU parallel Make, refer to the make manpage. For additional sources of GNU information, refer to the references provided in the front of this manual, located in About This Document.
3.7.1.12 MKL Library
MKL is a math library that references Pthreads and, in enabled environments, can use multiple threads. MKL can be linked in a single-threaded manner with your application by specifying the following in the link command:
On XC3000 and XC4000 systems:
-L/opt/intel/mkl70/lib/32 -lmkl_ia32 -lguide -pthread
On XC6000 systems:
-L/opt/intel/mkl70/lib/64 -lmkl_ipf -lguide -pthread
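For example, a complete link line on an XC6000 system might look like the following sketch, which assumes the Intel C compiler and a hypothetical source file myapp.c:
$ icc myapp.c -L/opt/intel/mkl70/lib/64 -lmkl_ipf -lguide -pthread -o myapp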
3.7.1.13 ACML Library
HP XC supports the AMD Core Math Library (ACML library) on XC4000.
3.7.1.14 Other Libraries
Other libraries can be used as they would be on any other system. However, they must contain single-processor routines and have no dependencies on another message-passing system.
3.7.1.15 Reserved Symbols and Names
The HP XC system reserves certain symbols and names for internal use. Reserved symbols and names should not be included in user code. If a reserved symbol or name is used, errors could occur.
3.7.2 Building Parallel Applications
This section describes how to build MPI and non-MPI parallel applications on an HP XC system.
3.7.2.1 Compiling and Linking Non-MPI Applications
If you are building non-MPI applications, such as an OpenMP application, you can compile and link them on an HP XC system as you normally would, with standard header files and switches.
3.7.2.2 Compiling and Linking HP-MPI Applications
This section provides some general information about how to build an HP-MPI application in the HP XC environment. Refer to Chapter 8 for complete details, examples, and further information about building HP-MPI applications.
Compiling and linking an MPI application on an HP XC system is performed by invoking the HP-MPI compiler utilities. HP-MPI compiler utilities are scripts supplied by HP-MPI to make it easier to invoke a compiler with the appropriate libraries and search paths. The HP-MPI compiler utilities add all the necessary path locations and library specifications to the compile and link steps that are required to build an HP XC parallel application. It is highly recommended that you use the HP-MPI compiler utilities to compile and link your MPI application on an HP XC cluster, rather than invoke a compiler directly.
The mpicc, mpic++, mpif90, and mpif77 MPI compiler commands are used to invoke the HP-MPI compiler utilities that compile and link an MPI application. The mpicc and mpic++ commands invoke the drivers of the C and C++ compilers. The mpif77 and mpif90 commands invoke the drivers of the Fortran 77 and Fortran 90 compilers.
Before you can compile and link an MPI program using the MPI compiler commands, the MPI compiler utilities module (mpi) must be loaded, or you must arrange for the utilities to be in your $PATH search list. The use of modules is described in Section 2.2.
3.7.2.3 Examples of Compiling and Linking HP-MPI Applications
The following examples show how to compile and link your application code by invoking a compiler utility.
If you have not already loaded the mpi compiler utilities module, load it now as follows:
$ module load mpi
To compile and lin k a C application using the mpicc command:
$ mpicc -o mycode hello.c
To compile and link a Fortran application using the mpif90 command:
$ mpif90 -o mycode hello.f
In the above examples, the HP-MPI commands invoke compiler utilities, which call the C and Fortran compilers with the appropriate libraries and search paths specified to build the parallel application called hello. The -o option specifies that output is directed to a file called mycode.
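For reference, hello.c in the examples above could be a minimal MPI program along the following lines (shown only as an illustrative sketch):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}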
Refer to Chapter 8 for additional information about building applications with HP-MPI.

3.8 Developing Libraries

This section discusses developing shared and archive libraries for HP XC applications. Building a library generally consists of two phases:
Compiling sources to objects
Assembling the objects into a library
- Using the ar archive tool for archive (.a) libraries
- Using the linker (possibly indirectly by means of a compiler) for shared (.so) libraries
For sufficiently small shared objects, it is often possible to combine the two steps. A common technique is to build the archive library first, and then build the shared library from the archive library (using the linker’s -whole-archive switch).
For libraries that do not use HP-MPI, it is recommended that the sources be compiled with the standard compilers (such as gcc), just as they would be on other UNIX-like platforms. For libraries that do use HP-MPI, it is possible to use the HP-MPI compiler utilities (such as mpicc) to compile the sources to objects. For example:
$ mpicc -c -g foo.c
To assemble an archive library, use the ar archive tool as you would on other UNIX-like platforms. To assemble a shared library, use the linker (possibly indirectly by means of a compiler) as you would on other UNIX-like platforms.
Once the library is built, it can be used to build applications, just as other libraries are used, for both serial applications (with the standard compilers) and parallel applications (with the HP-MPI compiler utilities).
Note that for shared libraries it is necessary to use LD_LIBRARY_PATH to include the directory containing the shared library, just as you would on other UNIX-like platforms.
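As an illustration of the whole sequence, the following sketch builds an archive library and a shared library with the standard GNU compiler from two hypothetical source files, foo.c and bar.c; -fPIC is the switch typically required for objects that go into a shared library:
$ gcc -c -g -fPIC foo.c bar.c
$ ar rcs libmystuff.a foo.o bar.o
$ gcc -shared -o libmystuff.so foo.o bar.o
Alternatively, the shared library could be built from the archive with the linker's -whole-archive switch, for example:
$ gcc -shared -o libmystuff.so -Wl,--whole-archive libmystuff.a -Wl,--no-whole-archive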
3.8.1 Designing Libraries for XC4000
This section discusses the issues surrounding the design of libraries for XC4000 on the HP XC system.
A user designing a library for use on an HP XC XC4000 system can supply a 32-bit library and/or a 64-bit library. HP recommends that both are supported to provide flexibility, and to make it easy to get the 64-bit advantages locally, but be able to take the 32-bit variant to an x86-class machine or run a 32-bit variant imported from an x86-class machine.
It is the library designer’s responsibility to make sure 32-bit and 64-bit object files do not collide during the build process. This can be done by "cleaning" object files from the directories between builds, or (as is more common) maintaining separate directories for the different types of objects. Separate directories also make it easy to maintain production versions distinct from debuggable versions.
Different compilers have different ways to select 32-bit or 64-bit compilations and links. Consult the documentation for the compiler for this information.
For released libraries, dynamic and archive, the usual custom is to have a ../lib directory that contains the libraries. This, by itself, will work if the 32-bit and 64-bit libraries have different names. However, HP recommends an alternative method. The dynamic linker, during its attempt to load libraries, will suffix candidate directories with the machine type. For 32-bit binaries on XC4000, it uses i686, and for 64-bit binaries it uses x86_64. HP recommends structuring directories to reflect this behavior. Therefore, if your released directory structure looks like Example 3-1, then it is only necessary to ensure that LD_LIBRARY_PATH has /opt/mypackage/lib in it, which will then be able to handle both 32-bit and 64-bit binaries that have linked against libmystuff.so.
Example 3-1: Directory Structure
/opt/mypackage/
include/
mystuff.h
lib/
i686/
libmystuff.a libmystuff.so
x86_64/
libmystuff.a libmystuff.so
If you have an existing paradigm using different names, HP recommends introducing links with the above names. An example of this is shown in Example 3-2.
Example 3-2: Recommended Directory Structure
/opt/mypackage/
include/
mystuff.h
lib/
32/
libmystuff.a libmystuff.so
64/
libmystuff.a libmystuff.so
i686 -> 32
x86_64 -> 64
Linking an application using the library (dynamic or archive) requires you to specify the appropriate subdirectory, depending on whether the application is 32-bit or 64-bit.
For example, to build a 32-bit application, you might enter:
<linkcommand> <32-bit> -L/opt/mypackage/lib/i686 -lmystuff
To build a 64-bit application, you might enter:
<linkcommand> <64-bit> -L/opt/mypackage/lib/x86_64 -lmystuff
Note that there is no shortcut as there is for the dynamic loader.
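For instance, with the GNU compiler (shown only as an illustration; the option names differ between compilers), the <32-bit> and <64-bit> placeholders above become -m32 and -m64, and myapp.c is a hypothetical source file:
$ gcc -m32 myapp.c -L/opt/mypackage/lib/i686 -lmystuff -o myapp32
$ gcc -m64 myapp.c -L/opt/mypackage/lib/x86_64 -lmystuff -o myapp64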

3.9 Advanced Topics

This section discusses topics of an advanced nature pertaining to developing applications in the HP XC environment.
3.9.1 Using the GNU Parallel Make Capability
By default, the make command invokes the GNU make program. GNU make has the ability to make independent targets concurrently. For example, if building a program requires compiling 10 source files, and the compilations can be done independently, make can manage multiple compilations at once; the number of jobs is user selectable. More precisely, each target's rules are run normally (sequentially within the rule). Typically, the rules for an object file target are a single compilation line, so it is common to talk about concurrent compilations, though GNU make is more general.
On non-cluster platforms or command nodes, matching concurrency to the number of processors often works well. It also often works well to specify a few more jobs than processors so that one job can proceed while another is waiting for I/O. On an HP XC system, there is the potential to use compute nodes to do compilations, and there are a variety of ways to make this happen.
One way is to prefix the actual compilation line in the rule with an srun command. So, instead of executing cc foo.c -o foo.o it would execute srun cc foo.c -o foo.o. With concurrency, multiple command nodes would have multiple srun commands instead of multiple cc commands. For projects that recursively run make on subdirectories, the recursive
make can be run on the compute nodes. For example:
$ cd subdir; srun $(MAKE)...
Further, if the recursive make is run remotely, it can be told to use concurrency on the remote node. For example:
$ cd subdir; srun -n1 -N1 $(MAKE) -j4...
This can cause multiple makes to run concurrently, each building their targets concurrently. The -N1 option is used to reserve the entire node, because it is intended to be used for multiple compilations. The following examples illustrate these ideas. In GNU make, a $(VARIABLE) that is unspecified is replaced with nothing. Therefore, not specifying PREFIX keeps the original makefile's behavior, but setting PREFIX to an appropriate srun command prefix causes concurrency within the build.
For more information about GNU parallel make, refer to the make manpage. For additional sources of GNU information, refer to the references provided in About This Document.
In this section, three different ways to parallelize a make procedure are illustrated. The smg98 package is used to illustrate these three procedures. The smg98 package is available at the following URL:
http://www.llnl.gov/asci/applications/SMG98README.html
These procedures take advantage of the GNU make -j switch, which specifies the number of jobs to run simultaneously. Refer to the make manpage for more information about this switch.
The following parallel make approaches are described:
Section 3.9.1.1: Go through the directories serially and have the make procedure within each directory be parallel (modified makefile: Makefile_type1).
Section 3.9.1.2: Go through the directories in parallel and have the make procedure within each directory be serial (modified makefile: Makefile_type2).
Section 3.9.1.3: Go through the directories in parallel and have the make procedure within each directory be parallel (modified makefile: Makefile_type3).
The original makefile is shown below:
#BHEADER***********************************************************
# (c) 1998 The Regents of the University of California
#
# See the file COPYRIGHT_and_DISCLAIMER for a complete copyright
# notice, contact person, and disclaimer.
#
# $Revision: 1.1 $
#EHEADER***********************************************************

SHELL = /bin/sh

srcdir = .

HYPRE_DIRS =\
        utilities\
        struct_matrix_vector\
        struct_linear_solvers\
        test

all:
        @\
        for i in ${HYPRE_DIRS}; \
        do \
        if [ -d $$i ]; \
        then \
        echo "Making $$i ..."; \
        (cd $$i; make); \
        echo ""; \
        fi; \
        done

clean:
        @\
        for i in ${HYPRE_DIRS}; \
        do \
        if [ -d $$i ]; \
        then \
        echo "Cleaning $$i ..."; \
        (cd $$i; make clean); \
        fi; \
        done

veryclean:
        @\
        for i in ${HYPRE_DIRS}; \
        do \
        if [ -d $$i ]; \
        then \
        echo "Very-cleaning $$i ..."; \
        (cd $$i; make veryclean); \
        fi; \
        done
3.9.1.1 Example Procedure 1
Go through the directories serially and have the make procedure within each directory be parallel.
For the purpose of this exercise we are only parallelizing the “make all” component. The “clean” and “veryclean” components can be parallelized in a similar fashion.
Modified makefile:
all:
        @\
        for i in ${HYPRE_DIRS}; \
        do \
        if [ -d $$i ]; \
        then \
        echo "Making $$i ..."; \
        echo $(PREFIX) $(MAKE) $(MAKE_J) -C $$i; \
        $(PREFIX) $(MAKE) $(MAKE_J) -C $$i; \
        fi; \
        done
By modifying the makefile to reflect the changes illustrated above, we now process each directory serially and parallelize the individual makes within each directory. The modified Makefile is invoked as follows:
$ make PREFIX='srun -n1 -N1' MAKE_J='-j4'
3.9.1.2 Example Procedure 2
Go through the directories in parallel and have the make procedure within each directory be serial.
For the purpose of this exercise we are only parallelizing the “make all” component. The “clean” and “veryclean” components can be parallelized in a similar fashion.
Modified makefile:
all:
        $(MAKE) $(MAKE_J) struct_matrix_vector/libHYPRE_mv.a struct_linear_solvers/libHYPRE_ls.a utilities/libHYPRE_utilities.a
        $(PREFIX) $(MAKE) -C test

struct_matrix_vector/libHYPRE_mv.a:
        $(PREFIX) $(MAKE) -C struct_matrix_vector

struct_linear_solvers/libHYPRE_ls.a:
        $(PREFIX) $(MAKE) -C struct_linear_solvers

utilities/libHYPRE_utilities.a:
        $(PREFIX) $(MAKE) -C utilities
The modified Makefile is invoked as follows:
$ make PREFIX='srun -n1 -N1' MAKE_J='-j4'
3.9.1.3 Example Procedure 3
Go through the directories in parallel and have the make procedure within each directory be parallel. For the purpose of this exercise, we are only parallelizing the “make all” component. The “clean” and “veryclean” components can be parallelized in a similar fashion.
Modified makefile:
all:
        $(MAKE) $(MAKE_J) struct_matrix_vector/libHYPRE_mv.a struct_linear_solvers/libHYPRE_ls.a utilities/libHYPRE_utilities.a
        $(PREFIX) $(MAKE) $(MAKE_J) -C test

struct_matrix_vector/libHYPRE_mv.a:
        $(PREFIX) $(MAKE) $(MAKE_J) -C struct_matrix_vector

struct_linear_solvers/libHYPRE_ls.a:
        $(PREFIX) $(MAKE) $(MAKE_J) -C struct_linear_solvers

utilities/libHYPRE_utilities.a:
        $(PREFIX) $(MAKE) $(MAKE_J) -C utilities
The modified Makefile is invoked as follows:
$ make PREFIX='srun -n1 -N1' MAKE_J='-j4'
3.9.2 Local Disks on Compute Nodes
The use of a local disk for private, temporary storage may be configured on the compute nodes of your HP XC system. Contact your system administrator to find out about the local disks configured on your system.
A local disk is a temporary storage space and does not hold data across execution of applications. Therefore, any information generated by the application during its execution is not saved on the local disk once the application has completed.
3.9.3 I/O Performance Considerations
Before building and running your parallel application, I/O performance issues on the HP XC cluster must be considered.
The I/O control system provides two basic types of standard file system views to the application:
•Shared
•Private
3.9.3.1 Shared File View
Although a file opened by multiple processes of an application is shared, each processor maintains a private file pointer and file position. This means that if a certain order of input or output from multiple processors is desired, the application must synchronize its I/O requests or position its file pointer such that it acts on the desired file location.
Output requests to standard output and standard error are line-buffered, which can provide sufficient output ordering in many cases. A similar effect for other files can be achieved by using append mode when opening the file with the fopen call:
fp = fopen ("myfile", "a+");
3.9.3.2 Private File View
Although the shared file approach improves ease of use for most applications, some applications, especially those written for shared-nothing clusters, can require the use of file systems private to each node. To accommodate these applications, the system must be configured with local disk.
For example, assume /tmp and /tmp1 have been configured on each compute node. Each process can open a file named /tmp/myscratch or /tmp1/myotherscratch, and each would see a unique file pointer. If these file systems do not exist local to the node, an error results.
It is a good idea to use this option for temporary storage only, and make sure that the application deletes the file at the end.
C example:
fd = open ("/tmp/myscratch", flags);
Fortran example:
open (unit=9, file="/tmp1/myotherscratch")
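A slightly fuller C sketch of this pattern (the file name and contents are illustrative only) is:

#include <stdio.h>

int main(void)
{
    const char *scratch = "/tmp/myscratch";

    /* Open node-local scratch space; this fails if /tmp is not
       configured as a local file system on the node. */
    FILE *fp = fopen(scratch, "w+");
    if (fp == NULL) {
        perror(scratch);
        return 1;
    }
    fprintf(fp, "intermediate results\n");
    /* ... use the scratch file ... */
    fclose(fp);
    remove(scratch);   /* local disk is temporary; clean up before exiting */
    return 0;
}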
3.9.4 Communication Between Nodes
On the HP XC system, processes in an MPI application run on compute nodes and use the system interconnect for communication between the nodes. By default, intranode communication is done using shared memory between MPI processes. Refer to Chapter 8 for information about selecting and overriding the default system interconnect.

Debugging Applications

This chapter describes how to debug serial and parallel applications in the HP XC development environment. In general, effective debugging of applications requires the applications to be compiled with debug symbols, typically the -g switch. Some compilers allow -g with optimization.

4.1 Debugging Serial Applications

Debugging a serial application on an HP XC system is performed the same as debugging a serial application on a conventional Linux operating system. Refer to standard Linux documentation for information about debugging serial programs.
The following serial debuggers are available for use in the HP XC environment for local debugging:
• The gdb utility is provided with the standard Linux distribution; it performs line-mode debugging of a single process.
• The idb utility is generally available with the Intel compiler suite.
• The pgdbg utility is generally available with the PGI compilers.
For information about using these debuggers, refer to standard Linux documentation and the documentation that is available with the specific debugger that you are using.

4.2 Debugging Parallel Applications

The parallel debuggers recommended for use in the HP XC environment are TotalView and DDT.
TotalView
TotalView is a full-featured GUI debugger for debugging parallel applications from Etnus, Inc. It is specifically designed to meet the requirements of parallel applications running on many processors. The use of TotalView in the HP XC environment is described in Section 4.2.1. You can obtain additional information about TotalView from the TotalView documentation and the TotalView Web site at:
http://www.etnus.com

_________________________ Note _________________________

TotalView is not included with the HP XC software and is not supported. If you have any problems installing or using TotalView, contact Etnus, Inc.
DDT
DDT (Distributed Debugging Tool) is a parallel debugger from Streamline Computing. DDT is a comprehensive graphical debugger designed for debugging parallel code. It gives users a common interface for most compilers, languages, and MPI distributions. For information about using DDT, refer to Streamline Computing documentation and the Streamline Computing Web site:
http://www.streamline-computing.com/softwaredivision_1.shtml
4.2.1 Debugging with TotalView
You can purchase the TotalView debugger, from Etnus, Inc., for use on the HP XC cluster. TotalView is a full-featured, GUI-based debugger specifically designed to meet the requirements of parallel applications running on many processors.
TotalView has been tested for use in the HP XC environment. However, it is not included with the HP XC software and technical support is not provided by HP XC. If you install and use TotalView, and have problems with it, contact Etnus, Inc.
This section describes how to use TotalView in the HP XC environment. It provides setup and configuration information, and an example of debugging applications. Instructions for installing TotalView are included in the HP XC System Software Installation Guide.
This section provides only minimum instructions to get you started using TotalView. It is recommended that you obtain TotalView documentation for full information about using TotalView. The TotalView documentation set is available directly from Etnus, Inc. at the following URL:
http://www.etnus.com.
4.2.1.1 SSH and TotalView
As discussed in Section 2.1 and Section 10.1, XC systems use the OpenSSH package in place of traditional commands like rsh to provide more secure communication between nodes in the cluster. When run in a parallel environment, TotalView expects to be able to use the rsh command to communicate with other nodes, but the default XC configuration disallows this. Users should set the TVDSVRLAUNCHCMD environment variable to specify an alternate command for TotalView to use in place of rsh. When using the TotalView modulefile, as described in Section 4.2.1.2, this variable is automatically set to /usr/bin/ssh -o BatchMode=yes. Users who manage their environments independently of the provided modulefiles will need to set this variable manually.
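For example, in a bash login session the setting described above can be made manually as follows (csh-style shells would use setenv instead):
$ export TVDSVRLAUNCHCMD='/usr/bin/ssh -o BatchMode=yes'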
4.2.1.2 Setting Up TotalView
Before you can use TotalView, your administrator must install it on the HP XC system, and your operating environment must be properly set up for TotalView.
Your administrator may have already installed TotalView and set up the environment for you. In this case, you can skip the steps in this section and proceed to Section 4.2.1.5, which describes using TotalView for the first time.
The setup of your environment to run TotalView consists of configuring the PATH and MANPATH variables so that they can locate the TotalView executable and manpages when TotalView is invoked. This configuration step is recommended, but not absolutely necessary, because you can optionally start TotalView by entering its full pathname. The HP XC system provides the TotalView modulefile to configure your environment for you, as described in the following steps.
To prepare your environment to use TotalView, proceed as follows:
1. Determine if TotalView has been installed, and whether environment variables have been
defined for TotalView. You can use standard Linux commands to check these things.
2. Depending upon the status of TotalView, do one of the following things:
If TotalView is installed and your environment is set up, proceed to Section 4.2.1.5, for information about running TotalView for the first time.
If TotalView is installed, but your environment is not yet set up, either you or your system administrator should set it up, as described in the next step.
If TotalView is not installed, have your administrator install it. Then either you or your administrator should set up your environment, as described in the next step.
3. Set the DISPLAY environment variable of the system that hosts TotalView to display on your local system.
Also, run the xhost command to accept data from the system that hosts TotalView; see the X(7X) manpage for more information.
4. Set up your environment to run TotalView. This consists of defining the PATH and MANPATH variables with the location of the TotalView executable and manpages. The methods to do this are described in Section 2.2. The following list summarizes some suggestions:
• Add the module load mpi and module load totalview commands to your login file. These commands load the TotalView modulefile. The TotalView modulefile defines the PATH and MANPATH variables for TotalView in your environment. Be sure to use the correct name of the TotalView modulefile (obtain it from your administrator).
• Add the location of the TotalView executable and manpages to the PATH and MANPATH variables in your shell initialization file or login file.
• Have your administrator set up your environment so that the TotalView modulefile loads automatically when you log in to the system.
______________________ Note: _______________________
If you prefer, you do not have to set up the TotalView environment variables ahead of time. In this case, you must define these variables before running TotalView. There are two ways to do this on the command line:
• Enter the module load mpi and module load totalview commands, as described in Section 2.2.
• Enter the full TotalView PATH and MANPATH information. See your system administrator for this information.
4.2.1.3 Using TotalView with SLURM
Use a command to allocate the nodes you will need before you debug an application with SLURM, as shown here:
$ srun -Nx -A
$ mpirun -tv -srun application
These commands allocate x nodes and run TotalView to debug the program named application.
Be sure to exit from the SLURM allocation created with the srun command when you are done.
4.2.1.4 Using TotalView with LSF-HPC
HP recommends the use of xterm when debugging an application with LSF-HPC. You also need to allocate the nodes you will need.
You may need to verify the full pathnames of the xterm and mpirun commands. First, run a bsub command to allocate the nodes you will need and to launch an xterm window:
$ bsub -nx -ext "SLURM[nodes=x]" \
-Is /usr/bin/xterm
Enter an mpirun -tv command in the xterm window to start TotalView on the application you want to debug:
$ mpirun -tv -srun application
4.2.1.5 Starting TotalView for the First Time
This section tells you what you must do when running TotalView for the first time — before you begin to use it to debug an application. The steps in this section assume that you have already set up your environment to run TotalView, as described in Section 4.2.1.2.
The first time you use TotalView, you should set up preferences. For example, you need to tell TotalView how to launch TotalView processes on all of the processors. After you have performed the steps in this section once, you do not need to do them again unless you want to change how processes are launched. The result of these steps is to create a preferences file in the .totalview directory of your home directory. As long as this file exists in your home directory, the preferences you are about to define are used whenever you run TotalView.
To create the preferences file, proceed as follows:
1. Start TotalView by entering the following command:
$ totalview
TotalView’s main control window (called the TotalView Root Window) appears, as shown in Figure 4-1:
Figure 4-1: TotalView Root Window
2. Select Preferences from the File pull-down menu of the TotalView Root Window.
A Preferences window is displayed, as shown in Figure 4-2.
Figure 4-2: TotalView Preferences Window
3. In the Preferences window, click on the Launch Strings tab.
4. In the Launch Strings tab, ensure that the Enable single debug server launch button is selected.
5. In the Launch Strings table, in the area immediately to the right of Command:, assure that the default command launch string shown is the following string:
%C %R -n "%B/tvdsvr -working_directory %D -callback %L -set_pw %P
-verbosity %V %F"
If it is not the above string, you may be able to obtain this setting by pressing the Defaults button. If pressing the Defaults button does not provide the correct setting, then you need to enter the default command launch string by hand.
6. In the Preferences window, click on the Bulk Launch tab. Make sure that Enable debug server bulk launch is not selected.
7. Click on the OK button at the bottom-left of the Preferences window to save these changes. The file is stored in the .totalview directory in your home directory. As long as the file exists, you can omit the steps in this section for subsequent TotalView runs.
8. Exit TotalView by selecting Exit from the File pull-down menu.
TotalView launch preferences are now configured and saved. You can make changes to this configuration at any time.
4.2.1.6 Debugging an Application
This section describes how to use TotalView to debug an application. Note that the steps in this section assume that you have already completed the previous steps to prepare TotalView, described in Section 4.2.1.2 and Section 4.2.1.5.
1. Compile the application to be debugged. For example:
$ mpicc -g -o Psimple simple.c -lm
The -g option is strongly recommended. It enables debugging information that TotalView will utilize.
2. Run the application in TotalView:
$ mpirun -tv -srun -n2 ./Psimple
3. The TotalView main control window, called the TotalView root window, is displayed. It displays the following message in the window header:
Etnus TotalView Version#
4. The TotalView process window is displayed (Figure 4-3). This window contains multiple panes that provide various debugging functions and debugging information. The name of the application launcher that is being used (either srun or mpirun) is displayed in the title bar.
Figure 4-3: TotalView Process Window Example
5. Set the search path, if necessary. If TotalView is being invoked from a directory that does not contain the executable file and the source code, you must set the path where TotalView can find them. If TotalView is invoked from the same directory, you can skip this step and continue on to the next step.
Set the search path as follows:
a. Click on the File pull-down menu of the TotalView process window.
b. Select Search Path from the list that appears.
TotalView, by default, will now search for source and binaries (including symbol files) in the following places and in the following order:
1. Current working directory
2. Directories in File Search Path
3. Directories specified in your PATH environment variable
6. Click the Go button in the TotalView process window. A pop-up window appears, asking if you want to stop the job:
Process srun is a parallel job. Do you want to stop the job now?
7. Click Yes in this pop-up window. The TotalView root window appears and displays a line for each process being debugged.
If you are running Fortran code, another pop-up window may appear with the following warning:
Sourcefile initfdte.f was not found, using assembler mode.
Click OK to close this pop-up window. You can safely ignore this warning.
8. You can now set a breakpoint somewhere in your code. The method to do this may vary slightly between versions of TotalView. For TotalView Version 6.0, the basic process is as follows:
a. Select At Location in the Action Point pull-down menu of the TotalView process window.
b. Enter the name of the location where you want to set a breakpoint.
c. Click OK.
9. Click the Go button to run the application and go to the breakpoint.
Continue debugging as you would on any system. If you are not familiar with TotalView, you can click on Help in the right-hand corner of the process window for additional information.
4.2.1.7 Debugging Running Applications
As an alternative to the method described in Section 4.2.1.6, it is also possible to "attach" an instance of TotalView to an application which is already running. The example presented here assumes you have already completed the steps in Section 4.2.1.2 and Section 4.2.1.5.
1. Compile a long-running application as in Section 4.2.1.6:
$ mpicc -g -o Psimple simple.c -lm
2. Run the application:
$ mpirun -srun -n2 Psimple
3. Start TotalView:
$ totalview
4. In the TotalView Root Window, click Unattached to display a list of running processes (Figure 4-4). Double-click on the srun process to attach to it.
Figure 4-4: Unattached Window
5. In a few seconds, the TotalView Process Window will appear, displaying information on the srun process. In the TotalView Root Window, click Attached (Figure 4-5). Double-click one of the remote srun processes to display it in the TotalView Process Window.
Figure 4-5: Attached Window
6. At this point, you should be able to debug the application as in Step 8 of Section 4.2.1.6.
4.2.1.8 Exiting TotalView
It is important that you make sure your job has completed before exiting TotalView. This may require that you wait a few seconds from the time your job has completed until srun has completely exited.
If you exit TotalView before your job is completed, use the squeue command to check that your job is not still on the system.
$ squeue
If it is still there, use the following command to remove all of your jobs:
$ scancel --user username
If you desire to cancel just certain jobs, refer to the scancel manpage for information about selective job cancellation.

Tuning Applications

This chapter discusses how to tune applications in the HP XC environment.

5.1 Using the Intel Trace Collector/Analyzer

This section describes how to use the Intel Trace Collector (ITC) and Intel Trace Analyzer (ITA) with HP-MPI on an HP XC system. The Intel Trace Collector/Analyzer were formerly known as VampirTrace and Vampir, respectively. The following topics are discussed in this section:
Building a Program (Section 5.1.1)
Running a Program (Section 5.1.2)
Visualizing Data (Section 5.1.3)
The following Intel Trace Collector/Analyzer documentation is available on the HP XC system after Intel Trace Collector/Analyzer is installed:
Intel Trace Collector Users Guide, located on the HP XC system at: <install-path-name>/ITC/doc/Intel_Trace_Collector_Users_Guide.pdf
Intel Trace Analyzer Users Guide, located on the HP XC system at: <install-path-name>/ITA/doc/Intel_Trace_Analyzer_Users_Guide.pdf
5.1.1 Building a Program — Intel Trace Collector and HP-MPI
HP-MPI is MPICH compatible if you use the HP-MPI MPICH scripts, which are located in /opt/hpmpi/bin. The HP-MPI MPICH scripts are:
mpicc is replaced by mpicc.mpich
mpif77 is replaced by mpif77.mpich
mpirun is replaced by mpirun.mpich
In summary, mpixx becomes mpixx.mpich.
For further information, refer to the Intel Trace Collector Users Guide. This document is located on the HP XC system at the following location: <install-path-name>/ITC/doc/Intel_Trace_Collector_Users_Guide.pdf.
Example
For the purposes of this example, the examples directory under /opt/IntelTrace/ITC is copied to the user’s home directory and renamed to examples_directory.
The GNU makefile looks as follows:
CC      = mpicc.mpich
F77     = mpif77.mpich
CLINKER = mpicc.mpich
FLINKER = mpif77.mpich
IFLAGS  = -I$(VT_ROOT)/include
CFLAGS  = -g
FFLAGS  = -g
LIBS    = -lvtunwind -ldwarf -lnsl -lm -lelf -lpthread
CLDFLAGS = -static-libcxa -L$(VT_ROOT)/lib $(TLIB) -lvtunwind \
-ldwarf -lnsl -lm -lelf -lpthread
FLDFLAGS = -static-libcxa -L$(VT_ROOT)/lib $(TLIB) -lvtunwind \
-ldwarf -lnsl -lm -lelf -lpthread
In the cases where Intel compilers are used, add the -static-libcxa option to the link line. Otherwise the following type of error will occur at run-time:
$ mpirun.mpich -np 2 ~/examples_directory/vtjacobic
~/examples_directory/vtjacobic: error while loading shared libraries:
libcprts.so.6: cannot open shared object file:
No such file or directory
MPI Application rank 0 exited before MPI_Init() with status 127
mpirun exits with status: 127
In the above example, ~/examples_directory/vtjacobic represents the file specification for the vtjacobic example program.
For more information, see the following:
http://support.intel.com/support/performancetools/c/linux/sb/CS-010097.htm
5.1.2 Running a Program — Intel Trace Collector and HP-MPI
Assuming that you have built your program using the -static-libcxa option, as discussed in Section 5.1.1, you can launch the program with the mpirun.mpich command, as shown in the following example.
C Example — Running the vtjacobic Example Program
This example shows how to run the vtjacobic example program discussed in Section 5.1.1.
$ mpirun.mpich -np 2 ~/examples_directory/vtjacobic
~/examples_directory/vtjacobic: 100 iterations in 0.228252 secs (28.712103 MFlops), m=130 n=130 np=2
[0] Intel Trace Collector INFO: Writing tracefile vtjacobic.stf in ~/examples_directory/vtjacobic
mpirun exits with status: 0
In the above example, ~/examples_directory/vtjacobic represents the file specification for the vtjacobic example program.
5.1.3 Visualizing Data — Intel Trace Analyzer and HP-MPI
The Intel Trace Analyzer is used in a straightforward manner on an HP XC system, as described in its standard documentation. For more information, refer to the Intel Trace Analyzer Users Guide. This document is located on the HP XC system at the following location: <install-path-name>/ITA/doc/Intel_Trace_Analyzer_Users_Guide.pdf.

Using SLURM

6.1 Introduction

HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling. SLURM is a reliable, efficient, open source, fault-tolerant job and compute resource manager with features that make it suitable for large-scale, high-performance computing environments. SLURM can report on machine status, and perform partition management, job management, and job scheduling.
The SLURM Reference Manual is available on the HP XC Documentation CD-ROM and from the following Web site: http://www.llnl.gov/LCdocs/slurm/.
As a system resource manager, SLURM has the following key functions:
Allocate exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work
Provide a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes
Arbitrate conflicting requests for resources by managing a queue of pending work

Section 1.4.3 describes the interaction between SLURM and LSF.

6.2 SLURM Commands

Users interact with SLURM through its command-line utilities. SLURM has the following basic commands: srun, scancel, squeue, sinfo, and scontrol, which can run on any node in the HP XC system. These commands are summarized in Table 6-1 and described in the following sections.

Table 6-1: SLURM Commands

Command     Function

srun        Submits jobs to run under SLURM management. srun is used to submit a job for execution, allocate resources, attach to an existing allocation, or initiate job steps. srun can:
            • Submit a batch job and then terminate
            • Submit an interactive job and then persist to shepherd the job as it runs
            • Allocate resources to a shell and then spawn that shell for use in running subordinate jobs

squeue      Displays the queue of running and waiting jobs (or "job steps"), including the JobID (used for scancel), and the nodes assigned to each running job. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.

scancel     Cancels a pending or running job or job step. It can also be used to send a specified signal to all processes on all nodes associated with a job. Only job owners or administrators can cancel jobs.

sinfo       Reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options. sinfo displays a summary of available partition and node (not job) information (such as partition names, nodes/partition, and CPUs/node).

scontrol    Is an administrative tool used to view or modify the SLURM state. Typically, users do not need to access this command. Therefore, the scontrol command can only be executed as user root. Refer to the HP XC System Software Administration Guide for information about using this command.
The -help command option also provides a brief summary of SLURM options. Note that command options are not case sensitive.

6.3 Accessing the SLURM Manpages

You can also view online descriptions of these commands by accessing the SLURM manpages. Manpages are provided for all SLURM commands and API functions. If the SLURM manpages are not already available in your MANPATH environment variable, you can set and export them as follows:
$ MANPATH=$MANPATH:/opt/hptc/man
$ export MANPATH
You can now access the SLURM manpages with the standard man command. For example:
$ man srun

6.4 Launching Jobs with the srun Command

The srun command submits jobs to run under SLURM management. Jobs can be submitted to run in parallel on multiple compute nodes. srun is used to submit a job for execution, allocate resources, attach to an existing allocation, or initiate job steps. srun can perform the following:
• Submit a batch job and then terminate
• Submit an interactive job and then persist to shepherd the job as it runs
• Allocate resources to a shell and then spawn that shell for use in running subordinate jobs
Jobs can be submitted for immediate execution or later execution (batch). srun has a wide variety of options to specify resource requirements, including: minimum and maximum node count, processor count, specific nodes to use or not use, and specific node characteristics (so much memory, disk space, certain required features). Besides securing a resource allocation, srun is used to initiate job steps. These job steps can execute sequentially or in parallel on independent or shared nodes within the job’s node allocation.
Example 6-1: Simple Launch of a Serial Program
$ srun -n2 -l hostname
0: n1
1: n1
6.4.1 The srun Roles and Modes
The srun command submits jobs to run under SLURM management. The srun command can perform many roles in launching and managing your job. srun also provides several distinct usage modes to accommodate the roles it performs.
6.4.1.1 srun Roles
srun options allow you to submit a job by:
Specifying the parallel environment for your job, such as the number of nodes to use, partition, distribution of processes among nodes, and maximum time.
Controlling the behavior of your parallel job as it runs, such as by redirecting or labeling its output, sending it signals, or specifying its reporting verbosity.
6.4.1.2 srun Modes
Because srun performs several different roles, it has five distinct ways, or modes, in which it can be used:
Simple Mode
In simple mode, srun submits your job to the local SLURM job controller, initiates all processes on the specified nodes, and blocks until needed resources are free to run the job if necessary. Many control options can change the details of this general pattern.
The simplest way to use the srun command is to distribute the execution of a serial program (such as a Linux utility) across a specified number or range of compute nodes. For example:
$ srun -N 8 cp ~/data1 /var/tmp/data1
This command copies (cp) file data1 from your common home directory into local disk space on each of eight compute nodes. This is similar to running simple programs in parallel.
Batch Mode
srun can also directly submit complex scripts to the job queue(s) managed by SLURM for later execution when needed resources become available and when no higher priority jobs are pending. For example:
$ srun -N 16 -b myscript.sh
This command uses the srun -b option to place myscript.sh into the batch queue to run later on 16 nodes. Scripts in turn normally contain either MPI programs, or other simple invocations of srun itself (as shown above). The srun -b option supports basic, local batch service.
Allocate Mode
When you need to combine the job complexity of scripts with the immediacy of interactive execution, you can use the allocate mode. For example:
$ srun -A -N 4 myscript.sh
This command uses the srun -A option to allocate specified resources (four nodes in the above example), spawn a subshell with access to those resources, and then run multiple jobs using simple srun commands within the specified script (myscript.sh in the above example) that the subshell immediately starts to execute. This is similar to allocating resources by setting environment variables at the beginning of a script, and then using them for scripted tasks. No job queues are involved.
Attach
You can monitor or intervene in an already running srun job, either batch (started with -b) or interactive (allocated, started with -A), by executing srun again and attaching (-a) to that job. For example:
$ srun -a 6543 -j
This command forwards the standard output and error messages from the running job with SLURM ID 6543 to the attaching srun command to reveal the job’s current status, and (with -j) also joins the job so that you can send it signals as if this srun command had initiated the job. Omit -j for read-only attachments. Because you are attaching to a running job whose resources have already been allocated, the srun resource-allocation options (such as -N) are incompatible with -a.
Batch (with LSF)
You can submit a script to LSF that contains (simple) srun commands within it to execute parallel jobs later. In this case, LSF takes the place of the srun -b option for indirect, across-machine job-queue management.

6.4.2 srun Signal Handling
Signals sent to srun are automatically forwarded to the tasks that srun controls, with a few special cases. srun handles the Ctrl/C sequence differently, depending on how many times it receives Ctrl/C in one second. The following defines how Ctrl/C is handled by srun:
• If srun receives one Ctrl/C, it reports the state of all tasks associated with srun.
• If srun receives a second Ctrl/C within one second, it sends the SIGINT signal to all associated srun tasks.
• If srun receives a third Ctrl/C within one second, it terminates the job at once, without waiting for remote tasks to exit.
6.4.3 srun Run-Mode Options
This section explains the mutually exclusive srun options that enable its different run modes.
-b (--batch)
This option runs a script in batch mode. The script name must appear at the end of the srun execute line, not as an argument to -b. You cannot use -b with -A or -a.
srun copies the script, submits the request to run (with your specified resource allocation) to the local SLURM-managed job queue, and ends. When resources become available and no higher priority job is pending, SLURM runs the script on the first node allocated to the job, with stdin redirected from /dev/null and stdout and stderr redirected to a file called jobname.out in the current working directory (unless you request a different name or a more elaborate set of output files by using -J or -o).
The -b option has the following script requirements:
You must use the script’s absolute pathname, or a pathname relative to the current working directory (srun ignores your search path).
srun interprets the script using your default shell unless the file begins with the character pair #! followed by the absolute pathname of a valid shell.
The script must contain MPI commands or other (simple) srun commands to initiate parallel tasks.
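For example, a minimal batch script (the name myscript.sh and its contents are illustrative only) might look like the following and be submitted with the -b option:
#!/bin/sh
# myscript.sh: run one parallel job step across the allocated nodes
srun hostname

$ srun -N4 -b ./myscript.sh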
-A (--allocate)
The -A option allocates compute resources (as specified by other srun options) and starts (spawns) a subshell that has access to those allocated resources. No remote tasks are started. You cannot use -A with -b or -a.
If you specify a script at the end of the srun command line (not as an argument to -A), the spawned shell executes that script using the allocated resources (interactively, without a queue). See the -b option for script requirements.
If you specify no script, you can then execute other instances of srun interactively, within the spawned subshell, to run multiple parallel jobs on the resources that you allocated to the subshell. Resources (such as nodes) will only be freed for other jobs when you terminate the subshell.
-a=jobid (--attach=jobid)
The -a=jobid option attaches (or reattaches) your current srun session to the already running job whose SLURM ID is jobid. The job to which you attach must have its resources managed by SLURM, but it can be either interactive ("allocated," started with -A) or batch (started with -b). This option allows you to monitor or intervene in previously started srun jobs. You cannot use -a with -b or -A. Because the running job to which you attach already has its resources specified, you cannot use -a with -n, -N, or -c. You can only attach to jobs for which you are the authorized owner.
By default, -a attaches to the designated job read-only. stdout and stderr are copied to the attaching srun, just as if the current srun session had started the job. However, signals are not forwarded to the remote processes (and a single Ctrl/C will detach the read-only srun from the job).
If you use -j (--join) or -s (--steal) along with -a, your srun session joins the running job and can also forward signals to it as well as receive stdout and stderr from it. If you join a SLURM batch (-b) job, you can send signals to its batch script. Join (-j) does not forward stdin, but steal (-s, which closes other open sessions with the job) does forward stdin as well as signals.
-j (--join)
The -j option joins a running SLURM job (always used with the -a option, which specifies the jobid). This not only duplicates stdout and stderr to the attaching srun session, but also forwards signals to the job’s script or processes.
-s (--steal)
The -s option steals all connections to a running SLURM job (always used with the -a option, which specifies the jobid). --steal closes any open sessions with the specified job, then copies stdout and stderr to the attaching srun session, and also forwards both signals and stdin to the job’s script or processes.
6.4.4 srun Resource-Allocation Options
The srun resource-allocation options assign compute resources to your parallel SLURM-managed job. These options can be used alone or in combination. Also refer to the other srun options that can affect node management for your job, especially the control options and constraint options.
-n procs (--nprocs=procs)
The -n procs option requests that srun execute procs processes. To control how these processes are distributed among nodes and CPUs, combine -n with -c or -N as explained below (default is one process per node).
-N n (--nodes=n)
The -N n option allocates at least n nodes to this job, where n may be one of the following:
a specific node count (such as -N16)
a node count range (such as -N14-18)
Each partition’s node limits supersede those specified by -N. Jobs that request more nodes than the partition allows never leave the PENDING state. To use a specific partition, use the srun -p option. Combinations of -n and -N control how job processes are distributed among nodes according to the following srun policies:

-n/-N combinations   srun infers your intended number of processes per node if you specify both the number of processes and the number of nodes for your job. Thus -n16 -N8 normally results in running 2 processes/node. But, see the next policy for exceptions.

Minimum interpretation   srun interprets all node requests as minimum node requests (-N16 means "at least 16 nodes"). If some nodes lack enough CPUs to cover the process count specified by -n, srun automatically allocates more nodes (than specified with -N) to meet the need. For example, if not all nodes have 2 working CPUs, then -n32 -N16 together allocate more than 16 nodes so that all processes are supported. The actual number of nodes assigned (not the number requested) is stored in the environment variable SLURM_NNODES.

CPU overcommitment   By default, srun never allocates more than one process per CPU. If you intend to assign multiple processes per CPU, you must invoke the srun -O option along with -n and -N. Thus -n16 -N4 -O together allow 2 processes per CPU on the 4 allocated 2-CPU nodes.

Inconsistent allocation   srun rejects inconsistent -n/-N combinations as errors. For example, -n15 -N16 requests the impossible assignment of 15 processes to 16 nodes.
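For example, on 2-CPU nodes the following hypothetical command lines (the program name ./a.out is illustrative) exercise these policies:

$ srun -n16 -N8 ./a.out      # 2 processes per node on 8 nodes
$ srun -n32 -N16 ./a.out     # at least 16 nodes; more if some nodes lack 2 working CPUs
$ srun -n16 -N4 -O ./a.out   # overcommit: 2 processes per CPU on 4 nodes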
-c cpt (--cpus-per-task=cpt)
The -c cpt option assigns cpt CPUs per process for this job (default is one CPU per process). This option supports multithreaded programs that require more than a single CPU per process for best performance.
For multithreaded programs where the density of CPUs is more important than a specific node count, use both -n and -c on the srun execute line rather than -N. The options -n16 and -c2 result in whatever node allocation is needed to yield the requested 2 CPUs/process. This is the reverse of CPU overcommitment (see the -N and -O options).
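For example, the following hypothetical command launches 16 two-thread processes and lets srun allocate however many nodes are needed to supply 2 CPUs per process:

$ srun -n16 -c2 ./my_threaded_app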
-p part (--partition=part)
The -p part option requests nodes only from the part partition. The default partition is assigned by the system administrator.
-t minutes (--time=minutes)
The -t minutes option allocates a total number of minutes for this job to run (default is the current partition’s time limit). If the number of minutes exceeds the partition’s time limit, then the job never leaves the PENDING state. When the time limit has been reached, SLURM sends each job process SIGTERM followed (after a pause specified by SLURM’s KillWait configuration parameter) by SIGKILL.
-T nthreads (--threads=nthreads)
The -T nthreads option requests that srun allocate nthreads threads to initiate and control the parallel tasks in this job. The default is the smaller of either 10 or the number of nodes actually allocated, SLURM_NNODES.
6.4.5 srun Control Options
srun control options determine how a SLURM job manages its nodes and other resources, what its working features (such as job name) are, and how it gives you help. Separate "constraint" options and I/O options are available and are described in other sections of this chapter. The following types of control options are available:
Node management
Working features
Resource control
Help options
6.4.5.1 Node Management Options
-k (--no-kill)
The -k option prevents automatic job termination if any node allocated to this job fails. The job assumes responsibility for handling such node failures internally. (SLURM’s default is to terminate a job if any of its allocated nodes fail.)
-m dist (--distribution=dist)
The -m option tells SLURM how to distribute tasks among nodes for this job. The choices for dist are either block or cyclic.
block    Assigns tasks in order to each CPU on one node before assigning any to the next node. This is the default if the number of tasks exceeds the number of nodes requested.

cyclic   Assigns tasks "round robin" across all allocated nodes (task 1 goes to the first node, task 2 goes to the second node, and so on). This is the default if the number of nodes requested equals or exceeds the number of tasks.
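For example, on 2-CPU nodes the following hypothetical commands distribute eight tasks over four nodes in block and in cyclic order:

$ srun -n8 -N4 -m block ./a.out    # tasks 0-1 on the first node, 2-3 on the second, and so on
$ srun -n8 -N4 -m cyclic ./a.out   # tasks 0 and 4 on the first node, 1 and 5 on the second, and so on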
-r n (--relative=n)
The -r option offsets the first job step to node n of this job’s allocated node set (where the first node is 0). Option -r is incompatible with "constraint" options -w and -x, and it is ignored when you run a job without a prior node allocation (default for n is 0).
-s (--share)
The -s option allows this job to share nodes with other running jobs. Sharing nodes often starts the job faster and boosts system utilization, but it can also lower application performance.
6.4.5.2 Working Features Options
-D path (--chdir=path)
The -D option causes each remote process to change its default directory to path (by using chdir) before it begins execution (without -D, the current working directory of srun becomes the default directory for each process).
-d level (--slurmd-debug=level)
The -d option specifies level as the level at which the SLURMD daemon reports debug information and deposits it in this job’s stderr location. Here, level can be any integer between 0 (quiet, reports only errors, the default) and 5 (extremely verbose messages).
-J jobname (--job-name=jobname)
The -J option specifies jobname as the identifying string for this job (along with its system-supplied job ID, as stored in SLURM_JOBID) in responses to your queries about job status (the default jobname is the executable program’s name).
-v (--verbose)
The -v option reports verbose messages as srun executes your job. The default is program output with only overt error messages added. Using multiple -v options further increases message verbosity.
6.4.5.3 Resource Control Options
-I (--immediate)
The -I option exits if requested resources are not available at once. By default, srun blocks until requested resources become available.
-O (--overcommit)
The -O option over-commits CPUs. By default, srun never allocates more than one process per CPU. If you intend to assign multiple processes per CPU, you must invoke the -O option along with -n and -N (thus -n16 -N4 -O together allow 2 processes per CPU on the 4 allocated 2-CPU nodes). Even with -O, srun never allows more than MAX_TASKS_PER_NODE tasks to run on any single node. MAX_TASKS_PER_NODE is discussed in Section 6.4.8.
-W seconds (--wait=seconds)
The -W option waits the specified number of seconds after any job task terminates before terminating all remaining tasks. The default for seconds is unlimited. Use -W to force an entire job to end quickly if any one task terminates prematurely.
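For example, the following hypothetical command overcommits CPUs on four 2-CPU nodes and terminates all remaining tasks 30 seconds after any task exits:

$ srun -n16 -N4 -O -W 30 ./a.out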
6.4.5.4 Help Options
--help
The --help option lists the name of every srun option, with a one-line description of each. Options appear in categories by function.
--usage
The --usage option reports a syntax summary for srun, which includes many, but not all srun options.
-V (--version)
The -V option reports the currently installed version number for SLURM.
6.4.6 srun I/O Options
srun provides the following I/O options:
I/O commands
I/O Redirection Alternatives
6.4.6.1 I/O Commands
The srun I/O commands manage and redirect the standard input to, as well as the standard output and error messages from, parallel jobs executed under SLURM. Three of these commands let you choose from among any of five I/O redirection alternatives (modes) that are explained in the next section.
-o mode (--output=mode)
The -o option redirects standard output stdout for this job to mode, one of five alternative ways to display, capture, or subdivide the job’s I/O, explained in the next section. By default, srun collects stdout from all job tasks and line buffers it to the attached terminal.
-i mode (--input=mode)
The -i option redirects standard input stdin for this job from mode, one of five alternative ways to display, capture, or subdivide the job’s I/O, explained in the next section. By default, srun redirects stdin from the attached terminal to all job tasks.
-e mode (--error=mode)
The -e option redirects standard error stderr for this job to mode, one of five alternative ways to display, capture, or subdivide the job’s I/O, explained in the next section. By default, srun collects stderr from all job tasks and line buffers it to the attached terminal, just as with stdout. But you can request that srun handle standard output and standard error differently by invoking -e and -o with different redirection modes.
-l (--label)
The -l option prepends the remote task ID number to each line of standard output and standard error. By default, srun line buffers this I/O to the terminal (or to specified files) without any task labels. Options -l and -u are mutually exclusive.
-u (--unbuffered)
The -u option prevents line buffering of standard output from remote tasks (buffering is the srun default). Options -l and -u are mutually exclusive.
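For example, the following command labels each line of output with the task that produced it (the node names in the sample output are illustrative):

$ srun -n4 -l hostname
0: n1
1: n1
2: n2
3: n2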
6.4.6.2 I/O Redirection Alternatives
The srun I/O options -i (--input), -o (--output), and -e (--error) all take any of five I/O redirection alternatives (modes) as arguments. These arguments are explained in this section.
all
The all argument redirects stdout and stderr from all job tasks to srun (and hence to the attached terminal), and broadcasts stdin from srun (the terminal) to all remote tasks. This is srun’s default behavior for handling I/O.
none
The none argument redirects stdout and stderr from all job tasks to /dev/null (receives no I/O from any task) and sends no stdin to any task (closes stdin).
taskid
The taskid argument redirects to srun (and hence to the attached terminal) stdout and stderr from the single specified task whose relative ID is taskid, where the range for integer taskid starts at 0 (the first task) and runs through the total number of tasks in the current job step. This choice also redirects stdin from srun (the terminal) to this single specified task.
filename
The filename argument redirects stdout or stderr from all job tasks into a single file called filename, or broadcasts stdin from that same file to all remote tasks, depending upon the I/O command.
You can use a parameterized "format string" to systematically generate unique names for (usually) multiple I/O files, each of which receives some job I/O depending on the naming scheme that you choose. You can subdivide the received I/O into separate files by job ID, step ID, node (name or sequence number), or individual task. In each case, srun opens the appropriate number of files and associates each with the appropriate subset of tasks.
Available parameters with which to construct the format string, and thereby to split the I/O among separate files, include the following:
%J   (uppercase) Creates one file for each job ID/step ID combination for this running job, and embeds jobid.stepid in each file’s name (for example, out%J might yield files out4812.0, out4812.1, and so on).

%j   Creates one file for each job ID, and embeds jobid in its name (for example, job%j might yield file job4812).

%s   Creates one file for each step ID, and embeds stepid in its name (for example, step%s.out would yield files step0.out, step1.out, and so on).

%N   Creates one file for each node on which this job runs, and embeds that node’s short hostname in the file name (for example, node.%N might yield files node.mcr347, node.mcr348, and so on).

%n   Creates one file for each node on which this job runs, and embeds that node’s numerical identifier relative to the job (where each job’s first node is 0, then 1, and so on) in the file name (for example, node%n would yield files node0, node1, and so on).

%t   Creates one file for each separate task in this running job, and embeds that task’s numerical identifier relative to the job (the first task is 0) in the file name (for example, job%j-%t.out might yield files job4812-0.out, job4812-1.out, and so on).
For all format string parameters except the nonnumeric case of %N, you can insert an integer between the percent character and the letter (such as %3t) to "zero-pad" the resulting file names, that is, to always use the integer number of character positions and to fill any empty positions with zeros from the left. Thus job%j-%3t.out might yield files job4812-000.out and job4812-001.out.
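For example, the following hypothetical command writes each task’s standard output to its own zero-padded file (out000, out001, and so on):

$ srun -n4 -o out%3t hostname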
6.4.7 srun Constraint Options
The srun constraint options limit the nodes on which your job will execute to only those nodes having the properties (constraints) that you specify. The following constraints are available:
-C clist (--constraint=clist)
The -C option runs your job on those nodes having the properties in clist, where clist is a list of features assigned for this purpose by SLURM system administrators (the features may vary by network or machine).
To conjoin (AND) multiple constraints, separate them in clist by using a comma (c1,c2). To disjoin (OR) multiple constraints, separate them in clist by using a vertical bar (c3|c4). If no nodes have the feature(s) that you require with -C, then the SLURM job manager rejects your job.
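For example, assuming your administrator has defined features named bigmem and fast (feature names vary by site), the following hypothetical commands request nodes that have both features, or either feature:

$ srun -n8 -C bigmem,fast ./a.out
$ srun -n8 -C "bigmem|fast" ./a.out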
--contiguous=yes|no
The --contiguous option specifies whether or not your job requires a contiguous range of nodes. The default is YES, which demands contiguous nodes, while the alternative (NO) allows noncontiguous allocation.
--mem=size
The --mem option specifies a minimum amount of real memory per node, where size is an integer number of megabytes. See also --vmem.
--mincpus=n
The --mincpus option specifies a minimum number n of CPUs per node.
--vmem=size
The --vmem option specifies a minimum amount of virtual memory per node, where size is an integer number of megabytes. See also --mem.
--tmp=size
The --tmp option specifies a minimum amount of temporary disk space per node, where size is an integer number of megabytes.
-w hosts (--nodelist=hosts)
The -w hosts option specifies by name the individual nodes that must be included in the set of nodes on which your job runs (perhaps along with others unspecified). Option -w is incompatible with srun option -r (--relative). hosts may have any of three formats:
host1,host2,...      A comma-delimited list of node names (for example, n100,n200,...)

host[na-nb,nc,...]   A range of node names, perhaps mixed with individual nodes in a comma-delimited sublist (for example, n[1-256,500,...])

filename             A file that contains a list of nodes in either of the previous two formats (srun interprets any string containing the slash (/) character as a file name)
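For example, the following hypothetical commands require that nodes n4 and n5 be part of the allocation, or take the required node list from a file (the path contains a slash, so srun reads it as a node-list file):

$ srun -N4 -w n4,n5 ./a.out
$ srun -N4 -w ./mynodes ./a.out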
-x hosts (--exclude=hosts)
The -x hosts option specifies by name the individual nodes that must be excluded from the set of nodes on which your job runs (perhaps along with others unspecified). Option -x is incompatible with srun option -r (--relative). hosts may have any of three formats:
host1,host2,...      A comma-delimited list of node names (for example, n100,n200,...)

host[na-nb,nc,...]   A range of node names, perhaps mixed with individual nodes in a comma-delimited sublist (for example, n[1-256,500,...])

filename             A file that contains a list of nodes in either of the previous two formats (srun interprets any string containing the slash (/) character as a file name)
6.4.8 srun Environment Variables
Many srun options have corresponding environment variables. An srun option, if invoked, always overrides (resets) the corresponding environment variable (which contains each job feature’s default value, if there is a default).
In addition, srun sets the following environment variables for each executing task on the remote compute nodes:

SLURM_JOBID      Specifies the job ID of the executing job.

SLURM_NODEID     Specifies the relative node ID of the current node.

SLURM_NODELIST   Specifies the list of nodes on which the job is actually running.

SLURM_NPROCS     Specifies the total number of processes in the job.

SLURM_PROCID     Specifies the MPI rank (or relative process ID) for the current process.

Other environment variables important for srun-managed jobs include:

MAX_TASKS_PER_NODE   Provides an upper bound on the number of tasks that srun assigns to each job node, even if you allow more than one process per CPU by invoking the srun -O option.

SLURM_NNODES         Is the actual number of nodes assigned to run your job (which may exceed the number of nodes that you explicitly requested with the srun -N option).

6.4.9 Using srun with HP-MPI
The srun command can be used as an option in an HP-MPI launch command. Refer to Section 8.3.3 for information about using srun with HP-MPI.
6.4.10 Using srun with LSF
The srun command can be used in an LSF launch command. Refer to Chapter 7 for information about using srun with LSF.

6.5 Monitoring Jobs with the squeue Command

The squeue command displays the queue of running and waiting jobs (or "job steps"), including the JobID (used for scancel) and the nodes assigned to each running job. It has a wide variety of filtering, sorting, and formatting options. By default, it reports the running jobs in priority order and then the pending jobs in priority order.
Example 6-2 reports on job 12345 and job 12346:
Example 6-2: Displaying Queued Jobs by Their JobIDs
$ squeue --jobs 12345,12346
JOBID PARTITION  NAME  USER ST TIME_USED NODES NODELIST
12345     debug  job1  jody  R      0:21     4 n[9-12]
12346     debug  job2  jody PD      0:00     8
The squeue command can report on jobs in the job queue according to their state; valid states are: pending, running, completing, completed, failed, timeout, and node_fail. Example 6-3 uses the squeue command to report on failed jobs.
Example 6-3: Reporting on Failed Jobs in the Queue
$ squeue --state=FAILED
JOBID PARTITION NAME USER ST TIME NODES NODELIST
59 amt1 hostname root F 0:00 0

6.6 Killing Jobs with the scancel Command

The scancel command cancels a pending or running job or job step. It can also be used to send a specified signal to all processes on all nodes associated with a job. Only job owners or administrators can cancel jobs.
Example 6-4 kills job 415 and all its jobsteps.
Example 6-4: Killing a Job by Its JobID
$ scancel 415
Example 6-5 cancels all pending jobs.
Example 6-5: Cancelling All Pending Jobs
$ scancel --state=PENDING
Example 6-6 sends the TERM signal to terminate jobsteps 421.2 and 421.3.
Example 6-6: Sending a Signal to a Job
$ scancel --signal=TERM 421.2 421.3

6.7 Getting System Information with the sinfo Command

The sinfo command reports the state of partitions and nodes managed by SLURM. It has a wide variety of filtering, sorting, and formatting options. sinfo displays a summary of available partition and node (not job) information (such as partition names, nodes/partition, and CPUs/node).
Example 6-7: Using the sinfo Command (No Options)
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE  NODELIST
lsf       up     infinite     1 down*  n15
lsf       up     infinite     2 idle   n[14,16]
Example 6-8: Reporting Reasons for Downed, Drained, and Draining Nodes
$ sinfo -R
REASON          NODELIST
Memory errors   dev[0,5]
Not Responding  dev8

6.8 Job Accounting

HP XC System Software provides an extension to SLURM for job accounting. The sacct command displays job accounting data in a variety of forms for your analysis. Job accounting data is stored in a log file; the sacct command filters that log file to report on your jobs, jobsteps, status, and errors. See your system administrator if job accounting is not configured on your system.
You can find detailed information about the sacct command and job accounting data in the sacct(1) manpage.
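For example, assuming job accounting is configured, commands like the following report on your recent jobs, or on a single job by its SLURM job ID (the ID shown is illustrative; see the sacct(1) manpage for the exact options available on your system):

$ sacct
$ sacct -j 1234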

6.9 Fault Tolerance

SLURM can handle a variety of failure modes without terminating workloads, including crashes of the node running the SLURM controller. User jobs may be configured to continue execution despite the failure of one or more nodes on which they are executing (refer to Section 6.4.5.1 for further information). The command controlling a job may detach and reattach from the parallel tasks at any time. Nodes allocated to a job are available for reuse as soon as the job(s) allocated to that node terminate. If some nodes fail to complete job termination in a timely fashion because of hardware or software problems, only the scheduling of those tardy nodes will be affected.

6.10 Security

SLURM has a simple security model:
Any user of the system can submit jobs to execute. Any user can cancel his or her own jobs. Any user can view SLURM configuration and state information.
Only privileged users can modify the SLURM configuration, cancel any job, or perform other restricted activities. Privileged users in SLURM include root users and SlurmUser (as defined in the SLURM configuration file).
If permission to modify SLURM configuration is required by others, set-uid programs may be used to grant specific permissions to specific users.
SLURM accomplishes security by means of communication authentication, job authentication, and user authorization.
Refer to SLURM documentation for further information about SLURM security features.
7 Using LSF

The Load Sharing Facility (LSF) from Platform Computing Corporation is a batch system resource manager used on the HP XC system. LSF is included with HP XC, and is an integral part of the HP XC environment. On an HP XC system, a job is submitted to LSF, which places the job in a queue and allows it to run when the necessary resources become available. In addition to launching jobs, LSF provides extensive job management and information capabilities. LSF schedules, launches, controls, and tracks jobs that are submitted to it according to the policies established by the HP XC site administrator.
This chapter describes the functionality of LSF in an HP XC system, and discusses how to use some basic LSF commands to submit jobs, manage jobs, and access job information. The following topics are discussed:
Introduction to LSF on HP XC (Section 7.1)
Determining the LSF execution host (Section 7.2)
Determining available LSF resources (Section 7.3)
Submitting jobs to LSF (Section 7.4)
Getting information about LSF jobs (Section 7.5)
Working interactively within an LSF-HPC allocation (Section 7.6)
LSF Equivalents of SLURM options (Section 7.7)
For full information about LSF, refer to the standard LSF documentation set, which is described in the Related Information section of this manual. LSF manpages are also available online on the HP XC system.

7.1 Introduction to LSF in the HP XC Environment

This section introduces you to LSF in the HP XC environment. It provides an overview of how LSF works, and discusses some of the features and differences of standard LSF compared to LSF on an HP XC system. This section also contains an important discussion of how LSF and SLURM work together to provide the HP XC job management environment. A description of SLURM is provided in Chapter 6.
7.1.1 Overview of LSF
LSF is a batch system resource manager. In the HP XC environment, LSF manages just one resource — the total number of HP XC processors designated for batch processing. The HP XC system is based on dedicating processors to jobs, and LSF is implemented to use these processors in the most efficient manner.
As jobs are submitted to LSF, LSF places the jobs in queues and determines an overall priority for launching the jobs. When the required number of HP XC processors become available to launch the next job, LSF reserves them and launches the job on these processors. When a job is completed, LSF returns job output, job information, and any errors.
A standard LSF installation on an HP XC system would consist of LSF daemons running on every node and providing activity and resource information for each node. LSF-HPC for SLURM on an HP XC system consists of one node running LSF-HPC daemons, and these daemons communicate with SLURM for resource information about the other nodes. LSF-HPC consolidates this resource information into one "virtual" node. Thus LSF-HPC integrated with
SLURM views the HP XC system as one large computer with many resources available to run jobs.
SLURM does not provide the same amount of information that can be obtained via standard LSF. But on HP XC systems, where the compute nodes have the same architecture and are expected to be allocated solely through LSF on a per-processor or per-node basis, the information provided by SLURM is sufficient and allows the LSF-HPC design to be more scalable and generate less overhead on the compute nodes.
Integrating LSF-HPC with SLURM on HP XC systems also provides you with a parallel launch command to distribute and manage parallel tasks efficiently. The SLURM srun command offers much flexibility in requesting topological requirements across an HP XC system, such as requesting contiguous nodes, executing only one task per node, or requesting nodes with specific features. This flexibility is preserved in LSF-HPC through the external SLURM scheduler; this is discussed in more detail in Section 7.1.2.
In an HP XC system, only one node runs LSF-HPC, but all nodes in the HP XC system are configured as LSF-HPC Client Hosts; this means that every node is able to access LSF-HPC. You can submit jobs from any node in the HP XC system.
See Section 7.1.5 and the lsf_diff(1) manpage for more information on the subtle differences between standard LSF and LSF-HPC. Differences described in HP XC System Software documentation take precedence over descriptions in the LSF documentation from Platform Computing Corporation.
7.1.2 Topology Support
LSF-HPC contains topology support when requesting resources for a job. This topology support is available through LSF’s standard external scheduler feature, which makes use of a SLURM external scheduler provided with LSF-HPC on HP XC System Software systems.
Section 1.4.3 describes the interaction of SLURM and LSF-HPC on HP XC System Software systems.
You can apply LSF-HPC’s external scheduler functionality with the bsub command and in LSF queue configurations. See the LSF bqueues(1) command for more information on determining how the available queues are configured on HP XC System Software systems. The format of the LSF bsub command with the external SLURM scheduler option is:
bsub -ext "SLURM[slurm-arguments]" [bsub-options] jobname [job-options]
The slurm-arguments parameter can be one or more of the following srun options, separated by semicolons:

nodes=min[-max]
mincpus=ncpus
mem=value (in megabytes)
tmp=value (in megabytes)
constraint=feature
nodelist=list-of-nodes
exclude=list-of-nodes
contiguous=yes

The srun(1) manpage provides details on these options and their arguments.
To illustrate how the external scheduler is used to launch an application, consider the following command line, which launches an application on ten nodes with one task per node:
$ bsub -n 10 -ext "SLURM[nodes=10]" srun my_app
The following command line launches the same application, also on ten nodes, but stipulates that node n16 should not be used:
$ bsub -n 10 -ext "SLURM[nodes=10;exclude=n16]" srun my_app
7.1.3 Notes on LSF-HPC
The following are noteworthy items for users of LSF-HPC on HP XC systems:
You must run jobs as a non-root user such as lsfadmin or any other local user; do not run jobs as the root user.
A SLURM partition named lsf is used to manage LSF jobs. You can view information about this partition with the sinfo command.
LSF daemons only run on one node in the HP XC system. As a result, the lshosts and bhosts commands only list one host that represents all the resources of the HP XC system. The total number of CPUs for that host should be equal to the total number of CPUs found in the nodes assigned to the SLURM lsf partition.
When a job is submitted and the resources are available, LSF-HPC creates a properly sized SLURM allocation and adds several standard LSF environment variables to the environment in which the job is to be run. The following two environment variables are also added:
SLURM_JOBID
This environment variable is created so that subsequent srun commands make use of the SLURM allocation created by LSF-HPC for the job. This variable can be used by a job script to query information about the SLURM allocation, as shown here:
$ squeue --jobs $SLURM_JOBID
SLURM_NPROCS
This environment variable passes along the total number of tasks requested with the bsub -n option to all subsequent srun commands. User scripts can override this value with the srun -n option, but the new value must be less than or equal to the original number of requested tasks.
LSF-HPC dispatches all jobs locally. The default installation of LSF-HPC for SLURM on the HP XC system provides a job starter script that is configured for use by all LSF-HPC queues. This job starter script adjusts the LSB_HOSTS and LSB_MCPU_HOSTS environment variables to the correct resource values in the allocation. Then, the job starter script uses the srun command to launch the user task on the first node in the allocation.
If this job starter script is not configured for a queue, the user jobs begin execution locally on the LSF-HPC execution host. In this case, it is recommended that the user job uses one or more srun commands to make use of the resources allocated to the job. Work done on the LSF-HPC execution host competes for CPU time with the LSF-HPC daemons, and could affect the overall performance of LSF-HPC on the HP XC system.
The bqueues -l command displays the full queue configuration, including whether or not a job starter script has been configured. See the Platform LSF documentation or the bqueues(1) manpage for more information on the use of this command.
For example, consider an LSF-HPC configuration in which node n20 is the LSF-HPC execution host and nodes n[1-10] are in the SLURM lsf partition. The default normal
queue contains the job starter script, but the unscripted queue does not have the job starter script configured.
Example 7-1: Comparison of Queues and the Configuration of the Job Starter Script
$ bqueues -l normal | grep JOB_STARTER
JOB_STARTER:  /opt/hptc/lsf/bin/job_starter.sh

$ bqueues -l unscripted | grep JOB_STARTER
JOB_STARTER:

$ bsub -Is hostname
Job <66> is submitted to the default queue <normal>.
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
n10

$ bsub -Is -q unscripted hostname
Job <67> is submitted to the default queue <unscripted>.
<<Waiting for dispatch...>>
<<Starting on lsfhost.localdomain>>
n20
• Use the bjobs -l and bhist -l LSF commands to see the components of the actual SLURM allocation command.
• Use the bkill command to kill jobs.
• Use the bjobs command to monitor job status in LSF.
• Use the bqueues command to list the configured job queues in LSF.
7.1.4 How LSF and SLURM Launch and Manage a Job
This section describes what happens in the HP XC system when a job is submitted to LSF. Figure 7-1 illustrates this process. Use the numbered steps in the text and depicted in the illustration as an aid to understanding the process.
Consider the HP XC system configuration shown in Figure 7-1, in which
lsfhost.localdomain is the LSF execution host, node n16 is the login node, and nodes n[1-10] are compute nodes in the lsf partition. All nodes contain two processors, providing
20 processors for use by LSF jobs.
7-4 Using LSF
Figure 7-1: How LSF-HPC and SLURM Launch and Manage a Job
The figure shows a user on login node n16 submitting a bsub command to the LSF execution host (lsfhost.localdomain), which obtains a SLURM allocation (SLURM_JOBID=53, SLURM_NPROCS=4) and runs myscript through the job starter script across compute nodes n1 through n4. The numbered callouts in the figure correspond to the steps described below.
1. A user logs in to login node n16.
2. The user executes the following LSF bsub command on login node n16:
$ bsub -n4 -ext "SLURM[nodes=4]" -o output.out ./myscript
This bsub command launches a request for four CPUs (from the -n4 option of the bsub command) across four nodes (from the -ext "SLURM[nodes=4]" option); the job is launched on those CPUs. The script, myscript, which is shown here, runs the job:
#!/bin/sh
hostname
srun hostname
mpirun -srun ./hellompi
3. LSF-HPC schedules the job and monitors the state of the resources (compute nodes) in
the SLURM lsf partition. When the LSF-HPC scheduler determines that the required resources are available, LSF-HPC allocates those resources in SLURM and obtains a SLURM job identifier (jobID) that corresponds to the allocation.
In this example, four processors spread over four nodes (n1,n2,n3,n4) are allocated for myscript, and the SLURM job id of 53 is assigned to the allocation.
4. LSF-HPC prepares the user environment for the job on the LSF-HPC execution host
node and dispatches the job with the job_starter.sh script. This user environment includes standard LSF environment variables and two SLURM-specific environment variables: SLURM_JOBID and SLURM_NPROCS.
SLURM_JOBID is the SLURM job ID of the job. Note that this is not the same as the LSF jobID.
SLURM_NPROCS is the number of processors allocated. These environment variables are intended for use by the user’s job, whether explicitly (user scripts may use these variables as necessary) or implicitly (any srun commands in the user’s job can use these variables to determine the job’s allocation of resources).
The value for SLURM_NPROCS is 4 and the SLURM_JOBID is 53 in this example.
5. The user job myscript begins execution on compute node n1.
The first line in myscript is the hostname command. It executes locally and returns the name of the node, n1.
6. The second line in the myscript script is the srun hostname command. The
srun command in myscript inherits SLURM_JOBID and SLURM_NPROCS from the environment and executes the hostname command on each compute node in the allocation.
7. The output of the hostname tasks (n1, n2, n3, and n4) is aggregated back to the srun launch command (shown as dashed lines in Figure 7-1), and is ultimately returned to the srun command in the job starter script, where it is collected by LSF-HPC.
The last line in myscript is the mpirun -srun ./hellompi command. The srun command inside the mpirun command in myscript inherits the SLURM_JOBID and SLURM_NPROCS environment variables from the environment and executes hellompi on each compute node in the allocation.
The output of the hellompi tasks is aggregated back to the srun launch command where it is collected by LSF-HPC.
The command executes on the allocated compute nodes n1, n2, n3, and n4. When the job finishes, LSF-HPC cancels the SLURM allocation, which frees the compute
nodes for use by another job.
7.1.5 Differences Between LSF on HP XC and Standard LSF
LSF for the HP XC environment supports all the standard features and functions that standard LSF supports, except for those items described in this section, in Section 7.1.6, and in the HP XC release notes for LSF.
The external scheduler option for HP XC provides additional capabilities at the job level and queue level by allowing the inclusion of several SLURM options in the LSF command line.
LSF does not collect maxswap, ndisks, r15s, r1m, r15m, ut, pg, io, tmp, swp, and mem load indices from each application node. The lshosts and lsload commands display "-" for all of these items.
LSF-enforced job-level run-time limits are not supported.
Other than run time (wall clock) and the total number of CPUs, LSF cannot report any other job accounting information.
LSF does not support parallel or SLURM-based interactive jobs in PTY mode (bsub -Is and bsub -Ip).
LSF does not support user-account mapping and system-account mapping.
LSF does not support chunk jobs. If a job is submitted to a chunk queue, the job is left pending.
LSF does not support topology-aware advanced reservation scheduling.
7.1.6 Notes About Using LSF in the HP XC Environment
This section provides some additional information that should be noted about using LSF in the HP XC Environment.
7.1.6.1 Job Startup and Job Control
When LSF starts a SLURM job, it sets SLURM_JOBID to associate the job with the SLURM allocation. While the job runs, all LSF-supported operating-system-enforced resource limits are supported, including core limit, CPU time limit, data limit, file size limit, memory limit, and stack limit. If the user kills a job, LSF propagates signals to the entire job, including the job file running on the local node and all tasks running on remote nodes.
7.1.6.2 Preemption Support
LSF uses the SLURM "node share" feature to support preemption. When a low-priority job is preempted, its processes are suspended on the allocated nodes, and LSF places the high-priority job on the same nodes. After the high-priority job completes, LSF resumes the suspended low-priority jobs.

7.2 Determining the LSF Execution Host

The lsid command displays the name of the HP XC system and the name of the LSF execution host, along with some general LSF information.
$ lsid
Platform LSF HPC 6.0 for SLURM, Sep 23 2004
Copyright 1992-2004 Platform Computing Corporation

My cluster name is penguin
My master name is lsfhost.localdomain
In this example, penguin is the HP XC system name (where the user is logged in and which contains the compute nodes), and lsfhost.localdomain is the node where LSF is installed and runs (the LSF execution host).

7.3 Determining Available System Resources

For best use of system resources when launching an application, it is useful to know beforehand what system resources are available for your use. This section describes how to obtain information about system resources such as the number of processors available, LSF execution host node information, and LSF system queues.
7.3.1 Getting Status of LSF
The bhosts command displays LSF resource usage information. This command is useful to check the status of the system processors. The bhosts command provides a summary of the jobs on the system and information about the current state of LSF. For example, it can be used to determine if LSF is ready to start accepting batch jobs.
LSF daemons run on only one node in the HP XC system, so the bhosts command will list one host, which represents all the resources of the HP XC system. The total number of processors for that host should be equal to the total number of processors assigned to the SLURM lsf partition.
By default, this command returns the host name, host status, and job state statistics.
The following example shows the output from the bhosts command:
$ bhosts
HOST_NAME            STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
lsfhost.localdomain  ok      -     16   0      0    0      0      0
Of note in the bhosts output:
• The HOST_NAME column displays the name of the LSF execution host.
• The MAX column displays the total processor count (usable CPUs) of all available compute nodes in the lsf partition.
• The STATUS column shows the state of LSF and displays a status of either ok or closed.
• The NJOBS column displays the number of jobs.
7.3.2 Getting Information About LSF-HPC Execution Host Node
The lshosts command displays resource information about the LSF-HPC execution host node. This command is useful for checking machine-specific information.
LSF daemons run on only one node in the HP XC system, so the lshosts command will list one host, which represents all the resources of the HP XC system. The total number of processors for that host should be equal to the total number of processors assigned to the SLURM lsf partition.
By default, lshosts returns the following information: host name, host type, host model, CPU factor, number of CPUs, total memory, total swap space, server information, and static resources.
The following example shows the output from the lshosts command:
$ lshosts
HOST_NAME    type     model     cpuf  ncpus  maxmem  maxswp  server  RESOURCES
lsfhost.loc  SLINUX6  Itanium2  16.0  12     3456M   -       Yes     (slurm)
n7           UNKNOWN  UNKNOWN_  1.0   -      -       -       No      ()
n8           UNKNOWN  UNKNOWN_  1.0   -      -       -       No      ()
n2           UNKNOWN  UNKNOWN_  1.0   -      -       -       No      ()
Of note in the lshosts output:
• The HOST_NAME column displays the name of the LSF-HPC execution host, lsfhost.localdomain, and any other HP XC nodes that have been granted a floating client license because LSF commands were executed on them. LSF-HPC does not know about these floating client hosts, so they are listed as UNKNOWN types and models.
• The ncpus column displays the total processor count (usable CPUs) of all available compute nodes in the lsf partition.
• The maxmem column displays the minimum maxmem over all available compute nodes in the lsf partition.
• The maxtmp column (not shown) displays the minimum maxtmp over all available compute nodes in the lsf partition. Use the lshosts -l command to display this column.
7.3.3 Getting Host Load Information
The LSF lsload command displays load information for LSF execution hosts.
$ lsload
HOST_NAME    status  r15s  r1m  r15m  ut  pg  ls  it  tmp  swp  mem
lsfhost.loc  ok      -     -    -     -   -   4   -   -    -    -
In the previous example output, the LSF execution host (lsfhost.localdomain) is listed under the HOST_NAME column. The status is listed as ok, indicating that it can accept remote jobs. The ls column shows the number of current login users on this host.
See the OUTPUT section of the lsload manpage for further information about the output of this example. In addition, refer to the Platform Computing Corporation LSF documentation and the lsload manpage for more information about the features of this command.
7.3.4 Checking LSF System Queues
All jobs on the HP XC system that are submitted to LSF-HPC are placed into an LSF job queue. HP recommends that you check the status and availability of LSF system queues before launching a job so that you can select the most appropriate queue for your job.
You can easily check the status, limits, and configurations of LSF queues with the bqueues command. This command is fully described in Platform Computing Corporation’s LSF documentation and manpages.
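For example, the following commands list all configured queues and then display the full configuration of the normal queue used elsewhere in this chapter (queue names vary by site):

$ bqueues
$ bqueues -l normal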
See the bsub(1) manpage for more information on submitting jobs to specific queues. Refer to the bqueues(1) manpage for further information about this command. The bparams command displays a list of the current default queues configured for the system; see the bparams(1) manpage for more details.
7.3.5 Getting Information About the lsf Partition
Information about the SLURM lsf compute node partition can be viewed with the SLURM sinfo command. A partition is one or more compute nodes that have been grouped together.
A SLURM lsf partition is created when the HP XC system is installed. This partition contains the resources that will be managed by LSF-HPC and available for jobs submitted to LSF-HPC.
The sinfo command reports the state of the lsf partition and all other partitions on the system. The sinfo command displays a summary of available partition and node information (such as partition names, nodes/partition, and CPUs/node). It has a wide variety of filtering, sorting, and formatting options.
The following example shows the use of the sinfo command to obtain lsf partition information:
$ sinfo -p lsf
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
lsf       up     infinite   128 idle  n[1-128]
Use the following command to obtain more information on the nodes in the lsf partition:
$ sinfo -p lsf -lNe
NODELIST  NODES PARTITION STATE CPUS MEMORY TMP_DISK WEIGHT FEATURES REASON
n[1-128]    128 lsf       idle  2    3456   1        1      (null)   none
Refer to the sinfo(1) manpage and Chapter 6 for further information about using the sinfo command.

7.4 Submitting Jobs

The bsub command submits jobs to LSF-HPC. This section discusses how you can use the bsub command on the HP XC system with LSF-HPC to launch a variety of applications. This section focuses on enhancements to the bsub command from the LSF-HPC integration with SLURM on the HP XC system; this section does not discuss standard bsub functionality or flexibility. See the Platform LSF documentation and the bsub(1) manpage for more information on this important command.

The bsub command and its options, including the external SLURM scheduler, are used to request a set of resources on which to launch a job. See Section 7.1.2 for an introduction and Section 7.4.1 for additional information. The arguments to the bsub command consist of the user job and its arguments. The bsub options allow you to provide information on the amount and type of resources needed by the job.
The basic synopsis of the bsub command is:
bsub [bsub_options] jobname [job_options]
The HP XC system has several features that make it optimal for running parallel applications, particularly (but not exclusively) MPI applications. You can use the bsub command’s -n option to request more than one CPU for a job. This option, coupled with the external SLURM scheduler discussed in Section 7.4.2, gives you much flexibility in selecting resources and shaping how the job is executed on those resources.
When you request multiple nodes, LSF-HPC, like standard LSF, reserves the requested number of nodes and executes one instance of the job on the first reserved node. Use the srun command or the mpirun command with the -srun option in your jobs to launch parallel applications. The -srun option can be set implicitly for the mpirun command; see Section 7.4.5 for more information on using the mpirun -srun command.
Most parallel applications rely on rsh or ssh to "launch" remote tasks. The ssh utility is installed on the HP XC system by default. If you configured the ssh keys to allow unprompted access to other nodes in the HP XC system, the parallel applications can use ssh.
7.4.1 Summary of the LSF bsub Command Format
This section provides a summary of the formats of the LSF bsub command on the HP XC system. The bsub command can have the following formats:
bsub
When you invoke the bsub command without any arguments, you are prompted for a command from standard input.
bsub [bsub-options] [srun [srun-options]] jobname [job-arguments]
This is the bsub command format to submit a serial job. The srun command is required to run parallel jobs on the allocated compute node. Refer to Section 7.4.3.
bsub -n num-procs [bsub-options] jobname [job-arguments]
This is the standard bsub command format to submit a parallel job to the LSF execution host. The jobname parameter can be the name of an executable or a batch script. If jobname is an executable, the job is launched on the LSF execution host node. If jobname is a batch script (containing srun commands), the job is launched on the LSF node allocation (compute nodes). The LSF node allocation is created by the -n num-procs parameter, which specifies the number of processors the job requests. Refer to Section 7.4.4 for information about running jobs. Refer to Section 7.4.6 for information about running scripts.
bsub -n num-procs [bsub-options] srun [srun-options] jobname [job-arguments]
This is the bsub command format to submit a parallel job to the LSF node allocation (compute nodes). The LSF node allocation is created by the -n num-procs parameter, which specifies the number of processors the job requests. The srun command is required to run jobs on the LSF node allocation. Refer to Section 7.4.4.
bsub -n num-procs [bsub-options] mpirun [mpirun-options] -srun [srun-options] mpi-jobname [job-options]
This is the bsub command format to submit an HP-MPI job. The -srun option is required. Refer to Section 7.4.5.
bsub -n num-procs -ext "SLURM[slurm-arguments]" [bsub-options] [srun [srun-options]] jobname [job-options]
This is the bsub command format to submit a parallel job to LSF node allocation (compute nodes) using the external scheduler option. The external scheduler option provides
additional capabilities at the job level and queue level by allowing the inclusion of several SLURM options in the LSF command line. Refer to Section 7.4.2.
7.4.2 LSF-SLURM External Scheduler
An important option that can be included when submitting parallel jobs with LSF is the external scheduler option. The external scheduler option provides application-specific external scheduling capabilities and enables the inclusion of several SLURM options in the LSF command line. For example, this option could be used to submit a job to run one task per node when you have a resource-intensive job that needs sole access to the full resources of a node. Or, if your job needs particular resources found only on a specific set of nodes, this option could be used to submit the job to those specific nodes. There are several options available for use with the external scheduler; refer to the list in this section.
The format for the external scheduler is:
-ext "SLURM[slurm-arguments]"
slurm-arguments can consist of one or more of the following srun options, separated
by semicolons:
SLURM Argument            Function

nodes=min[-max]           Minimum and maximum number of nodes allocated to the job. The job allocation will at least contain the minimum number of nodes.

mincpus=<ncpus>           Specify the minimum number of CPUs per node. The default value is 1.

mem=<value>               Specify a minimum amount of real memory of each node.

tmp=<value>               Specify a minimum amount of temporary disk space of each node.

constraint=<value>        Specify a list of constraints. The list may include multiple features separated by “&” or “|”. “&” represents AND-ed, “|” represents OR-ed.

nodelist=<list of nodes>  Request a specific list of nodes. The job will at least contain these nodes. The list may be specified as a comma-separated list of nodes, or a range of nodes.

exclude=<list of nodes>   Requests that a specific list of hosts not be included in the resource allocated to this job. The list may be specified as a comma-separated list of nodes, or a range of nodes.

contiguous=yes            Request a mandatory contiguous range of nodes.
When this option is added to an LSF command line, it looks like the following:
bsub -n num-procs -ext "SLURM[slurm-arguments]" [bsub-options] [srun [srun-options]] jobname [job-options]
Refer to the LSF bsub command manpage for additional information about using the external scheduler (-ext) option. See the srun manpage for more details about the above options and their arguments.
Consider an HP XC system configuration where lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the lsf partition. All nodes contain two processors, providing 20 processors for use by LSF jobs.
Example 7-2 shows one way to submit a parallel job to run on a specific node or nodes.
Example 7-2: Using the External Scheduler to Submit a Job to Run on Specific Nodes
$ bsub -n4 -ext "SLURM[nodelist=n6,n8]" -I srun hostname
Job <70> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n6
n6
n8
n8
In the previous example, the job output shows that the job was launched from the LSF execution host lsfhost.localdomain, and that it ran on four processors, using the specified nodes n6 and n8.
Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the lsf partition. All nodes contain 2 processors, providing 20 processors for use by LSF jobs.
Example 7-3 shows one way to submit a parallel job to run one task per node.
Example 7-3: Using the External Scheduler to Submit a Job to Run One Task per Node
$ bsub -n4 -ext "SLURM[nodes=4]" -I srun hostname
Job <71> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n2
n3
n4
In the previous example, the job output shows that the job was launched from the LSF execution host lsfhost.localdomain, and it ran on four processors on four different nodes (one task per node).
Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the lsf partition. All nodes contain 2 processors, providing 20 processors for use by LSF jobs.
Example 7-4 shows one way to submit a parallel job to avoid running on a particular node. Note that this command could have been written to exclude additional nodes.
Example 7-4: Using the External Scheduler to Submit a Job That Excludes One or More Nodes
$ bsub -n4 -ext "SLURM[nodes=4; exclude=n3]" -I srun hostname
Job <72> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n2
n4
n5
This example runs the job exactly as in Example 7-3, but additionally requests that node n3 not be used to run the job. Note that this command could have been written to exclude additional nodes.
7.4.3 Submitting a Serial Job
The synopsis of the bsub command to submit a serial (single CPU) job to LSF-HPC is:
bsub [bsub-options] [srun [srun-options]] jobname [job-options]
The bsub command launches the job. The srun command is only necessary to launch the job on the allocated node if the HP XC job starter script is not configured to run a job on the compute nodes in the lsf partition. The jobname argument is the name of an executable file.

Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the lsf partition. All nodes contain two processors, providing 20 processors for use by LSF jobs. Example 7-5 shows one way to submit a serial job on this system.
Example 7-5: Submitting an Interactive Serial Job
$ bsub -I hostname
Job <73> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
This command runs the serial hostname job interactively on a compute node (n1). The -I option specifies interactive; output appears on your display.
7.4.4 Submitting a Job in Parallel
The synopsis of the bsub command, which submits a job to run in parallel to LSF-HPC, is:
bsub -n num-procs [bsub-options ] srun [srun-options] jobname [job-options]
The bsub command submits the job to LSF-HPC. The -n num-procs option specifies to LSF-HPC the number of CPUs to be reserved for use by the job. The srun command is the user job launched by LSF-HPC; it is used to launch the jobname in parallel on the reserved CPUs on the allocated nodes. The jobname is the name of the executable file to be run in parallel.
7.4.5 Submitting an HP-MPI Job
The synopsis of the bsub command to submit an HP-MPI job to LSF-HPC is as follows:
bsub -n num-procs [bsub-options] mpijob
The mpijob argument has the following format:
mpirun [mpirun-options] [-srun [srun-options]] mpi-jobname
The mpirun command’s -srun option is only required if the MPI_USESRUN environment variable is not set or if you want to use additional srun options to execute your job. See Chapter 8 for more information on the mpirun command.
The srun command, used by the mpirun command to launch the MPI tasks in parallel, determines the number of tasks to launch from the SLURM_NPROCS environment variable that was set by LSF-HPC. Recall that the value of this environment variable is equivalent to the number provided by the -n option of the bsub command.
Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the lsf partition. All nodes contain 2 processors, providing 20 processors for use by LSF jobs.
Example 7-6 runs a hello_world MPI program on four processors.
Example 7-6: Submitting an HP-MPI Job
$ bsub -n4 -I mpirun -srun ./hello_world
Job <75> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
Hello world! I’m 0 of 4 on n2
Hello world! I’m 1 of 4 on n2
Hello world! I’m 2 of 4 on n4
Hello world! I’m 3 of 4 on n4
Example 7-7 runs the same hello_world MPI program on four processors, but uses the external SLURM scheduler to request one task per node.
Example 7-7: Submitting an HP-MPI Job with a Specific Topology Request
$ bsub -n4 -ext "SLURM[nodes=4]" -I mpirun -srun ./hello_world Job <77> is submitted to default queue <normal>. <<Waiting for dispatch ...>> <<Starting on lsfhost.localdomain>> Hello world! I’m 0 of 4 on n1 Hello world! I’m 1 of 4 on n2 Hello world! I’m 2 of 4 on n3 Hello world! I’m 3 of 4 on n4
If the MPI job requires an appfile, or for some other reason cannot use the srun command as the task launcher, you must first determine the node hostnames to which mpirun’s standard task launcher should launch the tasks. In such scenarios, write a batch script; several methods are available for determining the nodes in an allocation. One is to use the SLURM_JOBID environment variable with the squeue command to query the nodes. Another is to use LSF environment variables such as LSB_HOSTS and LSB_MCPU_HOSTS, which are prepared by the HP XC job starter script; a sketch of this approach follows.
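The following batch-script fragment is one possible sketch of such preprocessing. It assumes HP-MPI’s appfile convention of one "-h host -np count program" entry per line, and uses hello_world as a stand-in for your application; adapt the entries to your program and per-node task count.

#!/bin/sh
# LSB_HOSTS is prepared by the HP XC job starter script and lists one
# hostname per allocated CPU, for example "n1 n1 n2 n2".
rm -f ./appfile
for host in $LSB_HOSTS
do
    echo "-h $host -np 1 ./hello_world" >> ./appfile
done
# Launch the tasks with mpirun's standard task launcher and the appfile.
mpirun -f ./appfile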
7.4.6 Submitting a Batch Job or Job Script
The bsub command format to submit a batch job or job script is:
bsub -n num-procs [bsub-options] script-name
The -n num-procs parameter specifies the number of processors the job requests; it is required for parallel jobs. The script-name argument is the name of the batch job or script. Any bsub options can be included. The script can contain one or more srun or mpirun commands and options.
The script will be executed once on the first allocated node, and any srun or mpirun commands within the script can use some or all of the allocated compute nodes.
7.4.6.1 Examples
Consider an HP XC system configuration in which lsfhost.localdomain is the LSF execution host and nodes n[1-10] are compute nodes in the lsf partition. All nodes contain two processors, providing 20 processors for use by LSF jobs.
Example 7-8 displays, then runs, a simple batch script.
Example 7-8: Submitting a Batch Job Script
$ cat ./myscript.sh
#!/bin/sh
srun hostname
mpirun -srun hellompi
$ bsub -n4 -I ./myscript.sh
Job <78> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
n1
n1
n2
n2
Hello world! I’m 0 of 4 on n1
Hello world! I’m 1 of 4 on n1
Hello world! I’m 2 of 4 on n2
Hello world! I’m 3 of 4 on n2
Example 7-9 runs the same script on different resources.
Example 7-9: Submitting a Batch Script with a Specific Topology Request
$ bsub -n4 -ext "SLURM[nodes=4]" -I ./myscript.sh Job <79> is submitted to default queue <normal>. <<Waiting for dispatch ...>> <<Starting on lsfhost.localdomain>> n1 n2 n3 n4 Hello world! I’m 0 of 4 on n1 Hello world! I’m 1 of 4 on n2 Hello world! I’m 2 of 4 on n3 Hello world! I’m 3 of 4 on n4
Example 7-10 and Example 7-11 show how the jobs inside the script can be manipulated within the allocation.
Example 7-10: Submitting a Batch Job Script That Uses a Subset of the Allocation
$ bsub -n4 -ext "SLURM[nodes=4]" -I ./myscript.sh "-n 2" Job <80> is submitted to default queue <normal>. <<Waiting for dispatch ...>> <<Starting on lsfhost.localdomain>> n1 n2 Hello world! I’m 0 of 2 on n1 Hello world! I’m 1 of 2 on n2
Example 7-11: Submitting a Batch Job Script That Uses the srun --overcommit Option
$ bsub -n4 -I ./myscript.sh "-n8 -O" Job <81> is submitted to default queue <normal>. <<Waiting for dispatch ...>> <<Starting on lsfhost.localdomain>> n1 n1 n1 n1 n2 n2 n2 n2 Hello world! I’m 0 of 8 on n1 Hello world! I’m 1 of 8 on n1 Hello world! I’m 2 of 8 on n1 Hello world! I’m 3 of 8 on n1 Hello world! I’m 4 of 8 on n2 Hello world! I’m 5 of 8 on n2 Hello world! I’m 6 of 8 on n2 Hello world! I’m 7 of 8 on n2
Example 7-12 shows some of the environment variables that are available in a batch script.
Example 7-12: Useful Environment Variables Available in a Batch Job Script
$ cat ./envscript.sh
#!/bin/sh
name=`hostname`
echo "hostname = $name"
echo "LSB_HOSTS = '$LSB_HOSTS'"
echo "LSB_MCPU_HOSTS = '$LSB_MCPU_HOSTS'"
echo "SLURM_JOBID = $SLURM_JOBID"
echo "SLURM_NPROCS = $SLURM_NPROCS"
$ bsub -n4 -I ./envscript.sh
Job <82> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on lsfhost.localdomain>>
hostname = n1
LSB_HOSTS = 'n1 n1 n2 n2'
LSB_MCPU_HOSTS = 'n1 2 n2 2'
SLURM_JOBID = 176
SLURM_NPROCS = 4
7.4.7 Submitting a Job from a Non-HP XC Host
You can submit a job from a non-HP XC host to the HP XC system by adding a resource requirement string to the LSF command line. A resource requirement string describes the resources a job needs; LSF uses it to select a host that meets the specified requirements and runs the job there.
To submit a job from a non-HP XC host to the HP XC system, use the LSF -R option with the HP XC host type SLINUX64 (defined in lsf.shared) in the job submission resource requirement string. Specify the resource requirement string as follows:
-R "type=SLINUX64"