
HP XC System Software
User’s Guide
Part Number: AA-RWJVB-TE
June 2005
Product Version: HP XC System Software Version 2.1
This document provides information about the HP XC user and programming environment.
Hewlett-Packard Company
Palo Alto, California
© Copyright 2003–2005 Hewlett-Packard Development Company, L.P.
UNIX® is a registered trademark of The Open Group.
Linux® is a U.S. registered trademark of Linus Torvalds.
LSF, Platform Computing, and the LSF and Platform Computing logos are trademarks or registered trademarks of Platform Computing Corporation.
Intel®, the Intel logo, Itanium®, Xeon™, and Pentium® are trademarks or registered trademarks of Intel Corporation in the United States and other countries.
TotalView® is a registered trademark of Etnus, Inc.
Quadrics® is a registered trademark of Quadrics, Ltd.
Myrinet® and Myricom® are registered trademarks of Myricom, Inc.
Red Hat® is a registered trademark of Red Hat, Inc.
Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.
The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Contents

About This Document

1 Overview of the User Environment
1.1 System Architecture
1.1.1 Operating System
1.1.2 Node Specialization
1.1.3 Storage and I/O
1.1.4 File System
1.1.5 System Interconnect Network
1.1.6 Network Address Translation (NAT)
1.2 User Environment
1.2.1 LVS
1.2.2 Modules
1.2.3 Commands
1.2.3.1 Linux Commands
1.2.3.2 LSF Commands
1.2.3.3 SLURM Commands
1.2.3.4 HP-MPI Commands
1.2.3.5 Modules Commands
1.3 Application Development Environment
1.3.1 Parallel Applications
1.3.2 Serial Applications
1.4 Run-Time Environment
1.4.1 SLURM
1.4.2 Load Sharing Facility (LSF-HPC)
1.4.3 How LSF-HPC and SLURM Interact
1.4.4 HP-MPI
1.5 Components, Tools, Compilers, Libraries, and Debuggers
2 Using the System
2.1 Logging in to the System
2.1.1 LVS Login Routing
2.1.2 Using ssh to Log In
2.2 Configuring Your Environment with Modulefiles
2.2.1 Notes on Modulefiles
2.2.2 Supplied Modulefiles
2.2.3 Modulefiles Automatically Loaded on the System
2.2.4 Viewing Available Modulefiles
2.2.5 Viewing Loaded Modulefiles
2.2.6 Loading a Modulefile
2.2.6.1 Loading a Modulefile for the Current Session
2.2.6.2 Automatically Loading a Modulefile at Login
2.2.7 Unloading a Modulefile
2.2.8 Modulefile Conflicts
2.2.9 Creating a Modulefile
2.2.10 Viewing Modulefile-Specific Help
2.3 Launching and Managing Jobs Quick Start
2.3.1 Introduction
2.3.2 Getting Information About Queues
2.3.3 Getting Information About Resources
2.3.4 Getting Information About the System's Partitions
2.3.5 Launching Jobs
2.3.5.1 Submitting a Serial Job
2.3.5.2 Submitting a Non-MPI Parallel Job
2.3.5.3 Submitting an MPI Job
2.3.5.4 Submitting a Batch Job or Job Script
2.3.6 Getting Information About Your Jobs
2.3.7 Stopping and Suspending Jobs
2.3.8 Resuming Suspended Jobs
2.4 Performing Other Common User Tasks
2.4.1 Determining the LSF Cluster Name and LSF Execution Host
2.4.2 Installing Third-Party Software
2.5 Getting System Help and Information
3 Developing Applications
3.1 Overview
3.2 Using Compilers
3.2.1 Standard Linux Compilers
3.2.2 Intel Compilers
3.2.3 PGI Compilers
3.2.4 Pathscale Compilers
3.2.5 MPI Compiler
3.3 Checking Nodes and Partitions Before Running Jobs
3.4 Interrupting a Job
3.5 Setting Debugging Options
3.6 Developing Serial Applications
3.6.1 Serial Application Build Environment
3.6.1.1 Using MLIB in Serial Applications
3.6.2 Building Serial Applications
3.6.2.1 Compiling and Linking Serial Applications
3.7 Developing Parallel Applications
3.7.1 Parallel Application Build Environment
3.7.1.1 Modulefiles
3.7.1.2 HP-MPI
3.7.1.3 OpenMP
3.7.1.4 Pthreads
3.7.1.5 Quadrics SHMEM
3.7.1.6 MLIB Math Library
3.7.1.7 MPI Library
3.7.1.8 Intel Fortran and C/C++ Compilers
3.7.1.9 PGI Fortran and C/C++ Compilers
3.7.1.10 GNU C and C++ Compilers
3.7.1.11 GNU Parallel Make
3.7.1.12 MKL Library
3.7.1.13 ACML Library
3.7.1.14 Other Libraries
3.7.1.15 Reserved Symbols and Names
3.7.2 Building Parallel Applications
3.7.2.1 Compiling and Linking Non-MPI Applications
3.7.2.2 Compiling and Linking HP-MPI Applications
3.7.2.3 Examples of Compiling and Linking HP-MPI Applications
3.8 Developing Libraries
3.8.1 Designing Libraries for XC4000
3.9 Advanced Topics
3.9.1 Using the GNU Parallel Make Capability
3.9.1.1 Example Procedure 1
3.9.1.2 Example Procedure 2
3.9.1.3 Example Procedure 3
3.9.2 Local Disks on Compute Nodes
3.9.3 I/O Performance Considerations
3.9.3.1 Shared File View
3.9.3.2 Private File View
3.9.4 Communication Between Nodes
4 Debugging Applications
4.1 Debugging Serial Applications
4.2 Debugging Parallel Applications
4.2.1 Debugging with TotalView
4.2.1.1 SSH and TotalView
4.2.1.2 Setting Up TotalView
4.2.1.3 Using TotalView with SLURM
4.2.1.4 Using TotalView with LSF-HPC
4.2.1.5 Starting TotalView for the First Time
4.2.1.6 Debugging an Application
4.2.1.7 Debugging Running Applications
4.2.1.8 Exiting TotalView
5 Tuning Applications
5.1 Using the Intel Trace Collector/Analyzer
5.1.1 Building a Program — Intel Trace Collector and HP-MPI
5.1.2 Running a Program — Intel Trace Collector and HP-MPI
5.1.3 Visualizing Data — Intel Trace Analyzer and HP-MPI
6 Using SLURM
6.1 Introduction
6.2 SLURM Commands
6.3 Accessing the SLURM Manpages
6.4 Launching Jobs with the srun Command
6.4.1 The srun Roles and Modes
6.4.1.1 srun Roles
6.4.1.2 srun Modes
6.4.2 srun Signal Handling
6.4.3 srun Run-Mode Options
6.4.4 srun Resource-Allocation Options
6.4.5 srun Control Options
6.4.5.1 Node Management Options
6.4.5.2 Working Features Options
6.4.5.3 Resource Control Options
6.4.5.4 Help Options
6.4.6 srun I/O Options
6.4.6.1 I/O Commands
6.4.6.2 I/O Redirection Alternatives
6.4.7 srun Constraint Options
6.4.8 srun Environment Variables
6.4.9 Using srun with HP-MPI
6.4.10 Using srun with LSF
6.5 Monitoring Jobs with the squeue Command
6.6 Killing Jobs with the scancel Command
6.7 Getting System Information with the sinfo Command
6.8 Job Accounting
6.9 Fault Tolerance
6.10 Security
7 Using LSF
7.1 Introduction to LSF in the HP XC Environment
7.1.1 Overview of LSF
7.1.2 Topology Support
7.1.3 Notes on LSF-HPC
7.1.4 How LSF and SLURM Launch and Manage a Job
7.1.5 Differences Between LSF on HP XC and Standard LSF
7.1.6 Notes About Using LSF in the HP XC Environment
7.1.6.1 Job Startup and Job Control
7.1.6.2 Preemption Support
7.2 Determining Execution Host
7.3 Determining Available System Resources
7.3.1 Getting Status of LSF
7.3.2 Getting Information About LSF-HPC Execution Host Node
7.3.3 Getting Host Load Information
7.3.4 Checking LSF System Queues
7.3.5 Getting Information About the lsf Partition
7.4 Submitting Jobs
7.4.1 Summary of the LSF bsub Command Format
7.4.2 LSF-SLURM External Scheduler
7.4.3 Submitting a Serial Job
7.4.4 Submitting a Job in Parallel
7.4.5 Submitting an HP-MPI Job
7.4.6 Submitting a Batch Job or Job Script
7.4.6.1 Examples
7.4.7 Submitting a Job from a Non-HP XC Host
7.5 Getting Information About Jobs
7.5.1 Getting Job Allocation Information
7.5.1.1 Job Allocation Information for a Running Job
7.5.1.2 Job Allocation Information for a Finished Job
7.5.2 Checking Status of a Job
7.5.3 Viewing a Job's Historical Information
7.6 Working Interactively Within an LSF-HPC Allocation
7.6.1 Submitting an Interactive Job to Launch the xterm Program
7.6.2 Submitting an Interactive Job to Launch a Shell
7.7 LSF Equivalents of SLURM srun Options
8 Using HP-MPI
8.1 Overview
8.2 HP-MPI Directory Structure
8.3 Compiling and Running Applications
8.3.1 Setting Environment Variables
8.3.2 Building and Running an Example Application
8.3.2.1 Example Application hello_world
8.3.2.2 Building and Running hello_world
8.3.3 Using srun with HP-MPI
8.3.3.1 Launching MPI Jobs
8.3.3.2 Creating Subshells and Launching Jobsteps
8.3.3.3 System Interconnect Selection
8.3.4 Using LSF and HP-MPI
8.3.5 MPI Versioning
8.4 System Interconnect Support
8.4.1 HP-MPI Performance on HP XC with Multiple System Interconnects
8.4.2 Global Environment Variable Settings on the mpirun Command Line
8.5 32-Bit Builds on XC4000
8.6 Truncated Messages
8.7 Allowing Windows to Use Exclusive Locks
8.8 The mpirun Command Options
8.9 Environment Variables
8.9.1 MPIRUN_OPTIONS
8.9.2 MPIRUN_SYSTEM_OPTIONS
8.9.3 MPI_IC_ORDER
8.9.4 MPI_PHYSICAL_MEMORY
8.9.5 MPI_PIN_PERCENTAGE
8.9.6 MPI_PAGE_ALIGN_MEM
8.9.7 MPI_MAX_WINDOW
8.9.8 MPI_ELANLOCK
8.9.9 MPI_USE_LIBELAN
8.9.10 MPI_USE_LIBELAN_SUB
8.10 MPICH Object Compatibility
8.11 HP-MPI Documentation and Manpages
8.12 Additional Information, Known Problems, and Work-arounds
9 Using HP MLIB
9.1 Overview
9.1.1 Intel Compiler Notes
9.1.2 MLIB and Module Files
9.2 HP MLIB for the HP XC6000 Platform
9.2.1 Platform Support
9.2.2 Library Support
9.2.3 MPI Parallelism
9.2.4 Modulefiles and MLIB
9.2.5 Using Intel Compilers with HP MLIB
9.2.6 Compiling and Linking
9.2.6.1 Linking VECLIB
9.2.6.2 Linking LAPACK
9.2.6.3 Linking ScaLAPACK
9.2.6.4 Linking SuperLU_DIST
9.2.7 Licensing
9.2.8 MLIB Manpages
9.3 HP MLIB for the HP XC4000 Platform
9.3.1 Platform Support
9.3.2 Library Support
9.3.3 MPI Parallelism
9.3.4 Modulefiles and MLIB
9.3.5 Compiling and Linking
9.3.5.1 Linking VECLIB
9.3.5.2 Linking LAPACK
9.3.5.3 Linking ScaLAPACK
9.3.5.4 Linking SuperLU_DIST
9.3.6 Licensing
9.3.7 MLIB Manpages
10 Advanced Topics
10.1 Enabling Remote Execution with OpenSSH
10.2 Running an X Terminal Session from a Remote Node
A Examples
A.1 Building and Running a Serial Application
A.2 Launching a Serial Interactive Shell Through LSF
A.3 Running LSF Jobs with a SLURM Allocation Request
A.3.1 Example 1. Two Processors on Any Two Nodes
A.3.2 Example 2. Four Processors on Two Specific Nodes
A.4 Launching a Parallel Interactive Shell Through LSF
A.5 Submitting a Simple Job Script with LSF
A.6 Submitting an Interactive Job with LSF
A.7 Submitting an HP-MPI Job with LSF
A.8 Using a Resource Requirements String in an LSF Command
Glossary
Index
Examples
2-1 Submitting a Serial Job
2-2 Submitting a Non-MPI Parallel Job
2-3 Submitting a Non-MPI Parallel Job to Run One Task per Node
2-4 Running an MPI Job with LSF
2-5 Running an MPI Job with LSF Using the External Scheduler Option
2-6 Submitting a Job Script
3-1 Directory Structure
3-2 Recommended Directory Structure
6-1 Simple Launch of a Serial Program
6-2 Displaying Queued Jobs by Their JobIDs
6-3 Reporting on Failed Jobs in the Queue
6-4 Killing a Job by Its JobID
6-5 Cancelling All Pending Jobs
6-6 Sending a Signal to a Job
6-7 Using the sinfo Command (No Options)
6-8 Reporting Reasons for Downed, Drained, and Draining Nodes
7-1 Comparison of Queues and the Configuration of the Job Starter Script
7-2 Using the External Scheduler to Submit a Job to Run on Specific Nodes
7-3 Using the External Scheduler to Submit a Job to Run One Task per Node
7-4 Using the External Scheduler to Submit a Job That Excludes One or More Nodes
7-5 Submitting an Interactive Serial Job
7-6 Submitting an HP-MPI Job
7-7 Submitting an HP-MPI Job with a Specific Topology Request
7-8 Submitting a Batch Job Script
7-9 Submitting a Batch Script with a Specific Topology Request
7-10 Submitting a Batch Job Script That Uses a Subset of the Allocation
7-11 Submitting a Batch Job Script That Uses the srun --overcommit Option
7-12 Useful Environment Variables Available in a Batch Job Script
7-13 Using the bjobs Command (Short Output)
7-14 Using the bjobs Command (Long Output)
7-15 Using the bhist Command (Short Output)
7-16 Using the bhist Command (Long Output)
7-17 View Your Environment
7-18 View Your Allocation in SLURM
7-19 View Your Running Job in LSF
7-20 View Job Details in LSF
7-21 Running Jobs from an xterm Window
7-22 Submitting an Interactive Shell Program
7-23 Submitting an Interactive Shell Program on the LSF Execution Host
8-1 Performing System Interconnect Selection
8-2 Using TCP/IP over Gigabit Ethernet
8-3 Using TCP/IP over Elan4
8-4 Allocating and Attaching Processors
8-5 Allocating 12 Processors on 6 Nodes
Figures
4-1 TotalView Root Window
4-2 TotalView Preferences Window
4-3 TotalView Process Window Example
4-4 Unattached Window
4-5 Attached Window
7-1 How LSF-HPC and SLURM Launch and Manage a Job

Tables
2-1 Supplied Modulefiles
3-1 Intel Compiler Commands
3-2 PGI Compiler Commands
6-1 SLURM Commands
7-1 Output Provided by the bhist Command
7-2 LSF Equivalents of SLURM srun Options
8-1 Organization of the /opt/hpmpi Directory
8-2 HP-MPI Manpage Categories
About This Document
This manual provides information about using the features and functions of the HP XC System Software and describes how the HP XC user and programming environments differ from standard Linux® system environments. In addition, this manual focuses on building and running applications in the HP XC environment and is intended to guide an application developer to take maximum advantage of HP XC features and functions by providing an understanding of the underlying mechanisms of the HP XC programming environment.
An HP XC system is integrated with several open source software components. Some open source software components are being used for underlying technology, and their deployment is transparent. Some open source software components require HP XC-specific user-level documentation, and that kind of information is included in this document, if required.
HP relies on the documentation provided by the open source developers to supply the information you need to use their product. For links to open source software documentation for products that are integrated with your XC system, see Supplementary Information.
Documentation for third-party hardware and software components that are supported on the HP XC system is supplied by the third-party vendor. However, information about the operation of third-party software is included in this document if the functionality of the third-party component differs from standard behavior when used in the XC environment. In this case, HP XC documentation supersedes information supplied by the third-party vendor. For links to related third-party Web sites, see Supplementary Information.
Standard Linux® administrative tasks or the functions provided by standard Linux tools and commands are documented in commercially available Linux reference manuals and on various Web sites. For more information about obtaining documentation for standard Linux administrative tasks and associated topics, see the list of Web sites and additional publications provided in Related Information.

Intended Audience

This manual is intended for experienced Linux users who run applications developed by others, and for experienced system or application developers who develop, build, and run application code on an HP XC system.
This manual assumes that the user understands, and has experience with, multiprocessor systems and the Message Passing Interface (MPI), and is familiar with HP XC architecture and concepts.

Document Organization

This document is organized as follows:
• Chapter 1 provides an overview of the HP XC user, programming, and run-time environment.
• Chapter 2 describes how to perform common user tasks on the HP XC system.
• Chapter 3 describes how to build and run applications on the HP XC system.
• Chapter 4 describes how to debug applications on the HP XC system.
• Chapter 5 describes how to better tune applications for the HP XC system.
• Chapter 6 describes how to use SLURM on the HP XC system.
• Chapter 7 describes how to use LSF® on the HP XC system.
• Chapter 8 describes how to use HP-MPI on the HP XC system.
• Chapter 9 describes how to use MLIB on the HP XC system.
• Appendix A provides examples of HP XC applications.
• The Glossary provides definitions of the terms used in this manual.

HP XC Information

The HP XC System Software Documentation Set includes the following core documents. All XC documents, except the HP XC System Software Release Notes, are shipped on the XC documentation CD. All XC documents, including the HP XC System Software Release Notes, are available on line at the following URL:
http://www.hp.com/techservers/clusters/xc_clusters.html
HP XC System Software Release Notes
    Contains important, last-minute information about firmware, software, or hardware that might affect your system. This document is only available on line.

HP XC Hardware Preparation Guide
    Describes tasks specific to HP XC that are required to prepare each supported cluster platform for installation and configuration, including the specific placement of nodes in the switches.

HP XC System Software Installation Guide
    Provides step-by-step instructions for installing the HP XC System Software on the head node and configuring the system.

HP XC System Software Administration Guide
    Provides an overview of the HP XC system administration environment and describes cluster administration tasks, node maintenance tasks, LSF® administration tasks, and troubleshooting procedures.

HP XC System Software User's Guide
    Provides an overview of managing the HP XC user environment with modules, managing jobs with LSF, and how to build, run, debug, and troubleshoot serial and parallel applications on an HP XC system.
The following documents are also provided by HP for use with your HP XC system:

Linux Administration Handbook
A third-party Linux reference manual, Linux Administration Handbook, is shipped with the HP XC System Software Documentation Set. This manual was authored by Evi Nemeth, Garth Snyder, Trent R. Hein, et al. (NJ: Prentice Hall, 2002).

QuickSpecs for HP XC System Software
Provides a product overview, hardware requirements, software requirements, software licensing information, ordering information, and information about commercially available software that has been qualified to interoperate with the HP XC System Software. The QuickSpecs are located at the following URL:
http://www.hp.com/techservers/clusters/xc_clusters.html

HP XC Program Development Environment
The following URL provides pointers to tools that have been tested in the HP XC program development environment (for example, TotalView® and other debuggers, compilers, and so on):
ftp://ftp.compaq.com/pub/products/xc/pde/index.html
HP Message Passing Interface
HP Message Passing Interface (MPI) is an implementation of the MPI standard for HP systems. The home page is located at the following URL:
http://www.hp.com/go/mpi
HP Mathematical Library
The HP math libraries (MLIB) support application developers who are looking for ways to speed up development of new applications and shorten the execution time of long-running technical applications. The home page is located at the following URL:
http://www.hp.com/go/mlib
HP Cluster Platform Documents
The cluster platform documents describe site requirements, show you how to physically set up the servers and additional devices, and provide procedures to operate and manage the hardware. These documents are shipped with your hardware.
Documentation for the HP Integrity and HP ProLiant servers is available at the following URL:
http://www.docs.hp.com/

For More Information

The HP Web site has information on this product. You can access the HP Web site at the following URL:
http://www.hp.com

Supplementary Information

This section contains links to third-party and open source components that are integrated into the HP XC System Software core technology. In the XC documentation, except where necessary, references to third-party and open source software components are generic, and the XC adjective is not added to any reference to a third-party or open source command or product name. For example, the SLURM srun command is simply referred to as the srun command.
The location of each Web site or link to a particular topic listed in this section is subject to change without notice by the site provider.
http://www.platform.com
Home page for Platform Computing, the developer of the Load Sharing Facility (LSF). LSF, the batch system resource manager used on an XC system, is tightly integrated with the HP XC and SLURM software.
For your convenience, the following Platform LSF documents are shipped on the HP XC documentation CD in PDF format. The Platform LSF documents are also available on the XC Web site.
- Administering Platform LSF
- Administration Primer
- Platform LSF Reference
- Quick Reference Card
- Running Jobs with Platform LSF
http://www.llnl.gov/LCdocs/slurm/
Home page for the Simple Linux Utility for Resource Management (SLURM), which is integrated with LSF to manage job and compute resources on an XC system.
http://www.nagios.org/
Home page for Nagios®, a system and network monitoring application. Nagios watches specified hosts and services and issues alerts when problems occur and when problems are resolved. Nagios provides the monitoring capabilities on an XC system.
http://supermon.sourceforge.net/
Home page for Supermon, a high-speed cluster monitoring system that emphasizes low perturbation, high sampling rates, and an extensible data protocol and programming interface. Supermon works in conjunction with Nagios to provide XC system monitoring.
http://www.llnl.gov/linux/pdsh/
Home page for the parallel distributed shell (pdsh), which executes commands across XC client nodes in parallel.
http://www.balabit.com/products/syslog_ng/
Home page for syslog-ng©, a logging tool that replaces the traditional syslog functionality. The syslog-ng tool is a flexible and scalable audit trail processing tool, and it provides a centralized, securely stored log of all devices on your network.
http://systemimager.org
Home page for SystemImager®, which is the underlying technology that is used to install the XC software, distribute the golden image, and distribute configuration changes.
http://www.etnus.com
Home page for Etnus, Inc., maker of the TotalView parallel debugger.
http://www.macrovision.com
Home page for Macrovision®, developer of the FLEXlm™ license management utility, which is used for HP XC license management.
http://sourceforge.net/projects/modules/
Home page for Modules, which provide for easy dynamic modification of a user's environment through modulefiles, which typically instruct the module command to alter or set shell environment variables.
http://dev.mysql.com/
Home page for MySQL AB, developer of the MySQL database. This Web site contains a link to the MySQL documentation, particularly the MySQL Reference Manual.

Manpages

Manpages provide online reference and command information from the command line. Manpages are supplied with the HP XC system for standard HP XC components, Linux user commands, LSF commands, and other software components that are distributed with the HP XC system.
Manpages for third-party vendor software components may be provided as a part of the deliverables for that component.
Using the discover(8) manpage as an example, you can use either of the following commands to display a manpage:
$ man discover
$ man 8 discover
If you are not sure about a command you need to use, enter the man command with the -k option to obtain a list of commands that are related to the keyword. For example:
$ man -k keyword

Related Information

This section provides pointers to the Web sites for related software products and provides references to useful third-party publications. The location of each Web site or link to a particular topic is subject to change without notice by the site provider.
Related Linux Web Sites
http://www.redhat.com
Home page for Red Hat®, distributors of Red Hat Enterprise Linux Advanced Server, a Linux distribution with which the HP XC operating environment is compatible.
http://www.linux.org/docs/index.html
Home page for the Linux Documentation Project (LDP). This Web site contains guides covering various aspects of working with Linux, from creating your own Linux system from scratch to bash script writing. This site also includes links to Linux HowTo documents, frequently asked questions (FAQs), and manpages.
http://www.linuxheadquarters.com
Web site providing documents and tutorials for the Linux user. Documents contain instructions on installing and using applications for Linux, configuring hardware, and a variety of other topics.
http://linuxvirtualserver.org
Home page for the Linux Virtual Server (LVS), the load balancer running on the Linux operating system that distributes login requests on the XC system.
http://www.gnu.org
Home page for the GNU Project. This site provides online software and information for many programs and utilities that are commonly used on GNU/Linux systems. Online information includes guides for using the bash shell, emacs, make, cc, gdb, and more.
Related MPI Web Sites
http://www.mpi-forum.org
Contains the official MPI standards documents, errata, and archives of the MPI Forum. The MPI Forum is an open group with representatives from many organizations that define and maintain the MPI standard.
http://www-unix.mcs.anl.gov/mpi/
A comprehensive site containing general information, such as the specification and FAQs, and pointers to a variety of other resources, including tutorials, implementations, and other MPI-related sites.
Related Compiler Web Sites
http://www.intel.com/software/products/compilers/index.htm
Web site for Intel® compilers.
http://support.intel.com/support/performancetools/
Web site for general Intel software development information.
http://www.pgroup.com/
Home page for The Portland Group™, supplier of the PGI® compiler.
Additional Publications
For more information about standard Linux system administration or other related software topics, refer to the following documents, which must be purchased separately:
Linux Administration Unleashed, by Thomas Schenk, et al.
Managing NFS and NIS, by Hal Stern, Mike Eisler, and Ricardo Labiaga (O'Reilly)
MySQL, by Paul DuBois
MySQL Cookbook, by Paul DuBois
High Performance MySQL, by Jeremy Zawodny and Derek J. Balling (O'Reilly)
Perl Cookbook, Second Edition, by Tom Christiansen and Nathan Torkington
Perl in A Nutshell: A Desktop Quick Reference, by Ellen Siever, et al.

Typographical Conventions

Italic font
    Italic (slanted) font indicates the name of a variable that you can replace in a command example or information in a display that represents several possible values. Document titles are shown in Italic font. For example: Linux Administration Handbook.

Courier font
    Courier font represents text that is displayed by the computer. Courier font also represents literal items, such as command names, file names, routines, directory names, path names, signals, messages, and programming language structures.

Bold text
    In command and interactive examples, bold text represents the literal text that you enter. For example:
    # cd /opt/hptc/config/sbin
    In text paragraphs, bold text indicates a new term or a term that is defined in the glossary.

$ and #
    In command examples, a dollar sign ($) represents the system prompt for the bash shell and also shows that a user is in non-root mode. A pound sign (#) indicates that the user is in root or superuser mode.

[ ]
    In command syntax and examples, brackets ([ ]) indicate that the contents are optional. If the contents are separated by a pipe character ( | ), you must choose one of the items.

{ }
    In command syntax and examples, braces ({ }) indicate that the contents are required. If the contents are separated by a pipe character ( | ), you must choose one of the items.

...
    In command syntax and examples, horizontal ellipsis points ( ... ) indicate that the preceding element can be repeated as many times as necessary.

.
.
.
    In programming examples, screen displays, and command output, vertical ellipsis points indicate an omission of information that does not alter the meaning or affect the user if it is not shown.

|
    In command syntax and examples, a pipe character ( | ) separates items in a list of choices.

discover(8)
    A cross-reference to a manpage includes the appropriate section number in parentheses. For example, discover(8) indicates that you can find information on the discover command in Section 8 of the manpages.

Ctrl/x
    In interactive command examples, this symbol indicates that you hold down the first named key while pressing the key or button that follows the slash ( / ). When it occurs in the body of text, the action of pressing two or more keys is shown without the box. For example:
    Press Ctrl/x to exit the application.

Enter
    The name of a keyboard key. Enter and Return both refer to the same key.

Note
    A note calls attention to information that is important to understand before continuing.

Caution
    A caution calls attention to important information that if not understood or followed will result in data loss, data corruption, or a system malfunction.

Warning
    A warning calls attention to important information that if not understood or followed will result in personal injury or nonrecoverable system problems.

HP Encourages Your Comments

HP welcomes your comments on this document. Please provide your comments and suggestions at the following URL:
http://docs.hp.com/en/feedback.html

1 Overview of the User Environment

The HP XC system is a collection of computer nodes, networks, storage, and software built into a cluster that work together to present a single system. It is designed to maximize workload and I/O performance, and provide efficient management of large, complex, and dynamic workloads. The HP XC system provides a set of integrated and supported user features, tools, and components, which are described in this chapter.
This chapter briefly describes the components of the HP XC environment. The following topics are covered in this chapter:
System architecture (Section 1.1)
User environment (Section 1.2)
Application development environment (Section 1.3)
Run-time environment (Section 1.4)
Supported tools, compilers, libraries (Section 1.5)

1.1 System Architecture

The HP XC architecture is designed as a clustered system with single system traits. From a user perspective, this architecture achieves a single system view, providing capabilities such as single user login, a single file system namespace, an integrated view of system resources, an integrated program development environment, and an integrated job submission environment.
1.1.1 Operating System
The HP XC system is a high-performance compute cluster that runs HP XC Linux for High Performance Computing Version 1.0 (HPC Linux) as its software base. Any applications that run correctly using Red Hat Enterprise Linux Advanced Server Version 3.0 will also run correctly using HPC Linux.
1.1.2 Node Specialization
The HP XC system is implemented as a sea-of-nodes. Each node in the system contains the same software image on its local disk. There are two physical types of nodes in the system — a head node and client nodes.

head node
    The node that is installed with the HP XC system software first — it is used to generate other HP XC (client) nodes. The head node is generally of interest only to the administrator of the HP XC system.

client nodes
    All the other nodes that make up the system. They are replicated from the head node and are usually given one or more specialized roles to perform various system functions, such as logging into the system or running jobs.
The HP XC system allows for the specialization of client nodes to enable efficient and flexible distribution of the workload. Nodes can be assigned one or more specialized roles that determine how a particular node is used and what system services it provides. Of the many different roles that can be assigned to a client node, the following roles contain services that are of special interest to the general user:

login role
    The role most visible to users is on nodes that have the login role. Nodes with the login role are where you log in and interact with the system to perform various tasks. For example, once logged in to a node with the login role, you can execute commands, build applications, or submit jobs to compute nodes for execution. There can be one or several nodes with the login role in an HP XC system, depending upon cluster size and requirements. Nodes with the login role are a part of the Linux Virtual Server ring, which distributes login requests from users. A node with the login role is referred to as a login node in this manual.

compute role
    The compute role is assigned to nodes where jobs are to be distributed and run. Although all nodes in the HP XC system are capable of carrying out computations, the nodes with the compute role are the primary nodes used to run jobs. Nodes with the compute role become a part of the resource pool used by LSF-HPC and SLURM, which manage and distribute the job workload. Jobs that are submitted to compute nodes must be launched from nodes with the login role. Nodes with the compute role are referred to as compute nodes in this manual.

1.1.3 Storage and I/O

The HP XC system supports both shared (global) and private (local) disks and file systems. Shared file systems can be mounted on all the other nodes by means of Lustre or NFS. This gives users a single view of all the shared data on disks attached to the HP XC system.
SAN Storage
HP XC uses the HP StorageWorks Scalable File Share (HP StorageWorks SFS), which is based on Lustre technology and uses the Lustre File System from Cluster File Systems, Inc. This is a turnkey Lustre system that is delivered and supported by HP. It supplies access to Lustre file systems through Lustre client-server protocols over various system interconnects. The HP XC system is a client to the HP StorageWorks SFS server.
Local Storage
Local storage for each node holds the operating system, a copy of the HP XC system software, and temporary space that can be used by jobs running on the node.
HP XC file systems are described in detail in Section 1.1.4.
1.1.4 File System
Each node of the HP XC system has its own local copy of all the HP XC System Software files, including the Linux distribution, and also has its own local user files. Every node may also import files from NFS or Lustre file servers. HP XC System Software supports NFS 3, including both client and server functionality. HP XC System Software also enables Lustre client services for high-performance and high-availability file I/O. These Lustre client services require the separate installation of Lustre software, provided with the HP StorageWorks Scalable File Share (SFS).
In the case of NFS files, these can be shared purely between the nodes of the HP XC System, or alternatively can be shared between the HP XC and external systems. External NFS files can be shared with any node having a direct external network connection. It is also possible to set up NFS to import external files to HP XC nodes without external network connections, by routing through a node with an external network connection. Your system administrator can
choose to use either the HP XC Administrative Network or the XC system interconnect for NFS operations. The HP XC system interconnect can potentially offer higher performance, but only at the potential expense of the performance of application communications.
For high-performance or high-availability file I/O, the Lustre file system is available on HP XC. The Lustre file system uses POSIX-compliant syntax and semantics. The HP XC System Software includes kernel modifications required for Lustre client services, which enable the operation of the separately installable Lustre client software. The Lustre file server product used on HP XC is the HP StorageWorks Scalable File Share (SFS), which fully supports the HP XC.
The SFS includes HP XC Lustre client software. The SFS can be integrated with the HP XC so that Lustre I/O is performed over the same high-speed system interconnect fabric used by the HP XC. So, for example, if the HP XC system interconnect is based on a Quadrics QsNet II switch, then the SFS will serve files over ports on that switch. The file operations are able to proceed at the full bandwidth of the HP XC system interconnect because these operations are implemented directly over the low-level communications libraries. Further optimizations of file I/O can be achieved at the application level using special file system commands – implemented as ioctls – which allow a program to interrogate the attributes of the file system, modify the stripe size and other attributes of new (zero-length) files, and so on. Some of these optimizations are implicit in the HP-MPI I/O library, which implements the MPI-2 file I/O standard.
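As an illustrative sketch only (not taken from the HP XC documentation), stripe attributes can typically be examined and set from the shell with the standard Lustre lfs utility; the mount point and values below are hypothetical, and the option syntax varies by Lustre release (older releases take positional arguments, newer releases use -s/-c/-i flags):

$ lfs getstripe /mnt/lustre/datafile
        # report the stripe size, stripe count, and OST layout of an existing file
$ lfs setstripe /mnt/lustre/newfile 1048576 -1 4
        # create a zero-length file with a 1 MB stripe size, a server-chosen
        # starting OST (-1), and a stripe count of 4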
File System Layout
In an HP XC system, the basic file system layout is the same as that of the Red Hat Advanced Server 3.0 Linux file system.
The HP XC file system is structured to separate cluster-specific files, base operating system files, and user-installed software files. This allows for flexibility and ease of potential upgrades of the system software, as well as keeping software from conflicting with user-installed software. Files are segregated into the following types and locations:
HP X C-specific software is located in /opt/hptc
HP XC configuration data is located in /opt/hptc/etc
Clusterwide directory structure (file system) is located in /hptc_cluster
You should be aware of the following information about the HP XC file system layout:
• Open source software that by default would be installed under the /usr/local directory is instead installed in the /opt/hptc directory.
• Software installed in the /opt/hptc directory is not intended to be updated by users.
• Software packages are installed in directories under the /opt/hptc directory under their own names. The exception to this is 3rd-party software, which usually goes in /opt/r.
• There are four directories under the /opt/hptc directory that contain symbolic links to files included in the packages:
  - /opt/hptc/bin
  - /opt/hptc/sbin
  - /opt/hptc/lib
  - /opt/hptc/man
  Each package directory should have a directory corresponding to each of these directories where every file has a symbolic link created in the /opt/hptc/ directory.
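For example, a user can inspect this layout from a login node; the package directories that appear will vary by installation:

$ ls /opt/hptc         # one directory per installed package, plus bin, sbin, lib, man
$ ls -l /opt/hptc/bin  # entries here are symbolic links into the package directories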
1.1.5 System Interconnect Network
The HP XC system interconnect provides high-speed connectivity for parallel applications. The system interconnect network provides a high-speed communications path used primarily for user file service and for communications within user applications that are distributed among
nodes of the system. The system interconnect network is a private network within the HP XC. Typically, every node in the HP XC is connected to the system interconnect.
The HP XC system interconnect can be based on Gigabit Ethernet, Myrinet, Quadrics QsNet II, or InfiniBand switches, depending on the cluster platform. The types of system interconnects that are used on HP XC systems are:
• Myricom Myrinet on HP Cluster Platform 4000 (ProLiant/Opteron servers), also referred to as XC4000 in this manual
• Quadrics QsNet II on HP Cluster Platform 6000 (Integrity servers), also referred to as XC6000 in this manual
• Gigabit Ethernet on both XC4000 and XC6000
• InfiniBand on XC4000
1.1.6 Network Address Translation (NAT)
The HP XC system uses Network Address Translation (NAT) to allow nodes in the HP XC system that do not have direct external network connections to open outbound network connections to external network resources.

1.2 User Environment

This section introduces basic information about logging in, configuring, and using the HP XC environment.
1.2.1 LVS
The HP XC system uses the Linux Virtual Server (LVS) to present a single host name for user logins. LVS is a highly scalable virtual server built on a system of real servers. By using LVS, the architecture of the HP XC system is transparent to end users, and they see only a single virtual server. This eliminates the need for users to know how the system is configured in order to successfully log in and use the system. Any changes in the system configuration are transparent to end users. LVS also provides load balancing across login nodes, which distributes login requests to different servers.
1.2.2 Modules
The HP XC system provides the Modules Package (not to be confused with Linux kernel modules) to configure and modify the user environment. The Modules Package enables dynamic modification of a user's environment by means of modulefiles. Modulefiles provide a convenient means for users to tailor their working environment as necessary. One of the key features of modules is to allow multiple versions of the same software to be used in a controlled manner.
A modulefile contains information to configure the shell for an application. Typically, a modulefile contains instructions that alter or set shell environment variables, such as PATH and MANPATH, to enable access to various installed software. Modulefiles may be shared by many users on a system, and users may have their own collection to supplement or replace the shared modulefiles.
Modulefiles can be loaded into your environment automatically when you log in to the system, or at any time you need to alter the environment. The HP XC system does not preload modulefiles.
1.2.3 Commands
The HP XC user environment includes standard Linux commands, LSF commands, SLURM commands, HP-MPI commands, and Modules commands. This section provides a brief overview of these command sets.
1.2.3.1 Linux Commands
The HP XC system supports the use of standard Linux user commands and tools. Standard Linux commands are not described in this document. You can access descriptions of Linux commands in the Linux documentation and manpages. Linux manpages are available by invoking the Linux man command with the Linux command name.
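For example, to view the manpage for the Linux ls command:
$ man ls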
1.2.3.2 LSF Commands
HP XC supports LSF-HPC and the use of standard LSF commands, some of which operate differently in the HP XC environment from standard LSF behavior. The use of LSF commands in the HP XC environment is described in Chapter 7 and in the HP XC lsf_diff manpage. Information about standard LSF commands is available in the Platform Computing Corporation LSF documentation and in the LSF manpages. For your convenience, the HP XC documentation CD contains XC LSF manuals from Platform Computing. LSF manpages are available on the HP XC system.
1.2.3.3 SLURM Commands
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource management and job scheduling, and supports the use of standard SLURM commands. SLURM functionality is described in Chapter 6. Descriptions of SLURM commands are available in the SLURM manpages by invoking the man command with the SLURM command name.
1.2.3.4 HP-MPI Commands
HP XC supports the HP-MPI software and the use of standard HP-MPI commands. Descriptions of HP-MPI commands are available in the HP-MPI documentation, which is supplied with the HP XC system software. HP-MPI manpages are also available by invoking the man command with the HP-MPI command name. HP-MPI functionality is described in Chapter 8.
1.2.3.5 Modules Commands
The HP XC system supports the use of standard Modules commands to load and unload modulefiles that are used to configure and modify the user environment. Modules commands are described in Section 2.2.

1.3 Application Development Environment

The HP XC system provides an environment that enables developing, building, and running applications using multiple nodes with multiple processors. These applications can range from parallel applications using many processors to serial applications using a single processor.
1.3.1 Parallel Applications
The HP XC parallel application development environment enables parallel application processes to be started and stopped together on a large number of application processors, and provides the I/O and process control structures needed to manage these kinds of applications.
Full details and examples of how to build, run, debug, and troubleshoot parallel applications are provided in Section 3.7.
1.3.2 Serial Applications
The HP XC serial application development environment supports building and running serial applications. A serial application is a command or application that does not use any form of parallelism.
Full details and examples of how to build, run, debug, and troubleshoot serial applications are provided in Section 3.6.2.

1.4 Run-Time Environment

In the HP XC environment, LSF-HPC, SLURM, and HP-MPI work together to provide a powerful, flexible, and extensive run-time environment. This section describes LSF-HPC, SLURM, and HP-MPI, and how these components work together to provide the HP XC run-time environment.
1.4.1 SLURM
SLURM (Simple Linux Utility for Resource Management) is a resource management system that is integrated into the HP XC system. SLURM is suitable for use on both large and small Linux clusters. It was developed by Lawrence Livermore National Laboratory and Linux NetworX. As a resource manager, SLURM allocates exclusive or non-exclusive access to resources (application/compute nodes) for users to perform work, and provides a framework to start, execute, and monitor work (normally a parallel job) on the set of allocated nodes. A SLURM system consists of two daemons, one configuration file, and a set of commands and APIs. The central controller daemon, slurmctld, maintains the global state and directs operations. A slurmd daemon is deployed to each compute node and responds to job-related requests, such as launching jobs, signaling, and terminating jobs. End users and system software (such as LSF-HPC) communicate with SLURM by means of commands or APIs, for example, to allocate resources, launch parallel jobs on allocated resources, and kill running jobs.
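For example, you can invoke the SLURM srun command to launch a command across a set of compute nodes; the node count here is illustrative:
$ srun -N 4 hostname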
SLURM groups compute nodes (the nodes where jobs are run) together into partitions. The HP XC system can have one or several partitions. When HP XC is installed, a single partition of compute nodes is created by default for LSF batch jobs. The system administrator has the option of creating additional partitions. For example, another partition could be created for interactive jobs.
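For example, the SLURM sinfo command displays the configured partitions. The output shown here is illustrative only; partition names and node lists depend on how your system was configured:
$ sinfo
PARTITION AVAIL  TIMELIMIT NODES  STATE NODELIST
lsf          up   infinite    16   idle n[1-16]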
1.4.2 Load Sharing Facility (LSF-HPC)
The Load Sharing Facility for High Performance Computing (LSF-HPC) from Platform Computing Corporation is a batch system resource manager that has been integrated with SLURM for use on the HP XC system. LSF-HPC for SLURM is included with the HP XC System Software, and is an integral part of the HP XC environment. LSF-HPC interacts with SLURM to obtain and allocate available resources, and to launch and control all the jobs submitted to LSF-HPC. LSF-HPC accepts, queues, schedules, dispatches, and controls all the batch jobs that users submit, according to policies and configurations established by the HP XC site administrator. On an HP XC system, LSF-HPC for SLURM is installed and runs on one HP XC node, known as the LSF-HPC execution host.
A complete description of LSF-HPC is provided in Chapter 7. In addition, for your convenience, the HP XC documentation CD contains LSF Version 6.0 manuals from Platform Computing.
1.4.3 How LSF-HPC and SLURM Interact
In the HP XC environment, LSF-HPC cooperates with SLURM to combine LSF-HPC's powerful scheduling functionality with SLURM's scalable parallel job-launching capabilities. LSF-HPC acts primarily as a workload scheduler on top of the SLURM system, providing policy-based and topology-based scheduling for end users. SLURM provides an execution and monitoring layer for LSF-HPC. LSF-HPC uses SLURM to detect system topology information, make scheduling decisions, and launch jobs on allocated resources.
When a job is submitted to LSF-HPC, LSF-HPC schedules the job based on job resource requirements and communicates with SLURM to allocate the required HP XC compute nodes for the job from the SLURM lsf partition. LSF-HPC provides node-level scheduling for parallel jobs, and CPU-level scheduling for serial jobs. Because of node-level scheduling, a parallel job may be allocated more CPUs than it requested, depending on its resource request; the srun or mpirun -srun launch commands within the job still honor the original CPU
request. LSF-HPC always tries to pack multiple serial jobs on the same node, with one CPU per job. Parallel jobs and serial jobs cannot coexist on the same node.
After the LSF-HPC scheduler allocates the SLURM resources for a job, the SLURM allocation information is recorded with the job. You can view this information with the bjobs and bhist commands.
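For example, to display detailed job information, including the recorded SLURM allocation, for a job with an illustrative job ID of 123:
$ bjobs -l 123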
When LSF-HPC starts a job, it sets the SLURM_JOBID and SLURM_NPROCS environment variables in the job environment. SLURM_JOBID associates the LSF-HPC job with SLURM's allocated resources. The SLURM_NPROCS environment variable is set to the originally requested number of processors. LSF-HPC dispatches the job from the LSF-HPC execution host, which is the same node on which the LSF-HPC daemons run. The LSF-HPC JOB_STARTER script, which is configured for all queues, uses the srun command to launch the user job on the first node in the allocation. Your job can contain additional srun or mpirun commands to launch tasks on all nodes in the allocation.
While a job is running, all LSF-HPC-supported resource limits are enforced, including core limit, CPU time limit, data limit, file size limit, memory limit, and stack limit. When you kill a job, LSF-HPC uses the SLURM scancel command to propagate the signal to the entire job.
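For example, killing a job with the LSF bkill command signals all of the job's processes on the allocated nodes; the job ID is illustrative:
$ bkill 123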
After a job finishes, LSF-HPC releases all allocated resources.
A detailed description, along with an example and illustration, of how LSF-HPC and SLURM cooperate to launch and manage jobs is provided in Section 7.1.4. It is highly recommended that you review this information.
1.4.4 HP-MPI
HP-MPI is a high-performance implementation of the Message Passing Interface standard and is included with the HP XC system. HP-MPI uses SLURM to launch jobs on an HP XC system; however, it manages the global MPI exchange so that all processes can communicate with each other.
HP-MPI complies fully with the MPI-1.2 standard. HP-MPI also complies with the MPI-2 standard, with some restrictions. HP-MPI provides an application programming interface and software libraries to support parallel, message-passing applications that are efficient, portable, and flexible. HP-MPI Version 2.1 is included in this release of HP XC.
HP-MPI 2.1 for HP XC is supported on XC4000 and XC6000 clusters, and includes support for the following system interconnects:
• XC4000 clusters: Myrinet, Gigabit Ethernet, TCP/IP, InfiniBand
• XC6000 clusters: Quadrics Elan4, Gigabit Ethernet, TCP/IP
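For example, an MPI application built with HP-MPI can be launched across the nodes of a SLURM allocation with the mpirun -srun option (described in Chapter 8); the process count and executable name here are illustrative:
$ mpirun -srun -n 8 ./a.out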

1.5 Components, Tools, Compilers, Libraries, and Debuggers

This section provides a brief overview of some of the common tools, compilers, libraries, and debuggers supported for use on HP XC.
An HP XC system is integrated with several open source software components. HP XC incorporates the Linux operating system and its standard commands and tools, and does not diminish the Linux ABI in any way. In addition, HP XC incorporates LSF and SLURM to launch and manage jobs, and includes HP-MPI for high-performance, parallel, message-passing applications, and the HP MLIB math library for intensive computations.
Most standard open source compilers and tools can be used on an HP XC system; commercially available compilers and tools, however, must be purchased separately. Several open source and commercially available software packages have been tested with the HP XC Software. The following list shows some of the software packages that have been tested for use with HP XC. This list provides an example of what is available on HP XC and is not intended as a complete list. Note that some of the packages listed are included as part of the HP XC Linux distribution and as such are
supported as part of the HP XC. The tested software packages include, but are not limited to, the following:
• Intel Fortran 95, C, C++ Compiler Versions 7.1 and 8.0, including OpenMP, for Itanium (includes the idb debugger)
• gcc Version 3.2.3 (included in the HP XC distribution)
• g77 Version 3.2.3 (included in the HP XC distribution)
• Portland Group PGI Fortran90, C, C++ Version 5.1, including OpenMP, for XC4000
• Quadrics SHMEM, as part of the QsNet II user libraries, on Itanium systems connected with the Quadrics QsNet II switch (included in the HP XC distribution)
• Etnus TotalView debugger Version 6.4
• gdb (part of the HP XC Linux distribution)
• Intel MKL Version 6.0 on Itanium
• AMD Core Math Library (ACML) Version 2.0 on XC4000
• valgrind 2.0.0 (http://valgrind.kde.org), in 32-bit mode only
• oprofile 0.7.1 (http://oprofile.sourceforge.net)
• PAPI 3.2 (http://icl.cs.utk.edu/papi)
• Intel Visual Analyzer/Tracer (formerly the Pallas Vampir and Vampirtrace performance analyzer) on Itanium
• GNU make, including distributed parallel make (included in the HP XC distribution)
Other standard tools and libraries are available and can most likely be used on HP XC as they would be on any other standard Linux system. Note, however, that software that is not described in the HP XC documentation may not have been tested with HP XC and may not function in a standard manner.
2 Using the System

This chapter describes tasks and commands that the general user must know to use the system. It contains the following topics:
• Logging in to the system (Section 2.1)
• Setting up the user environment (Section 2.2)
• Launching and managing jobs (Section 2.3)
• Performing some common user tasks (Section 2.4)
• Getting help (Section 2.5)

2.1 Logging in to the System

Logging in to an HP XC system is similar to logging in to any standard Linux system. Logins are performed on nodes that have the login role. Secure Shell (ssh) is the preferred method for accessing the HP XC system.
2.1.1 LVS Login Routing
The HP XC system uses the Linux Virtual Server (LVS) facility to present a set of login nodes with a single cluster name. When you log in to the system, LVS automatically routes your login request to an available login node on the system. LVS load balances login sessions across the login nodes and improves the availability of login access. When you log in to the HP XC system, you do not have to know specific node names to log in, only the HP XC system's cluster name.
2.1.2 Using ssh to Log In
To log in to an HP XC system, you must use Secure Shell (ssh). Typically, you access the HP XC system using the ssh command to get a login shell or to execute commands. For example:
$ ssh user-name@system-name
user-name@system-name's password:
The ssh service also allows file transfer using the scp or sftp commands over the same port as ssh.
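For example, to copy a local file to your home directory on the HP XC system (the file name is illustrative):
$ scp myfile.c user-name@system-name: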
The typical r* UNIX commands, such as rlogin, rsh, and rcp, are not installed on an HP XC system by default because of their inherent insecurity. The ssh command transfers all login and password information in an encrypted form instead of the plaintext form used by the r* UNIX commands (as well as telnet and ftp).
If you want to use ssh without password prompting, you must set up ssh authentication keys. Refer to the ssh(1) manpage for information about using ssh authentication keys.
ssh is further discussed in Section 10.1.
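For example, one common way to set up authentication keys is to generate a key pair and append the public key to the authorized_keys file on the remote system. This is a sketch only; refer to the ssh-keygen(1) manpage for details, and note that your site may have its own key-management policy:
$ ssh-keygen -t rsa
$ cat ~/.ssh/id_rsa.pub | ssh user-name@system-name 'cat >> ~/.ssh/authorized_keys'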

2.2 Configuring Your Environment with Modulefiles

The HP XC system supports the use of Modules software to make it easier to configure and modify your environment. Modules software enables dynamic modification of your environment by the use of modulefiles. A modulefile contains information to configure the shell for an application. Typically, a modulefile contains instructions that alter or set shell
environment variables, such as PATH and MANPATH, to enable access to various installed software.
One of the key features of using modules is to allow multiple versions of the same software to be used in your environment in a controlled manner. For example, two different versions of the Intel C compiler can be installed on the system at the same time; the version used is based upon which Intel C compiler modulefile is loaded.
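For example, using the modulefiles listed in Table 2-1, you can select which installed Intel compiler version configures your environment:
$ module load intel/7.1
To use the other version instead, you would load intel/8.0 in its place.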
The HP XC software provides a number of modulefiles. You can also create your own modulefiles. Modulefiles may be shared by many users on a system, and users may have their own collection of modulefiles to supplement or replace the shared modulefiles.
The following topics are addressed in the corresponding sections:
• Section 2.2.1 provides additional information on modulefiles.
• Section 2.2.2 discusses what modules are supplied.
• Section 2.2.3 discusses what modules are loaded by default.
• Section 2.2.4 discusses how to determine what modules are available.
• Section 2.2.5 discusses how to determine which modules are loaded.
• Section 2.2.6 discusses how to load a module.
• Section 2.2.7 discusses how to unload a module.
• Section 2.2.8 discusses module conflicts.
• Section 2.2.9 discusses creating your own module.
For further information about the Modules software supplied with the HP XC system, see the Modules Web site at the following URL:
http://sourceforge.net/projects/modules/
2.2.1 Notes on Modulefiles
A modulefile does not provide configuration of your environment until it is explicitly loaded. That is, the specific modulefile for a software product or application must be loaded in your environment (with the module load command) before the configuration information in the modulefile is effective.
You or your system administrator can configure your environment so that any desired modulefiles are automatically loaded for you when you log in to the system. You can also load a modulefile yourself, as described in Section 2.2.6.
The Modules software is initialized when you log in to the HP XC system. It provides access to the commands that allow you to display information about modulefiles, load or unload modulefiles, or view a list of available modulefiles.
Modulefiles do not affect packages other than their intended package. For example, a modulefile for a compiler will not adjust MPI_CC (the environment variable used by HP-MPI to control which compiler to use). A modulefile for a compiler simply makes it easier to access that particular compiler; it does not try to determine how the compiler will be used.
Similarly, a modulefile for HP-MPI will not try to adjust LD_LIBRARY_PATH to correspond to the compiler that the mpicc command uses. The modulefile for MPI simply makes it easier to access the mpi* scripts and libraries. You can specify the compiler it uses through a variety of mechanisms long after the modulefile is loaded.
The previous scenarios were chosen in particular because the HP-MPI mpicc command uses heuristics to try to find a suitable compiler when MPI_CC or other default-overriding mechanisms are not in effect. It is possible that mpicc will choose a compiler inconsistent with the most recently loaded compiler module. This could cause inconsistencies in the use
of shared objects. If you have multiple compilers (perhaps with incompatible shared objects) installed, it is probably wise to set MPI_CC (and others) explicitly to the commands made available by the compiler’s modulefile.
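For example, after loading a compiler modulefile, you might set the HP-MPI compiler environment variables explicitly (bash syntax; the compiler commands shown are those made available by the Intel modulefiles and are illustrative):
$ module load intel/8.0
$ export MPI_CC=icc
$ export MPI_F90=ifort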
The contents of the modulefiles in the modulefiles_hptc RPM use the vendor-intended location of the installed software. In many cases, this is under the /opt directory, but in a few cases (for example, the PGI compilers and TotalView) this is under the /usr directory.
If you install a software package in a location other than its intended place, you must create or edit an appropriate modulefile under the /opt/modules/modulefiles directory.
For the packages that install by default into the /usr directory (currently the PGI compilers and TotalView), their corresponding modulefiles first try their vendor-intended location under the /usr directory. If they do not find that directory, the modulefiles also search under the /opt directory. Therefore, no changes to the modulefiles are needed if you install third-party software consistently as the vendor intended or consistently under the /opt directory.
If the package is the stable product intended to be used by the site, editing an existing modulefile is appropriate. While each modulefile has its unique characteristics, they all set some variables describing the top-level directory, and editing to adjust the string should be sufficient. You may need to repeat the adjustment if you update the modulefiles_hptc RPM or otherwise rebuild your system.
If the package is a variant, for example, a beta version of a compiler, first copy the default modulefile to a well-named copy, then edit the copy. You need root access to modify the modulefiles, which is generally needed to install packages in either the /opt or /usr directories.
If a user downloads a package into a private directory, the user can create a private modulefiles directory. The user can then copy the corresponding default modulefile from under the /opt/modules/modulefiles directory into a private modulefiles directory, edit the file, and then register the directory with the module use command.
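For example, the following sketch copies a supplied modulefile into a private directory, edits it, and registers that directory with the Modules software; the directory and modulefile names are illustrative:
$ mkdir ~/private_modules
$ cp /opt/modules/modulefiles/pgi/5.1 ~/private_modules/pgi-beta
$ vi ~/private_modules/pgi-beta
$ module use ~/private_modules
$ module load pgi-beta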
2.2.2 Supplied Modulefiles
The HP XC system provides the Modules Package (not to be confused with Linux kernel modules) to configure and modify the user environment. The Modules Package enables dynamic modification of a user's environment by means of modulefiles.
A modulefile contains information that alters or sets shell environment variables, such as PATH and MANPATH. Modulefiles provide a convenient means for users to tailor their working environment. Modulefiles can be loaded automatically when the user logs in to the system, or at any time a user needs to alter the environment.
The HP XC System Software provides a number of modulefiles. In addition, users can also create and load their own modulefiles to modify their environment further.
The HP XC system supplies the modulefiles listed in Table 2-1.

Table 2-1: Supplied Modulefiles

Modulefile          Sets the HP XC User Environment:
icc/8.0             To use Intel C/C++ Version 8.0 compilers.
icc/8.1             To use Intel C/C++ Version 8.1 compilers.
ifort/8.0           To use Intel Fortran Version 8.0 compilers.
ifort/8.1           To use Intel Fortran Version 8.1 compilers.
intel/7.1           For Intel Version 7.1 compilers.
intel/8.0           For Intel Version 8.0 compilers.
intel/8.1           For Intel Version 8.1 compilers.
mlib/intel/7.1      For MLIB and Intel Version 7.1 compilers.
mlib/intel/8.0      For MLIB and Intel Version 8.0 compilers.
mlib/pgi/5.1        For MLIB and PGI Version 5.1 compilers.
mpi/hp              For HP-MPI.
pgi/5.1             For PGI Version 5.1 compilers.
pgi/5.2             For PGI Version 5.2 compilers.
idb/7.3             To use the Intel IDB debugger.
totalview/default   For the TotalView debugger.
2.2.3 Modulefiles Automatically Loaded on the System
The HP XC system does not load any modulefiles into your environment by default. However, there may be modulefiles designated by your system administrator that are automatically loaded. Section 2.2.5 describes how you can determine what modulefiles are currently loaded on your system.
Users can also automatically load their own modules by creating a login script and designating the modulefiles to be loaded in the script. Users can also add or remove modules from their current environment on a per-module basis as described in Section 2.2.6.
2.2.4 Viewing Available Modulefiles
Available modulefiles are modulefiles that have been provided with the HP XC system software and are available for you to load. A modulefile must be loaded before it provides changes to your environment, as described in the introduction to this section. You can view the modulefiles that are available on the system by issuing the module avail command:
$ module avail
2.2.5 Viewing Loaded Modulefiles
A loaded modulefile is a modulefile that has been explicitly loaded in your environment by the module load command. To view the modulefiles that are currently loaded in your environment, issue the module list command:
$ module list
2.2.6 Loading a Modulefile
You can load a modulefile into your environment to enable easier access to software that you want to use by executing the module load command. You can load a modulefile for the current session, or you can set up your environment to load the modulefile whenever you log in to the system.
When loading a modulefile, note that certain modulefiles cannot be loaded while other
modulefiles are currently loaded. For example, this can happen with different versions of the same software. If a modulefile you are attempting to load conflicts with a currently-loaded modulefile, the modulefile will not be loaded and an error message will be displayed.
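For example, attempting to load a second version of the Intel compiler suite while another version is loaded fails with an error message similar to the following; the output shown is illustrative:
$ module load intel/8.0
intel/8.0(3):ERROR:150: Module 'intel/8.0' conflicts with the currently loaded module(s) 'intel/7.1'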