Compaq SC RMS User Manual

Compaq AlphaServer SC RMS Reference Manual
Quadrics Supercomputers World Ltd. Document Version 7 - June 22nd 2001 - AA-RLAZB-TE
The information supplied in this document is believed to be correct at the time of publication, but no liability is assumed for its use or for the infringements of the rights of others resulting from its use. No license or other rights are granted in respect of any rights owned by any of the organizations mentioned herein.
This document may not be copied, in whole or in part, without the prior written consent of Quadrics Supercomputers World Ltd.
Copyright 1998,1999,2000,2001 Quadrics Supercomputers World Ltd.
The specifications listed in this document are subject to change without notice.
Compaq, the Compaq logo, Alpha, AlphaServer, and Tru64 are trademarks of Compaq Information Technologies Group, L.P. in the United States and other countries.
UNIX is a registered trademark of The Open Group in the U.S. and other countries.
TotalView and Etnus are registered trademarks of Etnus LLC.
All other product names mentioned herein may be trademarks of their respective companies.
The Quadrics Supercomputers World Ltd. (Quadrics) web site can be found at:
http://www.quadrics.com/
Quadrics’ address is:
One Bridewell Street, Bristol, BS1 2AA, UK
Tel: +44-(0)117-9075375 Fax: +44-(0)117-9075395
Circulation Control: None
Document Revision History
Revision  Date          Author  Remarks
1         January 1999  HRA     Initial Draft
2         Feb 2000      DR      Updated Draft
3         Apr 2000      DR      Draft changes for Product Release
4         Jun 2000      RMC     Corrections for Product Release
5         Jan 2001      HRA     Updates for Version 2
6         June 2001     DR      Further Version 2 changes
7         June 2001     DR      AlphaServer SC V2 Product Release

Contents

1 Introduction 1-1
1.1 Scope of Manual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.2 Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.3 Using this Manual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.4 Related Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
1.5 Location of Online Documentation . . . . . . . . . . . . . . . . . . . 1-3
1.6 Reader’s Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
1.7 Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
2 Overview of RMS 2-1
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
2.2 The System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
2.2.1 Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-1
2.3 The Role of the RMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-3
2.3.1 The Structure of the RMS . . . . . . . . . . . . . . . . . . . . . . 2-4
2.3.2 The RMS Daemons . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
2.3.3 The RMS Commands . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
2.3.4 The RMS Database . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
2.4 RMS Management Functions . . . . . . . . . . . . . . . . . . . . . . 2-7
2.4.1 Allocating Resources . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
2.4.2 Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
2.4.3 Access Control and Accounting . . . . . . . . . . . . . . . . . . . 2-9
2.4.4 RMS Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
3 Parallel Programs Under RMS 3-1
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
3.2 Resource Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
3.3 Loading and Running Programs . . . . . . . . . . . . . . . . . . . . . 3-3
4 RMS Daemons 4-1
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-1
4.1.1 Startup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
4.1.2 Log Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
4.1.3 Daemon Status . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
4.2 The Database Manager . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
4.3 The Machine Manager . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
4.3.1 Interaction with the Database . . . . . . . . . . . . . . . . . . . 4-3
4.4 The Partition Manager . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
4.4.1 Partition Startup . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
4.4.2 Interaction with the Database . . . . . . . . . . . . . . . . . . . 4-4
4.5 The Switch Network Manager . . . . . . . . . . . . . . . . . . . . . . 4-5
4.5.1 Interaction with the Database . . . . . . . . . . . . . . . . . . . 4-5
4.6 The Transaction Log Manager . . . . . . . . . . . . . . . . . . . . . . 4-5
4.6.1 Interaction with the Database . . . . . . . . . . . . . . . . . . . 4-6
4.7 The Event Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
4.7.1 Interaction with the Database . . . . . . . . . . . . . . . . . . . 4-6
4.8 The Process Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
4.8.1 Interaction with the Database . . . . . . . . . . . . . . . . . . . 4-7
4.9 The RMS Daemon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
4.9.1 Interaction with the Database . . . . . . . . . . . . . . . . . . . 4-8
5 RMS Commands 5-1
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-1
allocate(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-3
nodestatus(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-8
msqladmin(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-9
prun(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11
rcontrol(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-20
rinfo(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-32
rmsbuild(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-35
rmsctl(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-37
rmsexec(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-39
rmshost(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-41
rmsquery(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-42
rmstbladm(1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-44
6 Access Control, Usage Limits and Accounting 6-1
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
6.2 Users and Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
6.3 Access Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.3.1 Access Controls Example . . . . . . . . . . . . . . . . . . . . . . 6-3
6.4 How Access Controls are Applied . . . . . . . . . . . . . . . . . . . . 6-4
6.4.1 Memory Limit Rules . . . . . . . . . . . . . . . . . . . . . . . . . 6-4
6.4.2 Priority Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.4.3 CPU Usage Limit Rules . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.5 Accounting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
7 RMS Scheduling 7-1
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1
7.2 Scheduling Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-1
7.3 Scheduling Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 7-2
7.4 What Happens When a Request is Received . . . . . . . . . . . . . . 7-3
7.4.1 Memory Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5
7.4.2 Swap Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5
7.4.3 Time Slicing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6
7.4.4 Suspend and Resume . . . . . . . . . . . . . . . . . . . . . . . . 7-6
7.4.5 Idle Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6
8 Event Handling 8-1
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-1
8.1.1 Posting Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
8.1.2 Waiting on Events . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
8.2 Event Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-3
8.3 List of Events Generated . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
8.3.1 Extending the RMS Event Handling Mechanism . . . . . . . . 8-6
9 Setting up RMS 9-1
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1
9.2 Installation Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1
9.2.1 Node Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9.3 Setting up RMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9.3.1 Starting RMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9.3.2 Initial Setup with One Partition . . . . . . . . . . . . . . . . . . 9-3
9.3.3 Simple Day/Night Setup . . . . . . . . . . . . . . . . . . . . . . . 9-4
9.4 Day-to-Day Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
9.4.1 Periodic Shift Changes . . . . . . . . . . . . . . . . . . . . . . . . 9-5
9.4.2 Backing Up the Database . . . . . . . . . . . . . . . . . . . . . . 9-5
9.4.3 Summarizing Accounting Data . . . . . . . . . . . . . . . . . . . 9-6
9.4.4 Archiving Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6
9.4.5 Database Maintenance . . . . . . . . . . . . . . . . . . . . . . . . 9-7
9.4.6 Configuring Nodes Out . . . . . . . . . . . . . . . . . . . . . . . 9-9
9.5 Local Customization of RMS . . . . . . . . . . . . . . . . . . . . . . . 9-10
9.5.1 Partition Startup . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-10
9.5.2 Core File Handling . . . . . . . . . . . . . . . . . . . . . . . . . . 9-10
9.5.3 Event Handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-11
9.5.4 Switch Manager Configuration . . . . . . . . . . . . . . . . . . . 9-11
9.6 Log Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-12
10 The RMS Database 10-1
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-1
10.1.1 General Information about the Tables . . . . . . . . . . . . . . . 10-1
10.1.2 Access to the Database . . . . . . . . . . . . . . . . . . . . . . . . 10-2
10.1.3 Categories of Table . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
10.2 Listing of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4
10.2.1 The Access Controls Table . . . . . . . . . . . . . . . . . . . . . . 10-4
10.2.2 The Accounting Statistics Table . . . . . . . . . . . . . . . . . . 10-4
10.2.3 The Attributes Table . . . . . . . . . . . . . . . . . . . . . . . . . 10-6
10.2.4 The Elans Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-8
10.2.5 The Elites Table . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-9
10.2.6 The Events Table . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-9
10.2.7 The Event Handlers Table . . . . . . . . . . . . . . . . . . . . . . 10-10
10.2.8 The Fields Table . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-11
10.2.9 The Installed Components Table . . . . . . . . . . . . . . . . . . 10-12
10.2.10 The Jobs Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-12
10.2.11 The Link Errors Table . . . . . . . . . . . . . . . . . . . . . . . . 10-13
10.2.12 The Modules Table . . . . . . . . . . . . . . . . . . . . . . . . . . 10-14
10.2.13 The Module Types Table . . . . . . . . . . . . . . . . . . . . . . . 10-15
10.2.14 The Nodes Table . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-15
10.2.15 The Node Statistics Table . . . . . . . . . . . . . . . . . . . . . . 10-16
10.2.16 The Partitions Table . . . . . . . . . . . . . . . . . . . . . . . . . 10-17
10.2.17 The Projects Table . . . . . . . . . . . . . . . . . . . . . . . . . . 10-19
10.2.18 The Resources Table . . . . . . . . . . . . . . . . . . . . . . . . . 10-19
10.2.19 The Servers Table . . . . . . . . . . . . . . . . . . . . . . . . . . 10-20
10.2.20 The Services Table . . . . . . . . . . . . . . . . . . . . . . . . . . 10-21
10.2.21 The Software Products Table . . . . . . . . . . . . . . . . . . . . 10-22
10.2.22 The Switch Boards Table . . . . . . . . . . . . . . . . . . . . . . 10-23
10.2.23 The Transactions Table . . . . . . . . . . . . . . . . . . . . . . . 10-23
10.2.24 The Users Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-24
A Compaq AlphaServer SC Interconnect Terms A-1
A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1
A.2 Link States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-4
A.3 Link Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-4
B RMS Status Values B-1
B.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
B.2 Generic Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
B.3 Job Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
B.4 Link Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
B.5 Module Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
B.6 Node Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4
B.7 Partition Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . B-5
B.8 Resource Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . B-5
B.9 Transaction Status Values . . . . . . . . . . . . . . . . . . . . . . . . B-6
C RMS Kernel Module C-1
C.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-1
C.2 Capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-1
C.3 System Call Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . C-2
rms_setcorepath(3) . . . . . . . . . . . . . . . . . . . . . . . . . . C-3
rms_getcorepath(3) . . . . . . . . . . . . . . . . . . . . . . . . . . C-3
rms_prgcreate(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . C-4
rms_prgdestroy(3) . . . . . . . . . . . . . . . . . . . . . . . . . . C-4
rms_prgids(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-6
rms_prginfo(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-6
rms_getprgid(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-6
rms_prgsuspend(3) . . . . . . . . . . . . . . . . . . . . . . . . . . C-8
rms_prgresume(3) . . . . . . . . . . . . . . . . . . . . . . . . . . C-8
rms_prgsignal(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . C-8
rms_prgaddcap(3) . . . . . . . . . . . . . . . . . . . . . . . . . . C-10
rms_setcap(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-10
rms_ncaps(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-12
rms_getcap(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C-12
rms_prggetstats(3) . . . . . . . . . . . . . . . . . . . . . . . . . . C-13
D RMS Application Interface D-1
D.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-1
rms_allocateResource(3) . . . . . . . . . . . . . . . . . . . . . . . D-2
rms_deallocateResource(3) . . . . . . . . . . . . . . . . . . . . . D-2
rms_run(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-4
rms_suspendResource(3) . . . . . . . . . . . . . . . . . . . . . . D-6
rms_resumeResource(3) . . . . . . . . . . . . . . . . . . . . . . . D-6
rms_killResource(3) . . . . . . . . . . . . . . . . . . . . . . . . . D-6
rms_defaultPartition(3) . . . . . . . . . . . . . . . . . . . . . . . D-7
rms_numCpus(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . D-7
rms_numNodes(3) . . . . . . . . . . . . . . . . . . . . . . . . . . D-7
rms_freeCpus(3) . . . . . . . . . . . . . . . . . . . . . . . . . . . D-7
E Accounting Summary Script E-1
E.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-1
E.2 Command Line Interface . . . . . . . . . . . . . . . . . . . . . . . . . E-1
E.3 Example Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-2
E.4 Listing of the Script . . . . . . . . . . . . . . . . . . . . . . . . . . . . E-3
Glossary Glossary-1
Index Index-1

List of Figures

2.1 A Network of Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
2.2 High Availability RMS Configuration . . . . . . . . . . . . . . . . . . . . 2-3
2.3 The Database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-6
2.4 Partitioning a System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
2.5 Distribution of Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
2.6 Preemption of Low Priority Jobs . . . . . . . . . . . . . . . . . . . . . . . 2-9
2.7 Two Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
3.1 Distribution of Parallel Processes . . . . . . . . . . . . . . . . . . . . . . 3-2
3.2 Loading and Running a Parallel Program . . . . . . . . . . . . . . . . . 3-3
A.1 A 2-Stage, 16-Node, Switch Network . . . . . . . . . . . . . . . . . . . . A-2
A.2 A 3-Stage, 64-Node, Switch Network . . . . . . . . . . . . . . . . . . . . A-2
A.3 A 3-Stage, 128-Node, Switch Network . . . . . . . . . . . . . . . . . . . A-3

List of Tables

10.1 Access Controls Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4
10.2 Accounting Statistics Table . . . . . . . . . . . . . . . . . . . . . . . . . 10-5
10.3 Machine Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-6
10.4 Performance Statistics Attributes . . . . . . . . . . . . . . . . . . . . . . 10-7
10.5 Server Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-7
10.6 Scheduling Attributes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-8
10.7 Elans Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-8
10.8 Elites Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-9
10.9 Events Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-9
10.10 Example of Status Changes . . . . . . . . . . . . . . . . . . . . . . . . . 10-10
10.11 Event Handlers Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-10
10.12 Fields Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-11
10.13 Type Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-11
10.14 Installed Components Table . . . . . . . . . . . . . . . . . . . . . . . . . 10-12
10.15 Jobs Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-12
10.16 Link Errors Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-13
10.17 Modules Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-14
10.18 Module Types Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-15
10.19 Valid Module Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-15
10.20 Nodes Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-16
10.21 Node Statistics Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-17
10.22 Partitions Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-18
10.23 Projects Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-19
10.24 Resources Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-19
10.25 Servers Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-20
10.26 Services Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-21
10.27 Entries in the Services Table . . . . . . . . . . . . . . . . . . . . . . . . . 10-22
10.28 Software Products Table . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-22
10.29 Component Attribute Values . . . . . . . . . . . . . . . . . . . . . . . . . 10-22
10.30 Switch Boards Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23
10.31 Transaction Log Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-23
10.32 Entry in the Transactions Table . . . . . . . . . . . . . . . . . . . . . . . 10-24
10.33 Users Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-24
A.1 Switch Network Parameters . . . . . . . . . . . . . . . . . . . . . . . . . A-3
B.1 Job Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-2
B.2 Link Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
B.3 Module Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-3
B.4 Node Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-4
B.5 Run Level Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . B-5
B.6 Partition Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-5
B.7 Resource Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-6
B.8 Transaction Status Values . . . . . . . . . . . . . . . . . . . . . . . . . . B-6
1 Introduction

1.1 Scope of Manual
This manual describes the Resource Management System (RMS). The manual’s purpose is to provide a technical overview of the RMS system, its functionality and programmable interfaces. It covers the RMS daemons, client applications, the RMS database, the system call interface to the RMS kernel module and the application program interface to the RMS database.
1.2 Audience
This manual is intended for system administrators and developers. It provides a detailed technical description of the operation and features of RMS and describes the programming interface between RMS and third-party systems.
The manual assumes that the reader is familiar with the following:
• the UNIX operating system, including shell scripts
• the C programming language
1.3 Using this Manual
This manual contains ten chapters and five appendices. The contents of these are as follows:
Chapter 1 (Introduction) explains the layout of the manual and the conventions used to present information.

Chapter 2 (Overview of RMS) overviews the functions of the RMS and introduces its components.

Chapter 3 (Parallel Programs Under RMS) shows how parallel programs are executed under RMS.

Chapter 4 (RMS Daemons) describes the functionality of the RMS daemons.

Chapter 5 (RMS Commands) describes the RMS commands.

Chapter 6 (Access Control, Usage Limits and Accounting) explains RMS access controls, usage limits and accounting.

Chapter 7 (RMS Scheduling) describes how RMS schedules parallel jobs.

Chapter 8 (Event Handling) describes RMS event handling.

Chapter 9 (Setting up RMS) explains how to set up RMS.

Chapter 10 (The RMS Database) presents the structure of tables in the RMS database.

Appendix A (Compaq AlphaServer SC Interconnect Terms) defines terms relating to support for QsNet in RMS.

Appendix B (RMS Status Values) lists the status values of RMS objects.

Appendix C (RMS Kernel Module) describes the RMS kernel module and its system call interface.

Appendix D (RMS Application Interface) describes the RMS application interface.

Appendix E (Accounting Summary Script) contains an example of producing accounting information.
1.4 Related Information
The following manuals provide additional information about the RMS from the point of view of either the system administrator or the user:
Compaq AlphaServer SC User Guide
Compaq AlphaServer SC System Administration Guide
1.5 Location of Online Documentation
Online documentation in HTML format is installed in the directory /usr/opt/rms/docs/html and can be accessed from a browser at http://rmshost:8081/html/index.html. PostScript and PDF versions of the documents are in /usr/opt/rms/docs. Please consult your system administrator if you have difficulty accessing the documentation. Online documentation can also be found on the AlphaServer SC System Software CD-ROM.
New versions of this and other Quadrics documentation can be found on the Quadrics web site http://www.quadrics.com.
Further information on AlphaServer SC can be found on the Compaq website
http://www.compaq.com/hpc.


1.6 Reader’s Comments
If you would like to make any comments on this or any other AlphaServer SC manual, please contact your local Compaq support centre.
1.7 Conventions
The following typographical conventions have been used in this document:
monospace type
    Monospace type denotes literal text. This is used for command descriptions, file names and examples of output.

bold monospace type
    Bold monospace type indicates text that the user enters when contrasted with on-screen computer output.

italic monospace type
    Italic (slanted) monospace type denotes some meta text. This is used most often in command or parameter descriptions to show where a textual value is to be substituted.

italic type
    Italic (slanted) proportional type is used in the text to introduce new terms. It is also used when referring to labels on graphical elements such as buttons.

Ctrl/x
    This symbol indicates that you hold down the Ctrl key while you press another key or mouse button (shown here by x).

TLA
    Small capital letters indicate an abbreviation (see Glossary).

ls(1)
    A cross-reference to a reference page includes the appropriate section number in parentheses.

#
    A number sign represents the superuser prompt.

%, $
    A percent sign represents the C shell system prompt. A dollar sign represents the system prompt for the Bourne, Korn, and POSIX shells.
2 Overview of RMS

2.1 Introduction
This chapter describes the role of the Resource Management System (RMS). The RMS provides tools for the management and use of a Compaq AlphaServer SC system. To put into context the functions that RMS performs, a brief overview of the system architecture is given first in Section 2.2. Section 2.3 outlines the main functions of the RMS and introduces the major components of the RMS: a set of UNIX daemons, a suite of command line utilities and a SQL database. Finally, Section 2.4 describes the resource management facilities from the system administrator’s point of view.
2.2 The System Architecture
An RMS system looks like a standard UNIX system: it has the familiar command shells, editors, compilers, linkers and libraries; it runs the same applications. The RMS system differs from the conventional UNIX one in that it can run parallel applications as well as sequential ones. The processes that execute on the system, particularly the parallel programs, are controlled by the RMS.
2.2.1 Nodes

An RMS system comprises a network of computers (referred to as nodes), as shown in Figure 2.1. Each node may have a single processor or multiple processors (as in an SMP server); each node runs a single copy of UNIX. Nodes used interactively to log in to the RMS system are also connected to an external LAN. The application nodes, used for running parallel programs, are accessed through the RMS.
Figure 2.1: A Network of Nodes

[Figure: interactive nodes (with a LAN/FDDI interface) and application nodes connected by a management network and a terminal concentrator, with a QM-S16 switch and its switch network control providing the switch network.]
All of the nodes are connected to a management network (normally, a 100 BaseT Ethernet). They may also be connected to a Compaq AlphaServer SC Interconnect, to provide high-performance user-space communications between application processes.
The RMS processes that manage the system reside either on an interactive node or on a separate management server. This node, known as rmshost, holds the RMS database, which stores all state for the RMS system.
For high-availability installations, the rmshost node should be an interactive node rather than a management server. This will allow you to configure the system for failover, as shown in Figure 2.2 (see Chapter 15 of the System Administration Guide for details).
2-2 Overview of RMS


Figure 2.2: High Availability RMS Configuration

[Figure: a primary rmshost node and a backup rmshost node, both with access to the RMS database.]
The RMS processes run on the node with the name rmshost, which migrates to the backup on fail-over. The database is held on a shared disk, accessible to both the primary and backup node.
2.3 The Role of the RMS
The RMS provides a single point interface to the system for resource management. This interface enables a system administrator to manage the system resources (CPUs, memory, disks, and so on) effectively and easily. The RMS includes facilities for the following administrative functions:
Monitoring: controlling and monitoring the nodes in the network to ensure the correct operation of the hardware

Fault diagnosis: diagnosing faults and isolating errors; instigating fault recovery and escalation procedures

Data collection: recording statistics on system performance

Allocating CPUs: allocating system resources to applications

Access control: controlling user access to resources

Accounting: providing a single point for collecting accounting data

Parallel jobs: providing the system support required to run parallel programs

Scheduling: deciding when and where to run parallel jobs

Audit: maintaining an audit trail of system state changes

From the user's point of view, RMS provides tools for:

Information: querying the resources of the system

Execution: loading and running parallel programs on a given set of resources

Monitoring: monitoring the execution of parallel programs
2.3.1 The Structure of the RMS
RMS is implemented as a set of UNIX commands and daemons, programmed in C and C++, using sockets for communications. All of the details of the system (its configuration, its current state, usage statistics) are maintained in a SQL database, as shown in Figure 2.3. See Section 2.3.4 for an overview and Chapter 10 (The RMS Database) for details of the database.
2.3.2 The RMS Daemons
A set of daemons provide the services required for managing the resources of the system. To do this, the daemons both query and update the database (see Section 2.3.4).
The Database Manager, msqld, provides SQL database services.
The Machine Manager, mmanager, monitors the status of nodes in an RMS system.
The Partition Manager, pmanager, controls the allocation of resources to users and
the scheduling of parallel programs.
The Switch Network Manager, swmgr, supervises the operation of the Compaq
AlphaServer SC Interconnect, monitoring it for errors and collecting performance data.
The Event Manager, eventmgr, runs handlers in response to system incidents and
notifies clients who have registered an interest in them.
The Transaction Log Manager, tlogmgr, instigates database transactions that have
been requested in the Transaction Log. All client transactions are made through this mechanism. This ensures that changes to the database are serialized and an audit trail is kept.
The Process Manager, rmsmhd, runs on each node in the system. It starts the other
RMS daemons.
The RMS Daemon, rmsd, runs on each node in the system. It loads and runs user
processes and monitors resource usage and system performance.
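The Transaction Log Manager's role, described above, is to serialize all client changes to the database and preserve an audit trail. The following toy sketch illustrates that idea only; it is not RMS code, and the queue, function names and record layout are invented for illustration: clients append change requests to a log, and a single consumer applies them in order.

```python
import queue

# Toy stand-in for the Transaction Log: clients append change requests,
# and a single consumer applies them one at a time, so updates to the
# database are serialized and an audit trail of applied changes is kept.
transaction_log = queue.Queue()
audit_trail = []
database = {"atlas0": "running"}

def request(change):
    """Client side: changes are never applied directly, only logged."""
    transaction_log.put(change)

def apply_pending():
    """Manager side: drain the log, applying changes in arrival order."""
    while not transaction_log.empty():
        node, status = transaction_log.get()
        database[node] = status
        audit_trail.append((node, status))

request(("atlas0", "configured out"))
request(("atlas0", "running"))
apply_pending()
```

Because every change passes through the log before being applied, concurrent clients cannot interleave partial updates, and the audit trail records the exact order in which changes took effect.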
The RMS daemons are described in more detail in Chapter 4 (RMS Daemons).
2.3.3 The RMS Commands
RMS commands call on the RMS daemons to get information about the system, to
distribute work across the system, to monitor the state of programs and, in the case of administrators, to configure the system and back it up. A suite of these RMS client applications is supplied. There are commands for users and commands for system administrators.
The user commands for gaining access to the system and running parallel programs are as follows:
allocate reserves resources for a user.
prun loads and runs parallel programs.
rinfo gets information about the resources in the system.
rmsexec performs load balancing for the efficient execution of sequential programs.
rmsquery queries the database. Administrators can also use rmsquery to update
the database.
The system administration commands for managing the system are as follows:
nodestatus gets and sets node status information.
rcontrol starts, stops and reconfigures services.
rmsbuild populates the RMS database with information on a given system.
rmsctl starts and stops RMS and shows the system status.
rmshost reports the name of the node hosting the RMS database.
rmstbladm builds and maintains the database.
msqladmin performs database server administration.
The services available to the different types of user (application programmer, operator, system administrator) are subject to access control. Access control restrictions are embedded in the SQL database, based on standard UNIX group IDs (see
Section 10.2.20). Users have read access to all tables but no write access. Operator and
administrative applications are granted limited write access. Password-protected administrative applications and RMS itself have full read/write access.
The RMS commands are described in more detail in Chapter 5 (RMS Commands).
2.3.4 The RMS Database
The database provides a platform-independent interface to the RMS system. Users and administrators can interact with the database using standard SQL queries. For example, the following query displays details about the nodes in the machine. It selects fields from the table called nodes (see Section 10.2.14). The query is submitted through the
RMS client rmsquery.
$ rmsquery "select name,status from nodes"
atlasms running
atlas0 running
atlas1 running
atlas2 running
atlas3 running
Figure 2.3: The Database
RMS uses the mSQL database engine from Hughes Technologies (for details see
http://www.Hughes.com.au). Client applications may use C, C++, Java, HTML or
UNIX script interfaces to generate SQL queries. See the Quadrics support page
http://www.quadrics.com/web/support for details of the SQL language.
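The same kind of query can be tried outside RMS with any SQL engine. The following Python sketch uses sqlite3 to mock up a nodes table with the node names from the example above and run the same select (illustrative only; the real table layout is given in Section 10.2.14):

```python
import sqlite3

# Mock-up of a two-column nodes table; the real RMS table has more fields.
conn = sqlite3.connect(":memory:")
conn.execute("create table nodes (name text, status text)")
conn.executemany("insert into nodes values (?, ?)",
                 [("atlasms", "running"), ("atlas0", "running"),
                  ("atlas1", "running"), ("atlas2", "running"),
                  ("atlas3", "running")])

# The query submitted through rmsquery in the example above:
rows = conn.execute("select name,status from nodes").fetchall()
for name, status in rows:
    print(name, status)
```

Under RMS the same statement is passed as a string to the rmsquery client rather than executed directly.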
NodeConfiguration
NetworkConfiguration
AccessControl
ResourceQuotas
Accounting
Auditing
UsageStatistics
SystemState
InternalSupport
2.4 RMS Management Functions
The RMS gives the system administrator control over how the resources of a system are assigned to the tasks it must perform. This includes the allocation of resources (Section 2.4.1), scheduling policies (Section 2.4.2), access controls and accounting (Section 2.4.3) and system configuration (Section 2.4.4).
2.4.1 Allocating Resources
The nodes in an RMS system can be configured into mutually exclusive sets known as partitions as shown in Figure 2.4. The administrator can create partitions with different mixes of resources to support a range of uses. For example, a system may have to cater for a variety of processing loads, including the following:
Interactive login sessions for conventional UNIX processes
Parallel program development
Production execution of parallel programs
Distributed system services, such as database or file system servers, used by
conventional UNIX processes
Sequential batch streams
Figure 2.4: Partitioning a System
[The system is divided into login, parallel and sequential batch partitions.]
The system administrator can allocate a partition with appropriate resources for each of these tasks. Furthermore, the administrator can control who accesses the partitions (by user or by project) and how much of the resource they can consume. This ensures that resources intended for a particular purpose, for example, running production parallel codes, are not diverted to other uses, for example, running user shells.
A further partition, the root partition, is always present. It includes all nodes. It does not have a scheduler. The root partition can only be used by administrative users (root and rms by default).
2.4.2 Scheduling
Partitions enable different scheduling policies to be put into action. On each partition, one or more of three scheduling policies can be deployed to suit the intended usage:
1. Gang scheduling of parallel programs, where all processes in a program are scheduled and de-scheduled together. This is the default scheduling policy for parallel partitions.
2. Regular UNIX scheduling with the addition of load balancing, whereby the user can run a sequential program on a lightly loaded node. The load may be judged in terms of free CPU time, free memory or number of users.
3. Batch scheduling, where the use of resources is controlled by a batch system.
Scheduling parameters such as time limits, time slice interval and minimum request size are applied on an individual partition basis. Default priorities, memory limits and
CPU usage limits can be applied to users or projects to tune the partition’s workload. For
details see Chapter 6 (Access Control, Usage Limits and Accounting) and
Chapter 7 (RMS Scheduling).
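Policy 2, load balancing, amounts to picking the lightest-loaded node by some metric. A minimal sketch follows (the node names and statistics are hypothetical; the real rmsexec consults statistics gathered by the RMS daemons):

```python
# Hypothetical per-node statistics; in RMS these are gathered by rmsd.
nodes = {
    "atlas0": {"free_cpu": 0.10, "free_mem_mb": 256, "users": 12},
    "atlas1": {"free_cpu": 0.85, "free_mem_mb": 1024, "users": 1},
    "atlas2": {"free_cpu": 0.40, "free_mem_mb": 512, "users": 4},
}

def lightest(nodes, metric):
    """Choose the node with the most headroom by the given metric."""
    return max(nodes, key=lambda n: nodes[n][metric])

# Load may be judged by free CPU time, free memory or number of users:
print(lightest(nodes, "free_cpu"))                   # atlas1
print(lightest(nodes, "free_mem_mb"))                # atlas1
print(min(nodes, key=lambda n: nodes[n]["users"]))   # atlas1
```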
The partition shown in Figure 2.5 has its CPUs allocated to five parallel jobs. The jobs have been allocated CPUs in two different ways: jobs 1 and 2 use all of the CPUs on each node; jobs 3, 4 and 5 are running with only one or two CPUs per node. RMS allows the user to specify how their job will be laid out, trading off the competing benefits of increased locality on the one hand against increased total memory size on the other. With this allocation of resources, all five parallel programs can run concurrently on the partition.
Figure 2.5: Distribution of Processes
[16 nodes, 4 CPUs per node; the partition's CPUs are allocated to Jobs 1 to 5.]
The RMS scheduler allocates contiguous ranges of nodes with a given number of CPUs per node [1]. Where possible, each resource request is met by allocating a single range of nodes. If this is not possible, an unconstrained request (one that specifies only the number of CPUs required) may be satisfied by allocating CPUs on disjoint nodes. This ensures that an unconstrained resource request can utilize all of the available CPUs.
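Finding such an allocation can be sketched as a scan for a contiguous run of nodes that each have enough CPUs free (a simplification of the real allocator):

```python
def find_range(free_cpus, nnodes, cpus_per_node):
    """Return the first contiguous run of nnodes nodes that each have
    at least cpus_per_node CPUs free, or None if no such run exists."""
    for start in range(len(free_cpus) - nnodes + 1):
        if all(free_cpus[i] >= cpus_per_node
               for i in range(start, start + nnodes)):
            return list(range(start, start + nnodes))
    return None

# 8 nodes with 4 CPUs each; nodes 2 and 3 are partly or fully busy.
free = [4, 4, 1, 0, 4, 4, 4, 4]
print(find_range(free, 3, 4))  # [4, 5, 6]
print(find_range(free, 3, 1))  # [0, 1, 2]
```

A real scheduler must also honour partition boundaries, quotas and priorities; this shows only the contiguity constraint.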
The scheduler attempts to find free CPUs for each request. If this is not possible, the request blocks until CPUs are available. RMS preempts programs when a higher priority job is submitted, as shown in Figure 2.6. Initially, CPUs have been allocated for resource requests 1 and 2. When the higher priority resource request 3 is submitted, 1 and 2 are suspended; 3 runs to completion after which 1 and 2 are restarted.
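The sequence in Figure 2.6 can be sketched as a toy gang scheduler (illustrative only, not RMS code): lower-priority jobs are suspended as a whole when a higher-priority request arrives, and resumed when it completes.

```python
# Toy preemptive scheduler sketch. Each job is suspended or resumed as a
# whole, mimicking gang scheduling of all its processes together.

class Job:
    def __init__(self, name, priority):
        self.name, self.priority, self.state = name, priority, "running"

def submit(running, new_job):
    """Suspend lower-priority jobs, then start the new job."""
    for job in running:
        if job.priority < new_job.priority:
            job.state = "suspended"
    running.append(new_job)

def finish(running, done):
    """On completion, resume the jobs that were preempted."""
    running.remove(done)
    for job in running:
        job.state = "running"

jobs = [Job("resource1", 1), Job("resource2", 1)]
high = Job("resource3", 2)
submit(jobs, high)                  # resources 1 and 2 are suspended
print([j.state for j in jobs])      # ['suspended', 'suspended', 'running']
finish(jobs, high)                  # resource 3 ends; 1 and 2 resume
print([j.state for j in jobs])      # ['running', 'running']
```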
Figure 2.6: Preemption of Low Priority Jobs
startjobs
Resource1
0 1 2
Resource2
0 2 4 6
3 4 5
suspendjobs
startjob
jobends
resumejobs
Resource1
0 1 2
3 4 5
2.4.3 Access Control and Accounting
Users are allocated resources on a per-partition basis. Resources in this context include both CPUs and memory. The system administrator can control access to resources both at the individual user level and at the project level (where a project is a list of users). This means that default access controls can be set up at the project level and overridden on an individual user basis as required. The access controls mechanism is described in detail in Chapter 6 (Access Control, Usage Limits and Accounting).

[1] The scheduler allocates contiguous ranges of nodes so that processes may take advantage of the Compaq AlphaServer SC Interconnect hardware support for broadcast and barrier operations, which operate over a contiguous range of network addresses.

Each partition, except the root partition, is managed by a Partition Manager (see
Section 4.4), which mediates user requests, checking access permissions and usage
limits before scheduling CPUs and starting user jobs.
An accounting record is created as CPUs are allocated to each request. It is updated periodically until the resources are freed. The accounting record itemizes CPU and memory usage, indexed by job, by user and by project.
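The shape of such a record might be sketched as follows (the field names are hypothetical, chosen to illustrate the indexing by job, user and project):

```python
import time

# Hypothetical accounting record: created when CPUs are allocated,
# updated periodically until the resources are freed.
record = {
    "job": 42, "user": "jo", "project": "physics",
    "cpus": 8, "started": time.time(),
    "cpu_secs": 0.0, "mem_mb": 0,
}

def update(record, cpu_secs, mem_mb):
    """Periodic update while the resources are held."""
    record["cpu_secs"] += cpu_secs                    # accumulate CPU time
    record["mem_mb"] = max(record["mem_mb"], mem_mb)  # track peak memory

update(record, 60.0, 512)
update(record, 60.0, 768)
print(record["cpu_secs"], record["mem_mb"])  # 120.0 768
```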
2.4.4 RMS Configuration
The set of partitions active at any time is known as a configuration. A system will normally have a number of configurations, each appropriate to a particular operating pattern. For example, there may be one configuration for normal working hours and another for night time and weekend operation.
The CPUs allocated to a partition may vary between configurations. For example, a login partition (nodes allocated for interactive use) may have more nodes allocated during working hours than at night – it may even be absent from the night time configuration. A pair of configurations are shown in Figure 2.7.
Figure 2.7: Two Configurations
16nodes,4CPUspernode
Day
Login Development
Night
Parallel
RMS supports automated reconfiguration at shift changes as well as dynamic
Parallel
reconfiguration in response to a request from an operator or administrator. The RMS client rcontrol (Page 5-20) manages the switch-over from one configuration to another. For automatic reconfiguration, rcontrol can be invoked from a cron job.
3 Parallel Programs Under RMS

3.1 Introduction
RMS provides users with tools for running parallel programs and monitoring their
execution, as described in Chapter 5 (RMS Commands). Users can determine what resources are available to them and request allocation of the CPUs and memory required to run their programs. This chapter describes the structure of parallel programs under
RMS and how they are run.
A parallel program consists of a controlling process, prun, and a number of application processes distributed over one or more nodes. Each process may have multiple threads running on one or more CPUs. prun can run on any node in the system but it normally runs in a login partition or on an interactive node.
In a system with SMP nodes, RMS can allocate CPUs so as to use all of the CPUs on the minimum number of nodes (a block distribution); alternatively, it can allocate a specified number of CPUs on each node (a cyclic distribution). This flexibility allows users to choose between the competing benefits of increased CPU count and memory size on each node (generally good for multithreaded applications) and increased numbers of nodes (generally best for applications requiring increased total memory size, memory bandwidth and I/O bandwidth).
Parallel programs can be written so that they will run with varying numbers of CPUs and varying numbers of CPUs per node. They can, for example, query the number of processors allocated and determine their data distributions and communications patterns accordingly (see Appendix C (RMS Kernel Module) for details).
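Block and cyclic placement can be illustrated with a small sketch (assuming 4-CPU nodes; this is not RMS code, just the arithmetic behind the two layouts):

```python
def block(nprocs, cpus_per_node):
    """Block distribution: fill each node before moving to the next."""
    return [p // cpus_per_node for p in range(nprocs)]

def cyclic(nprocs, nnodes):
    """Cyclic distribution: deal processes out one per node, wrapping round."""
    return [p % nnodes for p in range(nprocs)]

# Node assignment for 8 processes on 4-CPU nodes:
print(block(8, 4))    # [0, 0, 0, 0, 1, 1, 1, 1]  -> 2 nodes, fully used
print(cyclic(8, 8))   # [0, 1, 2, 3, 4, 5, 6, 7]  -> 1 process per node
print(cyclic(8, 4))   # [0, 1, 2, 3, 0, 1, 2, 3]  -> 2 processes per node
```

The last two layouts correspond to the two variants shown in Figure 3.1.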


3.2 Resource Requests
Having logged into the system, a user makes a request for the resources needed to run a parallel program by using the RMS commands prun (see Page 5-11) or allocate (see
Page 5-3). When using the prun command, the request can specify details such as the
following:
The partition on which to run the program (the -p option)
The number of processes to run (the -n option)
The number of nodes required (the -N option)
The number of CPUs required per process (the -c option)
The memory required per process (the RMS_MEMLIMIT environment variable)
The distribution of processes over the nodes (the -m, -B and -R options)
How standard input, output and error streams should be handled (the -i, -o and -e
options)
The project to which the program belongs for accounting and scheduling purposes
(the -P option)
Two variants of a program with eight processes are shown in Figure 3.1: first, with one process per node; and then, with two processes per node.
Figure 3.1: Distribution of Parallel Processes
[1 process per node: processes 0 to 7 on eight nodes. 2 processes per node: processes 0 to 7 on four nodes.]