Quadrics Supercomputers World Ltd. Document Version 7 - June 22nd 2001 - AA-RLAZB-TE
The information supplied in this document is believed to be correct at the time of publication, but no liability is assumed for its use or for the infringements of the rights of others
resulting from its use. No license or other rights are granted in respect of any rights
owned by any of the organizations mentioned herein.
This document may not be copied, in whole or in part, without the prior written consent
of Quadrics Supercomputers World Ltd.
Copyright 1998,1999,2000,2001 Quadrics Supercomputers World Ltd.
The specifications listed in this document are subject to change without notice.
Compaq, the Compaq logo, Alpha, AlphaServer, and Tru64 are trademarks of Compaq
Information Technologies Group, L.P. in the United States and other countries.
UNIX is a registered trademark of The Open Group in the U.S. and other countries.
TotalView and Etnus are registered trademarks of Etnus LLC.
All other product names mentioned herein may be trademarks of their respective companies.
The Quadrics Supercomputers World Ltd. (Quadrics) web site can be found at:
http://www.quadrics.com/
Quadrics’ address is:
One Bridewell Street
Bristol
BS1 2AA
UK
Tel: +44-(0)117-9075375
Fax: +44-(0)117-9075395
Circulation Control: None
Document Revision History
Revision  Date           Author  Remarks
1         January 1999   HRA     Initial Draft
2         Feb 2000       DR      Updated Draft
3         Apr 2000       DR      Draft changes for Product Release
4         Jun 2000       RMC     Corrections for Product Release
5         Jan 2001       HRA     Updates for Version 2
6         June 2001      DR      Further Version 2 changes
7         June 2001      DR      AlphaServer SC V2 Product Release
1 Introduction
This manual describes the Resource Management System (RMS). The manual’s purpose
is to provide a technical overview of the RMS system, its functionality and
programmable interfaces. It covers the RMS daemons, client applications, the RMS
database, the system call interface to the RMS kernel module and the application
program interface to the RMS database.
1.2 Audience
This manual is intended for system administrators and developers. It provides a
detailed technical description of the operation and features of RMS and describes the
programming interface between RMS and third-party systems.
The manual assumes that the reader is familiar with the following:
• UNIX® operating system including shell scripts
• C programming language
1.3 Using this Manual
This manual contains ten chapters and five appendices. The contents of these are as
follows:
Chapter 1 (Introduction)
    explains the layout of the manual and the conventions used to present information
Chapter 2 (Overview of RMS)
    overviews the functions of the RMS and introduces its components
Chapter 3 (Parallel Programs Under RMS)
    shows how parallel programs are executed under RMS
Chapter 4 (RMS Daemons)
    describes the functionality of the RMS daemons
Chapter 5 (RMS Commands)
    describes the RMS commands
Chapter 6 (Access Control, Usage Limits and Accounting)
    explains RMS access controls, usage limits and accounting
Chapter 7 (RMS Scheduling)
    describes how RMS schedules parallel jobs
Chapter 8 (Event Handling)
    describes RMS event handling
Chapter 9 (Setting up RMS)
    explains how to set up RMS
Chapter 10 (The RMS Database)
    presents the structure of tables in the RMS database
Appendix A (Compaq AlphaServer SC Interconnect Terms)
    defines terms relating to support for QsNet in RMS
Appendix B (RMS Status Values)
    lists the status values of RMS objects
Appendix C (RMS Kernel Module)
    describes the RMS kernel module and its system call interface
Appendix D (RMS Application Interface)
    describes the RMS application interface
Appendix E (Accounting Summary Script)
    contains an example of producing accounting information
1.4 Related Information
The following manuals provide additional information about the RMS from the point of
view of either the system administrator or the user:
• Compaq AlphaServer SC User Guide
• Compaq AlphaServer SC System Administration Guide
1.5 Location of Online Documentation
Online documentation in HTML format is installed in the directory
/usr/opt/rms/docs/html and can be accessed from a browser at
http://rmshost:8081/html/index.html. PostScript and PDF versions of the
documents are in /usr/opt/rms/docs. Please consult your system administrator if
you have difficulty accessing the documentation. Online documentation can also be
found on the AlphaServer SC System Software CD-ROM.
New versions of this and other Quadrics documentation can be found on the Quadrics
web site http://www.quadrics.com.
Further information on AlphaServer SC can be found on the Compaq website
http://www.compaq.com/hpc.
1.6 Reader’s Comments
If you would like to make any comments on this or any other AlphaServer SC manual
please contact your local Compaq support centre.
1.7 Conventions
The following typographical conventions have been used in this document:
monospace type
    Monospace type denotes literal text. This is used for command descriptions, file
    names and examples of output.
bold monospace type
    Bold monospace type indicates text that the user enters when contrasted with
    on-screen computer output.
italic monospace type
    Italic (slanted) monospace type denotes some meta text. This is used most often in
    command or parameter descriptions to show where a textual value is to be
    substituted.
italic type
    Italic (slanted) proportional type is used in the text to introduce new terms. It is
    also used when referring to labels on graphical elements such as buttons.
Ctrl/x
    This symbol indicates that you hold down the Ctrl key while you press another key or
    mouse button (shown here by x).
TLA
    Small capital letters indicate an abbreviation (see Glossary).
ls(1)
    A cross-reference to a reference page includes the appropriate section number in
    parentheses.
#
    A number sign represents the superuser prompt.
%, $
    A percent sign represents the C shell system prompt. A dollar sign represents the
    system prompt for the Bourne, Korn, and POSIX shells.
2 Overview of RMS
2.1 Introduction
This chapter describes the role of the Resource Management System (RMS). The RMS
provides tools for the management and use of a Compaq AlphaServer SC system. To put
into context the functions that RMS performs, a brief overview of the system architecture
is given first in Section 2.2. Section 2.3 outlines the main functions of the RMS and
introduces the major components of the RMS: a set of UNIX daemons, a suite of
command line utilities and a SQL database. Finally, Section 2.4 describes the resource
management facilities from the system administrator’s point of view.
2.2 The System Architecture
An RMS system looks like a standard UNIX system: it has the familiar command shells,
editors, compilers, linkers and libraries; it runs the same applications. The RMS system
differs from the conventional UNIX one in that it can run parallel applications as well as
sequential ones. The processes that execute on the system, particularly the parallel
programs, are controlled by the RMS.
2.2.1 Nodes
An RMS system comprises a network of computers (referred to as nodes) as shown in
Figure 2.1. Each node may have single or multiple processors (such as an SMP server);
each node runs a single copy of UNIX. Nodes used interactively to log in to the RMS
system are also connected to an external LAN. The application nodes, used for running
parallel programs, are accessed through the RMS.
Figure 2.1: A Network of Nodes (diagram labels: switch network, switch network control, QM-S16 switch, terminal concentrator, management network, interactive nodes with LAN/FDDI interface, application nodes)
All of the nodes are connected to a management network (normally, a 100 BaseT
Ethernet). They may also be connected to a Compaq AlphaServer SC Interconnect, to
provide high-performance user-space communications between application processes.
The RMS processes that manage the system reside either on an interactive node or on a
separate management server. This node, known as rmshost, holds the RMS database,
which stores all state for the RMS system.
For high-availability installations, the rmshost node should be an interactive node
rather than a management server. This will allow you to configure the system for
failover, as shown in Figure 2.2 (see Chapter 15 of the System Administration Guide for
details).
Figure 2.2: High Availability RMS Configuration (diagram labels: RMS host, backup RMS host, RMS database)
The RMS processes run on the node with the name rmshost, which migrates to the
backup on fail-over. The database is held on a shared disk, accessible to both the
primary and backup node.
2.3 The Role of the RMS
The RMS provides a single point interface to the system for resource management. This
interface enables a system administrator to manage the system resources (CPUs,
memory, disks, and so on) effectively and easily. The RMS includes facilities for the
following administrative functions:
Monitoring
    controlling and monitoring the nodes in the network to ensure the correct operation
    of the hardware
Fault diagnosis
    diagnosing faults and isolating errors; instigating fault recovery and escalation
    procedures
Data collection
    recording statistics on system performance
Allocating CPUs
    allocating system resources to applications
Access control
    controlling user access to resources
Accounting
    single point for collecting accounting data
Parallel jobs
    providing the system support required to run parallel programs
Scheduling
    deciding when and where to run parallel jobs
Audit
    maintaining an audit trail of system state changes
From the user’s point of view, RMS provides tools for:
Information
    querying the resources of the system
Execution
    loading and running parallel programs on a given set of resources
Monitoring
    monitoring the execution of parallel programs
2.3.1 The Structure of the RMS
RMS is implemented as a set of UNIX commands and daemons, programmed in C and
C++, using sockets for communications. All of the details of the system (its
configuration, its current state, usage statistics) are maintained in a SQL database, as
shown in Figure 2.3. See Section 2.3.4 for an overview and Chapter 10 (The RMS
Database) for details of the database.
2.3.2 The RMS Daemons
A set of daemons provides the services required for managing the resources of the system.
To do this, the daemons both query and update the database (see Section 2.3.4).
• The Database Manager, msqld, provides SQL database services.
• The Machine Manager, mmanager, monitors the status of nodes in an RMS system.
• The Partition Manager, pmanager, controls the allocation of resources to users and
the scheduling of parallel programs.
• The Switch Network Manager, swmgr, supervises the operation of the Compaq
AlphaServer SC Interconnect, monitoring it for errors and collecting performance
data.
• The Event Manager, eventmgr, runs handlers in response to system incidents and
notifies clients who have registered an interest in them.
• The Transaction Log Manager, tlogmgr, instigates database transactions that have
been requested in the Transaction Log. All client transactions are made through this
mechanism. This ensures that changes to the database are serialized and an audit
trail is kept.
• The Process Manager, rmsmhd, runs on each node in the system. It starts the other
RMS daemons.
• The RMS Daemon, rmsd, runs on each node in the system. It loads and runs user
processes and monitors resource usage and system performance.
The RMS daemons are described in more detail in Chapter 4 (RMS Daemons).
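The serialization that the Transaction Log provides can be illustrated with a minimal sketch. It assumes a single in-memory first-in, first-out queue. Clients only append requests, and one manager removes them in submission order and stamps each with an audit sequence number. The `struct tlog` type and the `tlog_submit` and `tlog_next` functions are hypothetical. The real tlogmgr works across processes through the database and is not implemented this way.

```c
#include <assert.h>
#include <stddef.h>

#define TLOG_CAP 16

/* Hypothetical FIFO standing in for the Transaction Log. */
struct tlog {
    const char *req[TLOG_CAP];  /* pending requests, oldest at head */
    int head, count;
    long seq;                   /* last audit sequence number issued */
};

/* Client side: append a request. Returns 0, or -1 if the log is full. */
int tlog_submit(struct tlog *t, const char *request)
{
    assert(t != NULL && request != NULL);
    if (t->count == TLOG_CAP)
        return -1;
    t->req[(t->head + t->count) % TLOG_CAP] = request;
    t->count++;
    return 0;
}

/* Manager side: remove the oldest request and give it the next audit
 * sequence number. Returns NULL when nothing is pending. Only the
 * manager calls this, so requests are handled strictly one at a time. */
const char *tlog_next(struct tlog *t, long *seq)
{
    assert(t != NULL && seq != NULL);
    if (t->count == 0)
        return NULL;
    const char *r = t->req[t->head];
    t->head = (t->head + 1) % TLOG_CAP;
    t->count--;
    *seq = ++t->seq;
    return r;
}
```

Because there is a single consumer, at most one request is applied at any time. That is what serializes the resulting changes. The sequence numbers give the audit trail a total order. This sketch is not safe for concurrent use by several threads.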
2.3.3 The RMS Commands
RMS commands call on the RMS daemons to get information about the system, to
distribute work across the system, to monitor the state of programs and, in the case of
administrators, to configure the system and back it up. A suite of these RMS client
applications is supplied. There are commands for users and commands for system
administrators.
The user commands for gaining access to the system and running parallel programs are
as follows:
• allocate reserves resources for a user.
• prun loads and runs parallel programs.
• rinfo gets information about the resources in the system.
• rmsexec performs load balancing for the efficient execution of sequential programs.
• rmsquery queries the database. Administrators can also use rmsquery to update
the database.
The system administration commands for managing the system are as follows:
• nodestatus gets and sets node status information.
• rcontrol starts, stops and reconfigures services.
• rmsbuild populates the RMS database with information on a given system.
• rmsctl starts and stops RMS and shows the system status.
• rmshost reports the name of the node hosting the RMS database.
• rmstbladm builds and maintains the database.
• msqladmin performs database server administration.
The services available to the different types of user (application programmer, operator,
system administrator) are subject to access control. Access control restrictions are
embedded in the SQL database, based on standard UNIX group IDs (see Section 10.2.20).
Users have read access to all tables but no write access. Operator and administrative
applications are granted limited write access. Password-protected administrative
applications and RMS itself have full read/write access.
The RMS commands are described in more detail in Chapter 5 (RMS Commands).
2.3.4 The RMS Database
The database provides a platform-independent interface to the RMS system. Users and
administrators can interact with the database using standard SQL queries. For example,
a query that displays details about the nodes in the machine selects fields from the
table called nodes (see Section 10.2.14).
RMS uses the mSQL database engine from Hughes Technologies (for details see
http://www.Hughes.com.au). Client applications may use C, C++, Java, HTML or
UNIX script interfaces to generate SQL queries. See the Quadrics support page
http://www.quadrics.com/web/support for details of the SQL language.
Figure 2.3: contents of the RMS database (diagram labels: node configuration, network configuration, access control, resource quotas, accounting, auditing, usage statistics, system state, internal support)
2.4 RMS Management Functions
The RMS gives the system administrator control over how the resources of a system are
assigned to the tasks it must perform. This includes the allocation of resources
(Section 2.4.1), scheduling policies (Section 2.4.2), access controls and accounting
(Section 2.4.3) and system configuration (Section 2.4.4).
2.4.1 Allocating Resources
The nodes in an RMS system can be configured into mutually exclusive sets known as
partitions as shown in Figure 2.4. The administrator can create partitions with different
mixes of resources to support a range of uses. For example, a system may have to cater
for a variety of processing loads, including the following:
• Interactive login sessions for conventional UNIX processes
• Parallel program development
• Production execution of parallel programs
• Distributed system services, such as database or file system servers, used by
conventional UNIX processes
• Sequential batch streams
Figure 2.4: Partitioning a System (partitions shown: login, parallel, sequential batch)
The system administrator can allocate a partition with appropriate resources for each of
these tasks. Furthermore, the administrator can control who accesses the partitions (by
user or by project) and how much of the resource they can consume. This ensures that
resources intended for a particular purpose, for example, running production parallel
codes, are not diverted to other uses, for example, running user shells.
A further partition, the root partition, is always present. It includes all nodes. It does
not have a scheduler. The root partition can only be used by administrative users (root
and rms by default).
2.4.2 Scheduling
Partitions enable different scheduling policies to be put into action. On each partition,
one or more of three scheduling policies can be deployed to suit the intended usage:
1. Gang scheduling of parallel programs, where all processes in a program are
scheduled and de-scheduled together. This is the default scheduling policy for parallel
partitions.
2. Regular UNIX scheduling with the addition of load balancing, whereby the user can
run a sequential program on a lightly loaded node. The load may be judged in terms
of free CPU time, free memory or number of users.
3. Batch scheduling, where the use of resources is controlled by a batch system.
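The load-balancing policy in item 2 amounts to choosing the node that is best under one load measure. The minimal sketch below assumes each node reports its free CPU fraction, free memory and number of users. The `struct node_load` type, the `least_loaded` function and the tie-breaking rule (lowest node index wins) are hypothetical. This is not how rmsexec is implemented.

```c
#include <assert.h>

/* Hypothetical per-node load sample. */
struct node_load {
    double free_cpu;    /* fraction of CPU time idle, 0.0 to 1.0 */
    long free_mem_kb;   /* free memory in kilobytes */
    int nusers;         /* number of logged-in users */
};

enum load_metric { BY_FREE_CPU, BY_FREE_MEM, BY_USERS };

/* Index of the most lightly loaded node under `metric`;
 * ties go to the lower-numbered node. */
int least_loaded(const struct node_load *nodes, int n, enum load_metric metric)
{
    assert(n > 0);
    int best = 0;
    for (int i = 1; i < n; i++) {
        int better = 0;
        switch (metric) {
        case BY_FREE_CPU: better = nodes[i].free_cpu > nodes[best].free_cpu; break;
        case BY_FREE_MEM: better = nodes[i].free_mem_kb > nodes[best].free_mem_kb; break;
        case BY_USERS:    better = nodes[i].nusers < nodes[best].nusers; break;
        }
        if (better)
            best = i;
    }
    return best;
}
```

The metric is passed as a parameter, so one routine covers all three load measures mentioned in the text.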
Scheduling parameters such as time limits, time slice interval and minimum request
size are applied on an individual partition basis. Default priorities, memory limits and
CPU usage limits can be applied to users or projects to tune the partition’s workload. For
details see Chapter 6 (Access Control, Usage Limits and Accounting) and
Chapter 7 (RMS Scheduling).
The partition shown in Figure 2.5 has its CPUs allocated to five parallel jobs. The jobs
have been allocated CPUs in two different ways: jobs 1 and 2 use all of the CPUs on each
node; jobs 3, 4 and 5 are running with only one or two CPUs per node. RMS allows the
user to specify how their job will be laid out, trading off the competing benefits of
increased locality on the one hand against increased total memory size on the other.
With this allocation of resources, all five parallel programs can run concurrently on the
partition.
Figure 2.5: Distribution of Processes (16 nodes with 4 CPUs per node, shared by jobs 1 to 5)
The RMS scheduler allocates contiguous ranges of nodes with a given number of CPUs
per node (see footnote 1). Where possible, each resource request is met by allocating a
single range of nodes. If this is not possible, an unconstrained request (one that
specifies only the number of CPUs required) may be satisfied by allocating CPUs on
disjoint nodes. This ensures that an unconstrained resource request can utilize all of
the available CPUs.
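A minimal first-fit sketch of this policy follows. It assumes the scheduler sees an array holding the number of free CPUs on each node of the partition, in network order. `find_contiguous` looks for a run of consecutive nodes that each have the requested number of free CPUs. `allocate_disjoint` is the fallback for an unconstrained request and gathers CPUs from any nodes. Both functions and the first-fit search order are assumptions made for illustration; they are not the RMS scheduler's algorithm.

```c
#include <assert.h>
#include <stdbool.h>

/* First node of the lowest-numbered run of `count` consecutive nodes that
 * each have at least `cpus` free CPUs, or -1 if there is no such run. */
int find_contiguous(const int *free_cpus, int nnodes, int count, int cpus)
{
    assert(count > 0 && cpus > 0);
    int run = 0;
    for (int i = 0; i < nnodes; i++) {
        run = (free_cpus[i] >= cpus) ? run + 1 : 0;
        if (run == count)
            return i - count + 1;
    }
    return -1;
}

/* Fallback for an unconstrained request of `ncpus` CPUs: take free CPUs
 * from any nodes, lowest index first, recording the count per node in
 * `take`. Returns false if the partition has too few free CPUs, in which
 * case the request would block. The caller commits the plan. */
bool allocate_disjoint(const int *free_cpus, int nnodes, int ncpus, int *take)
{
    assert(ncpus > 0);
    int remaining = ncpus;
    for (int i = 0; i < nnodes; i++) {
        int t = free_cpus[i] < remaining ? free_cpus[i] : remaining;
        take[i] = t;
        remaining -= t;
    }
    return remaining == 0;
}
```

Consider free CPUs of {4, 0, 4, 4, 0, 2}. A request for two nodes with four CPUs each gets nodes 2 and 3. An unconstrained request for 10 CPUs is met from nodes 0, 2 and 3.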
The scheduler attempts to find free CPUs for each request. If this is not possible, the
request blocks until CPUs are available. RMS preempts programs when a higher priority
job is submitted, as shown in Figure 2.6. Initially, CPUs have been allocated for resource
requests 1 and 2. When the higher priority resource request 3 is submitted, 1 and 2 are
suspended; 3 runs to completion, after which 1 and 2 are resumed.
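The suspend-and-resume sequence in Figure 2.6 can be sketched as two steps. This minimal illustration assumes that a larger `priority` value means higher priority, and that a suspended job releases its CPUs until it is resumed. The `struct job` type and the `preempt_for` and `resume_suspended` functions are hypothetical. Choosing victims in array order is a simplification, not RMS policy.

```c
#include <assert.h>

enum job_state { RUNNING, SUSPENDED };

struct job {
    int priority;          /* assumption: larger value = higher priority */
    int cpus;              /* CPUs held while the job is running */
    enum job_state state;
};

/* Make room for a request needing `needed` CPUs at priority `prio`.
 * Returns the free CPUs after suspending lower-priority running jobs,
 * or -1 (and suspends nothing) if even suspending all of them would
 * not free enough CPUs; the request then blocks instead. */
int preempt_for(struct job *jobs, int njobs, int free_cpus, int needed, int prio)
{
    assert(free_cpus >= 0 && needed >= 0);
    int reclaimable = free_cpus;
    for (int i = 0; i < njobs; i++)
        if (jobs[i].state == RUNNING && jobs[i].priority < prio)
            reclaimable += jobs[i].cpus;
    if (reclaimable < needed)
        return -1;
    /* Suspend candidates in array order until enough CPUs are free. */
    for (int i = 0; i < njobs && free_cpus < needed; i++) {
        if (jobs[i].state == RUNNING && jobs[i].priority < prio) {
            jobs[i].state = SUSPENDED;
            free_cpus += jobs[i].cpus;
        }
    }
    return free_cpus;
}

/* After the high-priority job ends, resume suspended jobs that fit in
 * the CPUs now free. Returns the CPUs still free afterwards. */
int resume_suspended(struct job *jobs, int njobs, int free_cpus)
{
    for (int i = 0; i < njobs; i++) {
        if (jobs[i].state == SUSPENDED && jobs[i].cpus <= free_cpus) {
            jobs[i].state = RUNNING;
            free_cpus -= jobs[i].cpus;
        }
    }
    return free_cpus;
}
```

Checking first whether enough CPUs can be reclaimed avoids suspending jobs for a request that would have to block anyway.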
Figure 2.6: Preemption of Low Priority Jobs (timeline: resource requests 1 and 2 start; request 3 starts and 1 and 2 are suspended; request 3 ends and 1 and 2 resume)
2.4.3 Access Control and Accounting
Users are allocated resources on a per-partition basis. Resources in this context include
both CPUs and memory. The system administrator can control access to resources both
at the individual user level and at the project level (where a project is a list of users).
This means that default access controls can be set up at the project level and overridden
on an individual user basis as required. The access control mechanism is described in
detail in Chapter 6 (Access Control, Usage Limits and Accounting).

Footnote 1: The scheduler allocates contiguous ranges of nodes so that processes may take
advantage of the Compaq AlphaServer SC Interconnect hardware support for broadcast and
barrier operations, which operate over a contiguous range of network addresses.
Each partition, except the root partition, is managed by a Partition Manager (see
Section 4.4), which mediates user requests, checking access permissions and usage
limits before scheduling CPUs and starting user jobs.
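The sketch below combines three ideas from this section: project-level defaults, per-user overrides, and the Partition Manager's check before scheduling. It assumes a single CPU limit per entry and that the caller already knows which project the request is charged to. The `struct access_entry` type and the `effective_limit` and `request_allowed` functions are hypothetical. The real access-control tables are described in Chapter 6 and Chapter 10.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical access-control entry for one partition: a CPU limit that
 * applies either to a single user or to every member of a project. */
struct access_entry {
    const char *name;   /* user name or project name */
    int is_project;     /* 1 = project entry, 0 = user entry */
    int max_cpus;       /* CPU limit on this partition */
};

/* Effective CPU limit for `user` running under `project`: a user entry
 * overrides the project default. Returns -1 if neither grants access. */
int effective_limit(const struct access_entry *acl, int n,
                    const char *user, const char *project)
{
    assert(user != NULL && project != NULL);
    int project_limit = -1;
    for (int i = 0; i < n; i++) {
        if (!acl[i].is_project && strcmp(acl[i].name, user) == 0)
            return acl[i].max_cpus;             /* user override wins */
        if (acl[i].is_project && strcmp(acl[i].name, project) == 0)
            project_limit = acl[i].max_cpus;    /* remember the default */
    }
    return project_limit;
}

/* Admission check made before any CPUs are scheduled. */
int request_allowed(const struct access_entry *acl, int n,
                    const char *user, const char *project,
                    int cpus_in_use, int cpus_requested)
{
    int limit = effective_limit(acl, n, user, project);
    return limit >= 0 && cpus_in_use + cpus_requested <= limit;
}
```

Suppose the physics project has a limit of 16 CPUs and user alice has her own entry of 32 CPUs. Alice may then use up to 32 CPUs, while other physics members fall back to 16.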
An accounting record is created as CPUs are allocated to each request. It is updated
periodically until the resources are freed. The accounting record itemizes CPU and
memory usage, indexed by job, by user and by project.
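The accounting record life cycle described above can be sketched as follows: a record is created at allocation, updated periodically, and closed when the resources are freed. This minimal illustration assumes CPU usage is accumulated in seconds and memory usage is tracked as a peak value. The `struct acct_record` type, its fields and the functions are hypothetical. They do not correspond to the columns of the RMS accounting tables described in Chapter 10.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical accounting record, indexed by job, user and project. */
struct acct_record {
    int job;
    const char *user;
    const char *project;
    int ncpus;            /* CPUs allocated to the request */
    double cpu_seconds;   /* accumulated CPU time */
    long peak_mem_kb;     /* largest memory sample seen so far */
    int open;             /* 1 until the resources are freed */
};

/* Created as CPUs are allocated to a request. */
struct acct_record acct_open(int job, const char *user, const char *project, int ncpus)
{
    struct acct_record r = {job, user, project, ncpus, 0.0, 0, 1};
    return r;
}

/* Periodic update while the resources are held. */
void acct_update(struct acct_record *r, double cpu_delta, long mem_kb)
{
    assert(r->open);
    r->cpu_seconds += cpu_delta;
    if (mem_kb > r->peak_mem_kb)
        r->peak_mem_kb = mem_kb;
}

/* Final state once the resources are freed. */
void acct_close(struct acct_record *r)
{
    r->open = 0;
}

/* Total CPU time charged to one project across a set of records. */
double project_cpu_seconds(const struct acct_record *recs, int n, const char *project)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        if (strcmp(recs[i].project, project) == 0)
            total += recs[i].cpu_seconds;
    return total;
}
```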
2.4.4 RMS Configuration
The set of partitions active at any time is known as a configuration. A system will
normally have a number of configurations, each appropriate to a particular operating
pattern. For example, there may be one configuration for normal working hours and
another for night time and weekend operation.
The CPUs allocated to a partition may vary between configurations. For example, a login
partition (nodes allocated for interactive use) may have more nodes allocated during
working hours than at night; it may even be absent from the night time configuration.
A pair of configurations is shown in Figure 2.7.
Figure 2.7: Two Configurations (16 nodes, 4 CPUs per node; Day and Night configurations with partitions labelled Login, Development and Parallel)
RMS supports automated reconfiguration at shift changes as well as dynamic
reconfiguration in response to a request from an operator or administrator. The RMS
client rcontrol (Page 5-20) manages the switch-over from one configuration to another.
For automatic reconfiguration, rcontrol can be invoked from a cron job.
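A job run from cron has to decide which configuration applies at the moment it runs. The sketch below shows that decision for the day/night example above, assuming working hours of 08:00 to 17:59 on weekdays. The hours, the configuration names "day" and "night", and the `config_for` and `config_name` functions are hypothetical. The rcontrol command line that would then switch configurations is documented on Page 5-20 and is not reproduced here.

```c
#include <assert.h>
#include <time.h>

enum shift_config { CONFIG_DAY, CONFIG_NIGHT };

/* Hypothetical policy: weekdays 08:00-17:59 use the day configuration;
 * evenings, nights and weekends use the night configuration. */
enum shift_config config_for(const struct tm *t)
{
    assert(t != NULL);
    int weekend = (t->tm_wday == 0 || t->tm_wday == 6);   /* Sunday, Saturday */
    int working = (t->tm_hour >= 8 && t->tm_hour < 18);
    return (!weekend && working) ? CONFIG_DAY : CONFIG_NIGHT;
}

/* Configuration name to hand to rcontrol; these names are made up. */
const char *config_name(enum shift_config c)
{
    return c == CONFIG_DAY ? "day" : "night";
}
```

A wrapper would typically call `config_for(localtime(&now))` with `now = time(NULL)` and pass the resulting name on to rcontrol. The wrapper computes the configuration from the current time instead of assuming which boundary was just crossed. As a result, it still picks the right configuration if it runs late.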
3 Parallel Programs Under RMS
3.1 Introduction
RMS provides users with tools for running parallel programs and monitoring their
execution, as described in Chapter 5 (RMS Commands). Users can determine what
resources are available to them and request allocation of the CPUs and memory required
to run their programs. This chapter describes the structure of parallel programs under
RMS and how they are run.
A parallel program consists of a controlling process, prun, and a number of application
processes distributed over one or more nodes. Each process may have multiple threads
running on one or more CPUs. prun can run on any node in the system but it normally
runs in a login partition or on an interactive node.
In a system with SMP nodes, RMS can allocate CPUs so as to use all of the CPUs on the
minimum number of nodes (a block distribution); alternatively, it can allocate a specified
number of CPUs on each node (a cyclic distribution). This flexibility allows users to
choose between the competing benefits of increased CPU count and memory size on each
node (generally good for multithreaded applications) and increased numbers of nodes
(generally best for applications requiring increased total memory size, memory
bandwidth and I/O bandwidth).
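The two layouts can be illustrated by mapping process ranks to nodes. This minimal sketch assumes two placement rules. Under block placement, consecutive ranks fill each node's CPUs before the next node is used. Under cyclic placement, ranks are dealt round-robin across the allocated nodes. These mapping rules and the function names are hypothetical illustrations, not a statement of RMS's exact process layout.

```c
#include <assert.h>

/* Number of nodes a block distribution needs: use every CPU on a node
 * before moving to the next one. */
int block_nodes(int nprocs, int cpus_per_node)
{
    assert(nprocs > 0 && cpus_per_node > 0);
    return (nprocs + cpus_per_node - 1) / cpus_per_node;
}

/* Block placement: consecutive ranks share a node. */
int node_of_rank_block(int rank, int cpus_per_node)
{
    assert(rank >= 0 && cpus_per_node > 0);
    return rank / cpus_per_node;
}

/* Cyclic placement: ranks are dealt round-robin over `nnodes` nodes. */
int node_of_rank_cyclic(int rank, int nnodes)
{
    assert(rank >= 0 && nnodes > 0);
    return rank % nnodes;
}
```

Take eight processes with two per node, as in Figure 3.1. Block placement puts ranks 0 and 1 on the first node. Cyclic placement over four nodes puts ranks 0 and 4 there.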
Parallel programs can be written so that they will run with varying numbers of CPUs
and varying numbers of CPUs per node. They can, for example, query the number of
processors allocated and determine their data distributions and communications
patterns accordingly (see Appendix C (RMS Kernel Module) for details).
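Once a process knows the number of processes and its own rank, a common way to derive a data distribution is to split an array into near-equal contiguous blocks. In this minimal sketch `nprocs` and `rank` are plain parameters. How a real program obtains them from RMS is covered in Appendix C and is not shown. `block_range` is a hypothetical helper.

```c
#include <assert.h>

/* Split n elements as evenly as possible over nprocs processes.
 * Process `rank` owns elements [*first, *first + *count).
 * The first (n % nprocs) ranks each receive one extra element. */
void block_range(long n, int nprocs, int rank, long *first, long *count)
{
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    long base = n / nprocs;
    long extra = n % nprocs;
    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);
}
```

For 10 elements over 4 processes, ranks 0 and 1 receive three elements each and ranks 2 and 3 receive two. No process holds more than one element above the average.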
3.2 Resource Requests
Having logged in to the system, a user makes a request for the resources needed to run a
parallel program by using the RMS commands prun (see Page 5-11) or allocate (see
Page 5-3). When using the prun command, the request can specify details such as the
following:
• The partition on which to run the program (the -p option)
• The number of processes to run (the -n option)
• The number of nodes required (the -N option)
• The number of CPUs required per process (the -c option)
• The memory required per process (the RMS_MEMLIMIT environment variable)
• The distribution of processes over the nodes (the -m, -B and -R options)
• How standard input, output and error streams should be handled (the -i, -o and -e
options)
• The project to which the program belongs for accounting and scheduling purposes
(the -P option)
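A program or wrapper written in C might assemble these options into an argument vector for prun, as in the minimal sketch below. Only flags listed above are used, and each is assumed to take a single value argument following the option. The `struct prun_request` and `struct prun_argv` types, the `build_prun_argv` function and the example values are hypothetical. The memory limit is passed through the RMS_MEMLIMIT environment variable rather than a flag, so it is left out here. The full option syntax is on the prun reference page (Page 5-11).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PRUN_MAX_ARGS 16

/* Hypothetical description of a resource request. */
struct prun_request {
    const char *partition;   /* -p; NULL = use the default partition */
    int nprocs;              /* -n */
    int nnodes;              /* -N; 0 = leave the choice to RMS */
    int cpus_per_proc;       /* -c; 0 = omit */
    const char *project;     /* -P; NULL = omit */
    const char *program;     /* program to run */
};

/* NULL-terminated argument vector plus storage for numeric values.
 * argv points into num[], so do not copy this struct after building it. */
struct prun_argv {
    const char *argv[PRUN_MAX_ARGS];
    char num[3][16];
    int argc;
};

static void push_arg(struct prun_argv *a, const char *s)
{
    assert(a->argc < PRUN_MAX_ARGS - 1);   /* leave room for the NULL */
    a->argv[a->argc++] = s;
    a->argv[a->argc] = NULL;
}

static void push_num(struct prun_argv *a, const char *flag, int slot, int value)
{
    snprintf(a->num[slot], sizeof a->num[slot], "%d", value);
    push_arg(a, flag);
    push_arg(a, a->num[slot]);
}

void build_prun_argv(const struct prun_request *r, struct prun_argv *a)
{
    assert(r->program != NULL);
    a->argc = 0;
    push_arg(a, "prun");
    if (r->partition) { push_arg(a, "-p"); push_arg(a, r->partition); }
    push_num(a, "-n", 0, r->nprocs);
    if (r->nnodes > 0) push_num(a, "-N", 1, r->nnodes);
    if (r->cpus_per_proc > 0) push_num(a, "-c", 2, r->cpus_per_proc);
    if (r->project) { push_arg(a, "-P"); push_arg(a, r->project); }
    push_arg(a, r->program);
}
```

Take a request for eight processes on four nodes in a partition hypothetically named parallel. The vector then holds `prun -p parallel -n 8 -N 4 -c 1 ./a.out`, which can be passed to `execvp("prun", (char *const *)a.argv)`.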
Two variants of a program with eight processes are shown in Figure 3.1: first, with one
process per node; and then, with two processes per node.
Figure 3.1: Distribution of Parallel Processes (processes 0 to 7 placed one per node, and two per node with pairs 0-1, 2-3, 4-5 and 6-7 sharing a node)