The software described in this document is furnished under a license agreement. The software may be used
or copied only under the terms of the license agreement. No part of this manual may be photocopied or
reproduced in any form without prior written consent from The MathWorks, Inc.
FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation
by, for, or through the federal government of the United States. By accepting delivery of the Program
or Documentation, the government hereby agrees that this software or documentation qualifies as
commercial computer software or commercial computer software documentation as such terms are used
or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and
conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and govern
the use, modification, reproduction, release, performance, display, and disclosure of the Program and
Documentation by the federal government (or other entity acquiring for or through the federal government)
and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the
government’s needs or is inconsistent in any respect with federal procurement law, the government agrees
to return the Program and Documentation, unused, to The MathWorks, Inc.
Trademarks
MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See
www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand
names may be trademarks or registered trademarks of their respective holders.
Patents
The MathWorks products are protected by one or more U.S. patents. Please see
www.mathworks.com/patents for more information.
Distributed Computing Server™ System Administrator’s Guide
Revision History
November 2005     Online only    New for Version 2.0 (Release 14SP3+)
December 2005     Online only    Revised for Version 2.0 (Release 14SP3+)
March 2006        Online only    Revised for Version 2.0.1 (Release 2006a)
September 2006    Online only    Revised for Version 3.0 (Release 2006b)
March 2007        Online only    Revised for Version 3.1 (Release 2007a)
September 2007    Online only    Revised for Version 3.2 (Release 2007b)
March 2008        Online only    Revised for Version 3.3 (Release 2008a)
October 2008      Online only    Revised for Version 4.0 (Release 2008b)
March 2009        Online only    Revised for Version 4.1 (Release 2009a)
September 2009    Online only    Revised for Version 4.2 (Release 2009b)
March 2010        Online only    Revised for Version 4.3 (Release 2010a)
Starting Admin Center .............................. 3-2
Setting Up Resources ............................... 3-3
Adding Hosts ....................................... 3-3
Starting a Job Manager ............................. 3-4
Starting Workers ................................... 3-5
Stopping, Destroying, Resuming, Restarting Processes 3-7
Moving a Worker .................................... 3-8
Updating the Display ............................... 3-8
Testing Connectivity ............................... 3-9
Saving and Loading Sessions ........................ 3-13
Preparing for User Configurations .................. 3-14
4 Control Script Reference
mdce Process Control ............................... 4-2
Job Manager Control ................................ 4-2
Worker Control ..................................... 4-2
5 Control Scripts — Alphabetical List
Glossary
Index
Introduction
1
This chapter provides an introduction to the concepts and terms of Parallel
Computing Toolbox™ software and MATLAB® Distributed Computing
Server™ software.
• “Product Overview” on page 1-2
• “Toolbox and Server Components” on page 1-4
• “Using Parallel Computing Toolbox Software” on page 1-8
Product Overview
In this section...
“Overview” on page 1-2
“Determining Product Installation and Versions” on page 1-3
Overview
Parallel Computing Toolbox and MATLAB Distributed Computing Server
software let you solve computationally and data-intensive problems using
MATLAB® and Simulink® on multicore and multiprocessor computers.
Parallel processing constructs such as parallel for-loops and code blocks,
distributed arrays, parallel numerical algorithms, and message-passing
functions let you implement task-parallel and data-parallel algorithms at
a high level in MATLAB without programming for specific hardware and
network architectures.
A job is some large operation that you need to perform in your MATLAB
session. A job is broken down into segments called tasks. You decide how best
to divide your job into tasks. You could divide your job into identical tasks,
but tasks do not have to be identical.
The MATLAB session in which the job and its tasks are defined is called the
client session. Often, this is on the machine where you program MATLAB.
The client uses Parallel Computing Toolbox software to perform the definition
of jobs and tasks. The MATLAB Distributed Computing Server product
performs the execution of your job by evaluating each of its tasks and
returning the result to your client session.
The job manager is the part of the server software that coordinates the
execution of jobs and the evaluation of their tasks. The job manager
distributes the tasks for evaluation to the server’s individual MATLAB
sessions called workers. Use of the MathWorks™ job manager is optional;
the distribution of tasks to workers can also be performed by a third-party
scheduler, such as Windows HPC Server (including CCS), a Platform LSF
scheduler, or a PBS Pro® scheduler.
See the “Glossary” on page Glossary-1 for definitions of the parallel computing
terms used in this manual.
[Figure: Basic Parallel Computing Configuration (labels: MATLAB Client with Parallel Computing Toolbox; Scheduler or Job Manager; three MATLAB Workers, each with MATLAB Distributed Computing Server)]
Determining Product Installation and Versions
To determine if Parallel Computing Toolbox software is installed on your
system, type this command at the MATLAB prompt:
ver
When you enter this command, MATLAB displays information about the
version of MATLAB you are running, including a list of all toolboxes installed
on your system and their version numbers.
You can run the ver command as part of a task in a distributed or parallel
application to determine what version of MATLAB Distributed Computing
Server software is installed on a worker machine. Note that the toolbox and
server software must be the same version.
Toolbox and Server Components
In this section...
“Job Managers, Workers, and Clients” on page 1-4
“Third-Party Schedulers” on page 1-6
“Components on Mixed Platforms or Heterogeneous Clusters” on page 1-7
“mdce Service” on page 1-7
Job Managers, Workers, and Clients
The optional job manager can run on any machine on the network. The job
manager runs jobs in the order in which they are submitted, unless any jobs
in its queue are promoted, demoted, canceled, or destroyed.
Each worker receives a task of the running job from the job manager, executes
the task, returns the result to the job manager, and then receives another
task. When all tasks for a running job have been assigned to workers, the job
manager starts running the next job with the next available worker.
A MATLAB Distributed Computing Server network configuration usually
includes many workers that can all execute tasks simultaneously, speeding
up execution of large MATLAB jobs. It is generally not important which
worker executes a specific task. Each worker evaluates tasks one at a time,
returning the results to the job manager. The job manager then returns the
results of all the tasks in the job to the client session.
Note For testing your application locally or other purposes, you can configure
a single computer as client, worker, and job manager. You can also have more
than one worker session or more than one job manager session on a machine.
[Figure: Interactions of Parallel Computing Sessions (labels: Client, Job, All Results, Scheduler or Job Manager, Task, Results, Worker)]
A large network might include several job managers as well as several
client sessions. Any client session can create, run, and access jobs on any
job manager, but a worker session is registered with and dedicated to only
one job manager at a time. The following figure shows a configuration with
multiple job managers.
[Figure: Configuration with Multiple Clients and Job Managers (labels: Client, Scheduler or Job Manager, Worker)]
Third-Party Schedulers
As an alternative to using the MathWorks job manager, you can use a
third-party scheduler. This could be a Microsoft® Windows HPC Server
(including CCS), Platform LSF scheduler, PBS Pro scheduler, TORQUE
scheduler, mpiexec, or a generic scheduler.
Choosing Between a Scheduler and Job Manager
You should consider the following when deciding to use a scheduler or the
MathWorks job manager for distributing your tasks:
• Does your cluster already have a scheduler?
If you already have a scheduler, you may be required to use it as a means
of controlling access to the cluster. Your existing scheduler might be just
as easy to use as a job manager, so there might be no need for the extra
administration involved.
• Is the handling of parallel computing jobs the only cluster scheduling
management you need?
The MathWorks job manager is designed specifically for MathWorks
parallel computing applications. If other scheduling tasks are not needed, a
third-party scheduler might not offer any advantages.
• Is there a file sharing configuration on your cluster already?
The MathWorks job manager can handle all file and data sharing
necessary for your parallel computing applications. This might be helpful
in configurations where shared access is limited.
• Are you interested in batch or interactive processing?
When you use a job manager, worker processes usually remain running at
all times, dedicated to their job manager. With a third-party scheduler,
workers are run as applications that are started for the evaluation of tasks,
and stopped when their tasks are complete. If tasks are small or take little
time, starting a worker for each one might involve too much overhead time.
• Are there security concerns?
Your scheduler may be configured to accommodate your particular security
requirements.
• How many nodes are on your cluster?
If you have a large cluster, you probably already have a scheduler. Consult
your MathWorks representative if you have questions about cluster size
and the job manager.
• Who administers your cluster?
The person administering your cluster might have a preference for how
jobs are scheduled.
Components on Mixed Platforms or Heterogeneous Clusters
Parallel Computing Toolbox software and MATLAB Distributed Computing
Server software are supported on Windows, UNIX, and Macintosh
operating systems. Mixed platforms are supported, so that the
clients, job managers, and workers do not have to be on the same platform.
The cluster can also be composed of both 32-bit and 64-bit machines, so long
as your data does not exceed the limitations posed by the 32-bit systems.
For a complete listing of all network requirements, including those for
heterogeneous environments, see the System Requirements page for MATLAB
Distributed Computing Server software on the MathWorks Web site.
In a mixed platform environment, be sure to follow the proper installation
instructions for each local machine on which you are installing the software.
mdce Service
If you are using the MathWorks job manager, every machine that hosts a
worker or job manager session must also run the mdce service.
The mdce service recovers worker and job manager sessions when their
host machines crash. If a worker or job manager machine crashes, when
mdce starts up again (usually configured to start at machine boot time), it
automatically restarts the job manager and worker sessions to resume their
sessions from before the system crash.
Using Parallel Computing Toolbox Software
A typical Parallel Computing Toolbox client session includes the following
steps:
1 Find a Job Manager (or scheduler) — Your network may have one or more
job managers available (but usually only one scheduler). The function you
use to find a job manager or scheduler creates an object in your current
MATLAB session to represent the job manager or scheduler that will run
your job.
2 Create a Job — You create a job to hold a collection of tasks. The job exists
on the job manager (or scheduler’s data location), but a job object in the
local MATLAB session represents that job.
3 Create Tasks — You create tasks to add to the job. Each task of a job can
be represented by a task object in your local MATLAB session.
4 Submit a Job to the Job Queue for Execution — When your job has all its
tasks defined, you submit it to the queue in the job manager or scheduler.
The job manager or scheduler distributes your job’s tasks to the worker
sessions for evaluation. When the workers have completed all of the
job’s tasks, the job moves to the finished state.
5 Retrieve the Job’s Results — The resulting data from the evaluation of the
job is available as a property value of each task object.
6 Destroy the Job — When the job is complete and all its results are gathered,
you can destroy the job to free memory resources.
Network Administration
2
This chapter provides information useful for network administration of
Parallel Computing Toolbox software and MATLAB Distributed Computing
Server software.
• “Preparing for Parallel Computing” on page 2-2
• “Installing and Configuring” on page 2-5
• “Using a Different MPI Build on UNIX Operating Systems” on page 2-6
• “Shutting Down a Job Manager Configuration” on page 2-9
• “Customizing Server Services” on page 2-13
• “Accessing Service Record Files” on page 2-17
• “Troubleshooting” on page 2-19
Preparing for Parallel Computing
In this section...
“Before You Start” on page 2-2
“Planning Your Network Layout” on page 2-2
“Network Requirements” on page 2-3
“Fully Qualified Domain Names” on page 2-3
“Security Considerations” on page 2-4
This section discusses the requirements and configurations for your network
to support parallel computing.
Before You Start
Before attempting to install Parallel Computing Toolbox software and
MATLAB Distributed Computing Server software, read Chapter 1,
“Introduction” to familiarize yourself with the concepts and vocabulary of
the products.
Planning Your Network Layout
Generally, it is easy to decide which machines will run worker processes and
which will run client processes. Worker sessions usually run on the cluster of
machines dedicated to that purpose. The MATLAB client session usually runs
where MATLAB programs are run, often on a user’s desktop.
The job manager process should run on a stable machine, with adequate
resources to manage the number of tasks and amount of data expected in
your parallel computing applications.
The following table shows what products and processes are needed for each of
these roles in the parallel computing configuration.
Session        Product                                Processes
Client         Parallel Computing Toolbox             MATLAB with toolbox
Worker         MATLAB Distributed Computing Server    worker; mdce service (if using a job manager)
Job manager    MATLAB Distributed Computing Server    mdce service; job manager
The server software includes the mdce service or daemon. The mdce service
is separate from the worker and job manager processes, and it must be
running on all machines that run job manager sessions or workers that are
registered with a job manager. (The mdce service is not used with third-party
schedulers.)
You can install both toolbox and server software on the same machine, so that
one machine can run both client and server sessions.
Network Requirements
To view the network requirements for MATLAB Distributed Computing
Server software, visit the product requirements page on the MathWorks
Web site.
Fully Qualified Domain Names
MATLAB Distributed Computing Server software and Parallel Computing
Toolbox software support both short hostnames and fully qualified domain
names. The default usage is short hostnames. If your network requires fully
qualified hostnames, you can use the mdce_def file to identify the worker
nodes by their full names. See “Customizing Server Services” on page 2-13.
To set the hostname used for a MATLAB client session, see the pctconfig
reference page.
Security Considerations
The parallel computing products do not provide any security measures.
Therefore, be aware of the following security considerations:
• MATLAB workers run as whatever user the administrator starts the node’s
mdce service under. By default, the mdce service starts as root on UNIX
operating systems, and as LocalSystem on Microsoft Windows operating
systems. Because MATLAB provides system calls, users can submit jobs
that execute shell commands.
• The mdce service does not enforce any access control or authentication.
Anyone with local or remote access to the mdce services can start and stop
their workers and job managers, and query for their status.
• The job manager does not restrict access to the cluster, nor to job and task
data. Using a third-party scheduler instead of the MathWorks job manager
could allow you to take advantage of the security measures it provides.
• The parallel computing processes must all be on the same side of a firewall,
or you must take measures to enable them to communicate with each
other through the firewall. Workers running tasks of the same parallel
job cannot be firewalled off from each other, because their MPI-based
communication will not work.
• If certain ports are restricted, you can specify the ports used for parallel
computing. See “Defining the Script Defaults” on page 2-13.
• If your network supports multicast, the parallel computing processes
accommodate multicast. However, because multicast is disabled on many
networks for security reasons, you might require unicast communication
between parallel computing processes. Most examples of parallel
computing scripts and functions in this documentation show unicast usage.
• If your organization is a member of the Internet Multicast Backbone
(MBone), make sure that your parallel computing cluster is isolated from
MBone access if you are using multicast for parallel computing. This is
generally the default condition. If you have any questions about MBone
membership, contact your network administrator.
Installing and Configuring
To find the most up-to-date instructions for installing and configuring
the current or past versions of the parallel computing products, visit the
MathWorks Web site.
Using a Different MPI Build on UNIX Operating Systems
In this section...
“Building MPI” on page 2-6
“Using Your MPI Build” on page 2-6
Building MPI
This stage outlines the steps for creating an MPI build that differs from
the one provided with Parallel Computing Toolbox.
If you already have an alternative MPI build, proceed to “Using Your MPI
Build” on page 2-6.
1 Unpack the MPI sources into the target file system on your machine. For
example, suppose you have downloaded mpich2-distro.tgz and want
to unpack it into /opt for building:
# cd /opt
# mkdir mpich2 && cd mpich2
# tar zxvf path/to/mpich2-distro.tgz
# cd mpich2-1.0.8
2 Build your MPI using the enable-sharedlibs option (this is vital, as you
must build a shared library MPI, binary compatible with MPICH2-1.0.8
for R2009b and later). For example, the following commands build an MPI
with the nemesis channel device and the gforker launcher.
# ./configure -prefix=/opt/mpich2/mpich2-1.0.8 \
--enable-sharedlibs=gcc \
--with-device=ch3:nemesis \
--with-pm=gforker 2>&1 | tee log
# make 2>&1 | tee -a log
# make install 2>&1 | tee -a log
Using Your MPI Build
When your MPI build is ready, follow these steps to get the Parallel
Computing Toolbox mpiexec scheduler working with it. Most of these steps
are also needed if you want to use a different MPI build with third-party
schedulers (LSF, generic).
1 Test your build by running the mpiexec executable. The build should be
ready to test if its
bin/mpiexec and lib/libmpich.so are available in the
MPI installation location.
Following the example in “Building MPI” on page
2-6,
/opt/mpich2/mpich2-1.0.8/bin/mpiexec and
/opt/mpich2/mpich2-1.0.8/lib/libmpich.so are ready to use, so you
for an example of Sun Grid Engine, look in the folder sge for
sgeParallelWrapper.sh. Adopt and modify the appropriate script
for your particular cluster usage.
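The readiness check in step 1 of “Using Your MPI Build” can be sketched as a small shell function. This is only an illustration: the function name check_mpi_build is hypothetical, and the only facts taken from this guide are the two file names (bin/mpiexec and lib/libmpich.so) and the example installation prefix.

```shell
#!/bin/sh
# Sketch: confirm that an MPI installation prefix contains the two files
# this guide names as the sign of a build that is ready to test.
# check_mpi_build is a hypothetical helper name, not part of the product.
check_mpi_build() {
    prefix=$1
    status=0
    if [ ! -x "$prefix/bin/mpiexec" ]; then
        echo "missing or not executable: $prefix/bin/mpiexec"
        status=1
    fi
    if [ ! -e "$prefix/lib/libmpich.so" ]; then
        echo "missing: $prefix/lib/libmpich.so"
        status=1
    fi
    return "$status"
}

# Example prefix taken from the "Building MPI" steps.
check_mpi_build /opt/mpich2/mpich2-1.0.8 || echo "MPI build is not ready to test"
```

The function only inspects the file system; it does not run mpiexec, so it is safe to call on a machine where the build is incomplete.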
Shutting Down a Job Manager Configuration
In this section...
“UNIX and Macintosh Operating Systems” on page 2-9
“Microsoft Windows Operating Systems” on page 2-11
If you are done using the job manager and its workers, you might want to shut
down the server software processes so that they are not consuming network
resources. You do not need to be at the computer running the processes that
you are shutting down. You can run these commands from any machine with
network access to the processes. The following sections explain shutting down
the processes for different platforms.
UNIX and Macintosh Operating Systems
Enter the commands of this section at the prompt in a UNIX shell.
Stopping the Job Manager and Workers
1 To shut down the job manager, enter the commands
cd matlabroot/toolbox/distcomp/bin
(Enter the following command on a single line.)
stopjobmanager -remotehost <job manager hostname> -name <MyJobManager> -v
If you have more than one job manager running, stop each of them
individually by host and name.
For a list of all options to the script, type
stopjobmanager -help
2 For each MATLAB worker you want to shut down, enter the commands
cd matlabroot/toolbox/distcomp/bin
stopworker -remotehost <worker hostname> -v
If you have more than one worker session running, you can stop each of
them individually by host and name.
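When a cluster has many worker hosts, repeating the stopworker command by hand becomes tedious. The following is a minimal sketch of a loop over hosts, assuming one worker per host. The function name stop_workers, the DRY_RUN variable, and the host names are hypothetical; only the stopworker command with its -remotehost and -v flags comes from this guide. By default the sketch prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch: print (or run) one stopworker command per worker host.
# stop_workers and DRY_RUN are hypothetical names for illustration;
# the stopworker flags are the ones shown in the steps above.
DRY_RUN=${DRY_RUN:-1}

stop_workers() {
    for host in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            # Dry run: show the command that would be issued.
            echo "stopworker -remotehost $host -v"
        else
            stopworker -remotehost "$host" -v
        fi
    done
}

# Placeholder host names; run from matlabroot/toolbox/distcomp/bin.
stop_workers node01 node02
```

Set DRY_RUN=0 only after reviewing the printed commands, since stopping a worker interrupts any task it is evaluating.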
Normally, you configure the mdce daemon to start at system boot time
and continue running until the machine shuts down. However, if you plan
to uninstall the MATLAB Distributed Computing Server product from a
machine, you might want to uninstall the mdce daemon also, because you
no longer need it.
Note You must have root privileges to stop or uninstall the mdce daemon.
1 Use the following command to stop the mdce daemon:
/etc/init.d/mdce stop
2 Remove the installed link to prevent the daemon from starting up again
at system reboot:
cd /etc/init.d/
rm mdce
Stopping the Daemon Manually. If you used the alternative manual
startup of the mdce daemon, use the following commands to stop it manually:
cd matlabroot/toolbox/distcomp/bin
mdce stop
Microsoft Windows Operating Systems
Stopping the Job Manager and Workers
Enter the commands of this section at the prompt in a DOS command window.
1 To shut down the job manager, enter the commands
cd matlabroot\toolbox\distcomp\bin
(Enter the following command on a single line.)
stopjobmanager -remotehost <job manager hostname> -name <MyJobManager> -v
If you have more than one job manager running, stop each of them
individually by host and name.
For a list of all options to the script, type
stopjobmanager -help
2 For each MATLAB worker you want to shut down, enter the commands
Normally, you configure the mdce service to start at system boot time and
continue running until the machine shuts down. If you need to stop the mdce
service while leaving the machine on, enter the following commands at a
DOS command prompt:
cd matlabroot\toolbox\distcomp\bin
mdce stop
If you plan to uninstall the MATLAB Distributed Computing Server product
from a machine, you might want to uninstall the mdce service also, because
you no longer need it.
You do not need to stop the service before uninstalling it.
To uninstall the mdce service, enter the following commands at a DOS
command prompt:
cd matlabroot\toolbox\distcomp\bin
mdce uninstall
Customizing Server Services
In this section...
“Defining the Script Defaults” on page 2-13
“Overriding the Script Defaults” on page 2-15
The MATLAB Distributed Computing Server scripts run using several default
parameters. You can customize the scripts, as described in this section.
Defining the Script Defaults
The scripts for the server services require values for several parameters.
These parameters set the process name, the user name, log file location, ports,
etc. Some of these can be set using flags on the command lines, but the full
set of user-configurable parameters is in the mdce_def file.
Note The startup script flags take precedence over the settings in the
mdce_def file.
The default parameters used by the server service scripts are defined in the
file:
• matlabroot\toolbox\distcomp\bin\mdce_def.bat (on Microsoft
Windows operating systems)
• matlabroot/toolbox/distcomp/bin/mdce_def.sh (on UNIX or Macintosh
operating systems)
To set the default parameters, edit this file before installing or starting the
mdce service.
The mdce_def file is self-documented, and includes explanations of all its
parameters.
Note If you want to run more than one job manager on the same machine,
they must all have unique names. Specify the names using flags with the
startup commands.
Setting the User
By default, the job manager and worker services run as the user who starts
them. You can run the services as a different user with the following settings
in the mdce_def file.
Parameter    Description
MDCEUSER     Set this parameter to run the mdce services as a user
             different from the user who starts the service. On a
             UNIX operating system, set the value before starting
             the service; on a Windows operating system, set it
             before installing the service.
MDCEPASS     On a Windows operating system, set this parameter
             to specify the password for the user identified in the
             MDCEUSER parameter; otherwise, the system prompts
             you for the password when the service is installed.
On UNIX operating systems, MDCEUSER requires that the current machine
has the sudo utility installed, and that the current user be allowed to use
sudo to execute commands as the user identified by MDCEUSER. For further
information, refer to your system documentation on the sudo and sudoers
utilities (for example, man sudo and man sudoers).
On Windows operating systems, when executing the mdce start script,
the user defined by MDCEUSER must be listed among those who can log
on as a service. To see the list of valid users, select the Windows menu
Start > Settings > Control Panel. Double-click Administrative Tools,
then Local Security Policy. In the tree, select User Rights Assignment,
then in the right pane, double-click Log on as a service. This dialog box
must list the user defined for MDCEUSER in your mdce_def.bat file. If not,
you can add the user to this dialog box according to the instructions in
the mdce_def.bat file, or when running mdce start, you can use another
mdce_def.bat file that specifies a listed user.
Overriding the Script Defaults
Specifying an Alternative Defaults File
The default parameters used by the mdce service, job managers, and workers
are defined in the file:
• matlabroot\toolbox\distcomp\bin\mdce_def.bat (on Windows
operating systems)
• matlabroot/toolbox/distcomp/bin/mdce_def.sh (on UNIX or Macintosh
operating systems)
Before installing and starting the mdce service, you can edit this file to set
the default parameters with values you require.
Alternatively, you can make a copy of this file, modify the copy, and specify
that this copy be used for the default parameters.
On UNIX or Macintosh operating systems, enter the command
mdce start -mdcedef my_mdce_def.sh
On Windows operating systems, enter the command
mdce install -mdcedef my_mdce_def.bat
mdce start -mdcedef my_mdce_def.bat
If you specify a new mdce_def file instead of the default file for the service on
one computer, the new file is not automatically used by the mdce service on
other computers. If you want to use the same alternative file for all your mdce
services, you must specify it for each mdce service you install or start.
For more information, see “Defining the Script Defaults” on page 2-13.
Note The startup script flags take precedence over the settings in the
mdce_def file.
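Because an alternative defaults file must be named for every mdce service individually, it can help to generate the start command for each host from one place. The sketch below only prints the commands. The function name mdce_start_cmd, the MATLABROOT value, the host names, the shared file path, and the use of ssh to reach each host are all assumptions for illustration; only mdce start and the -mdcedef flag come from this guide.

```shell
#!/bin/sh
# Sketch: build the per-host command that starts mdce on UNIX with a
# shared alternative defaults file. Nothing is executed; the commands
# are printed for review.
# MATLABROOT, the host names, the file path, and ssh are assumptions.
MATLABROOT=${MATLABROOT:-/usr/local/matlab}

mdce_start_cmd() {
    host=$1
    defs=$2
    echo "ssh $host $MATLABROOT/toolbox/distcomp/bin/mdce start -mdcedef $defs"
}

for h in node01 node02; do
    mdce_start_cmd "$h" /shared/my_mdce_def.sh
done
```

Keeping the defaults file on storage that every host can read is one way to make sure all services pick up the same settings; a per-host copy works equally well as long as each start command names it.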
Starting in a Clean State
When a job manager or worker starts up, it normally resumes its session from
the past. This way, a job queue is not destroyed or lost if the job manager
machine crashes or if the job manager is inadvertently shut down. To start up
a job manager or worker from a clean state, with all history deleted, use the
Accessing Service Record Files
The MATLAB Distributed Computing Server services generate various record
files in the normal course of their operations. The mdce service, job manager,
and worker sessions all generate such files. This section describes the types of
information stored by the services.
Locating Log Files
Log files for each service contain entries for the service’s operations. These
might be of particular interest to the network administrator in cases when
problems arise.
Operating System      File Location
Windows               The default location of the log files is
                      <TEMP>\MDCE\Log, where <TEMP> is the value
                      of the system TEMP variable. For example, if
                      TEMP is set to C:\TEMP, the log files are placed
                      in C:\TEMP\MDCE\Log.
                      You can set alternative locations for the log
                      files by modifying the LOGBASE setting in the
                      mdce_def.bat file before starting the mdce
                      service.
UNIX and Macintosh    The default location of the log files is
                      /var/log/mdce/.
                      You can set alternative locations for the log
                      files by modifying the LOGBASE setting in the
                      mdce_def.sh file before starting the mdce
                      service.
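When a problem arises, the most recently written log files are usually the first place to look. The following sketch lists them for a UNIX or Macintosh host. The helper name recent_logs is hypothetical, and the directory is the default given in the table above; substitute your own path if LOGBASE was changed.

```shell
#!/bin/sh
# Sketch: show the most recently modified files in the mdce log directory.
# recent_logs is a hypothetical helper; /var/log/mdce is the UNIX default
# named in this guide and must be replaced if LOGBASE was changed.
recent_logs() {
    dir=${1:-/var/log/mdce}
    count=${2:-5}
    if [ ! -d "$dir" ]; then
        echo "no log directory at $dir" >&2
        return 1
    fi
    # ls -t sorts by modification time, newest first.
    ls -t "$dir" | head -n "$count"
}

recent_logs /var/log/mdce 5 || echo "check the LOGBASE setting in mdce_def.sh"
```

The function prints file names only, so its output can be passed to other tools such as tail for closer inspection.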
Locating Checkpoint Directories
Checkpoint directories contain information related to persistence data, which
the server services use to create continuity from one instance of a session to
another. For example, if you stop and restart a job manager, the new session
continues the old session, using all the same data.
A primary feature offered by the checkpoint directories is in crash recovery.
This allows server services to automatically resume their sessions after a
system goes down and comes back up, minimizing the loss of data. However,
if a MATLAB worker goes down during the evaluation of a task, that task
is neither reevaluated nor reassigned to another worker. In this case, a
finished job may not have a complete set of output data, because data from
any unfinished tasks might be missing.
Note If a job manager crashes and restarts, its workers can take up to 2
minutes to reregister with it.
Platform / File Location
Windows
The default location of the checkpoint directories is
<TEMP>\MDCE\Checkpoint, where <TEMP> is the value of the system
TEMP variable. For example, if TEMP is set to C:\TEMP, the checkpoint
directories are placed in C:\TEMP\MDCE\Checkpoint.
You can set alternative locations for the checkpoint directories by
modifying the CHECKPOINTBASE setting in the mdce_def.bat file before
starting the mdce service.
UNIX and Macintosh
The checkpoint directories are placed by default in /var/lib/mdce/.
You can set alternative locations for the checkpoint directories by
modifying the CHECKPOINTBASE setting in the mdce_def.sh file before
starting the mdce service.
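As a quick illustration of the default locations described above, the following Python sketch derives the log and checkpoint directories for each platform. The function name default_mdce_dirs and the platform labels "windows" and "unix" are hypothetical and not part of the product. The sketch covers only the defaults and ignores any LOGBASE or CHECKPOINTBASE override in the mdce_def file.

```python
import ntpath


def default_mdce_dirs(platform: str, temp: str = r"C:\TEMP") -> dict:
    """Return the default log and checkpoint directories for a platform.

    `platform` is a hypothetical label ("windows" or "unix"); `temp` stands
    in for the value of the system TEMP variable on Windows.
    """
    if platform == "windows":
        # <TEMP>\MDCE\Log and <TEMP>\MDCE\Checkpoint
        base = ntpath.join(temp, "MDCE")
        return {
            "log": ntpath.join(base, "Log"),
            "checkpoint": ntpath.join(base, "Checkpoint"),
        }
    if platform == "unix":
        # Fixed default locations on UNIX and Macintosh
        return {"log": "/var/log/mdce/", "checkpoint": "/var/lib/mdce/"}
    raise ValueError(f"unknown platform label: {platform!r}")


if __name__ == "__main__":
    print(default_mdce_dirs("windows"))
    print(default_mdce_dirs("unix"))
```

Passing a different temp value shows where the Windows directories would move if the TEMP variable changed.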
Troubleshooting
In this section...
“License Errors” on page 2-19
“Memory Errors on UNIX Operating Systems” on page 2-21
“Running Server Processes from a Windows Network Installation” on page 2-21
“Required Ports” on page 2-21
“Ephemeral TCP Ports with Job Manager” on page 2-23
“Host Communications Problems” on page 2-23
“Verifying Multicast Communications” on page 2-25
This section offers advice on solving problems you might encounter with
MATLAB Distributed Computing Server software.
License Errors
When starting a MATLAB worker, a licensing problem might result in the
message
License checkout failed. No such FEATURE exists.
License Manager Error -5
There are many reasons why you might receive this error:
• This message usually indicates that you are trying to use a product for
which you are not licensed. Look at your license.dat file located within
your MATLAB installation to see if you are licensed to use this product.
• If you are licensed for this product, this error may be the result of having
extra carriage returns or tabs in your license file. To avoid this, ensure that
each line begins with either #, SERVER, DAEMON, or INCREMENT.
After fixing your license.dat file, restart your license manager and
MATLAB should work properly.
• This error may also be the result of an incorrect system date. If your system
date is before the date that your license was made, you will get this error.
• If you receive this error when starting a worker with MATLAB Distributed
Computing Server software:
- You may be calling the startworker command from an installation that
does not have access to a worker license. For example, starting a worker
from a client installation of the Parallel Computing Toolbox product
causes the following error:
The mdce service on the host hostname
returned the following error:
Most likely, the MATLAB worker failed to start due to a
licensing problem, or MATLAB crashed during startup. Check
the worker log file
/tmp/mdce_user/node_node_worker_05-11-01_16-52-03_953.log
for more detailed information. The mdce log file
/tmp/mdce_user/mdce-service.log
may also contain some additional information.
In the worker log files, you see the following information:
License checkout failed.
License Manager Error -15
MATLAB is unable to connect to the license server.
Check that the license manager has been started, and that the
MATLAB client machine can communicate with the license server.
Troubleshoot this issue by visiting:
http://www.mathworks.com/support/lme/R2009a/15
Diagnostic Information:
Feature: MATLAB_Distrib_Comp_Engine
License path: /apps/matlab/etc/license.dat
FLEXnet Licensing error: -15,570. System Error: 115
- If you installed only the Parallel Computing Toolbox product, and you
are attempting to run a worker on the same machine, you will receive
this error because the MATLAB Distributed Computing Server product
is not installed, and therefore the worker cannot obtain a license.
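The line-prefix rule for license.dat given earlier in this section (each line begins with #, SERVER, DAEMON, or INCREMENT) can be checked mechanically. The Python sketch below is one possible way to do that. The function name find_suspect_lines and the sample license text are hypothetical, and the sketch applies the rule literally, so it may flag lines that a particular license manager would in fact accept.

```python
ALLOWED_PREFIXES = ("#", "SERVER", "DAEMON", "INCREMENT")


def find_suspect_lines(license_text: str) -> list:
    """Return (line_number, line) pairs that break the prefix rule.

    A line is reported when it does not start with one of the allowed
    prefixes, which includes blank lines left by stray carriage returns
    and lines that start with a tab.
    """
    suspects = []
    for number, line in enumerate(license_text.splitlines(), start=1):
        if not line.startswith(ALLOWED_PREFIXES):
            suspects.append((number, line))
    return suspects


if __name__ == "__main__":
    # Hypothetical license content; line 3 starts with a tab.
    sample = (
        "# example license file\n"
        "SERVER host1 0011AABBCCDD 27000\n"
        "\tDAEMON MLM /path/to/MLM\n"
        "INCREMENT SOME_FEATURE MLM 1 ...\n"
    )
    for number, line in find_suspect_lines(sample):
        print(f"line {number}: {line!r}")
```

After correcting any reported lines, restart the license manager as described above.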
Memory Errors on UNIX Operating Systems
If the number of threads created by the server services on a machine running
a UNIX operating system exceeds the limitation set by the maxproc value, the
services fail and generate an out-of-memory error. Check your maxproc value
on a UNIX operating system with the limit command. (Different versions of
UNIX software might have different names for this property.)
Running Server Processes from a Windows Network Installation
Many networks are configured not to allow LocalSystem to have access to
UNC or mapped network shares. In this case, run the mdce process under
a different user with rights to log on as a service. See “Setting the User”
on page 2-14.
Required Ports
Using a Job Manager
BASE_PORT. The mdce_def file specifies and describes the ports required
by the job manager and all workers. See the following file in the MATLAB
installation used for each cluster process:
• matlabroot/toolbox/distcomp/bin/mdce_def.sh (on UNIX operating
systems)
• matlabroot\toolbox\distcomp\bin\mdce_def.bat (on Windows
operating systems)
Parallel Jobs. On worker machines running a UNIX operating system, the
ports required by MPICH for the running of parallel jobs range from
BASE_PORT + 1000 to BASE_PORT + 2000.
Using a Third-Party Scheduler
Before the worker processes start, you can control the range of ports used
by the workers for parallel jobs by defining the environment variable
MPICH_PORT_RANGE with the value minport:maxport.
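To make the port arithmetic concrete, the following Python sketch computes the MPICH port bounds from a base port and formats a minport:maxport value of the kind expected by MPICH_PORT_RANGE. The function names and the base port value 27350 are illustrative assumptions; use the BASE_PORT value from your own mdce_def file.

```python
def mpich_port_bounds(base_port: int) -> tuple:
    """Return (lowest, highest) port, i.e. BASE_PORT + 1000 and BASE_PORT + 2000."""
    return base_port + 1000, base_port + 2000


def mpich_port_range_value(minport: int, maxport: int) -> str:
    """Format a minport:maxport string for the MPICH_PORT_RANGE variable."""
    if not (0 < minport <= maxport <= 65535):
        raise ValueError("ports must satisfy 0 < minport <= maxport <= 65535")
    return f"{minport}:{maxport}"


if __name__ == "__main__":
    base_port = 27350  # illustrative value only
    low, high = mpich_port_bounds(base_port)
    print(f"MPICH ports with a job manager: {low} to {high}")
    print("MPICH_PORT_RANGE=" + mpich_port_range_value(low, high))
```

The same bounds are reused for MPICH_PORT_RANGE here only for illustration; with a third-party scheduler you can choose any range that your firewall allows.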
Client Ports
With the pctconfig function, you specify the ports used by the client. If
the default ports cannot be used, this function allows you to configure ports
separately for communication with the job manager and communication with
pmode or a MATLAB pool.
Ephemeral TCP Ports with Job Manager
If you use the job manager on a cluster of nodes running Windows operating
systems, you must make sure that a large number of ephemeral TCP ports
are available on the job manager machine. By default, the maximum valid
ephemeral TCP port number on a Windows operating system is 5000, but
transfers of large data sets might fail if this setting is not increased. In
particular, if your cluster has 32 or more workers, you should increase the
maximum valid ephemeral TCP port number using the following procedure:
1 Start the Registry Editor.
2 Locate the following subkey in the registry, and click Parameters:
3 On the Registry Editor window, select Edit > New > DWORD Value.
4 In the list of entries on the right, change the new value name to
MaxUserPort and press Enter.
5 Right-click on the MaxUserPort entry name and select Modify.
6 In the Edit DWORD Value dialog, enter 65534 in the Value data field.
Select Decimal for the Base value. Click OK.
This parameter controls the maximum port number that is used when
a program requests any available user port from the system. Typically,
ephemeral (short-lived) ports are allocated between the values of 1024 and
5000 inclusive. This action allows allocation for port numbers up to 65534.
7 Quit the Registry Editor.
8 Reboot your machine.
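The effect of the MaxUserPort change can be expressed as simple arithmetic. The Python sketch below counts the ephemeral ports available for a given maximum port number, assuming (as stated in the procedure) that allocation starts at port 1024 and the bounds are inclusive. The function name ephemeral_port_count is hypothetical.

```python
def ephemeral_port_count(max_user_port: int, first_port: int = 1024) -> int:
    """Count ports in the inclusive range first_port..max_user_port."""
    if max_user_port < first_port:
        raise ValueError("max_user_port must not be below first_port")
    return max_user_port - first_port + 1


if __name__ == "__main__":
    print("default (5000):", ephemeral_port_count(5000))
    print("raised (65534):", ephemeral_port_count(65534))
```

With the default setting the job manager machine has fewer than 4000 ephemeral ports to draw from; raising the value to 65534 increases that by a factor of roughly sixteen.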
Host Communications Problems
If a worker is not able to make a connection with its job manager, or if a client
session cannot find a job manager with the findResource function, this might
indicate communications problems between nodes.
Using a Command Line Interface
First, be sure that the machines in question agree on their IP resolutions. The
IP address for a particular host should be the same for itself as it is from the
perspective of another host. For example, if a process on hostB cannot connect
to one on hostA, find out the hostA IP address for itself, then see what the IP
address for hostA is from hostB. They should be the same.
If the machines can identify each other, the nodestatus command can be
useful for diagnosing problems between their processes. Use the function to
determine what MDCS processes are running on the local host, and which
are accessible from remote hosts. If a worker on hostA cannot register with
its job manager on hostB, run nodestatus on both hosts to see what each
can see on hostB.
On hostB, execute:
nodestatus -remotehost hostB
Then on hostA, run exactly the same command:
nodestatus -remotehost hostB
The results should be the same, showing the same listing of job managers
and workers.
If the output indicates problems, run the command again with a higher
information level to receive more detailed information:
nodestatus -remotehost hostB -infolevel 3
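The IP resolution check at the start of this procedure can be scripted. The following Python sketch resolves a host name on the machine where it runs and compares two such views. The function names are hypothetical, and you would have to run resolve_ipv4 separately on each host (for example, on hostA and on hostB) and compare the results yourself; the sketch does not contact the other machine.

```python
import socket


def resolve_ipv4(hostname: str) -> set:
    """Return the set of IPv4 addresses this machine resolves for hostname."""
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return set(addresses)


def views_agree(own_view: set, remote_view: set) -> bool:
    """True when both machines resolved the host to the same non-empty address set."""
    return bool(own_view) and own_view == remote_view


if __name__ == "__main__":
    # Run on each machine and compare the printed addresses by hand,
    # or collect both sets and pass them to views_agree.
    try:
        print(sorted(resolve_ipv4(socket.gethostname())))
    except OSError as err:
        print("resolution failed:", err)
```

A host that resolves its own name to a loopback address while other machines see a routable address is a common cause of the mismatch described above.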
Using a GUI
You can diagnose some communications problems using Admin Center.
If you cannot successfully add hosts to the listing by specifying host name,
you can use their IP addresses instead (see “Adding Hosts” on page 3-3). If
you suspect any communications problems, in the Admin Center GUI click
Test Connectivity (see “Testing Connectivity” on page 3-9). This testing
verifies that the nodes can identify each other and allow their processes to
communicate with each other.
Verifying Multicast Communications
Note Although the current version of the parallel computing products
continues to support multicast communications between its processes,
multicast is not recommended and might not be supported in future releases.
Multicast, unlike TCP/IP or UDP, is a subscription-based protocol where
a number of machines on a network indicate to the network their interest
in particular packets originating somewhere on that network. By contrast,
both UDP and TCP packets are always bound for a single machine, usually
indicated by its IP address.
The main tools for investigating this type of packet are:
• tcpdump for UNIX operating systems
• winpcap and ethereal for Microsoft Windows operating systems
• A Java™ class included with the parallel computing products.
The Java class is com.mathworks.toolbox.distcomp.test.MulticastTester. Its
main method and its constructor take two input arguments: the multicast
group to join and the port number to use.
This Java class has a number of simple methods to attempt to join a specified
multicast group. Once the class has successfully joined the group, it has
methods to send messages to the group, listen for messages from the group,
and display what it receives. You can use this class both from a command-line
call to Java software and inside MATLAB.
m = com.mathworks.toolbox.distcomp.test.MulticastTester('239.1.1.1', 9999);
m.startSendingThread;
m.startListeningThread;
These instructions cause each MATLAB session to issue a stream of multicast
test packets, and to listen for test packets. If multicast is working between
the machines, you see a stream of lines like the following:
The number on the left in each string is the line number for the received
packet. The text in the center is the host from which the packet is received.
The number on the right is the packet number sent by the sending host. It is
normal for a host to report a test packet from itself.
If either machine does not receive a stream of test packets, or if the remote
host is not included in either stream, then multicast communication is not
operating properly.
To terminate the test stream, execute the following in both MATLAB sessions:
m.stopSendingThread;
m.stopListeningThread;
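For readers who want to see what joining a multicast group involves at the socket level, here is a minimal Python sketch. It is not the MulticastTester class and does not reproduce its output; it is a generic illustration using the standard socket module, and the function names are hypothetical. The group 239.1.1.1 and port 9999 are taken from the MATLAB example above.

```python
import ipaddress
import socket


def is_multicast_group(address: str) -> bool:
    """True if the address lies in the IPv4 or IPv6 multicast range."""
    return ipaddress.ip_address(address).is_multicast


def membership_request(group: str) -> bytes:
    """Build the 8-byte request: group address followed by the any-interface address."""
    return socket.inet_aton(group) + socket.inet_aton("0.0.0.0")


def open_listener(group: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Open a UDP socket subscribed to the given multicast group."""
    if not is_multicast_group(group):
        raise ValueError(f"{group} is not a multicast address")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    sock.settimeout(timeout)
    return sock


if __name__ == "__main__":
    try:
        listener = open_listener("239.1.1.1", 9999)
        data, sender = listener.recvfrom(1024)
        print("received", len(data), "bytes from", sender[0])
    except OSError as err:
        # Includes socket.timeout when no packet arrives in time.
        print("no multicast packet received:", err)
```

If this listener times out on one machine while a sender is active on another, that is consistent with multicast not operating properly between them.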
Admin Center
• “Starting Admin Center” on page 3-2
• “Setting Up Resources” on page 3-3
• “Testing Connectivity” on page 3-9
• “Saving and Loading Sessions” on page 3-13
• “Preparing for User Configurations” on page 3-14
Starting Admin Center
Admin Center is a graphical user interface that lets you control and verify
MATLAB Distributed Computing Server resources if you are using a job
manager as your scheduler.
You must start Admin Center outside a MATLAB session by executing the
following:
• matlabroot/toolbox/distcomp/bin/admincenter (on UNIX operating
systems)
• matlabroot\toolbox\distcomp\bin\admincenter.bat (on Microsoft
Windows operating systems)
The first time you start Admin Center, you see a welcome dialog box.
A new session has no hosts listed, so the usual first step is to identify the
hosts you want to include in your listing. To do this, click Add or Find.
Further information continues in the next section.
If you start Admin Center again on the same host, your previous session for
that machine is loaded; and unless the update rate is set to never, Admin
Center performs an update immediately for the listed hosts and processes.
To clear this information and start a new session, select the pull-down
File > New Session.
Setting Up Resources
In this section...
“Adding Hosts” on page 3-3
“Starting a Job Manager” on page 3-4
“Starting Workers” on page 3-5
“Stopping, Destroying, Resuming, Restarting Processes” on page 3-7
“Moving a Worker” on page 3-8
“Updating the Display” on page 3-8
Adding Hosts
To specify the hosts you want displayed in Admin Center, click Add or Find
in the Welcome dialog box, or if this is not a new session, click Add or Find
in the Hosts module.
In the Add or Find Hosts dialog box, identify the hosts you want to add to the
listing by one of the following methods:
• Select Enter Hostnames and provide short host names, fully qualified
domain names, or individual IP addresses for the hosts, or
• Select Enter IP Range and provide the range of IP addresses for your
hosts.
Note While you can add any hosts to Admin Center, a host must be running
the mdce service if a job manager or worker is to run on that host. See the
installation instructions available at:
If one of the hosts you have specified is running a job manager, Admin Center
will automatically find and list all the hosts running workers registered with
that job manager. Similarly, if you specify a host that is running a worker,
Admin Center will find and list the host running that worker’s job manager,
and also all hosts running other workers under that job manager.
Starting a Job Manager
To start a job manager, click Start in the Job Manager module.
In the New Job Manager dialog box, provide a name for the job manager,
and select a host to run it on.
Alternative methods for starting a job manager include selecting the
pull-down Job Manager > Start, or right-clicking a listed host and selecting
Start Job Manager.
With a job manager running on your cluster, Admin Center might look like
the following figure, with the job manager listed in the Job Manager module,
as well as being listed by name in the Hosts module in the line for the host
on which it is running.
Starting Workers
To start MATLAB workers, click Start in the Workers module.
In the Start Workers dialog box, specify the number of workers to start on
each host, and select the hosts to run them. From the list, select the job
manager for these workers. Click OK to start the workers. Admin Center
automatically provides names for the workers, based on the hosts running
them.
Alternative methods for starting workers include selecting the pull-down
Workers > Start, or right-clicking a listed host or job manager and selecting
Start Workers.
With workers running on your cluster, Admin Center might look like the
following figure, which shows the workers listed in the Workers module. Also,
the number of workers running under the job manager is listed in the Job
Manager module, and the number of workers for each job manager is listed
in the Hosts module.
To get more information on any host, job manager, or worker listed in
Admin Center, right-click its name in the display and select Properties.
Alternatively, you can find the Properties option under the Hosts,
Job Manager, and Workers drop-down menus.
Stopping, Destroying, Resuming, Restarting Processes
You can Stop or Destroy job managers and workers. The primary difference
is that stopping a process shuts it down but retains its data; destroying a
process shuts it down and clears its data. Use Resume to have a process
continue with its existing data. When you use Restart, a dialog box requires
you to confirm your intention of starting a new process while keeping or
discarding data.
Moving a Worker
To move a worker from one host to another, you must completely shut it down,
then start a new worker on the desired host:
1 Right-click the worker in the Workers module list.
2 Select Destroy. This shuts down the worker process and removes all its
data.
3 If the old worker host is not running any other MDCS processes (mdce
service, job manager, or workers), you might want to remove it from the
Admin Center listing.
4 If necessary, add the new host to the Admin Center host listing.
5 In the Workers module, click Start. Select the desired host in the Start
Workers dialog box, along with the appropriate number and job manager
name.
Use a similar process to move a job manager from one host to another. Note,
however, that all workers registered with the job manager must be destroyed
and started again, registering them with the new instance of the job manager.
Updating the Display
Admin Center updates its data automatically at regular intervals. To set the
update rate, select an option from the Update list. Click Update Now to
immediately update the display data.
Testing Connectivity
Admin Center lets you test communications between your job manager node,
worker nodes, and the node where Admin Center is running.
The tests are divided into four categories:
• Client — Verifies that the node running Admin Center is properly
configured so that further cluster testing can proceed.
• Client to Nodes — Verifies that the node running Admin Center can
identify and communicate with the other nodes in the cluster.
• Nodes to Nodes — Verifies that the other nodes in the cluster can identify
each other, and that each node allows its mdce service to communicate with
the mdce service on the other cluster nodes.
• Nodes to Client — Verifies that other cluster nodes can identify and
communicate with the node running Admin Center.
First click Test Connectivity to open the Connectivity Testing dialog box.
By default, the dialog box displays the results of the last test. To run new
tests and update the display, click Run.
During test execution, Admin Center displays the Running Tests progress
dialog box.
When the tests are complete, the Running Tests dialog box automatically
closes, and Admin Center displays the test results in the Connectivity Testing
dialog box.
The possible test result symbols are described in the following table.
Test Result
Description
Test passed.
Test passed, extra information is available.
Test passed, but generated a warning.
Test failed.
Test was skipped, possibly because prerequisite tests did not pass.
Tests that include failures or other results might look like the following figure.
Double-click any of the symbols in the test results to drill down for more
detail. Use the Log tab to see the raw data from the tests.
The results of the tests that run on only the client are displayed in the
lower-left corner of the dialog box. To drill into client-only test results, click
More Info.
Saving and Loading Sessions
By default, Admin Center saves the cluster definition, process status, and
test results, so the next time the same user runs Admin Center on the same
machine, that information is available and displayed by default. You can
export session data so that a different user or a different host can access it,
by selecting the pull-down File > Export. Browse to the location where you
want to store the session data and provide a name for the file. Admin Center
applies the extension .mdcs to the file name.
You can import that saved session data into a subsequent session of Admin
Center by selecting the pull-down File > Import. The imported data includes
cluster definition and test results.
When identifying the file for importing in the Import Session dialog box,
there is a Disable updates check box. Checking this box lets you import a
session that does not automatically update, so that you can statically examine
a cluster setup for evaluation or diagnostic purposes. Otherwise, unless the
update rate is set to never, Admin Center performs an update immediately
after starting or loading a session.
Preparing for User Configurations
Admin Center does not create user configurations, but the information
displayed in Admin Center is of vital importance when you create your
parallel configuration — information such as job manager name, job manager
host, and number of workers. For more information about creating and using
configurations, see “Programming with User Configurations” in the Parallel
Computing Toolbox documentation.
Control Script Reference
mdce Process Control (p. 4-2)    Control mdce service
Job Manager Control (p. 4-2)     Control job manager
Worker Control (p. 4-2)          Control MATLAB workers
mdce Process Control
mdce               Install, start, stop, or uninstall mdce service
nodestatus         Status of mdce processes running on node
remotemdce         Execute mdce command on one or more remote hosts by
                   transport protocol
Job Manager Control
startjobmanager    Start job manager process
stopjobmanager     Stop job manager process
Worker Control
startworker        Start MATLAB worker session
stopworker         Stop MATLAB worker session
Control Scripts - Alphabetical List
mdce
Purpose
Install, start, stop, or uninstall mdce service
Syntax
mdce install
mdce uninstall
mdce start
mdce stop
mdce console
mdce restart
mdce ... -mdcedef <mdce_defaults_file>
mdce ... -clean
mdce status
mdce -version
Description
The mdce service ensures that all other processes are running and that
it is possible to communicate with them. Once the mdce service is
running, you can use the nodestatus command to obtain information
about the mdce service and all the processes it maintains.
The mdce executable resides in the folder
matlabroot\toolbox\distcomp\bin (Windows operating system) or
matlabroot/toolbox/distcomp/bin (UNIX operating system). Enter
the following commands at a DOS or UNIX command-line prompt,
respectively.
mdce install installs the mdce service in the Microsoft Windows
Service Control Manager. This causes the service to automatically start
when the Windows operating system boots up. The service must be
installed before it is started.
mdce uninstall uninstalls the mdce service from the Windows Service
Control Manager. Note that if you wish to install mdce service as a
different user, you must first uninstall the service and then reinstall
as the new user.
mdce start starts the mdce service. This creates the required logging
and checkpointing directories, and then starts the service as specified
in the mdce defaults file.
mdce stop stops running the mdce service. This automatically stops all
job managers and workers on the computer, but leaves their checkpoint
information intact so that they will start again when the mdce service
is started again.
mdce console starts the mdce service as a process in the current
terminal or command window rather than as a service running in the
background.
mdce restart performs the equivalent of mdce stop followed by mdce
start. This command is available only on UNIX and Macintosh
operating systems.
mdce ... -mdcedef <mdce_defaults_file> uses the specified
alternative mdce defaults file instead of the one found in
matlabroot/toolbox/distcomp/bin.
mdce ... -clean performs a complete cleanup of all service
checkpoint and log files before installing or starting the service, or after
stopping or uninstalling it. This deletes all information about any job
managers or workers this service has ever maintained.
mdce status reports the status of the mdce service, indicating
whether it is running and with what PID. Use nodestatus to obtain
more detailed information about the mdce service. The mdce status
command is available only on UNIX and Macintosh operating systems.
mdce -version prints version information of the mdce process to
standard output, then exits.
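When mdce is driven from an administration script, it can help to validate the requested action before running anything. The Python sketch below assembles an argument list from the options documented on this page and rejects restart and status on Windows, where this page says they are not available. The function name build_mdce_command and the platform labels are hypothetical; the sketch only builds the list and does not execute the command.

```python
MDCE_ACTIONS = ("install", "uninstall", "start", "stop",
                "console", "restart", "status")
UNIX_ONLY_ACTIONS = ("restart", "status")


def build_mdce_command(action: str, platform: str = "unix",
                       mdcedef: str = None, clean: bool = False) -> list:
    """Return an argument list such as ['mdce', 'start', '-clean'].

    `platform` is a hypothetical label: "unix" (UNIX and Macintosh)
    or "windows".
    """
    if action not in MDCE_ACTIONS:
        raise ValueError(f"unknown mdce action: {action!r}")
    if platform == "windows" and action in UNIX_ONLY_ACTIONS:
        raise ValueError(f"mdce {action} is available only on UNIX and Macintosh")
    command = ["mdce", action]
    if mdcedef is not None:
        command += ["-mdcedef", mdcedef]  # alternative mdce defaults file
    if clean:
        command.append("-clean")          # remove checkpoint and log files
    return command


if __name__ == "__main__":
    print(" ".join(build_mdce_command("start", clean=True)))
```

The resulting list can be passed to a process launcher of your choice once the path to the mdce executable has been added.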
See Also
nodestatus, startjobmanager, startworker, stopjobmanager, stopworker
nodestatus
Purpose
Status of mdce processes running on node
Syntax
nodestatus
nodestatus -flags
Description
nodestatus displays the status of the mdce service and the processes
which it maintains. The mdce service must already be running on the
specified computer.
The nodestatus executable resides in the folder
matlabroot\toolbox\distcomp\bin (Windows operating system) or
matlabroot/toolbox/distcomp/bin (UNIX operating system). Enter
the following command syntax at a DOS or UNIX command-line
prompt, respectively.
nodestatus -flags accepts the following input flags. Multiple flags
can be used together on the same command.
Flag
    Operation
-remotehost <hostname>
    Displays the status of the mdce service and the processes it
    maintains on the specified host. The default value is the local host.
-infolevel <level>
    Specifies how much status information to report, using a level of
    1-3. 1 means only the basic information, 3 means all information
    available. The default value is 1.
-baseport <port_number>
    Specifies the base port that the mdce service on the remote host
    is using. You need to specify this only if the value of BASE_PORT in
    the local mdce_def file does not match the base port being used
    by the mdce service on the remote host.
-v
    Verbose mode displays the progress of the command execution.
Examples
Display basic information about the mdce processes on the local host.
nodestatus
Display detailed information about the status of the mdce processes
on host node27.
nodestatus -remotehost node27 -infolevel 2
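The flag rules on this page, in particular the 1-3 range for -infolevel, can be enforced before a command is issued. The following Python sketch is one way to do so. The function name build_nodestatus_command is hypothetical, and the sketch only assembles an argument list from the documented flags without running nodestatus.

```python
def build_nodestatus_command(remotehost: str = None, infolevel: int = 1,
                             baseport: int = None, verbose: bool = False) -> list:
    """Return an argument list for nodestatus using the documented flags."""
    if infolevel not in (1, 2, 3):
        raise ValueError("infolevel must be 1, 2, or 3")
    command = ["nodestatus"]
    if remotehost is not None:
        command += ["-remotehost", remotehost]
    if infolevel != 1:
        # 1 is the documented default, so it is left implicit.
        command += ["-infolevel", str(infolevel)]
    if baseport is not None:
        command += ["-baseport", str(baseport)]
    if verbose:
        command.append("-v")
    return command


if __name__ == "__main__":
    print(" ".join(build_nodestatus_command(remotehost="node27", infolevel=2)))
```

Called with no arguments, the function yields the bare nodestatus command from the first example.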
See Also
mdce, startjobmanager, startworker, stopjobmanager, stopworker
remotemdce
Purpose
Execute mdce command on one or more remote hosts by transport
protocol
Syntax
remotemdce <mdce options> <flags> <protocol options>
Description
remotemdce executes the mdce command on one or more
remote hosts. For a description of the mdce service, see the mdce
reference page. The general form of the syntax is:
remotemdce <mdce options> <flags> <protocol options>
The following table describes the supported flags and options. They can
be combined in the same command. Note that flags are each preceded
by a dash (-).
Flags and Options
    Operation
<mdce options>
    Options and arguments of the mdce command, such as start, stop,
    etc. See the mdce reference page for a full list.
-matlabroot MATLABROOT_DIR
    The MATLAB installation folder on the remote hosts, required only
    if the remote installation folder differs from the one on the local
    machine.
-remotehost HOST1[,HOST2[,...]]
    Specify the names of the hosts where you want to run the mdce
    command. Separate the host names by commas without any
    whitespaces. This is a mandatory argument.
-remoteplatform { UNIX | WINDOWS }
    Indicate the platform of the remote hosts. This option is required
    only if different from the local platform.
-quiet
    Prevent mdce from prompting the user for missing information. The
    command fails if all required information is not specified.
-help
    Print the help information.
-protocol type
    Force the usage of a particular protocol type. Specifying a protocol
    type with all its required parameters also avoids interactive
    prompting and allows for use in scripts. The supported protocol
    types are ssh and rsh.
    To get more information about one particular protocol type, enter
    remotemdce -protocol type -help
<protocol options>
    Specify particular options for the protocol type being used.
Note If you are using OpenSSHd on a Microsoft Windows operating
system, you can encounter a problem when using backslashes in
path names for your command options. In most cases, you can work
around this problem by using forward slashes instead. For example,
to specify the file C:\temp\mdce_def.bat, you should identify it as
C:/temp/mdce_def.bat.
Examples
Start mdce on three remote machines of the same platform as the client:
remotemdce start -remotehost hostA,hostB,hostC
Start mdce in a clean state on two UNIX operating system machines
from a Windows operating system machine, using the ssh protocol.
Enter the following command on a single line:
remotemdce start -clean -matlabroot /usr/local/matlab
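Because the -remotehost argument of remotemdce must be a comma-separated list without any whitespace, a small helper can build it safely from a list of names. The Python sketch below is an illustration under that one rule only. The function name format_remotehost_list is hypothetical and the sketch does not check that the hosts exist.

```python
def format_remotehost_list(hosts: list) -> str:
    """Join host names with commas and no whitespace, as -remotehost expects."""
    cleaned = [host.strip() for host in hosts]
    if not cleaned:
        raise ValueError("at least one host name is required")
    for host in cleaned:
        if not host or "," in host or any(ch.isspace() for ch in host):
            raise ValueError(f"invalid host name: {host!r}")
    return ",".join(cleaned)


if __name__ == "__main__":
    value = format_remotehost_list(["hostA", "hostB", "hostC"])
    print("remotemdce start -remotehost " + value)
```

Leading and trailing spaces around individual names are stripped, while a name containing an embedded space or comma is rejected rather than silently altered.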
startjobmanager
Purpose
Start job manager process
Syntax
startjobmanager
startjobmanager -flags
Description
startjobmanager starts a job manager process and the associated
job manager lookup process under the mdce service, which maintains
them after that. The job manager handles the storage of jobs and the
distribution of tasks contained in jobs to MATLAB workers that are
registered with it. The mdce service must already be running on the
specified computer.
The startjobmanager executable resides in the folder
matlabroot\toolbox\distcomp\bin (Windows operating system) or
matlabroot/toolbox/distcomp/bin (UNIX operating system). Enter
the following command syntax at a DOS or UNIX command-line
prompt, respectively.
startjobmanager -flags accepts the following input flags. Multiple
flags can be used together on the same command.
Flag
    Operation
-name <job_manager_name>
    Specifies the name of the job manager. This identifies the job
    manager to MATLAB worker sessions and MATLAB clients. The default
    is the value of the DEFAULT_JOB_MANAGER_NAME parameter in the
    mdce_def file.
-remotehost <hostname>
    Specifies the name of the host where you want to start the job
    manager and the job manager lookup process. If omitted, they
    are started on the local host.
-clean
    Deletes all checkpoint information stored on disk from previous
    instances of this job manager before starting. This cleans the job
    manager so that it initializes with no jobs or tasks.
-multicast
    Overrides the use of unicast to contact the job manager lookup
    process. It is recommended that you not use -multicast unless you
    are certain that multicast works on your network. This overrides the
    setting of JOB_MANAGER_HOST in the mdce_def file on the remote
    host, which would have the job manager use unicast. If this flag is
    omitted and JOB_MANAGER_HOST is empty, the job manager uses
    unicast to contact the job manager lookup process running on the
    same host.
-baseport <port_number>
    Specifies the base port that the mdce service on the remote host is
    using. You need to specify this only if the value of BASE_PORT in
    the local mdce_def file does not match the base port being used by
    the mdce service on the remote host.
-v
    Verbose mode displays the progress of the command execution.
See Also
mdce, nodestatus, startworker, stopjobmanager, stopworker
startworker
Purpose
Start MATLAB worker session
Syntax
startworker
startworker -flags
Description
startworker starts a MATLAB worker process under the mdce service,
which maintains it after that. The worker registers with the specified
job manager, from which it will get tasks for evaluation. The mdce
service must already be running on the specified computer.
The startworker executable resides in the folder
matlabroot\toolbox\distcomp\bin (Windows operating system) or
matlabroot/toolbox/distcomp/bin (UNIX operating system). Enter
the following command syntax at a DOS or UNIX command-line
prompt, respectively.
startworker -flags accepts the following input flags. Multiple flags
can be used together on the same command, except where noted.
Flag
    Operation
-name <worker_name>
    Specifies the name of the MATLAB worker. The default is the value
    of the DEFAULT_WORKER_NAME parameter in the mdce_def file.
-remotehost <hostname>
    Specifies the name of the computer where you want to start the
    MATLAB worker. If omitted, the worker is started on the local
    computer.
-jobmanager <job_manager_name>
    Specifies the name of the job manager this MATLAB worker will
    receive tasks from. The default is the value of the
    DEFAULT_JOB_MANAGER_NAME parameter in the mdce_def file.
-jobmanagerhost <job_manager_hostname>
    Specifies the host on which the job manager is running. The worker
    uses unicast to contact the job manager lookup process on that host
    to register with the job manager.
    This overrides the setting of JOB_MANAGER_HOST in the mdce_def
    file on the worker computer, which would also have the worker use
    unicast.
    Cannot be used together with -multicast.
-multicast
    If you are certain that multicast works on your network, you can
    force the worker to use multicast to locate the job manager lookup
    process by specifying -multicast.
    Note: If you are using this flag to change the settings of and
    restart a stopped worker, then you should also use the -clean flag.
    Cannot be used together with -jobmanagerhost.
-clean
    Deletes all checkpoint information associated with this worker name
    before starting.
-baseport <port_number>
    Specifies the base port that the mdce service on the remote host is
    using. You only need to specify this if the value of BASE_PORT in
    the local mdce_def file does not match the base port being used by
    the mdce service on the remote host.
-v
    Verbose mode displays the progress of the command execution.
Examples
Start a worker on the local host, using the default worker name,
registering with the job manager MyJobManager on the host JMHost.
startworker -jobmanager MyJobManager -jobmanagerhost JMHost
Start a worker on the host WorkerHost, using the default worker name,
and registering with the job manager MyJobManager on the host JMHost.
(The following command should be entered on a single line.)
startworker -jobmanager MyJobManager -jobmanagerhost JMHost
-remotehost WorkerHost
Start two workers, named worker1 and worker2, on the host
WorkerHost, registering with the job manager MyJobManager that is
running on the host JMHost. Note that to start two workers on the
same computer, you must give them different names. (Each of the two
commands below should be entered on a single line.)
startworker -name worker1 -remotehost WorkerHost
-jobmanager MyJobManager -jobmanagerhost JMHost
startworker -name worker2 -remotehost WorkerHost
-jobmanager MyJobManager -jobmanagerhost JMHost
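When more than two workers share a computer, the distinct-name requirement can be met by generating the names in a loop. The sketch below is illustrative only: print_startworker_cmds is a hypothetical function, the workerN naming scheme is an assumption, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Dry run: print one startworker command per worker, each with a distinct
# -name value, as required when several workers share a computer.
# The count, the "workerN" naming scheme, and the host names are
# illustrative assumptions.
print_startworker_cmds() {
    count=$1
    host=$2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "startworker -name worker$i -remotehost $host -jobmanager MyJobManager -jobmanagerhost JMHost"
        i=$((i + 1))
    done
}

# Prints the commands for worker1 and worker2 on WorkerHost.
print_startworker_cmds 2 WorkerHost
```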
See Also
mdce, nodestatus, startjobmanager, stopjobmanager, stopworker
stopjobmanager
Purpose
Stop job manager process
Syntax
stopjobmanager
stopjobmanager -flags
Description
stopjobmanager stops a job manager that is running under the mdce
service.
The stopjobmanager executable resides in the folder
matlabroot\toolbox\distcomp\bin (Windows operating system) or
matlabroot/toolbox/distcomp/bin (UNIX operating system). Enter
the following command syntax at a DOS or UNIX command-line
prompt, respectively.
stopjobmanager -flags accepts the following input flags. Multiple
flags can be used together on the same command.
Flag
    Operation
-name <job_manager_name>
    Specifies the name of the job manager to stop. The default is the
    value of the DEFAULT_JOB_MANAGER_NAME parameter in the mdce_def
    file.
-remotehost <hostname>
    Specifies the name of the host where you want to stop the job
    manager and the associated job manager lookup process. The default
    value is the local host.
-clean
    Deletes all checkpoint information stored on disk for the current
    instance of this job manager after stopping it. This cleans the
    job manager of all its job and task data.
-baseport <port_number>
    Specifies the base port that the mdce service on the remote host
    is using. You need to specify this only if the value of BASE_PORT
    in the local mdce_def file does not match the base port being used
    by the mdce service on the remote host.
-v
    Verbose mode displays the progress of the command execution.
Examples
Stop the job manager MyJobManager on the local host.
stopjobmanager -name MyJobManager
Stop the job manager MyJobManager on the host JMHost.
stopjobmanager -name MyJobManager -remotehost JMHost
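The optional -clean and -baseport flags can be combined with the flags used in these examples. As a hedged sketch, the hypothetical function stopjobmanager_cmd below assembles such a command line and prints it without running it; the port number 21000 is a made-up value, not a documented default.

```shell
#!/bin/sh
# Hypothetical wrapper: print a stopjobmanager command line (dry run).
#   $1  job manager name
#   $2  remote host
#   $3  "clean" to add -clean (optional)
#   $4  base port to pass with -baseport (optional)
stopjobmanager_cmd() {
    cmd="stopjobmanager -name $1 -remotehost $2"
    if [ "${3:-}" = "clean" ]; then
        cmd="$cmd -clean"
    fi
    if [ -n "${4:-}" ]; then
        cmd="$cmd -baseport $4"
    fi
    echo "$cmd"
}

# Plain stop, then a stop that also deletes checkpoint data and names
# a non-default base port (21000 is an illustrative value).
stopjobmanager_cmd MyJobManager JMHost
stopjobmanager_cmd MyJobManager JMHost clean 21000
```

Because -clean removes all job and task data for the job manager, printing the command first gives a chance to review it before it is entered.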
stopworker
The stopworker executable resides in the folder
matlabroot\toolbox\distcomp\bin (Windows operating system) or
matlabroot/toolbox/distcomp/bin (UNIX operating system). Enter
the following command syntax at a DOS or UNIX command-line
prompt, respectively.
stopworker -flags accepts the following input flags. Multiple flags
can be used together on the same command.
Flag
    Operation
-name <worker_name>
    Specifies the name of the MATLAB worker to stop. The default is
    the value of the DEFAULT_WORKER_NAME parameter in the mdce_def
    file.
-remotehost <hostname>
    Specifies the name of the host where you want to stop the MATLAB
    worker. The default value is the local host.
-clean
    Deletes all checkpoint information associated with this worker
    name after stopping it.
-baseport <port_number>
    Specifies the base port that the mdce service on the remote host
    is using. You need to specify this only if the value of BASE_PORT
    in the local mdce_def file does not match the base port being used
    by the mdce service on the remote host.
-v
    Verbose mode displays the progress of the command execution.
Examples
Stop the worker with the default name on the local host.
stopworker
Stop the worker with the default name, running on the computer
WorkerHost.
stopworker -remotehost WorkerHost
Stop the workers named worker1 and worker2, running on the
computer
See Also
mdce, nodestatus, startjobmanager, startworker, stopjobmanager
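For the last stopworker example, which stops several named workers, one command per worker name is needed. The following sketch shows one way to generate those commands. It assumes the workers run on WorkerHost, the host name used in the preceding example; print_stopworker_cmds is a hypothetical function that prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry run: print a stopworker command for each named worker on one host.
# WorkerHost is an assumed host name, borrowed from the preceding example.
print_stopworker_cmds() {
    host=$1
    shift
    for name in "$@"; do
        echo "stopworker -name $name -remotehost $host"
    done
}

# Prints one command for worker1 and one for worker2.
print_stopworker_cmds WorkerHost worker1 worker2
```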
Glossary
CHECKPOINTBASE
The name of the parameter in the mdce_def file that defines the location
of the job manager and worker checkpoint directories.
checkpoint directory
Location where job manager checkpoint information and worker
checkpoint information is stored.
client
The MATLAB session that defines and submits the job. This is the
MATLAB session in which the programmer usually develops and
prototypes applications. Also known as the MATLAB client.
client computer
The computer running the MATLAB client.
cluster
A collection of computers that are connected via a network and intended
for a common purpose.
coarse-grained application
An application for which run time is significantly greater than
the communication time needed to start and stop the program.
Coarse-grained distributed applications are also called embarrassingly
parallel applications.
codistributed array
An array partitioned into segments, with each segment residing in the
workspace of a different lab.
Composite
An object in a MATLAB client session that provides access to data
values stored on the labs in a MATLAB pool, such as the values of
variables that are assigned inside an spmd statement.
computer
A system with one or more processors.
distributed application
The same application that runs independently on several nodes,
possibly with different input parameters. There is no communication,
shared data, or synchronization points between the nodes. Distributed
applications can be either coarse-grained or fine-grained.
distributed computing
Computing with distributed applications, running the application on
several nodes simultaneously.
distributed computing demos
Demonstration programs that use Parallel Computing Toolbox software,
as opposed to sequential demos.
DNS
Domain Name System. A system that translates Internet domain
names into IP addresses.
dynamic licensing
The ability of a MATLAB worker or lab to employ all the functionality
you are licensed for in the MATLAB client, while checking out only a
server product license. When a job is created in the MATLAB client
with Parallel Computing Toolbox software, the products for which
the client is licensed will be available for all workers or labs that
evaluate tasks for that job. This allows you to run any code on the
cluster for which you are licensed on your MATLAB client, without
requiring extra licenses for the worker beyond that for the MATLAB
Distributed Computing Server product. For a list of products that are
not eligible for use with Parallel Computing Toolbox software, see
fine-grained application
An application for which run time is significantly less than the
communication time needed to start and stop the program. Compare to
coarse-grained applications.
head node
Usually, the node of the cluster designated for running the job manager
and license manager. It is often useful to run all the nonworker-related
processes on a single machine.
heterogeneous cluster
A cluster that is not homogeneous.
homogeneous cluster
A cluster of identical machines, in terms of both hardware and software.
job
The complete large-scale operation to perform in MATLAB, composed
of a set of tasks.
job manager
The MathWorks process that queues jobs and assigns tasks to workers.
A third-party process that performs this function is called a scheduler.
The general term “scheduler” can also refer to a job manager.
job manager checkpoint information
Snapshot of information necessary for the job manager to recover from
a system crash or reboot.
job manager database
The database that the job manager uses to store the information about
its jobs and tasks.
job manager lookup process
The process that allows clients, workers, and job managers to find each
other. It starts automatically when the job manager starts.
lab
When workers start, they work independently by default. They can
then connect to each other and work together as peers, and are then
referred to as labs.
LOGDIR
The name of the parameter in the mdce_def file that defines the
directory where logs are stored.
MathWorks job manager
See job manager.
MATLAB client
See client.
MATLAB pool
A collection of labs that are reserved by the client for execution of
parfor-loops or spmd statements. See also lab.
MATLAB worker
See worker.
mdce
The service that has to run on all machines before they can run a job
manager or worker. This is the server foundation process, making sure
that the job manager and worker processes that it controls are always
running.
Note that the program and service name is all lowercase letters.
mdce_def file
The file that defines all the defaults for the mdce processes by allowing
you to set preferences or definitions in the form of parameter values.
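As an illustration of looking up one of these parameter values, the sketch below reads a named parameter from a definitions file. It assumes the file holds plain NAME=value lines with optional double quotes, which this glossary does not specify; get_param is a hypothetical helper, and the sample values are invented for the demonstration rather than shipped defaults.

```shell
#!/bin/sh
# Read one parameter from a file of NAME=value lines.
# Assumption: the file uses plain shell-style assignments and comment
# lines start with '#'. If a name appears more than once, the last
# assignment wins. Surrounding double quotes on the value are stripped.
get_param() {
    # $1 = file, $2 = parameter name
    sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1 | sed 's/^"\(.*\)"$/\1/'
}

# Illustrative sample file; the values are hypothetical.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# sample definitions
DEFAULT_WORKER_NAME="worker_a"
BASE_PORT=21000
LOGDIR="/tmp/mdce/log"
EOF

get_param "$sample" BASE_PORT
get_param "$sample" LOGDIR
rm -f "$sample"
```

A lookup like this can help when deciding whether -baseport is needed, since that flag matters only when the local BASE_PORT value differs from the one used by the mdce service on the remote host.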
MPI
Message Passing Interface, the means by which labs communicate with
each other while running tasks in the same job.
node
A computer that is part of a cluster.
parallel application
The same application that runs on several labs simultaneously, with
communication, shared data, or synchronization points between the
labs.
private array
An array which resides in the workspaces of one or more, but perhaps
not all labs. There might or might not be a relationship between the
values of these arrays among the labs.
random port
A random unprivileged TCP port, i.e., a random TCP port above 1024.
register a worker
The action that happens when both worker and job manager are started
and the worker contacts the job manager.
replicated array
An array which resides in the workspaces of all labs, and whose size and
content are identical on all labs.
scheduler
The process, either third-party or the MathWorks job manager, that
queues jobs and assigns tasks to workers.
spmd (single program multiple data)
A block of code that executes simultaneously on multiple labs in
a MATLAB pool. Each lab can operate on a different data set or
different portion of distributed data, and can communicate with other
participating labs while performing the parallel computations.
task
One segment of a job to be evaluated by a worker.
variant array
An array which resides in the workspaces of all labs, but whose content
differs on these labs.
worker
The MATLAB process that performs the task computations. Also known
as the MATLAB worker or worker process.
worker checkpoint information
Files required by the worker during the execution of tasks.