
Mac OS X Server

Xgrid Administration and High Performance Computing

For Version 10.5 Leopard



Apple Inc.

The owner or authorized user of a valid copy of Mac OS X Server software may reproduce this publication for the purpose of learning to use such software. No part of this publication may be reproduced or transmitted for commercial purposes, such as selling copies of this publication or for providing paid-for support services.

Every effort has been made to ensure that the information in this manual is accurate. Apple Inc. is not responsible for printing or clerical errors.

Apple 1 Infinite Loop Cupertino, CA 95014-2084 408-996-1010 www.apple.com

Use of the “keyboard” Apple logo (Option-Shift-K) for commercial purposes without the prior written consent of Apple may constitute trademark infringement and unfair competition in violation of federal and state laws.

AirPort, Apple, the Apple logo, Bonjour, FireWire, iPod, Mac, Macintosh, Mac OS, Xgrid, Xsan, and Xserve are trademarks of Apple Inc., registered in the U.S. and other countries. Apple Remote Desktop and Finder are trademarks of Apple Inc.

Intel, Intel Core, and Xeon are trademarks of Intel Corp. in the U.S. and other countries.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.

UNIX is a registered trademark of The Open Group.

Other company and product names mentioned herein are trademarks of their respective companies. Mention of third-party products is for informational purposes only and constitutes neither an endorsement nor a recommendation. Apple assumes no responsibility with regard to the performance or use of these products.

019-0946/2007-09-01

Contents

Preface: About This Guide
    What’s New in Xgrid Administration
    What’s in This Guide
    Using This Guide
    Using Onscreen Help
    Advanced Server Administration Guides
    Viewing PDF Guides on Screen
    Printing PDF Guides
    Getting Documentation Updates
    Getting Additional Information

Part I: Xgrid Administration

Chapter 1: Introducing Xgrid Service
    About Xgrid and Computational Grids
    How Xgrid Works
    Common Types of Grids and Grid Computing Styles
    Xgrid Clusters
    Local Grids
    Distributed Grids
    Xgrid Components
    Agent
    Client
    Controller
    Jobs
    Requirements and Capacities

Chapter 2: Setting Up and Configuring Xgrid Service
    Setup Overview
    Before Setting Up Xgrid Service
    Authentication Methods for Xgrid
    Single Sign-On (SSO)
    Password-Based Authentication
    No Authentication
    Hosting the Grid Controller
    Turning Xgrid Service On
    Configuring Xgrid with the Xgrid Service Configuration Assistant
    Configuring Xgrid to Host a Grid Using the Xgrid Service Configuration Assistant
    Configuring Xgrid to Join a Grid Using Xgrid Service Configuration Assistant
    Setting Up Xgrid Service
    Xgrid and Multiple Network Interfaces
    Configuring Controller Settings
    Starting Xgrid Service
    Configuring an Xgrid Agent (Mac OS X Server)
    Configuring an Xgrid Agent (Mac OS X)
    Setting Up Grid Authentication
    Setting Up Kerberos for Xgrid
    Setting Passwords for Xgrid
    Managing Client Access
    Setting SACL Permissions for Users and Groups
    Setting SACL Permissions for Administrators
    Managing Xgrid Service
    Viewing Xgrid Service Status
    Viewing Xgrid Service Logs
    Stopping Xgrid Service

Chapter 3: Managing a Grid
    Using Xgrid Admin
    Status Indicators in Xgrid Admin
    Managing the Xgrid Controller
    Connecting to an Xgrid Controller
    Disconnecting from an Xgrid Controller
    Adding an Xgrid Controller
    Removing an Xgrid Controller
    Managing Agents
    Viewing a List of Agents
    Adding an Agent
    Deleting an Agent
    Managing Jobs
    Viewing a List of Jobs
    Stopping a Job
    Repeating or Restarting a Job
    Deleting a Job
    Adding a Grid
    Deleting a Grid
    Monitoring Grid Activity

Chapter 4: Planning and Submitting Xgrid Jobs
    Structuring Jobs for Xgrid
    About Job Styles
    About Job Failure
    Submitting a Job
    Examples of Xgrid Job Submission and Results Retrieval
    Viewing Job Status
    Retrieving Job Results

Chapter 5: Solving Xgrid Problems
    If Your Agents Can’t Connect to the Xgrid Controller
    If You Use Xgrid over SSH
    If You Run Tasks on Multi-CPU Machines
    If You Submit a Large Number of Jobs
    If You Want to Use Xgrid on Other Platforms
    If the Xgrid Controller Must Be Restarted
    If Xgrid Has Crashed
    If You Are Trying to Submit Jobs over 2 GB
    If You Want to Enable Kerberos/SSO for Xgrid
    For More Information

Part II: Configuring High Performance Computing

Chapter 6: Introducing High Performance Computing
    Understanding HPC
    Apple and HPC
    Mac OS X Server
    Xserve Clusters
    Xserve 64-Bit Architecture
    Support of Loosely Coupled Computations

Chapter 7: Reviewing the Cluster Setup Process
    Cluster Setup Overview

Chapter 8: Identifying Prerequisites and System Requirements
    Prerequisites
    Expertise
    Xserve Configuration
    System Requirements
    Infrastructure Requirements
    Software Requirements
    Private Network Requirements
    Static IP Address and Hostname Requirements

Chapter 9: Preparing the Cluster for Configuration
    Preparing the Cluster Nodes for Software Configuration
    (Optional) Setting Up the Management Computer

Chapter 10: Setting Up the Cluster Controller
    Setting Up Server Software on the Cluster Controller
    Configuring DNS Service
    Verifying DNS Settings
    Configuring Open Directory Service
    Configuring the Cluster Controller as an Open Directory Master
    Configuring DHCP Service
    Configuring Firewall Settings on the Cluster Controller
    Configuring NAT Settings on the Cluster Controller
    Configuring NFS
    Configuring VPN Service
    Configuring Xgrid Service
    Preparing the Data Drive as a Mirrored RAID Set
    Creating a Home Directory Automount Share Point
    Creating User Accounts

Chapter 11: Setting Up Compute Nodes
    Creating an Auto Server Setup Record for Compute Nodes
    Verifying LDAP Record Creation
    Setting Up Compute Nodes
    Configuring Cluster Nodes
    Creating and Verifying a VPN Connection
    Joining a Remote Client to the Kerberos Realm
    Verifying Remote Client Access to the Kerberos Realm

Chapter 12: Testing Your Cluster
    Checking Your Cluster Using Xgrid Admin
    Testing Your Xgrid Cluster
    Verifying Your Xgrid Configuration
    Verifying Your SSH Connection

Appendix A: Cluster Setup Checklist

Appendix B: Automating Compute Node Configuration
    Naming Multiple Cluster Nodes
    Joining Multiple Cluster Nodes to the Kerberos Realm
    Configuring Xgrid Agent Settings Using Apple Remote Desktop
    Using SSH Without Passwords

Glossary

Index

About This Guide

This guide describes the Xgrid components included in Mac OS X Server and tells you how to configure and use them in computational grids.

Xgrid in Mac OS X Server version 10.5 includes a controller for computational grids and an agent that allows the server’s processor to work on jobs submitted to a grid. The agent is also available in computers using Mac OS X v10.3 or v10.4.

What’s New in Xgrid Administration

Xgrid service, Xgrid Admin, and high performance computing (HPC) in Mac OS X Server v10.5 Leopard include the following valuable new features.

• Improved security with Xgrid superuser access controls
• New Xgrid service configuration assistant
• Logging improvements


What’s in This Guide

This guide is organized as follows:

• Part I: Xgrid Administration. The chapters in this part of the guide introduce you to Xgrid service and the applications and tools available for administering Xgrid.
• Part II: Configuring High Performance Computing. The chapters in this part of the guide introduce you to HPC and the applications and tools available for administering HPC.

Note: Because Apple frequently releases new versions and updates to its software, images shown in this book may be different from what you see on your screen.

Using This Guide

The following list contains suggestions for using this guide:

• Read the guide in its entirety. Subsequent sections might build on information and recommendations discussed in prior sections.
• The instructions in this guide should always be tested in a nonoperational environment before deployment. This nonoperational environment should simulate, as much as possible, the environment where the computer will be deployed.

Using Onscreen Help

You can get task instructions on screen in Help Viewer while you’re managing Leopard Server. You can view help on a server or an administrator computer. (An administrator computer is a Mac OS X computer with Leopard Server administration software installed on it.)

To get help for an advanced configuration of Leopard Server:

Open Server Admin or Workgroup Manager and then:

• Use the Help menu to search for a task you want to perform.
• Choose Help > Server Admin or Help > Workgroup Manager to browse and search the help topics.

The help for Server Admin and Workgroup Manager contains instructions taken from Server Administration and other advanced administration guides described in “Advanced Server Administration Guides,” next.

To see the latest server help topics:

Make sure the server or administrator computer is connected to the Internet while you’re getting help.

Help Viewer automatically retrieves and caches the latest server help topics from the Internet. When not connected to the Internet, Help Viewer displays cached help topics.


Advanced Server Administration Guides

Getting Started covers basic installation and initial setup methods for a standard, workgroup, or advanced configuration of Leopard Server. An advanced guide, Server Administration, covers advanced planning, installation, setup, and more. A suite of additional guides, listed below, covers advanced planning, setup, and management of individual services. You can get these guides in PDF format from the Mac OS X Server documentation website at www.apple.com/server/documentation.

This guide ... tells you how to:

Getting Started and Mac OS X Server Worksheet: Install Mac OS X Server and set it up for the first time.

Command-Line Administration: Install, set up, and manage Mac OS X Server using UNIX command-line tools and configuration files.

File Services Administration: Share selected server volumes or folders among server clients using the AFP, NFS, FTP, and SMB/CIFS protocols.

iCal Service Administration: Set up and manage iCal shared calendar service.

iChat Service Administration: Set up and manage iChat instant messaging service.

Mac OS X Security Configuration: Make Mac OS X computers (clients) more secure, as required by enterprise and government customers.

Mac OS X Server Security Configuration: Make Mac OS X Server and the computer it’s installed on more secure, as required by enterprise and government customers.

Mail Service Administration: Set up and manage IMAP, POP, and SMTP mail services on the server.

Network Services Administration: Set up, configure, and administer DHCP, DNS, VPN, NTP, IP firewall, NAT, and RADIUS services on the server.

Open Directory Administration: Set up and manage directory and authentication services, and configure clients to access directory services.

Podcast Producer Administration: Set up and manage Podcast Producer service to record, process, and distribute podcasts.

Print Service Administration: Host shared printers and manage their associated queues and print jobs.

QuickTime Streaming and Broadcasting Administration: Capture and encode QuickTime content. Set up and manage QuickTime streaming service to deliver media streams live or on demand.

Server Administration: Perform advanced installation and setup of server software, and manage options that apply to multiple services or to the server as a whole.

System Imaging and Software Update Administration: Use NetBoot, NetInstall, and Software Update to automate the management of operating system and other software used by client computers.

Upgrading and Migrating: Use data and service settings from an earlier version of Mac OS X Server or Windows NT.

User Management: Create and manage user accounts, groups, and computers. Set up managed preferences for Mac OS X clients.

Web Technologies Administration: Set up and manage web technologies, including web, blog, webmail, wiki, MySQL, PHP, Ruby on Rails, and WebDAV.

Xgrid Administration and High Performance Computing: Set up and manage computational clusters of Xserve systems and Mac computers.

Mac OS X Server Glossary: Learn about terms used for server and storage products.

Viewing PDF Guides on Screen

While reading the PDF version of a guide on screen:

• Show bookmarks to see the guide’s outline, and click a bookmark to jump to the corresponding section.
• Search for a word or phrase to see a list of places where it appears in the document. Click a listed place to see the page where it occurs.
• Click a cross-reference to jump to the referenced section. Click a web link to visit the website in your browser.

Printing PDF Guides

If you want to print a guide:

• Save ink or toner by not printing the cover page.
• Save color ink on a color printer by looking in the panes of the Print dialog for an option to print in grays or black and white.
• Maximize the printed page image by changing the Scale setting in the Page Setup dialog. Try 122% with Paper Size set to US Letter. (PDF pages are 7.5 by 9 inches except Getting Started, which is CD size, 125 by 125 mm.)
• Reduce the bulk of the printed document and save paper by printing more than one page per sheet of paper. In the Print dialog, choose Layout from the untitled pop-up menu. If your printer supports two-sided (duplex) printing, select one of the Two-Sided options. Otherwise, choose 2 from the Pages per Sheet pop-up menu, and optionally choose Single Hairline from the Border menu.


Getting Documentation Updates

Periodically, Apple posts revised help pages and new editions of guides. Some revised help pages update the latest editions of the guides.

• To view new onscreen help topics for a server application, make sure your server or administrator computer is connected to the Internet and click “Latest help topics” or “Staying current” in the main help page for the application.
• To download the latest guides in PDF format, go to the Mac OS X Server documentation website: www.apple.com/server/documentation

Getting Additional Information

For more information, consult these resources:

• Read Me documents: important updates and special information. Look for them on the server discs.
• Mac OS X Server website (www.apple.com/macosx/server): gateway to extensive product and technology information.
• Apple Service & Support website (www.apple.com/support): access to hundreds of articles from Apple’s support organization.
• Apple customer training (train.apple.com): instructor-led and self-paced courses for honing your server administration skills.
• Apple discussion groups (discussions.info.apple.com): a way to share questions, knowledge, and advice with other administrators.
• Apple mailing list directory (www.lists.apple.com): subscribe to mailing lists so you can communicate with other administrators using email.
• Open Source website (developer.apple.com/darwin/): access to Darwin open source code, developer information, and FAQs.


Part I: Xgrid Administration

Use the chapters in this part of the guide to learn about Xgrid service and the applications and tools available for administering Xgrid.

Chapter 1 Introducing Xgrid Service

Chapter 2 Setting Up and Configuring Xgrid Service

Chapter 3 Managing a Grid

Chapter 4 Planning and Submitting Xgrid Jobs

Chapter 5 Solving Xgrid Problems

1 Introducing Xgrid Service

Use this chapter to learn about what Xgrid is and how it can help you.

You use Xgrid to create grids of multiple computers and distribute complex jobs among them for high-throughput computing.

Xgrid, a technology in Mac OS X Server and Mac OS X, simplifies deployment and management of computational grids. Xgrid enables administrators to group computers in grids or clusters, and enables users to easily submit complex computations to groups of computers (local, remote, or both), as either an ad hoc grid or a centrally managed cluster.

About Xgrid and Computational Grids

Xgrid makes it easy to turn an ad hoc group of Mac systems into a low-cost supercomputer. Xgrid is ideal for individual researchers, specialized collaborators, and application developers. For example:

• Scientists can search biological databases on a cluster of Xserve systems.
• Engineers can perform finite element analyses on their workgroup’s desktops.
• Animators can render images using Mac systems across multiple corporate locations.
• Research teams can enlist colleagues and interested laypeople in Internet-scale volunteer grids to perform long-running scientific calculations.
• Anyone needing to perform CPU-intensive calculations can simultaneously run a single job across multiple computers, dramatically improving throughput and responsiveness.

With Xgrid functionality integrated into Mac OS X Server, system administrators can quickly enable Xgrid on Mac systems throughout their company, turning idle CPU cycles into a productive cluster at no incremental cost.

How Xgrid Works

Xgrid creates multiple tasks for each job and distributes those tasks among multiple nodes. These nodes can be desktop computers running Mac OS X v10.3 or later, or server computers running Mac OS X Server v10.4 or later.

Many desktop computers sit idle during the day, in evenings, and on weekends. The assembly of these systems into a computational grid is known as desktop recovery. This method of grid construction enables you to vastly improve your computational capacity without purchasing extra hardware, and Xgrid makes the software configuration a straightforward task.

For a server to function as a controller, Xgrid requires Mac OS X Server v10.4 or later, with a minimum of 256 MB of RAM. To operate as an agent in a grid, Xgrid requires Mac OS X v10.3 or later, with a minimum of 128 MB of RAM (256 MB advisable). All Xgrid participants must have a network connection. As always, the more RAM a system has, the better it performs, particularly for high-performance computing applications.

A grid is a group of computers working together to solve a single problem. The systems in a grid can be loosely coupled, geographically dispersed and, to some extent, heterogeneous. In contrast, systems in a cluster are often homogeneous, collocated, and strictly managed.

Highly dispersed grids, such as SETI@Home, enable individuals to donate their spare processor cycles to a cause. In office environments, large rendering or simulation jobs can be distributed across all the systems left idle overnight. These can even be used to augment a dedicated computational cluster, which is available to Xgrid clients at all times.

These distinct grid configurations are explained in “Common Types of Grids and Grid Computing Styles” on page 20.


The illustration below gives an example of how a grid handles a job.

[Figure: a Client, a Controller, and distributed agents (Dedicated Desktop, Dedicated Server, Part-Time Desktop)]

1 Client submits job to Controller
2 Controller splits job into tasks, then submits tasks to Agents
3 Agents execute tasks
4 Agents return tasks to Controller
5 Controller collects tasks and returns job results to Client
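The five steps in the illustration can be sketched in a few lines of Python. This is a toy, single-process model and not how Xgrid is implemented: the function names `split_job` and `run_job`, the sum-of-squares workload, and the one-task-per-agent assignment are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def split_job(inputs: List[int], task_count: int) -> List[List[int]]:
    """Step 2: split a job's inputs into at most task_count tasks."""
    task_count = max(1, min(task_count, len(inputs)))
    # Slicing with a stride deals the inputs out like cards.
    return [inputs[i::task_count] for i in range(task_count)]


def run_job(inputs: List[int], agents: List[str]) -> Tuple[int, Dict[str, int]]:
    """Steps 1-5 for a hypothetical job that sums the squares of its inputs."""
    if not agents:
        raise ValueError("a grid needs at least one agent")
    tasks = split_job(inputs, len(agents))        # step 2: controller splits the job
    assignments = dict(zip(agents, tasks))        # step 2: one task per agent
    partials = {
        agent: sum(x * x for x in task)           # step 3: agent executes its task
        for agent, task in assignments.items()    # step 4: result returns to controller
    }
    return sum(partials.values()), partials       # step 5: controller collects results


if __name__ == "__main__":
    total, per_agent = run_job(list(range(1, 11)), ["desktop", "server", "part-time"])
    print(total, per_agent)
```

In this model the client only sees the final total; the per-agent partial results stay with the controller, which mirrors the division of labor in the illustration.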

Xgrid places no limit on the amount of computational power it can support. The performance of a grid depends on the participating systems, the software they run, and the network, among other factors. The application itself also strongly influences grid performance.

You must determine whether an application benefits from being deployed on a computational grid. In the best case, application performance scales linearly with the size of the grid. In the worst case, adding agents to a grid can cause a job to take longer than it would with fewer agents. (In such a situation, tasks become so small that the overhead of distributing the increased number of tasks outweighs the performance gain from using more agents.) Weigh these considerations when deciding how finely to divide a job.
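A rough model makes the best-case and worst-case behavior concrete. The sketch below assumes, purely for illustration, that a job divides evenly into one task per agent and that the controller pays a fixed overhead to distribute each task, one task after another. Neither the formula nor the names `completion_time` and `best_agent_count` come from Xgrid; real overheads depend on the application and the network.

```python
def completion_time(total_work_s: float, agents: int, overhead_per_task_s: float) -> float:
    """Estimated wall-clock seconds for a job split into one task per agent.

    Assumed model: distribution overhead grows with the number of tasks,
    while the compute time per agent shrinks as the work is divided.
    """
    if agents < 1:
        raise ValueError("agents must be at least 1")
    return agents * overhead_per_task_s + total_work_s / agents


def best_agent_count(total_work_s: float, overhead_per_task_s: float, max_agents: int) -> int:
    """Agent count (1..max_agents) with the lowest estimated completion time."""
    return min(
        range(1, max_agents + 1),
        key=lambda n: completion_time(total_work_s, n, overhead_per_task_s),
    )


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, completion_time(100.0, n, 1.0))
    print("best:", best_agent_count(100.0, 1.0, 128))
```

Under these assumed numbers, 100 seconds of work with 1 second of overhead per task finishes fastest with 10 agents; with 100 agents the overhead dominates and the job takes about as long as it would on a single computer.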

Many proprietary projects enable you to participate in a large computational grid. Often these projects, such as SETI@Home and FightAIDS@Home, are tied to a specific scientific purpose. They usually have easy-to-install software that enables any volunteer to participate in that particular project, and they frequently take the form of a screen saver or background process.


You don’t need to think in terms of thousands or millions of seldom-used computers to see the significance of a computational grid. For example, computers used by university students or corporate employees often work fewer hours than the hours they sit idle at night or on weekends. These computers could contribute productively to the work of a grid without diminishing their usefulness to the students or employees.

Other grid projects are designed for large-scale computational grids, such as the Globus Alliance (a group founded by universities and researchers), with flexible resource management tools and more intelligent grid deployment methods. Instead of developing neatly packaged applications for a specific grid, such projects provide comprehensive frameworks for application deployment.

Xgrid enables users to participate in a computational grid of their choice while still providing the flexibility of a more generic framework for grid developers when deploying grid applications. Xgrid provides the primary benefits of both.

The advantages of the Xgrid technology include:

• Easy grid configuration and deployment
• Straightforward yet flexible job submission
• Automatic controller discovery by agents and clients
• Flexible architecture based on open standards
• Support for the UNIX security model, including Kerberos single sign-on or regular password authentication
• Choice between a command-line interface or an API-based model for grid interaction

Common Types of Grids and Grid Computing Styles

Xgrid can be used in tightly coupled clusters, worldwide grids, and everything in between. This immense flexibility enables you to deploy grids of almost any nature. Three main topologies are commonly used for Xgrid deployments, discussed as follows:

• “Xgrid Clusters” on page 20
• “Local Grids” on page 21
• “Distributed Grids” on page 21

Xgrid Clusters

Computational clusters are sets of systems dedicated to computation. In a cluster, systems are typically co-located in a rack, connected using gigabit Ethernet or another high-performance network, and strictly managed for maximum performance.

Cluster systems are often entirely homogeneous: their operating systems are the same versions, they have the same software installed, and they generally have the same processor, disk, and RAM configurations.


Xgrid enables administrators to easily configure the distributed resource management functionality of the cluster. Each server in the system runs the agent software, and the head node in the cluster runs the controller software.

Xgrid distributes tasks across the cluster. In clusters, failure rates are generally very low. Systems are rarely, if ever, offline, and their resources are not shared with general user tasks. Clusters are the most efficient but most expensive model of distributed computing.

Local Grids

Systems that are under common administration in a company, university computer lab, or other managed environment can often be easily assembled into a grid for desktop recovery. These systems are often on a local area network (LAN) and they are generally managed by a single organization. As a result, they provide good network performance and offer substantial manageability.

Because these systems are often also used as day-to-day workstations, users can easily interrupt grid tasks by moving the mouse, resetting the system, or even accidentally disconnecting the system from the network. In such cases, a task that is part of an Xgrid job might fail, but the Xgrid controller eventually reassigns the failed task to another agent, and the job completes successfully.

In local grids, performance is limited by such situations and by the varying performance of any given agent on the grid.

Distributed Grids

When a system is permitted to donate its time, a distributed grid is formed.

The Xgrid agent enables a user to specify any IP address or host name for its controller. By specifying a grid, a user can dedicate his or her CPU time to that grid no matter where the controller is located.

The manager of the controller has no direct management control or knowledge of the agent system but is nonetheless able to harness its CPU time.

Distributed grids have very high failure rates but place a very low burden on the grid administrator. With very large jobs, high task failure rates may not substantially affect the performance of the grid if failed tasks can be rapidly reassigned to other available agents.
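The effect of reassignment can be illustrated with a small simulation. This is only a sketch of the general idea, not Xgrid's scheduling algorithm: the round-robin choice of agent, the `max_attempts` cutoff, and the `attempt_succeeds` callback (a stand-in for actually running a task) are assumptions.

```python
from collections import deque
from typing import Callable, Dict, List


def run_with_reassignment(
    tasks: List[str],
    agents: List[str],
    attempt_succeeds: Callable[[str, str], bool],
    max_attempts: int = 10,
) -> Dict[str, str]:
    """Run every task, requeueing failures so another agent can retry them.

    Returns a mapping from each task to the agent that completed it.
    """
    pending = deque((task, 0) for task in tasks)
    completed: Dict[str, str] = {}
    turn = 0
    while pending:
        task, attempts = pending.popleft()
        agent = agents[turn % len(agents)]  # hypothetical round-robin dispatch
        turn += 1
        if attempt_succeeds(task, agent):
            completed[task] = agent
        elif attempts + 1 < max_attempts:
            pending.append((task, attempts + 1))  # reassign the failed task later
        else:
            raise RuntimeError(f"task {task!r} failed {max_attempts} times")
    return completed


if __name__ == "__main__":
    # A volunteer system that always drops its task; the lab system never does.
    reliable = lambda task, agent: agent != "volunteer"
    print(run_with_reassignment(["t1", "t2", "t3"], ["lab-mac", "volunteer"], reliable))
```

Even though one of the two hypothetical agents fails every task it receives, the job still completes, at the cost of extra dispatch rounds.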

Network performance can also be a consideration because data is sent over the Internet, rather than over a local network, to agents connected to a grid. The monetary cost of such distributed grids is extremely low.


Xgrid Components

The Xgrid three-tier architecture simplifies the distribution of complicated tasks. Its user clients, grid controllers, and computational agents work together to streamline the process of assembling nodes, submitting jobs, and retrieving results.

The illustration below gives an example of the Xgrid components and the process of auto configuration for a grid.

[Figure: a Client, a Controller, and distributed agents (Dedicated Desktop, Dedicated Server, Part-time Desktop)]

1 Controller advertises via mDNS
2 Agents locate Controller using mDNS, DNS, or name/address
3 Agents and Controller mutually authenticate using passwords or single sign-on
4 Client submits using mDNS, DNS, or name/address
5 Clients and Controller mutually authenticate using passwords or single sign-on

The primary components of a computational grid perform the following functions:

• An agent runs one task at a time per CPU; a dual-processor computer can run two tasks simultaneously.
• A controller queues tasks, distributes those tasks to agents, and handles task reassignment.
• A client submits jobs to the Xgrid controller in the form of multiple tasks. (A client can be any computer running Mac OS X v10.4 or later or Mac OS X Server v10.4 or later.)

In principle, the agent, controller, and client can run on the same server, but it is often more efficient to have a dedicated controller node.
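Because an agent runs one task at a time per CPU, the number of tasks a grid can run simultaneously is the total CPU count across its agents. The following sketch works through that arithmetic; the agent names and the helper names `grid_task_slots` and `rounds_needed` are hypothetical, and the round count assumes all tasks take about the same time.

```python
from typing import Dict


def grid_task_slots(cpus_per_agent: Dict[str, int]) -> int:
    """Simultaneous tasks the grid can run: one per CPU on every agent."""
    return sum(cpus_per_agent.values())


def rounds_needed(task_count: int, slots: int) -> int:
    """Number of full passes needed to run task_count equal-length tasks."""
    if slots < 1:
        raise ValueError("the grid needs at least one task slot")
    return -(-task_count // slots)  # ceiling division


if __name__ == "__main__":
    grid = {"xserve-1": 2, "lab-imac-3": 1}  # a dual-processor server and a desktop
    slots = grid_task_slots(grid)
    print(slots, rounds_needed(10, slots))
```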


Agent

Xgrid agents run the computational tasks of a job. In Mac OS X Server, the agent is turned off by default. When an agent is turned on and becomes active at startup, it registers with a controller. (An agent can be connected to only one controller at a time.) The controller sends instructions and data to the agent as needed for the controller’s jobs. After it receives instructions from the controller, the agent performs its assigned tasks and sends the results back to the controller.

By default, an agent seeks to bind to the first available controller on the LAN. Alternatively, you can specify that the agent bind to a specific controller.

You can also specify whether an agent is always available or is available only when the computer is idle. A computer is considered idle when it has had no mouse or keyboard input; CPU and network activity are ignored when determining idleness. If a user returns to a computer that is running a grid task, the computer continues to run the task until it is finished.

By default, the agent on Mac OS X Server is dedicated and the agent on a Mac OS X computer (not a server) is configured to accept tasks only when the computer has had no user input for 15 minutes.
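The availability rules described above can be summarized as a small decision function. This is a sketch of the stated behavior only; the class name `AgentPolicy` and its methods are invented for illustration and do not correspond to any Xgrid setting or API.

```python
from dataclasses import dataclass

DEFAULT_IDLE_THRESHOLD_S = 15 * 60  # 15 minutes without user input, per the text


@dataclass
class AgentPolicy:
    dedicated: bool  # True models the Mac OS X Server default (always available)
    idle_threshold_s: int = DEFAULT_IDLE_THRESHOLD_S

    def accepts_new_task(self, seconds_since_user_input: float) -> bool:
        """Only mouse and keyboard input count; CPU and network load are ignored."""
        return self.dedicated or seconds_since_user_input >= self.idle_threshold_s

    def keeps_running_task(self, seconds_since_user_input: float) -> bool:
        """A task that has started runs to completion even if the user returns."""
        return True


if __name__ == "__main__":
    server = AgentPolicy(dedicated=True)
    desktop = AgentPolicy(dedicated=False)
    print(server.accepts_new_task(0), desktop.accepts_new_task(60), desktop.accepts_new_task(900))
```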

For details about configuring an agent, see “Configuring an Xgrid Agent (Mac OS X Server)” on page 32.

For information about managing agents, see “Managing Agents” on page 42.

Client

Any system can be an Xgrid client if it is running Mac OS X v10.4 or later and has a network connection to the Xgrid controller system. In general, the client can connect to only a single controller.

Depending on how a controller is configured, the client must supply a password or be authenticated by Kerberos (single sign-on) before submitting a job to the grid.

A user submits a job to the controller from a system running the Xgrid client software, usually a command-line tool accessed with the Terminal application. The job can specify the controller or use multicast DNS (mDNS) to dynamically discover the first available controller. When the job is complete, the controller notifies the client and the client can retrieve the results of the job.
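As a rough illustration of command-line submission, the sketch below only builds argument lists for the `xgrid` tool; it does not run anything. The option names used here (`-h`, `-p`, `-job submit`, `-job results`, `-id`) are recalled from the tool's man page and should be verified with `man xgrid` on your system. The controller host name, password, and helper function names are hypothetical.

```python
from typing import List, Optional


def _connection_args(controller: Optional[str], password: Optional[str]) -> List[str]:
    """Options that select and authenticate to a controller, if given."""
    args: List[str] = []
    if controller:
        args += ["-h", controller]  # omitted: the tool must locate a controller itself
    if password:
        args += ["-p", password]
    return args


def xgrid_submit_argv(
    command: List[str], controller: Optional[str] = None, password: Optional[str] = None
) -> List[str]:
    """Argument list that would submit `command` as a job."""
    return ["xgrid"] + _connection_args(controller, password) + ["-job", "submit"] + command


def xgrid_results_argv(
    job_id: str, controller: Optional[str] = None, password: Optional[str] = None
) -> List[str]:
    """Argument list that would retrieve the results of a finished job."""
    return ["xgrid"] + _connection_args(controller, password) + ["-job", "results", "-id", job_id]


if __name__ == "__main__":
    print(xgrid_submit_argv(["/usr/bin/cal", "2007"], controller="controller.example.com"))
    print(xgrid_results_argv("1", controller="controller.example.com"))
```

Passing a list like this to `subprocess.run` on a computer with the Xgrid client software installed would be one way to script submissions; supplying a password on the command line exposes it to other local users, so treat that part as a convenience for testing only.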

For information about client authentication to the controller, see “Setting Up Grid Authentication” on page 34.


Controller

The Xgrid controller manages the communications among the computational resources of a grid. The controller requires Mac OS X Server v10.4 or later. The controller accepts network connections from clients and agents. It receives job submissions from clients, divides the jobs into tasks, dispatches tasks to agents, and returns results to the clients.

Although there can be more than one Xgrid controller running on a subnet, there can only be one controller per logical grid. Each controller can have an arbitrary number of agents connected, but Apple has tested 128 agents per controller.

However, there is no software limitation on the number of agents, and users of Xgrid can choose to exceed 128 agents on a controller at their own risk, with a theoretical maximum equal to the number of available sockets on the controller system.

For details about setting up an Xgrid controller, see “Configuring Controller Settings” on page 30.

For information about managing controllers and grids, see “Managing the Xgrid Controller” on page 40.

Jobs

A job is a collection of execution instructions that can include data and executables. Xgrid can run scripts, utilities, and custom software (anything that doesn’t require user interaction).

A client submits a job to the grid. The controller accepts the job and its associated files, divides the job into tasks, and then distributes the tasks to agents. Agents accept the tasks, perform the calculations, and return the results to the controller, which aggregates them and returns them to the client.

For more information about jobs, see “Structuring Jobs for Xgrid” on page 47 and “Submitting a Job” on page 48.

Requirements and Capacities

Xgrid is designed to scale from small clusters of a few computers up to large organization-wide grids. Xgrid supports up to 128 agents, any number of jobs comprising up to 100,000 queued tasks, up to 128 MB of submitted data per job, and up to 128 MB of results per job. These are recommended limits and are not enforced by the software. You may choose to exceed these limits at your own risk.
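Because these limits are recommendations rather than enforced values, a planning script can flag them without refusing to proceed. The sketch below is one way to do that; the dictionary keys and the function name `exceeded_limits` are invented for illustration, and only the numeric limits come from the paragraph above.

```python
from typing import Dict, List

# Recommended (not enforced) limits stated in this guide.
RECOMMENDED_LIMITS: Dict[str, int] = {
    "agents": 128,
    "queued_tasks": 100_000,
    "submitted_mb_per_job": 128,
    "results_mb_per_job": 128,
}


def exceeded_limits(plan: Dict[str, int]) -> List[str]:
    """Names of the recommended limits that a planned grid or job exceeds.

    Values missing from the plan are treated as zero.
    """
    return sorted(
        name for name, limit in RECOMMENDED_LIMITS.items() if plan.get(name, 0) > limit
    )


if __name__ == "__main__":
    plan = {"agents": 64, "queued_tasks": 250_000, "submitted_mb_per_job": 512}
    for name in exceeded_limits(plan):
        print(f"warning: {name} exceeds the recommended limit of {RECOMMENDED_LIMITS[name]}")
```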


2 Setting Up and Configuring Xgrid Service

Use this chapter to plan your grid and set up the Xgrid agent and controller.

Xgrid simplifies deployment and management of computational grids. Using Server Admin you can configure Xgrid to set up computer groups (grids or clusters) and allow users to easily submit complex computations to these grids (local, remote, or both), as either an ad hoc grid or a centrally managed cluster.

Setup Overview

Here is an overview of the steps for setting up Xgrid service:

Step 1: Before you begin

See “Before Setting Up Xgrid Service” on page 26. Identify the Xgrid environment you need. Before configuring Xgrid, you must make some decisions about the grid.

Step 2: Turn Xgrid service on

Prior to configuring, turn on Xgrid service. See “Turning Xgrid Service On” on page 28.

Step 3: (Optional) Use the Xgrid service configuration assistant to configure Xgrid

If you choose to, you can configure Xgrid using the Xgrid service configuration assistant. This assistant helps with Xgrid configuration by automating many of the settings you make. See “Configuring Xgrid with the Xgrid Service Configuration Assistant” on page 28.

Step 4: Configure Xgrid controller settings

Configure your server as an Xgrid controller using Server Admin. See “Configuring Controller Settings” on page 30.

Step 5: Start Xgrid service

Start Xgrid service on the server using Server Admin. See “Starting Xgrid Service” on page 31.

Step 6: Configure Xgrid agent settings (Mac OS X Server)

Configure your server as an Xgrid agent using Server Admin. See “Configuring an Xgrid Agent (Mac OS X Server)” on page 32.

Step 7: Configure Xgrid agent settings (Mac OS X)

Configure computers as Xgrid agents by using Sharing preferences. See “Configuring an Xgrid Agent (Mac OS X)” on page 33.

Before Setting Up Xgrid Service

Before configuring Xgrid service, you must define the grid environment you’ll create. In particular, you must decide the following:

• The kind of authentication to use. See “Authentication Methods for Xgrid” on page 26.

• Where to host your controller. See “Hosting the Grid Controller” on page 28.

• How you will manage the controller. See “Managing Xgrid Service” on page 37 and “Monitoring Grid Activity” on page 46.

Authentication Methods for Xgrid

You can configure Xgrid with or without authentication. If you require controllers to mutually authenticate with clients and agents, you can choose Single Sign-On or Password-Based Authentication. The following authentication options are available:

• Single Sign-On

• Password-Based Authentication

• No Authentication

You set up an Xgrid controller using Server Admin. You can specify the type of authentication for agents and clients. The passwords entered in Server Admin for the controller must match those entered for each agent and client.

Consider these points when establishing passwords for agents and clients:

• Kerberos authentication (single sign-on or SSO). If you use Kerberos authentication for agents or clients, the server that’s the Xgrid controller must be configured for Kerberos, in the same realm as the server running the Kerberos Key Distribution Center (KDC), and bound to the Open Directory master.

The agent uses the host principal found in the /etc/krb5.keytab file. The controller uses the Xgrid service principal found in the /etc/krb5.keytab file.

• Agents. The agent determines the authentication method. The controller must conform to that method and password (if a password is used). When an agent is configured with a standard password (not SSO), you must use the same password for agents when you configure the controller. If the agent has specified SSO, the correct service principal and host principals must be available.

• Clients. If your server is the controller for a grid, be sure that Mac OS X and Mac OS X Server clients use the correct authentication method for the controller.

A client cannot submit a job to the controller unless the user chooses the correct authentication method and enters their password correctly, or has the correct ticket-granting ticket from Kerberos.

For more information, see “Setting Up Grid Authentication” on page 34.
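When Kerberos authentication fails, one quick check is whether the principals mentioned above are present in /etc/krb5.keytab. The following is a minimal sketch that filters the output of klist -k. The function name keytab_has_principal is hypothetical, and the xgrid/ prefix used for the Xgrid service principal is an assumption. Confirm the actual principal names in your own keytab.

```shell
#!/bin/sh
# Hypothetical helper: read `klist -k` output on standard input and
# succeed if any listed principal starts with the given prefix.
# Usage: klist -k KEYTAB | keytab_has_principal PREFIX
keytab_has_principal() {
    # Keytab entries are printed as "<KVNO> <principal>", one per line.
    grep -Eq "^[[:space:]]*[0-9]+[[:space:]]+$1"
}

# Examples (reading the keytab normally requires root):
#   sudo klist -k /etc/krb5.keytab | keytab_has_principal host/ \
#       && echo "host principal present"
#   # The "xgrid/" prefix below is an assumption; check your keytab.
#   sudo klist -k /etc/krb5.keytab | keytab_has_principal xgrid/ \
#       && echo "Xgrid service principal present"
```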

Single Sign-On (SSO)

SSO is the most powerful and flexible form of authentication. It leverages the Open Directory and Kerberos infrastructures in Mac OS X Server to manage authentication behind the scenes, without user intervention.

Each Xgrid participant must have a Kerberos principal. The clients and agents obtain ticket-granting tickets for their principals, which are used to obtain a service ticket for the controller service principal. The controller looks at the ticket granted to the client to determine the user’s principal and verifies it with the relevant service access control lists (SACLs) and groups to determine privileges.

Generally, you should use this option if any of the following conditions are true:

• You already have SSO in your environment.

• You have administrator control over all agents and clients in use.

• Jobs must run with special privileges (such as for local, network, or SAN file system access).

Password-Based Authentication

When you can’t use SSO, you can require password authentication. You may not be able to use SSO if:

• Potential Xgrid clients are not trusted by your SSO domain (or you don’t have one)

• You want to use agents across the Internet or that are outside your control

• It is an ad hoc grid, without the ability to prearrange a web of trust

In these situations, your best option is to specify a password. You have two distinct password settings: one for controller-client and one for controller-agent. For security reasons these should be different passwords.

Note: You can also create hybrid environments, such as with client-controller authentication done using passwords but controller-agent authentication done using SSO (or vice versa).

No Authentication

This option is suitable only for testing a private network in a home or a lab that is inaccessible from any untrusted computer, or when none of the jobs or the computers contain sensitive or important information.

Otherwise, do not use this option. It creates a potential security hole (because anyone can connect or run a job) and should never be used on a system exposed to the Internet, especially when potentially sensitive data is involved.

If you choose to use no authentication, agents can join the grid and clients can submit jobs to the grid without authenticating.

Hosting the Grid Controller

The primary requirement for a controller is that it must be network-accessible to clients and agents. In some cases this may mean the controller must be placed outside an organizational firewall (or inside a buffer zone); otherwise, you need to open port 4111 in the firewall so that the controller can be contacted.

It is much simpler (though not essential) for the controller to be on the same subnet as the agents and usual clients, so they can discover each other using Bonjour. If that’s not feasible, host the controller on a server with a fixed IP address and fully qualified DNS name (or alternatively, using Dynamic DNS and a service lookup entry) so that agents and clients know where to find it.
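Before pointing agents and clients at a controller behind a firewall, it can help to confirm that port 4111 is reachable from their side of the network. The following is a minimal sketch, assuming the nc utility is available on the computer you test from. The function name controller_reachable and the hostname in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a TCP connection to the Xgrid
# controller port can be opened. Port 4111 is the port named in this
# section; pass a second argument to test a different port.
# Usage: controller_reachable HOST [PORT]
controller_reachable() {
    host=$1
    port=${2:-4111}
    # -z only tests the connection; -w 5 gives up after 5 seconds.
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
        echo "$host:$port reachable"
    else
        echo "$host:$port not reachable"
        return 1
    fi
}

# Example (hypothetical hostname):
#   controller_reachable controller.example.com
```

A successful connection shows only that the port is open. It does not show that the controller will accept the authentication method the agent or client is configured to use.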

Turning Xgrid Service On

Before you can configure Xgrid settings, you must turn Xgrid service on in Server Admin.

To turn Xgrid service on:

1 Open Server Admin and connect to the server.

2 Click Settings.

3 Click Services.

4 Select the Xgrid checkbox.

5 Click Save.

Configuring Xgrid with the Xgrid Service Configuration Assistant

You can set up Xgrid service by configuring the controller and agent using the Xgrid service configuration assistant. This optional configuration assistant guides you through setting up a server to host a grid or join an existing grid.

Before this assistant proceeds, your server must have access to a directory server that provides Kerberos services.

Configuring Xgrid to Host a Grid Using the Xgrid Service Configuration Assistant

Use the Xgrid service configuration assistant to configure the Xgrid agent and controller to run on this server. This also configures a network file system.

To set up Xgrid to host a grid using the Xgrid service configuration assistant:

1 Open Server Admin and connect to the server.

2 Click the triangle to the left of the server.

The list of services appears.

3 In the expanded Servers list, click Xgrid.

4 Click Overview.

5 Click Configure Xgrid Service (at the lower right).

This opens the Xgrid service configuration assistant.

6 Click Continue.

7 Choose “Host a grid,” then click Continue.

8 Enter the username and password for the directory administrator to authenticate with the directory domain displayed, then click Continue.

9 Review and confirm your configuration settings, then click Continue.

This restarts Xgrid service using your settings.

10 Click Close.

Configuring Xgrid to Join a Grid Using Xgrid Service Configuration Assistant

Use the Xgrid service configuration assistant to configure the Xgrid agent to run on this server. Joining a grid means that an agent is set up on this server and is bound to an existing controller.

To join a grid using the Xgrid service configuration assistant:

1 Open Server Admin and connect to the server.

2 Click the triangle to the left of the server.

The list of services appears.

3 In the expanded Servers list, click Xgrid.

4 Click Overview.

5 Click Configure Xgrid Service (at the lower right).

This opens the Xgrid service configuration assistant.

6 Click Continue.

7 Choose “Join a grid,” then click Continue.

8 Specify the controller you want to bind your agent to.

Select “Browse Bonjour-discoverable controllers” to view and select from available controllers.

Select “Use controller with hostname” to enter the hostname of a specific controller.

9 Click Continue.

10 Review and confirm your configuration settings, then click Continue.

This restarts Xgrid service using your settings.

11 Click Close.

Setting Up Xgrid Service

You set up Xgrid service by configuring two groups of settings on the Settings pane for Xgrid service in Server Admin:

• Controller. Use to configure your server as an Xgrid controller and set client and agent authentication.

• Agent. Use to configure your server as an Xgrid agent, to specify the controller, and to set controller authentication.

The following section describes how to configure these settings. An additional section tells you how to start Xgrid service when you finish. (By default, the Xgrid controller and agent are disabled.)

Important: If you specify a password, the agent and controller must use the same password or must authenticate using Kerberos (SSO). For information about authentication options, see “Setting Passwords for Xgrid” on page 34.

Xgrid and Multiple Network Interfaces

On a server with multiple network interfaces, Mac OS X Server makes Xgrid service available over all interfaces. You can’t configure Xgrid service separately for each interface.

Configuring Controller Settings

You use Server Admin to configure an Xgrid controller. When configuring the controller, you can also set a password for any agent using the grid and for any client that submits a job to the grid.

To configure an Xgrid controller:

1 Open Server Admin and connect to the server.

2 Click the triangle to the left of the server.

The list of services appears.

3 In the expanded Servers list, click Xgrid.

4 Click Settings.

5 Click Controller.

6 Click “Enable controller service.”

7 From the Client Authentication pop-up menu, choose one of the following authentication options for clients and, if you choose Password, enter the password.

• Password requires that the client and controller use the same password.

• Kerberos uses SSO authentication for the client’s user.

• None does not require a password for the client. This option provides no protection from potentially malicious use of your grid. With no authentication, anyone can connect to the controller and submit a job.

For details about password options, see “Setting Up Grid Authentication” on page 34.

8 From the Agent Authentication pop-up menu, choose from the following authentication options for agents and enter the password.

• Password requires that the agent and controller use the same password.

• Kerberos uses SSO authentication for the agent’s administrator.

• Any uses any authentication available for the agent’s administrator.

• None does not require a password for the agent. This option provides no protection from potentially malicious use of your grid. With no authentication, a malicious agent could receive tasks and potentially access sensitive data.

For information about password options, see “Setting Up Grid Authentication” on page 34.

9 Click Save.

Important: If you require authentication, the agent and controller must use the same password or must authenticate using Kerberos (SSO). For information about authentication options, see “Setting Up Grid Authentication” on page 34.

Starting Xgrid Service

Use Server Admin to start Xgrid service.

The Xgrid service must be running for your server to control a grid or participate in a grid as an agent.

For details about using the server as an agent and controller, see “Configuring an Xgrid Agent (Mac OS X Server)” on page 32 and “Configuring Controller Settings” on page 30.

After you start Xgrid, it restarts when the server is restarted.

To start Xgrid service:

1 Open Server Admin and connect to the server.

2 Click the triangle to the left of the server.

The list of services appears.

3 In the expanded Servers list, click Xgrid.

4 Click the Start Xgrid button (below the Servers list).

Configuring an Xgrid Agent (Mac OS X Server)

You use Server Admin to set up your server as an Xgrid agent. In addition, you can associate the agent with a specific controller or permit it to join a grid, specify when the agent accepts tasks, and set a password that the controller must recognize.

To configure an Xgrid agent on the server:

1 Open Server Admin and connect to the server.

2 Click the triangle to the left of the server.

The list of services appears.

3 In the expanded Servers list, click Xgrid.

4 Click Settings.

5 Click Agent.

6 Click “Enable agent service.”

7 Specify a controller by choosing its name in the Controller pop-up menu or by entering the controller name.

By default, the agent uses the first available controller.

Note: An agent can find a controller in one of three ways: by a specific hostname or IP address, by using the first available controller that advertises on Bonjour on the local subnet, or by a specific Bonjour service name.

8 Specify when the agent will accept tasks.

Tasks can be accepted when the computer is idle or always.

A computer is considered idle when it has no mouse or keyboard input; CPU and network activity are ignored. If a user returns to a computer that is running a grid task, the computer continues to run the task until it is finished.

9 From the pop-up menu, choose an authentication option and, if the option requires one, enter the password.

Choosing no authentication provides no protection from potentially malicious use of your grid. With no authentication, a malicious agent could receive tasks and potentially access sensitive data.

For details, see “Setting Up Grid Authentication” on page 34.

10 Click Save.

Important: If you require authentication, the agent and controller must use the same password or must authenticate using Kerberos (SSO). For details about authentication options, see “Setting Up Grid Authentication” on page 34.

Configuring an Xgrid Agent (Mac OS X)

You use Sharing preferences to set up client computers as Xgrid agents. In addition, you can associate the agent with a specific controller or permit it to join any grid, specify when the agent accepts tasks, and set a password that the controller must recognize.

To configure an Xgrid agent on a client:

1 On the client computer, open Sharing preferences and click Services.

2 Click Xgrid and then click Configure.

3 Specify a controller by choosing its name in the Controller pop-up menu or by entering the controller name.

By default, the agent uses the first available controller.

4 Specify when the agent will accept tasks.

Tasks can be accepted when the computer is idle or always.

5 Choose an authentication option from the pop-up menu and, if the option requires one, enter the password.

Choosing no authentication provides no protection from potentially malicious use of your grid. With no authentication, a malicious agent could receive tasks and potentially access sensitive data.

For more information, see “Setting Up Grid Authentication” on page 34.

6 Click OK.

Important: If you require authentication, the agent and controller must use the same password or must authenticate using Kerberos (SSO). For more information about authentication options, see “Setting Up Grid Authentication” on page 34.

7 Click Start to turn Xgrid sharing on.

Setting Up Grid Authentication

You can configure Xgrid to require authentication of controllers, clients, and agents. For more information, see “Authentication Methods for Xgrid” on page 26.

Setting Up Kerberos for Xgrid

You use Server Admin to configure Kerberos as the authentication method for your Xgrid. Kerberos authentication uses SSO.

To configure Kerberos authentication:

1 Open Server Admin and connect to the server.

2 Click the triangle to the left of the server.

The list of services appears.

3 In the expanded Servers list, click Xgrid.

4 Click Settings.

5 Click Agent.

6 Click “Enable agent service.”

7 For the authentication option for the agent, choose Kerberos from the Controller Authentication pop-up menu.

8 Click Controller.

9 Click “Enable controller service.”

10 For the authentication option for the client, choose Kerberos from the Client Authentication pop-up menu.

11 For the authentication option for the agent, choose Kerberos from the Agent Authentication pop-up menu.

12 Click Save and restart the service.

Setting Passwords for Xgrid

You use Server Admin to configure your Xgrid controllers to authenticate clients and agents using password authentication. Password authentication requires that the agent and controller use the same password.

You specify password options in Server Admin as part of configuring the agent and controller. See “Configuring an Xgrid Agent (Mac OS X Server)” on page 32 and “Configuring Controller Settings” on page 30.

To configure password authentication:

1 Open Server Admin and connect to the server.

2 Click the triangle to the left of the server.

The list of services appears.

3 In the expanded Servers list, click Xgrid.

4 Click Settings.

5 Click Agent.

6 Click “Enable agent service.”

7 For the authentication option for the agent, choose Password from the Controller Authentication pop-up menu and enter a password.

8 Click Controller.

9 Click “Enable controller service.”

10 For the authentication option for the client, choose Password from the Client Authentication pop-up menu and enter a password.

11 For the authentication option for the agent, choose Password from the Agent Authentication pop-up menu and enter a password.

You can also choose Any from the Agent Authentication pop-up menu to permit any method of authentication.

Note: Password authentication requires that the agent and controller use the same password.

12 Click Save and restart the service.

Managing Client Access

Server Admin in Mac OS X Server enables you to configure service access control lists (SACLs), which enable you to specify which users and groups have access to Xgrid and which administrators can manage it.

Using SACLs enables you to add another layer of access control in addition to password and Kerberos authentication. Only users and groups listed in an SACL have access to its corresponding service.

Setting SACL Permissions for Users and Groups

You use Server Admin to set SACL permissions for users and groups to access Xgrid service.

To set user and group SACL permissions for Xgrid service:

1 Open Server Admin and connect to the server.

2 Click Settings.

3 Click Access.

4 Click Services.

5 Select the level of restriction you want for the services:

To restrict access to all services, select “For all services.”

To set access permissions for individual services, select “For selected services below,” then select a service from the Service list.

6 To provide unrestricted access to services, click “Allow all users and groups.”

7 To restrict access to users and groups:

a Select “Allow only users and groups below.”

b Click the Add (+) button to open the Users and Groups drawer.

c Drag users and groups from the Users and Groups drawer to the list.

8 Click Save.

Setting SACL Permissions for Administrators

Use Server Admin to set SACL permissions for administrators to monitor and manage Xgrid service.

To set administrator SACL permissions for Xgrid service:

1 Open Server Admin and connect to the server.

2 Click Settings.

3 Click Access.

4 Click Administrators.

5 Select the level of restriction you want for the services:

To restrict access to all services, select “For all services.”

To set access permissions for individual services, select “For selected services below,” then select a service from the Service list.

6 Open the Users and Groups drawer by clicking the Add (+) button.

7 From the Users and Groups drawer, drag users and groups to the list.

8 Set user permissions:

To grant administrator access, choose Administer from the Permission pop-up menu next to the user name.

To grant monitoring access, choose Monitor from the Permission pop-up menu next to the user name.

9 Click Save.

Managing Xgrid Service

This section describes typical day-to-day tasks you might perform after you set up Xgrid service on your server. For information about initial setup, see “Setting Up Xgrid Service” on page 30.

You can monitor and manage grids using Xgrid Admin. For more information, see Chapter 3, “Managing a Grid.”

Viewing Xgrid Service Status

You can use Server Admin to view the status of Xgrid service.

To view Xgrid service status:

1 Open Server Admin and connect to the server.

2 Click the triangle to the left of the server.

The list of services appears.

3 From the expanded Servers list, select Xgrid.

4 Click Overview to see whether the service is running, when it started, agent and controller information, the number of jobs running and pending, and the amount of processor power available and used.

5 Click Logs to review the system, controller, and agent logs.

Use the View pop-up menu to choose which log to view.

Viewing Xgrid Service Logs

You can use Server Admin to view the Xgrid system, controller, and agent logs for Xgrid service.

To view logs:

1 Open Server Admin and connect to the server.

2 Click the triangle to the left of the server.

The list of services appears.

3 From the expanded Servers list, select Xgrid.

4 Click Logs, then use the Show pop-up menu to choose System Log (Xgrid), Xgrid Controller Log, or Xgrid Agent Log.

To search for specific entries, use the filter field above the log.

From the Command Line

You can also view the Xgrid service log at /var/log/system.log using the cat or tail commands in Terminal.
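Because system.log contains entries from every service, filtering it is usually more practical than reading it whole. The following is a minimal sketch. The function name xgrid_log_tail is hypothetical, and the filter assumes that Xgrid entries contain the string “xgrid” (for example, in the name of the logging process).

```shell
#!/bin/sh
# Hypothetical helper: print the most recent log lines that mention
# Xgrid. Defaults to /var/log/system.log, the path given above.
# Usage: xgrid_log_tail [COUNT] [LOGFILE]
xgrid_log_tail() {
    count=${1:-20}
    logfile=${2:-/var/log/system.log}
    # Assumption: Xgrid entries contain "xgrid" in any letter case.
    grep -i 'xgrid' "$logfile" | tail -n "$count"
}

# To follow new entries as they arrive instead:
#   tail -f /var/log/system.log | grep -i xgrid
```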

Stopping Xgrid Service

You use Server Admin to stop Xgrid service.

To stop Xgrid service:

1 Open Server Admin and connect to the server.

2 Click the triangle to the left of the server.

The list of services appears.

3 From the expanded Servers list, select Xgrid.

4 Click the Stop Xgrid button (below the Servers list).

From the Command Line

You can also stop Xgrid service immediately by using the serveradmin command in Terminal.
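As a sketch of what that looks like, the wrapper below passes a service action to serveradmin for the xgrid service. The function name xgrid_service and the SERVERADMIN override are hypothetical conveniences. The override lets you print the command instead of running it. serveradmin must be run as root.

```shell
#!/bin/sh
# Hypothetical wrapper around serveradmin for Xgrid service.
# Set SERVERADMIN=echo to print the arguments instead of running them.
# Usage: xgrid_service start|stop|status|fullstatus
xgrid_service() {
    case $1 in
        start|stop|status|fullstatus)
            "${SERVERADMIN:-serveradmin}" "$1" xgrid
            ;;
        *)
            echo "usage: xgrid_service start|stop|status|fullstatus" >&2
            return 2
            ;;
    esac
}

# Equivalent direct command, run on the server:
#   sudo serveradmin stop xgrid
```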

3 Managing a Grid

Use this chapter to learn how to use the Xgrid Admin application to manage grids, add controllers and agents, and work with jobs.

After you set up an Xgrid controller, you can use Xgrid Admin to manage a grid. You can use Xgrid Admin on the server or on a remote computer that is running Mac OS X v10.4 or later.

You can manage one or more computational grids with Xgrid Admin. A computational grid is a fixed group of agents with a dedicated queue. There can be multiple grids per controller but an agent can belong to only one grid. You cannot move an agent between grids while a job (or a task) is running.

Using Xgrid Admin

Xgrid Admin is a tool you use to monitor one or more grids and manage agents and jobs.

With Xgrid Admin, you can:

• Check the status of a grid and its activity, including the number of agents working and available, processing power in use and available, and the number of jobs running and pending

• Add or remove controllers and grids to manage

• See a list of agents in a grid and the CPU power available and in use for each agent

• Add or remove agents in a grid

• See a list of jobs in a grid, the date and time each job was submitted, its progress, and the active CPU power for the job

• Remove jobs in a grid

• Stop a job in progress

• Restart a job that was stopped or is complete

Xgrid Admin provides controls in its graphical interface and menu commands for all of its options.

Note: You can also use the Xgrid command-line tool to perform these tasks. For more information about using the command-line tool, see Chapter 4, “Planning and Submitting Xgrid Jobs.”
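As a brief illustration of the job-related tasks, the sketch below wraps a few xgrid invocations. The function name xgrid_job, the CONTROLLER variable and its default hostname, and the XGRID override are hypothetical. The options shown (-h, -job list, -job stop, -job restart, -job delete, -id) are written as the author of this sketch understands the tool; check man xgrid for the options your version supports.

```shell
#!/bin/sh
# Hypothetical wrapper for job management with the xgrid tool.
# CONTROLLER is a placeholder hostname. Set XGRID=echo to print the
# arguments instead of contacting a controller.
CONTROLLER=${CONTROLLER:-controller.example.com}

# Usage: xgrid_job list
#        xgrid_job stop|restart|delete JOB_ID
xgrid_job() {
    case $1 in
        list)
            "${XGRID:-xgrid}" -h "$CONTROLLER" -job list
            ;;
        stop|restart|delete)
            if [ -z "$2" ]; then
                echo "job identifier required" >&2
                return 2
            fi
            "${XGRID:-xgrid}" -h "$CONTROLLER" -job "$1" -id "$2"
            ;;
        *)
            echo "usage: xgrid_job list | stop|restart|delete JOB_ID" >&2
            return 2
            ;;
    esac
}
```

If the controller requires password authentication, the tool also needs the authentication method and password (the -auth and -p options), which this sketch leaves out for brevity.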

Status Indicators in Xgrid Admin

Xgrid Admin provides status indicators, which are small color bubbles indicating the status of controllers, agents, and jobs. The color indicators are:

• Colorless = controller or agent is offline, job is pending

• Gray = job is submitting

• Green = controller is connected, agent is working, job is running

• Yellow = agent is available but not running

• Red = agent is unavailable, job is failed or canceled

• Blue = job is complete

Managing the Xgrid Controller

In general, you manage the Xgrid controller like any other service running on Mac OS X Server, using Server Admin to manage which processes are running and using Xgrid Admin to manage the agent and job queues on the controller.

The amount of management required also depends on how many queues you have and the number (and temperament) of the users who submit jobs.

Xgrid uses a simple first-in, first-out (FIFO) queue for scheduling each grid, which means that as the administrator you must obtain your colleagues’ cooperation to make sure resources are allocated correctly among multiple users.

For more information, see the following sections:

Connecting to an Xgrid Controller

You use Xgrid Admin to connect to an Xgrid controller. The controller must be reachable over the network from the administrative computer running Xgrid Admin.

After Xgrid Admin is connected to the controller, you can view the status of its grid and manage its agents and jobs.

To connect to an Xgrid controller:

1 Open Xgrid Admin and do one of the following:

• From the pop-up menu, choose the controller or enter its name and click Connect.

• In the Controllers and Grids list, select the controller name and click Connect.

2 If necessary, select the correct authentication option, enter a password, and then click OK.

Disconnecting from an Xgrid Controller

You use Xgrid Admin to disconnect from an Xgrid controller in the Controllers and Grids list.

To disconnect from an Xgrid controller:

1 Open Xgrid Admin.

2 In the Controllers and Grids list, select a controller.

3 Click Disconnect.

Adding an Xgrid Controller

You use Xgrid Admin to add an Xgrid controller to the Controllers and Grids list.

To add an Xgrid controller to the monitoring list:

1 Open Xgrid Admin.

2 Click Add Controller.

3 From the pop-up menu, choose a controller or enter its name and click Connect.

4 If necessary, select the correct authentication option, enter a password, and then click OK.

Removing an Xgrid Controller

You can easily remove an Xgrid controller from the Controllers and Grids list in Xgrid Admin.

To remove an Xgrid controller:

1 Open Xgrid Admin.

2 In the Controllers and Grids list, select a controller.

3 Click Remove Controller.

Managing Agents

Use Xgrid Admin to view, add, or delete agents. Xgrid Admin also uses status indicators to display the status of agents.

Although Server Admin provides a simple interface for enabling Xgrid services on one server or across a rack of Xserve systems, it doesn’t provide a way to configure Xgrid on desktop computers running Mac OS X v10.3 or later.

If you are relying on volunteers to provide desktop agents, you can send instructions for enabling Xgrid from the Sharing pane of System Preferences.

If the volunteers are using Mac OS X v10.3, you must first download the Xgrid Agent for Mac OS X v10.3 and then use the Xgrid pane of System Preferences. You can download the Xgrid Agent for Mac OS X v10.3 from:

www.apple.com/server/macosx/xgrid.html

If you administer a group of computers and want the computers to participate in a grid using Xgrid, you can use the following methods:

• Apple Remote Desktop

• SSH

• NetBoot or NetInstall

Apple Remote Desktop

Apple Remote Desktop (ARD) v2.1 is a separate product available from Apple that integrates common administrative tasks across multiple computers (such as screen sharing, software installation, running UNIX scripts, and so on).

You can use ARD to remotely run System Preferences on each computer but it is usually simpler to change the preferences once and then push the new preferences file (/Library/Preferences/com.apple.xgrid.agent.plist) to all relevant nodes.

For more information, see the Apple Remote Desktop Administration guide at www.apple.com/server/documentation.

SSH

If you don’t have ARD but you’ve set up SSH logins, you can do the same thing as ARD using the scp command-line tool (or rsync, if you’ve set that up). You can also use the xgridctl tool with the following command:

$ ssh root@remotehost xgridctl agent start

For more details, see the man pages for ssh, scp, sftp, or rsync in the Terminal application.
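Combining the two SSH-based steps, the following minimal sketch copies a prepared agent preferences file to each node and then starts the agent there. The function name push_agent_prefs, the RUN override, and the hostnames in the example are hypothetical. The sketch assumes root SSH logins are already set up and that the local preferences file is the one you want on every node.

```shell
#!/bin/sh
# Hypothetical helper: push the agent preferences file to each host,
# then start the Xgrid agent on that host.
# Set RUN=echo for a dry run that prints the commands instead.
# Usage: push_agent_prefs HOST...
PLIST=/Library/Preferences/com.apple.xgrid.agent.plist

push_agent_prefs() {
    for host in "$@"; do
        # Copy the local preferences file to the same path on the host.
        ${RUN:-} scp "$PLIST" "root@$host:$PLIST" || return 1
        # Start the agent with the xgridctl command shown above.
        ${RUN:-} ssh "root@$host" xgridctl agent start || return 1
    done
}

# Example dry run (hypothetical hostnames):
#   RUN=echo push_agent_prefs node1.example.com node2.example.com
```

The function stops at the first host that fails, so a partial rollout is easy to spot and resume.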

NetBoot or Network Install

For large networks, it often makes sense to use a common system image that is mounted or installed by each agent to configure the agents.

Although Xgrid isn’t reason enough to use NetBoot, consider whether using Network Install would simplify your general administration tasks. If you use NetBoot with Xgrid, all agents must have unique hostnames and must keep all files intact between reboots. For more information, see System Imaging and Software Update Administration at www.apple.com/server/documentation.

Viewing a List of Agents

You can see a list of agents for a controller in Xgrid Admin.

To see a list of agents for an Xgrid controller:

1 Open Xgrid Admin.

2 In the Controllers and Grids list, select the grid.

3 Click Agents.

4 Select an agent in the list to see information about the CPU power and processors it uses.

The color bubble to the left of the name shows each agent’s status. For details, see “Status Indicators in Xgrid Admin” on page 40.

Adding an Agent

You can add an agent to a controller in Xgrid Admin. You can add agents that are offline. The agents will be available to the controller when the computers are online or when the controller administrator makes the agents active.

To add an agent:

1 Open Xgrid Admin.

2 In the Controllers and Grids list, select the controller.

3 Click Agents.

4 Click the Add (+) button below the list of agents.

5 Enter a name for the agent and click OK.

The agent is added to the list. The color bubble to the left of the name shows the agent’s status. For details, see “Status Indicators in Xgrid Admin” on page 40.

Deleting an Agent

You can delete an agent for an Xgrid controller in Xgrid Admin.

To delete an agent:

1 Open Xgrid Admin.

2 In the Controllers and Grids list, select the controller.

3 Click Agents.

4 Click the Delete (–) button below the list of agents.

Note: If you delete an agent that you know is on the local subnet and is configured to attach to that controller, wait a few moments and it will reappear in the list. If the agent doesn’t reappear, use the Add (+) button and enter its name to retrieve it.

Managing Jobs

You use Xgrid Admin to manage jobs after they are submitted by a client.

You cannot move a job between grids.

Viewing a List of Jobs

You can see a list of jobs in Xgrid Admin.

To see a list of jobs:

1 Open Xgrid Admin.

2 In the Controllers and Grids list, select the controller.

3 Click Jobs.

4 Select a job in the list to see details of that job.

Stopping a Job

You can stop a job in Xgrid Admin.

To stop a job:

1 Open Xgrid Admin.

2 In the Controllers and Grids list, select the controller.

3 Click Jobs.

4 Select the job you want to stop.

5 Click the Stop button below the list of jobs.

Repeating or Restarting a Job

You can repeat a job or restart a stopped job in Xgrid Admin.

To repeat or restart a job:

1 Open Xgrid Admin.

2 In the Controllers and Grids list, select the controller.

3 Click Jobs.

4 Select the job you want to repeat or restart.

5 Click the Start button below the list of jobs.

Deleting a Job

You can delete a job in Xgrid Admin.

To delete a job:

1 Open Xgrid Admin.

2 In the Controllers and Grids list, select the controller.

3 Click Jobs.

4 Select the job you want to delete.

5 Click the Delete (–) button below the list of jobs.

Adding a Grid

You use Xgrid Admin to add a grid to an Xgrid controller in the Controllers and Grids list.

To add a grid:

1 Open Xgrid Admin.

2 Select the Xgrid controller you want to add the grid to.

3 Click the Add (+) button below the Controllers and Grids list.

4 In the pop-up menu, enter a name for the new grid and click OK.

Deleting a Grid

You use Xgrid Admin to remove a grid from an Xgrid controller in the Controllers and Grids list.

To delete a grid:

1 Open Xgrid Admin.

2 In the Controllers and Grids list, select the grid.

3 Click the Action pop-up menu below the Controllers and Grids list and select Remove Grid.

4 Click OK.

Monitoring Grid Activity

You can quickly view the activity of a grid in Xgrid Admin. You can also view agents and job activity using Xgrid Admin. For more information, see “Viewing a List of Agents” on page 43 and “Viewing a List of Jobs” on page 44.

To monitor the activity of a grid:

1 Open Xgrid Admin.

2 In the Controllers and Grids list, select the Xgrid controller.

3 Click Overview to see the number of agents, the amount of processor power available and used, and the number of jobs running and pending.

4 Planning and Submitting Xgrid Jobs

Use this chapter to learn how to use Xgrid command-line tools and the Terminal application to submit jobs to a grid and to get information about jobs.

After you configure an Xgrid controller and add agents to a grid, you can use the Terminal application to send a job to the grid.

Structuring Jobs for Xgrid

Carefully planning and structuring a job can result in efficient use of the grid. For example, the best structure for a job that requires multiple searches of a large database may be to divide the database into multiple sections and provide a section to each agent in the grid.
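As a rough illustration of dividing a data set into sections, the following sketch splits a line-oriented file into at most one section per agent by using the standard wc and split tools. The function name split_for_agents and the file names are hypothetical, and a real database would need its own export or partitioning step first.

```shell
#!/bin/sh
# Sketch: divide a line-oriented data file into at most N sections,
# one per agent. The function name and file names are hypothetical.

split_for_agents() {
    file=$1      # input file, one record per line
    agents=$2    # number of agents in the grid
    prefix=$3    # prefix for the section files (for example, section.)

    # Count records, then round up so no more than $agents files result.
    total=$(($(wc -l < "$file")))
    per_section=$(( (total + agents - 1) / agents ))
    [ "$per_section" -gt 0 ] || return 1   # nothing to split

    split -l "$per_section" "$file" "$prefix"
}

# Example: split_for_agents records.txt 4 section.
# produces section.aa, section.ab, and so on.
```

Each resulting section file can then be supplied as the input for a separate task or job.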

About Job Styles

Different styles of jobs often require different handling. Similarly, the way a job is structured influences how efficiently the grid completes it.

Consider the following job styles:

• Everything in one single large job, with numerous small tasks.

• Everything divided into medium-sized jobs, where each job has roughly as many tasks as there are nodes in the grid. (This type of job is usually created by a meta job script, which divides the job into smaller chunks, each of which is a job in itself.)

• An entire workflow composed of several interrelated jobs.

Deciding how to structure a job can involve experimentation to discover the best way to complete it.

For example, you might create a simple, small version of a job in two styles, such as by planning all tasks in one job or by subdividing into multiple tiny jobs. Running both experimental jobs under similar conditions in the grid will give you a good idea of which job style is better suited to those conditions.

About Job Failure

Xgrid jobs can rely on message-passing interface (MPI) APIs. For jobs that rely on MPI, if a single task fails, the entire job fails and must be resubmitted. Therefore, you should not use MPI-based jobs on grids with high task-failure rates.

Jobs that are more parallel in nature are generally unaffected by occasional task failures. Tasks are typically reassigned to other available agents to complete the job. Most jobs fall into this category.

Submitting a Job

You submit jobs to a grid using the xgrid command-line tool in Terminal. Example code for alternative methods of submitting jobs is available on the Apple developer website (developer.apple.com). If you have Developer Tools installed, you can also view the examples located in /Developer/Examples/Xgrid/.

For more information about the syntax and options for the xgrid command-line tool, see the xgrid man pages.

Some developers and organizations offer specialized applications for submitting jobs to a grid. Or you can create such an application using Apple’s developer tools for Xgrid.

When determining whether to use the xgrid command-line tool or another method for submitting jobs, consider these points:

• If the job is simple, use the command-line tool.

• If you use a shell script, use the command-line tool.

• If you want to use Xgrid as part of an application with a graphical user interface (GUI), use the Xgrid API to create the GUI or incorporate it in an existing application. For more information about the API, see the Xgrid Reference at:

developer.apple.com/documentation

Examples of Xgrid Job Submission and Results Retrieval

The following Terminal commands are examples of jobs a client can submit to the controller.

$ xgrid -h <controller> -p <password> -job submit /bin/echo "Hello, World!"

This job runs /bin/echo on the controller and agent systems with the “Hello, World!” parameter.

$ xgrid -h <controller> -p <password> -job results -id <id>

This command shows the results of the job with the id indicated.

For a shell script named hello.sh that is marked executable:

#!/bin/sh

/bin/echo "Hello, World!"

The following command copies the shell script hello.sh to the Xgrid controller and agent systems and runs the script. /bin/echo must be installed on the agent system. The hello.sh script must have its executable bit set before it can execute.

$ xgrid -h <controller> -p <password> -job submit hello.sh

Viewing Job Status

You can monitor jobs in Xgrid Admin (for details, see “Managing Jobs” on page 44) or with the command-line tool.

The following commands in Terminal provide job status:

$ xgrid -h <controller> -p <password> -job list

$ xgrid -h <controller> -p <password> -job attributes -id <job-id>

Retrieving Job Results

You can retrieve job results using the command-line tool.

The following commands in Terminal retrieve job results.

$ xgrid -h <controller> -p <password> -job results

$ xgrid -h <controller> -p <password> -job results -id <job-id>
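The submit, status, and results commands in this chapter can be chained into one script. The following is a minimal sketch, not a definitive implementation. It assumes that "-job submit" prints text containing "jobIdentifier = <number>;" and that "-job attributes" prints a "jobStatus = <word>;" line with values such as Finished, Failed, or Canceled. Verify both against the output of your own xgrid tool. CONTROLLER and PASSWORD are placeholder variables, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: submit a job, wait for it to finish, then fetch its results.
# CONTROLLER and PASSWORD are placeholders; the output formats parsed
# below are assumptions and should be checked against your xgrid output.

# Extract the numeric ID from text such as "jobIdentifier = 42;".
parse_job_id() {
    sed -n 's/.*jobIdentifier *= *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Extract the status word from text such as "jobStatus = Finished;".
parse_job_status() {
    sed -n 's/.*jobStatus *= *\([A-Za-z]*\).*/\1/p' | head -n 1
}

submit_and_wait() {
    # Arguments: the command to run on the grid, and its arguments.
    id=$(xgrid -h "$CONTROLLER" -p "$PASSWORD" -job submit "$@" | parse_job_id)
    [ -n "$id" ] || { echo "could not read a job ID" >&2; return 1; }

    # Poll until the job reports a final state (status names are assumed).
    while :; do
        status=$(xgrid -h "$CONTROLLER" -p "$PASSWORD" \
                     -job attributes -id "$id" | parse_job_status)
        case $status in
            Finished) break ;;
            Failed|Canceled) echo "job $id: $status" >&2; return 1 ;;
        esac
        sleep 5
    done

    xgrid -h "$CONTROLLER" -p "$PASSWORD" -job results -id "$id"
}

# Example: submit_and_wait /bin/echo "Hello, World!"
```

Keeping the parsing in small separate functions makes it easy to adjust the patterns if the tool's output differs from what is assumed here.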

5 Solving Xgrid Problems

Use this chapter to help solve common problems you might encounter and questions you might have while working with Xgrid service.

This section contains answers to common problems and questions.

If Your Agents Can’t Connect to the Xgrid Controller

If an agent is a server, make sure the agent service is enabled and the Xgrid service is started. The Xgrid controller is the only component of Xgrid that has an open port (port 4111) and requires a firewall opening.

This means the Xgrid controller is the only component that advertises on or responds to queries over Bonjour. When enabling the controller, make sure firewall port 4111 is open on your computer’s firewall (enabled in the Sharing Pane of System Preferences) or your corporate firewall (if accepting agents or clients outside your organization).

Agents and clients locate the controller through a Bonjour lookup or an explicit hostname or IP address, then initiate a connection to the controller over a user port, avoiding the need to perform privileged operations or open the firewall.

If You Use Xgrid over SSH

The simplest way to secure Xgrid using SSH is to create a tunnel from the client or the agent to the controller:

$ ssh user@controller.hostname.com -L 4111:controller.hostname.com:4111

Then, have the agent or client connect to localhost instead of the controller. SSH forwards the connection through the tunnel to the controller. You can use other ports on the local machine and even tunnel through an intermediary host.

To run an Xgrid agent over an SSH tunnel as a particular user:

Using Terminal, enter the following:

$ ssh -R 20000:192.168.1.100:4111 user@192.168.1.102 /usr/libexec/xgrid/GridAgent -ServiceName localhost:20000 -RequireControllerPassword NO -UsesRendezvous NO -OnlyWhenIdle NO -BindToFirstAvailable NO

In this command, 20000 is the port to tunnel through the ssh connection, 192.168.1.100:4111 is the address and port number of the controller, user is the name of the user to connect, and 192.168.1.102 is the address of the remote computer to run the agent.

If You Run Tasks on Multi-CPU Machines

By default, each Xgrid agent (one per machine) accepts as many tasks as there are CPUs on that host, as reported by:

$ sysctl hw.ncpu

Agents assume that tasks are single-threaded, so they will run two tasks to make best use of a dual-CPU system. To run multithreaded tasks that take up both CPUs, edit the agent configuration file /Library/Preferences/com.apple.xgrid.agent.plist.

To make the agent always accept only a single task, change the MaximumTaskCount line to:

MaximumTaskCount=1

Note: This must be done explicitly for each agent, and is permanent until reversed. You can’t specify this kind of constraint as part of a job submission.
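One possible way to change the MaximumTaskCount setting from the command line is the defaults tool, sketched below. This is an assumption-laden example rather than a documented procedure: it assumes the key is stored as an integer in the agent preferences file, that "xgridctl agent restart" is a valid action, and that the commands are run as root on the agent computer. The function name set_max_tasks and the RUN variable are hypothetical. With RUN left at its default, the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch: limit an agent to a fixed number of tasks, then restart it.
# Assumes the preference key is an integer and that "xgridctl agent
# restart" is a valid action; run as root on the agent computer.

DOMAIN=/Library/Preferences/com.apple.xgrid.agent

# RUN=echo (the default) prints each command instead of executing it.
# Set RUN= (empty) in the environment to execute the commands for real.
RUN=${RUN-echo}

set_max_tasks() {
    count=$1
    $RUN defaults write "$DOMAIN" MaximumTaskCount -int "$count"
    $RUN xgridctl agent restart
}

# Example (dry run): set_max_tasks 1
```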

If You Submit a Large Number of Jobs

GridStuffer is a third-party Cocoa application created by Charles Parnot of Stanford to manage multitask jobs. It provides a friendly GUI for many common Xgrid tasks. GridStuffer is available at:

http://cmgm.stanford.edu/~cparnot/xgrid-stanford/html/goodies/GridStuffer-info.html

A companion command-line tool, xgridstatus, provides an easy way to retrieve information about your grid and jobs. Xgridstatus is available at:

http://cmgm.stanford.edu/~cparnot/xgrid-stanford/html/goodies/xgridstatus-info.html

If You Want to Use Xgrid on Other Platforms

Third-party agents are available that run Xgrid jobs on non-Mac platforms. You are responsible for ensuring that your tasks contain and call appropriate platform-specific code.

There is no intrinsic support for heterogeneous execution, although there is nothing that relies on Mac-specific technology.

The primary technical requirement is a sufficiently functional BEEP protocol stack. Several open source implementations are available, of varying quality.

Two cross-platform Xgrid agents are available:

• Curtis Campbell’s Java agent, at:

http://sourceforge.net/projects/xgridagent-java/

• Daniel Cote’s Linux/UNIX agent (not yet updated for Mac OS X v10.4), at:

http://www.novajo.ca/xgridagent/

If the Xgrid Controller Must Be Restarted

When the Xgrid controller is restarted, whether by Server Admin, the xgridctl tool, a power outage, or a kernel panic, the following occurs:

• Clients and agents are disconnected.

• Tasks running when the controller restarted are stopped.

• Partial data from killed tasks is discarded.

• Data from finished tasks is saved and can be retrieved as usual.

• Queued jobs and tasks are saved and run as usual.

• Tasks are started or restarted as agents reconnect and become available.

If Xgrid Has Crashed

The Xgrid controller and agent should restart automatically if they crash. CrashReporter logs can be found in /Library/Logs/CrashReporter. Xgrid logs notices, warnings, and errors to the console as well as to log files in /Library/Logs/Xgrid.

If You Are Trying to Submit Jobs over 2 GB

The Xgrid controller is a 32-bit process and keeps most job input and output data in memory. This means that the controller can crash if your jobs require a large amount of input or produce a large amount of output. This limitation might change in the future.

We recommend using a shared filesystem (such as Xsan or NFS) if you need to share large amounts of data between distributed processes.
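One way to act on this recommendation is to check the size of a job's input before deciding how to submit it. The sketch below uses du to compare a directory's size with a threshold. The function name input_fits and the LIMIT_KB variable are hypothetical, and the 1 GB default is an arbitrary illustration chosen to stay well under the 2 GB figure in this section, not a documented limit.

```shell
#!/bin/sh
# Sketch: decide whether a job's input directory is small enough to
# send through the controller. The 1 GB default threshold is an
# arbitrary illustration, not a documented limit.

LIMIT_KB=${LIMIT_KB:-1048576}   # 1 GB expressed in kilobytes

# Succeeds (exit status 0) when the directory is within the limit.
input_fits() {
    dir=$1
    size_kb=$(du -sk "$dir" | awk '{print $1}')
    [ "$size_kb" -le "$LIMIT_KB" ]
}

# Example:
#   if input_fits ./input; then
#       echo "submit through the controller"
#   else
#       echo "place the data on shared storage instead"
#   fi
```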

If You Want to Enable Kerberos/SSO for Xgrid

For Xgrid to use SSO, you need the following:

• The agent must have the host’s user principal in the system keytab.

• The Kerberos database on the KDC must contain the agent’s principal.

• The controller’s realm must be the default realm on the agent computer.

The agent’s principal is created in the KDC and is put in the agent’s keytab if the agent computer is bound to the OD master using authenticated binding with Directory Access. Otherwise, you must use kadmin to create the principal in the KDC and export it to the keytab.

For example, the computer hosting the agent must have the host’s user principal in the system keytab, as shown here:

$ sudo klist -k

Password:

Keytab name: FILE:/etc/krb5.keytab

KVNO Principal

---- --------------------------------------------------------------

1 hostname.apple.com@XGRIDTEST.APPLE.COM

The Kerberos database on the KDC must contain the agent’s principal, as in the following:

$ sudo kadmin.local -q "get_principal hostname.apple.com"

Authenticating as principal root/admin@XGRIDTEST.APPLE.COM with password.

Principal: hostname.apple.com@XGRIDTEST.APPLE.COM

Expiration date: [never]

Last password change: Tue Apr 12 17:46:41 PDT 2005

Password expiration date: [none]

Maximum ticket life: 0 days 10:00:00

Maximum renewable life: 7 days 00:00:00

Last modified: Tue Apr 12 17:46:41 PDT 2005 (root/admin@XGRIDTEST.APPLE.COM)

Last successful authentication: [never]

Last failed authentication: [never]

Failed password attempts: 0

Number of keys: 4

Key: vno 1, Triple DES cbc mode with HMAC/sha1, no salt

Key: vno 1, ArcFour with HMAC/md5, no salt

Key: vno 1, DES cbc mode with CRC-32, no salt

Key: vno 1, DES cbc mode with CRC-32, Version 4

Attributes: REQUIRES_PRE_AUTH

Policy: [none]

The controller’s realm must be the default realm on the agent computer, as shown:

$ cat /Library/Preferences/edu.mit.Kerberos

# WARNING This file is automatically created, if you wish to make changes

# delete the next two lines

# autogenerated from : /LDAPv3/xgridtest.apple.com

# generation_id : 1637891359

[libdefaults]

default_realm = XGRIDTEST.APPLE.COM

[realms]

XGRIDTEST.APPLE.COM = {

kdc = xgridtest.apple.com

admin_server = xgridtest.apple.com

}

[domain_realm]

apple.com = XGRIDTEST.APPLE.COM

.apple.com = XGRIDTEST.APPLE.COM
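The default-realm requirement can be checked with a short script. The following sketch reads default_realm from a Kerberos configuration file laid out like the one above and compares it with an expected realm. The function names default_realm and realm_matches are hypothetical, and the script inspects only that one setting.

```shell
#!/bin/sh
# Sketch: read default_realm from a Kerberos configuration file and
# compare it with the realm you expect the controller to use.

# Print the value of default_realm from the given file.
default_realm() {
    sed -n 's/^[[:space:]]*default_realm[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# Succeeds when the file's default realm equals the expected realm.
realm_matches() {
    [ "$(default_realm "$1")" = "$2" ]
}

# Example:
#   realm_matches /Library/Preferences/edu.mit.Kerberos XGRIDTEST.APPLE.COM \
#       && echo "default realm OK"
```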

For More Information

Whether you’re an experienced or a novice server administrator working with Xgrid, you can review the Xgrid FAQ site. The site provides news, posted questions and threads, and the ability to post your own Xgrid questions.

The site is at http://lists.apple.com/faq/pub/xgrid_users/.

For more information about advanced configuration options, see the xgridctl man page.

Part II: Configuring High Performance Computing

Use the chapters in this part of the guide to learn about high performance computing and the applications and tools available for administering it.

Chapter 6 Introducing High Performance Computing

Chapter 7 Reviewing the Cluster Setup Process

Chapter 8 Identifying Prerequisites and System Requirements

Chapter 9 Preparing the Cluster for Configuration

Chapter 10 Setting Up the Cluster Controller

Chapter 11 Setting Up Compute Nodes

Chapter 12 Testing Your Cluster

6 Introducing High Performance Computing

Use this chapter to learn about high performance computing (HPC) and how it’s supported by Apple technology.

With high performance computing, you can speed the processing of complex computations by using Xserve computers with the Xgrid service.

Understanding HPC

HPC refers to the use of high-end computer systems to solve computationally intensive problems. HPC includes large supercomputers, symmetric multiprocessing (SMP) systems, cluster computers, and other hardware and software architectures.

In recent years, developers have made it feasible for standard off-the-shelf computer systems to achieve supercomputer-scale performance by clustering them in efficient ways.

Apple and HPC

Apple’s hardware and software facilitate HPC in unique and meaningful ways. Although many hardware and software architectures can be used for cluster computing, Mac OS X Server v10.5 and Xserve have specific features that enhance the performance and manageability of cluster installations.

The integration of Xserve with Mac OS X Server provides unparalleled ease of use, performance, and manageability. Because Apple makes the hardware and the software, the benefits of tight integration are immediately evident in the quality of the user experience with a Macintosh-based cluster.

Mac OS X Server

Mac OS X Server v10.5 is Apple’s award-winning UNIX server operating system. Mac OS X Server can compile and run UNIX 03-compliant code, and runs 64-bit applications alongside 32-bit applications at native performance.

The Mach kernel provides preemptive multitasking for outstanding performance, protected system memory for stability, and modern SMP locking for efficient use of multiprocessor and multicore systems.

Mac OS X Server also includes highly optimized math libraries that enable software developers to take maximum advantage of the G5 or Intel-based processor without the use of difficult programming techniques or expensive development tools.

Mac OS X Server also includes Xgrid, an integrated distributed resource manager for both grids and clusters.

Xserve Clusters

Using a combination of Xserve systems, you can build clusters that aggregate the power of these systems to provide HPC solutions at comparatively low cost.

An Xserve cluster consists of at least two nodes: a cluster controller and one or more compute nodes, as shown in the following illustration:

[Illustration: an Xserve cluster consisting of a controller and compute nodes]

Xserve 64-Bit Architecture

The 64-bit architecture of Xserve systems is ideal for HPC applications. It provides 64-bit math precision, higher data throughput, and very large memory space.

Memory Space

The 64-bit architecture provides four billion times the memory space available in a 32-bit architecture, which puts the theoretical address space available to Mac OS X Server applications at 16 exabytes. Xserve G5 systems support 8 GB of memory. Xserve Intel systems support 32 GB of memory.

Libraries

Mac OS X Server provides the following highly optimized libraries for developing HPC applications. In addition to standard libraries like libSystem, numerical libraries like BLAS, LAPACK, and others provide industry-standard routines that have been hand-tuned for the G5 or Intel processor. Developers can make efficient use of the system architecture without writing computer code or vector code.

Library          Description

libSystem        A collection of core system libraries

libMathCommon    A common math functions library

vDSP             A library that provides mathematical functions for applications that operate on real and complex data types

BLAS             A library of basic linear algebra subprograms, which are a standard set of building blocks for vector and matrix operations

LAPACK           The linear algebra package, which is a standard library for solving simultaneous linear equations

vForce           A library of highly optimized single- and double-precision mathematical intrinsic functions

vBasicOps        A collection of basic operations that complement the vector processor’s basic operations up to 128 bits

vBigNum          A library of optimized arithmetic operations for 256-, 512-, and 1024-bit operands

Easy Porting of UNIX Applications

Mac OS X Server is now an Open Brand UNIX 03 Registered Product, conforming to the SUSv3 and POSIX 1003.1 specifications for the C API, shell utilities, and threads. It can compile and run all your existing UNIX 03-compliant code.

Support of Loosely Coupled Computations

You can use Xserve clusters to perform most types of loosely coupled or embarrassingly parallel computations. Embarrassingly parallel computations consist of somewhat independent computational tasks that can be run in parallel on many different processors to achieve faster results.

Here are examples of loosely coupled computations that you can accelerate using the setup described in this guide:

• Image rendering. Different rendering tasks, such as ray tracing, reflection mapping, and radiosity, can be accelerated by parallel processing.

• Bioinformatics. The throughput of bioinformatics applications like BLAST and HMMER can be greatly enhanced by running them on a cluster.

• Cryptography. Brute-force key search is a classic example of a cryptography application that can be greatly accelerated when run on a computer cluster.

• Data mining. High performance computing is essential in data mining because of the amount of data that is analyzed.

Note: The Apple Workgroup Cluster is a preconfigured cluster solution that has everything you need to get up and running quickly. It includes qualified, integrated hardware components and easy-to-use management tools. You can add cluster-aware commercial applications, such as iNquiry or gridMathematica, or develop your own custom applications using Xcode. For more information, see http://www.apple.com/science/solutions/workgroupcluster.html.

Note: This guide assumes that the cluster nodes communicate over Gigabit Ethernet. Although the network latency of Gigabit Ethernet is low enough for most loosely coupled computations, those that require lower latency may benefit from another interconnect technology.

7 Reviewing the Cluster Setup Process

Use this chapter to learn about the process of setting up a high performance cluster.

You will use multiple server tools to configure services, a cluster controller, compute nodes, and users when setting up a high performance cluster.

The following chapters provide a step-by-step process to assemble and configure a computational cluster. The resulting cluster will consist of a controller and a number of compute nodes. The compute nodes will be connected to the controller via a private (isolated) Ethernet network switch. The controller will be connected to both the private Ethernet network and a public network, potentially the Internet. The controller will also provide a shared file system to compute nodes.

The controller will provide a number of services to the compute nodes:

• A firewall will isolate the controller and compute nodes from the public network, protecting against unwanted access. Access to the private network from outside the firewall will require remote users to use SSH for command-line access or VPN to use or manage cluster resources with graphical applications or administrative tools such as Apple Remote Desktop.

• Network services such as DHCP, DNS, and NAT will allow the compute nodes to communicate with each other and external networks.

• Open Directory will contain user account information, including usernames and passwords, and make these accounts available to compute nodes. Using Kerberos with Open Directory provides single sign-on capability, reducing the number of times a user will need to enter passwords to access cluster resources.

• Open Directory will also publish network file system (NFS) share points, providing automatic file sharing between compute nodes and controller. A shared network home directory, containing home folders for each cluster user, will be mounted on each compute node.

• The controller will host the Xgrid controller service.

Cluster Setup Overview

Here is a summary of what you’ll be doing to set up and test an HPC cluster.

Step 1: Before you begin

Before setting up your cluster, understand the expectations and requirements that you must fulfill. See Chapter 8, “Identifying Prerequisites and System Requirements.”

Step 2: Prepare the cluster for configuration

Prepare your cluster nodes for configuration by setting up the hardware and connecting your nodes to a network. See Chapter 9, “Preparing the Cluster for Configuration.”

Step 3: Enable, configure, and start services

After your cluster is assembled and ready, start by setting up and configuring the cluster controller. Use Server Assistant to set up the server software on the cluster controller. See Chapter 10, “Setting Up the Cluster Controller.”

Use Server Admin to configure and start the following services:

• DNS service. See “Configuring DNS Service” on page 84.

• Open Directory service. See “Configuring Open Directory Service” on page 86.

• DHCP service. See “Configuring DHCP Service” on page 87.

• Firewall service. See “Configuring Firewall Settings on the Cluster Controller” on page 88.

• NAT service. See “Configuring NAT Settings on the Cluster Controller” on page 90.

• NFS service. See “Configuring NFS” on page 90.

• VPN service. See “Configuring VPN Service” on page 90.

• Xgrid service. See “Configuring Xgrid Service” on page 91.

Step 4: (Optional) Prepare the data drive

Use Disk Utility to configure the data drive. See “Preparing the Data Drive as a Mirrored RAID set” on page 92.

Step 5: Create an automounted network share

Use Server Admin to create an automounted network share. See “Creating a Home Directory Automount Share Point” on page 93.

Step 6: Create network user accounts

Use Workgroup Manager to create network user accounts for cluster users. See “Creating User Accounts” on page 94.

Step 7: Create an Auto Server Setup record for the compute nodes

Use Server Assistant to save configuration settings to a file or Open Directory record. This allows cluster nodes to automatically configure themselves when they start up for the first time.

See “Creating an Auto Server Setup Record for Compute Nodes” on page 95 and “Verifying LDAP Record Creation” on page 98.

Step 8: Set up compute nodes

Start compute nodes to begin the Auto Server Setup process. They’ll automatically configure themselves and then restart. See “Setting Up Compute Nodes” on page 98.

Step 9: Finish compute node configuration

Use Server Admin to name the compute nodes, join them to the Kerberos realm, and configure their Xgrid agent software.

Step 10: Test your cluster setup

After configuring the controller and compute nodes, test your cluster with Xgrid Admin and a sample Xgrid application. See Chapter 12, “Testing Your Cluster.”

8 Identifying Prerequisites and System Requirements

Before setting up your cluster, read the prerequisites and requirements in this chapter and familiarize yourself with the setup process.

To make sure that your cluster is successfully set up, read this chapter to familiarize yourself with the expectations and requirements you must meet before starting the setup procedure. Then read the last section, which provides an overview of the cluster setup process.

Prerequisites

This guide assumes you have the expertise needed to set up and manage the cluster, perform the initial configuration of the cluster nodes, and carry out the types of computations you can perform on the cluster.

Expertise

To set up and deploy clusters, you should have a good understanding of how Mac OS X Server works and you should have a fundamental understanding of UNIX, Xgrid, and TCP/IP networking.

Xserve Configuration

This guide assumes that you’ll be using new, out-of-the-box Xserve systems running Mac OS X Server v10.5 or later. If not, you must install a clean version of Mac OS X Server v10.5 or later on your systems.

System Requirements

Take time to define the requirements needed to make sure the cluster setup is successful. System requirements are categorized as infrastructure, software, and private network requirements.

Infrastructure Requirements

This section describes the most important hardware infrastructure requirements. Consult with your system administrator about other requirements.

For example, you might need one or more uninterruptible power supplies (UPSs) to provide backup power to key cluster components. Another requirement might be a physical security system to protect the cluster from unauthorized access to sensitive information.

Infrastructure requirements are divided into the following subcategories:

• “General Hardware Requirements” on page 68

• “Power Requirements” on page 68

• “Cooling Requirements” on page 69

• “Weight Requirements” on page 70

• “Space Requirements” on page 70

• “Network Access Requirements” on page 71

General Hardware Requirements

To set up a cluster, you should have the necessary hardware infrastructure in place. This includes:

• Racks

• Electrical power

• Cooling system

• Network access points and switches

Power Requirements

When setting up the physical infrastructure for your cluster, consider the following power consumption figures:

• Rated power consumption. This figure represents the maximum power consumption of a given system’s power supply.

• Typical power consumption. This figure represents the typical power consumption of a server under normal operating conditions.

Note: This section focuses only on the rated power consumption figure because it guarantees that your circuit won’t be overloaded at any time—unlike the typical power consumption figure, which doesn’t protect your circuit from abnormal surges in power consumption.

To obtain power consumption figures for cluster nodes, see the following articles on the AppleCare Service & Support website:

• Article 86694, “Xserve G5: Power consumption and thermal output (BTU) information,” at www.info.apple.com/kbnum/n86694

• Article 75383, “Xserve: Power Consumption and Thermal Output (BTU) Information,” at www.info.apple.com/kbnum/n75383

• Article 86251, “Xserve (Slot Load): Power Consumption and Thermal Output (BTU) Information,” at www.info.apple.com/kbnum/n86251

• Article 304887, “Xserve (Late 2006): Power consumption and thermal output (BTU) information,” at www.info.apple.com/kbnum/n304887

Although the rated current load covers your cluster nodes, you must also consider the power consumption of other devices connected to your circuit.
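As a worked illustration of sizing a circuit from rated figures, current in amperes is power in watts divided by voltage. The sketch below applies that relationship to a group of identical nodes. The function name rated_amps is hypothetical, and the wattage and voltage in the example are placeholders rather than Xserve specifications. Substitute the rated figures from the articles listed above and add any other devices on the same circuit.

```shell
#!/bin/sh
# Sketch: estimate the current drawn by a set of nodes at rated power.
# The wattage and voltage in the example are placeholders, not
# Xserve specifications; use the figures from the articles above.

# Arguments: node count, rated watts per node, circuit voltage.
rated_amps() {
    awk -v n="$1" -v w="$2" -v v="$3" 'BEGIN { printf "%.1f\n", n * w / v }'
}

# Example with placeholder values: rated_amps 8 750 120   (prints 50.0)
```

Comparing the result with the capacity of your circuits, with some headroom, gives a first estimate of how many circuits the cluster needs.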

For large clusters, speak with an Apple Systems Engineer to determine the correct power infrastructure. For information about Apple consulting services and service and support plans, see the Apple Server Service and Support website at http://www.apple.com/server/support.

WARNING: The formulas in this section help you estimate your power requirements. These estimates may not be high enough, depending on your configuration. For example, if your cluster uses one or more Xserve RAID systems, or other third-party hardware, you must include their power consumption requirements.
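As a rough illustration of sizing circuits from rated power figures, the following Python sketch adds up the rated load and divides it across circuits. The wattages, the 120 V supply, the 20 A breaker, and the 80 percent continuous-load allowance are assumptions chosen for the example, not values from this guide. Use the rated figures from the articles listed above and confirm the result with an electrician.

```python
import math

def circuits_needed(node_count, rated_watts_per_node, other_watts=0.0,
                    volts=120.0, breaker_amps=20.0, derating=0.8):
    """Return (total watts, total amps, circuits needed) for a rated load.

    volts, breaker_amps, and derating are illustrative assumptions.
    """
    total_watts = node_count * rated_watts_per_node + other_watts
    total_amps = total_watts / volts
    usable_amps_per_circuit = breaker_amps * derating
    circuits = math.ceil(total_amps / usable_amps_per_circuit)
    return total_watts, total_amps, circuits

# Hypothetical cluster: 16 nodes rated at 400 W each, plus 600 W of other devices.
watts, amps, circuits = circuits_needed(16, 400.0, other_watts=600.0)
print(f"{watts:.0f} W rated load, {amps:.1f} A, {circuits} circuit(s)")
```

Because the calculation uses rated rather than typical consumption, the estimate stays on the conservative side, but it still omits anything you leave out of `other_watts`.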

Cooling Requirements

It’s very important to keep your Xserve computers running at normal operating temperatures (see www.apple.com/xserve/specs.html). If your servers overheat, they shut down and any work in progress is lost. Running servers at high temperatures can also damage them or shorten their life span.

To obtain thermal output figures for cluster nodes, see the following articles on the AppleCare Service & Support website:

• Article 86694, “Xserve G5: Power consumption and thermal output (BTU) information,” at www.info.apple.com/kbnum/n86694
• Article 75383, “Xserve: Power Consumption and Thermal Output (BTU) Information,” at www.info.apple.com/kbnum/n75383
• Article 86251, “Xserve (Slot Load): Power Consumption and Thermal Output (BTU) Information,” at www.info.apple.com/kbnum/n86251
• Article 304887, “Xserve (Late 2006): Power consumption and thermal output (BTU) information,” at www.info.apple.com/kbnum/n304887

Consider the thermal output of other devices, such as the management computer, Xserve RAID systems, monitors, and other heat-generating devices used in the same room.

As always, consult with your system administrator to determine the necessary level of cooling that your cluster and its associated hardware require for safe and effective operation.
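When only wattage is known for a device, its thermal output can be approximated by treating all consumed power as heat. The Python sketch below applies the standard conversions (1 W is about 3.412 BTU per hour, and one ton of cooling is 12,000 BTU per hour). The device names and wattages are hypothetical; prefer the published BTU figures from the articles above where they exist.

```python
BTU_PER_HOUR_PER_WATT = 3.412    # 1 watt is approximately 3.412 BTU/h
BTU_PER_HOUR_PER_TON = 12000.0   # one ton of cooling capacity

def thermal_load(watts_by_device):
    """Return (BTU per hour, tons of cooling) for a dict of device wattages."""
    total_watts = sum(watts_by_device.values())
    btu_per_hour = total_watts * BTU_PER_HOUR_PER_WATT
    return btu_per_hour, btu_per_hour / BTU_PER_HOUR_PER_TON

# Hypothetical room contents; every wattage here is a placeholder.
room = {
    "compute nodes": 6400.0,
    "management computer": 300.0,
    "network switch": 100.0,
}
btu, tons = thermal_load(room)
print(f"{btu:.0f} BTU/h, about {tons:.2f} tons of cooling")
```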

Weight Requirements

For Xserve and cluster node weight information, see the Apple Xserve website at www.apple.com/xserve.

Also include the weight of the rack if you’re bringing in a dedicated rack, and the weight of other devices used by the cluster.

If you mount cluster nodes in a rack with casters, set up the rack where you’ll keep the cluster and then mount the systems. A heavy rack is difficult to move, particularly across carpet. In addition, vibrations caused by moving your cluster long distances when racked might damage your hardware.

After determining weight requirements, consult with your facilities personnel to make sure the room where the cluster will be installed meets the weight requirements.

Space Requirements

You should have enough space to house the cluster and enable easy access to it to perform routine maintenance tasks. Also, locate the cluster where it doesn’t affect and isn’t affected by other hardware in your server room.

Consider the following when choosing a location for your cluster:

• Don’t place the cluster next to an air vent, air intake, or heat source.
• Don’t place the cluster directly under a sprinkler head.
• Don’t obstruct doors (especially emergency exit doors) with your cluster.
• Leave enough room in front of, beside, and especially behind your cluster.
• Make sure air can flow around the cluster. The room might be very well cooled, but if air can’t easily flow around the cluster, your computers can still overheat.

If you’re housing your cluster in a computer room, make sure you have at least 18 inches of clearance in front and behind your systems. If you’re housing it in an office or other unmanaged space, make sure your cluster has at least 18 inches of clearance on all sides of the rack, as shown in the following illustration:

[Illustration: a rack with 18 inches of clearance on each of its four sides]

You should have enough space to open the rack’s door, slide out systems, and perform other routine maintenance tasks.

Network Access Requirements

Your cluster requires access to two networks:

• Private network. This is a high performance Gigabit Ethernet network. You’ll need at least a 1-Gigabit switch.
• Public network. This network connects the cluster controller to the client computers that submit jobs to your cluster.

This guide uses a number of 10.0.2.x addresses as examples for your public network connections. Do not use these example addresses when configuring your cluster. When you see a 10.0.2.x address, substitute the IP address appropriate for your organization’s network.

The following illustration shows a configuration of a cluster connected through a switch creating a private network. The illustration also shows the headnode connected to the public and private network.

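[Illustration: cluster nodes connected through a Gigabit switch to form the private network, with the head node also connected to the public network]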

Software Requirements

You need:

• A site-licensed copy of Mac OS X Server v10.5 or later.
• One or more copies of Apple Remote Desktop v3 or later (recommended).
• The latest version of Server Tools.

Volume-Licensed Serial Number

To run multiple copies of Mac OS X Server, you should obtain a volume-licensed serial number. If you haven’t obtained a volume-licensed serial number, contact your local Apple sales representative.

Note: The format of the server serial number is xsvr-999-999-x-xxx-xxx-xxx-xxx-xxx-xxx-x, where x is a letter and 9 is a digit. The first element (xsvr) and the fourth (x) must be lowercase.
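The format described in the note can be checked before a serial number is typed into Server Assistant. The Python sketch below encodes that format as a regular expression. It assumes that `xsvr` is a literal prefix and that the letters after the fourth element may be in either case; the sample serial number is a made-up placeholder, not a valid license.

```python
import re

# xsvr-999-999-x-xxx-xxx-xxx-xxx-xxx-xxx-x
# Assumption: "xsvr" is literal; only it and the fourth element must be lowercase.
SERIAL_PATTERN = re.compile(
    r"xsvr-\d{3}-\d{3}-[a-z](?:-[A-Za-z]{3}){6}-[A-Za-z]"
)

def looks_like_server_serial(text):
    """Return True if text has the shape of a server serial number."""
    return SERIAL_PATTERN.fullmatch(text.strip()) is not None

# Placeholder value for illustration only.
print(looks_like_server_serial("xsvr-105-000-n-abc-def-ghi-jkl-mno-pqr-s"))
```

A match only confirms the shape of the string; it says nothing about whether the license is valid.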

Apple Remote Desktop

Configuration and administration of your cluster will be greatly enhanced with Apple Remote Desktop v3 or later. You can use Apple Remote Desktop to configure, monitor, and control your cluster, as well as rapidly install software.

Server Tools

If you are using a management computer, you must install Server Tools on your management computer. The Server Tools suite includes:

• Server Assistant
• Server Admin
• Server Monitor
• Xgrid Admin

You use these tools to remotely manage the cluster. Install these tools using the Server Admin Tools CD, which is included with Xserve and Mac OS X Server.

Private Network Requirements

The compute nodes will be connected through a private Ethernet network, separate from your organization’s primary (public) network. The cluster controller will be connected to the private and public networks and will act as a gateway, allowing users connected to the public network (or the Internet) to use the cluster’s resources, and allowing the compute nodes to use resources outside the private network.

Private network requirements include the following:

• A range of IP addresses should be reserved for the private network. A number of non-routable IP address ranges are available for use with private networks. These addresses cannot be used with the Internet without Network Address Translation (NAT), which will be provided by the cluster controller.
• Addresses in ranges such as 192.168.x.x, 10.0.x.x, and 172.16.x.x are commonly used for private networks. Because the first two are used more commonly with NAT devices used in the home, and because your users may want to connect to your cluster from behind one of these devices, it is best to choose a range less likely to exist on your users’ networks. This guide uses the range 172.16.1.1 - 172.16.1.254 (subnet mask 255.255.255.0). You can use this range for your cluster, or use a different one if you prefer.
• You need a Domain Name System (DNS) server that will be used to assign names to network addresses so you don’t need to remember IP addresses. Your private network can use a DNS domain name that is not in use on (and is not valid with) the Internet. This guide uses the .cluster domain. You can use this domain with your cluster as well.

WARNING: Where you see the DNS domain .example.com, you should substitute the DNS domain used for your organization’s public network.
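A candidate private range can be sanity-checked with Python’s standard `ipaddress` module. The sketch below reports whether a network is private, how many host addresses it offers, and whether it overlaps the two ranges this section describes as common on home NAT devices. Writing those ranges as 192.168.0.0/16 and 10.0.0.0/16 is an interpretation of “192.168.x.x” and “10.0.x.x”, not something this guide specifies.

```python
import ipaddress

# Interpretation of the "192.168.x.x" and "10.0.x.x" ranges mentioned above.
HOME_NAT_RANGES = [
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("10.0.0.0/16"),
]

def check_private_range(cidr):
    """Summarize a candidate private network given in CIDR notation."""
    network = ipaddress.ip_network(cidr)
    return {
        "is_private": network.is_private,
        # Excludes the network and broadcast addresses (valid for /30 and larger).
        "usable_hosts": network.num_addresses - 2,
        "overlaps_home_nat": any(network.overlaps(r) for r in HOME_NAT_RANGES),
    }

print(check_private_range("172.16.1.0/24"))
```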

Static IP Address and Hostname Requirements

Your cluster requires a single static IP address and a matching fully qualified and reverse resolvable DNS entry for the cluster controller.

By using a static IP address rather than a dynamic one you can maintain a consistent address that clients can always use.

Note: Initiate the process of requesting an IP address and a hostname as early as you can before setting up the cluster, to account for the lead time typically required.

9 Preparing the Cluster for Configuration

Use this chapter to mount the systems on the rack, connect the systems to a power source and the private network, and configure the optional management computer.

To prepare the cluster nodes for configuration, you mount them in racks and connect them to the power source and private network. You also set up the management computer by installing Apple Remote Desktop and Server Tools.

Preparing the Cluster Nodes for Software Configuration

After you prepare the physical infrastructure for hosting the cluster, the next step is to mount the cluster nodes and prepare them for software configuration.

To prepare the cluster for configuration:

1 Unpack the computers and mount them in the rack.

For more information, use the instructions provided with your hardware.

Note: If you’re using existing Xserve computers, you must perform a clean installation of Mac OS X Server v10.5 or later to restore the systems to default settings.

2 Record each computer’s serial number and keep the information in a safe place.

When recording the serial numbers, do it in a way that makes it easy for you to tell which serial number belongs to each computer. For example, use a table to map a system’s serial number to the name on a label on the system’s front panel.

Serial Number       Name
serial_number_0     Cluster controller
serial_number_1     Compute node 1
serial_number_2     Compute node 2
. . .               . . .

You can find the serial number of an Xserve computer in four places:

• The unit’s back panel, on the serial number label
• The unit’s interior. If you look for the serial number on the unit’s interior, don’t confuse the serial number for the server with the serial number for the optical drive; these are different numbers. The Xserve computer’s serial number is denoted by “Serial#” (not “S/N”) followed by 11 characters.
• The large pull-out plastic tab on Xserve computers with Intel processors
• The cardboard shipping box. You can use a barcode scanner on the box label to get the serial number.

3 Using the following guidelines, connect the cluster computers to a power source:

• Power cables. Use the long power cables with a horizontal power distribution unit (PDU) and the short cables with a vertical PDU. When using the long cables, connect the servers so you can tell which cable belongs to which node. Consider labeling cables to make it easier to map a cable to a node.
• Connection to the uninterruptible power supply (UPS). Connect the cluster controller, storage devices used by the cluster, and the private network switch to a UPS unit to protect against data loss in case of a power outage. If your UPS is connected to the controller through USB, you can use the UPS configuration settings in System Preferences.

Note: If you are using a UPS, the UPS low power shutdown script is available for additional advanced power options. This script is located at /usr/libexec/upsshutdown.

• UPS connection to wall outlet. Make sure the electrical outlets support the UPS plug shape.
• Power cord retainer clips. To prevent power cables from slipping out, use the power cord retainer clips that come with your Xserve systems.
• Air flow. Don’t permit a mass of power cables to obstruct air flow.

4 Connect the two Ethernet ports (shown in the illustration below) by connecting port 1 on the cluster controller to the public network and port 2 to the private network.

[Illustration: Ethernet port 1 (public network) and Ethernet port 2 (private network)]

5 Connect Ethernet port 1 on the remaining nodes in the cluster to the private network, in order.

Use the last port on the switch for the cluster controller, the first port for the first compute node, the second port for the second compute node, and so on.

Connecting the Ethernet cables to the switch in order helps you identify which cluster node a cable belongs to.
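The port ordering described in step 5 can be written down as a simple plan before cabling. The Python sketch below is only an illustration of that convention; the function name and the 24-port switch in the example are hypothetical.

```python
def switch_port_plan(port_count, node_count):
    """Map switch port numbers to devices: nodes first, controller on the last port."""
    if node_count > port_count - 1:
        raise ValueError("not enough switch ports for the controller and all nodes")
    plan = {port: f"Compute node {port}" for port in range(1, node_count + 1)}
    plan[port_count] = "Cluster controller"
    return plan

# Hypothetical 24-port switch with four compute nodes.
for port, device in sorted(switch_port_plan(24, 4).items()):
    print(port, device)
```

Printing the plan and taping it to the rack gives the same benefit as cabling in order: each cable can be traced to a node at a glance.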

(Optional) Setting Up the Management Computer

You can use the management computer to remotely set up, configure, and administer your cluster.

To set up the management computer:

1 Connect the management computer to the private network (as shown) using the second-to-last switch port.

[Illustration: the optional management computer connected to the private network]

2 Start the management computer.

3 Disable AirPort and any network connection other than the one you’ll be using to connect to your private network.

4 If they aren’t installed, install the latest version of the Mac OS X Server tools and applications from the Mac OS X Server Administration Tools CD, which is included with the Mac OS X Server installation kit.

The Mac OS X Server tools and applications are installed into /Applications/Server/.

5 Configure the management computer’s network address.

If your cluster controller is not connected to a keyboard, video display, and mouse, or if you prefer to set up the cluster from a management computer, you will connect the management computer to the private network and disable all other network connections.

Until the controller is assigned an IP address on the private network, configure your management computer to use DHCP. After the controller is assigned an IP address, you should configure your management computer to use a static address in the range reserved for your private network, but outside the range reserved for compute nodes.

If you are adopting the IP address range that is used in this guide (172.16.1.1 - 172.16.1.199 for compute nodes, 172.16.1.254 for the controller), you can configure your management computer to use 172.16.1.253.

After you connect to the private network, the server administration tools mentioned in this guide (Server Assistant, Server Admin, Workgroup Manager, and Xgrid Admin) can be installed and used on your management computer, connecting via IP address to the cluster controller (and later the compute nodes).

You can also use Apple Remote Desktop, or the screen sharing feature included with Mac OS X v10.5, to control the nodes via the network, using the server administration tools directly on the remote nodes.

10 Setting Up the Cluster Controller

Use this chapter to set up server software on the cluster controller and configure the services running on it.

You use Server Assistant, Server Admin, and Apple Remote Desktop (optional) to set up and configure the cluster controller.

Setting Up Server Software on the Cluster Controller

To set up the cluster controller, use Server Assistant (located in /Applications/Server/).

To set up the cluster controller:

1 Start the cluster controller.

The cluster controller should have two Ethernet cables, with Ethernet port 1 connected to the public network switch and Ethernet port 2 connected to the private network switch. Only the cluster controller should be running on the private network.

If you are using a management computer, use Server Assistant to connect to the controller. For more information about using Server Assistant remotely, see Server Administration.

If you are using the Apple Remote Desktop to manage the controller, connect to the controller and initiate a screen control session. For more information, see the Apple Remote Desktop Guide.

2 In the Welcome screen, click Continue.

3 In the Server Configuration screen:

a Select Advanced.

b Click Continue.

4 In the Keyboard screen:

a Select the keyboard layout for the server.

b Click Continue.

5 In the Serial Number screen:

a Enter a volume license Mac OS X Server serial number.

b Click Continue.

6 In the Registration Information screen, fill out the form or press Command-Q and click Skip.

7 In the Administrator Account screen:

a Create the user account you’ll use to administer the cluster controller (for example, Administrator).

b Click Continue.

8 In the Network Address screen:

a Choose “No, configure network settings manually.”

b Click Continue.

9 In the Network Interfaces screen:

a Enable TCP/IP only for Ethernet 1 and Ethernet 2 by selecting the checkboxes for both Ethernet 1 and Ethernet 2.

b Click Continue.

10 In the TCP/IP Connection screen for the Ethernet 1 port:

a From the Configure pop-up menu, choose Manually.

b In the IP Address field, enter the public IP address of the cluster controller (for example, 10.0.2.199).

c In the Subnet Mask field, enter the public subnet mask of the cluster controller (for example, 255.255.255.0).

d In the Router field, enter the IP address of the router for the public network (for example, 10.0.2.1).

e Leave the DNS Servers field blank.

f Leave the Search Domains field blank.

g Click Configure IPv6.

h From the Configure IPv6 pop-up menu, choose Off.

i Click OK, then click Continue.

11 In the TCP/IP Connection screen for the Ethernet 2 port:

a From the Configure pop-up menu, choose Manually.

b In the IP Address field, enter the private IP address of the cluster controller (for example, 172.16.1.254).

c In the Subnet Mask field, enter the private subnet mask of the cluster controller (for example, 255.255.255.0).

d In the Router field, enter the private IP address of the cluster controller (for example, 172.16.1.254).

e Leave the DNS Servers field blank.

f Leave the Search Domains field blank.

g Click Configure IPv6.

h From the Configure IPv6 pop-up menu, choose Off.

i Click OK, then click Continue.

12 In the Network Names screen:

a Enter the primary DNS name and computer name.

The cluster controller has a public and a private DNS name. Use the controller’s private names. For example, use controller.cluster for the primary DNS name and controller for the computer name.

A warning may appear saying the server’s address resolves to another name. Click OK.

b Verify that the Enable Remote Management checkbox is selected.

c Click Continue.

13 In the Time Zone screen:

a In the Closest City pop-up menu, choose your time zone.

b Click Continue.

14 In the Directory Usage screen:

a From the “Set directory usage to” pop-up menu, choose Standalone Server.

b Click Continue.

15 In the Confirm Settings screen:

a Review the settings.

b Click Apply.

c Wait for your settings to be applied.

16 Click Start Now, wait until Server Admin launches, and then (if prompted) enter the administrator user name and password.

17 When prompted, click Start Now; then when Server Admin launches, connect using the administrator user name and password.

18 Select the checkboxes to enable the following services: DHCP, DNS, Firewall, NAT, NFS, Open Directory, VPN, and Xgrid.

19 Click Save.

20 To reveal the enabled services, expand the triangle next to the controller in the Servers list.

Configuring DNS Service

Use Server Admin on the cluster controller to create a local DNS zone and add records to map cluster nodes to their corresponding IP addresses.

To configure DNS service:

1 Open Server Admin if it is not already open.

2 If necessary, click the triangle to the left of the controller to view a list of services.

3 Click DNS in the expanded Servers list.

4 Click Settings.

5 Click the Add (+) button below the “Forwarder IP Addresses” list, then enter the network address of your public DNS server (for example, 10.0.2.201).

6 Click Save.

7 Click Zones.

8 Click the Add Zone button, then select “Add Primary Zone (Master).”

A default zone named example.com is created.

9 Select the default example.com zone.

10 Change the primary zone name to your private DNS domain.

The primary zone name must end with a period (for example, “cluster.”).

11 Set Admin Email to the mail address of the person who should be notified of DNS errors (for example, administrator@example.com).

12 Double-click the first entry in the Nameservers list and change it to the private DNS hostname of the cluster controller (for example, controller).

13 Click Save.

14 Select the cluster DNS zone.

15 Click the triangle to the left of the cluster DNS zone.

16 Click Add Record, then select “Add Machine (A).”

17 Select the newly created newMachine.

18 Change the Machine Name field to the private hostname of the controller (for example, controller).

19 Double-click the first IP address in the IP Address list and then change the first IP address to the public IP address for the controller (for example, 10.0.2.199).

20 Click Save.

21 Repeat steps 16 through 20 for each compute node, using the private IP address reserved for it.

For example, the first compute node is named node1 and assigned 172.16.1.1, the second is named node2 and assigned 172.16.1.2, and so on.

22 Click the Start DNS button (below the Servers list).

The DNS service status indicator turns green when the service starts.

23 From the Apple Menu open System Preferences (/Applications/System Preferences).

24 Click Network.

25 Select the Ethernet 1 interface.

26 In the DNS Server field enter the public IP address of the controller (for example, 10.0.2.199).

27 In the Search Domains field enter the private DNS domain (for example, cluster).

28 Click Apply.

29 Quit System Preferences.
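The machine records that step 21 asks you to add for the compute nodes follow a regular pattern, so it can help to generate the list first and work through it in Server Admin. The Python sketch below produces name and address pairs using the naming from this guide’s example (node1 at 172.16.1.1, and so on). The function itself is a hypothetical helper and does not modify the DNS service.

```python
import ipaddress

def node_records(first_address, node_count, domain="cluster"):
    """Return (fully qualified name, IP address) pairs for the compute nodes."""
    first = ipaddress.ip_address(first_address)
    return [
        (f"node{i + 1}.{domain}", str(first + i))
        for i in range(node_count)
    ]

for name, address in node_records("172.16.1.1", 3):
    print(name, address)
```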

Verifying DNS Settings

Open Directory requires correct configuration of the DNS service. Before configuring the Open Directory Master, verify your DNS settings carefully. Any incomplete or incorrect Open Directory configuration prevents the cluster from functioning.

To verify DNS settings:

1 From the Dock on the cluster controller open the Terminal application.

2 Verify the fully qualified DNS name of the cluster controller using the hostname command.

For example, entering hostname returns controller.cluster.

$ hostname

controller.cluster

3 Verify that the hostname of the cluster controller matches its assigned IP address in DNS using the host command.

For example, entering host controller returns 10.0.2.199.

$ host controller

controller.cluster has address 10.0.2.199

4 Verify that the fully-qualified DNS name of the cluster controller matches its public IP address using the host command.

For example, entering host controller.cluster returns 10.0.2.199.

$ host controller.cluster

controller.cluster has address 10.0.2.199

5 Verify that the reverse DNS record of the controller matches its fully-qualified DNS name using the host command.

For example, entering host 10.0.2.199 returns controller.cluster.

$ host 10.0.2.199

199.2.0.10.in-addr.arpa domain name pointer controller.cluster

If any DNS lookups do not match, repeat the process to create the DNS zone and entry for the controller. Do not continue the cluster setup process until DNS resolves correctly.

6 Quit Terminal.
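The forward and reverse checks in the steps above can also be scripted. The following Python sketch compares a forward lookup and a reverse lookup against the expected values, using the standard `socket` functions by default. The stand-in resolver functions exist only so the example runs on a computer that cannot reach the cluster’s DNS server; on the controller you would omit them.

```python
import socket

def check_forward_and_reverse(fqdn, expected_ip,
                              forward=socket.gethostbyname,
                              reverse=socket.gethostbyaddr):
    """Return a list of mismatches between DNS answers and expected values."""
    problems = []
    resolved = forward(fqdn)
    if resolved != expected_ip:
        problems.append(f"{fqdn} resolves to {resolved}, expected {expected_ip}")
    reverse_name = reverse(expected_ip)[0]  # first item is the primary host name
    if reverse_name.rstrip(".").lower() != fqdn.lower():
        problems.append(f"{expected_ip} points to {reverse_name}, expected {fqdn}")
    return problems

# Stand-in resolvers that mimic the answers shown in the steps above.
def fake_forward(name):
    return {"controller.cluster": "10.0.2.199"}[name]

def fake_reverse(address):
    return ("controller.cluster", [], [address])

print(check_forward_and_reverse("controller.cluster", "10.0.2.199",
                                fake_forward, fake_reverse))
```

An empty list means both lookups agree. With the real resolvers, a failed lookup raises an exception instead of returning a mismatch, which is also a sign that DNS is not yet configured correctly.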

Configuring Open Directory Service

The Open Directory service is responsible for authenticating users, publishing server setup configurations, and publishing network share automount records.

Configuring the Cluster Controller as an Open Directory Master

Use Server Admin to configure the Open Directory service on the cluster controller.

To configure Open Directory settings:

1 Open Server Admin if it is not already open.

2 In the controller’s list of services, click Open Directory.

3 Click Settings, click General, then click Change.

This opens the Open Directory service configuration assistant.

4 Select Open Directory Master, then click Continue.

5 Create a Directory Administrator account, then click Continue.

Name, Short Name, User ID, Password: The Directory Administrator account administers the Open Directory domain that all nodes share. You can use the default Name, Short Name, and User ID. Choose a unique password.

6 Enter the Master Domain information, then click Continue.

Kerberos Realm: This field is preset to be the same as the server’s private fully qualified DNS name converted to capital letters. Use the preset Kerberos Realm (for example, CONTROLLER.CLUSTER).

Search Base: This field is preset to a search base suffix for the new LDAP directory, derived from the private DNS name of the cluster controller. Use the preset LDAP search base (for example, dc=controller,dc=cluster).

WARNING: If these fields are not prepopulated, it might indicate your DNS settings were not configured properly. If so, click the Cancel button and redo the steps listed in “Configuring DNS Service” on page 84.

7 Confirm settings, then click Continue.

8 When the service configuration assistant completes, click Close.

9 Verify the Role is set to Open Directory Master.

Note: You can click Logs and monitor the log file /Library/Logs/slapconfig.log for errors while the Open Directory domain is being created. You can also use the Console (located in /Applications/Utilities/) or Terminal with the command “tail -f /Library/Logs/slapconfig.log”. In the log, warnings such as the following can be ignored:

WARNING: no policy specified for [...] defaulting to no policy

After the Open Directory domain is created, the Open Directory service starts and the status icon turns green.
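The preset Kerberos realm and LDAP search base in step 6 both follow from the controller’s private DNS name. The Python sketch below reproduces the two example values from that name so you can compare them with what the assistant shows. It mirrors the examples in this guide and is not a description of how the configuration assistant computes them.

```python
def kerberos_realm(fqdn):
    """Upper-case the DNS name, as in the CONTROLLER.CLUSTER example."""
    return fqdn.rstrip(".").upper()

def ldap_search_base(fqdn):
    """Turn each DNS label into a dc= component, as in dc=controller,dc=cluster."""
    labels = fqdn.rstrip(".").split(".")
    return ",".join(f"dc={label}" for label in labels)

print(kerberos_realm("controller.cluster"))
print(ldap_search_base("controller.cluster"))
```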

Configuring DHCP Service

Using Server Admin, configure DHCP service on the cluster controller to provide LDAP and DNS information to the compute nodes.

To configure DHCP service:

1 Open Server Admin if it is not already open.

2 In the controller’s list of services, click DHCP.

3 Click Subnets.

4 Remove all subnets.

5 Create a new subnet for Ethernet port 2.

6 Click General and do the following:

a In the Subnet Name field, enter a subnet name (for example, Cluster Private Network).

b In the Starting IP Address field, enter the first IP address in the private network range available for compute nodes (for example, 172.16.1.1).

c In the Ending IP Address field, enter the last IP address in the private network range available for compute nodes (for example, 172.16.1.99).

Note: Leave some addresses unused at the end of the range for other devices and VPN connections.

d In the Subnet Mask field, enter the subnet mask for your private network (for example, 255.255.255.0).

e From the Network Interface pop-up menu, select en1 if it is not already selected.

This menu shows the UNIX name for the port. The UNIX name for Ethernet 2 should be en1.

f In the Router field, enter the private IP address of the cluster controller (for example, 172.16.1.254).

g Set the lease time for the IP addresses served by the DHCP service to at least 1 month.

7 Click Save.

8 Click DNS below the Subnets list.

9 In the DNS Servers field, enter the public address of the cluster controller (for example, 10.0.2.199).

10 In the Default Search Domain field, enter the DNS domain for your private network (for example, cluster).

11 Click Save.

12 Click LDAP.

13 In the Server Name field, enter the fully qualified DNS name of the cluster controller (for example, controller.cluster).

14 In the Search Base field, enter the LDAP search base for your shared Open Directory domain (for example, dc=controller,dc=cluster).

This entry should match the LDAP search base entry you made when you created the Open Directory domain.

Note: Verify the Server Name and Search Base fields. Errors in the LDAP configuration of your DHCP service prevent proper autoconfiguration of cluster nodes, automounting of network directories, and use of network user accounts.

To avoid typographical errors, copy and paste the search base settings from the Open Directory service search base settings.

15 Select the Enable checkbox to the left of the subnet you just created.

16 Click Save.

17 Click the Start DHCP button (below the Servers list).
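Before saving the subnet, the values entered in step 6 can be checked for consistency. The Python sketch below verifies that the DHCP pool and the router lie inside the subnet and that the router address is not part of the pool, then reports the pool size. The checks are a suggested sanity test, not rules enforced by Server Admin.

```python
import ipaddress

def check_dhcp_subnet(start, end, mask, router):
    """Return (pool size, list of problems) for a DHCP subnet definition."""
    start_ip = ipaddress.ip_address(start)
    end_ip = ipaddress.ip_address(end)
    router_ip = ipaddress.ip_address(router)
    # strict=False lets a host address plus mask stand in for the network.
    network = ipaddress.ip_network(f"{start}/{mask}", strict=False)
    problems = []
    if end_ip < start_ip:
        problems.append("ending address is before starting address")
    if end_ip not in network:
        problems.append("ending address is outside the subnet")
    if router_ip not in network:
        problems.append("router is outside the subnet")
    if start_ip <= router_ip <= end_ip:
        problems.append("router address falls inside the DHCP pool")
    pool_size = int(end_ip) - int(start_ip) + 1
    return pool_size, problems

print(check_dhcp_subnet("172.16.1.1", "172.16.1.99", "255.255.255.0", "172.16.1.254"))
```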

Configuring Firewall Settings on the Cluster Controller

The firewall on the controller is configured to enable access to all protocols from the public and private networks, but more limited access (for SSH and VPN) from external networks, including the Internet. You can adjust these rules to narrow or expand access to your controller.

To configure firewall settings on the cluster controller:

1 In the controller’s list of services, click Firewall.

2 Click Settings, then click Address Groups.

3 From the IP Address Groups list, remove all entries except for “any.”

4 Click the Add (+) button.

5 In the Group name field, enter the name of your public network (for example, example.com).

6 In the “Addresses in group” field, change the first entry to match your public IP network in CIDR notation.

For a subnet mask of 255.255.255.0, use “/24” after the network address (for example, 10.0.2.0/24).

7 Verify that the address range for the list accurately describes the address range used by your public network.

8 Click OK.

9 Click the Add (+) button to add another IP address group.

10 In the “Group name” field, name the group with your private DNS domain name (for example, cluster).

11 In the “Addresses in group” field, change the first entry to match your private IP network in CIDR notation.

For a subnet mask of 255.255.255.0, use “/24” after the network address (for example, 172.16.1.0/24).

12 Click OK.

13 Click Save.

14 Click Services.

15 From the “Edit Services for” pop-up menu, choose “any.”

16 Select “Allow only traffic from ‘any’ to these ports.”

17 Select the following ports (in addition to what’s already selected):

• ESP - Encapsulating Security Payload protocol
• IKE NAT Traversal
• VPN ISAKMP/IKE (500)
• VPN PPTP - Point-to-Point Tunneling Protocol (1723)

Note: Enabling SSH and VPN ports on the controller allows remote access to the controller from your public network. Your public network can also be protected by a firewall service or device. If you plan to access your cluster from outside your public network (for example, using the Internet), talk to your system administrator about enabling the same ports on that firewall as well.

18 Click Save.

19 From the “Edit Services for” pop-up menu, choose the public network that was created in step 5 (for example, example.com).

20 Select “Allow all traffic from <public network>.”

21 Click Save.

22 From the “Edit Services for” pop-up menu, choose the private network that was created in step 10 (for example, cluster).

23 Select “Allow all traffic from <private network>.”

24 Click Save.

25 Click the Start Firewall button (below the Servers list).
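The CIDR entries used for the address groups in steps 6 and 11 can be derived from any address on the network together with its subnet mask. The Python sketch below does the conversion with the standard `ipaddress` module; the helper name is hypothetical.

```python
import ipaddress

def to_cidr(address, subnet_mask):
    """Return the network in CIDR notation for an address and dotted subnet mask."""
    # strict=False accepts a host address and masks off the host bits.
    return str(ipaddress.ip_network(f"{address}/{subnet_mask}", strict=False))

print(to_cidr("10.0.2.199", "255.255.255.0"))    # public network of the controller
print(to_cidr("172.16.1.254", "255.255.255.0"))  # private network of the controller
```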

Configuring NAT Settings on the Cluster Controller

Network Address Translation (NAT) allows compute nodes to share the controller’s connection to the public network.

To configure NAT:

1 In the controller’s list of services, click NAT.

2 Click Settings, then verify that IP Forwarding and Network Address Translation (NAT) is selected.

3 Verify that the “External network interface” pop-up menu is set to your public Ethernet interface (for example, Ethernet 1).

4 Verify that the Enable NAT Port Mapping Protocol checkbox is selected.

5 Click the Start NAT button (below the Servers list).

Configuring NFS

Using Server Admin, configure the NFS service on the cluster controller. NFS is used for file sharing and network home directory mounts.

To configure NFS service:

1 In the controller’s list of services, click NFS.

2 Click Settings.

3 In the “Use__server threads” field, enter a number to specify the maximum number of NFS threads, or daemons, you want to run at one time.

An nfsd daemon is a server process that runs continuously behind the scenes and processes read and write requests from clients. The more threads that are available, the more concurrent clients can be served.

4 Click Save.

5 Click the Start NFS button (below the Servers list).

Configuring VPN Service

Configure the VPN service to enable secure connections from computers on remote networks.

To configure VPN service:

1 In the controller’s list of services, click VPN.

2 Click Settings, then click PPTP.

3 Select the Enable PPTP checkbox.

4 In the Starting IP address field, enter the first private IP address you want to assign to remote VPN clients (for example, 172.16.1.200).

5 In the Ending IP address field, enter the last private IP address you want to assign to remote VPN clients (for example, 172.16.1.229).

6 Click Save.

7 Click the Start VPN button (below the Servers list).
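
The starting and ending addresses are both assignable, so the range determines how many remote VPN clients can hold an address at the same time. The following is a minimal sketch, not part of the VPN service itself, that counts the addresses in the example range; the function names are illustrative.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    # Split the address on dots into the four positional parameters.
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Count the addresses in an inclusive range (start and end are both usable).
vpn_range_size() {
    start=$(ip_to_int "$1")
    end=$(ip_to_int "$2")
    echo $(( end - start + 1 ))
}

# Example range from the steps above.
vpn_range_size 172.16.1.200 172.16.1.229
```

For the example range this prints 30. Make sure the range does not overlap the addresses you intend to assign to compute nodes.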

Configuring Xgrid Service

Using Server Admin on the cluster controller, configure it as an Xgrid controller and then start Xgrid service.

Note: Because the cluster controller is also responsible for authentication, NFS sharing, network services, and possibly other critical services, it is not advisable for a cluster controller to run the Xgrid agent.

To configure the Xgrid service:

1 In the controller’s list of services, click Xgrid.

2 Click Overview.

3 Click Configure Xgrid Service.

The service configuration assistant will launch.

4 Click Continue.

5 Select “Host a grid,” then click Continue.

6 Enter the directory administrator’s user name and password.

This is the directory administrator account you created when you enabled the Open Directory service.

7 Click Continue.

8 Verify that the Xgrid settings include the correct Kerberos realm (for example, CONTROLLER.CLUSTER).

9 Click Continue.

10 Once the Xgrid service is configured, click Close.

11 Click Settings.

12 Click Agent, then deselect Enable Agent Service.

13 Click Save.

14 When prompted to restart Xgrid, click Restart.
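
After configuring the services in this chapter, you may want to confirm from Terminal that they are running. The sketch below only prints the commands it would run. It assumes the serveradmin command-line tool and the service identifiers ipfilter (firewall), nat, nfs, vpn, and xgrid; confirm the identifiers on your server with serveradmin list before relying on them.

```shell
#!/bin/sh
# Service identifiers are assumptions; verify them with "serveradmin list".
SERVICES="ipfilter nat nfs vpn xgrid"

# Print one status command per service (dry run: nothing is executed).
status_commands() {
    for svc in $SERVICES; do
        echo "sudo serveradmin status $svc"
    done
}

status_commands
```

To run the checks instead of printing them, paste the printed lines into a Terminal window on the controller.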

Preparing the Data Drive as a Mirrored RAID Set

Protect the data on your data drive by using a mirrored RAID set, also referred to as RAID 1. You can create the mirrored RAID set with the Disk Utility application. A mirrored RAID set requires two or more disks.

Note: Your network share points should be located on a different drive than your operating system, ideally on a mirrored RAID set.

To prepare the data drive as a mirrored RAID set:

1 Open the Disk Utility application (in /Applications/Utilities).

2 From the drive list on the left, click one of the two drives to be used in the RAID.

3 Click RAID.

4 Enter a name for the RAID set (for example, Data).

5 Drag the disks you want to mirror from the left side of the pane to the disk list at the center of the pane.

6 For each disk you dragged to the disk list, verify the disk type is set to “RAID Slice.”

To use the disk as a mirror at all times, select RAID Slice.

To use the disk as a mirror only when another disk fails, select Spare.

7 To automatically rebuild mirror data, click Options, select “Automatically rebuild RAID mirror sets,” and then click OK.

8 Select the RAID set from the disk list and then, from the Volume Format pop-up menu, choose either “Mac OS Extended (Journaled)” or “Mac OS Extended (Case-sensitive, Journaled).”

If you plan to work with applications or source code that was designed for other UNIX operating systems, choose the case-sensitive option.

9 From the RAID Type pop-up menu, choose Mirrored RAID Set.

10 Click Create.

11 Select the mirrored RAID that will host your data volume.

12 Use the cluster administrator username and password to authenticate.

13 Verify that the RAID set has the correct format.

14 Quit the Disk Utility application.

Creating a Home Directory Automount Share Point

Use Server Admin to configure an automount share point on the cluster controller.

To create an automount home directory share point:

1 Open Server Admin and select the controller in the Servers list.

2 Click File Sharing, then click Volumes.

3 Select the volume you want to contain the home directory share point (for example, Data).

4 Click Browse.

5 Click New Folder, name the folder “home,” then click Create.

6 Click Save.

7 Select the home folder you created.

8 Click Share, then click Share Point.

9 Select Enable Automount.

The Automount configuration screen appears.

10 Verify that the directory is set to /LDAPv3/127.0.0.1.

11 From the protocol pop-up menu choose NFS.

12 Verify that “Use for” is set to User home folders.

13 Click OK.

14 When prompted, enter the directory administrator’s user name and password.

15 Deselect “Enable Spotlight searching.”

16 From Share Point, click Protocol Options.

The Protocol Options screen appears.

17 Click NFS.

18 Select the “Export this item and its contents to” checkbox, then choose Subnet from the pop-up menu.

19 Set the Subnet address field to your private network address (for example, 172.16.1.0).

20 Set the Subnet mask field to your private network subnet mask (for example, 255.255.255.0).

21 Verify that the mapping pop-up menu is set to “Root to Nobody.”

22 Click OK.

23 Click Save.

24 Restart the controller (Apple Menu > Restart).
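
The subnet address and mask entered in steps 19 and 20 decide which clients can mount the share point: a client qualifies when its address, combined with the mask, equals the subnet address. The following sketch illustrates that test with the example values; the function names are illustrative and are not part of the NFS service.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed when address $1 lies in the subnet given by address $2 and mask $3.
in_subnet() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# A compute node address and a public-network address, both examples.
for addr in 172.16.1.42 10.0.2.199; do
    if in_subnet "$addr" 172.16.1.0 255.255.255.0; then
        echo "$addr: inside export subnet"
    else
        echo "$addr: outside export subnet"
    fi
done
```

With the example values, 172.16.1.42 is reported as inside the export subnet and 10.0.2.199 as outside it.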

Creating User Accounts

Use Workgroup Manager to create user accounts.

To create user accounts:

1 If you did not restart the cluster controller at the end of the previous section (“Creating a Home Directory Automount Share Point”), restart it now.

2 Log in using your administrator account.

3 Open Workgroup Manager (located at /Applications/Server/).

You can also open Workgroup Manager from the Dock.

4 Connect to the cluster controller using its hostname and your administrator user name and password.

5 On the right side of the Workgroup Manager window, click the lock button.

6 Authenticate with the directory administrator username and password.

7 Click Accounts.

8 Select the users icon tab above the accounts listing.

9 Click New User.

10 In the Name field, enter the full name for a user (for example, “Tom C”).

11 In the Short Names list box, enter a short username for the user (for example, “tac”).

12 In the Password field, enter a password for the user.

13 In the Verify field, reenter the password for the user.

14 Click Save.

15 Click Advanced.

16 From the Login Shell pop-up menu, choose the preferred shell for the user.

17 Click Home.

18 From the list, select the NFS automount share point (home).

19 Click Create Home Now.

20 Click Save.

21 Repeat this process for each cluster user.

22 Quit Workgroup Manager.

11 Setting Up Compute Nodes

Simplify the compute node setup process by creating Auto Server Setup records.

An Auto Server Setup record is an XML property list with values that can be used to automatically complete the Server Assistant for newly installed Mac OS X servers. Auto Server Setup records can be accessed using external storage (for example, a CD, USB drive, or iPod) or over a network using Open Directory.

For more information about creating and using Auto Server Setup records, see Server Administration.

You can accomplish additional automation of compute node configuration by using scripts executed with SSH or Apple Remote Desktop software.
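
As a minimal sketch of SSH-based automation, the following prints one ssh command line per node so that you can review the commands before running them. It assumes nodes are addressed sequentially on the example private network 172.16.1.x and uses a hypothetical administrator short name, admin; neither value comes from your configuration.

```shell
#!/bin/sh
PREFIX=172.16.1      # example private network used in this guide
ADMIN_USER=admin     # hypothetical administrator short name

# Print an ssh command line for each of nodes 1..COUNT.
# Usage: node_commands COUNT remote-command [args...]
node_commands() {
    count=$1
    shift
    n=1
    while [ "$n" -le "$count" ]; do
        echo "ssh $ADMIN_USER@$PREFIX.$n $*"
        n=$(( n + 1 ))
    done
}

# Dry run: show the commands that would ask three nodes for their hostname.
node_commands 3 hostname
```

Printing the commands rather than executing them is a deliberate choice here: it lets you check the address range and account name before anything runs on the nodes.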

Creating an Auto Server Setup Record for Compute Nodes

To automate the process of setting up compute nodes, use Server Assistant to save the compute node configuration to a file or Open Directory record.

To create an Auto Server Setup record:

1 On the cluster controller, open Server Assistant (located in /Applications/Server/).

2 In the Welcome screen:

a Select “Save advanced setup information in a file or directory record.”

b Click Continue.

3 In the Language screen:

a Select the language you want to use to administer the server.

b Click Continue.

4 In the Keyboard screen:

a Select the keyboard layout for the server.

b Click Continue.

5 In the Serial Number screen:

a Enter a site-licensed Mac OS X Server serial number.

Note: If you don’t have a site-licensed number you must manually enter unique serial numbers for each compute node after it has been configured.

b Click Continue.

6 In the Administrator Account screen:

a Create the account you’ll use to administer compute nodes.

b Click Continue.

7 In the Network Interfaces screen:

a Click Add.

b In the Port Name field, enter “Ethernet 1.”

c In the Device Name field, enter “en0” and leave the Ethernet Address field blank.

d Click OK.

e Enable TCP/IP for Ethernet 1.

f Click Continue.

8 In the TCP/IP Connection screen for the Built-in Ethernet 1 port:

a From the Configure pop-up menu, choose Using DHCP.

b Leave the other fields blank.

c Click Continue.

9 In the Network Names screen:

a Leave the Primary DNS Name field blank.

b Leave the Computer Name field blank.

c Verify that the “Enable Remote Management” checkbox is selected.

d Click Continue.

A warning appears indicating you left some fields blank.

e Click Continue.

10 In the Time Zone screen:

a From the Closest City pop-up menu, choose your time zone.

b Click Continue.

11 In the Directory Usage screen:

a From the “Set directory usage to” pop-up menu, choose “Connected to a Directory System.”

b From the Connect pop-up menu, choose “Open Directory Server.”

c In the IP Address or DNS Name field, enter the private DNS name of the cluster controller (for example, controller.cluster).

d Click Continue.

12 In the Confirm Settings screen:

a Read the configuration summary to confirm that you have made the correct settings.

b Click Save As.

13 In Save settings, use the following information to choose whether to save your settings to a configuration file or an Open Directory record.

If you use a configuration file, it should be named generic.plist and saved to a CD, DVD, USB drive, iPod, or other removable drive. It should be located in a folder called Auto Server Setup at the top level of the removable file system. The file is used if the removable drive is present when an unconfigured compute node starts for the first time.

If you save your settings to an Open Directory record, an unconfigured compute node discovers the record via DHCP and configures itself accordingly. Save the record to the /LDAPv3/127.0.0.1 domain and name it generic. When asked, specify an Open Directory server using the controller’s DNS name (for example, controller.cluster) or IP address (for example, 10.0.2.199).

Saving settings to an Open Directory record with encryption requires the use of password (.pass) files. Saving them without encryption exposes the administrator password to anyone with access to the Open Directory domain. For more information about creating and using Auto Server Setup records and encryption, see Server Administration.

a Select Directory Record.

b If creating a Directory Record, choose /LDAPv3/127.0.0.1 from the Directory Domain pop-up menu.

c Decide if you want to encrypt the record.

d In the Record Name field, enter “generic.”

e Click OK and then authenticate using the directory admin login and password you created when you configured Open Directory.

14 Click OK.

15 Quit Server Assistant.
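
If you chose the configuration file option, the file must sit at a fixed location on the removable drive. The following sketch checks that layout. The function names are illustrative, and the temporary directory stands in for the mounted removable volume (for example, a hypothetical /Volumes/USB).

```shell
#!/bin/sh
# Print where the setup file is expected on a mounted volume ($1).
setup_file_path() {
    printf '%s/Auto Server Setup/generic.plist\n' "${1%/}"
}

# Succeed when the setup file exists at the expected location.
has_setup_file() {
    [ -f "$(setup_file_path "$1")" ]
}

# Demonstration: a temporary directory plays the role of the removable volume.
vol=$(mktemp -d)
mkdir "$vol/Auto Server Setup"
: > "$vol/Auto Server Setup/generic.plist"

if has_setup_file "$vol"; then
    echo "setup file found"
else
    echo "setup file missing"
fi

rm -rf "$vol"
```

On a real drive, pass the volume’s mount point to has_setup_file instead of the temporary directory.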

Verifying LDAP Record Creation

To verify the creation of the LDAP directory record that will be used by compute nodes to autoconfigure, use the slapcat command on the cluster controller.

To verify the LDAP record creation:

1 Open a Terminal window on the cluster controller and enter the following command:

$ sudo slapcat | grep generic

2 When prompted, enter the administrator password.

This command displays the generic records in the LDAP database on the cluster controller. In this case, there should only be one record—the one you created in the previous section.

dn: cn=generic,cn=autoserversetup,dc=controller,dc=cluster

cn: generic
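
To turn this visual check into a scripted one, you can count the matching entries instead of reading them. The following sketch assumes the record lives under cn=autoserversetup, as in the output above; the function name is illustrative, and the sample input is copied from that output rather than read from a live directory.

```shell
#!/bin/sh
# Count LDIF entries on standard input whose DN names the generic
# Auto Server Setup record. Matching ignores case, as LDAP DNs do.
count_setup_records() {
    grep -ci '^dn: cn=generic,cn=autoserversetup,'
}

# Sample input; on the controller, pipe "sudo slapcat" into the function.
printf '%s\n' \
    'dn: cn=generic,cn=autoserversetup,dc=controller,dc=cluster' \
    'cn: generic' | count_setup_records
```

A result of 1 matches the single record expected here. A result of 0 suggests the record was not saved, and a larger number suggests duplicates worth investigating.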

Setting Up Compute Nodes

Setting up compute nodes involves obtaining IP addresses for each compute node connected to your private network. This section provides useful tips for setting up compute nodes depending on your cluster configuration.

To set up compute nodes:

1 Make sure compute nodes are connected to the private network through Ethernet port 1.

2 Start the first compute node.

The DHCP service hosted on the cluster controller provides IP addresses to nodes when they start, beginning with the first address in the range and incrementing the address for each request. The DHCP lease time specified in the Server Admin settings for the DHCP service determines how long this address is reserved for a computer.

It is advisable for each node in a cluster to use sequential IP addresses that correspond to their physical position in a rack and the names they have been assigned. Node1 would have an address that ends in 1 (for example, 172.16.1.1) and node199 would have an address that ends with 199 (for example, 172.16.1.199).

If you set up your cluster in this manner, start the first node and wait until you verify its IP address before starting the next one. You can check DHCP IP address assignments in the DHCP Clients pane of Server Admin. Because Server Admin does not maintain a persistent connection to the servers it administers, you might need to click the Refresh button in the toolbar to update the client listing immediately.

If an Auto Server Setup record is available to the compute node through a removable drive or Open Directory record, the node configures itself and restarts. After you verify that the first node has completed this process, start the remaining compute nodes one at a time, allowing time for each to obtain its sequential IP address from the DHCP server and to finish autoconfiguration. Do not disconnect or remove disks until you are sure the server has applied the settings.

3 Select the DHCP service and view client connections.

Static Maps in the DHCP Static Maps pane of Server Admin enable you to guarantee that an IP address is always reserved for a specific node, regardless of how much time has elapsed since it was assigned its address.

In addition to providing the IP address assignment, the DHCP service on the cluster controller provides the IP address and search base for the Open Directory domain on the cluster controller.
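
The naming convention described above (node number equals the last part of the address) can be written down as a small lookup, which is handy when comparing expected addresses against the DHCP Clients pane. This is a sketch under the assumption of a single 172.16.1.x network with at most 254 nodes; the function name is illustrative.

```shell
#!/bin/sh
PREFIX=172.16.1   # example private network used in this guide

# Print the address expected for node number $1 (1-254).
node_ip() {
    n=$1
    if [ "$n" -lt 1 ] || [ "$n" -gt 254 ]; then
        echo "node number out of range: $n" >&2
        return 1
    fi
    echo "$PREFIX.$n"
}

# Expected assignments for the examples mentioned above.
for n in 1 2 199; do
    echo "node$n -> $(node_ip "$n")"
done
```

If a node reports a different address than the one printed here, it probably started out of order. A static map for that node restores the expected pairing.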

Configuring Cluster Nodes

When configuring cluster nodes, use Server Admin to name cluster nodes, join them to the Kerberos realm, and join them to a grid.

To configure cluster nodes:

1 Open Server Admin.

2 Click the Add Server (+) button below the Servers list.

3 Connect to the cluster node using its IP address.

If you used an Auto Server Setup record to configure the nodes, use the administrator user name and password you created with that record.

4 In the Servers list, click the cluster node.

5 Click Settings.

Note: If the Mac OS X Server serial number is not valid, Server Admin doesn’t permit you to administer services. If you did not supply a volume license serial number when creating the Auto Server Setup file, you must enter a valid serial number for each node before you can continue. Click General to verify the serial number.

6 Click Network.

7 In the Computer Name and Local Hostname fields, enter the computer name and hostname of the cluster node (for example, node1).

8 Click Save.

9 Click Services.

10 Select the Open Directory checkbox.

11 Select the Xgrid checkbox.

12 Click Save.

13 Repeat steps 2 through 12 for each compute node.

You can also use Apple Remote Desktop to set the names of all cluster nodes at once. For more information, see “Naming Multiple Cluster Nodes.”

14 Select the node’s Open Directory service.

15 Click Settings, then click General.

16 Verify the role is set to “Connected to a Directory System.”

17 Click Join Kerberos.

A Join Kerberos Realm screen appears. Set the realm to your Kerberos realm (for example, CONTROLLER.CLUSTER).

18 Enter the Open Directory administrator user name and password.

19 Click Refresh below the Servers list.

If the node has joined the Kerberos realm, the Join Kerberos button and associated text will disappear.

20 In the Servers list select the node’s Xgrid service.

21 Click Overview.

22 Click Configure Xgrid Service.

The Xgrid Service Configuration Assistant appears.

23 Click Continue, then select “Join a grid.”

24 Click Continue.

25 In the “Use controller with hostname” field, enter the controller’s private DNS name (for example, controller.cluster).

26 Click Continue.

27 Confirm the settings.

The Directory Server entry should be an LDAPv3 path based on the controller’s DNS name (for example, /LDAPv3/controller.cluster). The Kerberos realm should be the same as the controller’s DNS name in all capital letters (for example, CONTROLLER.CLUSTER).

28 Click Continue.

29 Click Close.

You can automate these steps. For more information, see Appendix B, “Automating Compute Node Configuration.”
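
The values confirmed in step 27 all follow from the controller’s private DNS name. The following sketch derives them so you can compare against what the assistant displays. The function names are illustrative, and the search base form (dc= components) is taken from the LDAP record shown earlier in this chapter.

```shell
#!/bin/sh
# Kerberos realm: the DNS name in all capital letters.
kerberos_realm() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# Directory Server entry: an LDAPv3 path based on the DNS name.
ldap_node_path() {
    printf '/LDAPv3/%s\n' "$1"
}

# LDAP search base: one dc= component per label of the DNS name.
ldap_search_base() {
    printf '%s\n' "$1" | sed 's/^/dc=/; s/\./,dc=/g'
}

name=controller.cluster   # example private DNS name from this guide
kerberos_realm "$name"
ldap_node_path "$name"
ldap_search_base "$name"
```

For controller.cluster this prints CONTROLLER.CLUSTER, /LDAPv3/controller.cluster, and dc=controller,dc=cluster.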
