VMware vSphere Big Data Extensions - 2.3 User’s Manual

VMware vSphere Big Data Extensions
Command-Line Interface Guide
vSphere Big Data Extensions 2.3
This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent editions of this document, see http://www.vmware.com/support/pubs.
EN-001702-00
You can find the most up-to-date technical documentation on the VMware Web site at:
http://www.vmware.com/support/
The VMware Web site also provides the latest product updates.
If you have comments about this documentation, submit your feedback to:
docfeedback@vmware.com
Copyright © 2013 – 2015 VMware, Inc. All rights reserved. Copyright and trademark information. This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 United States License
(http://creativecommons.org/licenses/by-nd/3.0/us/legalcode).
VMware, Inc.
3401 Hillview Ave. Palo Alto, CA 94304 www.vmware.com

Contents

About This Book
1 Using the Serengeti Remote Command-Line Interface Client
  Access the Serengeti CLI By Using the Remote CLI Client
  Log in to Hadoop Nodes with the Serengeti Command-Line Interface Client
2 Managing Application Managers
  About Application Managers
  Add an Application Manager by Using the Serengeti Command-Line Interface
  View List of Application Managers by using the Serengeti Command-Line Interface
  Modify an Application Manager by Using the Serengeti Command-Line Interface
  View Supported Distributions for All Application Managers by Using the Serengeti Command-Line Interface
  View Configurations or Roles for Application Manager and Distribution by Using the Serengeti Command-Line Interface
  Delete an Application Manager by Using the Serengeti Command-Line Interface
3 Managing the Big Data Extensions Environment by Using the Serengeti Command-Line Interface
  About Application Managers
  Add a Resource Pool with the Serengeti Command-Line Interface
  Remove a Resource Pool with the Serengeti Command-Line Interface
  Add a Datastore with the Serengeti Command-Line Interface
  Remove a Datastore with the Serengeti Command-Line Interface
  Add a Network with the Serengeti Command-Line Interface
  Remove a Network with the Serengeti Command-Line Interface
  Reconfigure a Static IP Network with the Serengeti Command-Line Interface
  Reconfigure the DNS Type with the Serengeti Command-Line Interface
  Increase Cloning Performance and Resource Usage of Virtual Machines
4 Managing Users and User Accounts
  Create an LDAP Service Configuration File Using the Serengeti Command-Line Interface
  Activate Centralized User Management Using the Serengeti Command-Line Interface
  Create a Cluster With LDAP User Authentication Using the Serengeti Command-Line Interface
  Change User Management Modes Using the Serengeti Command-Line Interface
  Modify LDAP Configuration Using the Serengeti Command-Line Interface
5 Creating Hadoop and HBase Clusters
  About Hadoop and HBase Cluster Deployment Types
  Default Hadoop Cluster Configuration for Serengeti
  Default HBase Cluster Configuration for Serengeti
  About Cluster Topology
  About HBase Clusters
  About MapReduce Clusters
  About Data Compute Clusters
  About Customized Clusters
6 Managing Hadoop and HBase Clusters
  Stop and Start a Cluster with the Serengeti Command-Line Interface
  Scale Out a Cluster with the Serengeti Command-Line Interface
  Scale CPU and RAM with the Serengeti Command-Line Interface
  Reconfigure a Cluster with the Serengeti Command-Line Interface
  Delete a Cluster by Using the Serengeti Command-Line Interface
  About vSphere High Availability and vSphere Fault Tolerance
  Reconfigure a Node Group with the Serengeti Command-Line Interface
  Expanding a Cluster with the Command-Line Interface
  Recover from Disk Failure with the Serengeti Command-Line Interface Client
  Recover a Cluster Node Virtual Machine
  Enter Maintenance Mode to Perform Backup and Restore with the Serengeti Command-Line Interface Client
7 Monitoring the Big Data Extensions Environment
  View List of Application Managers by using the Serengeti Command-Line Interface
  View Available Hadoop Distributions with the Serengeti Command-Line Interface
  View Supported Distributions for All Application Managers by Using the Serengeti Command-Line Interface
  View Configurations or Roles for Application Manager and Distribution by Using the Serengeti Command-Line Interface
  View Provisioned Clusters with the Serengeti Command-Line Interface
  View Datastores with the Serengeti Command-Line Interface
  View Networks with the Serengeti Command-Line Interface
  View Resource Pools with the Serengeti Command-Line Interface
8 Cluster Specification Reference
  Cluster Specification File Requirements
  Cluster Definition Requirements
  Annotated Cluster Specification File
  Cluster Specification Attribute Definitions
  White Listed and Black Listed Hadoop Attributes
  Convert Hadoop XML Files to Serengeti JSON Files
9 Serengeti CLI Command Reference
  appmanager Commands
  cluster Commands
  connect Command
  datastore Commands
  disconnect Command
  distro list Command
  mgmtvmcfg Commands
  network Commands
  resourcepool Commands
  template Commands
  topology Commands
  usermgmt Commands
Index

About This Book

VMware vSphere Big Data Extensions Command-Line Interface Guide describes how to use the Serengeti Command-Line Interface (CLI) to manage the vSphere resources that you use to create Hadoop and HBase clusters, and how to create, manage, and monitor Hadoop and HBase clusters with the VMware Serengeti™ CLI.
VMware vSphere Big Data Extensions Command-Line Interface Guide also describes how to perform Hadoop and HBase operations with the Serengeti CLI, and provides cluster specification and Serengeti CLI command references.
Intended Audience
This guide is for system administrators and developers who want to use Serengeti to deploy and manage Hadoop clusters. To successfully work with Serengeti, you should be familiar with Hadoop and VMware vSphere®.
VMware Technical Publications Glossary
VMware Technical Publications provides a glossary of terms that might be unfamiliar to you. For definitions of terms as they are used in VMware technical documentation, go to http://www.vmware.com/support/pubs.

Using the Serengeti Remote Command-Line Interface Client 1

The Serengeti Remote Command-Line Interface Client lets you access the Serengeti Management Server to deploy, manage, and use Hadoop.
This chapter includes the following topics:
- Access the Serengeti CLI By Using the Remote CLI Client
- Log in to Hadoop Nodes with the Serengeti Command-Line Interface Client

Access the Serengeti CLI By Using the Remote CLI Client

You can access the Serengeti Command-Line Interface (CLI) to perform Serengeti administrative tasks with the Serengeti Remote CLI Client.
Prerequisites
- Use the VMware vSphere Web Client to log in to the VMware vCenter Server® on which you deployed the Serengeti vApp.
- Verify that the Serengeti vApp deployment was successful and that the Management Server is running.
- Verify that you have the correct password to log in to Serengeti CLI. See the VMware vSphere Big Data Extensions Administrator's and User's Guide.
  The Serengeti CLI uses its vCenter Server credentials.
- Verify that the Java Runtime Environment (JRE) is installed in your environment and that its location is in your path environment variable.
Procedure
1 Download the Serengeti CLI package from the Serengeti Management Server.
Open a Web browser and navigate to the following URL: https://server_ip_address/cli/VMware-Serengeti-CLI.zip
2 Download the ZIP file.
The filename is in the format VMware-Serengeti-cli-version_number-build_number.ZIP.
3 Unzip the download.
The download includes the following components.
- The serengeti-cli-version_number JAR file, which includes the Serengeti Remote CLI Client.
- The samples directory, which includes sample cluster configurations.
- Libraries in the lib directory.
4 Open a command shell, and change to the directory where you unzipped the package.
5 Change to the cli directory, and run the following command to enter the Serengeti CLI.
- For any language other than French or German, run the following command.
java -jar serengeti-cli-version_number.jar
- For French or German languages, which use code page 850 (CP 850) language encoding when running the Serengeti CLI from a Windows command console, run the following command.
java -Dfile.encoding=cp850 -jar serengeti-cli-version_number.jar
6 Connect to the Serengeti service.
You must run the connect --host command every time you begin a CLI session, and again after the 30-minute session timeout. If you do not run this command, you cannot run any other commands.
a Run the connect command.
connect --host xx.xx.xx.xx:8443
b At the prompt, type your user name, which might be different from your login credentials for the Serengeti Management Server.
NOTE If you do not create a user name and password for the Serengeti Command-Line Interface Client, you can use the default vCenter Server administrator credentials. The Serengeti Command-Line Interface Client uses the vCenter Server login credentials with read permissions on the Serengeti Management Server.
c At the prompt, type your password.
A command shell opens, and the Serengeti CLI prompt appears. You can use the help command to get help with Serengeti commands and command syntax.
- To display a list of available commands, type help.
- To get help for a specific command, append the name of the command to the help command.
help cluster create
- Press Tab to complete a command.
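If you start the CLI from a script, the choice between the two launch commands in step 5 can be sketched in Python. This is a minimal illustration and not part of the Serengeti package: the function name build_cli_command and its use_cp850 parameter are hypothetical, and the JAR file name is a placeholder for the file that you unzipped.

```python
from typing import List


def build_cli_command(jar_name: str, use_cp850: bool = False) -> List[str]:
    """Return the argument list that starts the Serengeti CLI.

    jar_name  -- name of the serengeti-cli JAR file in the cli directory
    use_cp850 -- True for French or German Windows consoles (code page 850)
    """
    command = ["java"]
    if use_cp850:
        # Same option as the documented French/German launch command.
        command.append("-Dfile.encoding=cp850")
    command += ["-jar", jar_name]
    return command


if __name__ == "__main__":
    # Placeholder JAR name; substitute the real version number.
    print(" ".join(build_cli_command("serengeti-cli-version_number.jar", use_cp850=True)))
```

Returning a list instead of a single string lets you pass the result directly to a process launcher without shell quoting.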

Log in to Hadoop Nodes with the Serengeti Command-Line Interface Client

To perform troubleshooting or to run your management automation scripts, log in to Hadoop master, worker, and client nodes with SSH from the Serengeti Management Server using SSH client tools such as SSH, PDSH, ClusterSSH, and Mussh, which do not require password authentication.
To connect to Hadoop cluster nodes over SSH, you can use a user name and password authenticated login. All deployed nodes are password-protected with either a random password or a user-specified password that was assigned when the cluster was created.
Prerequisites
Use the vSphere Web Client to log in to vCenter Server, and verify that the Serengeti Management Server virtual machine is running.
Procedure
1 Right-click the Serengeti Management Server virtual machine and select Open Console.
The password for the Serengeti Management Server appears.
NOTE If the password scrolls off the console screen, press Ctrl+D to return to the command prompt.
2 Use the vSphere Web Client to log in to the Hadoop node.
The password for the root user appears on the virtual machine console in the vSphere Web Client.
3 Change the password of the Hadoop node by running the set-password -u command.
sudo /opt/serengeti/sbin/set-password -u

Managing Application Managers 2

A key to managing your Hadoop clusters is understanding how to manage the different application managers that you use in your Big Data Extensions environment.
This chapter includes the following topics:
- About Application Managers
- Add an Application Manager by Using the Serengeti Command-Line Interface
- View List of Application Managers by using the Serengeti Command-Line Interface
- Modify an Application Manager by Using the Serengeti Command-Line Interface
- View Supported Distributions for All Application Managers by Using the Serengeti Command-Line Interface
- View Configurations or Roles for Application Manager and Distribution by Using the Serengeti Command-Line Interface
- Delete an Application Manager by Using the Serengeti Command-Line Interface

About Application Managers

You can use Cloudera Manager, Apache Ambari, and the default application manager to provision and manage clusters with VMware vSphere Big Data Extensions.
After you add a new Cloudera Manager or Ambari application manager to Big Data Extensions, you can redirect your software management tasks, including monitoring and managing clusters, to that application manager.
You can use an application manager to perform the following tasks:
- List all available vendor instances, supported distributions, and configurations or roles for a specific application manager and distribution.
- Create clusters.
- Monitor and manage services from the application manager console.
Check the documentation for your application manager for tool-specific requirements.
Restrictions
The following restrictions apply to Cloudera Manager and Ambari application managers:
- To add an application manager with HTTPS, use the FQDN instead of the URL.
- You cannot rename a cluster that was created with a Cloudera Manager or Ambari application manager.
- You cannot change services for a big data cluster from Big Data Extensions if the cluster was created with Ambari or Cloudera Manager application manager.
- To change services, configurations, or both, you must make the changes from the application manager on the nodes.
  If you install new services, Big Data Extensions starts and stops the new services together with old services.
- If you use an application manager to change services and big data cluster configurations, those changes cannot be synced from Big Data Extensions. The nodes that you create with Big Data Extensions do not contain the new services or configurations.

Add an Application Manager by Using the Serengeti Command-Line Interface

To use either Cloudera Manager or Ambari application managers, you must add the application manager and add server information to Big Data Extensions.
NOTE If you want to add a Cloudera Manager or Ambari application manager with HTTPS, use the FQDN in place of the URL.
Procedure
1 Access the Serengeti CLI.
2 Run the appmanager add command.
appmanager add --name application_manager_name --type [ClouderaManager|Ambari]
--url http[s]://server:port
Application manager names can include only alphanumeric characters ([0-9, a-z, A-Z]) and the following special characters: underscores, hyphens, and blank spaces.
You can use the optional description variable to include a description of the application manager instance.
3 Enter your username and password at the prompt.
4 If you specified SSL, enter the file path of the SSL certificate at the prompt.
What to do next
To verify that the application manager was added successfully, run the appmanager list command.
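The naming rule for application managers can be checked before you run appmanager add. The following Python sketch is an illustration under stated assumptions: the function name is_valid_appmanager_name is hypothetical, and the sketch rejects only empty names and names with characters outside the documented set. The CLI might enforce further limits, such as a maximum length, that this guide does not state.

```python
import re

# Documented character set: digits, letters, underscores, hyphens, blank spaces.
_NAME_PATTERN = re.compile(r"[0-9A-Za-z_\- ]+")


def is_valid_appmanager_name(name: str) -> bool:
    """Return True if every character in name is in the documented set."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Hypothetical names used only for illustration.
    for candidate in ["cm-server_01", "my ambari", "cm.server"]:
        print(candidate, is_valid_appmanager_name(candidate))
```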

View List of Application Managers by using the Serengeti Command-Line Interface

You can use the appmanager list command to list the application managers that are installed on the Big Data Extensions environment.
Prerequisites
Verify that you are connected to an application manager.
Procedure
1 Access the Serengeti CLI.
2 Run the appmanager list command.
appmanager list
The command returns a list of all application managers that are installed on the Big Data Extensions environment.

Modify an Application Manager by Using the Serengeti Command-Line Interface

You can modify the information for an application manager with the Serengeti CLI. For example, you can change the manager server IP address if it is not a static IP, or you can upgrade the administrator account.
Prerequisites
Verify that you have at least one external application manager installed on your Big Data Extensions environment.
Procedure
1 Access the Serengeti CLI.
2 Run the appmanager modify command.
appmanager modify --name application_manager_name
--url <http[s]://server:port>
Additional parameters are available for this command. For more information about this command, see “appmanager modify Command.”

View Supported Distributions for All Application Managers by Using the Serengeti Command-Line Interface

Supported distributions are those distributions that are supported by Big Data Extensions. Available distributions are those distributions that have been added into your Big Data Extensions environment. You can view a list of the Hadoop distributions that are supported in the Big Data Extensions environment to determine if a particular distribution is available for a particular application manager.
Prerequisites
Verify that you are connected to an application manager.
Procedure
1 Access the Serengeti CLI.
2 Run the appmanager list command.
appmanager list --name application_manager_name [--distros]
If you do not include the --name parameter, the command returns a list of all the Hadoop distributions that are supported on each of the application managers in the Big Data Extensions environment.
The command returns a list of all distributions that are supported for the application manager of the name that you specify.

View Configurations or Roles for Application Manager and Distribution by Using the Serengeti Command-Line Interface

You can use the appmanager list command to list the Hadoop configurations or roles for a specific application manager and distribution.
The configuration list includes those configurations that you can use to configure the cluster in the cluster specifications.
The role list contains the roles that you can use to create a cluster. You should not use unsupported roles to create clusters in the application manager.
Prerequisites
Verify that you are connected to an application manager.
Procedure
1 Access the Serengeti CLI.
2 Run the appmanager list command.
appmanager list --name application_manager_name [--distro distro_name (--configurations | --roles) ]
The command returns a list of the Hadoop configurations or roles for a specific application manager and distribution.

Delete an Application Manager by Using the Serengeti Command-Line Interface

You can use the Serengeti CLI to delete an application manager when you no longer need it.
Prerequisites
- Verify that you have at least one external application manager installed on your Big Data Extensions environment.
- Verify that the application manager that you want to delete does not contain any clusters, or the deletion process will fail.
Procedure
1 Access the Serengeti CLI.
2 Run the appmanager delete command.
appmanager delete --name application_manager_name

Managing the Big Data Extensions Environment by Using the Serengeti Command-Line Interface 3

You must manage your Big Data Extensions environment. If you chose not to add the resource pool, datastore, and network when you deployed the Serengeti vApp, add the vSphere resources before you create a Hadoop or HBase cluster. You must also add additional application managers if you want to use either Ambari or Cloudera Manager to manage your Hadoop clusters. You can remove resources that you no longer need.
This chapter includes the following topics:
- About Application Managers
- Add a Resource Pool with the Serengeti Command-Line Interface
- Remove a Resource Pool with the Serengeti Command-Line Interface
- Add a Datastore with the Serengeti Command-Line Interface
- Remove a Datastore with the Serengeti Command-Line Interface
- Add a Network with the Serengeti Command-Line Interface
- Remove a Network with the Serengeti Command-Line Interface
- Reconfigure a Static IP Network with the Serengeti Command-Line Interface
- Reconfigure the DNS Type with the Serengeti Command-Line Interface
- Increase Cloning Performance and Resource Usage of Virtual Machines

About Application Managers

You can use Cloudera Manager, Apache Ambari, and the default application manager to provision and manage clusters with VMware vSphere Big Data Extensions.
After you add a new Cloudera Manager or Ambari application manager to Big Data Extensions, you can redirect your software management tasks, including monitoring and managing clusters, to that application manager.
You can use an application manager to perform the following tasks:
- List all available vendor instances, supported distributions, and configurations or roles for a specific application manager and distribution.
- Create clusters.
- Monitor and manage services from the application manager console.
Check the documentation for your application manager for tool-specific requirements.
Restrictions
The following restrictions apply to Cloudera Manager and Ambari application managers:
- To add an application manager with HTTPS, use the FQDN instead of the URL.
- You cannot rename a cluster that was created with a Cloudera Manager or Ambari application manager.
- You cannot change services for a big data cluster from Big Data Extensions if the cluster was created with Ambari or Cloudera Manager application manager.
- To change services, configurations, or both, you must make the changes from the application manager on the nodes.
  If you install new services, Big Data Extensions starts and stops the new services together with old services.
- If you use an application manager to change services and big data cluster configurations, those changes cannot be synced from Big Data Extensions. The nodes that you create with Big Data Extensions do not contain the new services or configurations.

Add an Application Manager by Using the Serengeti Command-Line Interface

To use either Cloudera Manager or Ambari application managers, you must add the application manager and add server information to Big Data Extensions.
NOTE If you want to add a Cloudera Manager or Ambari application manager with HTTPS, use the FQDN in place of the URL.
Procedure
1 Access the Serengeti CLI.
2 Run the appmanager add command.
appmanager add --name application_manager_name --type [ClouderaManager|Ambari]
--url http[s]://server:port
Application manager names can include only alphanumeric characters ([0-9, a-z, A-Z]) and the following special characters: underscores, hyphens, and blank spaces.
You can use the optional description variable to include a description of the application manager instance.
3 Enter your username and password at the prompt.
4 If you specified SSL, enter the file path of the SSL certificate at the prompt.
What to do next
To verify that the application manager was added successfully, run the appmanager list command.

Modify an Application Manager by Using the Serengeti Command-Line Interface

You can modify the information for an application manager with the Serengeti CLI. For example, you can change the manager server IP address if it is not a static IP, or you can upgrade the administrator account.
Prerequisites
Verify that you have at least one external application manager installed on your Big Data Extensions environment.
Procedure
1 Access the Serengeti CLI.
2 Run the appmanager modify command.
appmanager modify --name application_manager_name
--url <http[s]://server:port>
Additional parameters are available for this command. For more information about this command, see “appmanager modify Command.”

View Supported Distributions for All Application Managers by Using the Serengeti Command-Line Interface

Supported distributions are those distributions that are supported by Big Data Extensions. Available distributions are those distributions that have been added into your Big Data Extensions environment. You can view a list of the Hadoop distributions that are supported in the Big Data Extensions environment to determine if a particular distribution is available for a particular application manager.
Prerequisites
Verify that you are connected to an application manager.
Procedure
1 Access the Serengeti CLI.
2 Run the appmanager list command.
appmanager list --name application_manager_name [--distros]
If you do not include the --name parameter, the command returns a list of all the Hadoop distributions that are supported on each of the application managers in the Big Data Extensions environment.
The command returns a list of all distributions that are supported for the application manager of the name that you specify.

View Configurations or Roles for Application Manager and Distribution by Using the Serengeti Command-Line Interface

You can use the appmanager list command to list the Hadoop configurations or roles for a specific application manager and distribution.
The configuration list includes those configurations that you can use to configure the cluster in the cluster specifications.
The role list contains the roles that you can use to create a cluster. You should not use unsupported roles to create clusters in the application manager.
Prerequisites
Verify that you are connected to an application manager.
Procedure
1 Access the Serengeti CLI.
2 Run the appmanager list command.
appmanager list --name application_manager_name [--distro distro_name (--configurations | --roles) ]
The command returns a list of the Hadoop configurations or roles for a specific application manager and distribution.

View List of Application Managers by using the Serengeti Command-Line Interface

You can use the appmanager list command to list the application managers that are installed on the Big Data Extensions environment.
Prerequisites
Verify that you are connected to an application manager.
Procedure
1 Access the Serengeti CLI.
2 Run the appmanager list command.
appmanager list
The command returns a list of all application managers that are installed on the Big Data Extensions environment.

Delete an Application Manager by Using the Serengeti Command-Line Interface

You can use the Serengeti CLI to delete an application manager when you no longer need it.
Prerequisites
- Verify that you have at least one external application manager installed on your Big Data Extensions environment.
- Verify that the application manager that you want to delete does not contain any clusters, or the deletion process will fail.
Procedure
1 Access the Serengeti CLI.
2 Run the appmanager delete command.
appmanager delete --name application_manager_name

Add a Resource Pool with the Serengeti Command-Line Interface

You add resource pools to make them available for use by Hadoop clusters. Resource pools must be located at the top level of a cluster. Nested resource pools are not supported.
When you add a resource pool to Big Data Extensions, it symbolically represents the actual vSphere resource pool as recognized by vCenter Server. This symbolic representation lets you use the Big Data Extensions resource pool name, instead of the full path of the resource pool in vCenter Server, in cluster specification files.
NOTE After you add a resource pool to Big Data Extensions, do not rename the resource pool in vSphere. If you rename it, you cannot perform Serengeti operations on clusters that use that resource pool.
Procedure
1 Access the Serengeti Command-Line Interface client.
2 Run the resourcepool add command.
The --vcrp parameter is optional.
This example adds a Serengeti resource pool named myRP to the vSphere rp1 resource pool that is contained by the cluster1 vSphere cluster.
resourcepool add --name myRP --vccluster cluster1 --vcrp rp1

Remove a Resource Pool with the Serengeti Command-Line Interface

You can remove resource pools from Serengeti that are not in use by a Hadoop cluster. You remove resource pools when you do not need them or if you want the Hadoop clusters you create in the Serengeti Management Server to be deployed under a different resource pool. Removing a resource pool removes its reference in vSphere. The resource pool is not deleted.
Procedure
1 Access the Serengeti Command-Line Interface client.
2 Run the resourcepool delete command.
If the command fails because the resource pool is referenced by a Hadoop cluster, you can use the resourcepool list command to see which cluster is referencing the resource pool.
This example deletes the resource pool named myRP.
resourcepool delete --name myRP

Add a Datastore with the Serengeti Command-Line Interface

You can add shared and local datastores to the Serengeti server to make them available to Hadoop clusters.
Procedure
1 Access the Serengeti CLI.
2 Run the datastore add command.
This example adds a new, local storage datastore named myLocalDS. The value of the --spec parameter, local*, is a wildcard specifying a set of vSphere datastores. All vSphere datastores whose names begin with “local” are added and managed as a whole by Serengeti.
datastore add --name myLocalDS --spec local* --type LOCAL
What to do next
After you add a datastore to Big Data Extensions, do not rename the datastore in vSphere. If you rename it, you cannot perform Serengeti operations on clusters that use that datastore.
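To preview which vSphere datastores a --spec wildcard such as local* would select, you can test the pattern against a list of names. The following Python sketch assumes shell-style, case-sensitive matching, which is consistent with the local* example but is not confirmed by this guide. The function name match_datastores and the datastore names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def match_datastores(spec: str, datastore_names: Iterable[str]) -> List[str]:
    """Return the names that match the wildcard spec, in their original order."""
    # fnmatchcase compares case-sensitively on every platform.
    return [name for name in datastore_names if fnmatchcase(name, spec)]


if __name__ == "__main__":
    # Hypothetical vSphere datastore names.
    names = ["local1", "local-ssd", "shared1", "mylocal"]
    print(match_datastores("local*", names))
```

Only names that begin with “local” are selected, so a name such as mylocal is not matched by local*.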

Remove a Datastore with the Serengeti Command-Line Interface

You can remove any datastore from Serengeti that is not referenced by any Hadoop clusters. Removing a datastore removes only the reference to the vCenter Server datastore. The datastore itself is not deleted.
You remove datastores if you do not need them or if you want to deploy the Hadoop clusters that you create in the Serengeti Management Server under a different datastore.
Procedure
1 Access the Serengeti CLI.
2 Run the datastore delete command.
If the command fails because the datastore is referenced by a Hadoop cluster, you can use the datastore list command to see which cluster is referencing the datastore.
This example deletes the myDS datastore.
datastore delete --name myDS

Add a Network with the Serengeti Command-Line Interface

You add networks to Big Data Extensions to make their IP addresses available to Hadoop clusters. A network is a port group, as well as a means of accessing the port group through an IP address.
After you add a network to Big Data Extensions, do not rename it in vSphere. If you rename the network, you cannot perform Serengeti operations on clusters that use that network.
Prerequisites
If your network uses static IP addresses, be sure that the addresses are not occupied before you add the network.
Procedure
1 Access the Serengeti CLI.
2 Run the network add command.
This example adds a network named myNetwork to the 10PG vSphere port group. Virtual machines that use this network use DHCP to obtain the IP addresses.
network add --name myNetwork --portGroup 10PG --dhcp
This example adds a network named myNetwork to the 10PG vSphere port group. Hadoop nodes use addresses in the 192.168.1.2-100 IP address range, the DNS server IP address is 10.111.90.2, the gateway address is 192.168.1.1, and the subnet mask is 255.255.255.0.
network add --name myNetwork --portGroup 10PG --ip 192.168.1.2-100 --dns 10.111.90.2
--gateway 192.168.1.1 --mask 255.255.255.0
To specify multiple IP address segments, use multiple strings to express the IP address range in the format xx.xx.xx.xx-xx[,xx]*.
xx.xx.xx.xx-xx, xx.xx.xx.xx-xx, single_ip, single_ip
This example adds a dynamic network with DHCP-assigned IP addresses and meaningful host names.
network add --name ddnsNetwork --dhcp --portGroup pg1 --dnsType DYNAMIC
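The static IP range notation used with the --ip parameter, such as 192.168.1.2-100 or a comma-separated mix of ranges and single addresses, can be expanded into individual addresses to check how many nodes a network can serve. The following Python sketch is an illustration only. The function name expand_ip_spec is hypothetical, and the sketch assumes that the number after the hyphen is the last octet of the end address, as the 192.168.1.2-100 example indicates.

```python
import ipaddress
from typing import List


def expand_ip_spec(spec: str) -> List[str]:
    """Expand 'a.b.c.d-e, a.b.c.f, ...' into a list of IPv4 address strings."""
    addresses: List[str] = []
    for segment in spec.split(","):
        segment = segment.strip()
        if not segment:
            continue
        if "-" in segment:
            start_text, end_text = segment.split("-", 1)
            start = ipaddress.IPv4Address(start_text.strip())
            first_octet = int(start) & 0xFF  # last octet of the start address
            last_octet = int(end_text)
            if not first_octet <= last_octet <= 255:
                raise ValueError(f"invalid range end in segment: {segment}")
            base = int(start) - first_octet  # address with last octet set to 0
            for octet in range(first_octet, last_octet + 1):
                addresses.append(str(ipaddress.IPv4Address(base + octet)))
        else:
            # A single address; IPv4Address validates the format.
            addresses.append(str(ipaddress.IPv4Address(segment)))
    return addresses


if __name__ == "__main__":
    pool = expand_ip_spec("192.168.1.2-100")
    print(len(pool), pool[0], pool[-1])
```

For the range in the static network example, the sketch yields 99 addresses, from 192.168.1.2 through 192.168.1.100.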

Remove a Network with the Serengeti Command-Line Interface

You can remove networks from Serengeti that are not referenced by any Hadoop clusters. Removing an unused network frees the IP addresses for reuse.
Procedure
1 Access the Serengeti CLI.
2 Run the network delete command.
network delete --name network_name
If the command fails because the network is referenced by a Hadoop cluster, you can use the network list --detail command to see which cluster is referencing the network.

Reconfigure a Static IP Network with the Serengeti Command-Line Interface

You can reconfigure a Serengeti static IP network by adding IP address segments to it. You might need to add IP address segments so that there is enough capacity for a cluster that you want to create.
If the IP range that you specify includes IP addresses that are already in the network, Serengeti ignores the duplicated addresses. The remaining addresses in the specified range are added to the network. If the network is already used by a cluster, the cluster can use the new IP addresses after you add them to the network. If only part of the IP range is used by a cluster, the unused IP address can be used when you create a new cluster.
Prerequisites
If your network uses static IP addresses, be sure that the addresses you plan to add are not occupied before you add them to the network.
Procedure
1 Access the Serengeti CLI.
2 Run the network modify command.
This example adds IP addresses from 192.168.1.2 to 192.168.1.100 to a network named myNetwork.
network modify --name myNetwork --addIP 192.168.1.2-100
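The handling of duplicated addresses described in this topic can be pictured with a short Python sketch. It only models the documented outcome (addresses already in the network are ignored, the rest are added). It does not show how Serengeti implements the operation, and the function name and sample ranges are hypothetical.

```python
import ipaddress


def addresses_to_add(existing, requested):
    """Return the requested addresses that are not already in the network."""
    known = {ipaddress.IPv4Address(a) for a in existing}
    added = []
    for text in requested:
        address = ipaddress.IPv4Address(text)
        if address not in known:  # duplicates are skipped, not treated as errors
            known.add(address)
            added.append(str(address))
    return added


# Hypothetical network that already holds 192.168.1.2-50.
existing_pool = [f"192.168.1.{n}" for n in range(2, 51)]
# Hypothetical --addIP range 192.168.1.40-100, which overlaps the pool.
requested_range = [f"192.168.1.{n}" for n in range(40, 101)]

new_addresses = addresses_to_add(existing_pool, requested_range)
print(len(new_addresses), new_addresses[0], new_addresses[-1])
# prints: 50 192.168.1.51 192.168.1.100
```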

Reconfigure the DNS Type with the Serengeti Command-Line Interface

You can reconfigure a network's Domain Name System (DNS) type, and specify that Big Data Extensions generate meaningful host names for the nodes in a Hadoop cluster.
After you add a network to Big Data Extensions, do not rename it in vSphere. If you rename the network, you cannot perform Serengeti operations on clusters that use that network.
There are three DNS options you can specify:
Normal    The DNS server provides both forward and reverse FQDN-to-IP resolution. Reverse DNS maps IP addresses to domain names, the opposite of forward (normal) DNS, which maps domain names to IP addresses. Normal is the default DNS type.
Dynamic   Dynamic DNS (DDNS or DynDNS) is a method of automatically updating a name server in the Domain Name System (DNS) with the active DNS configuration of its configured hostnames, addresses, or other information. Big Data Extensions integrates with a Dynamic DNS server in its network, through which it provides meaningful host names to the nodes in a Hadoop cluster. The cluster then automatically registers with the DNS server.
Others    There is no DNS server, or the DNS server does not provide normal DNS resolution or Dynamic DNS services. In this case, you must add the FQDN/IP mapping for all nodes to the /etc/hosts file on each node in the cluster. Through this mapping of hostnames to IP addresses, each node can contact the other nodes in the cluster.
Host names provide easier visual identification and let you use services such as Single Sign-On, which requires a properly configured DNS.
Procedure
1 Access the Serengeti CLI.
2 Run the network modify command.
There are three DNS types you can specify: NORMAL, DYNAMIC, and OTHERS. NORMAL is the default value.
This example modifies a network named myNetwork to use a Dynamic DNS type. Virtual machines that use this network will use DHCP to obtain the IP addresses.
network modify --name myNetwork --dnsType DYNAMIC
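For the OTHERS DNS type, every node needs the FQDN/IP mapping of all cluster nodes in its /etc/hosts file. The following Python sketch generates those lines from a node-to-address mapping. The node names, the domain, and the function name hosts_entries are hypothetical, and the sketch only builds the text. Copying it to each node is left to your own tooling.

```python
import ipaddress


def hosts_entries(nodes, domain):
    """Build /etc/hosts lines ("IP<TAB>FQDN<TAB>short name") for each node."""
    lines = []
    for name, ip_text in sorted(nodes.items()):
        ip = ipaddress.ip_address(ip_text)  # raises ValueError on a bad address
        lines.append(f"{ip}\t{name}.{domain}\t{name}")
    return lines


if __name__ == "__main__":
    # Hypothetical nodes; replace with the names and addresses of your cluster.
    cluster_nodes = {
        "worker-1": "192.168.1.11",
        "master-0": "192.168.1.10",
    }
    print("\n".join(hosts_entries(cluster_nodes, "example.com")))
```

Generating the same block of lines once and appending it on every node keeps the mapping identical across the cluster.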

Increase Cloning Performance and Resource Usage of Virtual Machines

You can rapidly clone and deploy virtual machines using Instant Clone, a feature of vSphere 6.0.
Using Instant Clone, a parent virtual machine is forked, and then a child virtual machine (or instant clone) is created. The child virtual machine leverages the storage and memory of the parent, reducing resource usage.
When provisioning a cluster, Big Data Extensions creates a parent virtual machine for each host on which a cluster node has been placed. After provisioning, a new resource pool labeled BDE-ParentVMs-$serengeti.uuid-$template.name is visible in vCenter Server. This resource pool contains several parent virtual machines. Normal cluster nodes are instantly cloned from these parent virtual machines. Once the parent virtual machines are created on the cluster hosts, the time required to provision and scale a cluster is significantly reduced.
When you scale a cluster, the clone type that you specified during cluster creation continues to be used, regardless of the current clone type. For example, if you create a cluster using instant clone and then change your Big Data Extensions clone type to fast clone, the cluster you provisioned using instant clone continues to use instant clone when it scales out.
If you create clusters and later want to make changes to the template virtual machine used to provision those clusters, you must first delete all the existing parent virtual machines before using the new template virtual machine. When you create clusters using the new template, Big Data Extensions creates new parent virtual machines based on the new template.
Prerequisites
Your Big Data Extensions deployment must use vSphere 6.0 to take advantage of Instant Clone.
Procedure
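1 Log in to the Serengeti Management Server.
2 Edit the /opt/serengeti/conf/serengeti.properties file and change the value of the cluster.clone.service property. If the property is set to fast (cluster.clone.service=fast), set it to instant as shown below.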
The default clone type when running vSphere 6.0 is Instant Clone.
cluster.clone.service = instant
3 To enable Instant Clone, restart the Serengeti Management Server.
sudo /sbin/service tomcat restart
The Serengeti Management Server reads the revised serengeti.properties file and applies the Instant Clone feature to all new clusters you create.
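If you prefer to script the property change in step 2 rather than use a text editor, the following Python sketch shows one way to rewrite a key in properties-style text. It is not a Big Data Extensions tool. The function name set_property is hypothetical, and the sketch works on text in memory so that you can review the result before writing it back to /opt/serengeti/conf/serengeti.properties.

```python
import re


def set_property(text, key, value):
    """Return properties text with key set to value, appending it if absent."""
    # Match "key=value" or "key = value" lines; commented lines do not match.
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*[=:].*$", re.MULTILINE)
    replacement = f"{key} = {value}"
    if pattern.search(text):
        return pattern.sub(lambda _match: replacement, text)
    separator = "" if text.endswith("\n") or not text else "\n"
    return f"{text}{separator}{replacement}\n"


if __name__ == "__main__":
    # Hypothetical file content; only cluster.clone.service comes from this guide.
    sample = "some.other.key = 1\ncluster.clone.service=fast\n"
    print(set_property(sample, "cluster.clone.service", "instant"))
```

Back up the properties file before replacing it, and restart the Serengeti Management Server as described in step 3 so that the change takes effect.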
What to do next
All clusters you create will now use Instant Clone to deploy virtual machines. See Chapter 5, “Creating Hadoop and HBase Clusters.”

Chapter 4 Managing Users and User Accounts

By default Big Data Extensions provides authentication only for local user accounts. If you want to use LDAP (either Active Directory or an OpenLDAP compatible directory) to authenticate users, you must configure Big Data Extensions for use with your LDAP or Active Directory service.
This chapter includes the following topics:
- “Create an LDAP Service Configuration File Using the Serengeti Command-Line Interface”
- “Activate Centralized User Management Using the Serengeti Command-Line Interface”
- “Create a Cluster With LDAP User Authentication Using the Serengeti Command-Line Interface”
- “Change User Management Modes Using the Serengeti Command-Line Interface”
- “Modify LDAP Configuration Using the Serengeti Command-Line Interface”

Create an LDAP Service Configuration File Using the Serengeti Command-Line Interface

Create a configuration file that identifies your LDAP or Active Directory server environment.
Prerequisites
- Deploy the Serengeti vApp.
- Ensure that you have adequate resources allocated to run the Hadoop cluster.
- To use any Hadoop distribution other than the default distribution, add one or more Hadoop distributions. See the VMware vSphere Big Data Extensions Administrator's and User's Guide.
Procedure
1 Access the Serengeti CLI.
2 Navigate to a directory on the Serengeti Management Server where you want to create and store the
configuration file.
You can use the directory /opt/serengeti/etc to store your configuration file.
3 Using a text editor, create a JavaScript Object Notation (JSON) file containing the configuration settings
for your LDAP or Active Directory service.
The format of the configuration file is shown below.
{
  "type": "user_mode_type",
  "primaryUrl": "ldap://AD_LDAP_server_IP_address:network_port",
  "baseUserDn": "DN_information",
  "baseGroupDn": "DN_information",
  "userName": "username",
  "password": "password",
  "mgmtVMUserGroupDn": "DN_information"
}
Table 4-1. LDAP Connection Information
type                 The external user authentication service you will use, which is either AD_AS_LDAP or LDAP.
baseUserDn           Specify the base user DN.
baseGroupDn          Specify the base group DN.
primaryUrl           Specify the primary server URL of your Active Directory or LDAP server.
mgmtVMUserGroupDn    (Optional) Specify the base DN for searching groups to access the Serengeti Management Server.
userName             Type the username of the Active Directory or LDAP server administrator account.
password             Type the password of the Active Directory or LDAP server administrator account.
4 When you complete the file, save your work.
Example: LDAP Configuration File
The following example illustrates the configuration file for an LDAP server within the acme.com domain.
{
  "type": "LDAP",
  "primaryUrl": "ldap://acme.com:8888",
  "baseUserDn": "ou=users,dc=dev,dc=acme,dc=com",
  "baseGroupDn": "ou=users,dc=dev,dc=acme,dc=com",
  "userName": "jsmith",
  "password": "MyPassword",
  "mgmtVMUserGroupDn": "cn=Administrators,cn=Builtin,dc=dev,dc=acme,dc=com"
}
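Before you activate the configuration file, you can check that it is well-formed JSON and contains the settings described in this topic. The following Python sketch is not a Big Data Extensions command. The function name check_ldap_config is hypothetical. Treating every setting except mgmtVMUserGroupDn as required, and accepting ldaps:// in addition to ldap://, are assumptions of the sketch.

```python
import json

# Assumption: everything except mgmtVMUserGroupDn (documented as optional) is required.
REQUIRED_KEYS = ("type", "primaryUrl", "baseUserDn", "baseGroupDn", "userName", "password")
ALLOWED_TYPES = ("AD_AS_LDAP", "LDAP")


def check_ldap_config(text):
    """Return a list of problems found in the JSON text; an empty list means none."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]
    problems = [f"missing or empty key: {key}" for key in REQUIRED_KEYS if not config.get(key)]
    if config.get("type") and config["type"] not in ALLOWED_TYPES:
        problems.append("type must be AD_AS_LDAP or LDAP")
    url = config.get("primaryUrl")
    # Assumption: ldaps:// is accepted here; this guide only shows ldap://.
    if url and not str(url).startswith(("ldap://", "ldaps://")):
        problems.append("primaryUrl should start with ldap:// or ldaps://")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_ldap_config.py path_to_config_file
    with open(sys.argv[1], encoding="utf-8") as handle:
        for problem in check_ldap_config(handle.read()):
            print(problem)
```

The check is local only. It does not contact the directory server, so a file that passes can still fail if the URL, DN values, or credentials are wrong.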
What to do next
With an LDAP configuration file created, you can now activate centralized user management for your Big Data Extensions environment. See “Activate Centralized User Management Using the Serengeti Command-Line Interface.”

Activate Centralized User Management Using the Serengeti Command-Line Interface

You must specify that Big Data Extensions use an external user identity source before you can manage users through your LDAP or Active Directory.
Prerequisites
- Deploy the Serengeti vApp.
- Ensure that you have adequate resources allocated to run the Hadoop cluster.
- To use any Hadoop distribution other than the default distribution, add one or more Hadoop distributions. See the VMware vSphere Big Data Extensions Administrator's and User's Guide.
- Create a configuration file identifying your LDAP or Active Directory environment for use with Big Data Extensions. See “Create an LDAP Service Configuration File Using the Serengeti Command-Line Interface.”
Procedure
1 Access the Serengeti CLI.
2 Run the command usermgmtserver add --cfgfile config_file_path
This example activates centralized user management, specifying the file /opt/serengeti/LDAPConfigFile.cfg as the file containing your LDAP configuration settings.
usermgmtserver add --cfgfile /opt/serengeti/LDAPConfigFile.cfg
3 Run the mgmtvmcfg get command to verify that your environment is configured successfully. The command prints the LDAP or Active Directory configuration information.
The contents of the active configuration file in use by your Big Data Extensions environment prints to the terminal.
What to do next
When you activate centralized user management, you can create clusters and assign user management roles using the users and user groups defined by your LDAP or Active Directory service. See “Create a Cluster With LDAP User Authentication Using the Serengeti Command-Line Interface.”

Create a Cluster With LDAP User Authentication Using the Serengeti Command-Line Interface

With centralized user management configured and activated, you can grant users and user groups in your LDAP or Active Directory service privileges to the individual Hadoop clusters that you create.
As an example of how you can use centralized user management in your Big Data Extensions environment, you can assign groups with administrative privileges in your LDAP or Active Directory service access to the Serengeti Management Server. This allows those users to administer Big Data Extensions and the Serengeti Management Server. You can then give another user group access to Hadoop cluster nodes, allowing them to run Hadoop jobs.
To access the Serengeti CLI and Serengeti commands, users must change to the user serengeti after they log in. For example, you can use the su command to change to the serengeti user, after which you can access the Serengeti CLI.
su serengeti
Prerequisites
- Deploy the Serengeti vApp.
- Ensure that you have adequate resources allocated to run the Hadoop cluster.
- To use any Hadoop distribution other than the default distribution, add one or more Hadoop distributions. See the VMware vSphere Big Data Extensions Administrator's and User's Guide.
- Activate centralized user management for your Big Data Extensions deployment. See “Activate Centralized User Management Using the Serengeti Command-Line Interface.”
Procedure
1 Access the Serengeti CLI.
2 Run the cluster create command, and specify the values of the --adminGroupName and --userGroupName parameters using the names of the administrative groups and user groups to which you want to grant privileges for the cluster you are creating.
cluster create --name cluster_name --type hbase --adminGroupName AdminGroupName --userGroupName UserGroupName
What to do next
After you deploy the cluster, you can access the Hadoop cluster by using several methods. See the VMware vSphere Big Data Extensions Administrator's and User's Guide.

Change User Management Modes Using the Serengeti Command-Line Interface

You can change the user management mode of your Big Data Extensions environment. You can choose to use local user management, LDAP, or a combination of the two.
Big Data Extensions lets you authenticate local users, those managed by LDAP or Active Directory, or a combination of these authentication methods.
Table 4-2. User Authentication Modes
User Mode     Description
Local         Specify LOCAL to create and manage users and groups that are stored locally in your Big Data Extensions environment. Local is the default user management solution.
LDAP user     Specify LDAP to create and manage users and groups that are stored in your organization's identity source, such as Active Directory or LDAP. If you choose LDAP user you must configure Big Data Extensions to use an LDAP or Active Directory service (Active Directory as LDAP).
Mixed mode    Specify MIXED to use a combination of both local users and users stored in an external identity source. If you choose mixed mode you must configure Big Data Extensions to use an LDAP or Active Directory service (Active Directory as LDAP).
Prerequisites
- Deploy the Serengeti vApp.
- Ensure that you have adequate resources allocated to run the Hadoop cluster.
- To use any Hadoop distribution other than the default distribution, add one or more Hadoop distributions. See the VMware vSphere Big Data Extensions Administrator's and User's Guide.
Procedure
1 Access the Serengeti CLI.