
HP IBRIX X9720/StoreAll 9730 Storage Administrator Guide
Abstract
This guide describes tasks related to cluster configuration and monitoring, system upgrade and recovery, hardware component replacement, and troubleshooting. It does not document StoreAll file system features or standard Linux administrative tools and commands. For information about configuring and using StoreAll file system features, see the HP StoreAll Storage File System User Guide.
This guide is intended for system administrators and technicians who are experienced with installing and administering networks, and with performing Linux operating and administrative tasks. For the latest StoreAll guides, browse to http://www.hp.com/support/StoreAllManuals.
HP Part Number: AW549-96073 Published: July 2013 Edition: 14
© Copyright 2009, 2013 Hewlett-Packard Development Company, L.P.
Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.
The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Acknowledgments
Microsoft® and Windows® are U.S. registered trademarks of Microsoft Corporation.
UNIX® is a registered trademark of The Open Group.
Warranty
WARRANTY STATEMENT: To obtain a copy of the warranty for this product, see the warranty information website:
http://www.hp.com/go/storagewarranty
Revision History
Description | Software Version | Date | Edition
Initial release of the IBRIX X9720 Storage. | 5.3.1 | December 2009 | 1
Added network management and Support ticket. | 5.4 | April 2010 | 2
Added Fusion Manager backup, migration to an agile Fusion Manager configuration, software upgrade procedures, and system recovery procedures. | 5.4.1 | August 2010 | 3
Revised upgrade procedure. | 5.4.1 | August 2010 | 4
Added information about NDMP backups and configuring virtual interfaces, and updated cluster procedures. | 5.5 | December 2010 | 5
Updated segment evacuation information. | 5.5 | March 2011 | 6
Revised upgrade procedure. | 5.6 | April 2011 | 7
Added or updated information about the agile Fusion Manager, Statistics tool, Ibrix Collect, event notification, capacity block installation, NTP servers, upgrades. | 6.0 | September 2011 | 8
Added or updated information about 9730 systems, hardware monitoring, segment evacuation, HP Insight Remote Support, software upgrades, events, Statistics tool. | 6.1 | June 2012 | 9
Added or updated information about High Availability, failover, server tuning, VLAN tagging, segment migration and evacuation, upgrades, SNMP. | 6.2 | December 2012 | 10
Updated information on upgrades, remote support, collection logs, phone home and troubleshooting. Now point users to website for the latest spare parts list instead of shipping the list. Added before and after upgrade steps for Express Query when going from 6.2 to 6.3. | 6.3 | March 2013 | 11
Removed post upgrade step that tells users to modify the /etc/hosts file on every StoreAll node. Changed firmware version to 4.0.0-13 in “Upgrading IBRIX X9720 chassis firmware.” In the “Cascading Upgrades” appendix, added a section that tells users to ensure that the NFS exports option subtree_check is the default export option for every NFS export when upgrading from a StoreAll 5.x release. Also changed ibrix_fm -m nofmfailover -A to ibrix_fm -m maintenance -A in the “Cascading Upgrades” appendix. Updated information about SMB share creation. | 6.3 | April 2013 | 12
Updated the example in the section “Enabling collection and synchronization.” | 6.3 | June 2013 | 13
Contents
1 Upgrading the StoreAll software to the 6.3 release.......................................10
Upgrading 9720 chassis firmware............................................................................................12
Online upgrades for StoreAll software.......................................................................................12
Preparing for the upgrade...................................................................................................13
Performing the upgrade......................................................................................................13
After the upgrade..............................................................................................................14
Automated offline upgrades for StoreAll software 6.x to 6.3.........................................................14
Preparing for the upgrade...................................................................................................14
Performing the upgrade......................................................................................................14
After the upgrade..............................................................................................................15
Manual offline upgrades for StoreAll software 6.x to 6.3.............................................................15
Preparing for the upgrade...................................................................................................15
Performing the upgrade manually.........................................................................................17
After the upgrade..............................................................................................................17
Upgrading Linux StoreAll clients................................................................................................18
Installing a minor kernel update on Linux clients.....................................................................18
Upgrading Windows StoreAll clients.........................................................................................19
Upgrading pre-6.3 Express Query enabled file systems...............................................................19
Required steps before the StoreAll Upgrade for pre-6.3 Express Query enabled file systems.........19
Required steps after the StoreAll Upgrade for pre-6.3 Express Query enabled file systems...........20
Troubleshooting upgrade issues................................................................................................21
Automatic upgrade............................................................................................................21
Manual upgrade...............................................................................................................22
Offline upgrade fails because iLO firmware is out of date........................................................22
Node is not registered with the cluster network .....................................................................22
File system unmount issues...................................................................................................23
File system in MIF state after StoreAll software 6.3 upgrade.....................................................23
2 Product description...................................................................................25
System features.......................................................................................................................25
System components.................................................................................................................25
HP StoreAll software features....................................................................................................25
High availability and redundancy.............................................................................................26
3 Getting started.........................................................................................27
Setting up the X9720/9730 Storage.........................................................................................27
Installation steps................................................................................................................27
Additional configuration steps.............................................................................................27
Logging in to the system..........................................................................................................28
Using the network..............................................................................................................28
Using the TFT keyboard/monitor..........................................................................................28
Using the serial link on the Onboard Administrator.................................................................29
Booting the system and individual server blades.........................................................................29
Management interfaces...........................................................................................................29
Using the StoreAll Management Console..............................................................................29
Customizing the GUI..........................................................................................................32
Adding user accounts for Management Console access..........................................................33
Using the CLI.....................................................................................................................33
Starting the array management software...............................................................................33
StoreAll client interfaces......................................................................................................34
StoreAll software manpages.....................................................................................................34
Changing passwords..............................................................................................................34
Configuring ports for a firewall.................................................................................................35
Configuring NTP servers..........................................................................................................36
Configuring HP Insight Remote Support on StoreAll systems..........................................................36
Configuring the StoreAll cluster for Insight Remote Support......................................................38
Configuring Insight Remote Support for HP SIM 7.1 and IRS 5.7...............................................41
Configuring Insight Remote Support for HP SIM 6.3 and IRS 5.6..............................................44
Testing the Insight Remote Support configuration....................................................................47
Updating the Phone Home configuration...............................................................................47
Disabling Phone Home.......................................................................................................47
Troubleshooting Insight Remote Support................................................................................48
4 Configuring virtual interfaces for client access..............................................49
Network and VIF guidelines.....................................................................................................49
Creating a bonded VIF............................................................................................................50
Configuring backup servers......................................................................................................50
Configuring NIC failover.........................................................................................................50
Configuring automated failover................................................................................................51
Example configuration.............................................................................................................51
Specifying VIFs in the client configuration...................................................................................51
Configuring VLAN tagging......................................................................................................52
Support for link state monitoring...............................................................................................52
5 Configuring failover..................................................................................53
Agile management consoles....................................................................................................53
Agile Fusion Manager modes..............................................................................................53
Agile Fusion Manager and failover......................................................................................53
Viewing information about Fusion Managers.........................................................................54
Configuring High Availability on the cluster................................................................................54
What happens during a failover..........................................................................................55
Configuring automated failover with the HA Wizard...............................................................55
Configuring automated failover manually..............................................................................62
Changing the HA configuration manually.........................................................................63
Failing a server over manually.............................................................................................64
Failing back a server .........................................................................................................64
Setting up HBA monitoring..................................................................................................65
Checking the High Availability configuration.........................................................................66
Capturing a core dump from a failed node................................................................................68
Prerequisites for setting up the crash capture..........................................................................68
Setting up nodes for crash capture.......................................................................................69
6 Configuring cluster event notification...........................................................70
Cluster events.........................................................................................................................70
Setting up email notification of cluster events..............................................................................70
Associating events and email addresses................................................................................70
Configuring email notification settings..................................................................................71
Dissociating events and email addresses...............................................................................71
Testing email addresses......................................................................................................71
Viewing email notification settings........................................................................................72
Setting up SNMP notifications..................................................................................................72
Configuring the SNMP agent...............................................................................................73
Configuring trapsink settings................................................................................................73
Associating events and trapsinks..........................................................................................74
Defining views...................................................................................................................74
Configuring groups and users..............................................................................................75
Deleting elements of the SNMP configuration........................................................................75
Listing SNMP configuration information.................................................................................75
7 Configuring system backups.......................................................................76
Backing up the Fusion Manager configuration............................................................................76
Using NDMP backup applications............................................................................................76
Configuring NDMP parameters on the cluster........................................................................77
NDMP process management...............................................................................................78
Viewing or canceling NDMP sessions..............................................................................78
Starting, stopping, or restarting an NDMP Server..............................................................78
Viewing or rescanning tape and media changer devices.........................................................79
NDMP events....................................................................................................................79
8 Creating host groups for StoreAll clients.......................................................80
How host groups work.............................................................................................................80
Creating a host group tree.......................................................................................................80
Adding a StoreAll client to a host group....................................................................................81
Adding a domain rule to a host group.......................................................................................81
Viewing host groups................................................................................................................82
Deleting host groups...............................................................................................................82
Other host group operations....................................................................................................82
9 Monitoring cluster operations.....................................................................83
Monitoring X9720/9730 hardware..........................................................................................83
Monitoring servers.............................................................................................................83
Monitoring hardware components........................................................................................87
Monitoring blade enclosures...........................................................................................88
Obtaining server details.................................................................................................91
Monitoring storage and storage components.........................................................................94
Monitoring storage clusters.............................................................................................96
Monitoring drive enclosures for a storage cluster...........................................................96
Monitoring pools for a storage cluster.........................................................................99
Monitoring storage controllers for a storage cluster.....................................................100
Monitoring storage switches in a storage cluster..............................................................101
Managing LUNs in a storage cluster..............................................................................101
Monitoring the status of file serving nodes................................................................................102
Monitoring cluster events.......................................................................................................103
Viewing events................................................................................................................103
Removing events from the events database table..................................................................104
Monitoring cluster health.......................................................................................................104
Health checks..................................................................................................................104
Health check reports........................................................................................................104
Viewing logs........................................................................................................................106
Viewing operating statistics for file serving nodes......................................................................106
10 Using the Statistics tool..........................................................................108
Installing and configuring the Statistics tool..............................................................................108
Installing the Statistics tool.................................................................................................108
Enabling collection and synchronization..............................................................................108
Upgrading the Statistics tool from StoreAll software 6.0.............................................................109
Using the Historical Reports GUI.............................................................................................109
Generating reports...........................................................................................................111
Deleting reports...............................................................................................................111
Maintaining the Statistics tool.................................................................................................112
Space requirements..........................................................................................................112
Updating the Statistics tool configuration.............................................................................112
Changing the Statistics tool configuration............................................................................112
Fusion Manager failover and the Statistics tool configuration.................................................112
Checking the status of Statistics tool processes.....................................................................113
Controlling Statistics tool processes.....................................................................................113
Troubleshooting the Statistics tool............................................................................................114
Log files...............................................................................................................................114
Uninstalling the Statistics tool.................................................................................................114
11 Maintaining the system..........................................................................115
Shutting down the system.......................................................................................................115
Shutting down the StoreAll software....................................................................................115
Powering off the system hardware......................................................................................116
Starting up the system...........................................................................................................117
Powering on the system hardware......................................................................................117
Powering on after a power failure......................................................................................117
Starting the StoreAll software.............................................................................................117
Powering file serving nodes on or off.......................................................................................117
Performing a rolling reboot....................................................................................................118
Starting and stopping processes.............................................................................................118
Tuning file serving nodes and StoreAll clients............................................................................118
Managing segments.............................................................................................................122
Migrating segments..........................................................................................................123
Evacuating segments and removing storage from the cluster ..................................................125
Removing a node from a cluster..............................................................................................128
Maintaining networks............................................................................................................129
Cluster and user network interfaces....................................................................................129
Adding user network interfaces..........................................................................................129
Setting network interface options in the configuration database..............................................131
Preferring network interfaces..............................................................................................131
Unpreferring network interfaces.........................................................................................132
Making network changes..................................................................................................132
Changing the IP address for a Linux StoreAll client...........................................................132
Changing the cluster interface.......................................................................................133
Managing routing table entries.....................................................................................133
Adding a routing table entry....................................................................................133
Deleting a routing table entry...................................................................................133
Deleting a network interface.........................................................................................133
Viewing network interface information................................................................................134
12 Licensing.............................................................................................135
Viewing license terms............................................................................................................135
Retrieving a license key.........................................................................................................135
Using AutoPass to retrieve and install permanent license keys......................................................135
13 Upgrading firmware..............................................................................136
Components for firmware upgrades.........................................................................................136
Steps for upgrading the firmware............................................................................................137
Finding additional information on FMT...............................................................................140
Adding performance modules on 9730 systems...................................................................140
Adding new server blades on 9720 systems........................................................................141
14 Troubleshooting....................................................................................143
Collecting information for HP Support with the IbrixCollect.........................................................143
Collecting logs................................................................................................................143
Downloading the archive file.............................................................................................144
Deleting the archive file....................................................................................................144
Configuring Ibrix Collect...................................................................................................145
Obtaining custom logging from ibrix_collect add-on scripts....................................................146
Creating an add-on script.............................................................................................146
Running an add-on script.............................................................................................147
Viewing the output from an add-on script........................................................................147
Viewing data collection information....................................................................................149
Adding/deleting commands or logs in the XML file..............................................................149
Viewing software version numbers..........................................................................................149
Troubleshooting specific issues................................................................................................150
Software services.............................................................................................................150
Failover..........................................................................................................................150
Windows StoreAll clients...................................................................................................151
Synchronizing information on file serving nodes and the configuration database...........................151
Troubleshooting an Express Query Manual Intervention Failure (MIF)...........................................152
15 Recovering the X9720/9730 Storage......................................................154
Obtaining the latest StoreAll software release...........................................................................154
Preparing for the recovery......................................................................................................154
Restoring an X9720 node with StoreAll 6.1 or later...............................................................155
Recovering an X9720 or 9730 file serving node.......................................................................155
Completing the restore .........................................................................................................162
Troubleshooting....................................................................................................................165
Manually recovering bond1 as the cluster...........................................................................165
iLO remote console does not respond to keystrokes...............................................................169
The ibrix_auth command fails after a restore........................................................................169
16 Support and other resources...................................................................170
Contacting HP......................................................................................................................170
Related information...............................................................................................................170
Obtaining spare parts...........................................................................................................171
HP websites.........................................................................................................................171
Rack stability........................................................................................................................171
Product warranties................................................................................................................171
Subscription service..............................................................................................................171
17 Documentation feedback.......................................................................173
A Cascading Upgrades.............................................................................174
Upgrading the StoreAll software to the 6.1 release....................................................................174
Upgrading 9720 chassis firmware.....................................................................................175
Online upgrades for StoreAll software 6.x to 6.1..................................................................175
Preparing for the upgrade............................................................................................175
Performing the upgrade................................................................................................176
After the upgrade........................................................................................................176
Offline upgrades for StoreAll software 5.6.x or 6.0.x to 6.1...................................................176
Preparing for the upgrade............................................................................................176
Performing the upgrade................................................................................................178
After the upgrade........................................................................................................178
Upgrading Linux StoreAll clients.........................................................................................179
Installing a minor kernel update on Linux clients..............................................................179
Upgrading Windows StoreAll clients..................................................................................180
Upgrading pre-6.0 file systems for software snapshots..........................................................180
Upgrading pre-6.1.1 file systems for data retention features....................................................181
Troubleshooting upgrade issues.........................................................................................182
Automatic upgrade......................................................................................................182
Manual upgrade.........................................................................................................182
Offline upgrade fails because iLO firmware is out of date.................................................182
Node is not registered with the cluster network ...............................................................183
File system unmount issues............................................................................................183
Moving the Fusion Manager VIF to bond1......................................................................184
Upgrading the StoreAll software to the 5.6 release...................................................................185
Automatic upgrades.........................................................................................................185
Manual upgrades............................................................................................................186
Preparing for the upgrade............................................................................................186
Saving the node configuration......................................................................................186
Performing the upgrade................................................................................................186
Restoring the node configuration...................................................................................187
Completing the upgrade..............................................................................................187
Troubleshooting upgrade issues.........................................................................................188
Automatic upgrade......................................................................................................188
Manual upgrade.........................................................................................................188
Upgrading the StoreAll software to the 5.5 release....................................................................188
Automatic upgrades.........................................................................................................189
Manual upgrades............................................................................................................190
Standard upgrade for clusters with a dedicated Management Server machine or blade........190
Standard online upgrade.........................................................................................190
Standard offline upgrade.........................................................................................192
Agile upgrade for clusters with an agile management console configuration.......................194
Agile online upgrade..............................................................................................194
Agile offline upgrade..............................................................................................198
Troubleshooting upgrade issues.........................................................................................200
B StoreAll 9730 component and cabling diagrams........................................201
Back view of the main rack....................................................................................................201
Back view of the expansion rack.............................................................................................202
StoreAll 9730 CX I/O modules and SAS port connectors...........................................................202
StoreAll 9730 CX 1 connections to the SAS switches.................................................................203
StoreAll 9730 CX 2 connections to the SAS switches.................................................................204
StoreAll 9730 CX 3 connections to the SAS switches.................................................................205
StoreAll 9730 CX 7 connections to the SAS switches in the expansion rack..................................206
C The IBRIX X9720 component and cabling diagrams....................................207
Base and expansion cabinets.................................................................................................207
Front view of a base cabinet..............................................................................................207
Back view of a base cabinet with one capacity block...........................................................208
Front view of a full base cabinet.........................................................................................209
Back view of a full base cabinet.........................................................................................210
Front view of an expansion cabinet ...................................................................................211
Back view of an expansion cabinet with four capacity blocks.................................................212
Performance blocks (c-Class Blade enclosure)............................................................................212
Front view of a c-Class Blade enclosure...............................................................................212
Rear view of a c-Class Blade enclosure...............................................................................213
Flex-10 networks...............................................................................................................213
Capacity blocks...................................................................................................................214
X9700c (array controller with 12 disk drives).......................................................................215
Front view of an X9700c..............................................................................................215
Rear view of an X9700c..............................................................................................215
X9700cx (dense JBOD with 70 disk drives)..........................................................................215
Front view of an X9700cx............................................................................................216
Rear view of an X9700cx.............................................................................................216
Cabling diagrams................................................................................................................216
Capacity block cabling—Base and expansion cabinets........................................................216
Virtual Connect Flex-10 Ethernet module cabling—Base cabinet.............................................217
SAS switch cabling—Base cabinet.....................................................................................218
SAS switch cabling—Expansion cabinet..............................................................................218
D Warnings and precautions......................................................................220
Electrostatic discharge information..........................................................................................220
Preventing electrostatic discharge.......................................................................................220
Grounding methods.....................................................................................................220
Grounding methods.........................................................................................................220
Equipment symbols...............................................................................................................221
Weight warning...................................................................................................................221
Rack warnings and precautions..............................................................................................221
Device warnings and precautions...........................................................................................222
E Regulatory information............................................................................224
Belarus Kazakhstan Russia marking.........................................................................................224
Turkey RoHS material content declaration.................................................................................224
Ukraine RoHS material content declaration..............................................................................224
Warranty information............................................................................................................224
Glossary..................................................................................................225
Index.......................................................................................................227
1 Upgrading the StoreAll software to the 6.3 release
This chapter describes how to upgrade to the 6.3 StoreAll software release. You can also use this procedure for any subsequent 6.3.x patches.
IMPORTANT: Print the following table and check off each step as you complete it.
NOTE: (Upgrades from version 6.0.x) CIFS share permissions are granted on a global basis in
v6.0.X. When upgrading from v6.0.X, confirm that the correct share permissions are in place.
Table 1 Prerequisites checklist for all upgrades
Step 1. Verify that the entire cluster is currently running StoreAll 6.0 or later by entering the following command:
ibrix_version -l
IMPORTANT: All StoreAll nodes must be at the same release.
If you are running a version of StoreAll earlier than 6.0, upgrade the product as described in “Cascading Upgrades” (page 174). If you are running StoreAll 6.0 or later, proceed with the upgrade steps in this section.
Step 2. Verify that the /local partition has at least 4 GB of free space for the upgrade by using the following command:
df -kh /local
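If you prefer to script this check, the following is a minimal sketch; the 4 GB threshold comes from the step above, and the comparison uses the available-space column reported by df:
avail_kb=$(df -Pk /local | awk 'NR==2 {print $4}')
if [ "$avail_kb" -ge $((4 * 1024 * 1024)) ]; then
  echo "/local has enough free space for the upgrade"
else
  echo "/local has less than 4 GB free; free up space before upgrading"
fi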
Step 3. For 9720 systems, enable password-less access among the cluster nodes before starting the upgrade.
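The guide does not specify a method here; one common way to enable password-less access between Linux nodes is key-based SSH. A minimal sketch, run as root on one node (the node names are placeholders):
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa   # generate a key pair if one does not exist
for node in node2 node3; do        # placeholder host names; list every other node in the cluster
  ssh-copy-id root@$node           # copy the public key so SSH no longer prompts for a password
done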
Step 4. The 6.3 release requires that nodes hosting the agile Fusion Manager be registered on the cluster network. Run the following command to verify that nodes hosting the agile Fusion Manager have IP addresses on the cluster network:
ibrix_fm -l
If a node is configured on the user network, see “Node is not registered with the cluster network” (page 22) for a workaround.
NOTE: The Fusion Manager and all file serving nodes must be upgraded to the new release at the same time. Do not change the active/passive Fusion Manager configuration during the upgrade.
Step 5. Verify that the crash kernel parameter on all nodes has been set to 256M by viewing the default boot entry in the /etc/grub.conf file, as shown in the following example:
kernel /vmlinuz-2.6.18-194.el5 ro root=/dev/vg1/lv1 crashkernel=256M@16M
The /etc/grub.conf file might contain multiple instances of the crash kernel parameter. Make sure you modify each instance that appears in the file.
If you must modify the /etc/grub.conf file, follow the steps in this section:
1. Use SSH to access the active Fusion Manager (FM).
2. Do one of the following:
(Versions 6.2 and later) Place all passive FMs into nofmfailover mode:
ibrix_fm -m nofmfailover -A
(Versions earlier than 6.2) Place all passive FMs into maintenance mode:
ibrix_fm -m maintenance -A
3. Disable Segment Server Failover on each node in the cluster:
ibrix_server -m -U -h <node>
4. Set the crash kernel to 256M in the /etc/grub.conf file. The /etc/grub.conf
file might contain multiple instances of the crash kernel parameter. Make sure you modify each instance that appears in the file.
NOTE: Save a copy of the /etc/grub.conf file before you modify it.
The following example shows the crash kernel set to 256M:
kernel /vmlinuz-2.6.18-194.el5 ro root=/dev/vg1/lv1 crashkernel=256M@16M
5. Reboot the active FM.
6. Use SSH to access each passive FM and do the following:
a. Modify the /etc/grub.conf file as described in the previous steps.
b. Reboot the node.
7. After all nodes in the cluster are back up, use SSH to access the active FM.
8. Place all disabled FMs back into passive mode:
ibrix_fm -m passive -A
9. Re-enable Segment Server Failover on each node:
ibrix_server -m -h <node>
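To verify the setting on a node without opening the file in an editor, the following is a minimal sketch; it lists every kernel boot entry in /etc/grub.conf and counts how many already carry the 256M setting:
grep "^[[:space:]]*kernel" /etc/grub.conf      # every line shown should contain crashkernel=256M@16M as in the example above
grep -c "crashkernel=256M" /etc/grub.conf      # number of entries already set to 256M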
Step 6. If your cluster includes G6 servers, check the iLO2 firmware version. This issue does not affect G7 servers. The firmware must be at version 2.05 for HA to function properly. If your servers have an earlier version of the iLO2 firmware, run the CP014256.scexe script as described in the following steps:
1. Ensure that the /local/ibrix/ folder is empty prior to copying the contents of pkgfull. When you upgrade the StoreAll software later in this chapter, this folder must contain only .rpm packages listed in the build manifest for the upgrade or the upgrade will fail.
2. Mount the pkg-full ISO image and copy the entire directory structure to the /local/ibrix/ directory (a copy sketch follows this list). The following example shows the mount command:
mount -o loop /local/pkg/ibrix-pkgfull-FS_6.3.72+IAS_6.3.72-x86_64.signed.iso /mnt/
3. Execute the firmware binary at the following location:
/local/ibrix/distrib/firmware/CP014256.scexe
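The example in substep 2 shows only the mount command. A minimal sketch of the full mount-and-copy sequence, assuming the ISO path shown above and a temporary mount point of /mnt/:
mount -o loop /local/pkg/ibrix-pkgfull-FS_6.3.72+IAS_6.3.72-x86_64.signed.iso /mnt/
cp -a /mnt/* /local/ibrix/      # copy the entire directory structure, preserving permissions and layout
umount /mnt/                    # release the ISO once the copy completes
After the copy, run the firmware binary as described in substep 3.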
Step 7. Make sure StoreAll is running the latest firmware. For information on how to find the version of firmware that StoreAll is running, see the Administrator Guide for your release.
Step 8. Verify that every file serving node can see and access every segment logical volume that it is configured for as either the owner or the backup by entering the following commands:
1. To view all segments, their logical volume names, and their owners, enter the following command on one line:
ibrix_fs -i | egrep -e OWNER -e MIXED|awk '{ print $1, $3, $6, $2, $14, $5}' | tr " " "\t"
2. To verify that the correct segments are visible on the current node, enter the following command on each file serving node:
lvm lvs | awk '{print $1}'
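To make the comparison easier to repeat on each node, the outputs can be captured to files; a minimal sketch (the file names under /tmp are arbitrary):
ibrix_fs -i | egrep -e OWNER -e MIXED | awk '{ print $1, $3, $6, $2, $14, $5}' | tr " " "\t" > /tmp/segments_expected.txt
lvm lvs | awk '{print $1}' > /tmp/lvs_visible.txt
# Review the two files side by side; every segment logical volume owned or backed up by this node must appear in lvs_visible.txt.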
Step 9. Ensure that no active tasks are running. Stop any active remote replication, data tiering, or rebalancer tasks running on the cluster. (Use ibrix_task -l to list active tasks.) When the upgrade is complete, you can start the tasks again. For additional information on how to stop a task, see the ibrix_task command help.
Step 10. For 9720 systems, delete the existing vendor storage by entering the following command:
ibrix_vs -d -n EXDS
The vendor storage is registered automatically after the upgrade.
Step 11. Record all host tunings, FS tunings, and FS mounting options by using the following commands:
1. To display file system tunings, enter: ibrix_fs_tune -l >/local/ibrix_fs_tune-l.txt
2. To display default StoreAll tunings and settings, enter: ibrix_host_tune -L >/local/ibrix_host_tune-L.txt
3. To display all non-default configuration tunings and settings, enter: ibrix_host_tune -q >/local/ibrix_host_tune-q.txt
Ensure that the "ibrix" local user account exists and it has the same UID number on all the servers in the cluster. If they do not have the same UID number, create the account and change the UIDs as needed to make them the same on all the servers. Similarly, ensure that the "ibrix-user" local user group exists and has the same GID number on all servers.
12
Enter the following commands on each node:
grep ibrix /etc/passwd
grep ibrix-user /etc/group
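A minimal sketch for comparing the values across servers from one node, assuming the password-less access set up in step 3 and placeholder node names:
for node in node1 node2 node3; do     # placeholder host names; list every server in the cluster
  echo "== $node =="
  ssh root@$node "grep '^ibrix:' /etc/passwd; grep '^ibrix-user:' /etc/group"
done
# The UID field (third column of /etc/passwd) and GID field (third column of /etc/group) must match on every server.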
Step 13. Ensure that all nodes are up and running. To determine the status of your cluster nodes, check the health of each server by either using the dashboard on the Management Console or entering the ibrix_health -S -i -h <hostname> command for each node in the cluster. At the top of the output, look for “PASSED.”
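A minimal sketch that loops over the cluster nodes and reports the overall result (node names are placeholders; the PASSED string is the one quoted above):
for node in node1 node2 node3; do     # placeholder host names
  if ibrix_health -S -i -h "$node" | grep -q "PASSED"; then
    echo "$node: PASSED"
  else
    echo "$node: did not report PASSED; review the full ibrix_health output"
  fi
done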
Step 14. If you are running StoreAll 6.2.x or earlier and you have one or more Express Query enabled file systems, each one must be manually upgraded as described in “Upgrading pre-6.3 Express Query enabled file systems” (page 19).
IMPORTANT: Run the steps in “Required steps before the StoreAll Upgrade for pre-6.3 Express Query enabled file systems” (page 19) before the upgrade. That section provides steps for saving your custom metadata and audit log. After you upgrade the StoreAll software, run the steps in “Required steps after the StoreAll Upgrade for pre-6.3 Express Query enabled file systems” (page 20). These post-upgrade steps are required to preserve your custom metadata and audit log data.
Upgrading 9720 chassis firmware
Before upgrading 9720 systems to StoreAll software 6.3, the 9720 chassis firmware must be at version 4.0.0-13. If the firmware is not at this level, upgrade it before proceeding with the StoreAll upgrade.
To upgrade the firmware, complete the following steps:
1. Go to http://www.hp.com/go/StoreAll.
2. On the HP StoreAll Storage page, select HP Support & Drivers from the Support section.
3. On the Business Support Center, select Download Drivers and Software and then select HP 9720 Base Rack > Red Hat Enterprise Linux 5 Server (x86-64).
4. Click HP 9720 Storage Chassis Firmware version 4.0.0-13.
5. Download the firmware and install it as described in the HP 9720 Network Storage System
4.0.0-13 Release Notes.
Online upgrades for StoreAll software
Online upgrades are supported only from the StoreAll 6.x release. Upgrades from earlier StoreAll releases must use the appropriate offline upgrade procedure.
When performing an online upgrade, note the following:
• File systems remain mounted and client I/O continues during the upgrade.
• The upgrade process takes approximately 45 minutes, regardless of the number of nodes.
• The total I/O interruption per node IP is four minutes, allowing for a failover time of two minutes and a failback time of two additional minutes.
• Client I/O having a timeout of more than two minutes is supported.
Preparing for the upgrade
To prepare for the upgrade, first ensure that high availability is enabled on each node in the cluster by running the following command:
ibrix_haconfig -l
If the command displays an Overall HA Configuration Checker Results - PASSED status, high availability is enabled on each node in the cluster. If the command returns Overall HA Configuration Checker Results - FAILED, complete the following list items based on the result returned for each component (a filter sketch follows the list):
1. Make sure you have completed all steps in the upgrade checklist (Table 1 (page 10)).
2. If Failed was displayed for the HA Configuration or Auto Failover columns or both, perform
the steps described in the section “Configuring High Availability on the cluster” in the
administrator guide for your current release.
3. If Failed was displayed for the NIC or HBA Monitored columns, see the sections for
ibrix_nic -m -h <host> -A node_2/node_interface and ibrix_hba -m -h
<host> -p <World_Wide_Name> in the CLI guide for your current release.
Performing the upgrade
The online upgrade is supported only from the StoreAll 6.x releases.
IMPORTANT: Complete all steps provided in the Table 1 (page 10).
Complete the following steps:
1. This release is only available through the registered release process. To obtain the ISO image, contact HP Support to register for the release and obtain access to the software dropbox.
2. Ensure that the /local/ibrix/ folder is empty prior to copying the contents of pkgfull. The upgrade will fail if the /local/ibrix/ folder contains leftover .rpm packages not listed in the build manifest.
3. Mount the pkg-full ISO image and copy the entire directory structure to the /local/ibrix/ directory, as shown in the following example:
mount -o loop /local/pkg/ibrix-pkgfull-FS_6.3.72+IAS_6.3.72-x86_64.signed.iso /mnt/
4. Change the permissions of all components in the /local/ibrix/ directory structure by entering the following command:
chmod -R 777 /local/ibrix/
5. Change to the /local/ibrix/ directory.
cd /local/ibrix/
6. Run the upgrade script and follow the on-screen directions:
./auto_online_ibrixupgrade
7. Upgrade Linux StoreAll clients. See “Upgrading Linux StoreAll clients” (page 18).
8. If you received a new license from HP, install it as described in “Licensing” (page 135).
After the upgrade
Complete these steps:
1. If your cluster nodes contain any 10Gb NICs, reboot these nodes to load the new driver. You must do this step before you upgrade the server firmware, as requested later in this procedure.
2. Upgrade your firmware as described in “Upgrading firmware” (page 136).
3. Start any remote replication, rebalancer, or data tiering tasks that were stopped before the upgrade.
4. If you have a file system version prior to version 6, you might have to make changes for snapshots and data retention, as mentioned in the following list:
Snapshots. Files used for snapshots must either be created on StoreAll software 6.0 or
later, or the pre-6.0 file system containing the files must be upgraded for snapshots. To upgrade a file system, use the upgrade60.sh utility. For more information, see
“Upgrading pre-6.0 file systems for software snapshots” (page 180).
Data retention. Files used for data retention (including WORM and auto-commit) must be
created on StoreAll software 6.1.1 or later, or the pre-6.1.1 file system containing the files must be upgraded for retention features. To upgrade a file system, use the ibrix_reten_adm -u -f FSNAME command. Additional steps are required before and after you run the ibrix_reten_adm -u -f FSNAME command. For more information, see “Upgrading pre-6.1.1 file systems for data retention features” (page 181).
5. If you have an Express Query enabled file system prior to version 6.3, manually complete each file system upgrade as described in “Required steps after the StoreAll Upgrade for pre-6.3
Express Query enabled file systems” (page 20).
Automated offline upgrades for StoreAll software 6.x to 6.3
Preparing for the upgrade
To prepare for the upgrade, complete the following steps:
1. Make sure you have completed all steps in the upgrade checklist (Table 1 (page 10)).
2. Stop all client I/O to the cluster or file systems. On the Linux client, use lsof </mountpoint> to show open files belonging to active processes.
3. Verify that all StoreAll file systems can be successfully unmounted from all FSN servers:
ibrix_umount -f fsname
Performing the upgrade
This upgrade method is supported only for upgrades from StoreAll software 6.x to the 6.3 release. Complete the following steps:
1. This release is only available through the registered release process. To obtain the ISO image, contact HP Support to register for the release and obtain access to the software dropbox.
2. Ensure that the /local/ibrix/ folder is empty prior to copying the contents of pkgfull. The upgrade will fail if the /local/ibrix/ folder contains leftover .rpm packages not listed in the build manifest.
3. Mount the pkg-full ISO image and copy the entire directory structure to the /local/ibrix/ directory, as shown in the following example:
mount -o loop /local/pkg/ibrix-pkgfull-FS_6.3.72+IAS_6.3.72-x86_64.signed.iso /mnt/
4. Change the permissions of all components in the /local/ibrix/ directory structure by entering the following command:
chmod -R 777 /local/ibrix/
5. Change to the /local/ibrix/ directory.
cd /local/ibrix/
6. Run the following upgrade script:
./auto_ibrixupgrade
The upgrade script automatically stops the necessary services and restarts them when the upgrade is complete. The upgrade script installs the Fusion Manager on all file serving nodes. The Fusion Manager is in active mode on the node where the upgrade was run, and is in passive mode on the other file serving nodes. If the cluster includes a dedicated Management Server, the Fusion Manager is installed in passive mode on that server.
7. Upgrade Linux StoreAll clients. See “Upgrading Linux StoreAll clients” (page 18).
8. If you received a new license from HP, install it as described in “Licensing” (page 135).
After the upgrade
Complete the following steps:
1. If your cluster nodes contain any 10Gb NICs, reboot these nodes to load the new driver. You must do this step before you upgrade the server firmware, as requested later in this procedure.
2. Upgrade your firmware as described in “Upgrading firmware” (page 136).
3. Mount file systems on Linux StoreAll clients.
4. If you have a file system version prior to version 6, you might have to make changes for snapshots and data retention, as mentioned in the following list:
Snapshots. Files used for snapshots must either be created on StoreAll software 6.0 or
later, or the pre-6.0 file system containing the files must be upgraded for snapshots. To upgrade a file system, use the upgrade60.sh utility. For more information, see
“Upgrading pre-6.0 file systems for software snapshots” (page 180).
Data retention. Files used for data retention (including WORM and auto-commit) must be
created on StoreAll software 6.1.1 or later, or the pre-6.1.1 file system containing the files must be upgraded for retention features. To upgrade a file system, use the ibrix_reten_adm -u -f FSNAME command. Additional steps are required before and after you run the ibrix_reten_adm -u -f FSNAME command. For more information, see “Upgrading pre-6.1.1 file systems for data retention features” (page 181).
5. If you have an Express Query enabled file system prior to version 6.3, manually complete each file system upgrade as described in “Required steps after the StoreAll Upgrade for pre-6.3
Express Query enabled file systems” (page 20).
Manual offline upgrades for StoreAll software 6.x to 6.3
Preparing for the upgrade
To prepare for the upgrade, complete the following steps:
1. Make sure you have completed all steps in the upgrade checklist (Table 1 (page 10)).
2. Verify that ssh shared keys have been set up. To do this, run the following command on the node hosting the active instance of the agile Fusion Manager:
ssh <server_name>
Repeat this command for each node in the cluster.
3. Verify that all file serving nodes (FSN servers) have separate file systems mounted on the following partitions by using the df command:
/
/local
/stage
/alt
4. Verify that all FSN servers have a minimum of 4 GB of free space on the /local partition by using the df command.
5. Verify that no FSN server reports any partition as 100% full (each partition should have at least 5% free space) by using the df command (a combined df example follows this procedure).
6. Note any custom tuning parameters, such as file system mount options. When the upgrade is complete, you can reapply the parameters.
7. Stop all client I/O to the cluster or file systems. On the Linux client, use lsof </mountpoint> to show open files belonging to active processes.
8. On the active Fusion Manager, enter the following command to place the Fusion Manager into maintenance mode:
<ibrixhome>/bin/ibrix_fm -m nofmfailover -P -A
9. On the active Fusion Manager node, disable automated failover on all file serving nodes:
<ibrixhome>/bin/ibrix_server -m -U
10. Run the following command to verify that automated failover is off. In the output, the HA column should display off.
<ibrixhome>/bin/ibrix_server -l
11. Unmount file systems on Linux StoreAll clients:
ibrix_umount -f MOUNTPOINT
12. Stop the SMB, NFS, and NDMP services on all nodes. Run the following commands on the node hosting the active Fusion Manager:
ibrix_server -s -t cifs -c stop
ibrix_server -s -t nfs -c stop
ibrix_server -s -t ndmp -c stop
If you are using SMB, verify that all likewise services are down on all file serving nodes:
ps -ef | grep likewise
Use kill -9 to stop any likewise services that are still running. If you are using NFS, verify that all NFS processes are stopped:
ps -ef | grep nfs
If necessary, use the following command to stop NFS services:
/etc/init.d/nfs stop
Use kill -9 to stop any NFS processes that are still running. If necessary, run the following command on all nodes to find any open file handles for the
mounted file systems:
lsof </mountpoint>
Use kill -9 to stop any processes that still have open file handles on the file systems.
13. Unmount each file system manually:
ibrix_umount -f FSNAME
Wait up to 15 minutes for the file systems to unmount. Troubleshoot any issues with unmounting file systems before proceeding with the upgrade.
See “File system unmount issues” (page 23).
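For the partition checks in steps 3 through 5, a single df invocation on each FSN server shows the required mount points and their free space (a sketch; the exact output format varies by release):
df -h / /local /stage /alt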
Performing the upgrade manually
This upgrade method is supported only for upgrades from StoreAll software 6.x to the 6.3 release. Complete the following steps first for the server running the active Fusion Manager and then for the servers running the passive Fusion Managers:
1. This release is only available through the registered release process. To obtain the ISO image, contact HP Support to register for the release and obtain access to the software dropbox.
2. Ensure that the /local/ibrix/ folder is empty prior to copying the contents of pkgfull. The upgrade will fail if the /local/ibrix/ folder contains leftover .rpm packages not listed in the build manifest.
3. Mount the pkg-full ISO image and copy the entire directory structure to the /local/ibrix/ directory, as shown in the following example:
mount -o loop /local/pkg/ibrix-pkgfull-FS_6.3.72+IAS_6.3.72-x86_64.signed.iso /mnt/
4. Change the permissions of all components in the /local/ibrix/ directory structure by entering the following command:
chmod -R 777 /local/ibrix/
5. Change to the /local/ibrix/ directory, and then run the upgrade script:
cd /local/ibrix/
./ibrixupgrade -f
The upgrade script automatically stops the necessary services and restarts them when the upgrade is complete. The upgrade script installs the Fusion Manager on the server.
6. After completing the previous steps for the server running the active Fusion Manager, repeat the steps for each of the servers running the passive Fusion Manager.
7. Upgrade Linux StoreAll clients. See “Upgrading Linux StoreAll clients” (page 18).
8. If you received a new license from HP, install it as described in “Licensing” (page 135).
After the upgrade
Complete the following steps:
1. If your cluster nodes contain any 10Gb NICs, reboot these nodes to load the new driver. You must do this step before you upgrade the server firmware, as requested later in this procedure.
2. Upgrade your firmware as described in “Upgrading firmware” (page 136).
3. Run the following command to rediscover physical volumes:
ibrix_pv -a
4. Apply any custom tuning parameters, such as mount options.
5. Remount all file systems:
ibrix_mount -f <fsname> -m </mountpoint>
6. Re-enable High Availability if used:
ibrix_server -m
7. Start any remote replication, rebalancer, or data tiering tasks that were stopped before the upgrade.
8. If you are using SMB, set the following parameters to synchronize the SMB software and the Fusion Manager database:
smb signing enabled
smb signing required
ignore_writethru
Use ibrix_cifsconfig to set the parameters, specifying the value appropriate for your cluster (1=enabled, 0=disabled). The following examples set the parameters to the default values for the 6.3 release:
ibrix_cifsconfig -t -S "smb_signing_enabled=0, smb_signing_required=0"
ibrix_cifsconfig -t -S "ignore_writethru=1"
The SMB signing feature specifies whether clients must support SMB signing to access SMB shares. See the HP StoreAll Storage File System User Guide for more information about this feature. When ignore_writethru is enabled, StoreAll software ignores writethru buffering to improve SMB write performance on some user applications that request it.
9. Mount file systems on Linux StoreAll clients.
10. If you have a file system version prior to version 6, you might have to make changes for snapshots and data retention, as mentioned in the following list:
Snapshots. Files used for snapshots must either be created on StoreAll software 6.0 or
later, or the pre-6.0 file system containing the files must be upgraded for snapshots. To upgrade a file system, use the upgrade60.sh utility. For more information, see
“Upgrading pre-6.0 file systems for software snapshots” (page 180).
Data retention. Files used for data retention (including WORM and auto-commit) must be
created on StoreAll software 6.1.1 or later, or the pre-6.1.1 file system containing the files must be upgraded for retention features. To upgrade a file system, use the ibrix_reten_adm -u -f FSNAME command. Additional steps are required before and after you run the ibrix_reten_adm -u -f FSNAME command. For more information, see “Upgrading pre-6.1.1 file systems for data retention features” (page 181).
11. If you have an Express Query enabled file system prior to version 6.3, manually complete each file system upgrade as described in “Required steps after the StoreAll Upgrade for pre-6.3
Express Query enabled file systems” (page 20).
Upgrading Linux StoreAll clients
Be sure to upgrade the cluster nodes before upgrading Linux StoreAll clients. Complete the following steps on each client:
1. Download the latest HP StoreAll client 6.3 package.
2. Expand the tar file.
3. Run the upgrade script:
./ibrixupgrade -tc -f
The upgrade software automatically stops the necessary services and restarts them when the upgrade is complete.
4. Execute the following command to verify the client is running StoreAll software:
/etc/init.d/ibrix_client status
IBRIX Filesystem Drivers loaded
IBRIX IAD Server (pid 3208) running...
The IAD service should be running, as shown in the previous sample output. If it is not, contact HP Support.
Installing a minor kernel update on Linux clients
The StoreAll client software is upgraded automatically when you install a compatible Linux minor kernel update.
If you are planning to install a minor kernel update, first run the following command to verify that the update is compatible with the StoreAll client software:
/usr/local/ibrix/bin/verify_client_update <kernel_update_version>
The following example is for a RHEL 4.8 client with kernel version 2.6.9-89.ELsmp:
# /usr/local/ibrix/bin/verify_client_update 2.6.9-89.35.1.ELsmp
Kernel update 2.6.9-89.35.1.ELsmp is compatible.
If the minor kernel update is compatible, install the update with the vendor RPM and reboot the system. The StoreAll client software is then automatically updated with the new kernel, and StoreAll client services start automatically. Use the ibrix_version -l -C command to verify the kernel version on the client.
NOTE: To use the verify_client_update command, the StoreAll client software must be installed.
Upgrading Windows StoreAll clients
Complete the following steps on each client:
1. Remove the old Windows StoreAll client software using the Add or Remove Programs utility
in the Control Panel.
2. Copy the Windows StoreAll client MSI file for the upgrade to the machine.
3. Launch the Windows Installer and follow the instructions to complete the upgrade.
4. Register the Windows StoreAll client again with the cluster and check the option to Start Service
after Registration.
5. Check Administrative Tools | Services to verify that the StoreAll client service is started.
6. Launch the Windows StoreAll client. On the Active Directory Settings tab, click Update to
retrieve the current Active Directory settings.
7. Mount file systems using the StoreAll Windows client GUI.
NOTE: If you are using Remote Desktop to perform an upgrade, you must log out and log back
in to see the drive mounted.
Upgrading pre-6.3 Express Query enabled file systems
The internal database schema format of Express Query enabled file systems changed between releases 6.2.x and 6.3. Each file system with Express Query enabled must be manually upgraded to 6.3. This section has instructions to be run before and after the StoreAll upgrade, on each of those file systems.
Required steps before the StoreAll Upgrade for pre-6.3 Express Query enabled file systems
These steps are required before the StoreAll Upgrade:
1. Mount all Express Query file systems on the cluster to be upgraded if they are not mounted yet.
2. Save your custom metadata by entering the following command:
/usr/local/ibrix/bin/MDExport.pl --dbconfig /usr/local/Metabox/scripts/startup.xml --database <FSNAME>
--outputfile /tmp/custAttributes.csv --user ibrix
3. Save your audit log data by entering the following commands:
ibrix_audit_reports -t time -f <FSNAME>
cp <path to report file printed from previous command> /tmp/auditData.csv
4. Disable auditing by entering the following command:
ibrix_fs -A -f <FSNAME> -oa audit_mode=off
In this instance <FSNAME> is the file system.
5. If any archive API shares exist for the file system, delete them.
NOTE: To list all HTTP shares, enter the following command:
ibrix_httpshare -l
To list only REST API (Object API) shares, enter the following command:
ibrix_httpshare -l -f <FSNAME> -v 1 | grep "objectapi: true" | awk '{ print $2 }'
In this instance <FSNAME> is the file system.
Delete all HTTP shares, whether regular or REST API (Object API), by entering the following command:
ibrix_httpshare -d -f <FSNAME>
In this instance <FSNAME> is the file system.
Delete a specific REST API (Object API) share by entering the following command:
ibrix_httpshare -d <SHARENAME> -c <PROFILENAME> -t <VHOSTNAME>
In this instance:
<SHARENAME> is the share name.
<PROFILENAME> is the profile name.
<VHOSTNAME> is the virtual host name.
6. Disable Express Query by entering the following command:
ibrix_fs -T -D -f <FSNAME>
7. Shut down Archiving daemons for Express Query by entering the following command:
ibrix_archiving -S -F
8. Delete the internal database files for this file system by entering the following command:
rm -rf <FS_MOUNTPOINT>/.archiving/database
In this instance <FS_MOUNTPOINT> is the file system mount point.
Required steps after the StoreAll Upgrade for pre-6.3 Express Query enabled file systems
These steps are required after the StoreAll Upgrade:
1. Restart the Archiving daemons for Express Query.
2. Re-enable Express Query on the file systems you disabled it from before by entering the following command:
ibrix_fs -T -E -f <FSNAME>
In this instance <FSNAME> is the file system. Express Query will begin resynchronizing (repopulating) a new database for this file system.
3. Re-enable auditing if you had it running before (the default) by entering the following command:
ibrix_fs -A -f <FSNAME> -oa audit_mode=on
In this instance <FSNAME> is the file system.
4. Re-create REST API (Object API) shares deleted before the upgrade on each node in the cluster (if desired) by entering the following command:
NOTE: The REST API (Object API) functionality has expanded, and any REST API (Object
API) shares you created in previous releases are now referred to as HTTP-StoreAll REST API shares in file-compatible mode. The 6.3 release is also introducing a new type of share called HTTP-StoreAll REST API share in Object mode.
ibrix_httpshare -a <SHARENAME> -c <PROFILENAME> -t <VHOSTNAME> -f <FSNAME> -p <DIRPATH> -P <URLPATH> -S
ibrixRestApiMode=filecompatible, anonymous=true
In this instance:
<SHARENAME> is the share name.
<PROFILENAME> is the profile name.
<VHOSTNAME> is the virtual host name.
<FSNAME> is the file system.
<DIRPATH> is the directory path.
<URLPATH> is the URL path.
The values passed to -S (ibrixRestApiMode=filecompatible, anonymous=true in this example) are the share settings.
5. Wait for the resynchronizer to complete by entering the following command until its output shows <FSNAME>: OK (a polling sketch follows this procedure):
ibrix_archiving -l
6. Restore your audit log data by entering the following command:
MDImport -f <FSNAME> -n /tmp/auditData.csv -t audit
In this instance <FSNAME> is the file system.
7. Restore your custom metadata by entering the following command:
MDImport -f <FSNAME> -n /tmp/custAttributes.csv -t custom
In this instance <FSNAME> is the file system.
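For step 5, one way to poll the resynchronizer status until it reports <FSNAME>: OK is the standard watch utility (a sketch; adjust the interval as needed):
watch -n 60 ibrix_archiving -l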
Troubleshooting upgrade issues
If the upgrade does not complete successfully, check the following items. For additional assistance, contact HP Support.
Automatic upgrade
Check the following:
If the initial execution of /usr/local/ibrix/setup/upgrade fails, check
/usr/local/ibrix/setup/upgrade.log for errors. It is imperative that all servers are
up and running the StoreAll software before you execute the upgrade script.
If the install of the new OS fails, try rebooting the node. If the install does not begin after the reboot, power cycle the machine and select the upgrade line from the grub boot menu.
After the upgrade, check /usr/local/ibrix/setup/logs/postupgrade.log for errors
or warnings.
If configuration restore fails on any node, look at
/usr/local/ibrix/autocfg/logs/appliance.log on that node to determine which feature restore failed. Look at the specific feature log file under /usr/local/ibrix/setup/logs/ for more detailed information.
To retry the copy of configuration, use the following command:
/usr/local/ibrix/autocfg/bin/ibrixapp upgrade -f -s
If the install of the new image succeeds, but the configuration restore fails and you need to
revert the server to the previous install, run the following command and then reboot the machine. This step causes the server to boot from the old version (the alternate partition).
/usr/local/ibrix/setup/boot_info -r
If the public network interface is down and inaccessible for any node, power cycle that node.
NOTE: Each node stores its ibrixupgrade.log file in /tmp.
Manual upgrade
Check the following:
If the restore script fails, check /usr/local/ibrix/setup/logs/restore.log for
details.
If configuration restore fails, look at /usr/local/ibrix/autocfg/logs/appliance.log
to determine which feature restore failed. Look at the specific feature log file under /usr/local/ibrix/setup/logs/ for more detailed information.
To retry the copy of configuration, use the following command:
/usr/local/ibrix/autocfg/bin/ibrixapp upgrade -f -s
Offline upgrade fails because iLO firmware is out of date
If the iLO2 firmware is out of date on a node, the auto_ibrixupgrade script will fail. The /usr/local/ibrix/setup/logs/auto_ibrixupgrade.log file reports the failure and describes how to update the firmware.
After updating the firmware, run the following command on the node to complete the StoreAll software upgrade:
/local/ibrix/ibrixupgrade -f
Node is not registered with the cluster network
Nodes hosting the agile Fusion Manager must be registered with the cluster network. If the ibrix_fm command reports that the IP address for a node is on the user network, you will need to reassign the IP address to the cluster network. For example, the following commands report that node ib51-101, which is hosting the active Fusion Manager, has an IP address on the user network (192.168.51.101) instead of the cluster network.
[root@ib51-101 ibrix]# ibrix_fm -i
FusionServer: ib51-101 (active, quorum is running)
==================================================
[root@ib51-101 ibrix]# ibrix_fm -l
NAME     IP ADDRESS
-------- --------------
ib51-101 192.168.51.101
ib51-102 10.10.51.102
1. If the node is hosting the active Fusion Manager, as in this example, stop the Fusion Manager on that node:
[root@ib51-101 ibrix]# /etc/init.d/ibrix_fusionmanager stop
Stopping Fusion Manager Daemon                             [ OK ]
[root@ib51-101 ibrix]#
2. On the node now hosting the active Fusion Manager (ib51-102 in the example), unregister node ib51-101:
[root@ib51-102 ~]# ibrix_fm -u ib51-101
Command succeeded!
3. On the node hosting the active Fusion Manager, register node ib51-101 and assign the correct IP address:
[root@ib51-102 ~]# ibrix_fm -R ib51-101 -I 10.10.51.101
Command succeeded!
NOTE: When registering a Fusion Manager, be sure the hostname specified with -R matches
the hostname of the server.
The ibrix_fm commands now show that node ib51-101 has the correct IP address and node
ib51-102 is hosting the active Fusion Manager.
[root@ib51-102 ~]# ibrix_fm -f
NAME     IP ADDRESS
-------- --------------
ib51-101 10.10.51.101
ib51-102 10.10.51.102
[root@ib51-102 ~]# ibrix_fm -i
FusionServer: ib51-102 (active, quorum is running)
==================================================
File system unmount issues
If a file system does not unmount successfully, perform the following steps on all servers:
1. Run the following commands (a sketch for running them on all servers at once follows this procedure):
chkconfig ibrix_server off
chkconfig ibrix_ndmp off
chkconfig ibrix_fusionmanager off
2. Reboot all servers.
3. Run the following commands to move the services back to the on state. The commands do not start the services.
chkconfig ibrix_server on
chkconfig ibrix_ndmp on
chkconfig ibrix_fusionmanager on
4. Run the following commands to start the services:
service ibrix_fusionmanager start
service ibrix_server start
5. Unmount the file systems and continue with the upgrade procedure.
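The following sketch shows one way to run the step 1 commands on all servers from a single node. The hostnames are hypothetical, and password-less ssh between the nodes is assumed:
for h in node1 node2 node3 node4; do
  ssh $h "chkconfig ibrix_server off; chkconfig ibrix_ndmp off; chkconfig ibrix_fusionmanager off"
done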
File system in MIF state after StoreAll software 6.3 upgrade
If an Express Query enabled file system ends up in the MIF state after the StoreAll software upgrade process completes (ibrix_archiving -l prints <FSNAME>: MIF), check the MIF status by running the following command:
cat /<FSNAME>/.archiving/database/serialization/ManualInterventionFailure
If the command’s output displays Version mismatch, upgrade needed (as shown in the following output), the required steps were not performed as described in “Required steps after the StoreAll Upgrade for pre-6.3 Express Query enabled file systems” (page 20).
MIF:Version mismatch, upgrade needed. (error code 14)
If the command’s output does not include Version mismatch, upgrade needed, see “Troubleshooting an Express Query Manual Intervention Failure (MIF)” (page 152).
Perform the following steps only if the command’s output includes Version mismatch, upgrade needed:
1. Disable auditing by entering the following command:
ibrix_fs -A -f <FSNAME> -oa audit_mode=off
In this instance <FSNAME> is the file system.
2. Disable Express Query by entering the following command:
ibrix_fs -T -D -f <FSNAME>
In this instance <FSNAME> is the file system.
3. Delete the internal database files for this file system by entering the following command:
rm -rf <FS_MOUNTPOINT>/.archiving/database
In this instance <FS_MOUNTPOINT> is the file system mount point.
4. Clear the MIF condition by running the following command:
ibrix_archiving -C <FSNAME>
In this instance <FSNAME> is the file system.
5. Re-enable Express Query on the file systems:
ibrix_fs -T -E -f <FSNAME>
In this instance <FSNAME> is the file system. Express Query will begin resynchronizing (repopulating) a new database for this file system.
6. Re-enable auditing if you had it running before (the default).
ibrix_fs -A -f <FSNAME> -oa audit_mode=on
In this instance <FSNAME> is the file system.
7. Restore your audit log data:
MDImport -f <FSNAME> -n /tmp/auditData.csv -t audit
In this instance <FSNAME> is the file system.
8. Restore your custom metadata:
MDImport -f <FSNAME> -n /tmp/custAttributes.csv -t custom
In this instance <FSNAME> is the file system.
2 Product description
The HP X9720 and 9730 Storage systems are scalable, network-attached storage (NAS) products. Each system combines HP StoreAll software with HP server and storage hardware to create a cluster of file serving nodes.
System features
The X9720 and 9730 Storage provide the following features:
Segmented, scalable file system under a single namespace
NFS, SMB (Server Message Block), FTP, and HTTP support for accessing file system data
Centralized CLI and GUI for cluster management
Policy management
Continuous remote replication
Dual redundant paths to all storage components
Gigabytes per second of throughput
IMPORTANT: It is important to keep regular backups of the cluster configuration. See “Backing
up the Fusion Manager configuration” (page 76) for more information.
System components
IMPORTANT: All software included with the X9720/9730 Storage is for the sole purpose of
operating the system. Do not add, remove, or change any software unless instructed to do so by HP-authorized personnel.
For information about 9730 system components and cabling, see “StoreAll 9730 component and
cabling diagrams” (page 201).
For information about X9720 system components and cabling, see “The IBRIX X9720 component
and cabling diagrams” (page 207).
For a complete list of system components, see the HP StoreAll Storage QuickSpecs, which are available at:
http://www.hp.com/go/StoreAll
HP StoreAll software features
HP StoreAll software is a scale-out, network-attached storage solution including a parallel file system for clusters, an integrated volume manager, high-availability features such as automatic failover of multiple components, and a centralized management interface. StoreAll software can scale to thousands of nodes.
Based on a segmented file system architecture, StoreAll software integrates I/O and storage systems into a single clustered environment that can be shared across multiple applications and managed from a central Fusion Manager.
StoreAll software is designed to operate with high-performance computing applications that require high I/O bandwidth, high IOPS throughput, and scalable configurations.
Some of the key features and benefits are as follows:
Scalable configuration. You can add servers to scale performance and add storage devices
to scale capacity.
Single namespace. All directories and files are contained in the same namespace.
Multiple environments. Operates in both the SAN and DAS environments.
High availability. The high-availability software protects servers.
Tuning capability. The system can be tuned for large or small-block I/O.
Flexible configuration. Segments can be migrated dynamically for rebalancing and data
tiering.
High availability and redundancy
The segmented architecture is the basis for fault resilience—loss of access to one or more segments does not render the entire file system inaccessible. Individual segments can be taken offline temporarily for maintenance operations and then returned to the file system.
To ensure continuous data access, StoreAll software provides manual and automated failover protection at various points:
Server. A failed node is powered down and a designated standby server assumes all of its
segment management duties.
Segment. Ownership of each segment on a failed node is transferred to a designated standby
server.
Network interface. The IP address of a failed network interface is transferred to a standby
network interface until the original network interface is operational again.
Storage connection. For servers with HBA-protected Fibre Channel access, failure of the HBA
triggers failover of the node to a designated standby server.
3 Getting started
This chapter describes how to log in to the system, boot the system and individual server blades, change passwords, and back up the Fusion Manager configuration. It also describes the StoreAll software management interfaces.
IMPORTANT: Follow these guidelines when using your system:
Do not modify any parameters of the operating system or kernel, or update any part of the
X9720/9730 Storage unless instructed to do so by HP; otherwise, the system could fail to operate properly.
File serving nodes are tuned for file serving operations. With the exception of supported
backup programs, do not run other applications directly on the nodes.
Setting up the X9720/9730 Storage
An HP service specialist sets up the system at your site, including the following tasks:
Installation steps
Before starting the installation, ensure that the product components are in the location where
they will be installed. Remove the product from the shipping cartons, confirm the contents of each carton against the list of included items, check for any physical damage to the exterior of the product, and connect the product to the power and network connections that you provide.
Review your server, network, and storage environment relevant to the HP Enterprise NAS
product implementation to validate that prerequisites have been met.
Validate that your file system performance, availability, and manageability requirements have
not changed since the service planning phase. Finalize the HP Enterprise NAS product implementation plan and software configuration.
Implement the documented and agreed-upon configuration based on the information you
provided on the pre-delivery checklist.
Document configuration details.
Additional configuration steps
When your system is up and running, you can continue configuring the cluster and file systems. The Management Console and CLI are used to perform most operations. (Some features described here may be configured for you as part of the system installation.)
Cluster. Configure the following as needed:
Firewall ports. See “Configuring ports for a firewall” (page 35)
HP Insight Remote Support and Phone Home. See “Configuring HP Insight Remote Support
on StoreAll systems” (page 36).
Virtual interfaces for client access. See “Configuring virtual interfaces for client access”
(page 49).
Cluster event notification through email or SNMP. See “Configuring cluster event notification”
(page 70).
Fusion Manager backups. See “Backing up the Fusion Manager configuration” (page 76).
NDMP backups. See “Using NDMP backup applications” (page 76).
Statistics tool. See “Using the Statistics tool” (page 108).
Ibrix Collect. See “Collecting information for HP Support with the IbrixCollect” (page 143).
File systems. Set up the following features as needed:
NFS, SMB (Server Message Block), FTP, or HTTP. Configure the methods you will use to access
file system data.
Quotas. Configure user, group, and directory tree quotas as needed.
Remote replication. Use this feature to replicate changes in a source file system on one cluster
to a target file system on either the same cluster or a second cluster.
Data retention and validation. Use this feature to manage WORM and retained files.
Antivirus support. This feature is used with supported Antivirus software, allowing you to scan
files on a StoreAll file system.
StoreAll software snapshots. This feature allows you to capture a point-in-time copy of a file
system or directory for online backup purposes and to simplify recovery of files from accidental deletion. Users can access the file system or directory as it appeared at the instant of the snapshot.
File allocation. Use this feature to specify the manner in which segments are selected for storing
new files and directories.
Data tiering. Use this feature to move files to specific tiers based on file attributes.
For more information about these file system features, see the HP StoreAll Storage File System User
Guide.
Localization support
Red Hat Enterprise Linux 5 uses the UTF-8 (8-bit Unicode Transformation Format) encoding for supported locales. This allows you to create, edit and view documents written in different locales using UTF-8. StoreAll software supports modifying the /etc/sysconfig/i18n configuration file for your locale. The following example sets the LANG and SUPPORTED variables for multiple character sets:
LANG="ko_KR.utf8" SUPPORTED="en_US.utf8:en_US:en:ko_KR.utf8:ko_KR:ko:zh_CN.utf8:zh_CN:zh" SYSFONT="lat0-sun16" SYSFONTACM="iso15"
Logging in to the system
Using the network
Use ssh to log in remotely from another host. You can log in to any server using any configured site network interface (eth1, eth2, or bond1).
With ssh and the root user, after you log in to any server, your .ssh/known_hosts file will work with any server in the cluster.
The original server blades in your cluster are configured to support password-less ssh. After you have connected to one server, you can connect to the other servers without specifying the root password again. To enable the same support for other server blades, or to access the system itself without specifying a password, add the keys of the other servers to .ssh/authorized_keys on each server blade.
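For example, running the following command from one server blade copies that blade's public key into the .ssh/authorized_keys file on another server (ssh-copy-id is part of standard OpenSSH client installations; <server_name> is a placeholder for the target server):
ssh-copy-id root@<server_name>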
Using the TFT keyboard/monitor
If the site network is down, you can log in to the console as follows:
1. Pull out the keyboard monitor (See “Front view of a base cabinet” (page 207)).
2. Access the on-screen display (OSD) main dialog box by pressing Print Scrn or by pressing Ctrl twice within one second.
3. Double-click the first server name.
4. Log in as normal.
NOTE: By default, the first port is connected with the dongle to the front of blade 1 (that is, server
1). If server 1 is down, move the dongle to another blade.
Using the serial link on the Onboard Administrator
If you are connected to a terminal server, you can log in through the serial link on the Onboard Administrator.
Booting the system and individual server blades
Before booting the system, ensure that all of the system components other than the server blades—the capacity blocks or performance modules and so on—are turned on. By default, server blades boot whenever power is applied to the system performance chassis (c-Class Blade enclosure). If all server blades are powered off, you can boot the system as follows:
1. Press the power button on server blade 1.
2. Log in as root to server 1.
3. Power on the remaining server blades:
ibrix_server -P on -h <hostname>
NOTE: Alternatively, press the power button on all of the remaining servers. There is no
need to wait for the first server blade to boot.
Management interfaces
Cluster operations are managed through the StoreAll Fusion Manager, which provides both a Management Console and a CLI. Most operations can be performed from either the StoreAll Management Console or the CLI.
The following operations can be performed only from the CLI:
SNMP configuration (ibrix_snmpagent, ibrix_snmpgroup, ibrix_snmptrap,
ibrix_snmpuser, ibrix_snmpview)
Health checks (ibrix_haconfig, ibrix_health, ibrix_healthconfig)
Raw storage management (ibrix_pv, ibrix_vg, ibrix_lv)
Fusion Manager operations (ibrix_fm) and Fusion Manager tuning (ibrix_fm_tune)
File system checks (ibrix_fsck)
Kernel profiling (ibrix_profile)
Cluster configuration (ibrix_clusterconfig)
Configuration database consistency (ibrix_dbck)
Shell task management (ibrix_shell)
The following operations can be performed only from the StoreAll Management Console:
Scheduling recurring data validation scans
Scheduling recurring software snapshots
Using the StoreAll Management Console
The StoreAll Management Console is a browser-based interface to the Fusion Manager. See the release notes for the supported browsers and other software required to view charts on the dashboard. You can open multiple Management Console windows as necessary.
If you are using HTTP to access the Management Console, open a web browser and navigate to the following location, specifying port 80:
http://<management_console_IP>:80/fusion
If you are using HTTPS to access the Management Console, navigate to the following location, specifying port 443:
https://<management_console_IP>:443/fusion
In these URLs, <management_console_IP> is the IP address of the Fusion Manager user VIF. The Management Console prompts for your user name and password. The default administrative
user is ibrix. Enter the password that was assigned to this user when the system was installed. (You can change the password using the Linux passwd command.) To allow other users to access the Management Console, see “Adding user accounts for Management Console access” (page 33).
Upon login, the Management Console dashboard opens, allowing you to monitor the entire cluster. (See the online help for information about all Management Console displays and operations.) There are three parts to the dashboard: System Status, Cluster Overview, and the Navigator.
System Status
The System Status section lists the number of cluster events that have occurred in the last 24 hours. There are three types of events:
Alerts. Disruptive events that can result in loss of access to file system data. Examples are a segment that is unavailable or a server that cannot be accessed.
Warnings. Potentially disruptive conditions where file system access is not lost, but if the situation is not addressed, it can escalate to an alert condition. Examples are a very high server CPU utilization level or a quota limit close to the maximum.
Information. Normal events that change the cluster. Examples are mounting a file system or creating a segment.
Cluster Overview
The Cluster Overview provides the following information:
Capacity
The amount of cluster storage space that is currently free or in use.
File systems
The current health status of the file systems in the cluster. The overview reports the number of file systems in each state (healthy, experiencing a warning, experiencing an alert, or unknown).
Segment Servers
The current health status of the file serving nodes in the cluster. The overview reports the number of nodes in each state (healthy, experiencing a warning, experiencing an alert, or unknown).
Services
Whether the specified file system services are currently running:
One or more tasks are running.
No tasks are running.
Statistics
Historical performance graphs for the following items:
Network I/O (MB/s)
Disk I/O (MB/s)
CPU usage (%)
Memory usage (%)
On each graph, the X-axis represents time and the Y-axis represents performance. Use the Statistics menu to select the servers to monitor (up to two), to change the maximum
value for the Y-axis, and to show or hide resource usage distribution for CPU and memory.
Recent Events
The most recent cluster events. Use the Recent Events menu to select the type of events to display.
You can also access certain menu items directly from the Cluster Overview. Mouse over the Capacity, Filesystems or Segment Server indicators to see the available options.
Navigator
The Navigator appears on the left side of the window and displays the cluster hierarchy. You can use the Navigator to drill down in the cluster configuration to add, view, or change cluster objects such as file systems or storage, and to initiate or view tasks such as snapshots or replication. When you select an object, a details page shows a summary for that object. The lower Navigator allows you to view details for the selected object, or to initiate a task. In the following example, we selected Filesystems in the upper Navigator and Mountpoints in the lower Navigator to see details about the mounts for file system ifs1.
NOTE: When you perform an operation on the GUI, a spinning finger is displayed until the
operation is complete. However, if you use Windows Remote Desktop to access the GUI, the spinning finger is not displayed.
Customizing the GUI
For most tables in the GUI, you can specify the columns that you want to display and the sort order of each column. When this feature is available, mousing over a column causes the label to change color and a pointer to appear. Click the pointer to see the available options. In the following
example, you can sort the contents of the Mountpoint column in ascending or descending order, and you can select the columns that you want to appear in the display.
Adding user accounts for Management Console access
StoreAll software supports administrative and user roles. When users log in under the administrative role, they can configure the cluster and initiate operations such as remote replication or snapshots. When users log in under the user role, they can view the cluster configuration and status, but cannot make configuration changes or initiate operations. The default administrative user name is ibrix. The default regular username is ibrixuser.
User names for the administrative and user roles are defined in the /etc/group file. Administrative users are specified in the ibrix-admin group, and regular users are specified in the ibrix-user group. These groups are created when StoreAll software is installed. The following entries in the
/etc/group file show the default users in these groups:
ibrix-admin:x:501:root,ibrix
ibrix-user:x:502:ibrix,ibrixUser,ibrixuser
You can add other users to these groups as needed, using Linux procedures. For example:
adduser -G ibrix-<groupname> <username>
When using the adduser command, be sure to include the -G option.
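For example, the following command adds a hypothetical account named jsmith to the regular-user group:
adduser -G ibrix-user jsmith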
Using the CLI
The administrative commands described in this guide must be executed on the Fusion Manager host and require root privileges. The commands are located in $IBRIXHOME/bin. For complete information about the commands, see the HP StoreAll Network Storage System CLI Reference Guide.
When using ssh to access the machine hosting the Fusion Manager, specify the IP address of the Fusion Manager user VIF.
Starting the array management software
Depending on the array type, you can launch the array management software from the GUI. In the Navigator, select Vendor Storage, select your array from the Vendor Storage page, and click Launch Storage Management.
StoreAll client interfaces
StoreAll clients can access the Fusion Manager as follows:
Linux clients. Use Linux client commands for tasks such as mounting or unmounting file systems
and displaying statistics. See the HP StoreAll Storage CLI Reference Guide for details about these commands.
Windows clients. Use the Windows client GUI for tasks such as mounting or unmounting file
systems and registering Windows clients.
Using the Windows StoreAll client GUI
The Windows StoreAll client GUI is the client interface to the Fusion Manager. To open the GUI, double-click the desktop icon or select the StoreAll client program from the Start menu on the client. The client program contains tabs organized by function.
NOTE: The Windows StoreAll client GUI can be started only by users with Administrative
privileges.
Status. Shows the client’s Fusion Manager registration status and mounted file systems, and
provides access to the IAD log for troubleshooting.
Registration. Registers the client with the Fusion Manager, as described in the HP StoreAll
Storage Installation Guide.
Mount. Mounts a file system. Select the Cluster Name from the list (the cluster name is the
Fusion Manager name), enter the name of the file system to mount, select a drive, and then click Mount. (If you are using Remote Desktop to access the client and the drive letter does not appear, log out and log in again.)
Umount. Unmounts a file system.
Tune Host. Tunable parameters include the NIC to prefer (the client uses the cluster interface
by default unless a different network interface is preferred for it), the communications protocol (UDP or TCP), and the number of server threads to use.
Active Directory Settings. Displays current Active Directory settings.
For more information, see the client GUI online help.
StoreAll software manpages
StoreAll software provides manpages for most of its commands. To view the manpages, set the MANPATH variable to include the path to the manpages and then export it. The manpages are in the $IBRIXHOME/man directory. For example, if $IBRIXHOME is /usr/local/ibrix (the default), set the MANPATH variable as follows and then export the variable:
MANPATH=$MANPATH:/usr/local/ibrix/man
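Then export the variable so the setting takes effect in the current shell:
export MANPATH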
Changing passwords
IMPORTANT: The hpspAdmin user account is added during the StoreAll software installation
and is used internally. Do not remove this account or change its password.
You can change the following passwords on your system:
Hardware passwords. See the documentation for the specific hardware for more information.
Root password. Use the passwd(8) command on each server.
StoreAll software user password. This password is created during installation and is used to
log in to the GUI. The default is ibrix. You can change the password using the Linux passwd command.
# passwd ibrix
You will be prompted to enter the new password.
Configuring ports for a firewall
IMPORTANT: To avoid unintended consequences, HP recommends that you configure the firewall
during scheduled maintenance times.
When configuring a firewall, you should be aware of the following:
SELinux should be disabled.
By default, NFS uses random port numbers for operations such as mounting and locking. These ports must be fixed so that they can be listed as exceptions in a firewall configuration file. For example, you will need to lock specific ports for rpc.statd, rpc.lockd, rpc.mountd, and rpc.quotad (a sketch of one way to pin these ports follows the port list).
It is best to allow all ICMP types on all networks; however, you can limit ICMP to types 0, 3,
8, and 11 if necessary.
Be sure to open the following ports:
SSH: 22/tcp
SSH for Onboard Administrator (OA); only for X9720/9730 blades: 9022/tcp
NTP: 123/tcp, 123/udp
Multicast DNS (224.0.0.251): 5353/udp
netperf tool: 12865/tcp
Fusion Manager to file serving nodes: 80/tcp, 443/tcp
Fusion Manager and StoreAll file system: 5432/tcp, 8008/tcp, 9002/tcp, 9005/tcp, 9008/tcp, 9009/tcp, 9200/tcp
Between file serving nodes and NFS clients (user network): 2049/tcp, 2049/udp
NFS: 111/tcp, 111/udp
RPC: 875/tcp, 875/udp
quota: 32803/tcp
lockmanager: 32769/udp
lockmanager: 892/tcp, 892/udp
mount daemon: 662/tcp, 662/udp
stat: 2020/tcp, 2020/udp
stat outgoing (reserved for use by a custom application (CMU); can be disabled if not used): 4000:4003/tcp
Between file serving nodes and SMB clients (user network): 137/udp, 138/udp, 139/tcp, 445/tcp
Between file serving nodes and StoreAll clients (user network): 9000:9002/tcp, 9000:9200/udp
Between file serving nodes and FTP clients (user network): 20/tcp, 20/udp, 21/tcp, 21/udp
Between GUI and clients that need to access the GUI: 7777/tcp, 8080/tcp
Dataprotector: 5555/tcp, 5555/udp
Internet Printing Protocol (IPP): 631/tcp, 631/udp
ICAP: 1344/tcp, 1344/udp
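On the Red Hat Enterprise Linux releases used on the file serving nodes, one common way to fix the NFS-related ports is to set them in /etc/sysconfig/nfs. The following is a sketch only; the variable names are the standard Red Hat ones, and each <port> is a placeholder that must match the exceptions in the port list above and your firewall rules:
STATD_PORT=<port>
LOCKD_TCPPORT=<port>
LOCKD_UDPPORT=<port>
MOUNTD_PORT=<port>
RQUOTAD_PORT=<port>
After editing the file, restart the NFS services on that node:
service nfslock restart
service nfs restart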
Configuring NTP servers
When the cluster is initially set up, primary and secondary NTP servers are configured to provide time synchronization with an external time source. The list of NTP servers is stored in the Fusion Manager configuration. The active Fusion Manager node synchronizes its time with the external source. The other file serving nodes synchronize their time with the active Fusion Manager node. In the absence of an external time source, the local hardware clock on the agile Fusion Manager node is used as the time source. This configuration method ensures that the time is synchronized on all cluster nodes, even in the absence of an external time source.
On StoreAll clients, the time is not synchronized with the cluster nodes. You will need to configure NTP servers on StoreAll clients.
List the currently configured NTP servers:
ibrix_clusterconfig -i -N
Specify a new list of NTP servers:
ibrix_clusterconfig -c -N SERVER1[,...,SERVERn]
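For example, to replace the current list with two hypothetical NTP servers:
ibrix_clusterconfig -c -N 10.1.1.10,10.1.1.11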
Configuring HP Insight Remote Support on StoreAll systems
IMPORTANT: In the StoreAll software 6.1 release, the default port for the StoreAll SNMP agent
changed from 5061 to 161. This port number cannot be changed.
NOTE: Configuring Phone Home enables the hp-snmp-agents service internally. As a result, a
large number of error messages, such as the following, could occasionally appear in
/var/log/hp-snmp-agents/cma.log:
Feb 08 13:05:54 x946s1 cmahostd[25579]: cmahostd: Can't update OS filesys object: /ifs1 (PEER3023)
The cmahostd daemon is part of the hp-snmp-agents service. This error message occurs because the file system exceeds <n> TB. If this occurs, HP recommends that before you perform operations such as unmounting a file system or stopping services on a file serving node (using the
ibrix_server command), you disable the hp-snmp-agents service on each server first:
service hp-snmp-agents stop
After remounting the file system or restarting services on the file serving node, restart the hp-snmp-agents service on each server:
service hp-snmp-agents start
Prerequisites
The required components for supporting StoreAll systems are preinstalled on the file serving nodes. You must install HP Insight Remote Support on a separate Windows system termed the Central Management Server (CMS):
HP Systems Insight Manager (HP SIM). This software manages HP systems and is the easiest and least
expensive way to maximize system uptime and health.
Insight Remote Support Advanced (IRSA). This version is integrated with HP Systems Insight
Manager (SIM). It provides comprehensive remote monitoring, notification/advisories, dispatch, and proactive service support. IRSA and HP SIM together are referred to as the CMS.
The Phone Home configuration does not support backup or standby NICs that are used for
NIC failover. If backup NICs are currently configured, remove the backup NICs from all nodes before configuring Phone Home. After a successful Phone Home configuration, you can reconfigure the backup NICs.
The following versions of the software are supported.
HP SIM 6.3 and IRSA 5.6
HP SIM 7.1 and IRSA 5.7
IMPORTANT: Keep in mind the following:
For each file serving node, add the physical user network interfaces (by entering the
ibrix_nic command or selecting the Server > NICs tab in the GUI) so the interfaces can communicate with HP SIM.
Ensure that all user network interfaces on each file serving node can communicate with the
CMS.
IMPORTANT: Insight Remote Support Standard (IRSS) is not supported with StoreAll software
6.1 and later.
For product descriptions and information about downloading the software, see the HP Insight Remote Support Software web page:
http://www.hp.com/go/insightremotesupport
For information about HP SIM:
http://www.hp.com/products/systeminsightmanager
For IRSA documentation:
http://www.hp.com/go/insightremoteadvanced-docs
IMPORTANT: You must compile and manually register the StoreAll MIB file by using HP Systems
Insight Manager:
1. Download ibrixMib.txt from /usr/local/ibrix/doc/.
2. Rename the file to ibrixMib.mib.
3. In HP Systems Insight Manager, complete the following steps:
a. Unregister the existing MIB by entering the following command:
<BASE>\mibs>mxmib -d ibrixMib.mib
b. Copy the ibrixMib.mib file to the <BASE>\mibs directory, and then enter the following
commands:
<BASE>\mibs>mcompile ibrixMib.mib
<BASE>\mibs>mxmib -a ibrixMib.cfg
For more information about the MIB, see the "Compiling and customizing MIBs" chapter in the HP Systems Insight Manager User Guide, which is available at:
http://www.hp.com/go/insightmanagement/sim/
Click Support & Documents and then click Manuals. Navigate to the user guide.
Limitations
Note the following:
For StoreAll systems, the HP Insight Remote Support implementation is limited to hardware
events.
The X9720 CX storage device is not supported for HP Insight Remote Support.
Configuring the StoreAll cluster for Insight Remote Support
To enable X9720/9730 systems for remote support, you need to configure the Virtual SAS Manager, Virtual Connect Manager, and Phone Home settings. All nodes in the cluster should be up when you perform this step.
NOTE: Configuring Phone Home removes any previous StoreAll snmp configuration details and
populates the SNMP configuration with Phone Home configuration details. When Phone Home is enabled, you cannot use ibrix_snmpagent to edit or change the snmp agent configuration. However, you can use ibrix_snmptrap to add trapsink IPs and you can use ibrix_event to associate events to the trapsink IPs.
Registering Onboard Administrator
The Onboard Administrator is registered automatically.
Configuring the Virtual SAS Manager
On 9730 systems, the SNMP service is disabled by default on the SAS switches. To enable the SNMP service manually and provide the trapsink IP on all SAS switches, complete these steps:
1. Open the Virtual SAS Manager from the OA. Select OA IP > Interconnect Bays > SAS Switch > Management Console.
2. On the Virtual SAS Manager, open the Maintain tab, click SAS Blade Switch, and select SNMP Settings. On the dialog box, enable the SNMP service and supply the information needed for alerts.
Configuring the Virtual Connect Manager
To configure the Virtual Connect Manager on an X9720/9730 system, complete the following steps:
1. From the Onboard Administrator, select OA IP > Interconnect Bays > HP VC Flex-10 > Management Console.
2. On the HP Virtual Connect Manager, open the SNMP Configuration tab.
3. Configure the SNMP Trap Destination:
Enter the Destination Name and IP Address (the CMS IP).
Select SNMPv1 as the SNMP Trap Format.
Specify public as the Community String.
4. Select all trap categories, VCM traps, and trap severities.
Configuring Phone Home settings
To configure Phone Home on the GUI, select Cluster Configuration in the upper Navigator and then select Phone Home in the lower Navigator. The Phone Home Setup panel shows the current configuration.
Click Enable to configure the settings on the Phone Home Settings dialog box. Skip the Software Entitlement ID field; it is not currently used.
The time required to enable Phone Home depends on the number of devices in the cluster, with larger clusters requiring more time.
To configure Phone Home settings from the CLI, use the following command:
ibrix_phonehome -c -i <IP Address of the Central Management Server> -P Country Name [-z Software Entitlement ID] [-r Read Community] [-w Write Community] [-t System Contact] [-n System Name] [-o System Location]
For example:
ibrix_phonehome -c -i 99.2.4.75 -P US -r public -w private -t Admin -n SYS01.US -o Colorado
Next, configure Insight Remote Support for the version of HP SIM you are using:
HP SIM 7.1 and IRS 5.7. See “Configuring Insight Remote Support for HP SIM 7.1 and IRS
5.7” (page 41).
HP SIM 6.3 and IRS 5.6. See “Configuring Insight Remote Support for HP SIM 6.3 and IRS
5.6” (page 44).
Configuring Insight Remote Support for HP SIM 7.1 and IRS 5.7
To configure Insight Remote Support, complete these steps:
1. Configure Entitlements for the servers and chassis in your system.
2. Discover devices on HP SIM.
Configuring Entitlements for servers and chassis
Expand Phone Home in the lower Navigator. When you select Chassis or Servers, the GUI displays the current Entitlements for that type of device. The following example shows Entitlements for the servers in the cluster.
To configure Entitlements, select a device and click Modify to open the dialog box for that type of device. The following example shows the Server Entitlement dialog box. The customer-entered serial number and product number are used for warranty checks at HP Support.
Use the following commands to entitle devices from the CLI. The commands must be run for each device present in the cluster.
Entitle a server:
ibrix_phonehome -e -h <Host Name> -b <Customer Entered Serial Number>
-g <Customer Entered Product Number>
Enter the Host Name parameter exactly as it is listed by the ibrix_fm -l command.
Entitle a chassis:
ibrix_phonehome -e -C <OA IP Address of the Chassis> -b <Customer Entered Serial Number> -g <Customer Entered Product Number>
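For example, a sketch using hypothetical values (the host name node1, OA IP 99.2.4.80, and the serial and product numbers are placeholders; substitute the values for your own hardware):
ibrix_phonehome -e -h node1 -b SGH123XYZ1 -g AB123A
ibrix_phonehome -e -C 99.2.4.80 -b SGH456XYZ2 -g AB456A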
NOTE: The Phone Home > Storage selection on the GUI does not apply to X9720/9730 systems.
Discovering devices on HP SIM
HP Systems Insight Manager (SIM) uses the SNMP protocol to discover and identify StoreAll systems automatically. On HP SIM, open Options > Discovery > New. Select Discover a group of systems, and then enter the discovery name and the Fusion Manager IP address on the New Discovery dialog box.
Enter the read community string on the Credentials > SNMP tab. This string should match the Phone Home read community string. If the strings are not identical, the Fusion Manager IP might be discovered as “Unknown.”
Devices are discovered as described in the following table.
Device               Discovered as
Fusion Manager IP    System Type: Fusion Manager
                     System Subtype: 9000
                     Product Model: HP 9000 Solution
File serving nodes   System Type: Storage Device
                     System Subtype: 9000, Storage, HP ProLiant
                     Product Model: HP X9720 NetStor FSN (ProLiant BL460 G6) or HP 9730 NetStor FSN (ProLiant BL460 G7)
The following example shows discovered devices on HP SIM 7.1.
File serving nodes and the OA IP are associated with the Fusion Manager IP address. In HP SIM, select Fusion Manager and open the Systems tab. Then select Associations to view the devices.
You can view all StoreAll devices under Systems by Type > Storage System > Scalable Storage Solutions > All X9000 Systems
Configuring Insight Remote Support for HP SIM 6.3 and IRS 5.6
Discovering devices in HP SIM
HP Systems Insight Manager (SIM) uses the SNMP protocol to discover and identify StoreAll systems automatically. On HP SIM, open Options > Discovery > New, and then select Discover a group of systems. On the New Discovery dialog box, enter the discovery name and the IP addresses of the devices to be monitored. For more information, see the HP SIM 6.3 documentation.
NOTE: Each device in the cluster should be discovered separately.
Enter the read community string on the Credentials > SNMP tab. This string should match the Phone Home read community string. If the strings are not identical, the device will be discovered as “Unknown.”
The following example shows discovered devices on HP SIM 6.3. File serving nodes are discovered as ProLiant servers.
Configuring device Entitlements
Configure the CMS software to enable remote support for StoreAll systems. For more information, see "Using the Remote Support Setting Tab to Update Your Client and CMS Information” and “Adding Individual Managed Systems” in the HP Insight Remote Support Advanced A.05.50 Operations Guide.
Enter the following custom field settings in HP SIM:
Custom field settings for X9720/9730 Onboard Administrator
The Onboard Administrator (OA) is discovered with OA IP addresses. When the OA is discovered, edit the system properties on the HP Systems Insight Manager. Locate the Entitlement Information section of the Contract and Warranty Information page and update the following:
Enter the StoreAll enclosure product number as the Customer-Entered product number
Enter X9000 as the Custom Delivery ID
Select the System Country Code
Enter the appropriate Customer Contact and Site Information details
Contract and Warranty Information
Under Entitlement Information, specify the Customer-Entered serial number, Customer-Entered product number, System Country code, and Custom Delivery ID.
Verifying device entitlements
To verify the entitlement information in HP SIM, complete the following steps:
1. Go to Remote Support Configuration and Services and select the Entitlement tab.
2. Check the devices discovered.
NOTE: If the system discovered on HP SIM does not appear on the Entitlement tab, click
Synchronize RSE.
3. Select Entitle Checked from the Action List.
4. Click Run Action.
5. When the entitlement check is complete, click Refresh.
NOTE: If the system discovered on HP SIM does not appear on the Entitlement tab, click
Synchronize RSE.
The devices you entitled should be displayed as green in the ENT column on the Remote Support System List dialog box.
If a device is red, verify that the customer-entered serial number and part number are correct and then rediscover the devices.
Testing the Insight Remote Support configuration
To determine whether the traps are working properly, send a generic test trap with the following command:
snmptrap -v1 -c public <CMS IP> .1.3.6.1.4.1.232 <Managed System IP> 6 11003 1234 .1.3.6.1.2.1.1.5.0 s test .1.3.6.1.4.1.232.11.2.11.1.0 i 0 .1.3.6.1.4.1.232.11.2.8.1.0 s "IBRIX remote support testing"
For example, if the CMS IP address is 99.2.2.2 and the StoreAll node is 99.2.2.10, enter the following:
snmptrap -v1 -c public 99.2.2.2 .1.3.6.1.4.1.232 99.2.2.10 6 11003 1234 .1.3.6.1.2.1.1.5.0 s test .1.3.6.1.4.1.232.11.2.11.1.0 i 0 .1.3.6.1.4.1.232.11.2.8.1.0 s "IBRIX remote support testing"
Updating the Phone Home configuration
The Phone Home configuration should be synchronized after you add or remove devices in the cluster. The operation enables Phone Home on newly added devices (servers, storage, and chassis) and removes details for devices that are no longer in the cluster. On the GUI, select Cluster Configuration in the upper Navigator, select Phone Home in the lower Navigator, and click Rescan on the Phone Home Setup panel.
On the CLI, run the following command:
ibrix_phonehome -s
Disabling Phone Home
When Phone Home is disabled, all Phone Home information is removed from the cluster and hardware and software are no longer monitored. To disable Phone Home on the GUI, click Disable on the Phone Home Setup panel. On the CLI, run the following command:
ibrix_phonehome -d
Troubleshooting Insight Remote Support
Devices are not discovered on HP SIM
Verify that cluster networks and devices can access the CMS. Devices will not be discovered properly if they cannot access the CMS.
The maximum number of SNMP trap hosts has already been configured
If this error is reported, the maximum number of trapsink IP addresses have already been configured. For OA devices, the maximum number of trapsink IP addresses is 8. Manually remove a trapsink IP address from the device and then rerun the Phone Home configuration to allow Phone Home to add the CMS IP address as a trapsink IP address.
A cluster node was not configured in Phone Home
If a cluster node was down during the Phone Home configuration, the log file will include the following message:
SEVERE: Sent event server.status.down: Server <server name> down
When the node is up, rescan Phone Home to add the node to the configuration. See “Updating
the Phone Home configuration” (page 47).
Fusion Manager IP is discovered as “Unknown”
Verify that the read community string entered in HP SIM matches the Phone Home read community string.
Also run snmpwalk on the VIF IP and verify the information:
# snmpwalk -v 1 -c <read community string> <FM VIF IP> .1.3.6.1.4.1.18997
Critical failures occur when discovering X9720 OA
The 3Gb SAS switches have internal IPs in the range 169.x.x.x, which cannot be reached from HP SIM. These switches will not be monitored; however, other OA components are monitored.
Discovered device is reported as unknown on CMS
Run the following command on the file serving node to determine whether the Insight Remote Support services are running:
# service snmpd status
# service hpsmhd status
# service hp-snmp-agents status
If the services are not running, start them:
# service snmpd start
# service hpsmhd start
# service hp-snmp-agents start
Alerts are not reaching the CMS
If nodes are configured and the system is discovered properly but alerts are not reaching the CMS, verify that a trapif entry exists in the cma.conf configuration file on the file serving nodes.
Device Entitlement tab does not show GREEN
If the Entitlement tab does not show GREEN, verify the Customer-Entered serial number and part number for the device.
SIM Discovery
On SIM discovery, use the option Discover a Group of Systems for any device discovery.
4 Configuring virtual interfaces for client access
StoreAll software uses a cluster network interface to carry Fusion Manager traffic and traffic between file serving nodes. This network is configured as bond0 when the cluster is installed. To provide failover support for the Fusion Manager, a virtual interface is created for the cluster network interface.
Although the cluster network interface can carry traffic between file serving nodes and clients, HP recommends that you configure one or more user network interfaces for this purpose.
To provide high availability for a user network, you should configure a bonded virtual interface (VIF) for the network and then set up failover for the VIF. This method prevents interruptions to client traffic. If necessary, the file serving node hosting the VIF can fail over to its backup server, and clients can continue to access the file system through the backup server.
StoreAll systems also support the use of VLAN tagging on the cluster and user networks. See
“Configuring VLAN tagging” (page 52) for an example.
Network and VIF guidelines
To provide high availability, the user interfaces used for client access should be configured as bonded virtual interfaces (VIFs). Note the following:
Nodes needing to communicate for file system coverage or for failover must be on the same
network interface. Also, nodes set up as a failover pair must be connected to the same network interface.
Use a Gigabit Ethernet port (or faster) for user networks.
NFS, SMB, FTP, and HTTP clients can use the same user VIF. The servers providing the VIF
should be configured in backup pairs, and the NICs on those servers should also be configured for failover. See “Configuring High Availability on the cluster” in the administrator guide for information about performing this configuration from the GUI.
For Linux and Windows StoreAll clients, the servers hosting the VIF should be configured in
backup pairs. However, StoreAll clients do not support backup NICs. Instead, StoreAll clients should connect to the parent bond of the user VIF or to a different VIF.
Ensure that your parent bonds, for example bond0, have a defined route:
1. Check for the default Linux OS route/gateway for each parent interface/bond that was
defined during the HP StoreAll installation by entering the following command at the command prompt:
# route
Review the command output. The default destination is the default gateway/route for the Linux operating system. This default route was defined for the operating system during the HP StoreAll installation, but a corresponding route was not defined for StoreAll.
2. Display network interfaces controlled by StoreAll by entering the following command at
the command prompt:
# ibrix_nic -l
Note whether the ROUTE column is unpopulated for the IFNAME.
3. To assign the IFNAME a default route for the parent cluster bond and the user VIFs assigned to FSNs for use with SMB/NFS, enter the following ibrix_nic command at the command prompt (see the example after this list):
# ibrix_nic -r -n IFNAME -h HOSTNAME -A -R <ROUTE_IP>
4. Configure backup monitoring, as described in “Configuring backup servers” (page 50).
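The following is a sketch of the ibrix_nic -r command from step 3, assuming a hypothetical host node1 whose parent bond is bond0 and whose gateway is 16.123.200.1:
# ibrix_nic -r -n bond0 -h node1 -A -R 16.123.200.1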
Creating a bonded VIF
NOTE: The examples in this chapter use the unified network and create a bonded VIF on bond0.
If your cluster uses a different network layout, create the bonded VIF on a user network bond such as bond1.
Use the following procedure to create a bonded VIF (bond0:1 in this example):
1. If high availability (automated failover) is configured on the servers, disable it. Run the following command on the Fusion Manager:
# ibrix_server -m -U
2. Identify the bond0:1 VIF:
# ibrix_nic -a -n bond0:1 -h node1,node2,node3,node4
3. Assign an IP address to the bond0:1 VIFs on each node. In the command, -I specifies the IP address, -M specifies the netmask, and -B specifies the broadcast address:
# ibrix_nic -c -n bond0:1 -h node1 -I 16.123.200.201 -M 255.255.255.0 -B 16.123.200.255
# ibrix_nic -c -n bond0:1 -h node2 -I 16.123.200.202 -M 255.255.255.0 -B 16.123.200.255
# ibrix_nic -c -n bond0:1 -h node3 -I 16.123.200.203 -M 255.255.255.0 -B 16.123.200.255
# ibrix_nic -c -n bond0:1 -h node4 -I 16.123.200.204 -M 255.255.255.0 -B 16.123.200.255
Configuring backup servers
The servers in the cluster are configured in backup pairs. If this step was not done when your cluster was installed, assign backup servers for the bond0:1 interface. In the following example, node1 is the backup for node2, node2 is the backup for node1, node3 is the backup for node4, and node4 is the backup for node3.
1. Add the VIF:
# ibrix_nic -a -n bond0:2 -h node1,node2,node3,node4
2. Set up a backup server for each VIF:
# ibrix_nic -b -H node1/bond0:1,node2/bond0:2
# ibrix_nic -b -H node2/bond0:1,node1/bond0:2
# ibrix_nic -b -H node3/bond0:1,node4/bond0:2
# ibrix_nic -b -H node4/bond0:1,node3/bond0:2
Configuring NIC failover
NIC monitoring should be configured on VIFs that will be used by NFS, SMB, FTP, or HTTP.
IMPORTANT: When configuring NIC monitoring, use the same backup pairs that you used when
configuring standby servers.
For example:
# ibrix_nic -m -h node1 -A node2/bond0:1
# ibrix_nic -m -h node2 -A node1/bond0:1
# ibrix_nic -m -h node3 -A node4/bond0:1
# ibrix_nic -m -h node4 -A node3/bond0:1
Configuring automated failover
To enable automated failover for your file serving nodes, execute the following command:
ibrix_server -m [-h SERVERNAME]
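For example, to enable automated failover on all file serving nodes:
ibrix_server -m
To enable it only on a hypothetical node named node1:
ibrix_server -m -h node1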
Example configuration
This example uses two nodes, ib50-81 and ib50-82. These nodes are backups for each other, forming a backup pair.
[root@ib50-80 ~]# ibrix_server -l
Segment Servers
===============
SERVER_NAME BACKUP  STATE HA ID                                   GROUP
----------- ------- ----- -- ------------------------------------ -------
ib50-81     ib50-82 Up    on 132cf61a-d25b-40f8-890e-e97363ae0d0b servers
ib50-82     ib50-81 Up    on 7d258451-4455-484d-bf80-75c94d17121d servers
All VIFs on ib50-81 have backup (standby) VIFs on ib50-82. Similarly, all VIFs on ib50-82 have backup (standby) VIFs on ib50-81. NFS, SMB, FTP, and HTTP clients can connect to bond0:1 on either host. If necessary, the selected server will fail over to bond0:2 on the opposite host. StoreAll clients could connect to bond1 on either host, as these clients do not support or require NIC failover. (The preceding sample output shows only the relevant fields.)
Specifying VIFs in the client configuration
When you configure your clients, you may need to specify the VIF that should be used for client access.
NFS/SMB. Specify the VIF IP address of the servers (for example, bond0:1) to establish connection. You can also configure DNS round robin to ensure NFS or SMB client-to-server distribution. In both cases, the NFS/SMB clients will cache the initial IP they used to connect to the respective share, usually until the next reboot.
FTP. When you add an FTP share on the Add FTP Shares dialog box or with the ibrix_ftpshare command, specify the VIF as the IP address that clients should use to access the share.
HTTP. When you create a virtual host on the Create Vhost dialog box or with the ibrix_httpvhost command, specify the VIF as the IP address that clients should use to access shares associated with the Vhost.
StoreAll clients. Use the following command to prefer the appropriate user network. Execute the command once for each destination host that the client should contact using the specified interface.
ibrix_client -n -h SRCHOST -A DESTHOST/IFNAME
For example:
ibrix_client -n -h client12.mycompany.com -A ib50-81.mycompany.com/bond1
NOTE: Because the backup NIC cannot be used as a preferred network interface for StoreAll
clients, add one or more user network interfaces to ensure that HA and client communication work together.
Configuring VLAN tagging
VLAN capabilities provide hardware support for running multiple logical networks over the same physical networking hardware. To allow multiple packets for different VLANs to traverse the same physical interface, each packet must have a field added that contains the VLAN tag. The tag is a small integer number that identifies the VLAN to which the packet belongs. When an intermediate switch receives a “tagged” packet, it can make the appropriate forwarding decisions based on the value of the tag.
When set up properly, StoreAll systems support VLAN tags being transferred all of the way to the file serving node network interfaces. The ability of file serving nodes to handle the VLAN tags natively in this manner makes it possible for the nodes to support multiple VLAN connections simultaneously over a single bonded interface.
Linux networking tools such as ifconfig display a network interface with an associated VLAN tag using a device label with the form bond#.<VLAN_id>. For example, if the first bond created by StoreAll has a VLAN tag of 30, it will be labeled bond0.30.
It is also possible to add a VIF on top of an interface that has an associated VLAN tag. In this case, the device label of the interface takes the form bond#.<VLAN_id>:<VIF_label>. For example, if a VIF with a label of 2 is added for the bond0.30 interface, the new interface device label will be bond0.30:2.
The following commands show how to configure a bonded VIF and backup nodes for a unified network topology using the 10.10.x.y subnet. VLAN tagging is configured for hosts ib142-129 and ib142-131 on the 51 subnet.
Add the bond0.51 interface with the VLAN tag:
# ibrix_nic -a -n bond0.51 -h ib142-129
# ibrix_nic -a -n bond0.51 -h ib142-131
Assign an IP address to the bond0.51 VIFs on each node:
# ibrix_nic -c -n bond0.51 -h ib142-129 -I 192.168.51.101 -M 255.255.255.0
# ibrix_nic -c -n bond0.51 -h ib142-131 -I 192.168.51.102 -M 255.255.255.0
Add the bond0.51:2 VIF on top of the interface:
# ibrix_nic -a -n bond0.51:2 -h ib142-131
# ibrix_nic -a -n bond0.51:2 -h ib142-129
Configure backup nodes:
# ibrix_nic -b -H ib142-129/bond0.51,ib142-131/bond0.51:2
# ibrix_nic -b -H ib142-131/bond0.51,ib142-129/bond0.51:2
Create the user FM VIF:
ibrix_fm -c 192.168.51.125 -d bond0.51:1 -n 255.255.255.0 -v user
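As an optional verification step, you can list the configured interfaces and confirm that the tagged interfaces, VIFs, and backup assignments appear as expected:
# ibrix_nic -l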
For more information about VLAN tagging, see the HP StoreAll Storage Network Best Practices Guide.
Support for link state monitoring
Do not configure link state monitoring for user network interfaces or VIFs that will be used for SMB or NFS. Link state monitoring is supported only for use with iSCSI storage network interfaces, such as those provided with 9300 Gateway systems.
5 Configuring failover
This chapter describes how to configure failover for agile management consoles, file serving nodes, network interfaces, and HBAs.
Agile management consoles
The agile Fusion Manager maintains the cluster configuration and provides graphical and command-line user interfaces for managing and monitoring the cluster. The agile Fusion Manager is installed on all file serving nodes when the cluster is installed. The Fusion Manager is active on one node, and is passive on the other nodes. This is called an agile Fusion Manager configuration.
Agile Fusion Manager modes
An agile Fusion Manager can be in one of the following modes:
active. In this mode, the Fusion Manager controls console operations. All cluster administration
and configuration commands must be run from the active Fusion Manager.
passive. In this mode, the Fusion Manager monitors the health of the active Fusion Manager.
If the active Fusion Manager fails, a passive Fusion Manager is selected to become the active console.
nofmfailover. In this mode, the Fusion Manager does not participate in console operations.
Use this mode for operations such as manual failover of the active Fusion Manager, StoreAll software upgrades, and server blade replacements.
Changing the mode
Use the following command to move a Fusion Manager to passive or nofmfailover mode:
ibrix_fm -m passive | nofmfailover [-P] [-A | -h <FMLIST>]
If the Fusion Manager was previously the active console, StoreAll software will select a new active console. A Fusion Manager currently in active mode can be moved to either passive or nofmfailover mode. A Fusion Manager in nofmfailover mode can be moved only to passive mode.
With the exception of the local node running the active Fusion Manager, the -A option moves all instances of the Fusion Manager to the specified mode. The -h option moves the Fusion Manager instances in <FMLIST> to the specified mode.
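For example, to move all Fusion Manager instances except the local active one to passive mode:
ibrix_fm -m passive -A
To move only the instance on a hypothetical node named fm2 to nofmfailover mode:
ibrix_fm -m nofmfailover -h fm2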
Agile Fusion Manager and failover
Using an agile Fusion Manager configuration provides high availability for Fusion Manager services. If the active Fusion Manager fails, the cluster virtual interface will go down. When the passive Fusion Manager detects that the cluster virtual interface is down, it will become the active console. This Fusion Manager rebuilds the cluster virtual interface, starts Fusion Manager services locally, transitions into active mode, and takes over Fusion Manager operation.
Failover of the active Fusion Manager affects the following features:
User networks. The virtual interface used by clients will also fail over. Users may notice a brief
reconnect while the newly active Fusion Manager takes over management of the virtual interface.
GUI. You must reconnect to the Fusion Manager VIF after the failover.
Failing over the Fusion Manager manually
To fail over the active Fusion Manager manually, place the console into nofmfailover mode. Enter the following command on the node hosting the console:
ibrix_fm -m nofmfailover
The command takes effect immediately. The failed-over Fusion Manager remains in nofmfailover mode until it is moved to passive mode
using the following command:
ibrix_fm -m passive
NOTE: A Fusion Manager cannot be moved from nofmfailover mode to active mode.
Viewing information about Fusion Managers
To view mode information, use the following command:
ibrix_fm -i
NOTE: If the Fusion Manager was not installed in an agile configuration, the output will report
FusionServer: fusion manager name not set! (active, quorum is not configured).
When a Fusion Manager is installed, it is registered in the Fusion Manager configuration. To view a list of all registered management consoles, use the following command:
ibrix_fm -l
Configuring High Availability on the cluster
StoreAll High Availability provides monitoring for servers, NICs, and HBAs.
Server HA. Servers are configured in backup pairs, with each server in the pair acting as a backup for the other server. The servers in the backup pair must see the same storage. When a server is failed over, the ownership of its segments and its Fusion Manager services (if the server is hosting the active FM) move to the backup server.
NIC HA. When server HA is enabled, NIC HA provides additional triggers that cause a server to fail over to its backup server. For example, you can create a user VIF such as bond0:2 to service SMB requests on a server and then designate the backup server as a standby NIC for bond0:2. If an issue occurs with bond0:2 on a server, the server, including its segment ownership and FM services, will fail over to the backup server, and that server will now handle SMB requests going through bond0:2. You can also fail over just the NIC to its standby NIC on the backup server.
HBA monitoring. This method protects server access to storage through an HBA. Most servers ship with an HBA that has two controllers, providing redundancy by design. Setting up StoreAll HBA monitoring is not commonly used for these servers. However, if a server has only a single HBA, you might want to monitor the HBA; then, if the server cannot see its storage because the single HBA goes offline or faults, the server and its segments will fail over.
You can set up automatic server failover and perform a manual failover if needed. If a server fails over, you must fail back the server manually.
When automatic HA is enabled, the Fusion Manager listens for heartbeat messages that the servers broadcast at one-minute intervals. The Fusion Manager initiates a server failover when it fails to receive five consecutive heartbeats. Failover conditions are detected more quickly when NIC HA is also enabled; server failover is initiated when the Fusion Manager receives a heartbeat message indicating that a monitored NIC might be down and the Fusion Manager cannot reach that NIC. If HBA monitoring is enabled, the Fusion Manager fails over the server when a heartbeat message indicates that a monitored HBA or pair of HBAs has failed.
What happens during a failover
The following actions occur when a server is failed over to its backup:
1. The Fusion Manager verifies that the backup server is powered on and accessible.
2. The Fusion Manager migrates ownership of the server’s segments to the backup and notifies all servers and StoreAll clients about the migration. This is a persistent change. If the failed-over server is hosting the active FM, the active Fusion Manager role transitions to another server.
3. If NIC monitoring is configured, the Fusion Manager activates the standby NIC and transfers
the IP address (or VIF) to it.
Clients that were mounted on the failed-over server may experience a short service interruption while server failover takes place. Depending on the protocol in use, clients can continue operations after the failover or may need to remount the file system using the same VIF. In either case, clients will be unaware that they are now accessing the file system on a different server.
To determine the progress of a failover, view the Status tab on the GUI or execute the ibrix_server -l command. While the Fusion Manager is migrating segment ownership, the operational status of the node is Up-InFailover or Down-InFailover, depending on whether the node was powered up or down when failover was initiated. When failover is complete, the operational status changes to Up-FailedOver or Down-FailedOver. For more information about operational states, see “Monitoring the status of file serving nodes” (page 102).
Both automated and manual failovers trigger an event that is reported on the GUI. Automated failover can be configured with the HA Wizard or from the command line.
Configuring automated failover with the HA Wizard
IMPORTANT: On the X9720 platform, the ibrixpwr iLO user must be created on each node
before the cluster HA can be fully functional. Enter the following command on each cluster node to create an iLO user with the username ibrixpwr and with the password hpinvent:
ibrix_ilo -c -u ibrixpwr -p hpinvent
The HA wizard configures a backup server pair and, optionally, standby NICs on each server in the pair. It also configures a power source such as an iLO on each server. The Fusion Manager uses the power source to power down the server during a failover.
On the GUI, select Servers from the Navigator.
Click High Availability to start the wizard. Typically, backup servers are configured and server HA is enabled when your system is installed, and the Server HA Pair dialog box shows the backup pair configuration for the server selected on the Servers panel.
If necessary, you can configure the backup pair for the server. The wizard identifies the servers in the cluster that see the same storage as the selected server. Choose the appropriate server from the list.
The wizard also attempts to locate the IP addresses of the iLOs on each server. If it cannot locate an IP address, you will need to enter the address on the dialog box. When you have completed the information, click Enable HA Monitoring and Auto-Failover for both servers.
Use the NIC HA Setup dialog box to configure NICs that will be used for data services such as SMB or NFS. You can also designate NIC HA pairs on the server and its backup and enable monitoring of these NICs.
For example, you can create a user VIF that clients will use to access an SMB share serviced by server ib69s1. The user VIF is based on an active physical network on that server. To do this, click Add NIC in the section of the dialog box for ib69s1.
On the Add NIC dialog box, enter a NIC name. In our example, the cluster uses the unified network and has only bond0, the active cluster FM/IP. We cannot use bond0:0, which is the management IP/VIF. We will create the VIF bond0:1, using bond0 as the base. When you click OK, the user VIF is created.
The new, active user NIC appears on the NIC HA setup dialog box.
Next, enable NIC monitoring on the VIF. Select the new user NIC and click NIC HA. On the NIC HA Config dialog box, check Enable NIC Monitoring.
In the Standby NIC field, select New Standby NIC to create the standby on backup server ib69s2. The standby you specify must be available and valid. To keep the organization simple, we specified bond0:1 as the Name; this matches the name assigned to the NIC on server ib69s1. When you click OK, the NIC HA configuration is complete.
You can create additional user VIFs and assign standby NICs as needed. For example, you might want to add a user VIF for another share on server ib69s2 and assign a standby NIC on server ib69s1. You can also specify a physical interface such as eth4 and create a standby NIC on the backup server for it.
The NICs panel on the GUI shows the NICs on the selected server. In the following example, there are four NICs on server ib69s1: bond0, the active cluster FM/IP; bond0:0, the management IP/VIF (this server is hosting the active FM); bond0:1, the NIC created in this example; and bond0:2, a standby NIC for an active NIC on server ib69s2.
The NICs panel for ib69s2, the backup server, shows that bond0:1 is an inactive, standby NIC and bond0:2 is an active NIC.
Changing the HA configuration
To change the configuration of a NIC, select the server on the Servers panel, and then select NICs from the lower Navigator. Click Modify on the NICs panel. The General tab on the Modify NIC Properties dialog box allows you to change the IP address and other NIC properties. The NIC HA tab allows you to enable or disable HA monitoring and failover on the NIC and to change or remove the standby NIC.
To view the power source for a server, select the server on the Servers panel, and then select Power from the lower Navigator. The Power Source panel shows the power source configured on the server when HA was configured. You can add or remove power sources on the server, and can power the server on or off, or reset the server.
Configuring automated failover manually
To configure automated failover manually, complete these steps:
1. Configure file serving nodes in backup pairs.
2. Identify power sources for the servers in the backup pair.
3. Configure NIC monitoring.
4. Enable automated failover.
1. Configure server backup pairs
File serving nodes are configured in backup pairs, where each server in a pair is the backup for the other. This step is typically done when the cluster is installed. The following restrictions apply:
The same file system must be mounted on both servers in the pair and the servers must see the
same storage.
In a SAN environment, a server and its backup must use the same storage infrastructure to
access a segment’s physical volumes (for example, a multiported RAID array).
For a cluster using the unified network configuration, assign backup nodes for the bond0:1 interface. For example, node1 is the backup for node2, and node2 is the backup for node1.
1. Add the VIF:
ibrix_nic -a -n bond0:2 -h node1,node2,node3,node4
2. Set up a standby server for each VIF:
# ibrix_nic -b -H node1/bond0:1,node2/bond0:2
# ibrix_nic -b -H node2/bond0:1,node1/bond0:2
# ibrix_nic -b -H node3/bond0:1,node4/bond0:2
# ibrix_nic -b -H node4/bond0:1,node3/bond0:2
2. Identify power sources
To implement automated failover, perform a forced manual failover, or remotely power a file serving node up or down, you must set up programmable power sources for the nodes and their backups. Using programmable power sources prevents a “split-brain scenario” between a failing file serving node and its backup, allowing the failing server to be centrally powered down by the Fusion Manager in the case of automated failover, and manually in the case of a forced manual failover.
StoreAll software works with iLO, IPMI, OpenIPMI, and OpenIPMI2 integrated power sources. The following configuration steps are required when setting up integrated power sources:
For automated failover, ensure that the Fusion Manager has LAN access to the power sources.
Install the environment and any drivers and utilities, as specified by the vendor documentation.
If you plan to protect access to the power sources, set up the UID and password to be used.
Use the following command to identify a power source:
ibrix_powersrc -a -t {ipmi|openipmi|openipmi2|ilo} -h HOSTNAME -I IPADDR
-u USERNAME -p PASSWORD
For example, to identify an iLO power source at IP address 192.168.3.170 for node ss01:
ibrix_powersrc -a -t ilo -h ss01 -I 192.168.3.170 -u Administrator -p password
3. Configure NIC monitoring
NIC monitoring should be configured on user VIFs that will be used by NFS, SMB, FTP, or HTTP.
IMPORTANT: When configuring NIC monitoring, use the same backup pairs that you used when
configuring backup servers.
Identify the servers in a backup pair as NIC monitors for each other. Because the monitoring must be declared in both directions, enter a separate command for each server in the pair.
ibrix_nic -m -h MONHOST -A DESTHOST/IFNAME
The following example sets up monitoring for NICs over bond0:1:
ibrix_nic -m -h node1 -A node2/bond0:1
ibrix_nic -m -h node2 -A node1/bond0:1
ibrix_nic -m -h node3 -A node4/bond0:1
ibrix_nic -m -h node4 -A node3/bond0:1
The next example sets up server s2.hp.com to monitor server s1.hp.com over user network interface
eth1:
ibrix_nic -m -h s2.hp.com -A s1.hp.com/eth1
4. Enable automated failover
Automated failover is turned off by default. When automated failover is turned on, the Fusion Manager starts monitoring heartbeat messages from file serving nodes. You can turn automated failover on and off for all file serving nodes or for selected nodes.
Turn on automated failover:
ibrix_server -m [-h SERVERNAME]
Changing the HA configuration manually
Update a power source:
If you change the IP address or password for a power source, you must update the configuration database with the changes. The user name and password options are needed only for remotely managed power sources. Include the -s option to have the Fusion Manager skip BMC.
ibrix_powersrc -m [-I IPADDR] [-u USERNAME] [-p PASSWORD] [-s] -h POWERSRCLIST
The following command changes the IP address for power source ps1:
ibrix_powersrc -m -I 192.168.3.153 -h ps1
Disassociate a server from a power source:
You can dissociate a file serving node from a power source by dissociating it from slot 1 (its default association) on the power source. Use the following command:
ibrix_hostpower -d -s POWERSOURCE -h HOSTNAME
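For example, to dissociate a hypothetical node s1.hp.com from power source ps1:
ibrix_hostpower -d -s ps1 -h s1.hp.com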
Delete a power source:
To conserve storage, delete power sources that are no longer in use. If you are deleting multiple power sources, use commas to separate them.
ibrix_powersrc -d -h POWERSRCLIST
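For example, to delete two hypothetical power sources named ps1 and ps2:
ibrix_powersrc -d -h ps1,ps2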
Delete NIC monitoring:
To delete NIC monitoring, use the following command:
ibrix_nic -m -h MONHOST -D DESTHOST/IFNAME
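For example, to remove the monitoring that the earlier example configured for node1 over bond0:1 on node2:
ibrix_nic -m -h node1 -D node2/bond0:1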
Delete NIC standbys:
To delete a standby for a NIC, use the following command:
ibrix_nic -b -U HOSTNAME1/IFNAME1
For example, to delete the standby that was assigned to interface eth2 on file serving node s1.hp.com:
ibrix_nic -b -U s1.hp.com/eth2
Turn off automated failover:
ibrix_server -m -U [-h SERVERNAME]
To specify a single file serving node, include the -h SERVERNAME option.
Failing a server over manually
The server to be failed over must belong to a backup pair. The server can be powered down or remain up during the procedure. You can perform a manual failover at any time, regardless of whether automated failover is in effect. Manual failover does not require the use of a programmable power supply. However, if you have identified a power supply for the server, you can power it down before the failover.
Use the GUI or the CLI to fail over a file serving node:
On the GUI, select the node on the Servers panel and then click Failover on the Summary
panel.
On the CLI, run ibrix_server -f, specifying the node to be failed over as the HOSTNAME.
If appropriate, include the -p option to power down the node before segments are migrated:
ibrix_server -f [-p] -h HOSTNAME
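For example, to power down a hypothetical node node2 and fail it over to its backup:
ibrix_server -f -p -h node2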
Check the Summary panel or run the following command to determine whether the failover was successful:
ibrix_server -l
The STATE field indicates the status of the failover. If the field persistently shows Down-InFailover or Up-InFailover, the failover did not complete; contact HP Support for assistance. For information about the values that can appear in the STATE field, see “What happens during a
failover” (page 55).
Failing back a server
After an automated or manual failover of a server, you must manually fail back the server, which restores ownership of the failed-over segments and network interfaces to the server. Before failing back the server, confirm that it can see all of its storage resources and networks. The segments owned by the server will not be accessible if the server cannot see its storage.
To fail back a node from the GUI, select the node on the Servers panel and then click Failback on the Summary panel.
To fail back a node from the CLI, run the following command, where HOSTNAME is the failed-over node:
ibrix_server -f -U -h HOSTNAME
After failing back the node, check the Summary panel or run the ibrix_server -l command to determine whether the failback completed fully. If the failback is not complete, contact HP Support.
NOTE: A failback might not succeed if the time period between the failover and the failback is
too short, and the primary server has not fully recovered. HP recommends ensuring that both servers are up and running and then waiting 60 seconds before starting the failback. Use the ibrix_server -l command to verify that the primary server is up and running. The status should be Up-FailedOver before performing the failback.
Setting up HBA monitoring
You can configure High Availability to initiate automated failover upon detection of a failed HBA. HBA monitoring can be set up for either dual-port HBAs with built-in standby switching or single-port HBAs, whether standalone or paired for standby switching via software. The StoreAll software does not play a role in vendor- or software-mediated HBA failover; traffic moves to the remaining functional port with no Fusion Manager involvement.
HBAs use worldwide names for some parameter values. These are either worldwide node names (WWNN) or worldwide port names (WWPN). The WWPN is the name an HBA presents when logging in to a SAN fabric. Worldwide names consist of 16 hexadecimal digits grouped in pairs. In StoreAll software, these are written as dot-separated pairs (for example,
21.00.00.e0.8b.05.05.04). To set up HBA monitoring, first discover the HBAs, and then perform the procedure that matches
your HBA hardware:
For single-port HBAs without built-in standby switching: Turn on HBA monitoring for all ports
that you want to monitor for failure.
For dual-port HBAs with built-in standby switching and single-port HBAs that have been set
up as standby pairs in a software operation: Identify the standby pairs of ports to the configuration database and then turn on HBA monitoring for all paired ports. If monitoring is turned on for just one port in a standby pair and that port fails, the Fusion Manager will fail over the server even though the HBA has automatically switched traffic to the surviving port. When monitoring is turned on for both ports, the Fusion Manager initiates failover only when both ports in a pair fail.
When both HBA monitoring and automated failover for file serving nodes are configured, the Fusion Manager will fail over a server in two situations:
Both ports in a monitored set of standby-paired ports fail. Because all standby pairs were
identified in the configuration database, the Fusion Manager knows that failover is required only when both ports fail.
A monitored single-port HBA fails. Because no standby has been identified for the failed port,
the Fusion Manager knows to initiate failover immediately.
Discovering HBAs
You must discover HBAs before you set up HBA monitoring, when you replace an HBA, and when you add a new HBA to the cluster. Discovery adds the WWPN for the port to the configuration database.
ibrix_hba -a [-h HOSTLIST]
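For example, to discover the HBAs on two hypothetical nodes:
ibrix_hba -a -h s1.hp.com,s2.hp.com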
Adding standby-paired HBA ports
Identifying standby-paired HBA ports to the configuration database allows the Fusion Manager to apply the following logic when they fail:
If one port in a pair fails, do nothing. Traffic will automatically switch to the surviving port, as
configured by the HBA vendor or the software.
If both ports in a pair fail, fail over the server’s segments to the standby server.
Use the following command to identify two HBA ports as a standby pair:
ibrix_hba -b -P WWPN1:WWPN2 -h HOSTNAME
Enter each WWPN as dot-separated pairs of hexadecimal digits. The following command identifies port 20.00.12.34.56.78.9a.bc as the standby for port 42.00.12.34.56.78.9a.bc for the HBA on file serving node s1.hp.com:
ibrix_hba -b -P 20.00.12.34.56.78.9a.bc:42.00.12.34.56.78.9a.bc -h s1.hp.com
Turning HBA monitoring on or off
If your cluster uses single-port HBAs, turn on monitoring for all of the ports to set up automated failover in the event of HBA failure. Use the following command:
ibrix_hba -m -h HOSTNAME -p PORT
For example, to turn on HBA monitoring for port 20.00.12.34.56.78.9a.bc on node s1.hp.com:
ibrix_hba -m -h s1.hp.com -p 20.00.12.34.56.78.9a.bc
To turn off HBA monitoring for an HBA port, include the -U option:
ibrix_hba -m -U -h HOSTNAME -p PORT
Deleting standby port pairings
Deleting port pairing information from the configuration database does not remove the standby pairing of the ports. The standby pairing is either built in by the HBA vendor or implemented by software.
To delete standby-paired HBA ports from the configuration database, enter the following command:
ibrix_hba -b -U -P WWPN1:WWPN2 -h HOSTNAME
For example, to delete the pairing of ports 20.00.12.34.56.78.9a.bc and
42.00.12.34.56.78.9a.bc on node s1.hp.com:
ibrix_hba -b -U -P 20.00.12.34.56.78.9a.bc:42.00.12.34.56.78.9a.bc
-h s1.hp.com
Deleting HBAs from the configuration database
Before switching an HBA to a different machine, delete the HBA from the configuration database:
ibrix_hba -d -h HOSTNAME -w WWNN
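For example, to delete an HBA with a hypothetical WWNN from node s1.hp.com:
ibrix_hba -d -h s1.hp.com -w 20.00.00.e0.8b.05.05.04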
Displaying HBA information
Use the following command to view information about the HBAs in the cluster. To view information for all hosts, omit the -h HOSTLIST argument.
ibrix_hba -l [-h HOSTLIST]
The output includes the following fields:
Field            Description
Host             Server on which the HBA is installed.
Node WWN         This HBA’s WWNN.
Port WWN         This HBA’s WWPN.
Port State       Operational state of the port.
Backup Port WWN  WWPN of the standby port for this port (standby-paired HBAs only).
Monitoring       Whether HBA monitoring is enabled for this port.
Checking the High Availability configuration
Use the ibrix_haconfig command to determine whether High Availability features have been configured for specific file serving nodes. The command checks for the following features and provides either a summary or a detailed report of the results:
Programmable power source
Standby server or standby segments
Cluster and user network interface monitors
Standby network interface for each user network interface
HBA port monitoring
Status of automated failover (on or off)
For each High Availability feature, the summary report returns status for each tested file serving node and optionally for their standbys:
Passed. The feature has been configured.
Warning. The feature has not been configured, but the significance of the finding is not clear.
For example, the absence of discovered HBAs can indicate either that the HBA monitoring feature was not configured or that HBAs are not physically present on the tested servers.
Failed. The feature has not been configured.
The detailed report includes an overall result status for all tested file serving nodes and describes details about the checks performed on each High Availability feature. By default, the report includes details only about checks that received a Failed or a Warning result. You can expand the report to include details about checks that received a Passed result.
Viewing a summary report
Use the ibrix_haconfig -l command to see a summary of all file serving nodes. To check specific file serving nodes, include the -h HOSTLIST argument. To check standbys, include the
-b argument. To view results only for file serving nodes that failed a check, include the -f argument.
ibrix_haconfig -l [-h HOSTLIST] [-f] [-b]
For example, to view a summary report for file serving nodes xs01.hp.com and xs02.hp.com:
ibrix_haconfig -l -h xs01.hp.com,xs02.hp.com
Host         HA Configuration  Power Sources  Backup Servers  Auto Failover  Nics Monitored  Standby Nics  HBAs Monitored
xs01.hp.com  FAILED            PASSED         PASSED          PASSED         FAILED          PASSED        FAILED
xs02.hp.com  FAILED            PASSED         FAILED          FAILED         FAILED          WARNED        WARNED
Viewing a detailed report
Execute the ibrix_haconfig -i command to view the detailed report:
ibrix_haconfig -i [-h HOSTLIST] [-f] [-b] [-s] [-v]
The -h HOSTLIST option lists the nodes to check. To also check standbys, include the -b option. To view results only for file serving nodes that failed a check, include the -f argument. The -s option expands the report to include information about the file system and its segments. The -v option produces detailed information about configuration checks that received a Passed result.
For example, to view a detailed report for file serving node xs01.hp.com:
ibrix_haconfig -i -h xs01.hp.com
--------------- Overall HA Configuration Checker Results ---------------
FAILED

--------------- Overall Host Results ---------------
Host         HA Configuration  Power Sources  Backup Servers  Auto Failover  Nics Monitored  Standby Nics  HBAs Monitored
xs01.hp.com  FAILED            PASSED         PASSED          PASSED         FAILED          PASSED        FAILED

--------------- Server xs01.hp.com FAILED Report ---------------
Check Description                                 Result Result Information
================================================  ====== ==================
Power source(s) configured                        PASSED
Backup server or backups for segments configured  PASSED
Automatic server failover configured              PASSED
Cluster & User Nics monitored
  Cluster nic xs01.hp.com/eth1 monitored          FAILED Not monitored
User nics configured with a standby nic           PASSED
HBA ports monitored
  Hba port 21.01.00.e0.8b.2a.0d.6d monitored      FAILED Not monitored
  Hba port 21.00.00.e0.8b.0a.0d.6d monitored      FAILED Not monitored
Capturing a core dump from a failed node
The crash capture feature collects a core dump from a failed node when the Fusion Manager initiates failover of the node. You can use the core dump to analyze the root cause of the node failure. When enabled, crash capture is supported for both automated and manual failover. Failback is not affected by this feature. By default, crash capture is disabled. This section provides the prerequisites and steps for enabling crash capture.
NOTE: Enabling crash capture adds a delay (up to 240 seconds) to the failover to allow the
crash kernel to load. The failover process ensures that the crash kernel is loaded before continuing.
When crash capture is enabled, the system takes the following actions when a node fails:
1. The Fusion Manager triggers a core dump on the failed node when failover starts, changing the state of the node to Up, InFailover.
2. The failed node boots into the crash kernel. The state of the node changes to Dumping, InFailover.
3. The failed node continues with the failover, changing state to Dumping, FailedOver.
4. After the core dump is created, the failed node reboots and its state changes to Up, FailedOver.
IMPORTANT: Complete the steps in “Prerequisites for setting up the crash capture” (page 68)
before setting up the crash capture.
Prerequisites for setting up the crash capture
The following parameters must be configured in the ROM-based setup utility (RBSU) before a crash can be captured automatically on a failed file serving node:
1. Start RBSU: reboot the server, and then press the F9 key.
2. Highlight the System Options option in the main menu, and then press the Enter key. Highlight the Virtual Serial Port option, and then press the Enter key. Select the COM1 port, and then press the Enter key.
3. Highlight the BIOS Serial Console & EMS option in the main menu, and then press the Enter key. Highlight the BIOS Serial Console Port option, and then press the Enter key. Select the COM1 port, and then press the Enter key.
4. Highlight the BIOS Serial Console Baud Rate option, and then press the Enter key. Select the 115200 serial baud rate.
5. Highlight the Server Availability option in the main menu, and then press the Enter key. Highlight the ASR Timeout option, and then press the Enter key. Select 30 Minutes, and then press the Enter key.
6. To exit RBSU, press Esc until the main menu is displayed. Then, at the main menu, press F10. The server automatically restarts.
Setting up nodes for crash capture
IMPORTANT: Complete the steps in “Prerequisites for setting up the crash capture” (page 68)
before starting the steps in this section.
To set up nodes for crash capture, complete the following steps:
1. Enable crash capture. Run the following command (see the example after these steps):
ibrix_host_tune -S { -h HOSTLIST | -g GROUPLIST } -o trigger_crash_on_failover=1
2. Tune Fusion Manager to set the DUMPING status timeout by entering the following command:
ibrix_fm_tune -S -o dumpingStatusTimeout=240
This command is required to delay the failover until the crash kernel is loaded; otherwise, Fusion Manager will bring down the failed node.
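As a sketch of step 1, assuming two hypothetical hosts named node1 and node2:
ibrix_host_tune -S -h node1,node2 -o trigger_crash_on_failover=1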
6 Configuring cluster event notification
Cluster events
There are three categories for cluster events:
Alerts. Disruptive events that can result in loss of access to file system data.
Warnings. Potentially disruptive conditions where file system access is not lost, but if the situation is not
addressed, it can escalate to an alert condition.
Information. Normal events that change the cluster.
The following table lists examples of events included in each category.
Name                    Trigger Point                                                    Event Type
login.failure           User fails to log into GUI                                      ALERT
filesystem.unmounted    File system is unmounted
server.status.down      File serving node is down/restarted
server.unreachable      File serving node terminated unexpectedly
segment.migrated        User migrates segment using GUI                                  WARN
login.success           User successfully logs in to GUI                                 INFO
filesystem.cmd          File system is created
server.deregistered     File serving node is deleted
nic.added               NIC is added using GUI
nic.removed             NIC is removed using GUI
physicalvolume.added    Physical storage is discovered and added using management console
physicalvolume.deleted  Physical storage is deleted using management console
You can be notified of cluster events by email or SNMP traps. To view the list of supported events, use the command ibrix_event -q.
Setting up email notification of cluster events
You can set up event notifications by event type or for one or more specific events. To set up automatic email notification of cluster events, associate the events with email recipients and then configure email settings to initiate the notification process.
Associating events and email addresses
You can associate any combination of cluster events with email addresses: all Alert, Warning, or Info events, all events of one type plus a subset of another type, or a subset of all types.
The notification threshold for Alert events is 90% of capacity. Threshold-triggered notifications are sent when a monitored system resource exceeds the threshold and are reset when the resource
utilization dips 10% below the threshold. For example, a notification is sent the first time usage reaches 90% or more. The next notice is sent only if the usage declines to 80% or less (event is reset), and subsequently rises again to 90% or above.
To associate all types of events with recipients, omit the -e argument in the following command:
ibrix_event -c [-e ALERT|WARN|INFO|EVENTLIST] -m EMAILLIST
Use the ALERT, WARN, and INFO keywords to make specific type associations or use EVENTLIST to associate specific events.
The following command associates all types of events to admin@hp.com:
ibrix_event -c -m admin@hp.com
The next command associates all Alert events and two Info events to admin@hp.com:
ibrix_event -c -e ALERT,server.registered,filesystem.space.full
-m admin@hp.com
Configuring email notification settings
To configure email notification settings, specify the SMTP server and header information and turn the notification process on or off.
ibrix_event -m on|off -s SMTP -f from [-r reply-to] [-t subject]
The server must be able to receive and send email and must recognize the From and Reply-to addresses. Be sure to specify valid email addresses, especially for the SMTP server. If an address is not valid, the SMTP server will reject the email.
The following command configures email settings to use the mail.hp.com SMTP server and turns on notifications:
ibrix_event -m on -s mail.hp.com -f FM@hp.com -r MIS@hp.com -t Cluster1 Notification
NOTE: The state of the email notification process has no effect on the display of cluster events
in the GUI.
Dissociating events and email addresses
To remove the association between events and email addresses, use the following command:
ibrix_event -d [-e ALERT|WARN|INFO|EVENTLIST] -m EMAILLIST
For example, to dissociate event notifications for admin@hp.com:
ibrix_event -d -m admin@hp.com
To turn off all Alert notifications for admin@hp.com:
ibrix_event -d -e ALERT -m admin@hp.com
To turn off the server.registered and filesystem.created notifications for admin1@hp.com and admin2@hp.com:
ibrix_event -d -e server.registered,filesystem.created -m admin1@hp.com,admin2@hp.com
Testing email addresses
To test an email address with a test message, notifications must be turned on. If the address is valid, the command signals success and sends an email containing the settings to the recipient. If the address is not valid, the command returns an address failed exception.
ibrix_event -u -n EMAILADDRESS
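For example, to send a test message to admin@hp.com:
ibrix_event -u -n admin@hp.com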
Viewing email notification settings
The ibrix_event -L command provides comprehensive information about email settings and configured notifications.
ibrix_event -L
Email Notification : Enabled
SMTP Server        : mail.hp.com
From               : FM@hp.com
Reply To           : MIS@hp.com

EVENT                                 LEVEL  TYPE   DESTINATION
------------------------------------- -----  -----  ------------
asyncrep.completed                    ALERT  EMAIL  admin@hp.com
asyncrep.failed                       ALERT  EMAIL  admin@hp.com
Setting up SNMP notifications
StoreAll software supports SNMP (Simple Network Management Protocol) V1, V2, and V3. Whereas SNMPv2 security was enforced by use of community password strings, V3 introduces the User-based Security Model (USM) and the View-based Access Control Model (VACM). Discussion of these models is beyond the scope of this document. Refer to RFCs 3414 and 3415 at http://www.ietf.org for more information. Note the following:
In the SNMPV3 environment, every message contains a user name. The function of the USM
is to authenticate users and ensure message privacy through message encryption and decryption. Both authentication and privacy, and their passwords, are optional and will use default settings where security is less of a concern.
With users validated, the VACM determines which managed objects these users are allowed
to access. The VACM includes an access scheme to control user access to managed objects; context matching to define which objects can be accessed; and MIB views, defined by subsets of IOD subtree and associated bitmask entries, which define what a particular user can access in the MIB.
Steps for setting up SNMP include:
Agent configuration (all SNMP versions)
Trapsink configuration (all SNMP versions)
Associating event notifications with trapsinks (all SNMP versions)
View definition (V3 only)
Group and user configuration (V3 only)
StoreAll software implements an SNMP agent that supports the private StoreAll software MIB. The agent can be polled and can send SNMP traps to configured trapsinks.
Setting up SNMP notifications is similar to setting up email notifications. You must associate events to trapsinks and configure SNMP settings for each trapsink to enable the agent to send a trap when an event occurs.
NOTE: When Phone Home is enabled, you cannot edit or change the configuration of the StoreAll
SNMP agent with the ibrix_snmpagent. However, you can add trapsink IPs with ibrix_snmptrap and can associate events to the trapsink IP with ibrix_event.
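As a roadmap, the following hedged sequence strings together the commands described in the remaining subsections to set up SNMPv3 notifications end to end. All host, view, group, and user names and passwords are illustrative only; in particular, the trapsink user (trapsender) must already be defined on the trap receiver, and passwords must contain at least eight characters.
# turn on the SNMPv3 agent (the engine ID defaults to the agent host name)
ibrix_snmpagent -u -v 3 -n agenthost.domain.com -o DevLab-B3-U6 -s on
# register a v3 trapsink and send all Alert events to it
ibrix_snmptrap -c -h lab13-114 -v 3 -n trapsender -k auth-passwd -z priv-passwd
ibrix_event -c -y SNMP -e ALERT -m lab13-114
# expose the StoreAll private MIB in a view and grant a group read access to it
ibrix_snmpview -a -v hp -o .1.3.6.1.4.1.18997 -m .1.1.1.1.1.1.1
ibrix_snmpgroup -c -g group2 -s authNoPriv -r hp
# create an agent-side user in that group for the management station
ibrix_snmpuser -c -n user3 -g group2 -k auth-passwd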
72 Configuring cluster event notification
Page 73
Configuring the SNMP agent
The SNMP agent is created automatically when the Fusion Manager is installed. It is initially configured as an SNMPv2 agent and is off by default.
Some SNMP parameters and the SNMP default port are the same, regardless of SNMP version. The default agent port is 161. SYSCONTACT, SYSNAME, and SYSLOCATION are optional MIB-II agent parameters that have no default values.
NOTE: The default SNMP agent port was changed from 5061 to 161 in the StoreAll 6.1 release.
This port number cannot be changed.
The -c and -s options are also common to all SNMP versions. The -c option turns the encryption of community names and passwords on or off. There is no encryption by default. Using the -s option toggles the agent on and off; it turns the agent on by starting a listener on the SNMP port, and turns it off by shutting off the listener. The default is off.
The format for a v1 or v2 update command follows:
ibrix_snmpagent -u -v {1|2} [-p PORT] [-r READCOMMUNITY] [-w WRITECOMMUNITY] [-t SYSCONTACT] [-n SYSNAME] [-o SYSLOCATION] [-c {yes|no}] [-s {on|off}]
The update command for SNMPv1 and v2 uses optional community names. By convention, the default READCOMMUNITY name used for read-only access and assigned to the agent is public. No default WRITECOMMUNITY name is set for read-write access (although the name private is often used).
The following command updates a v2 agent with the write community name private, the agent’s system name, and that system’s physical location:
ibrix_snmpagent -u -v 2 -w private -n agenthost.domain.com -o DevLab-B3-U6
The SNMPv3 format adds an optional engine id that overrides the default value of the agent’s host name. The format also provides the -y and -z options, which determine whether a v3 agent can process v1/v2 read and write requests from the management station. The format is:
ibrix_snmpagent -u -v 3 [-e engineId] [-p PORT] [-r READCOMMUNITY] [-w WRITECOMMUNITY] [-t SYSCONTACT] [-n SYSNAME] [-o SYSLOCATION] [-y {yes|no}] [-z {yes|no}] [-c {yes|no}] [-s {on|off}]
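For example, the following hedged command updates a v3 agent so that it accepts v1/v2 read requests but rejects v1/v2 write requests (the system name and location values are illustrative):
ibrix_snmpagent -u -v 3 -n agenthost.domain.com -o DevLab-B3-U6 -y yes -z no -s on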
Configuring trapsink settings
A trapsink is the host destination where agents send traps, which are asynchronous notifications sent by the agent to the management station. A trapsink is specified either by name or IP address. StoreAll software supports multiple trapsinks; you can define any number of trapsinks of any SNMP version, but you can define only one trapsink per host, regardless of the version.
At a minimum, trapsink configuration requires a destination host and SNMP version. All other parameters are optional and many assume the default value if no value is specified.
The format for creating a v1/v2 trapsink is:
ibrix_snmptrap -c -h HOSTNAME -v {1|2} [-p PORT] [-m COMMUNITY] [-s {on|off}]
If a port is not specified, the command defaults to port 162. If a community is not specified, the command defaults to the community name public. The -s option toggles agent trap transmission on and off. The default is on. For example, to create a v2 trapsink with a new community name, enter:
ibrix_snmptrap -c -h lab13-116 -v 2 -m private
For a v3 trapsink, additional options define security settings. USERNAME is a v3 user defined on the trapsink host and is required. The security level associated with the trap message depends on which passwords are specified—the authentication password, both the authentication and privacy passwords, or no passwords. The CONTEXT_NAME is required if the trap receiver has defined subsets of managed objects. The format is:
Setting up SNMP notifications 73
Page 74
ibrix_snmptrap -c -h HOSTNAME -v 3 [-p PORT] -n USERNAME [-j {MD5|SHA}] [-k AUTHORIZATION_PASSWORD] [-y {DES|AES}] [-z PRIVACY_PASSWORD] [-x CONTEXT_NAME] [-s {on|off}]
The following command creates a v3 trapsink with a named user and specifies the passwords to be applied to the default algorithms. If specified, passwords must contain at least eight characters.
ibrix_snmptrap -c -h lab13-114 -v 3 -n trapsender -k auth-passwd -z priv-passwd
Associating events and trapsinks
Associating events with trapsinks is similar to associating events with email recipients, except that you specify the host name or IP address of the trapsink instead of an email address.
Use the ibrix_event command to associate SNMP events with trapsinks. The format is:
ibrix_event -c -y SNMP [-e ALERT|INFO|EVENTLIST] -m TRAPSINK
For example, to associate all Alert events and two Info events with a trapsink at IP address
192.168.2.32, enter:
ibrix_event -c -y SNMP -e ALERT,server.registered,filesystem.created -m 192.168.2.32
Use the ibrix_event -d command to dissociate events and trapsinks:
ibrix_event -d -y SNMP [-e ALERT|INFO|EVENTLIST] -m TRAPSINK
Defining views
A MIB view is a collection of paired OID subtrees and associated bitmasks that identify which subidentifiers are significant to the view’s definition. Using the bitmasks, individual OID subtrees can be included in or excluded from the view.
An instance of a managed object belongs to a view if:
The OID of the instance has at least as many sub-identifiers as the OID subtree in the view.
Each sub-identifier in the instance and the subtree match when the bitmask of the corresponding
sub-identifier is nonzero.
The Fusion Manager automatically creates the excludeAll view, which blocks access to all OIDs. This view cannot be deleted; it is the default read and write view if none is specified for a group with the ibrix_snmpgroup command. Its catch-all OID and mask are:
OID = .1 Mask = .1
Consider these example OID subtree and mask pairs:
OID = .1.3.6.1.4.1.18997 Mask = .1.1.1.1.1.1.1
OID = .1.3.6.1.2.1 Mask = .1.1.0.1.0.1
For the second pair, instance .1.3.6.1.2.1.1 matches, instance .1.3.6.1.4.1 matches (the differing fifth sub-identifier is masked out), and instance .1.2.6.1.2.1 does not match.
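The matching rule can also be expressed as a short shell sketch. This is for illustration only and is not part of the StoreAll CLI; it simply shows how a subtree and bitmask decide view membership.
# Illustration only: succeeds (returns 0) if INSTANCE belongs to the view entry
# defined by SUBTREE and MASK, following the rule described above.
oid_in_view() {                      # usage: oid_in_view .SUBTREE .MASK .INSTANCE
    local IFS=.
    local -a sub=(${1#.}) mask=(${2#.}) inst=(${3#.})
    (( ${#inst[@]} >= ${#sub[@]} )) || return 1    # instance must be at least as long
    local i
    for ((i = 0; i < ${#sub[@]}; i++)); do
        # compare a sub-identifier only when its mask entry is nonzero
        if [[ ${mask[i]:-1} != 0 && ${inst[i]} != "${sub[i]}" ]]; then
            return 1
        fi
    done
    return 0
}
oid_in_view .1.3.6.1.2.1 .1.1.0.1.0.1 .1.3.6.1.2.1.1 && echo "matches"
oid_in_view .1.3.6.1.2.1 .1.1.0.1.0.1 .1.2.6.1.2.1 || echo "does not match"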
To add a pairing of an OID subtree value and a mask value to a new or existing view, use the following format:
ibrix_snmpview -a -v VIEWNAME [-t {include|exclude}] -o OID_SUBTREE [-m MASK_BITS]
The subtree is added in the named view. For example, to add the StoreAll software private MIB to the view named hp, enter:
ibrix_snmpview -a -v hp -o .1.3.6.1.4.1.18997 -m .1.1.1.1.1.1.1
74 Configuring cluster event notification
Page 75
Configuring groups and users
A group defines the access control policy on managed objects for one or more users. All users must belong to a group. Groups and users exist only in SNMPv3. Groups are assigned a security level, which enforces use of authentication and privacy, and specific read and write views to identify which managed objects group members can read and write.
The command to create a group assigns its SNMPv3 security level, read and write views, and context name. A context is a collection of managed objects that can be accessed by an SNMP entity. A related option, -m, determines how the context is matched. The format follows:
ibrix_snmpgroup -c -g GROUPNAME [-s {noAuthNoPriv|authNoPriv|authPriv}] [-r READVIEW] [-w WRITEVIEW]
For example, to create the group group2 to require authorization, no encryption, and read access to the hp view, enter:
ibrix_snmpgroup -c -g group2 -s authNoPriv -r hp
The format to create a user and add that user to a group follows:
ibrix_snmpuser -c -n USERNAME -g GROUPNAME [-j {MD5|SHA}] [-k AUTHORIZATION_PASSWORD] [-y {DES|AES}] [-z PRIVACY_PASSWORD]
Authentication and privacy settings are optional. An authentication password is required if the group has a security level of either authNoPriv or authPriv. The privacy password is required if the group has a security level of authPriv. If unspecified, MD5 is used as the authentication algorithm and DES as the privacy algorithm, with no passwords assigned.
For example, to create user3, add that user to group2, and specify an authorization password for authorization and no encryption, enter:
ibrix_snmpuser -c -n user3 -g group2 -k auth-passwd -s authNoPriv
Deleting elements of the SNMP configuration
All SNMP commands use the same syntax for delete operations: the -d option indicates the object to be deleted. For example, the following command deletes a list of trapsink hosts:
ibrix_snmptrap -d -h lab15-12.domain.com,lab15-13.domain.com,lab15-14.domain.com
There are two restrictions on SNMP object deletions:
A view cannot be deleted if it is referenced by a group.
A group cannot be deleted if it is referenced by a user.
Listing SNMP configuration information
All SNMP commands employ the same syntax for list operations, using the -l flag. For example:
ibrix_snmpgroup -l
This command lists the defined group settings for all SNMP groups. Specifying an optional group name lists the defined settings for that group only.
Setting up SNMP notifications 75
Page 76
7 Configuring system backups
Backing up the Fusion Manager configuration
The Fusion Manager configuration is automatically backed up whenever the cluster configuration changes. The backup occurs on the node hosting the active Fusion Manager. The backup file is stored at <ibrixhome>/tmp/fmbackup.zip on that node.
The active Fusion Manager notifies the passive Fusion Manager when a new backup file is available. The passive Fusion Manager then copies the file to <ibrixhome>/tmp/fmbackup.zip on the node on which it is hosted. If a Fusion Manager is in maintenance mode, it will also be notified when a new backup file is created, and will retrieve it from the active Fusion Manager.
You can create an additional copy of the backup file at any time. Run the following command, which creates a fmbackup.zip file in the $IBRIXHOME/log directory:
$IBRIXHOME/bin/db_backup.sh
Once each day, a cron job rotates the $IBRIXHOME/log directory into the $IBRIXHOME/log/daily subdirectory. The cron job also creates a new backup of the Fusion Manager configuration in both $IBRIXHOME/tmp and $IBRIXHOME/log. To force a backup, use the following command:
ibrix_fm -B
IMPORTANT: You will need the backup file to recover from server failures or to undo unwanted
configuration changes. Whenever the cluster configuration changes, be sure to save a copy of fmbackup.zip in a safe, remote location such as a node on another cluster.
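For example, the following hedged sequence forces a backup and copies it off the cluster (the remote host and destination path are illustrative):
ibrix_fm -B
scp $IBRIXHOME/tmp/fmbackup.zip admin@backuphost.example.com:/backups/cluster1/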
Using NDMP backup applications
The NDMP backup feature can be used to back up and recover entire StoreAll software file systems or portions of a file system. You can use any supported NDMP backup application to perform the backup and recovery operations. (In NDMP terminology, the backup application is referred to as a Data Management Application, or DMA.) The DMA is run on a management station separate from the cluster and communicates with the cluster's file serving nodes over a configurable socket port.
The NDMP backup feature supports the following:
NDMP protocol versions 3 and 4
Two-way NDMP operations
Three-way NDMP operations between two network storage systems
Each file serving node functions as an NDMP Server and runs the NDMP Server daemon (ndmpd) process. When you start a backup or restore operation on the DMA, you can specify the node and tape device to be used for the operation.
Following are considerations for configuring and using the NDMP feature:
When configuring your system for NDMP operations, attach your tape devices to a SAN and
then verify that the file serving nodes to be used for backup/restore operations can see the appropriate devices.
When performing backup operations, take snapshots of your file systems and then back up
the snapshots.
When directory tree quotas are enabled, an NDMP restore to the original location fails if the hard quota limit is exceeded. The NDMP restore operation first creates a temporary file and then restores the file data to it. After this succeeds, the restore operation overwrites the existing file (if it is present in the same destination directory) with the temporary file. When the hard quota limit for the directory tree has been exceeded, NDMP cannot create the temporary file and the restore operation fails.
76 Configuring system backups
Page 77
Configuring NDMP parameters on the cluster
Certain NDMP parameters must be configured to enable communications between the DMA and the NDMP Servers in the cluster. To configure the parameters on the GUI, select Cluster Configuration from the Navigator, and then select NDMP Backup. The NDMP Configuration Summary shows the default values for the parameters. Click Modify to configure the parameters for your cluster on the Configure NDMP dialog box. See the online help for a description of each field.
Using NDMP backup applications 77
Page 78
To configure NDMP parameters from the CLI, use the following command:
ibrix_ndmpconfig -c [-d IP1,IP2,IP3,...] [-m MINPORT] [-x MAXPORT] [-n LISTENPORT] [-u USERNAME] [-p PASSWORD] [-e {0=disable,1=enable}] [-v {0-10}] [-w BYTES] [-z NUMSESSIONS]
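For example, the following hedged command registers a single DMA address and sets a port range and credentials (all values are illustrative and must match your DMA configuration):
ibrix_ndmpconfig -c -d 192.168.10.80 -m 1025 -x 2048 -n 10000 -u ndmpuser -p ndmppass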
NDMP process management
Normally all NDMP actions are controlled from the DMA. However, if the DMA cannot resolve a problem or you suspect that the DMA may have incorrect information about the NDMP environment, take the following actions from the GUI or CLI:
Cancel one or more NDMP sessions on a file serving node. Canceling a session stops all spawned session processes and frees their resources if necessary.
Reset the NDMP server on one or more file serving nodes. This step stops all spawned session
processes, stops the ndmpd and session monitor daemons, frees all resources held by NDMP, and restarts the daemons.
Viewing or canceling NDMP sessions
To view information about active NDMP sessions, select Cluster Configuration from the Navigator, and then select NDMP Backup > Active Sessions. For each session, the Active NDMP Sessions panel lists the host used for the session, the identifier generated by the backup application, the status of the session (backing up data, restoring data, or idle), the start time, and the IP address used by the DMA.
To cancel a session, select that session and click Cancel Session. Canceling a session kills all spawned session processes and frees their resources if necessary.
To see similar information for completed sessions, select NDMP Backup > Session History.
View active sessions from the CLI:
ibrix_ndmpsession -l
View completed sessions:
ibrix_ndmpsession -l -s [-t YYYY-MM-DD]
The -t option restricts the history to sessions occurring on or before the specified date.
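For example, to list sessions that completed on or before June 30, 2013:
ibrix_ndmpsession -l -s -t 2013-06-30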
Cancel sessions on a specific file serving node:
ibrix_ndmpsession -c SESSION1,SESSION2,SESSION3,... -h HOST
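For example, to cancel two sessions on file serving node ib1 (the session identifiers and host name are hypothetical):
ibrix_ndmpsession -c 1045,1046 -h ib1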
Starting, stopping, or restarting an NDMP Server
When a file serving node is booted, the NDMP Server is started automatically. If necessary, you can use the following command to start, stop, or restart the NDMP Server on one or more file serving nodes:
ibrix_server -s -t ndmp -c { start | stop | restart} [-h SERVERNAMES]
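For example, to restart the NDMP Server on two file serving nodes (host names are illustrative):
ibrix_server -s -t ndmp -c restart -h ib1,ib2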
78 Configuring system backups
Page 79
Viewing or rescanning tape and media changer devices
To view the tape and media changer devices currently configured for backups, select Cluster Configuration from the Navigator, and then select NDMP Backup > Tape Devices.
If you add a tape or media changer device to the SAN, click Rescan Device to update the list. If you remove a device and want to delete it from the list, reboot all of the servers to which the device is attached.
To view tape and media changer devices from the CLI, use the following command:
ibrix_tape -l
To rescan for devices, use the following command:
ibrix_tape -r
NDMP events
An NDMP Server can generate three types of events: INFO, WARN, and ALERT. These events are displayed on the GUI and can be viewed with the ibrix_event command.
INFO events. Identify when major NDMP operations start and finish, and also report progress. For example:
7012:Level 3 backup of /mnt/ibfs7 finished at Sat Nov 7 21:20:58 PST 2011
7013:Total Bytes = 38274665923, Average throughput = 236600391 bytes/sec.
WARN events. Indicate an issue with NDMP access, the environment, or NDMP operations. Be sure to review these events and take any necessary corrective actions. Following are some examples:
0000:Unauthorized NDMP Client 16.39.40.201 trying to connect
4002:User [joe] md5 mode login failed.
ALERT events. Indicate that an NDMP action has failed. For example:
1102:Cannot start the session_monitor daemon, ndmpd exiting.
7009:Level 6 backup of /mnt/shares/accounts1 failed (writing eod header error).
8001:Restore Failed to read data stream signature.
You can configure the system to send email or SNMP notifications when these types of events occur.
Using NDMP backup applications 79
Page 80
8 Creating host groups for StoreAll clients
A host group is a named set of StoreAll clients. Host groups provide a convenient way to centrally manage clients. You can put different sets of clients into host groups and then perform the following operations on all members of the group:
Create and delete mount points
Mount file systems
Prefer a network interface
Tune host parameters
Set allocation policies
Host groups are optional. If you do not choose to set them up, you can mount file systems on clients and tune host settings and allocation policies on an individual level.
How host groups work
In the simplest case, the host groups functionality allows you to perform an allowed operation on all StoreAll clients by executing a command on the default clients host group with the CLI or the GUI. The clients host group includes all StoreAll clients configured in the cluster.
NOTE: The command intention is stored on the Fusion Manager until the next time the clients
contact the Fusion Manager. (To force this contact, restart StoreAll software services on the clients, reboot the clients, or execute ibrix_lwmount -a or ibrix_lwhost --a.) When contacted, the Fusion Manager informs the clients about commands that were executed on host groups to which they belong. The clients then use this information to perform the operation.
You can also use host groups to perform different operations on different sets of clients. To do this, create a host group tree that includes the necessary host groups. You can then assign the clients manually, or the Fusion Manager can automatically perform the assignment when you register a StoreAll client, based on the client's cluster subnet. To use automatic assignment, create a domain rule that specifies the cluster subnet for the host group.
Creating a host group tree
The clients host group is the root element of the host group tree. Each host group in a tree can have only one parent, but a parent can have multiple children. In a host group tree, operations performed on lower-level nodes take precedence over operations performed on higher-level nodes. This means that you can effectively establish global client settings that you can override for specific clients.
For example, suppose that you want all clients to be able to mount file system ifs1 and to implement a set of host tunings denoted as Tuning 1, but you want to override these global settings for certain host groups. To do this, mount ifs1 on the clients host group, ifs2 on host group A, ifs3 on host group C, and ifs4 on host group D, in any order. Then, set Tuning 1 on the clients host group and Tuning 2 on host group B. The end result is that all clients in host group B will mount ifs1 and implement Tuning 2. The clients in host group A will mount ifs2 and implement Tuning 1. The clients in host groups C and D, respectively, will mount ifs3 and ifs4 and implement Tuning 1. The following diagram shows an example of these settings in a host group tree.
80 Creating host groups for StoreAll clients
Page 81
To create one level of host groups beneath the root, simply create the new host groups. You do not need to declare that the root node is the parent. To create lower levels of host groups, declare a parent element for each host group. Do not use a host name as a group name.
To create a host group tree using the CLI:
1. Create the first level of the tree:
ibrix_hostgroup -c -g GROUPNAME
2. Create all other levels by specifying a parent for the group:
ibrix_hostgroup -c -g GROUPNAME [-p PARENT]
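For example, the following hedged sequence builds a tree like the one in the earlier mounting and tuning example, assuming host groups A and B sit directly under the clients root and C and D are children of A:
ibrix_hostgroup -c -g A
ibrix_hostgroup -c -g B
ibrix_hostgroup -c -g C -p A
ibrix_hostgroup -c -g D -p A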
Adding a StoreAll client to a host group
You can add a StoreAll client to a host group or move a client to a different host group. All clients belong to the default clients host group.
To add or move a host to a host group, use the ibrix_hostgroup command as follows:
ibrix_hostgroup -m -g GROUP -h MEMBER
For example, to add the specified host to the finance group:
ibrix_hostgroup -m -g finance -h cl01.hp.com
Adding a domain rule to a host group
To configure automatic host group assignments, define a domain rule for host groups. A domain rule restricts host group membership to clients on a particular cluster subnet. When you register a client, the Fusion Manager performs a subnet match on the IP address that you specify for the client and sorts the client into the appropriate host group based on the domain rules.
Setting domain rules on host groups provides a convenient way to centrally manage mounting, tuning, allocation policies, and preferred networks on different subnets of clients. A domain rule is a subnet IP address that corresponds to a client network. Adding a domain rule to a host group restricts its members to StoreAll clients that are on the specified subnet. You can add a domain rule at any time.
To add a domain rule to a host group, use the ibrix_hostgroup command as follows:
ibrix_hostgroup -a -g GROUPNAME -D DOMAIN
Adding a StoreAll client to a host group 81
Page 82
For example, to add the domain rule 192.168 to the finance group:
ibrix_hostgroup -a -g finance -D 192.168
Viewing host groups
To view all host groups or a specific host group, use the following command:
ibrix_hostgroup -l [-g GROUP]
Deleting host groups
When you delete a host group, its members are reassigned to the parent of the deleted group. To force the reassigned StoreAll clients to implement the mounts, tunings, network interface
preferences, and allocation policies that have been set on their new host group, either restart StoreAll software services on the clients or execute the following commands locally:
ibrix_lwmount -a to force the client to pick up mounts or allocation policies
ibrix_lwhost --a to force the client to pick up host tunings
To delete a host group using the CLI:
ibrix_hostgroup -d -g GROUPNAME
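For example, to delete the finance host group created earlier and then have its reassigned clients pick up the settings of their new host group without a service restart (a hedged sketch; the second and third commands run locally on each affected client):
ibrix_hostgroup -d -g finance
ibrix_lwmount -a
ibrix_lwhost --a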
Other host group operations
Additional host group operations are described in the following locations:
Creating or deleting a mountpoint, and mounting or unmounting a file system (see “Creating
and mounting file systems” in the HP StoreAll Storage File System User Guide)
Changing host tuning parameters (see “Tuning file serving nodes and StoreAll clients”
(page 118))
Preferring a network interface (see “Preferring network interfaces” (page 131))
Setting allocation policy (see “Using file allocation” in the HP StoreAll Storage File System
User Guide)
82 Creating host groups for StoreAll clients
Page 83
9 Monitoring cluster operations
This chapter describes how to monitor the operational state of the cluster and how to monitor cluster health.
Monitoring X9720/9730 hardware
The GUI displays status, firmware versions, and device information for the servers, chassis, and system storage included in X9720 and 9730 systems. The Management Console displays a top-level status of the chassis, server, and storage hardware components. You can also drill down to view the status of individual chassis, server, and storage sub-components.
Monitoring servers
To view information about the servers and chassis included in your system:
1. Select Servers from the Navigator tree. The Servers panel lists the servers included in each chassis.
2. Select the server you want to obtain more information about. Information about the servers in the chassis is displayed in the right pane.
To view summary information for the selected server, select the Summary node in the lower Navigator tree.
Monitoring X9720/9730 hardware 83
Page 84
Select the server component that you want to view from the lower Navigator panel, such as NICs.
84 Monitoring cluster operations
Page 85
The following are the top-level options provided for the server:
NOTE: Information about the Hardware node can be found in “Monitoring hardware components”
(page 87).
HBAs. The HBAs panel displays the following information:
Node WWN
Port WWN
Backup
Monitoring
State
NICs. The NICs panel shows all NICs on the server, including offline NICs. The NICs panel
displays the following information:
Name
IP
Type
State
Monitoring X9720/9730 hardware 85
Page 86
Route
Standby Server
Standby Interface
Mountpoints. The Mountpoints panel displays the following information:
Mountpoint
Filesystem
Access
NFS. The NFS panel displays the following information:
Host
Path
Options
CIFS. The CIFS panel displays the following information:
NOTE: CIFS in the GUI has not been rebranded to SMB yet. CIFS is just a different name
for SMB.
Name
Value
Power. The Power panel displays the following information:
Host
Name
Type
IP Address
Slot ID
Events. The Events panel displays the following information:
Level
Time
Event
Hardware. The Hardware panel displays the following information:
The name of the hardware component.
The information gathered regarding that hardware component.
See “Monitoring hardware components” (page 87) for detailed information about the Hardware panel.
86 Monitoring cluster operations
Page 87
Monitoring hardware components
The front of the chassis includes server bays and the rear of the chassis includes components such as fans, power supplies, Onboard Administrator modules, and interconnect modules (VC modules and SAS switches). The following Onboard Administrator view shows a chassis enclosure on a StoreAll 9730 system.
To monitor these components from the GUI:
1. Click Servers from the upper Navigator tree.
2. Click Hardware from the lower Navigator tree for information about the chassis that contains the server selected on the Servers panel, as shown in the following image.
Monitoring X9720/9730 hardware 87
Page 88
Monitoring blade enclosures
To view summary information about the blade enclosures in the chassis:
1. Expand the Hardware node.
2. Select the Blade Enclosure node under the Hardware node. The following summary information is displayed for the blade enclosure:
Status
Type
Name
UUID
Serial number
Detailed information of the hardware components in the blade enclosure is provided by expanding the Blade Enclosure node and clicking one of the sub-nodes.
When you select one of the sub-nodes under the Blade Enclosure node, additional information is provided. For example, when you select the Fan node, additional information about the Fan for the blade enclosure is provided in the Fan panel.
88 Monitoring cluster operations
Page 89
The sub-nodes under the Blade Enclosure node provide information about the hardware components within the blade enclosure:
Monitoring X9720/9730 hardware 89
Page 90
Table 2 Obtaining detailed information about a blade enclosure

Panel name: Bay
Information provided: Status, Type, Name, UUID, Serial number, Model, Properties

Panel name: Temperature Sensor (displays information for a bay, an OA module, or the blade enclosure)
Information provided: Status, Type, UUID, Properties

Panel name: Fan (displays information for a blade enclosure)
Information provided: Status, Type, Name, UUID, Location, Properties

Panel name: OA Module
Information provided: Status, Type, Name, UUID, Serial number, Model, Firmware version, Location, Properties

Panel name: Power Supply
Information provided: Status, Type, Name, UUID, Serial number, Location

Panel name: Shared Interconnect
Information provided: Status, Type, Name, UUID, Serial number, Model, Firmware version, Location, Properties
90 Monitoring cluster operations
Page 91
Obtaining server details
The Management Console provides detailed information for each server in the chassis. To obtain summary information for a server, select the Server node under the Hardware node.
The following overview information is provided for each server:
Status
Type
Name
UUID
Serial number
Model
Firmware version
Message¹
Diagnostic Message¹
¹ This column appears dynamically, depending on the situation.
Obtain detailed information for hardware components in the server by clicking the nodes under the Server node.
Monitoring X9720/9730 hardware 91
Page 92
Table 3 Obtaining detailed information about a server

Panel name: CPU
Information provided: Status, Type, Name, UUID, Model, Location

Panel name: ILO Module
Information provided: Status, Type, Name, UUID, Serial Number, Model, Firmware Version, Properties

Panel name: Memory DIMM
Information provided: Status, Type, Name, UUID, Location, Properties

Panel name: NIC
Information provided: Status, Type, Name, UUID, Properties

Panel name: Power Management Controller
Information provided: Status, Type, Name, UUID, Firmware Version

Panel name: Storage Cluster
Information provided: Status, Type, Name, UUID
92 Monitoring cluster operations
Page 93
Table 3 Obtaining detailed information about a server (continued)

Panel name: Drive (displays information about each drive in a storage cluster)
Information provided: Status, Type, Name, UUID, Serial Number, Model, Firmware Version, Location, Properties

Panel name: Storage Controller (displayed for a server)
Information provided: Status, Type, Name, UUID, Serial Number, Model, Firmware Version, Location, Message, Diagnostic Message

Panel name: Volume (displays volume information for each server)
Information provided: Status, Type, Name, UUID, Properties

Panel name: Storage Controller (displayed for a storage cluster)
Information provided: Status, Type, UUID, Serial Number, Model, Firmware Version, Message, Diagnostic Message

Panel name: Battery (displayed for each storage controller)
Information provided: Status, Type, UUID, Properties

Panel name: IO Cache Module (displayed for a storage controller)
Information provided: Status, Type, UUID, Properties
Monitoring X9720/9730 hardware 93
Page 94
Table 3 Obtaining detailed information about a server (continued)

Panel name: Temperature Sensor (displays information for each temperature sensor)
Information provided: Status, Type, Name, UUID, Location, Properties
Monitoring storage and storage components
Select Vendor Storage from the Navigator tree to display status and device information for storage and storage components. The Vendor Storage panel lists the HP 9730 CX storage systems included in the system.
The Summary panel shows details for a selected vendor storage, as shown in the following image:
94 Monitoring cluster operations
Page 95
The Management Console provides a wide range of information about vendor storage, as shown in the following image.
Drill down into the following components in the lower Navigator tree to obtain additional details:
Servers. The Servers panel lists the host names for the attached storage.
Storage Cluster. The Storage Cluster panel provides detailed information about the storage
cluster. See “Monitoring storage clusters” (page 96) for more information.
Storage Switch. The Storage Switch panel provides detailed information about the storage
switches. See “Monitoring storage switches in a storage cluster” (page 101) for more information.
LUNs. The LUNs panel provides information about the LUNs in a storage cluster. See “Managing LUNs in a storage cluster” (page 101) for more information.
Monitoring X9720/9730 hardware 95
Page 96
Monitoring storage clusters
The Management Console provides detailed information for each storage cluster. Click one of the following sub-nodes displayed under the Storage Clusters node to obtain additional information:
Drive Enclosure. The Drive Enclosure panel provides detailed information about the drive
enclosure. Expand the Drive Enclosure node to view information about the power supply and sub enclosures. See “Monitoring drive enclosures for a storage cluster” (page 96) for more information.
Pool. The Pool panel provides detailed information about a pool in a storage cluster. Expand
the Pool node to view information about the volumes in the pool. See “Monitoring pools for
a storage cluster” (page 99) for more information.
Storage Controller. The Storage Controller panel provides detailed information about the
storage controller. Expand the Storage Controller node to view information about batteries and IO cache modules for a storage controller. See “Monitoring storage controllers for a
storage cluster” (page 100) for more information.
Monitoring drive enclosures for a storage cluster
Each 9730 CX has a single drive enclosure. That enclosure includes two sub-enclosures, which are shown under the Drive Enclosure node. Select one of the Sub Enclosure nodes to display information about one of the sub-enclosures.
96 Monitoring cluster operations
Page 97
Expand the Drive Enclosure node to provide additional information about the power supply and sub enclosures.
Table 4 Details provided for the drive enclosure

Node: Power Supply
Where to find detailed information: “Monitoring the power supply for a storage cluster” (page 97)

Node: Sub Enclosure
Where to find detailed information: “Monitoring sub enclosures” (page 98)
Monitoring the power supply for a storage cluster
Each drive enclosure also has power supplies. Select the Power Supply node to view the following information for each power supply in the drive enclosure:
Status
Type
Name
UUID
The Power Supply panel displayed in the following image provides information about four power supplies in an enclosure:
Monitoring X9720/9730 hardware 97
Page 98
Monitoring sub enclosures
Expand the Sub Enclosure node to obtain information about the following components for each sub-enclosure:
Drive. The Drive panel provides the following information about the drives in a sub-enclosure:
Status
Volume Name
Type
UUID
Serial Number
Model
Firmware Version
Location. This column displays where the drive is located. For example, assume the
location for a drive in the list is Port: 52 Box 1 Bay: 7. To find the drive, go to Bay 7. The port number specifies the switch number and switch port. For port 52, the drive is connected to port 2 on switch 5. For location Port: 72 Box 1, Bay 6, the drive is connected to port 2 on switch 7 in bay 6.
Properties
Fan. The Fan panel provides the following information about the fans in a sub-enclosure:
Status
Type
Name
UUID
Properties
SEP. The SEP panel provides the following information about the storage enclosure processors (SEPs)
in the sub-enclosure:
Status
Type
Name
UUID
Serial Number
Model
Firmware Version
Temperature Sensor. The Temperature Sensor panel provides the following information about
the temperature sensors in the sub-enclosure:
Status
Type
98 Monitoring cluster operations
Page 99
Name
UUID
Properties
Monitoring pools for a storage cluster
The Management Console lists a Pool node for each pool in the storage cluster. Select one of the Pool nodes to display information about that pool.
When you select the Pool node, the following information is displayed in the Pool panel:
Status
Type
Name
UUID
Properties
To obtain details on the volumes in the pool, expand the Pool node and then select the Volume node. The following information is displayed for the volume in the pool:
Status
Type
Name
Monitoring X9720/9730 hardware 99
Page 100
UUID
Properties
The following image shows information for two volumes named LUN_15 and LUN_16 on the Volume panel.
Monitoring storage controllers for a storage cluster
The Management Console displays a Storage Controller node for each storage controller in the storage cluster. Select the Storage Controller node to view the following information for the selected storage controller:
Status
Type
UUID
Serial Number
Model
Firmware Version
Location
Message
Diagnostic Message
Expand the Storage Controller node to obtain information about the battery and IO cache module for the storage controller.
Monitoring the batteries for a storage controller
The Battery panel displays the following information:
Status
Type
UUID
Properties. Provides information about the remaining charge and the charge status.
In the following image, the Battery panel shows the information about a battery that has 100 percent of its charge remaining.
Monitoring the IO Cache Modules for a storage controller
The IO Cache Module panel displays the following information about the IO cache module for a storage controller:
Status
Type
100 Monitoring cluster operations