Information furnished in this manual is believed to be accurate and reliable. However, QLogic Corporation assumes no
responsibility for its use, nor for any infringements of patents or other rights of third parties which may result from its use.
QLogic Corporation reserves the right to change product specifications at any time without notice. Applications described
in this document for any of these products are for illustrative purposes only. QLogic Corporation makes no representation
nor warranty that such applications are suitable for the specified use without further testing or modification. QLogic
Corporation assumes no responsibility for any errors that may appear in this document.
No part of this document may be copied nor reproduced by any means, nor translated nor transmitted to any magnetic
medium without the express written consent of QLogic Corporation.
Linux is a registered trademark of Linus Torvalds.
Microsoft and Windows are registered trademarks and Windows Server is a trademark of Microsoft Corporation.
Red Hat and all Red Hat-based trademarks are trademarks or registered trademarks of Red Hat, Inc.
SUSE is a registered trademark of Novell, Inc.
All other brand and product names are trademarks or registered trademarks of their respective owners.
This manual describes installation, configuration and administration task
information for the Fast Fabric Toolset.
This manual is organized as follows:
Section 1 describes the intended audience and technical support.
Section 2 describes the Fast Fabric Toolset.
Section 3 describes getting started with Fast Fabric.
Section 4 describes the Fast Fabric Textual User Interface (TUI) menu.
Section 5 describes the Fast Fabric command tools and test tools.
Section 6 describes MPI Sample Applications.
Appendix A presents the Fast Fabric Quick Install Checklist.
Appendix B describes the Fast Fabric Configuration Files.
Appendix C provides information on the configuration of IPoIB name mapping.
Appendix D provides information on configuring Multi-Subnet Fabrics.
1.1
Intended Audience
This manual is intended to provide network administrators and other qualified
personnel with a reference for installation, configuration, and administration
tasks for the Fast Fabric Toolset.
1.2
License Agreements
Refer to the QLogic Software End User License Agreement for a complete listing
of all license agreements affecting this product.
D000006-000 Rev. A
1 – Introduction
1.3
Technical Support
Customers should contact their authorized maintenance provider for technical
support of their QLogic products. QLogic-direct customers may contact QLogic
Technical Support; others will be redirected to their authorized maintenance
provider.
Visit the QLogic support Web site listed in Contact Information for the latest firmware
and software updates.
1.3.1
Availability
QLogic Technical Support for products under warranty is available during local
standard working hours excluding QLogic Observed Holidays.
1.3.2
Contact Information
Support Headquarters: QLogic Corporation
4601 Dean Lakes Blvd
Shakopee, MN 55379
USA
QLogic Web Site: www.qlogic.com
Technical Support Web Site: support.qlogic.com
Technical Support Email: support@qlogic.com
Technical Training Email: tech.training@qlogic.com
North American Region
Email: support@qlogic.com
Phone: +1-952-932-4040
Fax: +1-952-974-4910
All other regions of the world
QLogic Web Site: www.qlogic.com
Section 2
Fast Fabric Overview
2.1
Feature Overview
The Fast Fabric Toolset is designed to both simplify and expedite common
InfiniBand (IB) cluster management tasks. Fast Fabric can assist in generic
management tasks as well as InfiniBand installation, upgrade, configuration, and
verification tasks. Specifically, Fast Fabric:
❥ Configures Internet Protocol over InfiniBand (IPoIB) IP addresses
❥ Performs InfiniBand driver upgrades or the installation of additional InfiniBand
drivers
❥ Verifies key fabric installation metrics:
  ❥ Components in fabric
  ❥ Link error counters
  ❥ Link widths and speeds
  ❥ IB and PCI bus bandwidth
  ❥ IB end-to-end latency
  ❥ IPoIB connectivity
  ❥ Subnet Administration (SA) visibility of all nodes
  ❥ IB connectivity of all switches and nodes
❥ Aids in diagnosis of fabric problems:
  ❥ Fabric error isolation
  ❥ Fabric topology analysis
  ❥ Fabric route analysis
❥ Aids in ongoing fabric status and configuration monitoring:
  ❥ Automated fabric health checks and configuration baseline compare
  ❥ Automated chassis health checks and configuration baseline compare
  ❥ Automated SM health checks and configuration baseline compare
❥ Provides tools to accelerate common host administration tasks:
  ❥ Executes commands across many hosts
  ❥ Copies files to and from many hosts
  ❥ Edits host-specific files across many hosts
❥ Provides tools to accelerate common chassis and switch administration tasks:
  ❥ Manages firmware levels on switches and chassis
  ❥ Executes commands across many chassis
❥ Assists in the initial benchmarking and tuning of High Performance Computing
(HPC) fabrics.
Fast Fabric includes both a Textual User Interface (TUI) menu system and
command line tools. The TUI presents the menus in the typical order of execution
for a new fabric install, which simplifies fabric installation for new users. All
operations available in the TUI can also be performed from the command line,
and the command line tools are designed to be invoked from customer-specific
scripts.
2.2
Fast Fabric Architecture
Figure 2-1. Fast Fabric Architecture
Fast Fabric is typically installed on one or more IB Management Nodes. The IB
Management Node must be connected to the rest of the cluster via both InfiniBand
and a management network. The management network may be the primary
InfiniBand network (IPoIB) or Ethernet. The management network will be used for
Fast Fabric host setup and administration tasks. It may also be used for other
aspects of server administration or operation.
Depending on cluster size and design, the IB Management node may also be used
as the master node for starting MPI jobs. It may also be used to run a QLogic Host
SM and other management software. Consult the QLogic SM documentation for
details on which combinations are valid.
NOTE: When InfiniBand is used as the management network, Fast Fabric will not be
able to install host IB software or configure IPoIB; however, it will be able to support
host IB software upgrades, verification, and all the other features of Fast Fabric.
If remote access to Fast Fabric is desired, set up remote access to the IB
Management Node via ssh, telnet, X Windows, VNC, or any other mechanism that
allows the remote user to access a Linux command line shell. Typically, Fast
Fabric is used only by cluster administrators.
2.2.1
How Fast Fabric Works
Fast Fabric consists of a variety of tools to administer hosts, chassis and externally
managed switches. Depending on the tool, the method of accessing and
administering the target devices may differ.
The following methods are used by Fast Fabric:
Table 2-1. Fast Fabric Methods
Method | Examples
Inband access via IB | Fabric topology reports, SA database queries, fabric error and link speed analysis, tools for externally managed switches, etc.
Login via management network | Host setup and installation, tools for internally managed chassis, etc.
MPI job startup (can be inband or via management network) | Verify MPI performance, running sample MPI benchmarks
Typically, tools which log in to other hosts do so in a password-less manner
using ssh or rsh (configurable). Tools which log in to internally managed chassis
can use ssh or telnet (configurable). Chassis tools can prompt for a single password
for all chassis or can be preconfigured with the password. These approaches permit
the tools to operate with minimal user interaction and hence reduce the time to
perform operations against many hosts or chassis.
After initial installation, Fast Fabric can be configured to use IPoIB instead of the
management network.
NOTE: Any reconfigurations that affect IPoIB or involve installing new IB hosts
will not be able to use IPoIB.
Section 3
Getting Started
Before using the Fast Fabric toolset, the Site Implementation Engineer must perform
the tasks described in the sections which follow. To help keep track of the steps
performed, a checklist is provided (see Appendix A). The Fast Fabric configuration
files which must be edited or created are described at the relevant points in the
setup procedure. For more information about the configuration files used by Fast
Fabric, see Appendix B.
The instructions below describe the basic fabric installation and verification
sequence for a typical single IB subnet fabric. For more information on installation
and verification of multiple IB subnet fabrics, see Appendix D.
Some of the tasks are only applicable when Linux is being used; they are marked
with (Linux). Similarly, some of the tasks are only applicable when QuickSilver
Linux IB software is being used on the hosts; those are marked with (Host). All
tasks which are applicable only when SilverStorm IB Switches or SilverStorm IB
Chassis are being used are marked with (Switch). All remaining tasks are
generally applicable to all environments and are marked with (All).
NOTE: Some of the Linux steps may be applicable to other Unix-like operating
systems if it is desired to enable use of non-IB specific Fast Fabric tools
(such as cmdall) against the given hosts.
3.1
Design the Fabric
Prior to beginning the installation and setup of the fabric, it is important to carefully
design and plan the installation. Part of the design plan must include identification
of which servers will be the administration nodes for the cluster, and hence where
Fast Fabric will be installed.
For large clusters, cable, power, and cooling plans are very important and must be
carefully considered. These plans drive the ultimate layout of equipment in the
racks. A typical configuration involves leaf switches and servers in the same racks,
with core switches in centrally located racks. This minimizes both cable lengths and
complexity. It is also recommended to place the IB switches at the bottom of a rack.
This allows inter-rack cables to be cleanly routed below the floor (some sites use
cable routing above the racks in which case placing the IB switches near the top of
the rack is recommended).
NOTE: The overall physical design has many complex aspects, such as power,
cooling and rack layout, which are beyond the scope of this document.
3.2
Set Up the Fabric
1. (All) The first step in any installation is to physically install the hardware:
❥ Servers
❥ Core and leaf InfiniBand switches, such as the SilverStorm 9024 and 9000
Multi-Protocol Fabric Directors (9020, 9040, 9080, 9120 and 9240).
❥ Virtual I/O systems, such as the EVIC and FVIC cards for the SilverStorm
9000 Multi-Protocol Fabric Directors Series.
NOTE: When installing externally managed switches (such as the SilverStorm
9024-FC switch), take note of the Node GUID. This is typically on a label
on the case of the switch. The Node GUID will be needed later to configure
and manage the switch(es).
2. (All) Within each server, a host channel adapter (HCA), such as the QuickSilver
HCA 7000 or 9000, must be installed. Refer to the QuickSilver Fabric Access Quick
Start Guide for instructions.
3. (All) Prior to installing software, the hardware configuration should be reviewed
to ensure everything was installed according to plan. Later during the
installation, Fast Fabric tools may also be used to help verify the installation.
4. (Linux) Install the desired Linux OS version (with the same kernel distribution)
on all hosts. Generally, the IB Management node(s) (i.e., the hosts which will
run Fast Fabric) should have a full install and must include the Tcl, Expect and
TclX packages. If Red Hat Enterprise Server 3 or later is being installed, only
the Tcl and Expect packages are required.
For MPI clusters, install the C and Fortran compilers along with their associated
tools on the IB Management node(s).
NOTE: All hosts must have a command-line prompt ending in "# " or "$ ". Make
certain there is a space after either "#" or "$". Such a prompt must be
used for the root user as well as any other user accounts the user intends
Fast Fabric to make use of.
NOTE: To simplify the use of Fast Fabric to set up ssh security, it is recommended
to install all servers with the same root password. If desired, the root passwords
may be changed after ssh has been set up using Fast Fabric.
NOTE: Consult the QuickSilver Fabric Access Linux Host Release Notes for a
list of supported OS versions.
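The prompt requirement above can be checked mechanically. The Python sketch below is illustrative only: prompt_ok is a hypothetical helper, not part of Fast Fabric, and it assumes the rendered prompt string has already been captured (for example, by logging in to the host manually).

```python
def prompt_ok(prompt: str) -> bool:
    """Return True if a rendered shell prompt ends in "# " or "$ ".

    Hypothetical helper: the trailing space is part of the requirement,
    so a prompt ending in "#" with no space after it is rejected.
    """
    return prompt.endswith(("# ", "$ "))


if __name__ == "__main__":
    for p in ["[root@node001 ~]# ", "[root@node001 ~]#", "user@node001:~$ ", "node001> "]:
        print(repr(p), prompt_ok(p))
```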
5. (Linux) Enable remote login as root to each host:
In order for Fast Fabric to manage the hosts, the IB Management Node must
be able to securely log in as root to each host. This can be accomplished using
either ssh or rsh. ssh is recommended due to its higher level of security. If
ssh is used, no additional manual steps are required at this stage (typically, the
Linux OS installation will enable ssh).
Alternatively, if it is desired to use rsh during fabric installation and/or operation,
the following steps must be performed on each node such that the IB
Management Node can log in using rsh as user root.
a. Each node must be configured such that the IB management node can rsh
into it. The IB management node must also be able to rsh into itself.
Typically this requires that a .rhosts file be created in /root such as:
<mgmthost name> root
<mgmthost name.domain name> root
localhost root
<mgmthost IP address>
where mgmthost name is the network name of the IB Management Node and
domain name is its network domain name. The .rhosts file must
have permissions of 640. Also, rsh should be enabled on each node.
Enable rsh by editing the /etc/xinetd.d/rsh file and setting:
disable=no
This can also be accomplished using:
chkconfig rsh on
Also enable rexec and rlogin using the above steps.
b. Execute mv /etc/securetty /etc/securetty.bak
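As a minimal sketch of step 5a, the Python functions below build the .rhosts entries from the template above and write the file with 640 permissions. The function names and the example host name, domain, and IP address are hypothetical. The rsh, rexec, and rlogin services still have to be enabled separately, as described above.

```python
import os


def build_rhosts(mgmt_host: str, domain: str, mgmt_ip: str) -> str:
    """Return .rhosts content following the template in step 5a.

    mgmt_host: network name of the IB Management Node
    domain: its network domain name
    mgmt_ip: its IP address (listed without a user name, as in the template)
    """
    entries = [
        f"{mgmt_host} root",
        f"{mgmt_host}.{domain} root",
        "localhost root",
        mgmt_ip,
    ]
    return "\n".join(entries) + "\n"


def write_rhosts(path: str, content: str) -> None:
    """Write the .rhosts file and set its permissions to 640 (rw-r-----)."""
    with open(path, "w") as f:
        f.write(content)
    # Set the mode explicitly so the result does not depend on the umask.
    os.chmod(path, 0o640)
```

For example, write_rhosts("/root/.rhosts", build_rhosts("mgmt1", "cluster.example.com", "192.168.1.10")) would be run as root on each node.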
6. (All) TCP/IP Host Name resolution:
Fast Fabric and TCP/IP will need to resolve host names to management
network and/or IPoIB IP addresses. If the management network is not IPoIB,
each host will need both a management network name and an IPoIB network
name. In that case, a recommended convention is to use the actual host name
as the management network name and <HOSTNAME>-ib as the IPoIB network
name (where <HOSTNAME> is the management network name of the given
host).
Typically, name resolution is accomplished by configuring a DNS server on the
management network with both management network and IPoIB addresses for
each host (and QLogic internally managed IB chassis). Alternatively, an /etc/hosts
file may be created on the IB Management node. Fast Fabric can then
propagate this /etc/hosts file to all the other hosts.
If using the /etc/hosts approach:
On the IB Management node, add all the Ethernet and IPoIB addresses into the
/etc/hosts file. For the IPoIB convention, use <HOSTNAME>-ib. The
localhost line should not be edited.
The /etc/hosts file should not have any node-specific data (the following
section will step through the task of copying this file to all the nodes).
If using DNS:
Consult the documentation for the DNS server being used. Make sure to edit
the /etc/resolv.conf configuration on the IB Management Node to use the
proper DNS server. Consult the Linux OS documentation for more information
on configuring /etc/resolv.conf. This file is typically configured during OS
installation.
If /etc/resolv.conf must be manually configured for each host, Fast Fabric can
aid in copying it to all the hosts. In that case, the /etc/resolv.conf file
created on the IB Management Node must not have any node-specific data
and must be appropriate for use on all hosts. A later section will step through
the task of copying this file to all the nodes.
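The /etc/hosts convention from step 6 can be sketched in Python as follows. This is an illustration only: append_cluster_hosts is a hypothetical helper, and the host names and addresses in the example are made up. It keeps the existing lines (including the localhost line) unchanged, appends a management network entry named <HOSTNAME> and an IPoIB entry named <HOSTNAME>-ib for each host, and rejects malformed addresses.

```python
import ipaddress
from typing import Iterable, Tuple


def append_cluster_hosts(existing: str, hosts: Iterable[Tuple[str, str, str]]) -> str:
    """Return /etc/hosts content with cluster entries appended.

    existing: current file content; its lines (e.g. localhost) are kept as-is.
    hosts: (hostname, management_ip, ipoib_ip) for each host.
    """
    lines = existing.splitlines()
    for name, mgmt_ip, ipoib_ip in hosts:
        # ip_address() raises ValueError on a malformed address
        ipaddress.ip_address(mgmt_ip)
        ipaddress.ip_address(ipoib_ip)
        lines.append(f"{mgmt_ip}\t{name}")       # management network name
        lines.append(f"{ipoib_ip}\t{name}-ib")   # IPoIB name: <HOSTNAME>-ib
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    base = "127.0.0.1\tlocalhost.localdomain localhost\n"
    print(append_cluster_hosts(base, [("node001", "192.168.1.1", "10.10.1.1"),
                                      ("node002", "192.168.1.2", "10.10.1.2")]), end="")
```

Because the same file is later copied to every host, the generated content deliberately contains no node-specific data.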
7. (All) NTP setup: It is recommended to configure an NTP server for the cluster
and have all the hosts and internally managed chassis synchronize their clocks
with the NTP server. Consult the Linux OS documentation for information on
how to configure NTP servers and clients.
8. (All) On the IB Management node, install the Fabric Access Software using the
procedure documented in the Fabric Access Software Users Guide. The IB
Management Node must have at least Fast Fabric, the IB Stack and IPoIB
installed and configured. For MPI clusters running the QuickSilver Host stack,
the IB Management Node should also include the MPI Runtime and MPI
Development packages, and if the user desires to rebuild MPI itself, the IB
Development package and MPI Source packages will also be required.
After completing the install, reboot the IB Management node.
NOTE: When managing a cluster where compute nodes are not running the
QuickSilver host stack or where the IPoIB settings on the compute nodes
are incompatible with the IB Management node (for example, when a 4K
MTU is used on the compute nodes), it is recommended not to run IPoIB
on the IB Management nodes.
3.3
Using Fast Fabric
The initial installation and verification process is best performed using the Fast
Fabric TUI menu system. The main menu can be invoked using the iba_config
command. The main menu is as follows:
SilverStorm Technologies Inc. InfiniBand 4.1.1.0.15 Software
1) Show Installed Software
2) Reconfigure IP over IB
3) Reconfigure Driver Autostart
4) Update HCA Firmware
5) Generate Supporting Information for Problem Report
6) Host Setup via Fast Fabric
7) Host Admin via Fast Fabric
8) Chassis Admin via Fast Fabric
9) Externally Managed Switch Admin via Fast Fabric
a) Uninstall Software
X) Exit
In the above menu, items 6-9 represent the Fast Fabric menus. The operation of
this menu is the same as the INSTALL and iba_config functions documented in the
QuickSilver Fabric Access Users Guide. Pressing 1-9 or a invokes the
corresponding submenu. Pressing X exits the menu system.
Selection of a Fast Fabric menu (6-9) will present a submenu similar to the following:
SilverStorm Technologies Inc. IB Host Setup Menu (4.1.1.0.15)
Fast Fabric Host List: /etc/sysconfig/iba/hosts
0) Edit Config and Select/Edit Hosts Files [Perform]
1) Verify Hosts via Ethernet ping [Perform]
2) Verify rsh/rcp Configured [ Skip ]
3) Setup Password-less ssh/scp [Perform]
4) Copy /etc/hosts to all hosts [ Skip ]
5) Show uname -a for all hosts [Perform]
6) Install/Upgrade InfiniServ Software [Perform]
7) Configure IPoIB IP Address [Perform]
8) Build MPI Test Apps and Copy to Hosts [Perform]
9) Reboot Hosts [Perform]
a) Refresh ssh Known Hosts [Perform]
b) Rebuild MPI Library and Tools [ Skip ]
c) Run a command on all hosts [ Skip ]
d) Copy a file to all hosts [ Skip ]
e) View ibtest result files [ Skip ]
P) Perform the selected actions
N) Select None
X) Return to Previous Menu (or ESC)
The submenus typically present operations in the order they would be used
during an installation. Pressing the key corresponding to a menu item (0-e in the
example above) toggles the Skip/Perform selection for that item. As shown
in the example above, more than one item may be selected. Once the desired set of
items has been selected, press P. To unselect all items, press N. Pressing X or
ESC will exit this menu and return to the Main Menu.
If more than one item is selected, the items will be performed in the order shown in
the menu. This is the typical order desired during fabric setup. To perform items in
a different order, select a single item and press P to perform it by itself, then
repeat for the next item. An opportunity to abort is presented after each item:
Hit any key to continue (or ESC to abort)...
If ESC is pressed, the sequence of operations is aborted and the previous menu
is redisplayed. Any other key will result in the next selected menu item being
performed. This prompt is also shown after the last selected item completes,
permitting an opportunity to review the results before the screen is cleared to display
the menu.
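To make the selection rules above concrete, the following Python model sketches how a submenu's Skip/Perform toggles, the N key, and the P key interact. It is a hypothetical illustration, not Fast Fabric's implementation: FastFabricMenu, run, and confirm are made-up names, and confirm() returning False stands in for pressing ESC at the continue prompt.

```python
from typing import Callable, Iterable, List, Set, Tuple


class FastFabricMenu:
    """Hypothetical model of a Fast Fabric submenu's Skip/Perform selection."""

    def __init__(self, items: Iterable[Tuple[str, str]]) -> None:
        # (key, label) pairs in the order the menu displays them
        self.items: List[Tuple[str, str]] = list(items)
        self.selected: Set[str] = set()

    def toggle(self, key: str) -> None:
        """Pressing an item's key flips it between Skip and Perform."""
        if key in self.selected:
            self.selected.remove(key)
        else:
            self.selected.add(key)

    def select_none(self) -> None:
        """The N key: mark every item as Skip."""
        self.selected.clear()

    def perform(self, run: Callable[[str, str], None],
                confirm: Callable[[], bool]) -> List[str]:
        """The P key: run selected items in menu order, not selection order.

        confirm() models "Hit any key to continue (or ESC to abort)...",
        which is shown after every item, including the last; False means ESC.
        Returns the keys of the items that were run.
        """
        done: List[str] = []
        for key, label in self.items:
            if key not in self.selected:
                continue
            run(key, label)
            done.append(key)
            if not confirm():
                break
        return done
```

The model makes the ordering rule explicit: toggling items 5 and then 1 still performs 1 before 5, which is why performing items in a different order requires selecting and performing them one at a time.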
At the top of each Fast Fabric menu, the file listing the components to operate on
is shown. For example:
Fast Fabric Host List: /etc/sysconfig/iba/hosts
On each Fast Fabric menu, item 0 permits a different file to be selected and
permits editing of the file (using the editor selected via the EDITOR environment
variable). In addition, it permits review and editing of the fastfabric.conf
file. The fastfabric.conf file guides the overall configuration of Fast Fabric
and describes cluster-specific attributes of how Fast Fabric will operate. It is
discussed in greater detail in Appendix B.
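Item 0 opens files in the editor named by the EDITOR environment variable. The Python sketch below shows one way to build such a command line. The function name and the fallback to vi when EDITOR is unset are assumptions for illustration, not documented Fast Fabric behavior.

```python
import os
import shlex
from typing import List, Mapping, Optional


def editor_command(path: str, env: Optional[Mapping[str, str]] = None,
                   fallback: str = "vi") -> List[str]:
    """Build an argv list that opens path in $EDITOR.

    shlex.split() lets EDITOR carry arguments, e.g. "emacs -nw".
    The "vi" fallback is an assumption, used when EDITOR is unset or empty.
    """
    env = os.environ if env is None else env
    editor = env.get("EDITOR") or fallback
    return shlex.split(editor) + [path]
```

The result can be passed to subprocess.run(), for example subprocess.run(editor_command("/etc/sysconfig/iba/hosts")) to open the host list shown at the top of the Host Setup menu.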
During the execution of each menu selection, the actual Fast Fabric command line
tool being used will be shown. This can be used as an educational aid to learn the
tools.
3.4
Installing and Verifying Firmware on the SilverStorm IB Chassis
If the fabric contains SilverStorm 9000 series internally-managed IB switches, Fast
Fabric may be used to aid the installation and configuration of the switches.
Prior to using Fast Fabric, the following minimal steps need to be performed:
1. (Switch) Connect each SilverStorm chassis to the management network via
its Ethernet management port. Chassis with redundant management should
have both Ethernet management ports connected.
2. (Switch) Assign each SilverStorm chassis a unique IP address and
appropriately configure the chassis Ethernet management port network
settings.
3. (Switch) Select a unique name which will be used for each SilverStorm Chassis.
This name should be configured in DNS or /etc/hosts as the TCP/IP name for
the chassis Ethernet management port. In addition this should be configured
as the IB Node Description for the chassis via the chassis GUI or CLI.
a. When Virtual I/O controllers (VIC) are installed in a chassis, each VIC should
also be assigned a unique name.
4. (Switch) Configure the administrator password on each SilverStorm Chassis.
NOTE: Newer versions of SilverStorm chassis firmware permit SSH keys to be
configured within the chassis for secure password-less login. In this case,
it is recommended to configure SSH keys in the chassis at this point so
that the IB Management Node can log in as admin without a password.
NOTE: When using versions of SilverStorm chassis firmware that do not support
SSH keys, to simplify the use of Fast Fabric it is recommended to install
all chassis with the same admin password.
5. (Switch) Mount or copy the relevant chassis firmware CD(s) or files onto the
Fast Fabric management node. During the steps below the *.pkg files on the
CD will be used to upgrade the firmware on each chassis.
NOTE: When copying files, it is best to place all files at a given firmware level into
a single directory whose name indicates the firmware revision number.
Once the above steps have been completed, additional setup of the Chassis may
be performed using Fast Fabric.
1. (Switch) Select the "Chassis Admin via Fast Fabric" option from the main menu.
2. (Switch) Select the items shown as "Perform" in the menu below and press
the P key to perform them:
SilverStorm Technologies Inc. IB Chassis Admin Menu (4.1.1.0.15)
Fast Fabric Chassis List: /etc/sysconfig/iba/chassis
0) Edit Config and Select/Edit Chassis Files [Perform]
1) Verify Chassis via Ethernet ping [Perform]
2) Update Chassis Firmware [Perform]
3) Show Status of Chassis IB Ports [ Skip ]
4) Reboot Chassis [ Skip ]
5) Generate all Chassis Problem Report Info [ Skip ]
6) Run a command on all chassis [ Skip ]
7) View ibtest result files [ Skip ]
P) Perform the selected actions
N) Select None
X) Return to Previous Menu (or ESC)
3. (Switch) "Edit Config and Select/Edit Chassis Files" permits the chassis and
fastfabric.conf files to be edited. When placed in the editor for fastfabric.conf,
review all the settings, in particular FF_CHASSIS_LOGIN_METHOD
and FF_CHASSIS_ADMIN_PASSWORD. Consult Appendix B for more
information about fastfabric.conf.
NOTE: Fast Fabric will provide the opportunity to enter the chassis password
interactively when needed, so it is not necessary to place it within
fastfabric.conf. If it is desired to instead keep the QLogic chassis admin
password in fastfabric.conf, it is recommended to change the
fastfabric.conf permissions to 0600 (that is, read/write access for root only).
NOTE: Newer versions of chassis firmware permit ssh keys to be configured
within the chassis for secure password-less login. In that case, there is
no need to configure FF_CHASSIS_ADMIN_PASSWORD, and
FF_CHASSIS_LOGIN_METHOD can be ssh. Consult the SilverStorm 9000 Users Guide for more information.
When placed in the editor for chassis, create the file with a list of the chassis names
(the TCP/IP Ethernet management port names assigned above) or IP addresses
(use of names is recommended), one entry per line. For example:
Chassis1
Chassis2
NOTE: Do not list externally managed switches, such as the SilverStorm 9024FC
switches, in this file. Those will be covered in the next section.
For further details about the file format, refer to the section “Selection of Chassis”
on page 5-4.
4. (Switch) "Verify Chassis via Ethernet ping" will ping each selected chassis over
the management network. If all chassis were found, continue to the next step.
If some chassis were not found, abort out of the menu and review the following
for those chassis which were not found:
❥ Is chassis powered on and booted
❥ Is chassis connected to management network
❥ Is chassis IP address and network settings consistent with DNS or /etc/hosts
❥ Is Management node connected to the management network
❥ Are Management node IP address and network settings correct
❥ Is the management network itself up (switches, routers, etc)
❥ Is correct set of chassis listed in the chassis file (the previous step may be
repeated to review and edit the file as needed)
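The ping check in step 4 can be approximated outside the menu with the Python sketch below. It assumes a chassis file with one name or IP address per line (blank lines are skipped; any other syntax described in "Selection of Chassis" is not handled) and the Linux iputils ping options -c and -W. The helper names are hypothetical, and the ping function is a parameter so the logic can be exercised without a network.

```python
import subprocess
from typing import Callable, Iterable, List


def read_chassis_file(path: str) -> List[str]:
    """Return one chassis name or IP address per non-blank line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def linux_ping(host: str) -> bool:
    """Send one ICMP echo with a 2 second timeout (Linux iputils options, an assumption)."""
    result = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def unreachable_chassis(names: Iterable[str],
                        ping: Callable[[str], bool] = linux_ping) -> List[str]:
    """Return the chassis that did not answer, in file order."""
    return [name for name in names if not ping(name)]
```

For example, unreachable_chassis(read_chassis_file("/etc/sysconfig/iba/chassis")) lists the chassis to review against the checklist above.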
5. (Switch) "Update Chassis Firmware" will permit the chassis firmware version
to be verified and updated as needed.
NOTE: The chassis must be running firmware version 4.0.0.4.3 or later to perform
this function. If the chassis is not up to this level, it will need to be manually
updated via the chassis GUI. See the SilverStorm 9000 Users Guide for
more information.
NOTE: Consult the relevant chassis firmware release notes to ensure any
prerequisites for the upgrade to the new firmware level have been met
prior to performing the upgrade via Fast Fabric.
When prompted:
Multiple Firmware files and/or Directories may be space
separated
Shell wildcards may be used
For Directories all .pkg files in the directory tree will be used
Enter Files/Directories to use (or none):
specify the directory where the relevant firmware files have been stored. This can
be the mount point of the CD or the directory to which the files were copied in a
previous step.
Since the fabric is not yet operational, it's recommended to answer "y" to:
Would you like to run the firmware now? [n]:
Fast Fabric will ensure that all chassis are running the firmware level provided and
will install the firmware and/or reboot each chassis as needed.
If any chassis fails to be updated, use the "View ibtest result files" option to review
the result files from the update. Refer to the section “Interpreting the ibtest log files”
on page 5-68 for more details.
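The prompt's rules for the firmware answer (space-separated entries, shell wildcards, and a recursive search of directories) can be mimicked with the Python sketch below, for example to preview which files would be picked up. resolve_firmware is a hypothetical helper. Passing ext=".emfw" gives the equivalent for the externally managed switch firmware described in the next section. Like the prompt, it splits on spaces, so paths containing spaces are not supported.

```python
import glob
import os
from typing import List


def resolve_firmware(answer: str, ext: str = ".pkg") -> List[str]:
    """Expand an answer to "Enter Files/Directories to use".

    Each space-separated entry is expanded with shell-style wildcards;
    entries that match nothing are skipped. A matching file is used as
    given; for a matching directory, every file ending in ext anywhere
    in its tree is used.
    """
    found: List[str] = []
    for entry in answer.split():
        for path in sorted(glob.glob(entry)):
            if os.path.isdir(path):
                for root, dirs, files in os.walk(path):
                    dirs.sort()  # walk subdirectories in a stable order
                    for name in sorted(files):
                        if name.endswith(ext):
                            found.append(os.path.join(root, name))
            elif os.path.isfile(path):
                found.append(path)
    return found
```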
6. (Switch) If there are any other operations which need to be performed on all
chassis, they may be performed using the "Run a command on all chassis"
option. Each time this is executed, a single chassis CLI command may be
specified to be executed against all selected chassis. Using such commands,
additional setup or verification of the chassis may be performed.
3.5
Installing and Configuring the Subnet Manager
(All) At this point the subnet manager (SM) for the fabric must be installed or
enabled. Consult the QuickSilver Fabric Manager and Fabric Viewer Users Guide
for information on how to install, enable and configure the SM.
When using the QuickSilver host-based SM, a typical installation will place Fast
Fabric and the host SM on the same IB Management Node. If desired, it is also
valid to place Fast Fabric on its own independent management node, perhaps along
with other third-party management applications (such as MPI job schedulers).
The steps which follow will require that an SM be operational within the fabric.
3.6
Installing and Verifying Firmware on the IB Switches
If the fabric contains SilverStorm 9024FC series externally managed switches, Fast
Fabric may be used to aid the installation and configuration of the switches.
Prior to using Fast Fabric, the following minimal steps need to be performed:
1. (Switch) Select a unique name which will be used for each Switch. This name
will be configured as the IB Node Description for the switch in the steps below.
NOTE: Externally managed switches do not have an Ethernet port and hence
will not have a TCP/IP name.
2. (Switch) Mount or copy the relevant switch firmware CD(s) or files onto the
Fast Fabric management node. During the steps below the *.emfw files on the
CD will be used to upgrade the firmware on each switch.
NOTE: When copying files, it is best to place all files at a given firmware level into
a single directory whose name indicates the firmware revision number.
Once the above steps have been completed, additional setup of the switches may
be performed using Fast Fabric.
3. (Switch) Select the "Externally Managed Switch Admin via Fast Fabric" option
from the main menu.
4. (Switch) Select the items shown as "Perform" in the menu below and press
the P key to perform them:
SilverStorm Technologies Inc. IB Switch Admin Menu (4.1.1.0.15)
Fast Fabric Externally Managed Switch List:
/etc/sysconfig/iba/ibnodes
0) Edit Config and Select/Edit Switch Files [Perform]
1) Verify Switch via Firmware dump [ Skip ]
2) Update Switch Firmware [Perform]
3) Reboot Switch [ Skip ]
4) View ibtest result files [ Skip ]
P) Perform the selected actions
N) Select None
X) Return to Previous Menu (or ESC)
5. (Switch) "Edit Config and Select/Edit Switch Files" permits the ibnodes and
fastfabric.conf files to be edited. When placed in the editor for fastfabric.conf,
review all the settings. Refer to Appendix B for more information about
fastfabric.conf.
When placed in the editor for ibnodes, create the file with a list of the switch node
GUIDs and desired switch names, one entry per line. For example:
0x00066a00d9000138,edge1
0x00066a00d9000139,edge2
NOTE: Do not list internally managed chassis, such as the SilverStorm 9000
chassis, in this file. Those were covered in the previous section.
NOTE: If the IB path from the IB Management node to other switch nodes is
through a 9024FC which is to be updated, the ibnodes file should omit
that 9024FC switch at this time. Otherwise, the reboot of the 9024FC in
the path could disrupt the updates of other switches.
For further details about the file format, refer to the section “Selection of Switches” on
page 5-7.
If needed, a SA query such as the following can be used to get a list of all switches,
however this will include both internally and externally managed switches and hence
the output must be edited to leave only the SilverStorm externally-managed
switches:
saquery -t sw -o nodeguid
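The ibnodes format described above can be sanity-checked before running the update, and a GUID list turned into a starting file. The following is a minimal shell sketch, not part of Fast Fabric: the function names and the naming prefix are hypothetical, GUIDs are assumed to be written as 0x followed by 16 hex digits as in the example, and the saquery output is assumed to have been reduced to one GUID per line.

```shell
# Hypothetical helpers for preparing an ibnodes file (not Fast Fabric tools).

# ibnodes_check FILE: print (with line numbers) every non-blank line that is
# not of the form 0x<16 hex digits>,<name>.
# Returns 0 if all lines are valid, 1 if any are invalid, 2 on a read error.
ibnodes_check() {
    local rc=0
    grep -Evn -e '^0x[0-9a-fA-F]{16},[^,]+$' -e '^$' "$1" || rc=$?
    case $rc in
        0) return 1 ;;   # grep printed at least one invalid line
        1) return 0 ;;   # no invalid lines found
        *) return 2 ;;   # e.g. the file could not be read
    esac
}

# ibnodes_skeleton PREFIX: read one GUID per line on stdin and print
# "GUID,PREFIXn" entries numbered from 1, skipping blank lines.
ibnodes_skeleton() {
    awk -v p="$1" 'NF { n++; print $1 "," p n }'
}
```

The placeholder names from ibnodes_skeleton still have to be replaced with the desired switch names, and internally managed switches removed, before ibnodes_check is run on the edited file.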
6. (Switch) "Update Switch Firmware" will permit the switch firmware version to
be updated and the switch node name set.
NOTE:Consult the relevant switch firmware release notes to ensure any
prerequisites for the upgrade to the new firmware level have been met
prior to performing the upgrade via Fast Fabric.
When prompted:
Multiple Firmware files and/or Directories may be space
separated
Shell wildcards may be used
For Directories all .emfw files in the directory tree will be used
Enter Files/Directories to use (or none):
specify the directory where the relevant firmware files have been stored. This can
be the mount point of the CD or the directory to which the files were copied in a
previous step.
Since the fabric is not yet operational, it's recommended to answer "y" to:
Would you like to run the firmware now? [n]:
Fast Fabric will update the firmware on all switches and set the node names as per
the ibnodes file created in a previous step. Each switch will then be rebooted.
If any switch fails to be updated, use the "View ibtest result files" option to review
the result files from the update. Refer to the section “Interpreting the ibtest log files”
on page 5-68 for more details.
If some switches were not found, check the following for each switch that was
not found:
❥ Is the switch powered on
❥ Is the switch connected to the IB network
❥ Is the Management node connected to the IB network
❥ Is the SM running on the IB network
❥ Is the correct set of switches listed in the ibnodes file (the previous step may be
repeated to review and edit the file as needed)
7. (Switch) If any 9024FC switches were skipped in steps 5 and 6 above, repeat
those steps for those switches. In this case it is recommended to create a
separate file with a name other than ibnodes. An alternate name may be
specified at the prompt:
Select Switch File to Use/Edit
[/etc/sysconfig/iba/ibnodes]:
3.7
Installing InfiniBand on the Remaining Servers
Fast Fabric may now be used to install and configure the remaining hosts and verify
overall operation of the fabric.
1. (Linux) Select the "Host Setup via Fast Fabric" option from the main menu.
2. Select the items shown as "Perform" in the menu below and press the P key
to perform them:
SilverStorm Technologies Inc. IB Host Setup Menu (4.1.1.0.15)
Fast Fabric Host List: /etc/sysconfig/iba/hosts
0) Edit Config and Select/Edit Hosts Files [Perform]
1) Verify Hosts via Ethernet ping [Perform]
2) Verify rsh/rcp Configured [ Skip ]
3) Setup Password-less ssh/scp [Perform]
4) Copy /etc/hosts to all hosts [ Skip ]
5) Show uname -a for all hosts [Perform]
6) Install/Upgrade InfiniServ Software [Perform]
7) Configure IPoIB IP Address [Perform]
8) Build MPI Test Apps and Copy to Hosts [Perform]
9) Reboot Hosts [Perform]
a) Refresh ssh Known Hosts [ Skip ]
b) Rebuild MPI Library and Tools [ Skip ]
c) Run a command on all hosts [ Skip ]
d) Copy a file to all hosts [ Skip ]
e) View ibtest result files [ Skip ]
P) Perform the selected actions
N) Select None
X) Return to Previous Menu (or ESC)
NOTE:If passwordless root login via rsh is to be used during fabric setup and
operation, also select "Verify rsh/rcp Configured". However, using ssh instead
is recommended, in which case this step can be skipped.
NOTE:If /etc/hosts will be used for name resolution (as opposed to DNS),
also select "Copy /etc/hosts to all hosts".
3. (All) "Edit Config and Select/Edit Hosts Files" will permit the hosts and
fastfabric.conf files to be edited. When placed in the editor for fastfabric.conf,
review all the settings, in particular FF_IPOIB_SUFFIX,
ff_host_basename_to_ipoib, ff_host_basename, FF_IPOIB_NETMASK,
FF_PRODUCT, FF_PACKAGES, FF_INSTALL_OPTIONS,
FF_UPGRADE_OPTIONS and FF_ALL_ANALYSIS. Consult appendix B for
more information about fastfabric.conf.
NOTE:During setup of passwordless ssh, Fast Fabric will provide the opportunity
to enter the host root password interactively when needed. Therefore, it
is recommended not to place it within fastfabric.conf. If the root password
for the hosts is instead kept in fastfabric.conf, it is recommended to change
the fastfabric.conf permissions to 0600 (i.e., root-only access).
When placed in the editor for hosts, create the file with a list of the host names
(the TCP/IP management network names) of all hosts except the IB Management
node from which Fast Fabric is presently being run, one entry per line. For example:
host1
host2
NOTE:Do not list the IB Management Node itself (i.e., the node where Fast
Fabric is currently running).
If multiple IB Management Nodes are to be used, they may be listed at this time
and Fast Fabric can aid in their initial installation and verification.
For further details about the file format, refer to the section “Selection of Hosts” on
page 5-3.
4. (All) "Verify Hosts via Ethernet ping" will ping each selected host over the
management network. If all hosts were found, continue to the next step. If
some hosts were not found, abort out of the menu and review the following for
those hosts which were not found:
❥ Is the host powered on and booted
❥ Is the host connected to the management network
❥ Are the host management network IP address and network settings consistent with
DNS or /etc/hosts
❥ Is the Management node connected to the management network
❥ Are the Management node IP address and network settings correct
❥ Is the management network itself up (switches, routers, etc)
❥ Is the correct set of hosts listed in the hosts file (the previous step may be repeated
to review and edit the file as needed)
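When the menu reports missing hosts, re-pinging just those hosts from a shell can confirm whether the problem persists while working through the checklist above. This is a minimal sketch using the standard Linux ping command, not the pingall tool itself; the function name is hypothetical, and the host file is assumed to contain plain host names, one per line, with no include lines.

```shell
# check_ping HOSTFILE: ping each host named in HOSTFILE once (2-second
# timeout) and print the names of hosts that did not reply.
# Returns 1 if any host failed to reply, 0 otherwise.
check_ping() {
    local host failed=0
    while read -r host; do
        [ -z "$host" ] && continue          # skip blank lines
        if ! ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host"
            failed=1
        fi
    done < "$1"
    return "$failed"
}
```

For example, check_ping /tmp/missing_hosts prints only the hosts that still do not answer, which narrows the checklist to those machines.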
5. (Linux) "Verify rsh/rcp Configured" will confirm that passwordless rsh/rcp is
properly configured such that the IB Management Node can access all the other
hosts.
NOTE:It is recommended that ssh be used instead, in which case this step may
be skipped.
6. (Linux) "Setup Password-less ssh/scp" will set up secure password-less ssh
such that the IB Management Node can securely log in to all the other hosts as
root via the management network without requiring a password.
Password-less ssh is required by Fast Fabric, MPI test applications and most
versions of MPI (including QuickSilver MPI).
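Once the menu item has completed, password-less access can be spot-checked from the IB Management Node. The sketch below relies on standard OpenSSH options: BatchMode=yes makes ssh fail rather than prompt, so any host that still requires a password is reported. The function name is hypothetical, and the host file is again assumed to list plain host names, one per line.

```shell
# check_ssh HOSTFILE: confirm that root can log in to each listed host
# without a password. Prints the hosts that fail; returns 1 if any failed.
check_ssh() {
    local host failed=0
    while read -r host; do
        [ -z "$host" ] && continue          # skip blank lines
        # -n stops ssh from reading (and consuming) the rest of HOSTFILE
        if ! ssh -n -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true >/dev/null 2>&1; then
            echo "$host"
            failed=1
        fi
    done < "$1"
    return "$failed"
}
```

The -n option matters here: without it, ssh reads from the loop's standard input and silently skips the remaining hosts.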
7. (Linux) "Copy /etc/hosts to all hosts" will copy the /etc/hosts file on this host to
all the other selected hosts.
NOTE:If DNS is being used, this step should be skipped.
NOTE:Typically, /etc/resolv.conf is set up as part of OS installation for each host.
However, if /etc/resolv.conf was not set up on all the hosts during OS
installation, the Fast Fabric "Copy a file to all hosts" operation can be
used at this time to copy /etc/resolv.conf from the IB Management Node
to all the other nodes.
8. (Linux) "Show uname -a for all hosts" will show the OS version on all the hosts.
Review the results carefully to verify all the hosts have the expected OS version.
In typical clusters all hosts will be running the same OS and kernel version.
If any hosts are identified with an incorrect OS version, abort this sequence when
prompted and correct the OS on those hosts. Then repeat the preceding setup steps
as necessary for those hosts (there is no harm in repeating them for all the hosts).
9. (Host) "Install/Upgrade InfiniServ Software" will install the IB software on all
the hosts. By default it will look in the current directory for the
$FF_PRODUCT.<VERSION>.tgz file. If it is not found in the current directory,
it will prompt for input of a directory name where this file can be found.
When prompted, select to do an initial installation as follows:
Would you like to do an upgrade install? [y]: n
Would you like to do an initial install/load? [n]: y
NOTE:An initial installation will uninstall any existing InfiniServ software on the
selected hosts. An upgrade install is not appropriate at this step.
If any hosts fail to be installed, use the "View ibtest result files" option to review the
result files from the installation. For more details, see “Interpreting the ibtest log files”
on page 5-68.
10. (Host) "Configure IPoIB IP Address" will create the ifcfg-ib1 files on each host.
The file will be created with a statically assigned IP address. The IPoIB IP
address for each host will be determined by the resolver (Linux host command).
If not found via the resolver, /etc/hosts on the given host will be checked.
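The /etc/hosts fallback described above can be reproduced by hand when checking why a host received an unexpected IPoIB address. This sketch assumes the IPoIB name is formed by appending FF_IPOIB_SUFFIX to the management host name (for example host1 and -ib giving host1-ib); the actual mapping is controlled by the fastfabric.conf settings reviewed earlier, so treat ipoib_name as a hypothetical stand-in. Both function names are hypothetical.

```shell
# ipoib_name HOST SUFFIX: ASSUMED mapping of a management host name to its
# IPoIB host name by simply appending the suffix (host1 + -ib -> host1-ib).
ipoib_name() {
    printf '%s%s\n' "$1" "$2"
}

# hosts_lookup NAME [FILE]: print the address that an /etc/hosts-style FILE
# (default /etc/hosts) assigns to NAME, or nothing if NAME is not listed.
# Text after '#' is treated as a comment.
hosts_lookup() {
    awk -v n="$1" '
        { sub(/#.*/, "") }
        NF { for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}
```

For example, hosts_lookup "$(ipoib_name host1 -ib)" prints the /etc/hosts entry for host1-ib, if any. On Linux, getent hosts host1-ib queries the configured resolvers and /etc/hosts in one step.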
11. (Host) "Build MPI Test Apps and Copy to Hosts" will build the MPI sample
benchmarks on the IB Management Node and copy the resulting object files to
all the hosts. This is in preparation for execution of MPI performance tests and
benchmarks in a later step.
12. (Linux) "Reboot Hosts" will reboot all the selected hosts and ensure they go
down and come back up (as verified via ping over the management network).
When the hosts come back up, they will be running the newly installed IB software.
13. (Linux) If there are any other setup operations which need to be performed on
all hosts, they may be performed using the "Run a command on all hosts" option.
Each time this is executed a Linux shell command (or sequence of commands
separated by semicolons) may be specified to be executed against all selected
hosts.
NOTE:It is recommended at this time to run the "date" command to verify that
the date and time are consistent on all hosts. If needed, "Copy a file to all
hosts" may be used to copy the appropriate files to all hosts to enable
and configure NTP.
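To turn the date check into a pass/fail result, one option is to run date +%s (seconds since the epoch) on every host and compare the values. The sketch assumes the per-host results have already been gathered as "host seconds" pairs, one per line; it does not assume any particular output format from the "Run a command on all hosts" option, and the function name and default 2-second tolerance are illustrative choices.

```shell
# clock_skew [MAX]: read "host epoch_seconds" lines on stdin and print the
# host with the earliest clock, the host with the latest clock, and the
# spread between them. Returns 1 if the spread exceeds MAX seconds (default 2).
clock_skew() {
    awk -v max="${1:-2}" '
        NF >= 2 {
            if (n == 0 || $2 + 0 < lo) { lo = $2 + 0; lohost = $1 }
            if (n == 0 || $2 + 0 > hi) { hi = $2 + 0; hihost = $1 }
            n++
        }
        END {
            if (n == 0) exit 0
            printf "earliest=%s latest=%s spread=%ds\n", lohost, hihost, hi - lo
            exit (hi - lo > max + 0)
        }'
}
```

The spread slightly overstates true skew because the hosts are queried one after another, so the tolerance should allow for the time taken to reach every host.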
3.8
Verifying InfiniBand on the Remaining Servers
Upon completion of the preceding sections, the hosts are all booted, installed and
operational. The following steps verify the operation of the hosts and fabric.
1. (All) Select the "Host Admin via Fast Fabric" option from the main menu.
2. Select the items shown as "Perform" in the menu below and press the P key
to perform them:
SilverStorm Technologies Inc. IB Host Admin Menu (4.1.1.0.15)
Fast Fabric Host List: /etc/sysconfig/iba/allhosts
0) Edit Config and Select/Edit Hosts Files [Perform]
1) Verify Hosts via Ethernet ping [Perform]
2) Summary of Fabric Components [Perform]
3) Show Status of Host IB Ports [ Skip ]
4) Verify Hosts see each other [Perform]
5) Verify Hosts ping via IPoIB [Perform]
6) Refresh ssh Known Hosts [Perform]
7) Check MPI Performance [Perform]
8) Generate all Hosts Problem Report Info [ Skip ]
9) Run a command on all hosts [ Skip ]
a) View ibtest result files [ Skip ]
P) Perform the selected actions
N) Select None
X) Return to Previous Menu (or ESC)
3. (All) "Edit Config and Select/Edit Hosts Files" will permit the hosts and
fastfabric.conf files to be edited. When placed in the editor for fastfabric.conf,
review all the settings, in particular FF_IPOIB_SUFFIX,
ff_host_basename_to_ipoib and ff_host_basename. Consult appendix B for
more information about fastfabric.conf.
When placed in the editor for allhosts, create the file with the IB Management
node's host name (its TCP/IP management network name, shown as mgmthost in
the example below) followed by an include of the hosts file previously created,
one entry per line. For example:
mgmthost
include /etc/sysconfig/iba/hosts
For further details about the file format refer to section “Selection of Hosts” on
page 5-3.
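To confirm which hosts an allhosts file actually selects, the include lines can be expanded by hand. The sketch below handles only the two line types shown in the example (a host name, or include followed by a file path); it is a hypothetical helper, not the parser Fast Fabric uses, and the complete syntax is described in “Selection of Hosts”.

```shell
# expand_hosts FILE: print the host names selected by FILE, replacing each
# "include PATH" line with the (recursively expanded) contents of PATH.
# Blank lines are skipped. There is no protection against include cycles.
expand_hosts() {
    local first rest
    while read -r first rest; do
        case "$first" in
            "") ;;                                 # blank line
            include) expand_hosts "$rest" ;;       # nested host file
            *) echo "$first" ;;                    # plain host name
        esac
    done < "$1"
}
```

For example, expand_hosts /etc/sysconfig/iba/allhosts should list mgmthost followed by every host in /etc/sysconfig/iba/hosts.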
4. (All) "Verify Hosts via Ethernet ping" will ping each selected host over the
management network. If all hosts were found, continue to the next step. If
some hosts were not found, abort out of the menu and review the following for
those hosts which were not found:
❥ Is the host powered on and booted
❥ Is the host connected to management network
❥ Is the host management network IP address and network settings consistent
with DNS or /etc/hosts
❥ Is the Management node connected to the management network
❥ Are the Management node IP address and network settings correct
❥ Is the management network itself up (switches, routers, etc)
❥ Is the correct set of hosts listed in the hosts file (the previous step may be
repeated to review and edit the file as needed)
5. (All) "Summary of Fabric Components" will provide a brief summary of the
counts of components in the fabric including how many switch chips, hosts, and
links are in the fabric. It will also indicate if any 1x links were found (which could
indicate a poorly seated or bad cable). Review the results against the expected
configuration of the cluster.
NOTE:The link count includes some internal links within the switch boxes. This
means that the count displayed will be greater than the actual number of
cables.
If components are missing or 1x links are found, they should be corrected.
Subsequent steps will aid in locating any 1x links.
6. (Host) Optionally, "Show Status of Host IB Ports" allows the state and symbol
error counts of all ports to be reviewed manually.
However, it is recommended to instead run:
iba_report -i 10 -o errors -o slowlinks
on the IB Management node. This checks all the ports in the fabric for links
that have high error rates or are running at a lower speed than expected.
Any such links should be diagnosed and corrected.
7. (Host) "Verify Hosts see each other" will verify that each host can see all the
others via queries to the Subnet Administrator, and that the SA replica on each host
has been fully populated.
8. (Host) "Verify Hosts ping via IPoIB" will verify that IPoIB is properly configured
and running on all the hosts. This is accomplished via the IB management
node pinging each host via IPoIB.
9. (Linux) "Refresh ssh Known Hosts" will refresh the ssh known_hosts file on the
IB management node to include the IPoIB hostnames of all the hosts.
10. (Host) "Check MPI Performance" will do a quick check of PCI and MPI
performance.
This displays the MPI latency and bandwidth between pairs of hosts (1-2, 3-4,
5-6, etc). The results are also written to the test.res file, which may be viewed
via the "View ibtest result files" option. Refer to the section “Interpreting the ibtest log
files” on page 5-68 for more details.
The numbers reported should be checked against the practical bandwidths in
Table 3-1 below. If any pairs are not in the expected performance range,
carefully examine the two hosts involved to verify the PCI slot used, the BIOS
settings, and any motherboard jumpers related to devices on PCI buses or slot
speeds. Also verify that the HCA and riser cards are properly seated.
Table 3-1. Performance Impact
PCI Speed (MHz)   Theoretical Max   Practical Bandwidth
133               1024 MB/sec       800-900 MB/sec
100               770 MB/sec        600-680 MB/sec
66                512 MB/sec        400-450 MB/sec
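When many host pairs are tested, comparing each reported bandwidth with Table 3-1 can be scripted. The sketch below encodes the practical ranges from the table and pairs hosts in file order (1-2, 3-4, ...), assuming that is the order the test uses; extracting the bandwidth figure from test.res is not shown because that file's format is not described here. Both function names are hypothetical.

```shell
# pci_bw_check MHZ MBPS: compare a measured MPI bandwidth in MB/sec with the
# practical range in Table 3-1 for a PCI bus speed of 133, 100 or 66.
# Prints ok, low or high; returns 0 for ok, 1 otherwise, 2 for an unknown speed.
pci_bw_check() {
    local lo hi
    case "$1" in
        133) lo=800; hi=900 ;;
        100) lo=600; hi=680 ;;
        66)  lo=400; hi=450 ;;
        *)   echo "unknown PCI speed: $1" >&2; return 2 ;;
    esac
    awk -v bw="$2" -v lo="$lo" -v hi="$hi" 'BEGIN {
        if (bw + 0 < lo + 0) { print "low"; exit 1 }
        if (bw + 0 > hi + 0) { print "high"; exit 1 }
        print "ok"
    }'
}

# host_pairs FILE: print the hosts of FILE two at a time (1-2, 3-4, ...);
# a final odd host is left unpaired.
host_pairs() {
    awk 'NF { h[n++] = $1 } END { for (i = 0; i + 1 < n; i += 2) print h[i], h[i + 1] }' "$1"
}
```

A "low" result for a pair points at the two hosts printed on the same line by host_pairs, which are the ones to inspect for slot, BIOS and seating problems.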
3.9
Complete Installation of additional IB Management Nodes
If the fabric is to have more than one IB Management Node, the setup of the
additional management nodes may be completed now. The previous steps will
have performed basic software installation, setup and verification on those nodes.
Now the management software itself must be installed and configured.
NOTE:The steps below assume a symmetrical configuration where all IB
management nodes have the same connectivity and capabilities. In
asymmetrical configurations, where the IB management nodes are not all
connected to the same set of management networks and IB subnets, the
files copied to each management node may need to be slightly different.
For example, configuration files for fabric_analysis may indicate different
port numbers, or host files used for Fast Fabric and MPI may need to list
different hosts. For multiple subnet configurations, refer to “Multi-Subnet
Fabrics” on page D-1.
Repeat the following steps on each additional IB Management Node:
1. (All) Install the additional Fabric Access Software components using the
procedure documented in the Fabric Access Software Users Guide. The IB
Management Node must have at least Fast Fabric, the IB Stack and IPoIB
installed and configured. For MPI clusters the IB Management Node should
also include the MPI Runtime and MPI Development packages, and if the user
desires to rebuild MPI itself, the IB Development package and MPI Source
packages will also be required.
NOTE:Do not uninstall or replace existing configuration files that were previously
created, especially IPoIB related configuration files.
2. (All) Copy the Fast Fabric configuration files from the initial IB Management
Node. At least the following files should be copied:
/etc/sysconfig/fastfabric.conf
/etc/sysconfig/iba/hosts
/etc/sysconfig/iba/allhosts
/etc/sysconfig/iba/ibnodes
/etc/sysconfig/iba/chassis
After copying the files, edit the hosts and allhosts files so that on each IB
Management Node the hosts file omits that node itself (but lists all other IB
Management Nodes) and the allhosts file specifies that node.
See appendix B for a complete list of Fast Fabric configuration files.
3. (Linux) Perform "Setup Password-less ssh/scp" in the "Host Setup via Fast
Fabric" menu and "Refresh ssh Known Hosts" in the "Host Admin via Fast
Fabric" menu.
3.10
Configure and Initialize Health Check Tools
For more information on the health check tools, see the detailed discussion in
“Health Check and Baselining Tools” on page 5-69. The health check tools may
be run on one or more IB management nodes within the cluster. Follow this
procedure on each IB management node from which the health check tools
will be used.
1. (All) Edit fastfabric.conf and review the following parameters:
FF_ANALYSIS_DIR, FF_ALL_ANALYSIS, FF_FABRIC_HEALTH,
FF_CHASSIS_CMDS, FF_CHASSIS_HEALTH, and FF_ESM_CMDS.
FF_ALL_ANALYSIS should be updated to reflect the type of SM (esm or
hostsm).
2. (All) If using Embedded SM(s) in QLogic IB Chassis, create
/etc/sysconfig/iba/esm_chassis listing the chassis which are running SMs.
Create the file with a list of the chassis names (the TCP/IP Ethernet management
port names assigned above) or IP addresses (names are recommended), one entry
per line. For example:
Chassis1
Chassis2
For further details about the file format refer to the section “Selection of Chassis”
on page 5-4.
3. (All) Perform a health check using: all_analysis -e. If any errors are
encountered, resolve them and rerun all_analysis -e until a clean run
occurs.
4. (All) Create a cluster configuration baseline using: all_analysis -b.
5. (All) If desired, schedule regular runs of all_analysis via cron or other
mechanisms. Consult the Linux OS documentation for more information on
cron. Also consult the section “Health Check and Baselining Tools” on
page 5-69 for more information about all_analysis and its automated use.
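For the scheduled runs mentioned in step 5, a crontab entry is one option. The helper below only prints the line; the schedule, the log file location and the all_analysis path are assumptions to adapt (cron runs with a minimal PATH, so a full path is safer), and the all_analysis options to use should follow “Health Check and Baselining Tools”. The function name is hypothetical.

```shell
# health_cron_line SCHEDULE COMMAND...: print a crontab line that runs
# COMMAND on SCHEDULE and appends its output to a log file.
# The log path below is an illustrative assumption.
health_cron_line() {
    local schedule="$1"
    shift
    printf '%s %s >> /var/log/all_analysis.cron.log 2>&1\n' "$schedule" "$*"
}

# Example (review the output before installing; paths are assumptions):
#   ( crontab -l 2>/dev/null; health_cron_line '0 2 * * *' /full/path/to/all_analysis ) | crontab -
```

Printing the line first, rather than editing the crontab directly, makes it easy to review the entry before it is installed.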
3.11
Running HPL
As part of the installation process, a set of common MPI benchmarks has been
installed. One of the more popular measures of overall performance is HPL, the
application used to rate systems on the Top 500 list. The following steps make
some initial runs of HPL and provide initial baseline numbers. The
defaults provided should perform within 10%-20% of optimal HPL results for the
cluster. Tuning for that additional 10%-20% is beyond the scope of this document.
1. (Host) To run HPL, first select a configuration file appropriate to your cluster.
2. (Host) Now create the file /opt/iba/src/mpi_apps/mpi_hosts listing the host
names of all the hosts. Depending on your selection of
VIADEV_PATH_METHOD in /opt/iba/src/mpi_apps/mpi.param.hpl, the user
can specify Ethernet or IPoIB host names. The default config will allow either.
3. (Host) Now run HPL:
It is best to start with a small configuration to verify HPL has been properly
compiled:
a. cd /opt/iba/src/mpi_apps
b. ./config_hpl 2t
will configure a two process test run of HPL.
c. ./run_hpl 2
will run it.
Since this is a very small problem size, the performance of the run will be much
lower than the potential of the machine. So do not worry about performance,
just whether or not the run was successful.
At this point the user is ready to move onto full scale HPL runs. Assorted sample
HPL.dat files are provided in /opt/iba/src/mpi_apps/hpl-config. These files are a
good starting point for most clusters and should get within 10-20% of the optimal
performance for the cluster. The problem sizes used assume a cluster with 1GB of
physical memory per processor (e.g., for a 2 processor node, 2GB of node memory
is assumed). For each cluster size, 4 files are provided:
t - a very small test run (5000 problem size)
s - a small problem size on the low end of optimal problem sizes
m - a medium problem size
l - a large problem size
These can be selected using config_hpl. The following command displays the
preconfigured problem sizes available:
./config_hpl
For example, to do a small run for a 256 processor cluster (i.e., 128 nodes of dual
CPU systems):
1. Type ./config_hpl 256s and press Enter.
2. Type ./run_hpl 256 and press Enter.
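The config_hpl and run_hpl arguments follow from the process count: number of nodes times processors per node, with a t, s, m or l suffix selecting the problem size. The helper below simply performs that arithmetic; it is a hypothetical convenience, it does not check that a matching configuration is actually provided (run ./config_hpl with no arguments for that list), and the sample files still assume about 1GB of memory per processor.

```shell
# hpl_commands NODES CPUS_PER_NODE SIZE: print the config_hpl and run_hpl
# commands for NODES hosts with CPUS_PER_NODE processors each, where SIZE
# is one of t, s, m or l.
hpl_commands() {
    case "$3" in
        t|s|m|l) ;;
        *) echo "SIZE must be t, s, m or l" >&2; return 1 ;;
    esac
    local procs=$(( $1 * $2 ))
    echo "./config_hpl ${procs}$3"
    echo "./run_hpl ${procs}"
}
```

For the 128-node dual-CPU example above, hpl_commands 128 2 s prints ./config_hpl 256s and ./run_hpl 256.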
During these runs the user should use top on a node to monitor memory and CPU
usage. The xhpl process should use 98-99% of the CPU. If any other processes are
taking more than 1-2%, review the host configuration and stop these extra processes
if possible. HPL is very sensitive to swapping. If a lot of swapping is seen and xhpl
is dropping below 97% for long durations, this may indicate a problem size that is
too large for the memory and OS configuration.
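Besides watching top interactively, the per-process CPU figure can be checked from a script. The sketch below flags any value under the 97% threshold mentioned above. On Linux, ps -C xhpl -o %cpu= prints one percentage per xhpl process; note that ps reports average CPU use over each process's lifetime, so it reacts more slowly than top. The function name is hypothetical.

```shell
# xhpl_cpu_check [MIN]: read CPU percentages (one per line) on stdin and
# print any below MIN (default 97). Returns 1 if any were found.
xhpl_cpu_check() {
    awk -v min="${1:-97}" '
        NF && $1 + 0 < min + 0 { print "xhpl at " $1 "% CPU"; low = 1 }
        END { exit low }'
}

# Example on a compute node during a run:
#   ps -C xhpl -o %cpu= | xhpl_cpu_check
```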
At this point the user can continue to tune HPL to refine performance. Parameters
in /opt/iba/src/mpi_apps/mpi.param.hpl and in HPL.dat can all affect HPL
performance. In addition, the selection of compiler and BLAS math library may also
significantly affect performance. New HPL.dat files may be placed in
/opt/iba/src/mpi_apps/hpl-config, and config_hpl can then be used to select them and
copy them to all nodes in the run. Alternately, scpall may be used to copy the file to all nodes.
Refer to the section “Basic Setup and Administration Tools” on page 5-11 for more
information on scpall.
3.12
Upgrading IB software
If the InfiniBand software on an existing cluster that has already been installed and
verified needs to be upgraded, follow the steps below.
1. (All) On each IB Management Node, perform an upgrade installation of the
Fabric Access Software using the procedure documented in the Fabric Access
Software Users Guide. Each IB Management Node must have at least Fast
Fabric, the IB Stack and IPoIB installed and configured. For MPI clusters the
IB Management Nodes should also include the MPI Runtime and MPI
Development packages, and if the user desires to rebuild MPI itself, the IB
Development package and MPI Source packages will also be required.
After completing the install, reboot each of the IB Management Nodes to ensure
they are running the new IB software.
NOTE:Ensure that existing configuration is appropriately upgraded, especially
Fast Fabric and IPoIB related configuration files. Consult the Fabric
Access Software Users Guide and release notes for further information.
2. (All) Select the "Host Setup via Fast Fabric" option from the main menu.
3. Select the items shown as "Perform" in the menu below and press the P key
to perform them:
SilverStorm Technologies Inc. IB Host Setup Menu (4.1.1.0.15)
Fast Fabric Host List: /etc/sysconfig/iba/hosts
0) Edit Config and Select/Edit Hosts Files [Perform]
1) Verify Hosts via Ethernet ping [ Skip ]
2) Verify rsh/rcp Configured [ Skip ]
3) Setup Password-less ssh/scp [ Skip ]
4) Copy /etc/hosts to all hosts [ Skip ]
5) Show uname -a for all hosts [ Skip ]
6) Install/Upgrade InfiniServ Software [Perform]
7) Configure IPoIB IP Address [ Skip ]
8) Build MPI Test Apps and Copy to Hosts [ Skip ]
9) Reboot Hosts [Perform]
a) Refresh ssh Known Hosts [ Skip ]
b) Rebuild MPI Library and Tools [ Skip ]
c) Run a command on all hosts [ Skip ]
d) Copy a file to all hosts [ Skip ]
e) View ibtest result files [ Skip ]
P) Perform the selected actions
N) Select None
X) Return to Previous Menu (or ESC)
4. (All) "Edit Config and Select/Edit Hosts Files" will permit the hosts and
fastfabric.conf files to be edited. When placed in the editor for fastfabric.conf,
review all the settings, in particular FF_PRODUCT, FF_PACKAGES
and FF_UPGRADE_OPTIONS. See appendix B for more information about
fastfabric.conf.
Select a hosts list file which lists all the hosts except the IB Management nodes.
If necessary create a new file at this time, potentially based on the existing
/etc/sysconfig/iba/hosts file.
NOTE:Do not list any of the IB Management Nodes (i.e., the nodes that have
Fast Fabric installed).
NOTE:The file may list the Management Network or IPoIB hostnames for the
selected hosts.
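A hosts list without the management nodes can be derived from the existing hosts file with standard grep. In this minimal sketch, the function name and the separate file listing the management node names are hypothetical; names must be written the same way in both files (management network or IPoIB form), and include lines in the hosts file are not expanded.

```shell
# hosts_without MGMTFILE HOSTFILE: print the lines of HOSTFILE that do not
# exactly match (whole line, fixed string) any line of MGMTFILE.
hosts_without() {
    grep -vxF -f "$1" "$2"
}

# Example (file names are illustrative):
#   hosts_without /root/mgmt_nodes /etc/sysconfig/iba/hosts > /etc/sysconfig/iba/hosts.upgrade
```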
5. (Host) "Install/Upgrade InfiniServ Software" will upgrade the IB software on all
the selected hosts. By default it will look in the current directory for the
$FF_PRODUCT.<VERSION>.tgz file. If it is not found in the current directory,
it will prompt for input of a directory name where this file can be found.
When prompted, select to do an upgrade installation as follows:
Would you like to do an upgrade install? [y]: y
NOTE:An upgrade installation will update any existing InfiniServ software on the
selected hosts. An upgrade install is only valid for hosts which already
have a previous version of InfiniServ software installed.
If any hosts fail to be updated, use the "View ibtest result files" option to review
the result files from the update. See the section “Interpreting the ibtest log files”
on page 5-68 for more details.
6. (Linux) If there are any other setup operations which need to be performed on
all hosts, they may be performed using the "Run a command on all hosts" option.
Each time this is executed a Linux shell command (or sequence of commands
separated by semicolons) may be specified to be executed against all selected
hosts.
NOTE:Check the relevant release notes for the new InfiniServ release
being installed for any such additional required steps.
7. (Linux) "Reboot Hosts" will reboot all the selected hosts and ensure they go
down and come back up (as verified via ping over the management network).
When the hosts come back up, they will be running the IB software installed.
8. Repeat the verification steps for the fabric as discussed in the section “Verifying
InfiniBand on the Remaining Servers” on page 3-16.
Section 4
Fast Fabric TUI Menu
Fast Fabric is easiest to use from the textual user interface (TUI) menu system.
The menu system provides a way to perform all common tasks and presents
common options. Additional less common options are available directly via the
Command Line Tools documented in the next section.
In the sections that follow, the menu system will be discussed. The majority of menu
items directly invoke various Fast Fabric command tools. As such the section on
each menu item will indicate what command tool it invokes and a summary of the
operation performed. For further details about the given command tool, consult the
relevant section within “Basic Setup and Administration Tools” on page 5-11.
Some of the menu items are only applicable when Linux is being used. They will
be marked with (Linux). Similarly some of the menu items are only applicable when
QuickSilver Linux IB software is being used on the hosts. Those will be marked
with (Host). All menu items which are applicable only when SilverStorm IB Switches
or Chassis are being used will be marked with (Switch). All remaining menu items
are generally applicable to all environments and will be marked with (All).
NOTE:Some of the Linux menu items may be applicable to other Unix-like
operating systems if it is desired to enable the use of non-IB specific Fast
Fabric tools (such as cmdall) against the given hosts.
The main menu can be invoked using the iba_config command. The main menu
is as follows:
SilverStorm Technologies Inc. InfiniBand 4.1.1.0.15 Software
1) Show Installed Software
2) Reconfigure IP over IB
3) Reconfigure Driver Autostart
4) Update HCA Firmware
5) Generate Supporting Information for Problem Report
6) Host Setup via Fast Fabric
7) Host Admin via Fast Fabric
8) Chassis Admin via Fast Fabric
9) Externally Managed Switch Admin via Fast Fabric
a) Uninstall Software
X) Exit
In the above menu, items 6-9 represent the Fast Fabric menus. The operation of
this menu is the same as the INSTALL and iba_config functions documented in the
QuickSilver Fabric Access Software Users Guide. Selecting items 1-9 will display
the given submenu. Pressing X will exit the menu system.
Selection of a Fast Fabric menu (6-9) will present a submenu such as below:
SilverStorm Technologies Inc. IB Host Setup Menu (4.1.1.0.15)
Fast Fabric Host List: /etc/sysconfig/iba/hosts
0) Edit Config and Select/Edit Hosts Files [Perform]
1) Verify Hosts via Ethernet ping [Perform]
2) Verify rsh/rcp Configured [ Skip ]
3) Setup Password-less ssh/scp [Perform]
4) Copy /etc/hosts to all hosts [ Skip ]
5) Show uname -a for all hosts [Perform]
6) Install/Upgrade InfiniServ Software [Perform]
7) Configure IPoIB IP Address [Perform]
8) Build MPI Test Apps and Copy to Hosts [Perform]
9) Reboot Hosts [Perform]
a) Refresh ssh Known Hosts [Perform]
b) Rebuild MPI Library and Tools [ Skip ]
c) Run a command on all hosts [ Skip ]
d) Copy a file to all hosts [ Skip ]
e) View ibtest result files [ Skip ]
P) Perform the selected actions
N) Select None
X) Return to Previous Menu (or ESC)
The submenus present operations in the order in which they would typically be used
during an installation. Pressing the keys corresponding to menu items (0-9, a-e in
the example above) toggles the Skip/Perform selection for the given item. As
shown in the example above, more than one item may be selected. Once the desired
set of items has been selected, press P. To unselect all items, press N. Pressing
X or ESC will exit this menu and return to the Main Menu.
If more than one item is selected, the items will be performed in the order shown in
the menu. This is the typical order desired during fabric setup. To perform items
in a different order, select a single item and press P to perform it by itself, then
repeat for the next item. An opportunity will be presented after each item to abort:
Hit any key to continue (or ESC to abort)...
If ESC is pressed, the sequence of operations is aborted and control returns to the
previous menu. Any other key results in the next selected menu item being
performed. This prompt is also shown after the last selected item completes,
providing an opportunity to review the results before the screen is cleared to display
the menu.
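The ordering rule can be modelled in a few lines: whatever order the keys are toggled in, the selected items run in menu order. This is an illustrative model of the behavior described above, not Fast Fabric code; the function name is hypothetical, and it uses the 0-9, a-e keys of the Host Setup menu.

```shell
# menu_order KEY...: given the selected menu keys in any order, print them
# in the order the menu performs them (0-9, then a-e), without duplicates.
menu_order() {
    local k out=""
    for k in 0 1 2 3 4 5 6 7 8 9 a b c d e; do
        case " $* " in
            *" $k "*) out="${out:+$out }$k" ;;   # key was selected
        esac
    done
    echo "$out"
}
```

For example, selecting 9, then 6, then 0 still performs 0, 6 and 9 in that order; running a different order requires performing items one at a time.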
At the top of each Fast Fabric menu, the file listing the components to operate on
is shown. For example:
Fast Fabric Host List: /etc/sysconfig/iba/hosts
On each Fast Fabric menu, item 0 will permit a different file to be selected and will
permit the editing of the file (using the editor selected via the EDITOR environment
variable). In addition it will also permit review and editing of the fastfabric.conf file.
The fastfabric.conf file guides the overall configuration of Fast Fabric and describes
cluster specific attributes of how Fast Fabric will operate. It is discussed in greater
detail in appendix B.
During the execution of each menu selection, the actual Fast Fabric command line
tool being used will be shown. This can be used as an educational aid to learn the
tools.
4.1
Host Setup via Fast Fabric
This menu is focused on initial host setup and installation of IB software on all the
hosts.
4 – Fast Fabric TUI Menu
0) Edit Config and Select/Edit Hosts Files [ Skip ]
1) Verify Hosts via Ethernet ping [ Skip ]
2) Verify rsh/rcp Configured [ Skip ]
3) Setup Password-less ssh/scp [ Skip ]
4) Copy /etc/hosts to all hosts [ Skip ]
5) Show uname -a for all hosts [ Skip ]
6) Install/Upgrade InfiniServ Software [ Skip ]
7) Configure IPoIB IP Address [ Skip ]
8) Build MPI Test Apps and Copy to Hosts [ Skip ]
9) Reboot Hosts [ Skip ]
a) Refresh ssh Known Hosts [ Skip ]
b) Rebuild MPI Library and Tools [ Skip ]
c) Run a command on all hosts [ Skip ]
d) Copy a file to all hosts [ Skip ]
e) View ibtest result files [ Skip ]
P) Perform the selected actions
N) Select None
X) Return to Previous Menu (or ESC)
4.1.1
Edit Configuration and Select/Edit Hosts Files
(All) This will permit the hosts and fastfabric.conf files to be edited. The hosts file
selected and created via this menu should not list the Fast Fabric host itself. After
editing the two files, an opportunity is given to edit them again or continue forward.
Selected Host File: /etc/sysconfig/iba/hosts
Do you want to edit/review/change the files? [y]:
The default (y) repeats the editing process; answer "n" to continue
forward.
Refer to the section “Selection of Hosts” on page 5-3 for more details about the
format of the hosts file.
4.1.2
Verify Hosts via Ethernet ping
(All) This will run the pingall command. All the hosts listed will be pinged via the
Management Network.
4.1.3
Verify RSH/RCP Configured
(Linux) This will run the check_rsh command. This will confirm that passwordless
rsh/rcp is properly configured such that the IB Management Node can access all
the other hosts.
NOTE: It is recommended that SSH be used in place of rsh/rcp and the check_rsh
command.
4.1.4
Setup Password-less SSH/SCP
(Linux) This will run the setup_ssh -i "" command. This will set up secure
password-less SSH such that the IB Management Node can securely log in to all
the other hosts as root via the management network without requiring a password.
Password-less SSH is required by Fast Fabric, MPI test applications and most
versions of MPI (including QuickSilver MPI).
4.1.5
Copy /etc/hosts to all hosts
(Linux) This will run the scpall /etc/hosts /etc/hosts command to copy the /etc/hosts
file on this host to all the other selected hosts. This is not necessary when using a
DNS server to resolve hostnames for the cluster.
4.1.6
Show uname -a for all hosts
(Linux) This will run the cmdall "uname -a" command to show the OS version on
all the hosts. Review the results carefully to verify all the hosts have the expected
OS version. In typical clusters all hosts will be running the same OS and kernel
version.
4.1.7
Install/Upgrade QuickSilver Software
(Host) This will run the ibtest load or ibtest update command to install the
IB software on all the hosts. By default it will look in the current directory for the
$FF_PRODUCT.<VERSION>.tgz file. If it is not found in the current directory, it
will prompt for input of a directory name where this file can be found.
Prompts will guide the user through options:
❥ upgrade - updates all servers with the new release. Only components previously
installed are upgraded. This will fail for any hosts which have no InfiniServ IB software
installed.
❥ initial install/load - uninstalls any existing InfiniServ IB software and installs the
given release based on the installation options specified in fastfabric.conf.
After the install is completed, the hosts will still need to be rebooted to bring up the
new IB drivers. This can be performed using the "Reboot Hosts" menu item
discussed below.
If any hosts fail to be updated, use the View ibtest result files option to
review the result files from the update. For more details, see “Interpreting the ibtest
log files” on page 5-68.
4.1.8
Configure IPoIB IP Address
(Host) This will run the ibtest configipoib command to create the ifcfg-ib1 files
on each host. The file will be created with a statically assigned IP address. The
IPoIB IP address for each host will be determined by the resolver (Linux host
command). If not found via the resolver, /etc/hosts on the given host will be
checked.
4.1.9
Build MPI Test Apps and Copy to Hosts
(Host) This will build the MPI sample benchmarks on the IB Management Node
and copy the resulting object files to all the hosts. This is in preparation for execution
of MPI performance tests and benchmarks in a later step.
4.1.10
Reboot Hosts
(Linux) This will run the ibtest reboot command to reboot all the selected hosts
and ensure they go down and come back up (as verified via ping over the
management network). When the hosts come back up, they will be running the IB
software installed.
4.1.11
Refresh SSH Known Hosts
(Linux) This will run the setup_ssh -C -i "" command to refresh the ssh known
hosts list on this server for the Management Network. This may be used to update
security for this host if hosts are replaced, reinstalled, renamed, or repaired.
4.1.12
Rebuild MPI Library and Tools
(Host) This will rebuild the InfiniServ MPI Library itself and related tools (such as
mpirun). This will be performed via the dobuild tool supplied with the InfiniServ MPI
Source. Consult the QuickSilver Fabric Access Software Users Guide for more
information.
4.1.13
Run a command on all hosts
(Linux) This will run the cmdall command. A Linux shell command (or sequence
of commands separated by semicolons) may be specified to be executed against
all selected hosts.
4.1.14
Copy a file to all hosts
(Linux) This will run the scpall command. A file on the local host may be specified
to be copied to all selected hosts.
4.1.15
View ibtest result files
(All) This permits viewing of the test.log and test.res files that reflect the
results from ibtest runs (such as for installing QuickSilver software or rebooting all
hosts per menu items above). The user is also given the option to remove these
files after viewing them.
If not removed, subsequent runs of ibtest from within the current directory will
continue to append to these files.
4.2
Host Admin via Fast Fabric
This menu is focused on verifying hosts and the fabric as well as administration of
all the hosts.
SilverStorm Technologies Inc. IB Host Admin Menu (4.1.1.0.15)
Fast Fabric Host List: /etc/sysconfig/iba/allhosts
0) Edit Config and Select/Edit Hosts Files [ Skip ]
1) Verify Hosts via Ethernet ping [ Skip ]
2) Summary of Fabric Components [ Skip ]
3) Show Status of Host IB Ports [ Skip ]
4) Verify Hosts see each other [ Skip ]
5) Verify Hosts ping via IPoIB [ Skip ]
6) Refresh ssh Known Hosts [ Skip ]
7) Check MPI Performance [ Skip ]
8) Generate all Hosts Problem Report Info [ Skip ]
9) Run a command on all hosts [ Skip ]
a) View ibtest result files [ Skip ]
P) Perform the selected actions
N) Select None
X) Return to Previous Menu (or ESC)
4.2.1
Edit Config and Select/Edit Hosts Files
(All) This will permit the allhosts and fastfabric.conf files to be edited. The allhosts
file selected and created via this menu should list the Fast Fabric host itself. After
editing the two files, an opportunity is given to edit them again or continue forward.
Selected Host File: /etc/sysconfig/iba/allhosts
Do you want to edit/review/change the files? [y]:
The default (y) repeats the editing process; answer n to continue forward.
Refer to the section “Selection of Hosts” on page 5-3 for more details about the
format of the allhosts file.
4.2.2
Verify Hosts via Ethernet Ping
(All) This will run the pingall command. All the hosts listed will be pinged via the
Management Network.
4.2.3
Summary of Fabric Components
(All) This will run the fabric_info command to provide a brief summary of the
counts of components in the fabric including how many switch chips, hosts, and
links are in the fabric. It will also indicate if any 1x links were found (that could
indicate a poorly seated or bad cable). Review the results against the expected
configuration of the cluster.
NOTE: The link count includes some internal links within the switch boxes. This
means that the count displayed will be greater than the actual number of
cables.
4.2.4
Show Status of Host IB Ports
(Host) This will run the showallports command to allow the state and symbol error
counts of all host ports to be manually reviewed.
(All) Instead it is recommended to run:
iba_report -i 10 -o errors -o slowlinks
on the IB Management node. This will check all the ports in the fabric for any links
which have high error rates or are running at a lower speed than expected. Any
such identified links should be diagnosed and corrected.
4.2.5
Verify Hosts see each other
(Host) This will run the ibtest sacache command to verify that each host can
see all the others via queries to the Subnet Administrator and that the SA replica
on each host has been fully populated.
4.2.6
Verify Hosts ping via IPoIB
(Host) This will run the ibtest ipoibping command to verify that IPoIB is
properly configured and running on all the hosts. This is accomplished via the IB
management node pinging each host via IPoIB.
4.2.7
Refresh SSH Known Hosts
(Linux) This will run the setup_ssh -C command to refresh the SSH known hosts
list on this server for the IPoIB and Management Networks. This may be used to
update security for this host if hosts are replaced, reinstalled, renamed, or repaired.
4.2.8
Check MPI Performance
(Host) This will run the ibtest mpiperf command to do a quick check of PCI
and MPI performance.
This displays the MPI latency and bandwidth between pairs of hosts (1-2, 3-4, 5-6,
etc.). The results are also written to the test.res file, which may be viewed via the
View ibtest result files option. Refer to the section “Interpreting the ibtest log
files” on page 5-68 for more details.
The numbers reported should be checked against the practical PCI speeds in the
Performance Impact section. If any pairs are not in the expected performance range,
carefully examine the two hosts involved to verify the PCI slot used, the BIOS
settings, and any motherboard jumpers related to devices on PCI buses or to slot
speeds. Also verify that the HCA and riser cards are properly seated.
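As a minimal sketch of the pairing and review step described above, the Python below pairs hosts in list order (1-2, 3-4, ...) and flags pairs whose measured bandwidth falls below an expected minimum. The function names are hypothetical; the threshold would come from the Performance Impact section and the measurements from test.res. The manual does not say how an odd final host is handled, so this sketch simply leaves it unpaired (an assumption).

```python
def mpi_pairs(hosts: list) -> list:
    """Pair hosts in list order: (1st, 2nd), (3rd, 4th), ...

    Assumption: with an odd number of hosts, the last one is left unpaired.
    """
    return [(hosts[i], hosts[i + 1]) for i in range(0, len(hosts) - 1, 2)]


def pairs_to_examine(bandwidth: dict, expected_min: float) -> list:
    """Return the host pairs whose measured bandwidth is below expected_min.

    bandwidth maps (host_a, host_b) -> measured bandwidth, expressed in the
    same unit as expected_min.
    """
    return [pair for pair, bw in bandwidth.items() if bw < expected_min]


if __name__ == "__main__":
    print(mpi_pairs(["n001", "n002", "n003", "n004", "n005"]))
    # [('n001', 'n002'), ('n003', 'n004')]
```

Each pair returned by pairs_to_examine identifies the two hosts whose PCI slot, BIOS settings, jumpers and card seating should be checked.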
4.2.9
Generate all Hosts Problem Report Info
(Host) This will run the captureall command to collect configuration and status
information from all hosts and generate a single *.tgz file which can be sent to
the Support Representative.
4.2.10
Run a command on all hosts
(Linux) This will run the cmdall command. A Linux shell command (or sequence
of commands separated by semicolons) may be specified to be executed against
all selected hosts.
4.2.11
View ibtest result files
(All) This permits viewing of the test.log and test.res files which reflect the
results from ibtest runs (such as those for installing QuickSilver software or
rebooting all hosts per menu items above). The user is also given the option to
remove these files after viewing them.
If not removed, subsequent runs of ibtest from within the current directory will
continue to append to these files.
4.3
QLogic IB Chassis Admin via Fast Fabric
This menu is focused on administration of QLogic 9000 series internally managed
IB chassis.
SilverStorm Technologies Inc. IB Chassis Admin Menu (4.1.1.0.15)
Fast Fabric Chassis List: /etc/sysconfig/iba/chassis
0) Edit Config and Select/Edit Chassis Files [ Skip ]
1) Verify Chassis via Ethernet ping [ Skip ]
2) Update Chassis Firmware [ Skip ]
3) Show Status of Chassis IB Ports [ Skip ]
4) Reboot Chassis [ Skip ]
5) Generate all Chassis Problem Report Info [ Skip ]
6) Run a command on all chassis [ Skip ]
7) View ibtest result files [ Skip ]
P) Perform the selected actions
N) Select None
X) Return to Previous Menu (or ESC)
4.3.1
Edit the Configuration and Select/Edit Chassis Files
(Switch) This will permit the chassis and fastfabric.conf files to be edited.
The chassis file selected and created via this menu should not list the Fast Fabric
host itself. After editing the two files, an opportunity is given to edit them again or
continue forward.
Selected Chassis File: /etc/sysconfig/iba/chassis
Do you want to edit/review/change the files? [y]:
The default (y) repeats the editing process; answer n to continue forward.
Refer to section “Selection of Chassis” on page 5-4 for more details about the format
of the chassis file.
4.3.2
Verify Chassis via Ethernet Ping
(Switch) This will run the pingall -C command to ping each selected chassis
over the management network.
4.3.3
Update Chassis Firmware
(Switch) This will run the ibtest -C update command to permit the chassis
firmware version to be verified and updated as needed.
NOTE: The chassis must be running firmware version 4.0.0.4.3 or later to perform
this function. If the chassis is not up to this level, it will need to be manually
updated via the chassis GUI. See the SilverStorm 9000 Users Guide for
more information.
NOTE: Consult the relevant chassis firmware release notes to ensure any
prerequisites for the upgrade to the new firmware level have been met
prior to performing the upgrade via Fast Fabric.
Prompts will guide the user through options:
❥ push - push firmware to each chassis but do not change selected nor running
firmware
❥ select - push firmware to each chassis and select it for use on next reboot
❥ run - push firmware to each chassis, select it for use and, if it is not the presently
running firmware, reboot the chassis
Additional options prompted for:
❥ parallel vs serial update
❥ selection of firmware files or directory containing .pkg files
❥ prompting for chassis password (default is to have password in fastfabric.conf)
If any chassis fails to be updated, use the View ibtest result files option
to review the result files from the update. Refer to the section “Interpreting the ibtest
log files” on page 5-68 for more details.
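The difference between the push, select and run choices above can be summarized as a small decision function. This Python sketch is an illustration under stated assumptions, not the logic inside ibtest: firmware versions are compared as plain strings, and the step names returned are hypothetical labels.

```python
from enum import Enum


class Mode(Enum):
    PUSH = "push"      # copy firmware only
    SELECT = "select"  # copy and select it for the next reboot
    RUN = "run"        # copy, select, and reboot if needed


def plan_update(mode: Mode, new_fw: str, running_fw: str) -> list:
    """List the steps one chassis goes through for the chosen mode."""
    steps = ["push"]
    if mode in (Mode.SELECT, Mode.RUN):
        steps.append("select")
    # Only "run" reboots, and only when the new firmware is not already running.
    if mode is Mode.RUN and new_fw != running_fw:
        steps.append("reboot")
    return steps


if __name__ == "__main__":
    # Example version strings only.
    for mode in Mode:
        print(mode.value, plan_update(mode, "4.1.1.0.15", "4.0.0.4.3"))
```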
4.3.4
Show Status of Chassis IB Ports
(Switch) This will run the showallports -C command to allow the state and symbol
error counts of all chassis ports to be manually reviewed.
(All) Instead it is recommended to run:
iba_report -i 10 -o errors -o slowlinks
on the IB Management node. This will check all the ports in the fabric for any links
which have high error rates or are running at a lower speed than expected. Any
such identified links should be diagnosed and corrected.
4.3.5
Reboot Chassis
(Switch) This will run the ibtest -C reboot command to reboot all the selected
chassis and ensure they go down and come back up (as verified via ping over the
management network).
4.3.6
Generate all Chassis Problem Report Information
(Switch) This will run the captureall -C command to collect configuration and
status information from all chassis and generate a single *.tgz file that can be
sent to the Support Representative.
4.3.7
Run a command on all chassis
(Switch) This will run the cmdall -C command. A Chassis CLI command may
be specified to be executed against all selected chassis.
4.3.8
View ibtest result files
(All) This permits viewing of the test.log and test.res files which reflect the
results from ibtest runs (such as for updating Chassis Firmware or rebooting all
chassis per menu items above). The user is also given the option to remove these
files after viewing them.
If not removed, subsequent runs of ibtest from within the current directory will
continue to append to these files.
4.4
SilverStorm Externally Managed IB Switch Administration via Fast Fabric
This menu is focused on administration of SilverStorm 9024FC externally managed
switches.
SilverStorm Technologies Inc. IB Switch Admin Menu (4.1.1.0.15)
Fast Fabric Externally Managed Switch List:
/etc/sysconfig/iba/ibnodes
0) Edit Config and Select/Edit Switch Files [ Skip ]
1) Verify Switch via Firmware dump [ Skip ]
2) Update Switch Firmware [ Skip ]
3) Reboot Switch [ Skip ]
4) View ibtest result files [ Skip ]
P) Perform the selected actions
N) Select None
X) Return to Previous Menu (or ESC)
4.4.1
Edit Config and Select/Edit Switch Files
(Switch) This will permit the ibnodes and fastfabric.conf files to be edited.
The ibnodes file selected and created via this menu should not list the Fast Fabric
host itself. After editing the two files, an opportunity is given to edit them again or
continue forward.
Selected Chassis File: /etc/sysconfig/iba/chassis
Do you want to edit/review/change the files? [y]:
The default (y) repeats the editing process; answer n to continue forward.
Refer to the section “Selection of Switches” on page 5-7 for more details about the
format of the ibnodes file.
4.4.2
Verify Switch via Firmware Dump
(Switch) Use of this option is not recommended.
4.4.3
Update Switch Firmware
(Switch) This will run the ibtest -n upgrade command to permit the switch
firmware version to be updated and the switch node name to be set.
NOTE: Consult the relevant switch firmware release notes to ensure any
prerequisites for the upgrade to the new firmware level have been met
prior to performing the upgrade via Fast Fabric.
Prompts will guide the user through options:
❥ select - push firmware to each switch and select it for use on next reboot
❥ run - push firmware to each switch, select it for use and reboot switches
Additional options prompted for:
❥ parallel vs serial update
❥ selection of firmware files or directory containing .emfw files
If any switches fail to be updated, use the View ibtest result files option
to review the result files from the update. Refer to the section “Interpreting the ibtest
log files” on page 5-68 for more details.
4.4.4
Reboot Switch
(Switch) This will run the ibtest -n reboot command to reboot all the selected
switches.
4.4.5
View ibtest result files
(All) This permits viewing of the test.log and test.res files that reflect the
results from ibtest runs (such as those for updating Switch Firmware or rebooting
all switches per menu items above). The user is also given the option to remove
these files after viewing them.
If not removed, subsequent runs of ibtest from within the current directory will
continue to append to these files.
Detailed Descriptions of Command Line Tools
Some of the commands are only applicable when Linux is being used. They will
be marked with (Linux). Similarly some of the commands are only applicable when
QuickSilver Linux IB software is being used on the hosts. Those will be marked with
(Host). All commands which are applicable only when SilverStorm IB Switches or
IB Chassis are being used will be marked with (Switch). All remaining commands
are generally applicable to all environments and will be marked with (All).
NOTE: Some of the Linux commands may be applicable to other Unix-like
operating systems if non-IB-specific Fast Fabric tools (such as cmdall)
are to be used against those hosts.
The Fast Fabric tools are installed in directories which are part of the standard Linux
root PATH. Most of the tools are installed in /sbin.
5.1
Common Tool Options
Several options are common to the command line tools and apply to most of
them:
5.1.1
-?
Displays usage information for any of the commands (as does any invalid option).
5.1.2
-p
Runs the operation/command in parallel. This means the operation is performed
simultaneously on batches of 20 hosts, which can greatly reduce the overall
time of an operation. However, a side effect is that any output
from the command will be bursty and intermingled. Therefore this option should
be used for commands where there is no output or the output is of limited interest.
For some commands (such as scpall), this will perform the operation in a quiet
mode to limit output. To change the number of parallel operations,
export TEST_MAX_PARALLEL=# where # is the new number (such as 30).
For more advanced operations (such as ibtest), parallel operation is the default
mode.
Parallel operation can also be disabled by setting FF_MAX_PARALLEL to 1.
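The batch behavior of -p can be pictured with the Python sketch below. It is an assumption-laden model rather than Fast Fabric code: each batch of up to size hosts runs concurrently, the next batch starts only when the whole batch has finished, and op is a hypothetical stand-in for whatever per-host command is run. A size of 1 corresponds to serial operation. The sketch takes the batch size as a parameter and does not read TEST_MAX_PARALLEL or FF_MAX_PARALLEL itself.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(hosts: list, size: int = 20) -> list:
    """Split the host list into consecutive batches of at most `size` hosts."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


def run_in_batches(hosts: list, op, size: int = 20) -> dict:
    """Run op(host) concurrently within each batch; batches run one after another."""
    results = {}
    for batch in batches(hosts, size):
        # Leaving the with-block waits for every host in this batch to finish.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            for host, output in zip(batch, pool.map(op, batch)):
                results[host] = output
    return results


if __name__ == "__main__":
    hosts = [f"n{i:03d}" for i in range(1, 46)]  # 45 hypothetical host names
    print([len(b) for b in batches(hosts)])  # [20, 20, 5]
    print(run_in_batches(hosts[:3], str.upper, size=2))
```

Collecting results per host, as this model does, sidesteps the intermingled output noted above; a real command writing directly to the terminal would still interleave.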
5 – Detailed Descriptions of Command Line Tools
5.1.3
-S
Prompt for password for admin on chassis. By default Fast Fabric operations
against SilverStorm chassis (such as cmdall, captureall, showallports,
and ibtest) obtain the chassis admin password from the
FF_CHASSIS_ADMIN_PASSWORD environment variable, which may be exported directly
or set in fastfabric.conf. Alternatively, the -S option may be used
on these commands in which case the chassis admin password will be prompted
for interactively. The password is prompted for once and the same password is
then used to login to each chassis during the operation.
NOTE: Newer versions of SilverStorm chassis firmware permit ssh keys to be
configured within the chassis for secure password-less login. In that
case there is no need to configure FF_CHASSIS_ADMIN_PASSWORD,
and FF_CHASSIS_LOGIN_METHOD can be set to ssh. Consult the
SilverStorm 9000 Users Guide for more information.
5.1.4
-C
Specifies that the given operation should be performed against chassis. By
default Fast Fabric operations are performed against hosts. However, selected
Fast Fabric commands (such as cmdall, pingall, captureall, and
ibtest) can also operate against SilverStorm internally managed IB chassis.
When -C is specified, the operation will be performed against chassis instead
of hosts (and the selection of chassis options discussed below will be used).
5.1.5
-n or -I
Specifies that the given operation should be performed against
externally-managed switches (such as the SilverStorm 9024FC model IB switch).
By default Fast Fabric operations are performed against hosts. However,
selected Fast Fabric commands (such as ibtest) can also operate against
externally-managed switches. When specified, the operation will be performed
against switches instead of hosts (and the selection of switches options
discussed below will be used).
NOTE: Some commands use -n while others use -I. In a future release this
will be made consistent among all commands.
5.1.6
Selection of Hosts
For operations that are performed against a set of hosts, there are multiple ways
to specify the hosts on which to operate:
1. Small sets of hosts can be easily specified on the command line via the -h
option discussed below.
2. When multiple commands are performed against the same small set of hosts,
the environment variable HOSTS can be used to specify a space-separated list
of hosts.
3. For groups of hosts that will be used often, a file may be created listing the
hosts. The default file is /etc/sysconfig/iba/hosts, which should list all
hosts in the cluster except the host running Fast Fabric itself. Such a file may
then be specified via the -f command line option or the HOSTS_FILE
environment variable.
The tools consider these options in the following order; the first item
listed below that is specified is used for the given command.
1. -h option
2. HOSTS environment variable
3. -f option
4. HOSTS_FILE environment variable
5. /etc/sysconfig/iba/hosts file
For example, if the -h option is used and the HOSTS_FILE environment variable
is also exported, the command will operate only on hosts specified via the -h
option.
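The precedence rule above can be expressed as a short function. This Python sketch is illustrative only: the function name and its return shape are hypothetical, and treating an empty option or variable as unset is an assumption. The same pattern applies to chassis (-H, CHASSIS, -F, CHASSIS_FILE) and switches (-N, IBNODES, -L, IBNODES_FILE).

```python
import os

DEFAULT_HOSTS_FILE = "/etc/sysconfig/iba/hosts"


def host_source(h_opt=None, f_opt=None, env=None):
    """Return ("list", [hosts]) or ("file", path) following the documented order.

    h_opt and f_opt model the -h and -f options; env defaults to os.environ.
    Assumption: an empty or unset option/variable counts as "not specified".
    """
    env = os.environ if env is None else env
    if h_opt:
        return ("list", h_opt.split())
    if env.get("HOSTS"):
        return ("list", env["HOSTS"].split())
    if f_opt:
        return ("file", f_opt)
    if env.get("HOSTS_FILE"):
        return ("file", env["HOSTS_FILE"])
    return ("file", DEFAULT_HOSTS_FILE)


if __name__ == "__main__":
    # -h wins even though HOSTS_FILE is exported, as in the example above.
    print(host_source(h_opt="host1 host2", env={"HOSTS_FILE": "hosts-mpi"}))
```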
5.1.6.1
Host List Files
The -f option or the HOSTS_FILE environment variable may be used to provide
the name of a file containing the list of hosts on which to operate. The default
is /etc/sysconfig/iba/hosts. In some fabrics it may be useful to create
multiple files in /etc/sysconfig/iba representing different subsets of the
fabric from which the user may operate. For example:
/etc/sysconfig/iba/hosts-mpi: list of MPI hosts
/etc/sysconfig/iba/hosts-fs: list of file server hosts
/etc/sysconfig/iba/hosts: list of all hosts except for the Fast Fabric node
/etc/sysconfig/iba/allhosts: list of all hosts including the Fast Fabric node
If a relative path is specified for the -f option or HOSTS_FILE, the current
directory will be checked first, followed by /etc/sysconfig/iba/.
5.1.6.1.1
Host List File Format
Below is a sample host list file:
# this is a comment
192.168.0.4    # host identified by IP address
n001           # host identified by resolvable TCP/IP name
include /etc/sysconfig/iba/hosts-mpi # included file
Each line of the host list file may specify a single host, a comment or another
host list file to include.
Hosts may be specified by IP address or a resolvable TCP/IP host name.
Typically, host names are used for readability. Also, some Fast Fabric tools will
translate the supplied host names to IPoIB hostnames, in which case names are
generally easier to translate than numeric IP addresses. Typically management
network hostnames are specified. However, if desired, IPoIB hostnames or IP
addresses may be used. This can accelerate large file transfers and other
operations.
Files to be included may be specified via an include directive followed by a file
name. File names specified should generally be absolute pathnames. If relative
pathnames are used, they will be searched for within the current directory then
/etc/sysconfig/iba.
Comments may be placed on any line by preceding the comment with a #.
On lines with hosts or include directives, the # must be separated by white space
from any preceding hostname, IP address or included file name.
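To make the file format rules concrete, here is a minimal Python reader for a host list file. It is a sketch under assumptions, not the parser Fast Fabric uses: a # starts a comment only when it begins a word (so n001#x would be read as a host name), a line naming more than one host is rejected, and include loops are reported as errors. Relative names are resolved in the current directory first, then in /etc/sysconfig/iba, as described above. The function names are hypothetical.

```python
import os

SEARCH_DIR = "/etc/sysconfig/iba"


def _resolve(path: str, search_dir: str) -> str:
    """Absolute paths as-is; relative ones: current directory, then search_dir."""
    if os.path.isabs(path) or os.path.exists(path):
        return path
    return os.path.join(search_dir, path)


def _words(line: str) -> list:
    """Words before the first word that starts with '#'."""
    words = []
    for word in line.split():
        if word.startswith("#"):
            break
        words.append(word)
    return words


def read_host_list(path: str, search_dir: str = SEARCH_DIR, _stack: tuple = ()) -> list:
    """Return the hosts listed in `path`, expanding include directives in place."""
    path = _resolve(path, search_dir)
    real = os.path.realpath(path)
    if real in _stack:
        raise ValueError(f"include loop involving {path}")
    hosts = []
    with open(path) as fh:
        for lineno, line in enumerate(fh, 1):
            words = _words(line)
            if not words:
                continue  # blank or comment-only line
            if words[0] == "include" and len(words) == 2:
                hosts += read_host_list(words[1], search_dir, _stack + (real,))
            elif len(words) == 1:
                hosts.append(words[0])
            else:
                raise ValueError(f"{path}:{lineno}: expected one host per line")
    return hosts
```

For example, read_host_list("hosts-mpi") would look for ./hosts-mpi and then /etc/sysconfig/iba/hosts-mpi.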
5.1.6.2
Explicit host names
When hosts are explicitly specified via the -h option or the HOSTS environment
variable, a space separated list of host names (or IP addresses) may be supplied.
For example: -h 'host1 host2 host3'.
5.1.7
Selection of Chassis
For operations which are performed against a set of chassis, there are multiple
ways to specify the chassis on which to operate:
1. Small sets of chassis can be easily specified on the command line via the -H
option discussed below.
2. When multiple commands will be performed against the same small set of
chassis, the environment variable CHASSIS can be used to specify a
space-separated list of chassis.
3. For groups of chassis which will be used often, a file may be created listing the
chassis. The default file is /etc/sysconfig/iba/chassis, which should
list all chassis in the cluster. Such a file may then be specified via the -F
command line option or the CHASSIS_FILE environment variable.
The tools consider these options in the following order; the first item
listed below that is specified is used for the given command.
1. -H option
2. CHASSIS environment variable
3. -F option
4. CHASSIS_FILE environment variable
5. /etc/sysconfig/iba/chassis file
For example, if the -H option is used and the CHASSIS_FILE environment
variable is also exported, the command will operate only on chassis specified
via the -H option.
5.1.7.1
Chassis List Files
The -F option or the CHASSIS_FILE environment variable may be used to
provide the name of a file containing the list of SilverStorm IB chassis to operate
on. The default is /etc/sysconfig/iba/chassis. In some fabrics it may
be useful to create multiple files in /etc/sysconfig/iba representing
different subsets of the fabric the user may operate from. For example:
/etc/sysconfig/iba/chassis-core: list of core switching chassis
/etc/sysconfig/iba/chassis-edge: list of edge switching chassis
/etc/sysconfig/iba/esm_chassis: list of chassis running an SM
/etc/sysconfig/iba/chassis: list of all chassis
If a relative path is specified for the -F option or CHASSIS_FILE, the current
directory will be checked first, followed by /etc/sysconfig/iba/.
5.1.7.1.1
Chassis List File Format
Below is a sample chassis file:
# this is a comment
192.168.0.5    # chassis IP address
edge1          # chassis resolvable TCP/IP name
include /etc/sysconfig/iba/corechassis # included file
Each line of the chassis list file may specify a single chassis, a comment or
another chassis list file to include.
A chassis may be specified by chassis management network IP address or a
resolvable TCP/IP name. Typically names are used for readability.
Files to be included may be specified via an include directive followed by a file
name. File names specified should generally be absolute path names. If relative
path names are used, they will be searched for within the current directory then
/etc/sysconfig/iba.
Comments may be placed on any line by preceding the comment with a #.
On lines with chassis or include directives, the # must be separated by white space
from any preceding name, IP address or included file name.
5.1.7.2
Explicit Chassis names
When chassis are explicitly specified via the -H option or the CHASSIS
environment variable, a space separated list of names (or IP addresses) may
be supplied. For example: -H 'chassis1 chassis2 chassis3'.
5.1.7.3
Selection of slots within a chassis
Normally, operations are performed against the management card in the chassis.
For operations such as cmdall, the command is executed against the
management interface for the given chassis. For more sophisticated operations
such as firmware update, a directory with firmware for each chassis card type
can be supplied and all cards in the chassis will be updated with the appropriate
firmware from that directory.
However, in some cases it may be desirable to perform operations against a
specific subset of cards within the chassis. In this case the chassis IP address
or name within a chassis list or a chassis file can be augmented with a list of slot
numbers on which to operate. This is done in the form:
chassis:slot1,slot2,…
For example:
i9k229:0
i9k229:0,1,5
192.168.0.5:0,1,5
NOTE: There must be no spaces within the chassis name and/or slot list.
This format is used by cmdall and chassis firmware update. This format
may be used anywhere a chassis name or IP address is valid, such as the -H
option, the CHASSIS environment variable or chassis list files. The slot
number specified is ignored by some operations (such as pingall). Only slots
containing management cards, EVICs and FVICs may be specified with this
format. For all 9000 series chassis, slot 0 is always an alias for the presently
active management card for the chassis. For the other slots in
the chassis, the chassisQuery command can be executed against a given
chassis to identify which slots have management, EVIC or FVIC cards.
NOTE: For any operation, care should be taken that a given chassis is listed only
once with all relevant slots as part of that single specification. This is
important so that parallel operations do not cause conflicting concurrent
operations against a given chassis.
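The chassis:slot1,slot2,… syntax and the advice to list each chassis only once can both be checked mechanically. The Python sketch below is illustrative, with hypothetical function names; it assumes IPv4 addresses or host names (an IPv6 address would contain its own colons), and it cannot tell that a name and an IP address refer to the same chassis. An entry with no slot list yields an empty list, meaning the default management card.

```python
def parse_chassis_spec(spec: str) -> tuple:
    """Split 'chassis' or 'chassis:slot1,slot2,...' into (chassis, [slots])."""
    if not spec or any(ch.isspace() for ch in spec):
        raise ValueError(f"empty or contains spaces: {spec!r}")
    name, _, slot_text = spec.partition(":")
    slots = [int(s) for s in slot_text.split(",")] if slot_text else []
    return name, slots


def check_listed_once(specs: list) -> None:
    """Raise if the same chassis name appears in more than one entry."""
    seen = set()
    for spec in specs:
        name, _ = parse_chassis_spec(spec)
        if name in seen:
            raise ValueError(f"chassis {name} is listed more than once; "
                             "combine its slots into one entry")
        seen.add(name)


if __name__ == "__main__":
    for spec in ("i9k229:0", "i9k229:0,1,5", "192.168.0.5:0,1,5", "edge1"):
        print(spec, "->", parse_chassis_spec(spec))
```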
5.1.8
Selection of Switches
For operations that are performed against a set of fixed configuration
externally-managed switches, there are multiple ways to specify the switch on
which to operate:
1. Small sets of switches can be easily specified on the command line via the -N
option discussed below.
2. When multiple commands will be performed against the same small set of
switches, the environment variable IBNODES can be used to specify a
space-separated list of switches.
3. For groups of switches which will be used often, a file may be created listing
the switches. The default file is /etc/sysconfig/iba/ibnodes, which should
list all switches in the cluster. Such a file may then be specified via the -L
command line option or the IBNODES_FILE environment variable.
Within the tools, the options are considered in the following order; the first
item in this list that is specified is used for the given command:
1. -N option
2. IBNODES environment variable
3. -L option
4. IBNODES_FILE environment variable
5. /etc/sysconfig/iba/ibnodes file
For example, if the -N option is used and the IBNODES_FILE environment
variable is also exported, the command will operate only on switches specified
via the -N option.
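The precedence rule can be expressed as a short Python sketch. It only models the documented order and is not Fast Fabric code; the function name is hypothetical, and treating an empty string as "not specified" is an assumption of the sketch.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_IBNODES_FILE = "/etc/sysconfig/iba/ibnodes"


def select_switch_source(n_opt: Optional[str], l_opt: Optional[str],
                         env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (source, value) for the first switch source that is specified.

    Order: -N, IBNODES, -L, IBNODES_FILE, then the default file.
    An empty string is treated as "not specified" (assumption).
    """
    candidates = [
        ("-N", n_opt),
        ("IBNODES", env.get("IBNODES")),
        ("-L", l_opt),
        ("IBNODES_FILE", env.get("IBNODES_FILE")),
    ]
    for source, value in candidates:
        if value:
            return source, value
    return "default", DEFAULT_IBNODES_FILE
```

The same first-match pattern applies to port selection (-p, PORTS, -t, PORTS_FILE, /etc/sysconfig/iba/ports, then 0:0).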
5.1.8.1
Switch List Files
The -L option or the IBNODES_FILE environment variable may be used to
provide the name of a file containing the list of SilverStorm IB switches on which
to operate. The default is /etc/sysconfig/iba/ibnodes. In some fabrics
it may be useful to create multiple files in /etc/sysconfig/iba representing
different subsets of the fabric from which the user may operate.
If a relative path is specified for the -L option or IBNODES_FILE, the current
directory will be checked first, followed by /etc/sysconfig/iba/.
5.1.8.1.1
Switch List File Format
Below is a sample switch list file:
# this is a comment
0x00066a00d9000138,i9k138 # Node GUID with desired Name
0x00066a00d9000139,i9k139 # Node GUID with desired Name
include /etc/sysconfig/iba/moreswitches # included file
Each line of the switch list file may specify a single switch, a comment or another
switch list file to include.
Switches can be specified by node GUID, optionally followed by a comma and
the IB Node Description (i.e., the name) to be assigned to the switch. The GUID
is used to select the switch. During firmware update operations, the node
description is written to the switch so that other Fast Fabric tools (such as
saquery and iba_report) can display a more readable name for the
switch.
Files to be included may be specified via an include directive followed by a file
name. File names specified should generally be absolute path names. If relative
path names are used, they will be searched for within the current directory then
/etc/sysconfig/iba.
Comments may be placed on any line by using a # to precede the comment.
On lines with switch or include directives, the # must be white space separated
from any preceding GUID, name or included filename.
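The file format rules above (GUID with optional name, # comments that must be white space separated, include directives searched in the current directory and then /etc/sysconfig/iba) can be sketched in Python as follows. This is an illustrative parser, not the one Fast Fabric uses; the function names and the include-depth guard are assumptions.

```python
import os
from typing import Callable, List, Optional, Tuple

SEARCH_DIR = "/etc/sysconfig/iba"  # documented fallback for relative includes


def resolve_include(name: str) -> str:
    """Absolute names are used as-is; relative ones try ./ then SEARCH_DIR."""
    if os.path.isabs(name):
        return name
    for base in (os.getcwd(), SEARCH_DIR):
        candidate = os.path.join(base, name)
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(name)


def parse_switch_list(text: str, load: Callable[[str], str],
                      depth: int = 0) -> List[Tuple[str, Optional[str]]]:
    """Return (node_guid, name_or_None) pairs from switch list file text.

    `load` maps an include name to file contents (hypothetical hook).
    """
    if depth > 16:  # arbitrary guard against include cycles (sketch only)
        raise RecursionError("include nesting too deep")
    switches: List[Tuple[str, Optional[str]]] = []
    for line in text.splitlines():
        tokens: List[str] = []
        for tok in line.split():
            if tok.startswith("#"):
                break  # a white-space-separated '#' starts a comment
            tokens.append(tok)
        if not tokens:
            continue  # blank or comment-only line
        if tokens[0] == "include":
            switches.extend(parse_switch_list(load(tokens[1]), load, depth + 1))
            continue
        guid, _, name = tokens[0].partition(",")
        switches.append((guid, name or None))
    return switches
```

For real files, `load` could open and read `resolve_include(name)`; for testing, a dictionary lookup is enough.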
It is recommended that a unique node description be specified for each switch.
This name should follow typical naming rules and use the characters a-z, A-Z,
0-9, and underscore. No spaces are allowed in the node description.
Additionally, names should not start with a digit.
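The naming recommendation can be checked mechanically. A minimal Python sketch follows; the function name is hypothetical, and the check covers only the character rules (uniqueness across switches still has to be verified separately).

```python
import re

# Letters, digits and underscore only; the first character may not be a digit.
_NODE_DESC = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_recommended_node_desc(name: str) -> bool:
    """Return True if a switch node description follows the naming rules."""
    return _NODE_DESC.fullmatch(name) is not None
```

For example, i9k138 passes, while 9k138 (leading digit), sw-1 (hyphen) and "edge switch" (space) do not.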
For 9024FC switches, the node GUID can be found on a label on the bottom of
the switch. Alternatively, the node GUIDs for switches in the fabric can be found
using a command such as:
saquery -t sw -o nodeguid
NOTE: The above command will report all switch node GUIDs, including those
of internally-managed chassis such as the 9120 model. GUIDs for
internally-managed chassis cannot be specified for use in -N, IBNODES,
-L, or IBNODES_FILE specified lists.
5.1.8.2
Explicit Switch names
When switches are explicitly specified via the -N option or the IBNODES
environment variable, a space separated list of GUIDs (optionally with name)
may be supplied. For example:
Some of the fabric health commands (fabric_analysis, all_analysis)
permit a specific set of local HCA ports to be used for fabric analysis. The
default is to use the first active port. However, for IB management nodes
connected to more than one IB subnet, the local HCA and port must be specified
so that the desired subnet is analyzed. When non-default behavior is desired,
there are multiple ways to specify the local ports to use:
1. Small sets of ports can be easily specified on the command line via the -p
option discussed below.
2. When multiple commands will be performed against the same small set of ports,
the environment variable PORTS can be used to specify a space-separated list
of ports.
3. For groups of ports that will be used often, a file may be created listing the ports.
The default file is /etc/sysconfig/iba/ports, which should list all local ports
connected to unique subnets. Such a file may then be specified via the -t
command line option or the PORTS_FILE environment variable.
Within the tools, the options are considered in the following order; the first
item in this list that is specified is used for the given command:
1. -p option
2. PORTS environment variable
3. -t option
4. PORTS_FILE environment variable
5. /etc/sysconfig/iba/ports file
6. default of the first active port on the system (0:0 port specification)
For example, if the -p option is used and the PORTS_FILE environment variable
is also exported, the command will operate only on ports specified via the -p
option.
5.1.9.1
Port List Files
The -t option or the PORTS_FILE environment variable may be used to provide
the name of a file containing the list of local HCA ports to use. The default is
/etc/sysconfig/iba/ports. In some fabrics it may be useful to create
multiple files in /etc/sysconfig/iba representing different subsets of the
ports from which the user may operate. For example:
/etc/sysconfig/iba/ports-primary: ports for which this node is
primary
/etc/sysconfig/iba/ports-plain1: port(s) for plain1 subnet
/etc/sysconfig/iba/ports: list of all unique subnet ports
If a relative path is specified for the -t option or PORTS_FILE, the current
directory will be checked first, followed by /etc/sysconfig/iba/.
5.1.9.1.1
Port List File Format
Below is a sample port list file:
# this is a comment
1:1 # first port on 1st HCA
1:2 # second port on 1st HCA
2:1 # first port on 2nd HCA
3:0 # first active port on 3rd HCA
include /etc/sysconfig/iba/ports-plain2 # included file
Each line of the port list file may specify a single port, a comment or include
another port list file.
Ports are specified as hca:port. No spaces are permitted. The first HCA is 1
and the first port is 1. The value 0 for HCA or port has special meaning. The
allowed formats are:
0:0 = 1st active port in system
0:y = port y within system
x:0 = 1st active port on HCA x
x:y = HCA x, port y
Files to be included may be specified via an include directive followed by a file
name. File names specified should generally be absolute path names. If relative
path names are used, they will be searched for within the current directory, then
/etc/sysconfig/iba.
Comments may be placed on any line by using a # to precede the comment.
On lines with a port or include directive, the # must be white space separated
from any preceding port or included filename.
5.1.9.2
Explicit ports
When ports are explicitly specified via the -p option or the PORTS environment
variable, a space-separated list of ports may be supplied. For example: -p '1:1 1:2 2:1'.
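The hca:port formats can be validated and classified with a small Python sketch. It is illustrative only: the function names are hypothetical, and it does not query hardware, so it cannot resolve which port is actually "1st active".

```python
from typing import List, Tuple


def parse_port_spec(spec: str) -> Tuple[int, int]:
    """Parse 'hca:port' into integers; 0 in either field is a wildcard."""
    hca_str, sep, port_str = spec.partition(":")
    if not sep or not hca_str.isdigit() or not port_str.isdigit():
        raise ValueError(f"expected hca:port with no spaces, got {spec!r}")
    return int(hca_str), int(port_str)


def describe_port_spec(spec: str) -> str:
    """Map a port specification to the meaning listed for its format."""
    hca, port = parse_port_spec(spec)
    if hca == 0 and port == 0:
        return "1st active port in system"
    if hca == 0:
        return f"port {port} within system"
    if port == 0:
        return f"1st active port on HCA {hca}"
    return f"HCA {hca}, port {port}"


def parse_ports_value(value: str) -> List[Tuple[int, int]]:
    """Split a -p or PORTS value such as '1:1 1:2 2:1'."""
    return [parse_port_spec(p) for p in value.split()]
```

A specification such as "0 :0" is rejected because it contains a space.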
5.2
Basic Setup and Administration Tools
5.2.1
pingall
(All): Pings a group of hosts or chassis to verify that they are powered on and
-C - performs a ping against a chassis. The default is hosts
-p - ping all hosts/chassis in parallel
-f hostfile - file with hosts in cluster, default is
/etc/sysconfig/iba/hosts
-F chassisfile - file with chassis in cluster default is
/etc/sysconfig/iba/chassis
-h hosts - list of hosts to ping
-H chassis - list of chassis to ping
Example:
pingall
pingall -h 'arwen elrond'
HOSTS='arwen elrond' pingall
pingall -C
pingall -C -H 'chassis1 chassis2'
CHASSIS='chassis1 chassis2' pingall -C
Environment Variables:
The following environment variables are also used by this command:
HOSTS, HOSTS_FILE - see discussion on selection of hosts above
CHASSIS, CHASSIS_FILE - see discussion on selection of chassis above
FF_MAX_PARALLEL - when -p option is used maximum number of parallel
operations to perform at once.
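The effect of FF_MAX_PARALLEL with -p can be pictured as a bounded worker pool. The Python sketch below is an assumption-based illustration, not how pingall is implemented: the per-target operation is passed in as a parameter (so no particular ping command is assumed), and the fallback limit of 8 is hypothetical rather than a documented default.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Mapping


def run_bounded(targets: Iterable[str], op: Callable[[str], bool],
                env: Mapping[str, str] = os.environ,
                fallback_limit: int = 8) -> Dict[str, bool]:
    """Apply op to every target, with at most FF_MAX_PARALLEL running at once.

    fallback_limit is a hypothetical value used when the variable is unset.
    """
    names = list(targets)
    limit = max(1, int(env.get("FF_MAX_PARALLEL", fallback_limit)))
    with ThreadPoolExecutor(max_workers=limit) as pool:
        results = list(pool.map(op, names))  # map preserves input order
    return dict(zip(names, results))
```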
5.2.2
check_rsh
(Linux): Verifies that rsh is set up to allow password-less file copies (rcp) and
commands (rsh) to be run from this host to all the other hosts (and to itself via
localhost) as a specific user (default is root). Additionally, this command can be used to verify that rsh is set up to allow MPI to use rsh for job startup.
NOTE: For security reasons, configuration and use of rsh/rcp/rlogin is no
longer recommended. Instead, ssh is recommended. ssh may be used
by MPI as well as by setup_ssh.
-i ipoib_suffix - suffix to apply to host names to create IPoIB host
names. The default is '-ib'. Use -i '' to indicate no suffix.
-h hosts - list of hosts to setup
-f hostfile - file with hosts in cluster, default is
/etc/sysconfig/iba/hosts
-u user - user on the remote system that this user should be able to rsh to.
The default is the current user code.
Example:
check_rsh
check_rsh -h 'arwen elrond'
HOSTS='arwen elrond' check_rsh
Environment Variables
The following environment variables are also used by this command:
HOSTS, HOSTS_FILE - see discussion on selection of hosts above
5.2.3
setup_ssh
(Linux): Creates ssh keys and configures them on all hosts so the system can
ssh and scp into all other hosts without a password prompt. Typically, during
cluster setup this tool is used to enable the root user on the IB Management
node to log in to the other hosts via password-less ssh. However, if desired, this
tool can also aid the setup of password-less ssh login for other user codes.
-C - only perform connect (to enter the hosts in the local known_hosts file).
When run in this mode, the -S and -s options are ignored.
-s - use ssh/scp to transfer files, default is rsh/rcp.
-i ipoib_suffix - suffix to apply to host names to create IPoIB host names. The default is '-ib'.
-h hosts - list of hosts to setup.
-f hostfile - file with hosts in cluster, default is
/etc/sysconfig/iba/hosts.
-u user - user on remote system to allow this user to ssh to, default is
current user code.
-S - securely prompt for password for user on remote system.
Example:
setup_ssh -s -S -i ''
setup_ssh -C
setup_ssh -h 'arwen elrond' -C
HOSTS='arwen elrond' setup_ssh -C
Environment Variables
The following environment variables are also used by this command:
HOSTS, HOSTS_FILE - see discussion on selection of hosts above.
FF_IPOIB_SUFFIX - suffix to append to hostname to create IPoIB
hostname. Used in absence of -i.
Fast Fabric provides additional flexibility in the translation between IPoIB and
management network hostnames. Refer to Appendix C for more information.
setup_ssh provides an easy way to create ssh keys and distribute them to the
hosts in the cluster. Many of the Fast Fabric tools (as well as many versions of
MPI) require ssh to be set up for password-less operation. Therefore, setup_ssh
is an important setup step.
This tool also sets up ssh to the local host and the local host's IPoIB name. This
capability is required by some Fast Fabric commands and may be used by
some applications (such as MPI).
setup_ssh has two modes of operation. The mode is selected by the presence
or absence of the -C option. Typically, setup_ssh will first be run without the
-C option, then it may later be run with the -C option.
Initial key exchange
When run without the -C option, setup_ssh will perform the initial key exchange
and enable password-less ssh and scp. The key exchange can be accomplished
using ssh and scp (in a password prompting manner) via the -s option or using
password-less rsh and rcp (omitting the -s option).
The preferred way to use setup_ssh for initial key exchange is with the -s
and -S options. This requires that all hosts have been configured with the same
password for the specified "user" (typically root). In this mode the password
is prompted for once, and then ssh and scp are used with that
password to complete the setup for the hosts. Using setup_ssh in this manner also avoids
the need to set up rsh/rcp/rlogin (which can be a security risk).
If -s is used without the -S option, ssh and scp will prompt the user for each
host as it is set up, with multiple prompts per host. For a
handful of hosts this is manageable; however, for a significant number of hosts
it can become cumbersome. Therefore, the -S option is recommended in this
case.
If the -s option is not specified, rsh and rcp will be used to perform the ssh key
exchange. This requires password-less rcp and rlogin be enabled on each host
(check_rsh can perform verification).
Setup_ssh will configure password-less ssh/scp for both the management
network and IPoIB. Typically, the management network will be used for Fast
Fabric while IPoIB will be used for MPI and other applications. If IPoIB is not yet
running (for example, during initial cluster installation IB software will not yet be
installed on all the hosts), the -i option can be specified with an empty string:
setup_ssh -i ''
This will cause the last part of the setup of ssh for IPoIB to be skipped.
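How the IPoIB suffix is chosen and how an empty suffix skips the IPoIB step can be sketched in Python. This models only the simple suffix case described here (Appendix C describes more flexible mappings); the function names are hypothetical and not part of setup_ssh.

```python
import os
from typing import List, Mapping, Optional


def ipoib_suffix(cli_suffix: Optional[str],
                 env: Mapping[str, str] = os.environ) -> str:
    """-i wins; otherwise FF_IPOIB_SUFFIX; otherwise the documented '-ib'."""
    if cli_suffix is not None:
        return cli_suffix  # may be '' to skip the IPoIB step
    return env.get("FF_IPOIB_SUFFIX", "-ib")


def ssh_targets(hosts: List[str], suffix: str) -> List[str]:
    """Management names, plus IPoIB names unless the suffix is empty."""
    targets = list(hosts)
    if suffix:
        targets += [h + suffix for h in hosts]
    return targets
```

For example, ssh_targets(["arwen", "elrond"], "-ib") covers arwen, elrond, arwen-ib and elrond-ib, while an empty suffix limits setup to the management network names.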
Refreshing the local system's known hosts
If hosts have IP addresses added (for example, by installing IB software and
enabling IPoIB), IP addresses or MAC addresses changed, or other
aspects changed (such as a server OS reinstallation), the local host's ssh
known_hosts file can be refreshed by running setup_ssh with the -C option.
This option does not transfer the keys; instead, it connects to each host
(management network and IPoIB) in order to refresh the ssh keys. Existing
entries for the specified hosts are replaced within the local known_hosts file.
When run in this mode, the -S and -s options are ignored. This mode assumes
ssh has previously been set up for the hosts; as such, no files are transferred to
the specified hosts and no passwords should be required.
Typically, after completing the installation and booting of IB software, setup_ssh
will need to be rerun with the -C option to update the known_hosts file.
5.2.4
cmdall
(Linux and Switch): Executes a command on all hosts or SilverStorm IB chassis.
This is very powerful and can be used for everything from configuring servers or
chassis, verifying that they are running, starting and stopping host processes, etc.
-C - perform command against chassis, default is hosts
-p - run command in parallel on all hosts
-q - quiet mode, do not show command to execute
-f hostfile - file with hosts in cluster, default is
/etc/sysconfig/iba/hosts
-F chassisfile - file with chassis in cluster default is
/etc/sysconfig/iba/chassis
-h hosts - list of hosts on which to execute the command
-H chassis - list of chassis on which to execute the command
-u user - the user to perform the command as. For hosts, the default is the current user code. For chassis, this argument is ignored and admin is used.
-S - securely prompt for password for admin on chassis
Host Examples:
cmdall date
cmdall 'uname -a'
cmdall -h 'elrond arwen' date
HOSTS='elrond arwen' cmdall date
Chassis Examples:
cmdall -C 'ismPortStats'
cmdall -C -H 'chassis1 chassis2' ismPortStats
CHASSIS='chassis1 chassis2' cmdall -C ismPortStats
Environment Variables
The following environment variables are also used by this command:
HOSTS, HOSTS_FILE - see discussion on selection of hosts above
CHASSIS, CHASSIS_FILE - see discussion on selection of chassis above
FF_MAX_PARALLEL - when -p option is used maximum number of parallel
operations to perform at once.
FF_CHASSIS_LOGIN_METHOD - how to login to chassis. Can be ssh or telnet
FF_CHASSIS_ADMIN_PASSWORD - password for admin on all chassis. Used in absence of -S option.
NOTE: All commands performed with cmdall must be non-interactive in nature.
cmdall will wait for the command to complete before proceeding. For
example, when running host commands such as rm, the -i option
(interactively prompt before removal) should not be used (note that this
option is sometimes part of a standard bash alias list). Similarly, when
running chassis commands such as fwUpdateChassis, the -reboot
option should not be used (this option causes an immediate reboot;
therefore, the command never returns). Likewise, the chassis command
reboot should not be executed via cmdall. Instead, use the ibtest -C reboot Fast Fabric command to reboot one or more chassis. For further
information about individual chassis CLI commands, consult the
SilverStorm 9000 CLI Reference Guide. For further information about
Linux OS commands, consult the Linux man pages and any other
documentation supplied with the OS by the OS supplier.
When performing cmdall against hosts, internally ssh is used. The command
cmdall requires that password-less ssh be set up between the host running Fast
Fabric and the hosts cmdall is operating against. The setup_ssh Fast Fabric
tool can aid in setting up password-less ssh.
When performing cmdall against a set of chassis, all chassis must be configured
with the same admin password.
For operations against chassis, use of the -S option is recommended. This avoids
the need to keep the password in configuration files.
5.2.5
captureall
(Switch and Host): Captures supporting information for a problem report from
all hosts or SilverStorm IB chassis and uploads it to this system.
-C - perform capture against chassis, default is hosts
-p - perform capture in parallel
[for a host capture this only affects the upload phase]
-f hostfile - file with hosts in cluster, default is
/etc/sysconfig/iba/hosts
-F chassisfile - file with chassis in cluster, default is
/etc/sysconfig/iba/chassis
-h hosts - a list of hosts to perform a capture of
-H chassis - a list of chassis to perform a capture of
-d upload_dir - directory to upload to, default is uploads. If not specified,
the environment variable UPLOADS_DIR will be used. If that is not exported,
the default (./uploads) will be used.
-S - securely prompt for password for administrator on a chassis
file - name for capture file [.tgz will be appended]
When a host captureall is performed, iba_capture will be run to create the
specified capture file within ~root on each host (with the .tgz suffix added).
The files will be uploaded and unpacked into a matching directory name within
upload_dir/hostname/ on the local system. The default file name is
hostcapture.
When a chassis captureall is performed, the chassis capture CLI command
will be run on each chassis and its output will be saved to
upload_dir/chassisname/file on the local system. The default file name
is chassiscapture.
For both host and chassis capture, the uploaded captures will be combined into
a tgz file with the file name specified and the suffix .all.tgz added.
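The naming and directory rules above can be summarized with a Python sketch that predicts where captureall output lands. It is illustrative only: the function and parameter names are hypothetical, and it reports the combined archive by name only, since its location is not stated here.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


def capture_layout(names: List[str], chassis: bool = False,
                   capture_name: Optional[str] = None,
                   upload_dir: Optional[str] = None,
                   env: Mapping[str, str] = os.environ
                   ) -> Tuple[Dict[str, str], str]:
    """Return ({target: local capture path}, combined archive name)."""
    # -d wins, then UPLOADS_DIR, then the documented ./uploads default.
    base = upload_dir or env.get("UPLOADS_DIR") or "./uploads"
    name = capture_name or ("chassiscapture" if chassis else "hostcapture")
    per_target = {n: os.path.join(base, n, name) for n in names}
    return per_target, name + ".all.tgz"
```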
Host Capture Examples:
captureall
The above example creates a hostcapture directory in
./uploads/<HOSTNAME>/ for each host in
/etc/sysconfig/iba/hosts then creates hostcapture.all.tgz.
captureall mycapture
The above example creates a mycapture directory in
./uploads/<HOSTNAME>/ for each host in
/etc/sysconfig/iba/hosts then creates mycapture.all.tgz.
captureall -h 'arwen elrond' 030127capture
Chassis Capture Examples:
captureall -C
The above example creates a chassiscapture file in
./uploads/<CHASSISNAME>/ for each chassis in
/etc/sysconfig/iba/chassis then creates
chassiscapture.all.tgz.
captureall -C mycapture
The above example creates a mycapture.tgz file in
./uploads/<CHASSISNAME>/ for each chassis in
/etc/sysconfig/iba/chassis then creates mycapture.all.tgz.
The following environment variables are also used by this command:
HOSTS, HOSTS_FILE - see discussion on selection of hosts above.
CHASSIS, CHASSIS_FILE - see discussion on selection of chassis above.
UPLOADS_DIR - directory to upload to, used in absence of -d.
FF_MAX_PARALLEL - maximum number of parallel operations to perform at
once.
FF_CHASSIS_LOGIN_METHOD - how to login to chassis. Can be SSH or
telnet.
FF_CHASSIS_ADMIN_PASSWORD - password for administrator on all chassis.
Used in absence of -S option.
When performing captureall against hosts, internally SSH is used. The command
captureall requires that password-less SSH be set up between the host
running Fast Fabric and the hosts captureall is operating against. The setup_ssh Fast Fabric tool can aid in setting up password-less SSH.
When performing captureall against a set of chassis, all chassis must be
configured with the same administrator password.
For operations against chassis use of the -S option is recommended. This avoids
the need to keep the password in configuration files.
NOTE:The resulting host capture files can require significant amounts of space
on the Fast Fabric host. Actual size will vary, but sizes can be multiple
megabytes per host. As such it is recommended to ensure adequate
space is available on the Fast Fabric system. In many cases it may not
be necessary to run captureall against all hosts or chassis, but rather
a representative subset may be sufficient. Consult with your support
representative for further information.
5.3
File Management Tools
The following tools aid in copying files to and from large groups of nodes in the
fabric.
Internally, these tools make use of SCP and require that password-less SSH/SCP
be set up between the host running Fast Fabric and the hosts that files are
being transferred to and from. The setup_ssh Fast Fabric tool can aid in setting up
password-less SSH/SCP.
5.3.1
scpall
(Linux): The scpall tool permits efficient copying of files or directories from the
current system to multiple hosts in the fabric. When copying large directory trees,
performance can be improved by using the -t option. This will tar and compress
the tree, then transfer the resulting compressed tarball to each node (and untar
it on each node).
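The tar-and-compress idea behind -t can be demonstrated locally with Python's tarfile module. This sketch shows only the pack and unpack steps; the scp transfer to each node is omitted, and the function names are hypothetical rather than what scpall runs internally.

```python
import os
import tarfile
import tempfile


def pack_tree(src_dir: str, archive_path: str) -> None:
    """Tar and gzip a directory tree so it can be sent as a single file."""
    top = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top)  # store paths relative to the tree name


def unpack_tree(archive_path: str, dest_parent: str) -> None:
    """Recreate the tree under dest_parent, as the receiving node would."""
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_parent, filter="data")  # safer extraction
        else:
            tar.extractall(dest_parent)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "app")
        os.makedirs(os.path.join(src, "bin"))
        with open(os.path.join(src, "bin", "run.sh"), "w") as f:
            f.write("echo hi\n")
        archive = os.path.join(tmp, "app.tgz")
        pack_tree(src, archive)
        node = os.path.join(tmp, "node1")  # stands in for a remote node
        os.makedirs(node)
        unpack_tree(archive, node)
        print(sorted(os.listdir(os.path.join(node, "app", "bin"))))
```

Sending one compressed archive instead of many small files is what makes this approach faster for large trees.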
This can provide a powerful facility for copying data files, operating system files
or even applications to all the hosts (or a subset of hosts) within the fabric.
-t - optimized recursive copy of directories using tar
-h hosts - list of hosts to copy to
-f hostfile - file with hosts in cluster, default is
/etc/sysconfig/iba/hosts.
-u user - user to perform copy to, default is current user code
source_file: the name of files to copy from this system, relative to the
current directory. Multiple files may be listed.
source_dir: the name of directory to copy from this system, relative to the
current directory.
dest_file or dest_dir: the name of the file or directory on the
destination system to copy to. It is relative to the home directory of the
specified user code (an absolute path name may be specified if desired).
When performing directory copies using the -t option, the destination directory
is optional. If not specified it defaults to the present directory name. If both the
source and destination directory names are omitted, they both default to the
current directory name.
NOTE: The tool scpall can only copy from this system to a group of systems
in the cluster. The user@ style syntax cannot be used in the arguments to
scpall.
Environment Variables
The following environment variables are also used by this command:
HOSTS, HOSTS_FILE - see discussion on selection of hosts above
FF_MAX_PARALLEL - when -p option is used maximum number of parallel
operations to perform at once.
To copy from hosts in the cluster to this host, use uploadall.
5.3.2
uploadall
(Linux): Copies one or more files from a group of hosts to this system. Since
the file name will be the same on each host, a separate directory on this system
is created for each host and the file is copied to it. This is a convenient way to
upload log files or configuration files for review. It can also be used in conjunction
with downloadall to upload a host specific configuration file, edit it for each
host and download the new version to all the hosts.
-f hostfile - file with hosts in cluster, default is
/etc/sysconfig/iba/hosts
-h hosts - list of hosts to upload from
-u user - user to perform copy to, default is current user code
-d upload_dir - directory to upload to, default is uploads. If not specified,
the environment variable UPLOADS_DIR will be used. If that is not exported,
the default (./uploads) will be used.
source_file - the name of files to copy to this system, relative to the current
directory. Multiple files may be listed.
dest_file - is the name of the file or directory on this system to copy to. It
is relative to upload_dir/<HOSTNAME>.
A local directory within upload_dir/ will be created for each host being
uploaded from. Each uploaded file will be copied to
upload_dir/<HOSTNAME>/dest_file. If more than one source file is
specified, dest_file will be treated as a directory name and the directories
upload_dir/<HOSTNAME>/dest_file/ will be created for each host and the
source_files will be uploaded to those directories.
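The destination rules for uploadall can be sketched in Python. This only computes local paths for one host; the function name is hypothetical, and keeping each source file's base name inside the dest_file directory (when several sources are given) is an assumption of the sketch.

```python
import os
from typing import List


def upload_destinations(host: str, sources: List[str], dest: str,
                        upload_dir: str = "./uploads") -> List[str]:
    """Local paths uploadall would write for one host (illustrative)."""
    host_dir = os.path.join(upload_dir, host)
    if len(sources) == 1:
        return [os.path.join(host_dir, dest)]  # dest_file is a file name
    # Several sources: dest_file is treated as a directory; keeping each
    # file's base name is an assumption of this sketch.
    return [os.path.join(host_dir, dest, os.path.basename(s)) for s in sources]
```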
The above example copies capture.tgz and /etc/init.d/ipoip.cfg to
./uploads/<HOSTNAME>/preinstall/ where a <HOSTNAME>
directory is created for each host in /etc/sysconfig/iba/hosts.
NOTE: The uploadall tool can only copy from a group of systems in a cluster
to this system. The user@ style syntax cannot be used in the arguments
to uploadall.
To copy files from this host to hosts in the cluster, use scpall or downloadall.
Environment Variables
The following environment variables are also used by this command:
HOSTS, HOSTS_FILE - see discussion on selection of hosts above.
FF_MAX_PARALLEL - when -p option is used maximum number of parallel
operations to perform at once.
UPLOADS_DIR - the directory to upload to, used in absence of -d.
5.3.3
downloadall
(Linux): Copies one or more files from this system to a group of hosts. Since the
file contents to copy may be different for each host, a separate directory on this
system is used for the source files for each host. This can also be used in
conjunction with uploadall to upload a host-specific configuration file, edit it
for each host and download the new version to all the hosts.
-f hostfile - file with hosts in cluster. The default is
/etc/sysconfig/iba/hosts.
-h hosts - the list of hosts to download files to
-u user - the user to perform the copy. The default is current user code
-d download_dir - the directory to download files from. The default is
./downloads. If not specified, the environment variable DOWNLOADS_DIR
will be used. If that is not exported, the default (./downloads) will be used.
source_file - the name of files to copy from this system. Multiple files may be listed. source_file is relative to
download_dir/<HOSTNAME>.
A local directory within download_dir/ must exist for each host being
downloaded to. Each downloaded file will be copied from
download_dir/<HOSTNAME>/source_file.
dest_file - is the name of the file or directory on the destination hosts to
copy to.
If more than one source file is specified, dest_file will be treated as a directory
name. The given directory must already exist on the destination hosts (the copy
will fail for hosts where the directory does not exist).
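Because each host needs its own local source directory, it can help to check for missing files before running downloadall. The Python sketch below is a convenience of this illustration, not a Fast Fabric feature; the function name is hypothetical.

```python
import os
from typing import List


def missing_download_sources(hosts: List[str], sources: List[str],
                             download_dir: str = "./downloads") -> List[str]:
    """Return local paths downloadall would need but that do not exist.

    Each host needs download_dir/<HOSTNAME>/<source_file>.
    """
    missing = []
    for host in hosts:
        for src in sources:
            path = os.path.join(download_dir, host, src)
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```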
NOTE: The tool downloadall can only copy from this system to a group of
hosts in the cluster. The user@ style syntax cannot be used in the
arguments to downloadall.
To copy files from hosts in the cluster to this host use uploadall.
Environment Variables
The following environment variables are also used by this command:
HOSTS, HOSTS_FILE - see discussion on selection of hosts above.
FF_MAX_PARALLEL - when -p option is used maximum number of parallel
operations to perform at once.
DOWNLOADS_DIR - directory to download from, used in absence of -d.
5.3.4
Simplified Editing of Node-Specific Files
(Linux): The combination of uploadall and downloadall provides a powerful
yet simple mechanism for reviewing and/or editing node-specific files
without the need to log in to each node.
This is best explained with an example.
Assume the file /etc/sysconfig/network-scripts/ifcfg-ib1 needs
to be reviewed and possibly edited for each host. This file would typically contain
the IP configuration information for IPoIB and may contain a unique IP address
per host.
Alternatively, if there was no need to download the file to all hosts, a subset of
hosts can be specified using the -h option or by creating an alternate host list file:
SM - each subnet manager (SM) running in the fabric is listed along with its
node name, port GUID and present SM state (Master, Standby, etc.).
Number of CA - number of unique channel adapters (CAs) in the fabric. A
CA with two connected ports is counted as a single CA.
NOTE: Channel adapters include both HCAs in servers as well as TCAs within
IO Modules, IB Native Storage, etc.
Number of CA ports - number of connected CA ports in the fabric.
Number of Switch chips - number of unique switches in the fabric.
NOTE: A large IB switch may be composed of many unique switch chips.
Number of Links - number of IB links in the fabric. Note that a large IB
switch may have internal links.
Number of 1x Ports - number of ports in the fabric running at 1x speed.
Typically such ports represent a bad cable connection, a bad cable, too long
a cable or perhaps faulty hardware on one side of the link.
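The counting rules above (a CA with two connected ports counts once, CA ports are counted individually, a large switch contributes several chips) can be illustrated with a Python sketch over a hypothetical list of links. The data model, including the "CA"/"SW" labels and width strings, is an assumption for illustration; fabric_info obtains its data from the subnet administrator, not from such a list.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class End:
    """One end of a link (hypothetical model)."""
    node_guid: int
    node_type: str  # "CA" or "SW" (labels assumed for this sketch)
    port: int
    width: str      # e.g. "4x" or "1x"


def summarize(links: List[Tuple[End, End]]) -> Dict[str, int]:
    """Compute fabric_info-style totals from a list of links."""
    ends = [e for link in links for e in link]
    ca_ports = {(e.node_guid, e.port) for e in ends if e.node_type == "CA"}
    return {
        "Number of CA": len({guid for guid, _ in ca_ports}),
        "Number of CA ports": len(ca_ports),
        "Number of Switch chips": len({e.node_guid for e in ends
                                       if e.node_type == "SW"}),
        "Number of Links": len(links),
        "Number of 1x Ports": len({(e.node_guid, e.port) for e in ends
                                   if e.width == "1x"}),
    }
```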
fabric_info can be very useful as a quick assessment of the fabric state.
fabric_info can be run against a known good fabric to identify its components
and then run again later to see if anything has changed about the fabric configuration
or state. When used in this manner, it can quickly identify whether CAs are
down, links are missing, SMs are missing, etc.
For more extensive fabric analysis, see iba_report.
5.4.2
showallports
(Switch and Host): Displays basic port state and statistics for all host nodes,
chassis or externally managed switches.
NOTE: iba_report is a newer and more powerful Fast Fabric command. For
general fabric analysis, use iba_report with options such as -o errors and/or -o slowlinks to perform a more efficient analysis of
link speeds and errors.
The following environment variables are also used by this command:
HOSTS, HOSTS_FILE - see discussion on selection of hosts above
CHASSIS, CHASSIS_FILE - see discussion on selection of chassis above
IBNODES, IBNODES_FILE - see discussion on selection of switches above
MGMT_HOST - host to use to perform IB node queries, used in absence of -M
FF_MAX_PARALLEL - when the -p option is used, the maximum number of parallel
operations to perform at once.
FF_CHASSIS_LOGIN_METHOD - how to log in to the chassis: SSH or
Telnet.
FF_CHASSIS_ADMIN_PASSWORD - password for the administrator on all
chassis. Used in the absence of the -S option.
When showallports is run against hosts, it uses SSH internally, so password-less
SSH must be set up between the host running Fast Fabric and the hosts that
showallports operates against. The Fast Fabric setup_ssh tool can help set up
password-less SSH.
When showallports is run against a set of chassis, all chassis must be
configured with the same administrator password.
For operations against chassis, the -S option is recommended because it avoids
the need to keep the password in configuration files.
Running showallports against externally-managed switches requires an
IB-enabled management node with Fast Fabric installed. Typically this is the
Fast Fabric node from which showallports is being run. However, an alternate
node may be specified with the -M option or the MGMT_HOST environment
variable.
5.4.3
iba_report
(All): iba_report provides powerful fabric analysis and reporting capabilities.
It must be run on a host connected to the IB fabric with Fast Fabric installed.
iba_report obtains all its data in an IBTA-compliant manner. Therefore, it will
interoperate with both SilverStorm and 3rd party IB components, provided those
components are IBTA compliant and implement the IBTA optional features
required by iba_report.
iba_report requires that the subnet manager implement all the IBTA SA
queries defined in the standard (such as SM Info Records, Link Records, Trace
Routes, Port Records, Node Records, etc.). For this reason, QuickSilver Fabric
Manager version 4.0 or later is recommended. iba_report also requires all end
nodes to implement the PMA PortCounters (IBTA mandatory counters). In
addition, any end node that reports support for an IBTA device management
agent must implement the IOU Info, IOC Profile and Service Entry queries as
outlined in the IBTA 1.1 standard.
iba_report takes advantage of these interfaces to obtain extensive
information about the fabric from the subnet manager and the end nodes. By
cross-referencing this information, iba_report produces analysis well beyond
what any single subnet manager request could provide, exceeding the
capabilities previously available in tools such as saquery and fabric_info.
Because iba_report cross-references all this information internally, its output
is in a user-friendly form. Reports include GUIDs, LIDs and names for
components. These reports are easiest to read if unique names have been
assigned to all the components in the fabric (node names and IOC names). All
SilverStorm components support this capability. For hosts, node names are
assigned automatically based on the network host name of the server. For
switches and line cards, names can be assigned via the element manager for
each component.
Each run of iba_report obtains up-to-date information from the fabric. At the
start of the run, iba_report takes a few seconds to obtain all the fabric data,
then writes the reports to stdout. The reports are sorted by GUIDs and other
permanent information, so they can be rerun in the future and produce
output in the same order even if components have been rebooted. This is useful
for comparison using simple tools like diff. Multiple reports can be requested
in a single run of iba_report (one of each report type).
By default, iba_report uses the first active port on the local system. However,
if the IB management node is connected to more than one fabric (that is, more
than one subnet), the HCA and port may be specified to select the fabric to
analyze.
-h/--hca hca - HCA to send via, default is 1st HCA
-p/--port port - port to send via, default is 1st active port
-o/--output report - report type for output
-d/--detail level - level of detail 0-n for output, default is 2
-P/--persist - only include data persistent across reboots
-H/--hard - only include permanent hardware data
-N/--noname - omit node and IOC names
-x/--xml - output in XML
-s/--stats - get performance statistics for all ports
-i/--interval seconds - obtain performance statistics over an interval: clears
all statistics, waits the given number of seconds, then generates the report.
Implies -s
-C/--clear - clear performance stats for all ports. Only stats with error
thresholds are cleared. The clear occurs after the report is generated.
-a/--clearall - clear all performance stats for all ports
-c/--config file - error thresholds configuration file. The default is
/etc/sysconfig/iba/iba_mon.conf
-L/--limit - for the port error counters check (-o errors) and port counters
clear (-C or -i) with -F, limit the operation to the exact specified focus.
Normally the neighbor of each selected port is also checked/cleared. Does not
affect other reports
-F/--focus point - focus area for the report; limits the scope of all reports
except route
-S/--src point - source for trace route, default is local port
-D/--dest point - destination for trace route
-Q/--quietfocus - do not include focus description in report
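Because several -o options can be combined in one run, and -h/-p select the local HCA and port when the management node sees more than one fabric, a site script might wrap the two together. The sketch below is a hypothetical helper (run_reports is not a Fast Fabric tool) that only assembles the options listed above; the HCA and port numbers you pass are assumptions that must match what -h and -p accept on your system.

```shell
#!/bin/bash
# run_reports: hypothetical wrapper (not a Fast Fabric tool) that requests
# several report types in one iba_report run, sent via a chosen HCA and port.
# Usage: run_reports <hca> <port> <report> [report...]
run_reports() {
    if [ "$#" -lt 3 ]; then
        echo "usage: run_reports <hca> <port> <report> [report...]" >&2
        return 2
    fi
    local hca=$1 port=$2
    shift 2
    local args=(-h "$hca" -p "$port")
    local report
    for report in "$@"; do
        args+=(-o "$report")   # one -o per requested report type
    done
    iba_report "${args[@]}"
}
```

For example, run_reports 1 2 slowlinks errors runs iba_report -h 1 -p 2 -o slowlinks -o errors.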
Report Types:
comps - summary of all systems and SMs in fabric
brcomps - brief summary of all systems and SMs in fabric
nodes - summary of all node types and SMs in fabric
brnodes - brief summary of all node types and SMs in fabric
ious - summary of all IO units in the fabric
links - summary of all links
extlinks - summary of links external to systems
slowlinks - summary of links running slower than expected
slowconfiglinks - summary of links configured to run slower than
supported, includes slowlinks
slowconnlinks - summary of links connected with mismatched speed
potential, includes slowconfiglinks
misconfiglinks - summary of links configured to run slower than
supported
misconnlinks - summary of links connected with mismatched speed
potential
errors - summary of links whose errors exceed counts in the configuration
file
otherports - summary of ports not connected to the fabric
all - comps, nodes, ious, links, extlinks, slowconnlinks, and errors reports
route - trace route between -S and -D points
none - no report, useful to just clear statistics
Point Syntax:
gid:value - value is numeric port gid of form: subnet:guid
lid:value - value is numeric lid
portguid:value - value is numeric port GUID
nodeguid:value - value is numeric node GUID
nodeguid:value1:port:value2 - value1 is numeric node GUID, value2 is port #
iocguid:value - value is numeric IOC GUID
iocguid:value1:port:value2 - value1 is numeric IOC GUID, value2 is port #
systemguid:value - value is numeric system image GUID
systemguid:value1:port:value2 - value1 is numeric system image GUID,
value2 is port #
ioc:value - value is IOC Profile ID String (IOC Name)
ioc:value1:port:value2 - value1 is IOC Profile ID String (IOC Name),
value2 is port #
iocpat:value - value is glob pattern for IOC Profile ID String (IOC Name)
iocpat:value1:port:value2 - value1 is glob pattern for IOC Profile ID String
(IOC Name), value2 is port #
ioctype:value - value is IOC type (VNIC or SRP)
ioctype:value1:port:value2 - value1 is IOC type (VNIC or SRP), value2 is port #
node:value - value is node description (node name)
node:value1:port:value2 - value1 is node description (node name), value2 is
port #
nodepat:value - value is glob pattern for node description (node name)
nodepat:value1:port:value2 - value1 is glob pattern for node description
(node name), value2 is port #
nodetype:value - value is node type (SW, CA or RT)
nodetype:value1:port:value2 - value1 is node type (SW, CA or RT), value2 is
port #
sm - master subnet manager
route:point1:point2 - all ports along the routes between the 2 given
points
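When points are built from script variables, it is easy to attach a port number to a selector that does not take one. The sketch below is a hypothetical helper (focus_point is not a Fast Fabric tool) that assembles the "type:value" and "type:value1:port:value2" forms listed above and rejects a port for selector types that have no :port: form in that list.

```shell
#!/bin/bash
# focus_point: hypothetical helper (not a Fast Fabric tool) that builds a
# point string in the "type:value" or "type:value1:port:value2" form.
# Only the selector types listed with a :port: form accept a port number.
# Usage: focus_point <type> <value> [port]
focus_point() {
    local type=$1 value=$2 port=${3-}
    if [ -z "$port" ]; then
        printf '%s:%s\n' "$type" "$value"
        return 0
    fi
    case $type in
        nodeguid|iocguid|systemguid|ioc|iocpat|ioctype|node|nodepat|nodetype)
            printf '%s:%s:port:%s\n' "$type" "$value" "$port"
            ;;
        *)
            echo "focus_point: '$type' does not accept a port number" >&2
            return 1
            ;;
    esac
}
```

For example, iba_report -o links -F "$(focus_point node duster 1)" passes -F node:duster:port:1. Keep the command substitution double-quoted so that glob patterns such as n* are not expanded by the shell.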
Examples:
iba_report can generate hundreds of different reports. Following is a list of
some commonly generated reports:
Analyze a fabric for bad cables:
iba_report -o slowlinks -o errors
Analyze a fabric for bad cables or misconfigured ports:
iba_report -o slowconfiglinks -o errors
Analyze a fabric for bad cables or misconfigured ports or misconnected ports:
iba_report can be run with no options at all. In this mode it provides a brief
list of the nodes in the fabric (the brnodes report). The report organizes nodes
as CAs, Switches and Routers. It also includes a summary of all the SMs in the
fabric.
Here is a sample of iba_report for a small fabric:
[root@duster root]# iba_report
Node Type Brief Summary
14 Connected CAs in Fabric:
NodeGUID Type Name
    Port LID    PortGUID           Width Speed
0x0002c9020020e0d4 CA coyote1
    1    0x000d 0x0002c9020020e0d5 4x    2.5Gb
0x00066a00580001e0 CA VEx in Chassis 0x00066a005000010c, Slot 2
    2    0x0014 0x00066a02580001e0 4x    2.5Gb
0x00066a0098000001 CA julio
    1    0x000c 0x00066a00a0000001 4x    2.5Gb
0x00066a00980001b8 CA orc
    1    0x000b 0x00066a00a00001b8 4x    2.5Gb
0x00066a0098000380 CA goblin
    1    0x000a 0x00066a00a0000380 4x    2.5Gb
0x00066a0098000384 CA cuda
    1    0x0005 0x00066a00a0000384 1x    2.5Gb
    2    0x0006 0x00066a01a0000384 4x    2.5Gb
0x00066a00980003a6 CA erik
    1    0x0015 0x00066a00a00003a6 4x    2.5Gb
    2    0x0016 0x00066a01a00003a6 4x    2.5Gb
0x00066a00980006a2 CA goblin
    1    0x000f 0x00066a00a00006a2 4x    2.5Gb
0x00066a0098000849 CA rockaway
    2    0x000e 0x00066a01a0000849 4x    2.5Gb
0x00066a0098002813 CA brady
    1    0x0002 0x00066a00a0002813 4x    2.5Gb
    2    0x0003 0x00066a01a0002813 4x    2.5Gb
0x00066a0098002854 CA brady
    1    0x0004 0x00066a00a0002854 4x    2.5Gb
    2    0x0008 0x00066a01a0002854 4x    2.5Gb
0x00066a0098003f81 CA ibm345
    1    0x0007 0x00066a00a0003f81 4x    2.5Gb
0x00066a009800447b CA duster
    1    0x0011 0x00066a00a000447b 4x    2.5Gb
    2    0x0012 0x00066a01a000447b 4x    2.5Gb
0x00066a0098004a73 CA erik
    1    0x0009 0x00066a00a0004a73 4x    2.5Gb
3 Connected Switches in Fabric:
NodeGUID Type Name
    Port LID    PortGUID           Width Speed
0x00066a00280002cd SW InfiniCon Systems InfiniFabric (Sw A Dev A)
    0    0x0013 0x00066a00280002cd Noop  Noop
    3                              4x    2.5Gb
    5                              4x    2.5Gb
0x00066a00d8000123 SW InfiniCon Systems InfinIO9024
    0    0x0001 0x00066a00d8000123 4x    2.5Gb
    1                              4x    2.5Gb
    2                              1x    2.5Gb
    3                              4x    2.5Gb
    4                              4x    2.5Gb
    5                              4x    2.5Gb
    6                              4x    2.5Gb
    7                              4x    2.5Gb
    8                              4x    2.5Gb
    9                              4x    2.5Gb
    10                             4x    2.5Gb
    11                             4x    2.5Gb
    12                             4x    2.5Gb
    14                             4x    2.5Gb
    15                             4x    2.5Gb
    16                             4x    2.5Gb
    17                             4x    2.5Gb
    18                             4x    2.5Gb
    19                             4x    2.5Gb
    20                             4x    2.5Gb
0x00066a10280002cd SW InfiniCon Systems InfiniFabric (Sw A Dev B)
    0    0x0010 0x00066a10280002cd Noop  Noop
    2                              4x    2.5Gb
    4                              4x    2.5Gb
1 Connected SMs in Fabric:
State  GUID               Name
Master 0x00066a00d8000123 InfiniCon Systems InfinIO9024
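In this sample, port 1 on cuda and port 2 on the InfinIO9024 run at 1x width, the condition the fabric_info "Number of 1x Ports" count flags. The slowlinks report is the supported way to find such links; purely as an illustration of post-processing this text output, the awk sketch below lists every 1x port with its node. list_narrow_ports is a hypothetical helper, and it assumes the brnodes layout shown above (node lines begin with a 0x GUID and a CA/SW/RT type, port lines end in Width and Speed).

```shell
#!/bin/bash
# list_narrow_ports: hypothetical helper (not a Fast Fabric tool).
# Reads brnodes-style iba_report output on stdin and prints
# "<NodeGUID> <Name> port <n>" for every port whose Width column is 1x.
list_narrow_ports() {
    awk '
        # Node line: GUID, node type, then the (possibly multi-word) name.
        $1 ~ /^0x[0-9a-fA-F]+$/ && ($2 == "CA" || $2 == "SW" || $2 == "RT") {
            guid = $1
            name = $3
            for (i = 4; i <= NF; i++) name = name " " $i
            next
        }
        # Port line: port number first, Width is the next-to-last field.
        $1 ~ /^[0-9]+$/ && NF >= 3 && $(NF-1) == "1x" {
            print guid, name, "port", $1
        }
    '
}
```

A possible use is iba_report -o brnodes | list_narrow_ports; against the sample above it would print the cuda and InfinIO9024 entries.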
Each report supports various levels of detail. Increasing detail is shown
as further indentation of the additional information. The -d option to
iba_report controls the detail level. The default is 2. Values from 0-n are
permitted. The maximum detail per report varies, but most reports have fewer
than 5 detail levels.
For example, the above report when run at detail level 0 outputs:
[root@duster root]# iba_report -d 0
Node Type Brief Summary
14 Connected CAs in Fabric:
3 Connected Switches in Fabric:
1 Connected SMs in Fabric:
This is a concise summary of the fabric components, very similar to the
output of fabric_info.
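Because detail level 0 reduces the brnodes report to one count line per component class, its output is easy to check from a script. The sketch below is a hypothetical awk helper (fabric_counts is not a Fast Fabric tool) and assumes the "<count> Connected <kind> in Fabric:" line format shown in the sample.

```shell
#!/bin/bash
# fabric_counts: hypothetical helper (not a Fast Fabric tool).
# Parses "iba_report -d 0" brnodes output on stdin and prints
# "CAs=<n> Switches=<n> SMs=<n>" (missing classes print as 0).
fabric_counts() {
    awk '
        $2 == "Connected" && $4 == "in" && $5 == "Fabric:" { count[$3] = $1 }
        END {
            printf "CAs=%d Switches=%d SMs=%d\n", count["CAs"], count["Switches"], count["SMs"]
        }
    '
}
```

For the sample fabric above, iba_report -d 0 | fabric_counts would print CAs=14 Switches=3 SMs=1, so a script could compare that string against the expected value for a known good fabric.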
The next level of detail adds a little more information:
[root@duster root]# iba_report -d 1
Node Type Brief Summary
14 Connected CAs in Fabric:
NodeGUID Type Name
0x0002c9020020e0d4 CA coyote1
0x00066a00580001e0 CA VEx in Chassis 0x00066a005000010c, Slot 2
0x00066a0098000001 CA julio
0x00066a00980001b8 CA orc
0x00066a0098000380 CA goblin
0x00066a0098000384 CA cuda
0x00066a00980003a6 CA erik
0x00066a00980006a2 CA goblin
0x00066a0098000849 CA rockaway
0x00066a0098002813 CA brady
0x00066a0098002854 CA brady
0x00066a0098003f81 CA ibm345
0x00066a009800447b CA duster
0x00066a0098004a73 CA erik
3 Connected Switches in Fabric:
NodeGUID Type Name
0x00066a00280002cd SW InfiniCon Systems InfiniFabric (Sw A Dev A)
0x00066a00d8000123 SW InfiniCon Systems InfinIO9024
0x00066a10280002cd SW InfiniCon Systems InfiniFabric (Sw A Dev B)
1 Connected SMs in Fabric:
State GUID Name
Master 0x00066a00d8000123 InfiniCon Systems InfinIO9024
The above examples all use a single report, the brnodes (Brief Nodes)
report. However, this is just one of the many topology reports that
iba_report can generate. The others include:
❥ nodes - a more verbose form of brnodes that can provide much greater
levels of detail to drill down into every node, including port state,
IOUs/IOCs/Services and port counters.
❥ comps and brcomps are very similar to nodes and brnodes, except the
reports are organized around systems. The grouping into systems is based
on the system image GUID of each node. These reports help present more
complex systems (such as servers with multiple HCAs or large switches
composed of multiple IB switch chips).
NOTE: All SilverStorm switches implement a system image GUID and will
therefore be properly grouped. However, some third-party devices do not
implement the system image GUID and may report a value of 0. In such
a case iba_report will treat each component as an independent
system.
❥ links - This report presents all the links in the fabric. The output is very
concise and helps to identify the connectivity between nodes in the fabric.
❥ extlinks - all the external links in the fabric (that is, links between
different systems).
❥ ious - This is somewhat similar to the nodes reports, however the focus is
around IOUs/IOCs and IO Services in the fabric. This report can be used to
identify various IO devices in the fabric and their capabilities (such as the
SilverStorm EVIC and FVIC Virtual IO Controllers or IBTA compliant
direct-attach IB storage).
❥ otherports - All the ports which are not connected to this fabric. This report
will identify additional ports on CAs or Switches which are not connected to
this fabric. For switches these represent unused ports. For CAs these may
be ports connected to other fabrics or unused ports.
The above reports all summarize the present state of the fabric. They can be
very helpful to analyze the configuration of the fabric and/or verify that it
was installed consistently with the desired design and configuration.
In addition to these summaries, iba_report has reports that help to analyze the
operational characteristics of the fabric and to identify bottlenecks and faulty
components in the fabric. These include:
❥ slowlinks - identifies links which are running slower than expected. This
helps to pinpoint bad cables or components in the fabric, such as a 4x cable
that is poorly-connected and therefore only runs at 1x link width. The analysis
includes both link speed and width.
❥ slowconfiglinks - extends the slowlinks report to also report links
that have been configured (most likely by software) to run at a width or
speed below their potential, for example DDR-capable links that have been
forced to run at SDR rates.
❥ slowconnlinks - further extends the slowconfiglinks report to also
report links that are cabled such that one end of the link will never
run to its potential, for example a DDR-capable HCA connected to an SDR switch.
❥ misconfiglinks - similar to slowconfiglinks in that it reports
links that have been configured to run below their potential. However, it
does not include links that are running slower than expected.
❥ misconnlinks - similar to slowconnlinks in that it reports links
that have been connected between ports of different speed potential.
However, it does not include links that are running slower than expected,
nor links that have been configured to run slower than their potential.
❥ errors - performs a single point-in-time analysis of the PMA port counters
for every node and port in the fabric. All the counters are compared against
configured thresholds (by default, those in the iba_mon.conf file). Any
link whose counters exceed these thresholds is listed (depending on the
detail level, the exact counter and threshold are also reported). This is a
powerful way to identify marginal links in the fabric, such as bad or loose
cables or damaged components.
❥ route - identifies two end points in the fabric (by node name, node GUID,
port name, port GUID, system image GUID, LID, port GID, IOC GUID or IOC name)
and lists all the links and components used when these two end points
communicate. If there are multiple paths between the end points (such as a
CA with 2 connected ports or a system with 2 CAs), the route for every
available path (based on the presently configured routing tables) is reported.
The above set of reports can therefore be very powerful ways to obtain point in
time status and problem analysis for the fabric.
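On its own, the errors report compares whatever the counters hold at the moment of the run. Combining it with the -i option from the option list clears all statistics, waits, and then reports, so only errors accumulated during that interval are compared against the thresholds. The sketch below is a hypothetical wrapper (errors_over_interval is not a Fast Fabric tool); its 60-second default is an illustrative assumption, not an iba_report default.

```shell
#!/bin/bash
# errors_over_interval: hypothetical wrapper (not a Fast Fabric tool).
# -i clears all statistics, waits the given number of seconds, then
# generates the report; -c names the error threshold file.
# Usage: errors_over_interval [seconds] [threshold-file]
errors_over_interval() {
    local seconds=${1:-60}   # illustrative default, not an iba_report default
    local conf=${2:-/etc/sysconfig/iba/iba_mon.conf}   # documented default file
    case $seconds in
        ''|*[!0-9]*)
            echo "errors_over_interval: seconds must be a whole number" >&2
            return 2
            ;;
    esac
    iba_report -o slowlinks -o errors -i "$seconds" -c "$conf"
}
```

Keep in mind that -i clears the statistics on all ports, which affects anything else that reads those counters.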
5.4.3.2
Topology Verification
iba_report provides a flexible way to identify changes to the fabric or to
verify correct reassembly of the fabric after a move (for example, after staging
and testing the fabric at a remote location before final installation at a
customer site). In this mode of operation all the above reports are available,
but the types of information output can be filtered. For example, with the -P
option, information that would not persist across a fabric reboot (such as LIDs
and error counters) is omitted from the report (and marked out with xxx). Such
a report can be saved for later comparison to a future report. Since iba_report
produces simple text reports, standard tools such as sdiff (side-by-side diff)
can be used to easily compare reports and analyze what changed.
Given the wealth of reports available, the user can select the information to
save. For ease of use, an all report is available, which includes all the
reports of general interest.
If software configuration changes are anticipated (such as adjusting the
timeouts the SM configures in the fabric), the iba_report -H option can be
used. This further limits the report to hardware information only: it omits
everything -P omits, and more.
A related but independent option is -N. This will omit all the node and IOC names
from the report. If changes are anticipated in this area, this option can be used
so future diffs will not report changes in names.
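A staging-site check along these lines might save a hardware-only, name-independent snapshot before the fabric is shipped and compare it after reassembly. The sketch below is one assumed way a site could script this, not a Fast Fabric procedure; save_topology and compare_topology are hypothetical names. It uses the all report with -H and -N (drop -N if renamed nodes should be flagged) and GNU sdiff, which exits 0 when the files match, 1 when they differ and 2 on trouble.

```shell
#!/bin/bash
# Hypothetical helpers (not Fast Fabric tools) for verifying reassembly.

# save_topology: write a hardware-only (-H), name-free (-N) snapshot.
save_topology() {
    iba_report -o all -H -N > "$1"
}

# compare_topology: take a fresh snapshot and show only the differing
# lines side by side. Returns sdiff's status: 0 same, 1 different, 2 trouble.
compare_topology() {
    local baseline=$1 current rc
    current=$(mktemp) || return 2
    if ! iba_report -o all -H -N > "$current"; then
        rm -f "$current"
        return 2
    fi
    sdiff -s "$baseline" "$current"
    rc=$?
    rm -f "$current"
    return "$rc"
}
```

For example, run save_topology /root/fabric.staged at the staging site and compare_topology /root/fabric.staged after installation (both file names are illustrative).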
5.4.3.3
Focused Reports
One of the more powerful features of iba_report is the ability to focus a report
on a subset of the fabric. Using the -F option the user can specify a node name,
node name pattern, node guid, node type, port guid, IOC name, IOC name
pattern, ioc guid, ioc type, system image guid, port gid, port rate, lid or SM. The
subsequent report will indicate the total components in the fabric but will only
report on those which relate to the focus area. For example in a nodes report,
if a port is specified for focus, only the node containing that port will be reported
on. In a links report, only the link using that port will be reported.
The focus level does not have to match the orientation of the report. For
example, if a node name is specified as the focus for the links report, all
the links to that node are reported. This could include multiple switch ports
or CA ports.
By carefully using this feature of report focus, reverse lookups can be done. For
example, doing a brnodes report with a focus on a LID will reverse lookup the
LID and indicate what node it is for.
When focusing a report, it can be helpful to also use a detail level of
0 or 1. In this case the report shows only the number of matches (for
detail 0) or just the highest level of the entity that matches (for detail 1).
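Putting the reverse-lookup idea and the detail-level advice together, a LID can be resolved to its node with a focused brnodes report at detail 1. The sketch below is a hypothetical helper (lid_owner is not a Fast Fabric tool) that uses only the -o, -d and -F options documented above.

```shell
#!/bin/bash
# lid_owner: hypothetical helper (not a Fast Fabric tool) that reverse
# looks up a LID with a brnodes report focused on it. Detail level 1
# shows just the highest-level entity that matches.
# Usage: lid_owner <lid>
lid_owner() {
    if [ "$#" -ne 1 ]; then
        echo "usage: lid_owner <lid>" >&2
        return 2
    fi
    iba_report -o brnodes -d 1 -F "lid:$1"
}
```

In the sample fabric shown earlier, LID 0x000d is port 1 of coyote1, so lid_owner 0x000d would be expected to report that node.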
5.4.3.3.1
Advanced Focus
The node name, node name pattern, node guid, node type, IOC name, IOC name
pattern, IOC GUID, IOC type and system image GUID also allow for a port
number specifier. This permits the focus to be limited to the given port number.
If the selection resolves to multiple switches or CAs (for example a system
composed of multiple nodes), all ports on the present fabric matching the given
port number will be selected.
An even more advanced form of focus is to focus on the route between any two
points. This focuses on all the ports involved in that route and can be an
excellent way to quickly investigate a performance or error situation being
reported between 2 specific points in the fabric (such as a
StatusTimeoutRetry that MPI may be reporting between 2 processes in its
run).
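When MPI reports a problem between two hosts, one approach is to list the path first and then check error counters only along it. The sketch below uses hypothetical helper names (show_route and route_errors are not Fast Fabric tools); the first uses the documented -o route with -S/-D, and the second composes a focus string in the route:point1:point2 form from the Point Syntax list.

```shell
#!/bin/bash
# Hypothetical helpers (not Fast Fabric tools) for investigating traffic
# between two points, each given in point syntax (e.g. node:duster).

# show_route: list the links and components used between the two points.
show_route() {
    iba_report -o route -S "$1" -D "$2"
}

# route_errors: check error counters only on the ports along those routes,
# using a focus of the form route:point1:point2.
route_errors() {
    iba_report -o errors -F "route:$1:$2"
}
```

For example, show_route node:coyote1 node:duster followed by route_errors node:coyote1 node:duster. Choose points that resolve uniquely; in the sample fabric, node:erik would match two nodes.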
Focus can use glob style patterns. This permits a wildcarded focus by node
name or IOC name. If a naming convention is used for fabric components, this
can provide a powerful way to focus reports on nodes. For example, if the host
names are prefixed with an indication of their purpose, reports can be
focused based on the purpose of the node. With a naming convention such as
l### = login node ###, n### = compute node ###, s### = storage node ###,
etc., nodes of each purpose can be selected with patterns such as 'l*', 'n*'
or 's*'.
NOTE: A glob style pattern is a shell style wildcard pattern as used by bash and
many other tools. When using such patterns, single-quote them so that the
shell does not try to expand them to match local file names.
Typically a focused report will include a summary at its start of the items focused
on. When the focus has a large scope, this list can be quite long. In this case
the -Q option can be used to omit this section from the report.
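Following the naming convention above, checking all compute nodes for errors could be done directly with iba_report -o errors -Q -F 'nodepat:n*'. When the pattern comes from a variable, quoting matters just as much. The sketch below is a hypothetical helper (errors_by_name_pattern is not a Fast Fabric tool) that keeps the pattern double-quoted so the shell never expands it against local file names.

```shell
#!/bin/bash
# errors_by_name_pattern: hypothetical helper (not a Fast Fabric tool) that
# checks error counters on every node whose name matches a glob pattern,
# omitting the focus summary (-Q). The expansion "nodepat:$1" is quoted,
# so the pattern reaches iba_report unexpanded.
# Usage: errors_by_name_pattern '<pattern>'
errors_by_name_pattern() {
    iba_report -o errors -Q -F "nodepat:$1"
}
```

For example, errors_by_name_pattern 'n*' covers the compute nodes; the caller must still single-quote the argument.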
5.4.3.3.2
Focus Examples:
Below are some examples of using the focus options:
iba_report lends itself to custom scripting. As previously mentioned, options
such as -H, -P and -N help generate reports that can be compared with diff.
In addition, the -x option generates output reports in XML format. The XML
hierarchy is similar to that of the textual reports. Using XML allows other
XML tools (such as Perl XML extensions) to easily parse iba_report output,
so scripts can be created to further search and refine report output formats.
This allows iba_report to be integrated into custom scripts. It can also be
used to generate new customer-specific report formats, cross-reference
iba_report output with other site-specific information, etc.
5.4.3.4.1
Using iba_report to monitor for fabric changes
iba_report can easily be used in other scripts. For example, the following
simple script could be run as a cron job to detect whether the fabric has
changed compared to the initial design:
#!/bin/bash
# Compare the current fabric against a master report created earlier with:
#   iba_report -o all -d 5 -P > /usr/local/report.master
expected_config=/usr/local/report.master           # master copy of config previously created
config=$(mktemp /tmp/report.XXXXXX) || exit 1      # where we will generate new report
diffs=$(mktemp /tmp/report.diff.XXXXXX) || exit 1  # where we will generate diffs
trap 'rm -f "$config" "$diffs"' EXIT               # remove temporary files on exit
iba_report -o all -d 5 -P > "$config" 2>/dev/null
if ! diff "$expected_config" "$config" > "$diffs" 2>&1
then
    # notify admin, for example mail the diffs and both reports to the admin
    cat "$diffs" "$expected_config" "$config" |
        mail -s "fabric change detected" admin@somewhere
fi
5.4.3.5
Sample Output
5.4.3.5.1
Analysis of all ports in fabric for errors, inconsistent connections, bad
cables