HP Integrity NonStop NS-Series Operations Guide
Abstract
This guide describes how to perform routine system hardware operations for HP Integrity NonStop NS-series servers. These tasks include monitoring the system, performing common operations tasks, and performing routine hardware maintenance. This guide is written for system operators.
Product Version
N.A.
Supported Release Version Updates (RVUs)
This guide supports H06.08 and all subsequent H-series RVUs until otherwise indicated by its replacement publication.
Part Number    Published
529869-005     November 2006
Document History
Part Number    Product Version    Published
529869-003     N.A.               February 2006
529869-004     N.A.               August 2006
529869-005     N.A.               November 2006
Hewlett-Packard Company—529869-005
Contents
What’s New in This Manual xiii
    Manual Information xiii
    New and Changed Information xiii
About This Guide xv
    Who Should Use This Guide xv
    What Is in This Guide xvi
    Where to Get More Information xvii
    Notation Conventions xviii
1. Introduction to Integrity NonStop NS-Series Operations
    When to Use This Section 1-2
    Understanding the Operational Environment 1-2
    What Are the Operator Tasks? 1-2
        Monitoring the System and Performing Recovery Operations 1-2
        Preparing for and Recovering from Power Failures 1-3
        Stopping and Powering Off the System 1-3
        Powering On and Starting the System 1-3
        Creating Startup and Shutdown Files 1-3
        Performing Preventive Maintenance 1-3
        Operating Disk Drives and Tape Drives 1-3
        Responding to Spooler Problems 1-4
        Updating Firmware 1-4
    Determining the Cause of a Problem: A Systematic Approach 1-4
        A Problem-Solving Worksheet 1-4
        Task 1: Get the Facts 1-6
        Task 2: Find and Eliminate the Cause of the Problem 1-7
        Task 3: Escalate the Problem If Necessary 1-8
        Task 4: Prevent Future Problems 1-9
    Logging On to an Integrity NonStop Server 1-9
        System Consoles 1-9
        Opening a TACL Window 1-10
        Overview of OSM Applications 1-11
        Launching OSM Applications 1-11
    Service Procedures 1-12
        Support and Service Library 1-12
2. Determining Your System Configuration
    When to Use This Section 2-1
    Modular Hardware Components 2-2
        Differences Between Integrity NonStop NS-Series Systems 2-2
        Terms Used to Describe System Hardware Components 2-4
    Recording Your System Configuration 2-4
    Using SCF to Determine Your System Configuration 2-5
        SCF System Naming Conventions 2-5
        SCF Configuration Files 2-5
        Using SCF to Display Subsystem Configuration Information 2-6
        Displaying SCF Configuration Information for Subsystems 2-9
        Additional Subsystems Controlled by SCF 2-13
        Displaying Configuration Information—SCF Examples 2-15
3. Overview of Monitoring and Recovery
    When to Use This Section 3-1
    Functions of Monitoring 3-2
    Monitoring Tasks 3-2
        Working With a Daily Checklist 3-2
        Tools for Checking the Status of System Hardware 3-3
        Additional Monitoring Tasks 3-6
    Monitoring and Resolving Problems—An Approach 3-7
    Using OSM to Monitor the System 3-7
        Using the OSM Service Connection 3-7
        Recovery Operations for Problems Detected by OSM 3-12
        Monitoring Problem Incident Reports 3-12
    Using SCF to Monitor the System 3-12
        Determining Device States 3-13
    Automating Routine System Monitoring 3-16
    Using the Status LEDs to Monitor the System 3-20
    Related Reading 3-22
4. Monitoring EMS Event Messages
    When to Use This Section 4-1
    What Is the Event Management Service (EMS)? 4-1
    Tools for Monitoring EMS Event Messages 4-1
        OSM Event Viewer 4-2
        EMSDIST 4-2
        ViewPoint 4-2
        Web ViewPoint 4-2
    Related Reading 4-2
5. Processes: Monitoring and Recovery
    When to Use This Section 5-1
    Types of Processes 5-1
        System Processes 5-1
        I/O Processes (IOPs) 5-2
        Generic Processes 5-2
    Monitoring Processes 5-3
        Monitoring System Processes 5-3
        Monitoring IOPs 5-4
        Monitoring Generic Processes 5-4
    Recovery Operations for Processes 5-6
    Related Reading 5-6
6. Communications Subsystems: Monitoring and Recovery
    When to Use This Section 6-1
    Communications Subsystems 6-1
        Local Area Networks (LANs) and Wide Area Networks (WANs) 6-2
    Monitoring Communications Subsystems and Their Objects 6-4
        Monitoring the SLSA Subsystem 6-4
        Monitoring the WAN Subsystem 6-6
        Monitoring the NonStop TCP/IP Subsystem 6-9
        Monitoring Line-Handler Process Status 6-10
        Tracing a Communications Line 6-12
    Recovery Operations for Communications Subsystems 6-13
    Related Reading 6-13
7. ServerNet Resources: Monitoring and Recovery
    When to Use This Section 7-1
    ServerNet Communications Network 7-1
    System I/O ServerNet Connections 7-4
    Monitoring the Status of the ServerNet Fabrics 7-4
        Monitoring the ServerNet Fabrics Using OSM 7-5
        Monitoring the ServerNet Fabrics Using SCF 7-6
    Related Reading 7-8
8. I/O Adapters and Modules: Monitoring and Recovery
    When to Use This Section 8-1
    I/O Adapters and Modules 8-2
        Fibre Channel ServerNet Adapter (FCSA) 8-2
        Gigabit Ethernet 4-Port Adapter (G4SA) 8-2
        4-Port ServerNet Extender (4PSE) 8-3
    Monitoring I/O Adapters and Modules 8-3
        Monitoring the FCSAs 8-4
        Monitoring the G4SAs 8-5
        Monitoring the 4PSEs 8-7
    Recovery Operations for I/O Adapters and Modules 8-7
    Related Reading 8-8
9. Processors and Components: Monitoring and Recovery
    When to Use This Section 9-1
    Overview of the NonStop Blade Complex 9-2
    Monitoring and Maintaining Processors 9-4
        Monitoring Processors Automatically Using TFDS 9-4
        Monitoring Processor Status Using the OSM Low-Level Link 9-5
        Monitoring Processor Status Using the OSM Service Connection 9-5
        Monitoring Processor Performance Using ViewSys 9-7
    Identifying Processor Problems 9-7
        Processor or System Hangs 9-7
        Processor Halts 9-8
        OSM Alarms and Attribute Values 9-8
    Recovery Operations for Processors 9-9
        Recovery Operations for a Processor Halt 9-9
        Halting One or More Processors 9-10
        Reloading a Single Processor on a Running Server 9-10
        Recovery Operations for a System Hang 9-14
        Enabling/Disabling Processor and System Freeze 9-15
        Freezing the System and Freeze-Enabled Processors 9-15
        Dumping a Processor to Disk 9-15
        Backing Up a Processor Dump to Tape 9-19
        Replacing Processor Memory 9-19
        Replacing the Processor Board and Processor Entity 9-19
        Submitting Information to Your Service Provider 9-19
    Related Reading 9-22
10. Disk Drives: Monitoring and Recovery
    When to Use This Section 10-1
    Overview of Disk Drives 10-2
        Internal SCSI Disk Drives 10-2
        M8xxx Fibre Channel Disk Drives 10-3
        Enterprise Storage System (ESS) Disks 10-3
    Monitoring Disk Drives 10-4
        Monitoring Disk Drives With OSM 10-4
        Monitoring Disk Drives With SCF 10-5
        Monitoring the State of Disk Drives 10-9
        Monitoring the Use of Space on a Disk Volume 10-9
        Monitoring the Size of Database Files 10-9
        Monitoring Disk Configuration and Performance 10-10
    Identifying Disk Drive Problems 10-11
        Internal SCSI Disk Drives 10-11
        M8xxx Fibre Channel Disk Drives 10-11
    Recovery Operations for Disk Drives 10-12
        Recovery Operations for a Down Disk or Down Disk Path 10-14
        Recovery Operations for a Nearly Full Database File 10-15
    Related Reading 10-15
11. Tape Drives: Monitoring and Recovery
    When to Use This Section 11-1
    Overview of Tape Drives 11-1
    Monitoring Tape Drives 11-2
        Monitoring Tape Drive Status With OSM 11-2
        Monitoring Tape Drive Status With SCF 11-5
        Monitoring Tape Drive Status With MEDIACOM 11-6
        Monitoring the Status of Labeled-Tape Operations 11-7
    Identifying Tape Drive Problems 11-7
    Recovery Operations for Tape Drives 11-8
        Recovery Operations Using the OSM Service Connection 11-8
        Recovery Operations Using SCF 11-9
    Related Reading 11-9
12. Printers and Terminals: Monitoring and Recovery
    When to Use This Section 12-1
    Overview of Printers and Terminals 12-1
    Monitoring Printer and Collector Process Status 12-2
        Monitoring Printer Status 12-2
        Monitoring Collector Process Status 12-2
    Recovery Operations for Printers and Terminals 12-3
        Recovery Operations for a Full Collector Process 12-3
    Related Reading 12-3
13. Applications: Monitoring and Recovery
    When to Use This Section 13-1
    Monitoring TMF 13-1
        Monitoring the Status of TMF 13-2
        Monitoring Data Volumes 13-2
        TMF States 13-3
    Monitoring the Status of Pathway 13-4
        PATHMON States 13-5
    Related Reading 13-6
14. Power Failures: Preparation and Recovery
    When to Use This Section 14-2
    System Response to Power Failures 14-2
        NonStop NS-Series Cabinets (Modular Cabinets) 14-2
        NonStop S-Series I/O Enclosures 14-2
        External Devices 14-2
        ESS Cabinets 14-3
        Air Conditioning 14-3
    Preparing for Power Failure 14-3
        Set Ride-Through Time 14-3
        Configure OSM Power Fail Support 14-3
        Monitor Power Supplies 14-4
        Monitor Batteries 14-4
        Maintain Batteries 14-4
    Power Failure Recovery 14-4
        Procedure to Recover From a Power Failure 14-5
        Setting System Time 14-5
    Related Reading 14-5
15. Starting and Stopping the System
    When to Use This Section 15-2
    Powering On a System 15-2
        Powering On the System From a Low Power State 15-3
        Powering On the System From a No Power State 15-3
    Starting a System 15-5
        Loading the System 15-5
        Starting Other System Components 15-9
        Performing a System Load 15-9
        Performing a System Load From a Specific Processor 15-11
        Reloading Processors 15-12
    Minimizing the Frequency of Planned Outages 15-14
        Anticipating and Planning for Change 15-14
    Stopping Application, Devices, and Processes 15-14
    Stopping the System 15-16
        Alerts 15-16
        Halting All Processors Using OSM 15-16
    Powering Off a System 15-17
        System Power-Off Using OSM 15-17
        System Power-Off Using SCF 15-17
        Emergency Power-Off Procedure 15-18
    Troubleshooting and Recovery Operations 15-18
        Fans Are Not Turning 15-18
        System Does Not Appear to Be Powered On 15-19
        Green LED Is Not Lit After POSTs Finish 15-19
        Amber LED on a Component Remains Lit After the POST Finishes 15-19
        Components Fail When Testing the Power 15-19
        Recovering From a System Load Failure 15-20
        Getting a Corrupt System Configuration File Analyzed 15-21
        Recovering From a Reload Failure 15-21
        Exiting the OSM Low-Level Link 15-22
        Opening Startup Event Stream and Startup TACL Windows 15-22
    Related Reading 15-24
16. Creating Startup and Shutdown Files
    Automating System Startup and Shutdown 16-2
        Managed Configuration Services (MCS) 16-2
        Startup 16-2
        Shutdown 16-3
        For More Information 16-3
    Processes That Represent the System Console 16-3
        $YMIOP.#CLCI 16-3
        $YMIOP.#CNSL 16-3
        $ZHOME 16-4
        $ZHOME Alternative 16-4
    Example Command Files 16-4
    CIIN File 16-5
        Establishing a CIIN File 16-6
        Modifying a CIIN File 16-6
        If a CIIN File Is Not Specified or Enabled in OSM 16-7
        Example CIIN Files 16-8
    Writing Efficient Startup and Shutdown Command Files 16-9
        Command File Syntax 16-9
        Avoid Manual Intervention 16-10
        Use Parallel Processing 16-10
        Investigate Product-Specific Techniques 16-11
    How Process Persistence Affects Configuration and Startup 16-11
    Tips for Startup Files 16-11
    Startup File Examples 16-12
        System Startup File 16-12
        Spooler Warm-Start File 16-14
        TMF Warm-Start File 16-14
        TCP/IP Stack Configuration and Startup File 16-14
        CP6100 Lines Startup File 16-17
        ATP6100 Lines Startup File 16-17
        X.25 Lines Startup File 16-17
        Printer Line Startup File 16-18
        Expand-Over-IP Line Startup File 16-18
        Expand Direct-Connect Line Startup File 16-18
    Tips for Shutdown Files 16-19
    Shutdown File Examples 16-19
        System Shutdown File 16-20
        CP6100 Lines Shutdown File 16-21
        ATP6100 Lines Shutdown File 16-21
        X.25 Lines Shutdown File 16-21
        Printer Line Shutdown File 16-22
        Expand-Over-IP Line Shutdown File 16-22
        Direct-Connect Line Shutdown File 16-22
        Spooler Shutdown File 16-23
        TMF Shutdown File 16-23
17. Preventive Maintenance
    When to Use This Section 17-1
    Monitoring Physical Facilities 17-1
        Checking Air Temperature and Humidity 17-1
        Checking Physical Security 17-2
        Maintaining Order and Cleanliness 17-2
        Checking Fire-Protection Systems 17-2
    Cleaning System Components 17-2
        Cleaning an Enclosure 17-2
        Cleaning and Maintaining Printers 17-2
        Cleaning Tape Drives 17-3
    Handling and Storing Cartridge Tapes 17-3
A. Operational Differences Between Systems Running G-Series and H-Series RVUs
B. Tools and Utilities for Operations
    When to Use This Appendix B-1
    BACKCOPY B-2
    BACKUP B-2
    Disk Compression Program (DCOM) B-2
    Disk Space Analysis Program (DSAP) B-2
    EMSDIST B-2
    Event Management Service Analyzer (EMSA) B-2
    File Utility Program (FUP) B-3
    Measure B-3
    MEDIACOM B-3
    NonStop NET/MASTER B-3
    NSKCOM and the Kernel-Managed Swap Facility (KMSF) B-3
    OSM Package B-3
    PATHCOM B-4
    PEEK B-4
    RESTORE B-4
    SPOOLCOM B-4
    Subsystem Control Facility (SCF) B-4
    HP Tandem Advanced Command Language (TACL) B-5
    TMFCOM B-5
    Web ViewPoint B-5
    ViewPoint B-5
    ViewSys B-6
C. Related Reading
D. Converting Numbers
    When to Use This Appendix D-1
    Overview of Numbering Systems D-2
    Binary to Decimal D-3
    Octal to Decimal D-4
    Hexadecimal to Decimal D-5
    Decimal to Binary D-7
    Decimal to Octal D-8
    Decimal to Hexadecimal D-9
Safety and Compliance
Index
Examples
    Example 2-1. SCF LISTDEV Command Output 2-7
    Example 2-2. SCF ADD DISK Command Output 2-11
    Example 2-3. SCF INFO PROCESS Command Output 2-15
    Example 2-4. SCF INFO SAC Command Output 2-15
    Example 2-5. SCF INFO PROCESS $ZZWAN Command Output 2-16
    Example 2-6. SCF INFO LINE Command Output 2-16
    Example 3-1. SCF STATUS TAPE Command 3-13
    Example 3-2. System Monitoring Command File 3-16
    Example 3-3. System Monitoring Output File 3-17
Figures
    Figure 3-1. OSM Management: System Icons Indicate Problems Within 3-8
    Figure 3-2. Expanding the Tree Pane to Locate the Source of Problems 3-9
    Figure 3-3. Attributes Tab 3-10
    Figure 3-4. Using System Status Icons to Monitor Multiple Systems 3-10
    Figure 3-5. Alarm Summary Dialog Box 3-11
    Figure 3-6. Problem Summary Dialog Box 3-11
    Figure 7-1. Integrity NonStop NS16000 System 7-2
    Figure 7-2. Integrity NonStop NS14000 System with IOAM Enclosure 7-3
    Figure 7-3. I/O Connections to the PICS in a P-Switch 7-4
    Figure 9-1. Modular NSAA With One NonStop Blade Complex and Four Processors 9-3
    Figure 9-2. Processor Status Display 9-5
    Figure 9-3. OSM Representation of Processor Complex 9-6
    Figure 11-1. OSM: Monitoring Tape Drives Connected to an FCSA 11-3
    Figure 11-2. OSM: Monitoring Tape Drives Connected to an IOMF2 11-4
    Figure 15-1. System Load Dialog Box 15-10
    Figure 15-2. Logical Processor Reload Parameters 15-13
    Figure 15-3. Opening a Startup TACL Window 15-22
    Figure 15-4. OutsideView Buttons on the Windows Toolbar 15-22
    Figure D-1. Binary to Decimal Conversion D-3
    Figure D-2. Octal to Decimal Conversion D-4
    Figure D-3. Hexadecimal to Decimal Conversion D-6
Tables
    Table 1-1. Problem-Solving Worksheet 1-5
    Table 2-1. Key Subsystems and Their Logical Device Names and Device Types 2-8
    Table 2-2. Displaying Information for the TCP/IP Subsystem ($ZTC0) 2-9
    Table 2-3. Displaying Information for the Kernel Subsystem ($ZZKRN) 2-10
    Table 2-4. Displaying Information for the Storage Subsystem ($ZZSTO) 2-10
    Table 2-5. Displaying Information for the SLSA Subsystem ($ZZLAN) 2-12
    Table 2-6. Displaying Information for the WAN Subsystem ($ZZWAN) 2-13
    Table 2-7. Subsystem Objects Controlled by SCF 2-13
    Table 3-1. Monitoring System Components 3-4
    Table 3-2. Daily Tasks Checklist 3-6
    Table 3-3. SCF Object States 3-14
    Table 3-4. Status LEDs and Their Functions 3-20
    Table 3-5. Related Reading for Monitoring 3-22
    Table 4-1. Related Reading for Monitoring EMS Event Messages 4-2
    Table 6-1. Related Reading for Communications Lines and Devices 6-13
    Table 8-1. Service, Flash Firmware, Flash Boot Firmware, Device, and Enabled States for the FCSA 8-4
    Table 8-2. Service, Device, and Enabled States for the G4SA 8-6
    Table 8-3. Related Reading for I/O Adapters and Modules 8-8
    Table 9-1. Other Files to Submit to Your Service Provider 9-20
    Table 9-2. Additional Processor Dump Information for Your Service Provider 9-21
    Table 9-3. Related Reading for Monitoring and Recovery Operations on Processors 9-22
    Table 10-1. Primary and Backup Path States for Disk Drives 10-9
    Table 10-2. Possible Causes of Common Disk Drive Problems 10-11
    Table 10-3. Common Recovery Operations for Disk Drives 10-12
    Table 11-1. Common Tape Drive Problems 11-7
    Table 11-2. Related Reading for Tapes and Tape Drives 11-9
    Table 13-1. TMF States 13-3
    Table 15-1. System Load Paths in Order of Use 15-7
    Table 15-2. Related Reading for Starting and Stopping a System 15-24
    Table C-1. Related Reading for Tools and Utilities C-1
    Table D-1. Descriptions of Number Systems D-2
What’s New in This Manual
Manual Information
HP Integrity NonStop NS-Series Operations Guide
Abstract
This guide describes how to perform routine system hardware operations for HP Integrity NonStop NS-series servers. These tasks include monitoring the system, performing common operations tasks, and performing routine hardware maintenance. This guide is written for system operators.
Product Version
N.A.
Supported Release Version Updates (RVUs)
This guide supports H06.08 and all subsequent H-series RVUs until otherwise indicated by its replacement publication.
Part Number    Published
529869-005     November 2006
Document History
Part Number    Product Version    Published
529869-003     N.A.               February 2006
529869-004     N.A.               August 2006
529869-005     N.A.               November 2006
New and Changed Information
This manual has been updated to include references to HP Integrity NonStop NS14000 and NS1000 servers containing VIO enclosures (in place of an IOAM enclosure).
About This Guide
This guide describes how to perform routine system hardware operations for HP Integrity NonStop NS-series servers on H-series release version updates.
This guide is primarily geared toward commercial-type NonStop NS-series servers (see Differences Between Integrity NonStop NS-Series Systems on page 2-2 for high-level architectural and hardware differences between the various commercial models). Basic monitoring principles, such as Using OSM to Monitor the System on page 3-7, apply to Telco as well as commercial systems. For hardware details and service procedures specific to Telco systems, refer to the NonStop NS-Series Carrier Grade Server Manual.
Use this guide along with the Guardian User’s Guide and the written policies and procedures of your company regarding:
General operations
Security
System backups
Starting and stopping applications
Who Should Use This Guide
This guide is written for operators who perform system hardware operations. It provides an overview of the routine tasks of monitoring the system and guides the operator through the infrequent tasks of starting and stopping the system and performing online recovery on the system.
Note. NS-series refers to the hardware that makes up the server. H-series refers to the software that runs on the server.
The term NonStop server refers to both NonStop S-series servers and Integrity NonStop NS-series servers.
What Is in This Guide
Section or Appendix    Section and Appendix Titles
Section 1              Introduction to Integrity NonStop NS-Series Operations
Section 2              Determining Your System Configuration
Section 3              Overview of Monitoring and Recovery
Section 4              Monitoring EMS Event Messages
Section 5              Processes: Monitoring and Recovery
Section 6              Communications Subsystems: Monitoring and Recovery
Section 7              ServerNet Resources: Monitoring and Recovery
Section 8              I/O Adapters and Modules: Monitoring and Recovery
Section 9              Processors and Components: Monitoring and Recovery
Section 10             Disk Drives: Monitoring and Recovery
Section 11             Tape Drives: Monitoring and Recovery
Section 12             Printers and Terminals: Monitoring and Recovery
Section 13             Applications: Monitoring and Recovery
Section 14             Power Failures: Preparation and Recovery
Section 15             Starting and Stopping the System
Section 16             Creating Startup and Shutdown Files
Section 17             Preventive Maintenance
Appendix A             Operational Differences Between Systems Running G-Series and H-Series RVUs
Appendix B             Tools and Utilities for Operations
Appendix C             Related Reading
Appendix D             Converting Numbers
Where to Get More Information
Operations planning and operations management practices appear in these manuals:
NonStop NSxxxx Planning Guide for your NS16000, NS14000, or NS1000 server
Availability Guide for Application Design
Availability Guide for Change Management
Availability Guide for Problem Management
For comprehensive information about performing operations tasks for an Integrity NonStop NS-series server, you need both this guide and the Guardian User’s Guide. The Guardian User’s Guide describes some tasks not covered in this guide, such as supporting users of the system.
The Guardian User’s Guide describes routine tasks common to system operations on all NonStop servers. Instructions and examples show how to support users of the system, how to monitor operator messages, how to control the spooler, and how to manage disks and tapes. Numerous tools that support these functions are also documented. Some monitoring procedures in the Guardian User’s Guide have information about using only the Subsystem Control Facility (SCF). That guide does not generally describe any monitoring procedures using the OSM packages.
Information about the use of OSM, such as how to migrate from TSM to OSM, how to install and configure OSM server and client components, and how to use the OSM Service Connection, appears in these manuals:
OSM Migration and Configuration Guide
NonStop System Console Installer Guide
OSM Service Connection User’s Guide (available in NTL and as online help within the OSM Service Connection)
Servers that are connected in ServerNet clusters require special installation and operating procedures that are not documented in this manual. Such information is instead provided with the appropriate cluster documentation and the ServerNet Cluster Supplement for Integrity NonStop NS-Series Servers.
In the 6780 ServerNet cluster environment, installation and operating procedures are documented in these manuals:
ServerNet Cluster 6780 Planning and Installation Guide
ServerNet Cluster 6780 Operations Guide
Installation and operating procedures for earlier server clusters (those using 6770 switches) are documented in:
ServerNet Cluster Manual
Note. For manuals not available in the H-series collection, please refer to the G-series collection on NTL.
OSM is the required system management tool for servers that use 6780 switches in ServerNet clusters, but OSM also provides system management for earlier versions of ServerNet clusters.
For other documentation related to operations tasks, refer to Appendix C, Related Reading.
Support and Service Library
These NTL Support and Service library categories provide procedures, part numbers, troubleshooting tips, and tools for servicing NonStop S-series and Integrity NonStop NS-series systems:
Hardware Service and Maintenance Publications
Service Information
Service Procedures
Tools and Download Files
Troubleshooting Tips
Within these categories, where applicable, content might be further categorized according to server or enclosure type.
Authorized service providers can also order the NTL Support and Service Library CD:
Channel Partners and Authorized Service Providers: Order the CD from the SDRC at https://scout.nonstop.compaq.com/SDRC/ce.htm.
HP employees: Subscribe at World on a Workbench (WOW). Subscribers automatically receive CD updates. Access the WOW order form at http://hps.knowledgemanagement.hp.com/wow/order.asp.
Notation Conventions
Hypertext Links
Blue underline is used to indicate a hypertext link within text. By clicking a passage of text with a blue underline, you are taken to the location described. For example:
This requirement is described under Backup DAM Volumes and Physical Disk Drives on page 3-2.
General Syntax Notation
The following list summarizes the notation conventions for syntax presentation in this manual.
UPPERCASE LETTERS. Uppercase letters indicate keywords and reserved words; enter these items exactly as shown. Items not enclosed in brackets are required. For example:
MAXATTACH
lowercase italic letters. Lowercase italic letters indicate variable items that you supply. Items not enclosed in brackets are required. For example:
file-name
computer type. Computer type letters within text indicate C and Open System Services (OSS) keywords and reserved words; enter these items exactly as shown. Items not enclosed in brackets are required. For example:
myfile.c
italic computer type. Italic computer type letters within text indicate C and Open System Services (OSS) variable items that you supply. Items not enclosed in brackets are required. For example:
pathname
[ ] Brackets. Brackets enclose optional syntax items. For example:
TERM [\system-name.]$terminal-name
INT[ERRUPTS]
A group of items enclosed in brackets is a list from which you can choose one item or none. The items in the list may be arranged either vertically, with aligned brackets on each side of the list, or horizontally, enclosed in a pair of brackets and separated by vertical lines. For example:
FC [ num  ]
   [ -num ]
   [ text ]
K [ X | D ] address
{ } Braces. A group of items enclosed in braces is a list from which you are required to choose one item. The items in the list may be arranged either vertically, with aligned braces on each side of the list, or horizontally, enclosed in a pair of braces and separated by vertical lines. For example:
LISTOPENS PROCESS { $appl-mgr-name }
                  { $process-name  }
ALLOWSU { ON | OFF }
| Vertical Line. A vertical line separates alternatives in a horizontal list that is enclosed in brackets or braces. For example:
INSPECT { OFF | ON | SAVEABEND }
… Ellipsis. An ellipsis immediately following a pair of brackets or braces indicates that you can repeat the enclosed sequence of syntax items any number of times. For example:
M address [ , new-value ]…
[ - ] {0|1|2|3|4|5|6|7|8|9}…
An ellipsis immediately following a single syntax item indicates that you can repeat that syntax item any number of times. For example:
"s-char…"
Punctuation. Parentheses, commas, semicolons, and other symbols not previously described must be entered as shown. For example:
error := NEXTFILENAME ( file-name ) ;
LISTOPENS SU $process-name.#su-name
Quotation marks around a symbol such as a bracket or brace indicate the symbol is a required character that you must enter as shown. For example:
"[" repetition-constant-list "]"
Item Spacing. Spaces shown between items are required unless one of the items is a punctuation symbol such as a parenthesis or a comma. For example:
CALL STEPMOM ( process-id ) ;
If there is no space between two items, spaces are not permitted. In the following example, there are no spaces permitted between the period and any other items:
$process-name.#su-name
Line Spacing. If the syntax of a command is too long to fit on a single line, each continuation line is indented three spaces and is separated from the preceding line by a blank line. This spacing distinguishes items in a continuation line from items in a vertical list of selections. For example:
ALTER [ / OUT file-spec / ] LINE

   [ , attribute-spec ]…
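As an informal illustration of the bracket, brace, and vertical-line conventions described above, the following Python sketch expands a syntax template into every form it permits. The expand function is a hypothetical helper written for this example and is not part of any HP tool. It assumes horizontal lists only, with no nested groups and no ellipses.

```python
import itertools
import re

# Matches one { ... } group (required choice) or one [ ... ] group (optional).
# Assumption for this sketch: groups are never nested.
GROUP = re.compile(r"\{([^{}]*)\}|\[([^\[\]]*)\]")


def expand(template):
    """Return every command form that a non-nested syntax template allows."""
    slots = []  # each entry lists the alternatives for one position
    pos = 0
    for match in GROUP.finditer(template):
        literal = template[pos:match.start()]
        if literal:
            slots.append([literal])
        required, optional = match.group(1), match.group(2)
        if required is not None:
            # Braces: exactly one of the listed items must be chosen.
            choices = [item.strip() for item in required.split("|")]
        else:
            # Brackets: one item or none, so the empty choice is included.
            choices = [""] + [item.strip() for item in optional.split("|")]
        slots.append(choices)
        pos = match.end()
    if template[pos:]:
        slots.append([template[pos:]])
    # Join each combination and collapse runs of spaces left by empty choices.
    return [" ".join("".join(combo).split())
            for combo in itertools.product(*slots)]


print(expand("INSPECT { OFF | ON | SAVEABEND }"))
print(expand("K [ X | D ] address"))
print(expand("INT[ERRUPTS]"))
```

For the brace example, the sketch lists the three required alternatives (INSPECT OFF, INSPECT ON, INSPECT SAVEABEND). For the bracket examples, it also lists the form with the optional item omitted (K address and INT).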
Notation for Messages
The following list summarizes the notation conventions for the presentation of displayed messages in this manual.
Bold Text. Bold text in an example indicates user input entered at the terminal. For example:
ENTER RUN CODE
?123
CODE RECEIVED: 123.00
The user must press the Return key after typing the input.
Nonitalic text. Nonitalic letters, numbers, and punctuation indicate text that is displayed or returned exactly as shown. For example:
Backup Up.
lowercase italic letters. Lowercase italic letters indicate variable items whose values are displayed or returned. For example:
p-register
process-name
[ ] Brackets. Brackets enclose items that are sometimes, but not always, displayed. For example:
Event number = number [ Subject = first-subject-value ]
A group of items enclosed in brackets is a list of all possible items that can be displayed, of which one or none might actually be displayed. The items in the list might be arranged either vertically, with aligned brackets on each side of the list, or horizontally, enclosed in a pair of brackets and separated by vertical lines. For example:
proc-name trapped [ in SQL | in SQL file system ]
{ } Braces. A group of items enclosed in braces is a list of all possible items that can be displayed, of which one is actually displayed. The items in the list might be arranged either vertically, with aligned braces on each side of the list, or horizontally, enclosed in a pair of braces and separated by vertical lines. For example:
obj-type obj-name state changed to state, caused by
{ Object | Operator | Service }
process-name State changed from old-objstate to objstate
{ Operator Request. }
{ Unknown.          }
| Vertical Line. A vertical line separates alternatives in a horizontal list that is enclosed in brackets or braces. For example:
Transfer status: { OK | Failed }
% Percent Sign. A percent sign precedes a number that is not in decimal notation. The % notation precedes an octal number. The %B notation precedes a binary number. The %H notation precedes a hexadecimal number. For example:
%005400
%B101111
%H2F
P=%p-register E=%e-register
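The percent-sign notation above can be checked with a short conversion routine. The following Python sketch is only an illustration: parse_number is a hypothetical helper name, and treating a value without a percent sign as decimal is an assumption drawn from the description above.

```python
def parse_number(text):
    """Convert a number written in the % notation to a Python int."""
    if not text.startswith("%"):
        return int(text, 10)      # no percent sign: assumed to be decimal
    body = text[1:]
    prefix = body[:1].upper()
    if prefix == "B":
        return int(body[1:], 2)   # %B precedes a binary number
    if prefix == "H":
        return int(body[1:], 16)  # %H precedes a hexadecimal number
    return int(body, 8)           # a bare % precedes an octal number


for sample in ("%005400", "%B101111", "%H2F"):
    print(sample, "=", parse_number(sample))
```

Of the sample values shown above, %005400 is decimal 2816, and both %B101111 and %H2F are decimal 47.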
Change Bar Notation
Change bars are used to indicate substantive differences between this edition of the manual and the preceding edition. Change bars are vertical rules placed in the right margin of changed portions of text, figures, tables, examples, and so on. Change bars highlight new or revised information. For example:
The message types specified in the REPORT clause are different in the COBOL85 environment and the Common Run-Time Environment (CRE).
The CRE has many new message types and some new message type codes for old message types. In the CRE, the message type SYSTEM includes all messages except LOGICAL-CLOSE and LOGICAL-OPEN.
1. Introduction to Integrity NonStop NS-Series Operations
When to Use This Section on page 1-2
Understanding the Operational Environment on page 1-2
What Are the Operator Tasks? on page 1-2
    Monitoring the System and Performing Recovery Operations on page 1-2
    Preparing for and Recovering from Power Failures on page 1-3
    Stopping and Powering Off the System on page 1-3
    Powering On and Starting the System on page 1-3
    Performing Preventive Maintenance on page 1-3
    Operating Disk Drives and Tape Drives on page 1-3
    Responding to Spooler Problems on page 1-4
    Updating Firmware on page 1-4
Determining the Cause of a Problem: A Systematic Approach on page 1-4
    A Problem-Solving Worksheet on page 1-4
    Task 1: Get the Facts on page 1-6
    Task 2: Find and Eliminate the Cause of the Problem on page 1-7
    Task 3: Escalate the Problem If Necessary on page 1-8
    Task 4: Prevent Future Problems on page 1-9
Logging On to an Integrity NonStop Server on page 1-9
    System Consoles on page 1-9
    Opening a TACL Window on page 1-10
    Overview of OSM Applications on page 1-11
    Launching OSM Applications on page 1-11
Service Procedures on page 1-12
    Support and Service Library on page 1-12
When to Use This Section
This section introduces system hardware operations for Integrity NonStop NS-series servers. It provides an introduction to the other sections in this guide.
Understanding the Operational Environment
To understand the operational environment:
If you are already familiar with other NonStop systems, see Appendix A, Operational Differences Between Systems Running G-Series and H-Series RVUs.
For a brief introduction to the system organization and the location of system components in an Integrity NonStop server, see Section 2, Determining Your System Configuration.
For information about various software tools and utilities you can use to perform system operations on an Integrity NonStop server, see Appendix B, Tools and Utilities for Operations.
What Are the Operator Tasks?
The system operations described in this guide include:
Monitoring the system and performing recovery operations
Preparing for and recovering from power failures
Stopping and powering off the system
Powering on and starting the system
Performing preventive maintenance
Operating disk drives and tape drives
Responding to spooler problems
Monitoring the System and Performing Recovery Operations
Checking for indications of potential system problems by monitoring the system is part of the normal system operations routine. You perform recovery operations to restore a malfunctioning system component to normal use. Most recovery procedures for Integrity NonStop servers can be performed online. Monitoring the status of all system components and performing recovery operations are described in:
Section 3, Overview of Monitoring and Recovery
Section 4, Monitoring EMS Event Messages
Section 5, Processes: Monitoring and Recovery
Section 6, Communications Subsystems: Monitoring and Recovery
Section 7, ServerNet Resources: Monitoring and Recovery
Section 8, I/O Adapters and Modules: Monitoring and Recovery
Section 9, Processors and Components: Monitoring and Recovery
Section 10, Disk Drives: Monitoring and Recovery
Section 11, Tape Drives: Monitoring and Recovery
Section 12, Printers and Terminals: Monitoring and Recovery
Section 13, Applications: Monitoring and Recovery
Recovery operations for a system console are not discussed in this guide. For recovery procedures for a system console and the applications installed on the system console, see the NonStop NSxxxx Hardware Installation Manual for your Integrity NonStop NS16000, NS14000, or NS1000 server.
Preparing for and Recovering from Power Failures
You can minimize unplanned outage time by having procedures to prepare for and recover quickly from power failures, as described in Section 14, Power Failures: Preparation and Recovery.
Stopping and Powering Off the System
HP recommends a specific set of procedures for stopping and powering off an Integrity NonStop server or its components, as described in Section 15, Starting and Stopping the System.
Powering On and Starting the System
HP recommends a specific set of procedures for powering on and starting an Integrity NonStop server or its components, as described in Section 15, Starting and Stopping the System.
Creating Startup and Shutdown Files
HP recommends a specific set of procedures for creating startup and shutdown files on an Integrity NonStop server or its components, as described in Section 16, Creating Startup and Shutdown Files.
Performing Preventive Maintenance
Routine preventive maintenance consists of:
Dusting or cleaning enclosures as needed
Cleaning tape drives regularly
Evaluating tape condition regularly
Cleaning and reverifying tapes as needed
Routine hardware maintenance procedures are described in Section 17, Preventive Maintenance.
Operating Disk Drives and Tape Drives
Refer to the documentation shipped with the drive.
Responding to Spooler Problems
Refer to the Spooler Utilities Reference Manual.
Updating Firmware
Refer to the H06.xx Software Installation and Upgrade Guide.
Determining the Cause of a Problem: A Systematic Approach
Continuous availability of your NonStop system is important to system users, and your problem-solving processes can help make such availability a reality. To determine the cause of a problem on your system, start by trying the easiest, least expensive possibilities. Move to more complex, expensive possibilities only if the easier solutions fail.
This subsection presents an approach you can use in your operations environment to:
Determine the possible causes of problems
Systematically fix or escalate such problems
Develop ways of preventing the same problems from recurring
The four basic steps in systematic problem solving are:
Task                                                  Page
Task 1: Get the Facts                                 1-6
Task 2: Find and Eliminate the Cause of the Problem   1-7
Task 3: Escalate the Problem If Necessary             1-8
Task 4: Prevent Future Problems                       1-9
A Problem-Solving Worksheet
Table 1-1 is a worksheet that you can use to help you through the problem-solving process. Use this worksheet to:
Get the facts about a problem
Find and eliminate the cause of the problem
Make any appropriate escalation decisions
Prevent future problems
Make copies of this worksheet and use it to collect and analyze facts regarding a problem you are experiencing. The results might not tell you exactly what is occurring, but they will narrow down the number of possible causes.
You are authorized by HP to reproduce this worksheet only for the purpose of operating your system.
Table 1-1. Problem-Solving Worksheet
Problem Facts Possible Causes
What?
Where?
When?
Magnitude?
Situation Facts Escalation Decision
Plan to Verify/Fix
Plan to Prevent and Control Damage
Task 1: Get the Facts
The first step in solving any problem is to get the facts. Although it is tempting to speculate about causes, your time is better spent in first understanding the symptoms of the problem.
Task 1a: Determine the Facts About the Problem
To get a clear, complete description of problem symptoms, ask questions to determine the facts about the problem. For example:
Category     Questions to Ask
What?        What are you having trouble with?
             What specifically is wrong?
Where?       Where did you first notice the problem?
             Where has it occurred since you first noticed it?
             Which applications, components, devices, and people are affected?
When?        When did the problem occur?
             What is the frequency of the problem?
             Has this problem occurred before this time?
Magnitude?   Is the problem quantifiable in any way? (That is, can it be measured?) For example, how many people are affected?
             Is this problem getting worse?
Task 1b: Determine the Facts About the Situation
Collect facts about the situation in which the problem arose. A clear description of the situation that led to the problem could indicate a simple solution. Examples of questions to ask are:
Who reported the problem and how can this person be contacted?
How critical is the situation?
What events led to the problem?
Has anything changed recently that might have caused the problem?
What event messages have you received?
What is the current configuration of the hardware and software products affected?
An example of information you might obtain from asking questions:
Question                                      Answer
What is happening that indicates a problem?   A terminal is hung.
Where is this problem occurring?              In the office of USER.BONNIE. The affected terminal is named $JT1.#C02.
When is this problem occurring?               At 8:30 this morning and also at the same time two days ago. Both times, this problem occurred after three unsuccessful attempts to log on.
What is the magnitude of this problem?        Intermittent; the problem seemed to disappear on its own when it first occurred two days ago.
Task 2: Find and Eliminate the Cause of the Problem
After you collect the facts, you are ready to begin considering the possible causes of a problem. Using these facts and relying on your knowledge and experience, begin to list possible causes of the problem.
Task 2a: Identify the Most Likely Cause
To evaluate the possible causes of any problem, you must compare each cause with the problem symptoms. The problem-solving worksheet gives you a guide for accomplishing this task. In the following example:
Possible causes become column headings
Entries made in the worksheet’s rows indicate whether the cause in that column could have produced the problem symptoms you listed in that row.
° Write yes in the appropriate box if that cause could explain that symptom.
° Write no in the appropriate box if a possible cause does not explain a fact.
The most likely cause is the one that best explains all the facts; that is, the cause that contains the most yes answers.
For example, possible causes of a hung terminal problem could be:
A terminal hardware problem
A stopped or suspended TACL process
System security, which locks a user out after three unsuccessful logon attempts
This worksheet lists some possible causes of a hung terminal and illustrates further how to evaluate the possible causes:
                                              Possible Causes
Problem Facts                                 Terminal hardware   TACL process   Security
What?        Terminal $JT1.#C02 is hung       Yes                 Yes            Yes
Where?       Office of USER.BONNIE            Yes                 Yes            Yes
When?        8:30 a.m. today                  Yes                 Yes            Yes
             Two days ago at 8:30 a.m.        Yes                 Yes            Yes
             After 3 failed logon attempts    No                  No             Yes
Magnitude?   Intermittent                     ?                   Yes            Yes
             Goes away on its own             ?                   Yes            Yes
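The evaluation rule described above (the most likely cause is the one with the most yes answers) can be sketched in a few lines of Python. This is an illustration only. The names ANSWERS and most_likely_cause are hypothetical, and the answers assume the example worksheet is read one column per possible cause, with a question mark meaning that the fact neither supports nor rules out the cause.

```python
# Facts from the hung-terminal example, in worksheet order.
FACTS = [
    "Terminal $JT1.#C02 is hung",
    "Office of USER.BONNIE",
    "8:30 a.m. today",
    "Two days ago at 8:30 a.m.",
    "After 3 failed logon attempts",
    "Intermittent",
    "Goes away on its own",
]

# One list of answers per possible cause, aligned with FACTS.
ANSWERS = {
    "Terminal hardware": ["Yes", "Yes", "Yes", "Yes", "No", "?", "?"],
    "TACL process": ["Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes"],
    "Security": ["Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"],
}


def most_likely_cause(answers):
    """Return the cause with the most 'Yes' answers and all the counts."""
    yes_counts = {
        cause: sum(answer == "Yes" for answer in column)
        for cause, column in answers.items()
    }
    best = max(yes_counts, key=yes_counts.get)
    return best, yes_counts


best, counts = most_likely_cause(ANSWERS)
print(counts)
print("Most likely cause:", best)
```

A count is only a starting point. A single no against a well-established fact can rule out a cause regardless of how many yes answers it collects.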
Task 2b: Fix the Most Probable Cause of the Problem
For the example in the worksheet, the most likely cause of the hung terminal is a security problem. Ask yourself what would be the fastest, least expensive, safest, and surest way of verifying that this is the most probable cause of the problem.
Once you have determined the most likely cause, try to fix it. Follow through and implement the appropriate solution. If this solution does not fix the problem, continue trying other possible solutions that are reasonable considering time, expense, and safety.
Task 3: Escalate the Problem If Necessary
If the solutions you tried in the previous tasks do not solve the problem, you might consider escalating the problem to get additional help.
Task 3a: Determine Whether You Need to Escalate the Problem
After you complete each task in the problem-solving process, you must decide whether you can continue by yourself or if you must ask for help. Ask yourself these questions:
Do I have the authority to resolve this problem?
Do I have the necessary knowledge?
Do I have the skill?
Do I have the time?
What other people need to become involved, if any?
Who needs to be informed about the problem’s status?
Task 3b: Provide Documentation
If you decide to escalate the problem, you might be required to document the problem by providing:
A problem identification number
A problem classification
A complete description and history of the problem
Diagnostic information such as copies of the event log, results of memory dumps, and so on
You might also have procedures at your site for logging problems. If you have a shift log or problem log, make timely entries in the log.
Task 4: Prevent Future Problems
Solving problems that occur with your system can be exciting because it is active and stimulating. Preventing problems is often less dramatic. But in the end, prevention is more productive than solving problems. The more work you do to prevent problems before they arise, the fewer problems will arise at potentially critical times.
These questions provide a framework for your problem-prevention efforts:
Why did this problem occur? What was the root cause? Were there any contributing causes?
How serious was the problem?
What is the likelihood that it will occur again?
Is it possible to eliminate the causes of this problem?
Is it possible to reduce the likelihood that this problem will occur in the future?
Can automation tools be used to detect and respond to preliminary symptoms of this problem?
Can anything be done now to minimize the damage that would result from a reoccurrence of this problem?
Can the problem resolution process be improved in any way?
Logging On to an Integrity NonStop Server
Many operations and troubleshooting tasks are performed by logging on to your Integrity NonStop server from a system console and using the TACL command interpreter or one of the OSM applications. For example, the TACL command interpreter allows you to access SCF, which you use to configure, control, and collect information about objects within subsystems. For examples of OSM tasks and functions, see Overview of OSM Applications on page 1-11.
System Consoles
A system console is a personal computer approved by HP to run maintenance and diagnostic software for Integrity NonStop servers. New system consoles are preconfigured with the required HP and third-party software. When upgrading to the latest RVU, software upgrades can be installed from the HP NonStop System Console Installer CD.
System consoles communicate with Integrity NonStop servers over a dedicated service LAN (local area network). System consoles configured as the primary and backup dial-out points are referred to as the primary and backup system consoles, respectively.
The OSM Low-Level Link and OSM Notification Director applications reside on the system console, along with other required HP and third-party software. OSM Service Connection and OSM Event Viewer software resides on your server, and connectivity
is established from the console through Internet Explorer browser sessions. For more information, see Launching OSM Applications on page 1-11.
Opening a TACL Window
On a system console, you must open a TACL window before you can log on to the TACL command interpreter. For information about logging on to a TACL command interpreter, see the Guardian User’s Guide.
You can use any of the following methods to open a TACL window.
Opening a TACL Window Directly From OutsideView
If you know the IP address of the NonStop server (not that of OSM), use this method:
1. Select Start>Programs>OutsideView32 7.1.
2. From the Session menu, select New. The New Session Properties dialog box appears.
3. From the New Session Properties dialog box, Session tab, click IO Properties. The TCP/IP Properties dialog box appears.
4. In the TCP/IP Properties dialog box:
a. In the Host name or IP address and port box, type the IP address, followed by a space and the port number. For example:
172.17.22.187 23
The port number is 23 for a TACL prompt and 301 for a Startup TACL prompt. In general, you should use port number 23 to perform operations tasks.
b. Click OK.
5. From the New Session Properties dialog box, click OK. A TACL window appears.
6. Log on to the TACL prompt.
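If you keep a list of server addresses, the entry typed in step 4a can be composed programmatically. The following Python sketch only builds and checks the text for the Host name or IP address and port box; it does not drive OutsideView. The function name host_field and the constant names are hypothetical, and the check assumes that only the two port numbers named in this procedure are wanted.

```python
import ipaddress

TACL_PORT = 23           # TACL prompt (recommended for operations tasks)
STARTUP_TACL_PORT = 301  # Startup TACL prompt


def host_field(ip: str, port: int = TACL_PORT) -> str:
    """Return 'IP-address port' in the form shown in step 4a."""
    ipaddress.ip_address(ip)  # raises ValueError for a malformed address
    if port not in (TACL_PORT, STARTUP_TACL_PORT):
        raise ValueError(f"unexpected port number: {port}")
    return f"{ip} {port}"


print(host_field("172.17.22.187"))
print(host_field("172.17.22.187", STARTUP_TACL_PORT))
```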
Opening a TACL Window From the Low-Level Link
You can also open a TACL window from the OSM Low-Level Link application as described in the Troubleshooting section in Opening Startup Event Stream and Startup TACL Windows on page 15-22.
For more details on the functions of the TACL command interpreter, see Appendix B, Tools and Utilities for Operations.
Overview of OSM Applications
HP NonStop Open System Management (OSM) applications perform a variety of functions, such as:
The OSM Low-Level Link Application is primarily used for down-system support, such as system startup (two startup event stream windows and two startup TACL windows are automatically launched on the system console configured to receive them; see page 15-6), Recovery Operations for Processors on page 9-9, and configuring IOAM, VIO, and P-switch modules (see the NonStop NSxxxx Hardware Installation Manual for your Integrity NonStop NS16000, NS14000, or NS1000 server).
The OSM Service Connection is used to monitor, inventory, and perform actions on system and ServerNet Cluster components. See Using OSM to Monitor the System on page 3-7 for an overview of how the OSM Service Connection is used to monitor your system components.
The OSM Event Viewer is used for Section 4, Monitoring EMS Event Messages.
The OSM Notification Director is used for Monitoring Problem Incident Reports on page 3-12 and dialing out information to your service provider.
Launching OSM Applications
Several operations tasks in this guide require you to log on to one of the OSM applications. Assuming that all OSM client components have been installed on the system console, launch the desired application as described below, then see the online help (or default home page, for the browser-based OSM applications) for log-on instructions.
To launch OSM applications, select Start>Programs>HP OSM. Then select the name of the application to launch:
OSM Service Connection
OSM Low-Level Link Application
OSM Notification Director>Start/Stop
OSM Event Viewer
OSM System Inventory Tool
The OSM Service Connection and the OSM Event Viewer are browser-based applications. Assuming that the OSM Console Tools component has been installed on the system console, the Start menu shortcuts launch a default web page for these two applications. From that page, you can select the system of your choice from the list of bookmarks displayed in the left column of the page (available bookmarks include those that were user-created during previous sessions and those converted automatically from an existing OSM system list). If no bookmarks are available, the web page also contains instructions on how to access these applications by entering a system URL as an Internet Explorer address. The system console-based OSM Console Tools component is not required to use the OSM Service Connection and the OSM Event
Viewer applications; it merely installs the Start menu shortcuts and default home pages to make accessing these applications easier. You can also simply open a new Internet Explorer browser window and enter the URL of the system you wish to access.
For more information on configuring, accessing, or using OSM applications, see:
OSM Migration and Configuration Guide
OSM Service Connection User’s Guide
Online help within the OSM Service Connection, Low-Level Link, Notification Director, and Event Viewer applications
Service Procedures
OSM offers a variety of guided procedures, interactive actions, and documented service procedures to automate or assist with system serviceability. They are launched by actions within the OSM Service Connection, and include online help.
For a list of service procedures (and their help files), both those incorporated into OSM and others that are not part of OSM, refer to the Support and Service Library.
Support and Service Library
These NTL Support and Service library categories provide procedures, part numbers, troubleshooting tips, and tools for servicing NonStop S-series and Integrity NonStop NS-series systems:
Hardware Service and Maintenance Publications
Service Information
Service Procedures
Tools and Download Files
Troubleshooting Tips
Within these categories, where applicable, content might be further categorized according to server or enclosure type.
Authorized service providers can also order the NTL Support and Service Library CD:
Channel Partners and Authorized Service Providers: Order the CD from the SDRC at https://scout.nonstop.compaq.com/SDRC/ce.htm.
HP employees: Subscribe at World on a Workbench (WOW). Subscribers automatically receive CD updates. Access the WOW order form at http://hps.knowledgemanagement.hp.com/wow/order.asp.
2
Determining Your System Configuration
When to Use This Section on page 2-1
Modular Hardware Components on page 2-2
  Differences Between Integrity NonStop NS-Series Systems on page 2-2
  Terms Used to Describe System Hardware Components on page 2-4
Recording Your System Configuration on page 2-4
Using SCF to Determine Your System Configuration on page 2-5
  SCF System Naming Conventions on page 2-5
  SCF Configuration Files on page 2-5
  Using SCF to Display Subsystem Configuration Information on page 2-6
  Displaying SCF Configuration Information for Subsystems on page 2-9
  Additional Subsystems Controlled by SCF on page 2-13
  Displaying Configuration Information—SCF Examples on page 2-15
When to Use This Section
This section describes the system enclosures, the system organization, numbering and labeling, and how to identify components in an Integrity NonStop NS-series server. For detailed information on system hardware organization, refer to the NonStop NSxxxx Planning Guide for your Integrity NonStop NS16000, NS14000, or NS1000 server.
Modular Hardware Components
Hardware for Integrity NonStop systems is implemented in modules or enclosures that are installed in modular cabinets. The servers include these hardware components:
Modular Cabinet with Power Distribution Unit (PDU)
NonStop Blade Complex
NonStop Blade Element
Logical Synchronization Unit (LSU) (in Integrity NonStop NS16000 and NS14000 systems only; Integrity NonStop NS1000 systems have no LSUs)
Processor Switch, or p-switch (in Integrity NonStop NS16000 systems only; Integrity NonStop NS14000 and NS1000 systems have no processor switches)
I/O Adapter Module (IOAM) Enclosure, including subcomponent I/O Adapters:
Fibre Channel ServerNet adapter (FCSA)
Gigabit Ethernet 4-port ServerNet adapter (G4SA)
4-Port ServerNet Extenders (4PSEs) (Integrity NonStop NS14000 and NS1000 systems only)
VIO Enclosure (displayed by OSM as a VIO Module object). For more information, see Integrity NonStop NS14000 Systems, Integrity NonStop NS1000 Systems, or the Versatile I/O (VIO) Manual.
Fibre Channel disk module (FCDM)
Maintenance Switch (Ethernet)
UPS and ERM
NonStop System Console (to manage the system)
Cable Management Devices
Enterprise Storage System (ESS)
Differences Between Integrity NonStop NS-Series Systems
NonStop System Architectures
Integrity NonStop NS-series systems offer a variety of architecture and configuration options to suit different customer needs. Integrity NonStop NS16000 and Integrity NonStop NS14000 systems take advantage of NonStop advanced architecture (NSAA). For more information, see the NonStop NS16000 Planning Guide or NonStop NS14000 Planning Guide. Integrity NonStop NS1000 systems employ the NonStop value architecture (NSVA). For more information, see the NonStop NS1000 Planning Guide.
Integrity NonStop NS16000 Systems
In Integrity NonStop NS16000 systems, IOAM enclosures connect through ServerNet links to the processors via the processor switches. One IOAM enclosure provides ServerNet connectivity for up to 10 ServerNet I/O adapters on each of the two ServerNet fabrics. FCSAs and G4SAs can be installed in an IOAM enclosure for
communications to storage devices and subsystems as well as to LANs. Additional IOAM enclosures can be added to increase connectivity and storage resources.
Integrity NonStop NS16000 systems connect to NonStop S-series I/O enclosures by using fiber-optic ServerNet links to connect the p-switches of the Integrity NonStop system to IOMF2 CRUs in the I/O enclosures.
Integrity NonStop NS14000 Systems
In Integrity NonStop NS14000 systems, there are no p-switches. There are now two types of NS14000 systems:
A NonStop NS14000 system consisting of a single IOAM enclosure, with an I/O adapter module on each ServerNet fabric. Processor connections are made through ports on 4-Port ServerNet Extenders (4PSEs), located in slot 1 and optionally slot 2 of each I/O adapter module, to the processors via the LSUs. The IOAM enclosure provides ServerNet connectivity for up to 8 ServerNet I/O adapters on each of the two ServerNet fabrics (FCSAs and G4SAs can be installed in slots 2 through 5 of the two IOAMs in the IOAM enclosure for communications to storage devices and subsystems as well as to LANs).
A NonStop NS14000 system consisting of two VIO enclosures, one on each ServerNet fabric — processor connections for processors 0-3 are made through ports 1-4 of the VIO Logic Board in slot 14 of each VIO enclosure, via the LSUs. An optional Optical Extender PIC in slot 2 provides for additional processor connectivity (processors 4-7). VIO enclosures have embedded ports and allow for optional expansion ports to supply the equivalent functionality provided by FCSAs and G4SAs in NS14000 systems with IOAMs.
Integrity NonStop NS14000 systems do not support connections to additional IOAM enclosures or NonStop S-series I/O enclosures.
For more information on Integrity NonStop NS14000 systems, see the Versatile I/O (VIO) Manual, the NonStop NS14000 Planning Guide, or the NonStop NS14000 Hardware Installation Manual.
Integrity NonStop NS1000 Systems
Integrity NonStop NS1000 systems have no processor switches or LSUs. Like Integrity NonStop NS14000 systems, there are now two types: those consisting of a single IOAM enclosure (two IOAMs) and those consisting of one VIO enclosure for each fabric. ServerNet connectivity for each type is accomplished as described for the Integrity NonStop NS14000 Systems, except for the absence of the LSUs.
Integrity NonStop NS1000 systems do not support connections to NonStop S-series I/O enclosures. Besides the architectural differences, Integrity NonStop NS1000 systems also utilize different NonStop Blade Elements than Integrity NonStop NS16000 or NS14000 systems. For more information on Integrity NonStop NS1000 systems, refer to the NonStop NS1000 Planning Guide and the NonStop NS1000 Hardware Installation Manual.
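The differences described in this subsection can be collected into a small lookup table, which is convenient when a script or checklist must branch on the system type. The following Python sketch records only what this subsection states. The names SystemTraits, SYSTEMS, and systems_with are hypothetical and do not correspond to any HP-supplied software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemTraits:
    architecture: str           # NSAA or NSVA
    has_p_switches: bool        # processor switches present
    has_lsus: bool              # logical synchronization units present
    supports_s_series_io: bool  # can connect to NonStop S-series I/O enclosures


SYSTEMS = {
    "NS16000": SystemTraits("NSAA", True, True, True),
    "NS14000": SystemTraits("NSAA", False, True, False),
    "NS1000": SystemTraits("NSVA", False, False, False),
}


def systems_with(trait: str) -> list[str]:
    """List the system types for which the named boolean trait is true."""
    return [name for name, traits in SYSTEMS.items() if getattr(traits, trait)]


print("Systems with LSUs:", systems_with("has_lsus"))
print("Systems with p-switches:", systems_with("has_p_switches"))
```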
Terms Used to Describe System Hardware Components
The terms used to describe system hardware components vary. These terms include:
Device
System resource or object
Device
A device can be a physical device or a logical device. A physical device is a physical component of a computer system that is used to communicate with the outside world or to acquire or store data. A logical device is a process used to conduct input or output with a physical device.
System Resource or Object
The term “system resource” is used in OSM documentation to refer to server components that OSM software displays, monitors, and often controls. The term “object” is often used when referring to a specific resource, such as “the Disk object.” All system resources are displayed in hierarchical form in the tree pane of the OSM Service Connection; many are also displayed in Physical or Inventory views of the view pane. The effect of selecting an object in either pane is the same: for example, you can view attributes for the selected system resource in the Attributes tab, view alarms for that resource (if any exist) in the Alarms tab, or right-click on the resource object and select Actions, to display the Actions dialog box (from which you can select and perform actions on the selected system resource). Besides physical hardware components, such as IOAM enclosures, power supplies, ServerNet adapters, and disk and tape drives, system resources also include logical entities that OSM supports, such as logical processors, ServerNet fabrics, and LIFs (logical interfaces).
Recording Your System Configuration
As a system operator, you need to understand how your system is configured so you can confirm that the hardware and system software are operating normally. If problems do occur, knowing your configuration allows you to pinpoint problems more easily. If your system configuration is corrupted, documentation about your configuration is essential for recovery. You should be familiar with the system organization, system configuration, and naming conventions.
Several methods are available for researching and recording your system configuration:
Maintaining records in hard-copy format
Using the OSM Service Connection to inventory your system
In the OSM Service Connection tree pane, select the System object. From the View pane drop-down menu, select Inventory to display a list of the system’s hardware resources. Click Save to save this list to a Microsoft Excel file.
Using SCF to list objects and devices and to display subsystem configuration information
For information on forms available that can help you record your system configuration, refer to the NonStop NSxxxx Planning Guide for your Integrity NonStop NS16000, NS14000, or NS1000 server.
Using SCF to Determine Your System Configuration
SCF is one of the most important tools available to you as a system operator. SCF commands configure and control the objects (lines, controllers, processes, and so on) belonging to each subsystem running on the Integrity NonStop NS-series server. You also use SCF to display information about subsystems and their objects.
SCF accepts commands from a workstation, a disk file, or an application process. It sends display output to a workstation, a file, a process, or a printer. Some SCF commands are available only to some subsystems. An overall SCF reference is the SCF Reference Manual for H-Series RVUs. Subsystem-specific information appears in a separate manual for each subsystem. For a partial list of these manuals, refer to Appendix C, Related Reading.
More details about the functions of SCF appear in Subsystem Control Facility (SCF) on page B-4.
SCF System Naming Conventions
SCF object names usually follow a consistent set of naming conventions defined for each installation. HP preconfigures some of the naming conventions to create the logical device names for many SCF objects.
System planning and configuration staff at your site likely will change or expand on the preconfigured file-naming conventions that HP provides, typically by establishing naming conventions for configuring such objects as storage devices, communication processes, and adapters. These conventions should simplify your monitoring tasks by making process or object functions intuitively obvious to someone looking at the object name. For example, in your environment, tape drives might be named $TAPEn, where n is a sequential number.
The SCF Reference Manual for H-Series RVUs lists naming conventions for SCF objects, as well as HP reserved names that cannot be changed or used for other objects or processes in your environment.
SCF Configuration Files
Your system is delivered with a standard set of configuration files:
The $SYSTEM.SYSnn.CONFBASE file contains the minimal configuration required to load the system.
The $SYSTEM.ZSYSCONF.CONFIG file contains a standard system configuration created by HP. This basic configuration includes such objects as disk drives, tape
drives, ServerNet adapters, the local area network (LAN) and wide area network (WAN) subsystem manager processes, the OSM server processes, and so on. You typically use this file to load the system.
The $SYSTEM.ZSYSCONF.CONFIG file is also saved on your system as the ZSYSCONF.CONF0000 file.
All subsequent changes to the system configuration are made using SCF. The system saves configuration changes on an ongoing basis in the ZSYSCONF.CONFIG file. You have the option to save a stable copy of your configuration at any time in ZSYSCONF.CONFxxyy using the SCF SAVE command. For example:
-> SAVE CONFIGURATION 01.02
You can save multiple system configurations by numbering them sequentially based on a meaningful convention that reflects, for example, different hardware configurations. Each time you load the system from CONFBASE or CONFxxyy, the system automatically saves in a file called ZSYSCONF.CONFSAVE a copy of the configuration file used for the system load.
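The relationship between the SAVE command argument and the saved file name can be sketched in Python. This is a minimal illustration only: it assumes that an argument of the form xx.yy corresponds to the file ZSYSCONF.CONFxxyy, as the SAVE CONFIGURATION 01.02 example suggests, and the function names are hypothetical helpers, not part of SCF.

```python
import re


def save_command(xx: int, yy: int) -> str:
    """Build the SCF command text that saves configuration xx.yy."""
    if not (0 <= xx <= 99 and 0 <= yy <= 99):
        raise ValueError("xx and yy must each be in the range 0-99")
    return f"SAVE CONFIGURATION {xx:02d}.{yy:02d}"


def conf_file_name(version: str) -> str:
    """Map an 'xx.yy' version string to the assumed ZSYSCONF.CONFxxyy name."""
    match = re.fullmatch(r"(\d{2})\.(\d{2})", version)
    if match is None:
        raise ValueError(f"expected a version of the form xx.yy, got {version!r}")
    return "ZSYSCONF.CONF" + match.group(1) + match.group(2)


if __name__ == "__main__":
    command = save_command(1, 2)
    print(command)
    # The last word of the command is the xx.yy version string.
    print(conf_file_name(command.split()[-1]))
```

A helper like this might be used in a site script that records which saved configuration corresponds to which hardware layout.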
For guidelines on how to recover if your system configuration files are corrupted, refer to Troubleshooting and Recovery Operations on page 15-18.
For certain SCF subsystems, configuration changes are persistent. The changes persist through processor and system loads unless you load the system with a different configuration file. Examples of these subsystems are the Kernel, ServerNet LAN Systems Access (SLSA), the storage subsystem, and WAN. For other SCF subsystems, the changes are not persistent. You must reimplement them after a system or processor load. Examples of these subsystems are General Device Support (GDS), Open System Services (OSS), and SQL communication subsystem (SCS).
Using SCF to Display Subsystem Configuration Information
SCF enables you to display, in varying levels of detail, the configuration of objects in each subsystem supported by SCF. For example, you can use the LISTDEV command to list all the devices on your system or to list the objects within a given subsystem. Then you can use the INFO command with a logical device name or device type to obtain information about a specific device or class of devices.
Another useful command when displaying information is the ASSUME command. Use the ASSUME command to define a current default object and fully qualified object name. Then you can use INFO to display information just for that object. For example, if you type this command and then enter the INFO command without specifying an object, SCF displays only the information for the workstation called $L1.#TERM1:
> SCF ASSUME WS $L1.#TERM1
SCF LISTDEV: Listing the Devices on Your System
To obtain listings for most devices and processes that have a device type known to SCF, at a TACL prompt type:
> SCF LISTDEV
In the example shown in Example 2-1, the SCF LISTDEV command lists all the physical and logical devices on the system.
Example 2-1. SCF LISTDEV Command Output
$SYSTEM STARTUP 1> SCF LISTDEV
LDev  Name     PPID    BPID    Type     RSize  Pri  Program
   0  $0       0,3     1,3     ( 1,0 )    102  201  \DRP14.$SYSTEM.SYS00.OPCOLL
   1  $NCP     2,6     0,0     (62,0 )      3  199  \DRP14.$SYSTEM.SYS00.NCPOBJ
   3  $YMIOP   0,5     1,5     ( 6,4 )     80  205  \DRP14.$SYSTEM.SYS00.TMIOP
   5  $Z0      0,7     1,7     ( 1,2 )    102  200  \DRP14.$SYSTEM.SYS00.OCDIST
   6  $SYSTEM  0,257   1,257   ( 3,45)   4096  220  \DRP14.$SYSTEM.SYS00.TSYSDP2
   7  $ZOPR    0,8     1,8     ( 1,0 )    102  201  \DRP14.$SYSTEM.SYS00.OAUX
  63  $ZZKRN   0,294   1,328   (66,0 )   4096  180  \DRP14.$SYSTEM.SYS00.OZKRN
  64  $ZZWAN   0,291   1,298   (50,3 )    132  180  \DRP14.$SYSTEM.SYS00.WANMGR
  65  $ZZSTO   0,292   1,329   (65,0 )   4096  180  \DRP14.$SYSTEM.SYS00.TZSTO
  66  $ZZSMN   1,289   2,282   (64,1 )    132  199  \DRP14.$SYSTEM.SYS00.SANMAN
  67  $ZZSCL   1,290   2,277   (64,0 )    132  199  \DRP14.$SYSTEM.SYS00.SNETMON
  68  $ZZLAN   0,293   1,297   (43,0 )    132  199  \DRP14.$SYSTEM.SYS00.LANMAN
  86  $ZSNET   0,294   1,328   (66,0 )   4096  180  \DRP14.$SYSTEM.SYS00.OZKRN
  87  $ZSLM2   0,288   1,293   (67,0 )   1024  221  \DRP14.$SYSTEM.SYS00.TZSLM2
  91  $ZNET    0,14    1,13    (50,63)   3900  175  \DRP14.$SYSTEM.SYS00.SCP
 104  $ZM03    3,279   0,0     (45,0 )    132  201  \DRP14.$SYSTEM.SYS00.QIOMON
 105  $ZM02    2,280   0,0     (45,0 )    132  201  \DRP14.$SYSTEM.SYS00.QIOMON
 106  $ZM01    1,280   0,0     (45,0 )    132  201  \DRP14.$SYSTEM.SYS00.QIOMON
 107  $ZM00    0,290   0,0     (45,0 )    132  201  \DRP14.$SYSTEM.SYS00.QIOMON
 108  $ZLOG    0,307   1,345   ( 1,0 )   4024  150  \DRP14.$SYSTEM.SYS00.EMSACOLL
 121  $ZIM03   3,280   0,0     (64,2 )    132  199  \DRP14.$SYSTEM.SYS00.MSGMON
 122  $ZIM02   2,285   0,0     (64,2 )    132  199  \DRP14.$SYSTEM.SYS00.MSGMON
 123  $ZIM01   1,291   0,0     (64,2 )    132  199  \DRP14.$SYSTEM.SYS00.MSGMON
 124  $ZIM00   0,305   0,0     (64,2 )    132  199  \DRP14.$SYSTEM.SYS00.MSGMON
 126  $ZEXP    0,13    1,18    (63,30)    132  150  \DRP14.$SYSTEM.SYS00.OZEXP
 128  $SC26    2,281   3,285   (63,4 )      1  199  \DRP14.$SYSTEM.SYS00.LHOBJ
 129  $SC25    2,283   3,286   (63,4 )      1  199  \DRP14.$SYSTEM.SYS00.LHOBJ
 131  $DATA6   0,296   1,287   ( 3,42)   4096  220  \DRP14.$SYSTEM.SYS00.TSYSDP2
 132  $DATA5   0,297   1,286   ( 3,42)   4096  220  \DRP14.$SYSTEM.SYS00.TSYSDP2
 133  $DATA4   0,298   1,285   ( 3,44)   4096  220  \DRP14.$SYSTEM.SYS00.TSYSDP2
 134  $DATA3   0,299   1,284   ( 3,42)   4096  220  \DRP14.$SYSTEM.SYS00.TSYSDP2
 135  $DATA2   0,300   1,283   ( 3,42)   4096  220  \DRP14.$SYSTEM.SYS00.TSYSDP2
 136  $DATA1   0,301   1,282   ( 3,44)   4096  220  \DRP14.$SYSTEM.SYS00.TSYSDP2
 137  $DATA    0,302   1,281   ( 3,44)   4096  220  \DRP14.$SYSTEM.SYS00.TSYSDP2
 145  $ZOLHD   0,369   1,359   ( 1,30)    132  150  \DRP14.$SYSTEM.SYS00.EMSDIST
 167  $ZTC0    0,338   1,332   (48,0 )  32000  200  \DRP14.$SYSTEM.SYS00.TCPIP
 168  $ZTNT    0,340   1,334   (46,0 )   6144  149  \DRP14.$SYSTEM.SYS00.TELSERV
 200  $ZPMON   0,375   0,0     (24,0 )   4096  180  \DRP14.$SYSTEM.SYS00.OSSMON
The columns in Example 2-1 mean:
LDev     The logical device number
Name     The logical device name
PPID     The primary processor number and process identification number (PIN) of the specified device
BPID     The backup processor number and PIN of the specified device
Type     The device type and subtype
RSize    The record size the device is configured for
Pri      The priority level of the I/O process
Program  The fully qualified name of the program file for the process
Table 2-1 gives the names of some subsystems that are common to most Integrity NonStop NS-series systems and are routinely monitored by operations. These subsystems appear in the LISTDEV output in Example 2-1 on page 2-7.
Also, in Example 2-1 on page 2-7, several disk drives and tape drives have been configured. You can identify the subsystem that owns a device by looking up its device type in the SCF Reference Manual for H-Series RVUs.
Table 2-1. Key Subsystems and Their Logical Device Names and Device Types
Subsystem Name  Logical Name  Device Type      Description
TCP/IP          $ZTC0         48               Transmission Control Protocol/Internet Protocol (TCP/IP)
Kernel          $ZZKRN        66               NonStop Kernel operating system
Storage         $ZZSTO        Disk: 3          All storage devices; for example, disk and tape
                              Tape: 4
                              Open SCSI: 8
                              SMF pool: 25
                              SMF monitor: 52
                              $ZZSTO: 65
                              $ZSLM: 67
SLSA            $ZZLAN        43               All ServerNet LAN Systems Access (SLSA) connections and facilities
WAN             $ZZWAN        50               All wide area network (WAN) connections
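When LISTDEV output has been captured to a file, the columns described above can be picked apart programmatically. The following Python sketch is an illustration under stated assumptions, not an HP-supplied tool: it assumes each device row has the eight columns shown in Example 2-1, and the names Device, parse_listdev, and devices_of_type are hypothetical.

```python
import re
from typing import List, NamedTuple, Tuple


class Device(NamedTuple):
    ldev: int
    name: str
    primary: Tuple[int, int]   # (processor, PIN) from the PPID column
    backup: Tuple[int, int]    # (processor, PIN) from the BPID column
    dev_type: int
    subtype: int
    rsize: int
    priority: int
    program: str


# One LISTDEV row: LDev Name PPID BPID (Type,Subtype) RSize Pri Program
ROW = re.compile(
    r"^\s*(\d+)\s+(\$\S+)\s+(\d+),(\d+)\s+(\d+),(\d+)\s+"
    r"\(\s*(\d+)\s*,\s*(\d+)\s*\)\s+(\d+)\s+(\d+)\s+(\S+)\s*$"
)


def parse_listdev(text: str) -> List[Device]:
    """Return one Device per row that matches the assumed column layout."""
    devices = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip headings, prompts, and blank lines
        n = [int(m.group(i)) for i in (1, 3, 4, 5, 6, 7, 8, 9, 10)]
        devices.append(Device(n[0], m.group(2), (n[1], n[2]), (n[3], n[4]),
                              n[5], n[6], n[7], n[8], m.group(11)))
    return devices


def devices_of_type(devices: List[Device], dev_type: int) -> List[Device]:
    """Select devices by device type, for example 3 for disks."""
    return [d for d in devices if d.dev_type == dev_type]


SAMPLE = r"""
LDev  Name     PPID    BPID    Type     RSize  Pri  Program
   6  $SYSTEM  0,257   1,257   ( 3,45)   4096  220  \DRP14.$SYSTEM.SYS00.TSYSDP2
  63  $ZZKRN   0,294   1,328   (66,0 )   4096  180  \DRP14.$SYSTEM.SYS00.OZKRN
 131  $DATA6   0,296   1,287   ( 3,42)   4096  220  \DRP14.$SYSTEM.SYS00.TSYSDP2
"""

if __name__ == "__main__":
    for disk in devices_of_type(parse_listdev(SAMPLE), 3):
        print(disk.name, disk.primary, disk.backup)
```

Filtering on the device type alone is a simplification; some subsystems share a device type and are distinguished by subtype, so a fuller version would compare both fields.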
To display information about devices of a particular type:
> SCF LISTDEV TYPE n
where n is a number for the device type. For example, device type 3 identifies disks. For the \MS9 system, entering LISTDEV TYPE 3 would display information for $DATA6, $DATA5, $DATA4, $DATA3, $DATA2, $DATA1, and $DATA.
To display information for a given subsystem:
> SCF LISTDEV subsysname
where subsysname is the logical name of a subsystem; for example, $ZZKRN for the Kernel subsystem.
Displaying SCF Configuration Information for Subsystems
The following tables give some of the SCF commands that display configuration information for objects controlled by subsystems that are common to most Integrity NonStop NS-series systems. The examples use the SCF ASSUME command to make a given subsystem the current default object for gathering information.
TCP/IP Subsystem
These examples are based on a TCP/IP process named $ZTC0. Before using the commands listed in Table 2-2, type this command to make the TCP/IP subsystem the default object:
> SCF ASSUME PROCESS $ZTC0
Integrity NonStop servers support two versions of TCP/IP: NonStop TCP/IPv6 and NonStop TCP/IP. When you use the SCF LISTDEV and INFO commands, all current TCP/IP processes are displayed. For more information, refer to the TCP/IPv6 Configuration and Management Manual and the TCP/IP Configuration and Management Manual.
Table 2-2. Displaying Information for the TCP/IP Subsystem ($ZTC0)
To Display Information About These Configured Objects      Enter This Command
All TCP/IP devices                                          LISTDEV TCPIP
Detailed information about the TCP/IP subsystem manager     INFO, DETAIL
All SUBNET names                                            INFO SUBNET *
All ROUTE names                                             INFO ROUTE *
Kernel Subsystem
Before using commands listed in Table 2-3, type this command to make the Kernel subsystem the default object:
> SCF ASSUME PROCESS $ZZKRN
Generic processes are part of the SCF Kernel subsystem. Generic processes can be created by the operating system or by a user. Examples of generic processes created by the operating system are the Kernel, SLSA, the storage subsystem, and WAN subsystem manager processes. Examples of generic processes created by a user are a Pathway program, a third-party program, or a user-written program that you configure to be controlled by the operating system. The $ZPM persistence manager starts and monitors all generic processes.
Table 2-3. Displaying Information for the Kernel Subsystem ($ZZKRN)
To Display Information About These Configured Objects       Enter This Command
The Kernel subsystem manager and ServerNet process names    LISTDEV KERNEL
All Kernel subsystem object and process names               NAMES $ZZKRN
All generic processes                                       INFO *
Detailed information about a generic process                INFO #generic-process, DETAIL
Storage Subsystem
The storage subsystem manages disk and tape drives as well as SCSI and HP NonStop Storage Management Foundation (SMF) devices. Use the commands listed in Table 2-4 to display desired information.
Table 2-4. Displaying Information for the Storage Subsystem ($ZZSTO)
To Display Information About These Configured Objects       Enter This Command
All disk and tape drives (list)                             LISTDEV STORAGE
All storage subsystem objects and processes (by name)       NAMES $ZZSTO
All disk drives (list)                                      LISTDEV TYPE 3
All disk drives (summary information)                       INFO DISK $*
A specific disk drive (detailed information)                INFO DISK $name, DETAIL
All tape drives (list)                                      LISTDEV TYPE 4
All tape drives (summary information)                       INFO TAPE $*
A specific tape drive (detailed information)                INFO TAPE $name, DETAIL
When displaying configuration files for disk and tape devices in the storage subsystem, you can use the OBEYFORM option with the INFO command to display currently defined attribute values in the format that you would use to set up a configuration file. Each attribute appears as a syntactically correct configuration command.
For example, this command shows all the attributes for $SYSTEM in OBEYFORM:
-> INFO DISK $SYSTEM,OBEYFORM
This output appears as shown in Example 2-2.
You can create a command file containing the output by using the OUT option of the INFO command. For details, see the SCF Reference Manual for the Storage Subsystem.
To get detailed configuration information in command format for all disks on the system, issue this command:
-> INFO DISK $*,OBEYFORM
Example 2-2. SCF ADD DISK Command Output
ADD DISK $SYSTEM , &
    SENDTO STORAGE , &
    BACKUPCPU 1, &
    HIGHPIN ON , &
    PRIMARYCPU 0, &
    PROGRAM $SYSTEM.SYSTEM.TSYSDP2 , &
    STARTSTATE STARTED, &
    PRIMARYLOCATION (11,1,11) , &
    PRIMARYSAC IOMF.SAC-2.GRP-11.MOD-1.SLOT-50, &
    MIRRORLOCATION (11,1,12) , &
    MIRRORSAC IOMF.SAC-1.GRP-11.MOD-1.SLOT-55, &
    AUDITTRAILBUFFER 0 , &
    AUTOREVIVE OFF, &
    AUTOSTART ON, &
    CBPOOLLEN 1000 , &
    FSTCACHING OFF , &
    FULLCHECKPOINTS ENABLED , &
    HALTONERROR 1, &
    LKIDLONGPOOLLEN 8 , &
    LKTABLESPACELEN 15 , &
    MAXLOCKSPEROCB 5000 , &
    MAXLOCKSPERTCB 5000 , &
    NONAUDITEDINSERT OFF , &
    NUMDISKPROCESSES 4, &
    OSSCACHING ON , &
    PROTECTDIRECTORY SERIAL , &
    REVIVEBLOCKS 10 , &
    REVIVEINTERVAL 100 , &
    REVIVEPRIORITY 0 , &
    REVIVERATE 0 , &
    SERIALWRITES ENABLED
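Because OBEYFORM output is a regular list of attribute and value pairs, a saved copy can be read back for comparison or record keeping. The Python sketch below is one possible approach, assuming the layout shown in Example 2-2: a trailing ampersand continues the command, and commas outside parentheses separate attributes. The function names are hypothetical.

```python
from typing import Dict, List


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [part for part in parts if part]


def parse_obeyform(command: str) -> Dict[str, object]:
    """Turn one OBEYFORM-style ADD command into a dictionary."""
    lines = [line.strip() for line in command.splitlines()]
    # Drop the trailing "&" continuation marker and join the pieces.
    joined = " ".join(
        line[:-1].rstrip() if line.endswith("&") else line for line in lines
    )
    head, *attrs = split_top_level(joined)
    verb, object_type, name = head.split()
    attributes = {}
    for attr in attrs:
        key, _, value = attr.partition(" ")
        attributes[key] = value.strip()
    return {"verb": verb, "object_type": object_type,
            "name": name, "attributes": attributes}


SAMPLE = """
ADD DISK $SYSTEM , &
    SENDTO STORAGE , &
    BACKUPCPU 1, &
    PRIMARYLOCATION (11,1,11) , &
    AUTOSTART ON
"""

if __name__ == "__main__":
    disk = parse_obeyform(SAMPLE)
    print(disk["name"], disk["attributes"]["PRIMARYLOCATION"])
```

Values are kept as strings on purpose, so that the text can be written back out unchanged if the file is later used as a command file.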
To get detailed configuration information in command format for all tape drives on the system, issue this command:
-> INFO TAPE $*,OBEYFORM
ServerNet LAN Systems Access (SLSA) Subsystem
Before using commands listed in Table 2-5, type this command to make the SLSA subsystem the default object:
> SCF ASSUME PROCESS $ZZLAN
The SLSA subsystem provides access to parallel LAN and WAN I/O for Integrity NonStop servers. The SLSA subsystem provides access to Ethernet, token-ring, and multifunction I/O board Ethernet adapters and to the ServerNet wide area network (SWAN) concentrator.
When displaying configuration files for adapter and LIF devices in the SLSA subsystem, you can use the OBEYFORM option with the INFO command to display currently defined attribute values in the format that you would use to set up a configuration file. Each attribute appears as a syntactically correct system configuration command. For example:
ADD ADAPTER $ZZLAN.E0154, &
    LOCATION (1 , 1 , 54 ) , &
    TYPE G4SA, &
    ACCESSLIST (0, 1)
Table 2-5. Displaying Information for the SLSA Subsystem ($ZZLAN)
To Display Information About These Configured Objects                    Enter This Command
The SLSA subsystem manager                                               LISTDEV SLSA
All SLSA subsystem object and process names                              NAMES $ZZLAN
All configured adapters, with group/module/slot and adapter type         INFO ADAPTER *
A specific adapter                                                       INFO ADAPTER adapter, DETAIL
All logical interface (LIF) names, with associated MAC addresses,        INFO LIF *
  associated physical interface (PIF) names, and port types
A specific LIF                                                           INFO LIF lifname, DETAIL
A specific PIF                                                           INFO PIF pifname, DETAIL
All ServerNet addressable controller (SAC) names                         INFO SAC *
A specific SAC                                                           INFO SAC sacname.n, DETAIL
Examples of the INFO command used with the OBEYFORM option are:
-> INFO ADAPTER $*, OBEYFORM
-> INFO LIF $*, OBEYFORM
WAN Subsystem
Before using commands listed in Table 2-6, type this command to make the wide area network (WAN) subsystem the default object:
> SCF ASSUME PROCESS $ZZWAN
The WAN subsystem has responsibility for all WAN connections.
Table 2-6. Displaying Information for the WAN Subsystem ($ZZWAN)
To Display Information About These Configured Objects                            Enter This Command
The WAN subsystem manager                                                        LISTDEV WAN
All WAN configuration managers, TCP/IP processes, and WANBoot processes          INFO *
All PATH names                                                                   INFO PATH *
The WAN adapters                                                                 INFO ADAPTER *
All DEVICE objects                                                               INFO DEVICE *
All PROFILE objects                                                              INFO PROFILE *
Additional Subsystems Controlled by SCF
Table 2-7 lists the names associated with additional subsystems that can be controlled by SCF, along with their device types. You can use SCF commands to display the current attribute values for these objects.
Some SCF commands are available only to some subsystems. The objects that each command affects and the attributes of those objects are subsystem specific. This subsystem-specific information is presented in a separate manual for each subsystem. A partial list of these manuals appears in Table 6-1 on page 6-13.
Refer to the SCF Reference Manual for H-Series RVUs for further information.
Table 2-7. Subsystem Objects Controlled by SCF
Subsystem Acronym  Description                                                                 Device Type  Device Subtype
AM3270             AM3270 Access Method                                                        60           0 or 10
ATM                Asynchronous Transfer Mode (ATM) protocol                                   42           0 or 1
ATP6100            Asynchronous Terminal Process 6100                                          53           0
CP6100             Communications Process Subsystem                                            51           0
Envoy              Byte-synchronous and asynchronous communications data link-level interface  7            0
EnvoyACP/XF        Byte-synchronous communications data link-level interface                   11           40, 41, 42, 43
Expand             Expand network control process ($NCP) or line-handler process               62 or 63     2, 3, 5, or 6
GDS                General Device Support                                                      57
OSIAPLMG           Open Systems Interconnection/Application Manager                            55           20
OSIAS              Open Systems Interconnection/Application Services                           55           1-5
OSICMIP            Open Systems Interconnection/Common Management Information Protocol         55           24
OSIFTAM            Open Systems Interconnection/File Transfer, Access, and Management          55           21 or 25
OSIMHS             Open Systems Interconnection/Message Handling System                        55           11 or 12
OSITS              Open Systems Interconnection/Transport Services                             55           55, 4
OSS                Open System Services                                                        24           0
PAM                Port Access Method
QIO                Queued I/O product                                                          45           0
SCP                Subsystem Control Point                                                     50           63
SCS                SQL Communications Subsystem                                                38           0
SNAX/APN           SNAX Advanced Peer Networking                                               58 or 13     0
SNAX/XF            SNAX Extended Facility                                                      58 or 13
SNAXAPC            SNAX Advanced Program Communication                                         13           10
SNAXCRE            SNAX Creator-2                                                              18           0
SNAXHLS            SNAX High-Level Support                                                     13           5
SNMP               Simple Network Management Protocol agent                                    31           0
TELSERV            TCP/IP TELNET product                                                       46           0
TR3271             TR3271 Access Method                                                        60           1 or 11
X25AM              X.25 Access Method                                                          61           0
Displaying Configuration Information—SCF Examples
These examples show SCF commands that display subsystem configuration information, along with the information that is returned. These commands are not preceded by an ASSUME command.
To display all the processes running in the Kernel subsystem:
-> INFO PROC $ZZKRN.#*
The system displays a listing similar to that shown in Example 2-3:
Example 2-3. SCF INFO PROCESS Command Output
32-> INFO PROCESS $ZZKRN.#*
NONSTOP KERNEL - Info PROCESS \DRP09.$ZZKRN
Symbolic Name   *Name    *Autorestart  *Program
CLCI-TACL       $CLCI    10            $SYSTEM.SYSTEM.TACL
OSM-APPSRVR     $ZOSM    10            $SYSTEM.SYSTEM.APPSRVR
OSM-CIMOM       $ZCMOM   5             $SYSTEM.SYSTEM.CIMOM
OSM-CONFLH-RD   $ZOLHI   0             $SYSTEM.SYSTEM.TACL
OSM-OEV         $ZOEV    10            $SYSTEM.SYSTEM.EVTMGR
QATRAK          $TRAK    10            $SYSTEM.SYSTOOLS.QATRACK
QIOMON          $ZMnn    10            $SYSTEM.SYSTEM.QIOMON
ROUT            $ZLnn    10            $SYSTEM.SYSTEM.ROUT
SCP             $ZNET    10            $SYSTEM.SYSTEM.SCP
SP-EVENT        $ZSPE    5             $SYSTEM.SYSTEM.ZSPE
TFDSHLP         $ZTHnn   10            $SYSTEM.SYSTEM.TFDSHLP
ZEXP            $ZEXP    10            $SYSTEM.SYSTEM.OZEXP
ZHOME           $ZHOME   10            $SYSTEM.SYSTEM.ZHOME
ZLOG            $ZLOG    5             $SYSTEM.SYSTEM.EMSACOLL
ZSLM2           $ZSLM2   10            $SYSTEM.SYSTEM.TZSLM2
ZZKRN           $ZZKRN   10            $SYSTEM.SYSTEM.OZKRN
ZZLAN           $ZZLAN   10            $SYSTEM.SYSTEM.LANMAN
ZZSTO           $ZZSTO   10            $SYSTEM.SYSTEM.TZSTO
ZZWAN           $ZZWAN   10            $SYSTEM.SYSTEM.WANMGR
To display a list of all SAC names with their associated owners and access lists:
-> info sac $zzlan.*
The system displays a listing similar to that shown in Example 2-4:
Example 2-4. SCF INFO SAC Command Output
-> INFO SAC $ZZLAN.*
SLSA Info SAC
Name              Owner  *Access List
$ZZLAN.E4SA0.0    3      (3,2,1,0)
$ZZLAN.E4SA0.1    3      (3,2,1,0)
$ZZLAN.E4SA52.0   0      (0,1)
$ZZLAN.E4SA52.1   0      (0,1)
$ZZLAN.FESA0.0    0      (0,1,2,3,4,5,6,7)
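A saved INFO SAC listing such as Example 2-4 can be scanned to see which SACs name a given processor in their access lists. The Python sketch below is illustrative only. It assumes each row holds a SAC name, an owner number, and a parenthesized list of processor numbers, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# One row: $ZZLAN.<adapter>.<n>  <owner>  (<cpu>,<cpu>,...)
SAC_ROW = re.compile(r"^\s*(\$\S+)\s+(\d+)\s+\(([\d,\s]*)\)\s*$")


def parse_sac_rows(text: str) -> Dict[str, dict]:
    """Map each SAC name to its owner and access list."""
    sacs = {}
    for line in text.splitlines():
        m = SAC_ROW.match(line)
        if m is None:
            continue  # skip the command echo and the heading lines
        name, owner, access = m.groups()
        sacs[name] = {
            "owner": int(owner),
            "access_list": [int(n) for n in access.split(",") if n.strip()],
        }
    return sacs


def sacs_listing_cpu(sacs: Dict[str, dict], cpu: int) -> List[str]:
    """Return the SAC names whose access list includes the processor."""
    return sorted(name for name, info in sacs.items()
                  if cpu in info["access_list"])


SAMPLE = """
-> INFO SAC $ZZLAN.*
SLSA Info SAC
Name              Owner  *Access List
$ZZLAN.E4SA0.0    3      (3,2,1,0)
$ZZLAN.E4SA0.1    3      (3,2,1,0)
$ZZLAN.E4SA52.0   0      (0,1)
$ZZLAN.E4SA52.1   0      (0,1)
$ZZLAN.FESA0.0    0      (0,1,2,3,4,5,6,7)
"""

if __name__ == "__main__":
    for name in sacs_listing_cpu(parse_sac_rows(SAMPLE), 2):
        print(name)
```

A check like this could help confirm, before planned processor maintenance, which adapters list that processor.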
To display configuration attribute values for all the WAN subsystem configuration managers, TCP/IP processes, and WANBoot processes:
-> INFO PROCESS $ZZWAN.*
The system displays a listing similar to that shown in Example 2-5:
To display detailed information about an Expand line-handler process:
-> INFO LINE $line-name, DETAIL
where $line-name is the logical line-handler process name. The system displays a listing similar to Example 2-6 for Expand-Over-NAM and Expand-Over-ServerNet line-handler processes.
Example 2-5. SCF INFO PROCESS $ZZWAN Command Output
-> INFO PROCESS $ZZWAN.*
WAN MANAGER Detailed Info Process \DRP09.$ZZWAN.#ZTXAE
RecSize........... 0 *Type............. ( 0,49)
Preferred Cpu..... 0 Alternate Cpu..... 1
HOSTIP Address.... 172.031.145.090
*IOPOBJECT........ $SYSTEM.SYS00.SNMPTMUX
TCPIP Name........ $ZTC02
WAN MANAGER Detailed Info Process \DRP09.$ZZWAN.#0
RecSize........... 0 *Type............. (50,00)
Preferred Cpu..... 0 Alternate Cpu..... N/A
*IOPOBJECT........ $SYSTEM.SYS00.CONMGR
Example 2-6. SCF INFO LINE Command Output
-> INFO LINE $SC151, DETAIL
L2Protocol Net^Nam TimeFactor...... 1 *SpeedK........ NOT_SET
Framesize....... 132 -Rsize........... 1 -Speed........
*LinePriority.... 1 StartUp......... OFF Delay......... 0:00:00.10
*Rxwindow........ 7 *Timerbind... 0:01:00.00 *L2Timeout..... 0:00:01.00
*Txwindow........ 7 *Maxreconnects... 0 *AfterMaxRetries PASSIVE
*Timerreconnect 0:01:00.00 *Retryprobe...... 10 *Timerprobe.... 0:00:30.00
*Associatedev.... $ZZSCL *Associatesubdev *Timerinactivity 0:00:00.00
*ConnectType..... ACTIVEANDPASSIVE
*LineTf.......... 0
3. Overview of Monitoring and Recovery
When to Use This Section on page 3-1
Functions of Monitoring on page 3-2
Monitoring Tasks on page 3-2
Working With a Daily Checklist on page 3-2
Tools for Checking the Status of System Hardware on page 3-3
Additional Monitoring Tasks on page 3-6
Monitoring and Resolving Problems—An Approach on page 3-7
Using OSM to Monitor the System on page 3-7
Using the OSM Service Connection on page 3-7
Recovery Operations for Problems Detected by OSM on page 3-12
Monitoring Problem Incident Reports on page 3-12
Using SCF to Monitor the System on page 3-12
Determining Device States on page 3-13
Automating Routine System Monitoring on page 3-16
Using the Status LEDs to Monitor the System on page 3-20
Related Reading on page 3-22
When to Use This Section
This section provides an overview of monitoring an Integrity NonStop server using various tools. It describes some common monitoring tasks. It also refers you to other sections or manuals for more information about monitoring specific system components, events, applications, or processes.
Functions of Monitoring
You must monitor a system to ensure that it is operating properly and to recognize when corrective action is required. By monitoring a system, you can:
Verify whether components are currently up or down
Be quickly notified of error conditions, state changes, and threshold conditions that have been exceeded or are reaching their limits
View a chronological list of events that can help with problem diagnosis and resolution
Determine how much of a particular resource is being used; for example, processor capacity, disk or file space, or communications line bandwidth
Find performance problems that can affect the users of the system
Make better use of existing resources
Ensure that products such as HP NonStop SQL/MP, HP NonStop SQL/MX, HP NonStop Transaction Management Facility (TMF), and Pathway are available
Prevent many problems and outages from occurring
Monitoring Tasks
Regardless of the shift you work, certain areas of your hardware and software environment need to be checked on a regular basis. This subsection provides guidelines that will enable you to determine the general areas you should monitor.
Working With a Daily Checklist
A good method for ensuring that certain areas of your operations environment are monitored is to develop a checklist. Monitor these items on a system frequently. At least daily, monitor:
OSM Service Connection GUI
Event messages
Alarms
Problem incident reports
The status of all system components
The status of processes
The status of all applications
The performance of processors, disks, and communications lines (Monitoring performance is not discussed in this guide.)
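The daily items listed above could also be tracked electronically. The following Python sketch is one possible way to record which items have been checked on a shift; the class and field names are hypothetical and are not part of any HP tool.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List

# Items to monitor at least daily, taken from the list above.
DAILY_ITEMS = [
    "OSM Service Connection GUI",
    "Event messages",
    "Alarms",
    "Problem incident reports",
    "Status of all system components",
    "Status of processes",
    "Status of all applications",
    "Performance of processors, disks, and communications lines",
]


@dataclass
class DailyChecklist:
    operator: str
    day: date
    notes: Dict[str, str] = field(default_factory=dict)  # item -> note

    def check(self, item: str, note: str = "") -> None:
        """Mark one item as checked, with an optional note."""
        if item not in DAILY_ITEMS:
            raise KeyError(f"not a daily checklist item: {item}")
        self.notes[item] = note

    def outstanding(self) -> List[str]:
        """Return the items that have not yet been checked."""
        return [item for item in DAILY_ITEMS if item not in self.notes]


if __name__ == "__main__":
    checklist = DailyChecklist(operator="operator-1", day=date.today())
    checklist.check("Alarms", "No alarms raised")
    for item in checklist.outstanding():
        print("Still to check:", item)
```

Keeping the item list in one place makes it easy to extend the checklist with site-specific tasks.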
An example of a checklist you might use to standardize your routine daily monitoring tasks is:
Task                                                    Operator’s name   Date & time   Notes and questions
Check phone messages
Check faxes
Check e-mail
Check shift log
Check EMS event messages
Check status of terminals
Check comm. lines
Check TMF status
Check Pathway status
Check disks
Check tape drives
Check processors
Check printers
Check spooler supervisor and collector processes
Check ServerNet cluster status
Tools for Checking the Status of System Hardware
Several tools are available to check the status of system components in an Integrity NonStop NS-series server. The most frequently used tools are the OSM Service Connection and the Subsystem Control Facility (SCF).
For information relating to system components in NonStop S-series servers, refer to the appropriate NonStop S-Series documentation.
Table 3-1 lists the tools available to monitor system components.
Table 3-1. Monitoring System Components

Resource: Adapters for communications subsystems: G4SA
Monitored Using These Tools: OSM Service Connection; SCF interface to various subsystems
See: Using the OSM Service Connection on page 3-7; Section 6, Communications Subsystems: Monitoring and Recovery; Section 8, I/O Adapters and Modules: Monitoring and Recovery; OSM Service Connection User’s Guide (or OSM Service Connection online help)

Resource: Adapters for the storage subsystem: Fibre Channel ServerNet adapter (FCSA)
Monitored Using These Tools: OSM Service Connection; SCF interface to the storage subsystem
See: Using the OSM Service Connection on page 3-7; Section 8, I/O Adapters and Modules: Monitoring and Recovery; OSM Service Connection User’s Guide (or OSM Service Connection online help)

Resource: AWAN access server
Monitored Using These Tools: RAS management tool
See: AWAN 3886 Server Installation and Configuration Guide

Resource: Communications lines
Monitored Using These Tools: SCF interface to the various subsystems
See: Section 6, Communications Subsystems: Monitoring and Recovery

Resource: Disk drive enclosure (and individual disk drives) attached to FCSAs
Monitored Using These Tools: OSM Service Connection; SCF interface to the storage subsystem; DSAP
See: Using the OSM Service Connection on page 3-7; Section 8, I/O Adapters and Modules: Monitoring and Recovery; Section 10, Disk Drives: Monitoring and Recovery; Guardian User’s Guide

Resource: Disk drives attached to ServerNet adapters in legacy NonStop S-series enclosures
Monitored Using These Tools: OSM Service Connection; SCF interface to the storage subsystem; DSAP
See: Using the OSM Service Connection on page 3-7; Section 10, Disk Drives: Monitoring and Recovery; Guardian User’s Guide

Resource: Modular I/O adapter module (IOAM) and subcomponents, including ServerNet switch boards, power supplies, and fans
Monitored Using These Tools: OSM Service Connection
See: Using the OSM Service Connection on page 3-7; Monitor Batteries on page 14-4; OSM Service Connection User’s Guide (or OSM Service Connection online help)

Resource: Legacy NonStop S-series enclosure and subcomponents, including IOMF2 CRUs, PMCUs, power supplies, fans, and batteries
Monitored Using These Tools: OSM Service Connection
See: Using the OSM Service Connection on page 3-7; Section 8, I/O Adapters and Modules: Monitoring and Recovery; OSM Service Connection User’s Guide (or OSM Service Connection online help)

Resource: NonStop Blade Complex components: Blade Elements, LSUs, logical processors
Monitored Using These Tools: OSM Service Connection
See: Using the OSM Service Connection on page 3-7; Section 9, Processors and Components: Monitoring and Recovery; OSM Service Connection User’s Guide (or OSM Service Connection online help)

Resource: NonStop ServerNet Cluster 6770 Switch
Monitored Using These Tools: OSM Service Connection
See: ServerNet Cluster 6770 Hardware Installation and Support Guide, or ServerNet Cluster Manual; OSM Service Connection User’s Guide (or OSM Service Connection online help)

Resource: NonStop ServerNet Cluster 6780 Switch
Monitored Using These Tools: OSM Service Connection
See: ServerNet Cluster 6780 Operations Guide; OSM Service Connection User’s Guide (or OSM Service Connection online help)

Resource: Printers
Monitored Using These Tools: SCF; SPOOLCOM
See: Section 12, Printers and Terminals: Monitoring and Recovery; Guardian User’s Guide

Resource: Processor switch (P-switch) module and subcomponents, including ServerNet switch boards, power supplies, fans, PICs and ports
Monitored Using These Tools: OSM Service Connection
See: Using the OSM Service Connection on page 3-7; OSM Service Connection User’s Guide (or OSM Service Connection online help)

Resource: ServerNet connectivity for an Integrity NonStop NS14000 or NS1000 system (which have no processor switches): 4-Port ServerNet Extender (4PSE)
Monitored Using These Tools: OSM Service Connection
See: Using the OSM Service Connection on page 3-7; OSM Service Connection User’s Guide (or OSM Service Connection online help)

Resource: ServerNet fabrics: processor-to-processor and processor-to-IOMF2 communication
Monitored Using These Tools: OSM Service Connection; SCF interface to the Kernel subsystem
See: Using the OSM Service Connection on page 3-7; Section 7, ServerNet Resources: Monitoring and Recovery

Resource: ServerNet wide area network (SWAN) concentrator
Monitored Using These Tools: OSM Service Connection; SCF interface to the WAN subsystem
See: Using the OSM Service Connection on page 3-7; Section 6, Communications Subsystems: Monitoring and Recovery

Resource: Tape drives
Monitored Using These Tools: OSM Service Connection; SCF interface to the storage subsystem; MEDIACOM
See: Section 11, Tape Drives: Monitoring and Recovery; Section 8, I/O Adapters and Modules: Monitoring and Recovery; Guardian User’s Guide

Resource: Uninterruptible Power Supply (UPS)
Monitored Using These Tools: OSM Service Connection
See: Monitor Batteries on page 14-4; OSM Service Connection User’s Guide (or OSM Service Connection online help)

Additional Monitoring Tasks
Table 3-2 provides an example of additional areas you should monitor daily.
Table 3-2. Daily Tasks Checklist
General Tasks Specific Tasks For More Information, See
Monitor messages fr om system users
Check telephone, fax, electronic mail, and any other messages
Guardi an User’s Gui de
Monitor o perator messages
From the OSM Event Viewer
From the EMSDIST printing distributor
From V iewPoin t
Section 4, Monitoring EMS Event Messages
OSM Ev ent Viewer onlin e help
Guardi an User’s Gui de
ViewPoint Manual
Monitor key applications
Monitor Pathway and TMF
Monitor SQL/MX, SQL/MP and other applications
Section 13, Applications: Monito ring and Reco v ery
The doc um entation spec if ic to the application
Monitor system processes
Use the SC F and TACL P PD commands
Section 5, Processes: Monito ring and Reco v ery
Table 3-1. Monitoring System Components (page3of3)
Resource
Monitored Using These Tools See...
Overview of Monitoring and Recovery
HP Integrity NonStop NS-Series Operations Guide—529869-005
3-7
Monitoring and Resolving Problems—An Approach
Monitoring and Resolving Problems—An Approach
A useful approach to identifying and resolving problems in your system is to first use OSM to locate the focal point of a hardware problem and then use SCF to gather all the related data from the subsystems that control or act on the hardware. In this way, you can develop a larger picture that encompasses the whole environment, including communications links and other objects and services that might be contributing to the problem or affected by it.
To get comprehensive online descriptions of all the available SCF commands, use the SCF HELP command.
The following subsections give instructions for using OSM and SCF to monitor and resolve problems.
Using OSM to Monitor the System
This section deals mostly with the OSM Service Connection, the primary OSM interface for system monitoring and serviceability.
See Overview of OSM Applications on page 1-11 for examples of how the other OSM applications are used for monitoring-related functions.
Using the OSM Service Connection
The OSM Service Connection can be used in a variety of ways to monitor your system, including:
Use of colors and symbols to direct you to the source of any problems.
Attribute values for system resources, displayed in the Attributes tab and in many dialog boxes.
Alarms, displayed in the Alarms tab and Alarm Summary dialog box.
The following section presents one model for using the OSM Service Connection to monitor your system, along with a few other options.
A Top-Down Approach
The Management (or main) window of the OSM Service Connection uses a series of colors and symbols to notify you that problems exist within the system. You can tell at a glance when problem conditions exist, then drill down, or expand the tree pane, to find the component reporting the problem. Figure 3-1 illustrates how both the rectangular system icon (located at the top of the view pane) and the system object in the tree pane indicate problems within the system. The system icon, which is green when OSM is reporting no problems on the system, has turned yellow. The system icon in the tree pane is displaying a yellow arrow to indicate a problem within.
Expanding the system object in the tree pane, you can see a yellow arrow on the Group 110 object, indicating that the problem is located somewhere within that group.
Expanding the tree pane further, as illustrated in Figure 3-2, yellow arrows on the IOAM Enclosure 110 and IOAM 110.3 objects reveal that the problem exists on a ServerNet adapter in slot 3 of that I/O module. The red bell-shaped icon by that resource object (in the tree pane) indicates that there is an alarm on the object. To obtain information about the alarm:
1. Click to select the object displaying the red triangular and bell-shaped symbols.
2. Select the Alarms tab from the details pane.
3. Click to select the alarm, then right-click and select Details.
Figure 3-1. OSM Management: System Icons Indicate Problems Within
Note. In the OSM Service Connection Management window, the tree pane is located on the far left. In the lower right is the Overview pane. Located between them is the details pane, from which you can choose to view the Attributes or Alarms tab. Directly above the details pane is the view pane, from which you can choose a Physical or Inventory view of your system or ServerNet Cluster. The gray bar directly above the view pane is an OSM-specific toolbar (as opposed to the standard Internet Explorer menu bar at the top of the browser window).
Check the Attributes tab (Figure 3-3) also, as a yellow or red triangular symbol indicates problem attribute values exist. In this case, the degraded Service State attribute was caused by an alarm. However, when a resource displays a yellow or red triangular object but no bell-shaped icon, it has no alarms but is reporting problem or degraded attribute values.
Figure 3-2. Expanding the Tree Pane to Locate the Source of Problems
Using System Status Icons to Monitor Multiple Systems
When you are monitoring multiple systems, you can create a System Status Icon for each system, allowing you to keep a high-level eye on each system while saving screen space. Figure 3-4 shows three separate System Status icons, each created by:
1. Establishing an OSM Service Connection session to the system.
2. From the Summary menu on the OSM toolbar, selecting System Status.
You can then minimize, but not close, the OSM Service Connection Management window for each system. If the System Status icon for a system turns from green to yellow, as illustrated in Figure 3-4, open the Management window for that system and locate the problem as described in A Top-Down Approach on page 3-7.
Figure 3-3. Attributes Tab
Figure 3-4. Using System Status Icons to Monitor Multiple Systems
Using Alarm and Problem Summaries
Other options for monitoring your system with the OSM Service Connection include using the Alarm Summary (Figure 3-5) or Problem Summary (Figure 3-6) dialog boxes to quickly view all alarms and problem conditions that exist on your system.
Figure 3-5. Alarm Summary Dialog Box
Figure 3-6. Problem Summary Dialog Box
Suppressing Problems and Alarms
In certain cases, you might want to acknowledge or suppress a particular known problem to stop it from propagating all the way up to the system level. That way, it is easier to identify other problems that might occur. For more information on OSM problem management features such as deleting or suppressing alarms and suppressing problem attributes, see the OSM Service Connection User’s Guide (also available as online help within the OSM Service Connection).
Recovery Operations for Problems Detected by OSM
Recovery operations depend on the particular problem, of course. Methods of determining the appropriate recovery action include:
Alarm Details, available for each alarm displayed in OSM, provide suggested repair actions.
The values displayed by problem attributes in OSM often provide clues to recovery.
EMS events, retrieved and viewed in the OSM Event Viewer, include cause, effect, and recovery information in the event details.
Check the section in this guide that covers the system resource (for example, Section 11, Tape Drives: Monitoring and Recovery) for information on using SCF and other tools to determine the cause of a problem. Then follow the directions in the Recovery Operations subsection in the relevant section.
Replacing a system component that has malfunctioned is beyond the scope of this guide. For more information, contact your service provider, or refer to the Support and Service Library on page 1-12.
Monitoring Problem Incident Reports
The OSM Notification Director generates problem incident reports when changes occur that could directly affect the availability of resources on your Integrity NonStop server. The Incident Report List tab on the Notification Director dialog box allows you to view, sort, authorize, and reject incident reports. The Notification Director allows you to forward notifications to your service provider if your system is configured for remote dial-out.
Using SCF to Monitor the System
Use the Subsystem Control Facility (SCF) to display information and current status for all the devices on your system known to SCF. Some SCF commands are available only to some subsystems. The objects that each command affects and the attributes of those objects are subsystem specific. This subsystem-specific information appears in a separate manual for each subsystem. A partial list of these manuals appears in Appendix C, Related Reading.
Determining Device States
This subsection explains how to determine the state of devices on your system. For example, to monitor the current state of all tape devices on your system, at an SCF prompt:
-> STATUS TAPE $*
Example 3-1 shows the results of the SCF STATUS TAPE $* command:
Some other examples of the SCF STATUS command are:
-> STATUS LINE $LAM3
-> STATUS WS $LAM3.#WS1
-> STATUS WS $LAM3.*
-> STATUS WINDOW $LAM3.#WS1.*
-> STATUS WINDOW $LAM3.*, SEL STOPPED
Example 3-1. SCF STATUS TAPE Command

1-> STATUS TAPE $*

STORAGE - Status TAPE \COMM.$TAPE0
LDev  State    Primary  Backup  DeviceStatus
               PID      PID
156   STOPPED  2,268    3,288   NOT READY

STORAGE - Status TAPE \COMM.$DLT20
LDev  State    Primary  Backup  DeviceStatus
               PID      PID
394   STARTED  2,267    3,295   NOT READY

STORAGE - Status TAPE \COMM.$DLT21
LDev  State    Primary  Backup  DeviceStatus
               PID      PID
393   STARTED  1,289    0,299   NOT READY

STORAGE - Status TAPE \COMM.$DLT22
LDev  State    Primary  Backup  DeviceStatus
               PID      PID
392   STARTED  0,300    1,288   NOT READY

STORAGE - Status TAPE \COMM.$DLT23
LDev  State    Primary  Backup  DeviceStatus
               PID      PID
391   STARTED  1,287    0,301   NOT READY

STORAGE - Status TAPE \COMM.$DLT24
LDev  State    Primary  Backup  DeviceStatus
               PID      PID
390   STARTED  6,265    7,298   NOT READY

STORAGE - Status TAPE \COMM.$DLT25
LDev  State    Primary  Backup  DeviceStatus
               PID      PID
389   STARTED  4,265    5,285   NOT READY
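If you capture STATUS TAPE output to a text file, a short script can pick out the devices that are not in the STARTED state. The following Python sketch is illustrative only: it assumes the output has the layout shown in Example 3-1 (one header line per device followed by a single data row with both a primary and a backup PID), and the function names parse_tape_status and not_started are hypothetical, not part of SCF or any HP tool.

```python
import re

# Header line of each block, e.g. "STORAGE - Status TAPE \COMM.$TAPE0".
HEADER = re.compile(r"^STORAGE - Status TAPE (\S+)$")
# Data row: LDev, State, primary PID, backup PID, then the device status text.
# Assumption: both PIDs are present, as in Example 3-1.
ROW = re.compile(r"^(\d+)\s+(\S+)\s+(\d+),(\d+)\s+(\d+),(\d+)\s+(.*)$")


def parse_tape_status(text):
    """Return one dict per tape device found in captured STATUS TAPE output."""
    devices = []
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            name = header.group(1)
            continue
        row = ROW.match(line)
        if row and name is not None:
            ldev, state, pcpu, ppin, bcpu, bpin, status = row.groups()
            devices.append({
                "name": name,
                "ldev": int(ldev),
                "state": state,
                "primary": (int(pcpu), int(ppin)),  # (processor, PIN)
                "backup": (int(bcpu), int(bpin)),   # (processor, PIN)
                "device_status": status.strip(),
            })
            name = None
    return devices


def not_started(devices):
    """Names of devices whose SCF state is anything other than STARTED."""
    return [d["name"] for d in devices if d["state"] != "STARTED"]


# Two blocks copied from Example 3-1.
SAMPLE = r"""
STORAGE - Status TAPE \COMM.$TAPE0
LDev State Primary Backup DeviceStatus
PID PID
156 STOPPED 2,268 3,288 NOT READY

STORAGE - Status TAPE \COMM.$DLT20
LDev State Primary Backup DeviceStatus
PID PID
394 STARTED 2,267 3,295 NOT READY
"""

if __name__ == "__main__":
    print(not_started(parse_tape_status(SAMPLE)))
```

Because the STATUS display varies by subsystem, a pattern like this would need to be adjusted for other object types.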
The general format of the STATUS display follows. However, the format varies depending on the subsystem.

subsystem STATUS object-type object-name
Name          State  PPID    BPID    attr1  attr2  attr3
object-name1  state  nn,nnn  nn,nnn  val1   val2   val3 …
object-name2  state  nn,nnn  nn,nnn  val1   val2   val3 …

where:
subsystem    The reporting subsystem name
object-type  The object, or device, type
object-name  The fully qualified name of the object
State        One of the valid object states: ABORTING, DEFINED, DIAGNOSING, INITIALIZED, SERVICING, STARTED, STARTING, STOPPED, STOPPING, SUSPENDED, SUSPENDING, and UNKNOWN
PPID         The primary processor number and process identification number (PIN) of the object
BPID         The backup processor number and PIN of the object
attrn        The name of an attribute of the object
valn         The value of that object attribute

SCF Object States
Table 3-3 lists and explains the possible object states that the SCF STATUS command can report.
Table 3-3. SCF Object States

State: ABORTING
Explanation: The object is being aborted. The object is responding to an ABORT command or some type of malfunction. In this state, no new links are allowed, and drastic measures might be underway to reach the STOPPED state. This state is irrevocable.

State: DEFINED
Explanation: One of the generally defined possible conditions of an object with respect to the management of that object.

State: DIAGNOSING
Explanation: The object is in a subsystem-defined test mode entered through the DIAGNOSE command.

State: INITIALIZED
Explanation: The system has created the process, but it is not yet in one of the operational states.

State: SERVICING
Substate: SPECIAL
Explanation: The object is being serviced or used by a privileged process and is inaccessible to user processes.

State: SERVICING
Substate: TEST
Explanation: The object is reserved for exclusive testing.

State: STARTED
Explanation: The object is logically accessible to user processes.

State: STARTING
Explanation: The object is being initialized and is in transition to the STARTED state.

State: STOPPED
Substate: CONFIG-ERROR
Explanation: The object is configured improperly.

State: STOPPED
Substate: DOWN
Explanation: The object is no longer logically accessible to user processes.

State: STOPPED
Substate: HARDDOWN
Explanation: The object is in the hard-down state or is physically inaccessible due to a hardware error.

State: STOPPED
Substate: INACCESSIBLE
Explanation: The object is inaccessible to user processes.

State: STOPPED
Substate: PREMATURE-TAKEOVER
Explanation: The backup input/output (I/O) process was asked to take over for the primary I/O process before it had the proper information.

State: STOPPED
Substate: RESOURCE-UNAVAILABLE
Explanation: The input/output (I/O) process could not obtain a necessary resource.

State: STOPPED
Substate: UNKNOWN-REASON
Explanation: The input/output (I/O) process is down for an unknown reason.

State: STOPPING
Explanation: The object is in transition to the STOPPED state. No new links are allowed to or from the object. Existing links are in the process of being deleted.

State: SUSPENDED
Explanation: The flow of information to and from the object is restricted. (It is typically prevented.) A subsystem must clearly distinguish between the type of information that is allowed to flow in the SUSPENDED state and that which normally flows in the STARTED or STOPPED state. In the SUSPENDED state, the object must complete any outstanding work defined by the subsystem.

State: SUSPENDING
Explanation: The object is in transition to the SUSPENDED state. The subsystem must clearly define the nature of the restrictions that this state imposes on its objects.

State: UNKNOWN
Explanation: The object’s state cannot be determined because the object is inaccessible.
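When you review many objects at once, it can help to sort the states in Table 3-3 into a few coarse groups. The following Python sketch shows one possible grouping. The state names come from Table 3-3, but the three groups and the labels "ok", "in-transition", and "investigate" are assumptions made for illustration; they are not an HP-defined severity scheme, and your site might reasonably group the states differently.

```python
# State names are copied from Table 3-3. The three-way grouping is an
# illustrative assumption, not an HP-defined severity scheme.
OPERATIONAL = {"STARTED"}
TRANSITIONAL = {"ABORTING", "STARTING", "STOPPING", "SUSPENDING"}
OTHER = {"DEFINED", "DIAGNOSING", "INITIALIZED", "SERVICING",
         "STOPPED", "SUSPENDED", "UNKNOWN"}


def triage(state):
    """Map an SCF object state to a coarse operator action (hypothetical labels)."""
    name = state.strip().upper()
    if name in OPERATIONAL:
        return "ok"
    if name in TRANSITIONAL:
        return "in-transition"
    if name in OTHER:
        return "investigate"
    raise ValueError(f"not a state listed in Table 3-3: {state!r}")


if __name__ == "__main__":
    for s in ("STARTED", "STOPPING", "STOPPED"):
        print(s, "->", triage(s))
```

For a STOPPED object, the substate reported by SCF (for example, HARDDOWN or CONFIG-ERROR) indicates where to look next.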
Automating Routine System Monitoring
You can automate many of the monitoring procedures. Automation saves you time and helps you to perform many routine tasks more efficiently.
Your operations environment might be using TACL macros, TACL routines, or command files to perform routine system monitoring and other tasks. These items allow you to run many procedures so that you can quickly determine system status, produce reports, or perform other common tasks. The TACL Reference Manual contains an example that you can adapt to automate system monitoring.
Example 3-2 shows a command file that you can use or adapt to check many of the system elements:
1. To create a command file named SYSCHK that will automate system monitoring, type the text shown in Example 3-2 into an EDIT file.
2. After you create this file, at a TACL prompt, type this command to execute the file and automatically monitor many elements of your system:
> OBEY SYSCHK
For an example of the output that is sent to your home terminal when you execute a command file such as SYSCHK, refer to Example 3-3. This output shows that all elements of the system being monitored are up and running normally.
Example 3-2. System Monitoring Command File

COMMENT THIS IS THE FILE SYSCHK
COMMENT THIS CHECKS ALL DISKS:
SCF STATUS DISK $*
COMMENT THIS CHECKS ALL TAPE DRIVES:
SCF STATUS TAPE $*
COMMENT THIS CHECKS THE SPOOLER PRINT DEVICES:
SPOOLCOM DEV
COMMENT THIS CHECKS THE LINE HANDLERS:
SCF STATUS LINE $*
COMMENT THIS CHECKS THE STATUS OF TMF:
TMFCOM;STATUS TMF
COMMENT THIS CHECKS THE STATUS OF PATHWAY:
PATHCOM $ZVPT;STATUS PATHWAY;STATUS PATHMON
COMMENT THIS CHECKS ALL SACS:
SCF STATUS SAC $*
COMMENT THIS CHECKS ALL ADAPTERS
SCF STATUS ADAPTER $*
COMMENT THIS CHECKS ALL LIFS
SCF STATUS LIF $*
COMMENT THIS CHECKS ALL PIFS
SCF STATUS PIF $*
Example 3-3. System Monitoring Output File

COMMENT THIS IS THE FILE SYSCHK
COMMENT THIS CHECKS ALL DISKS:
SCF STATUS DISK $*

STORAGE - Status DISK \SHARK.$DATA12
LDev  Primary   Backup   Mirror    MirrorBackup  Primary  Backup
                                                 PID      PID
52    *STARTED  STARTED  *STARTED  STARTED       3,262    2,263

STORAGE - Status DISK \SHARK.$DATA01
LDev  Primary   Backup   Mirror    MirrorBackup  Primary  Backup
                                                 PID      PID
63    *STARTED  STARTED  *STARTED  STARTED       0,267    1,266

STORAGE - Status DISK \SHARK.$DATA04
LDev  Primary   Backup   Mirror    MirrorBackup  Primary  Backup
                                                 PID      PID
60    *STARTED  STARTED  *STARTED  STARTED       0,270    1,263

STORAGE - Status DISK \SHARK.$SYSTEM
LDev  Primary   Backup   Mirror    MirrorBackup  Primary  Backup
                                                 PID      PID
6     *STARTED  STARTED  STOPPED   STOPPED       0,256    1,256

COMMENT THIS CHECKS ALL TAPE DRIVES:
SCF STATUS TAPE $*

STORAGE - Status TAPE $TAPE1
LDev  State    SubState  Primary  Backup  DeviceStatus
                         PID      PID
48    STARTED            0,274

STORAGE - Status TAPE $TAPE0
LDev  State    SubState  Primary  Backup  DeviceStatus
                         PID      PID
49    STARTED            0,273

COMMENT THIS CHECKS THE SPOOLER PRINT DEVICES:
SPOOLCOM DEV

DEVICE  STATE    FLAGS  PROC   FORM
$LINE1  WAITING  H      $SPLX
$LINE2  WAITING  H      $SPLX
$LINE3  WAITING  H      $SPLX
$LASER  WAITING  H      $SPLP

COMMENT THIS CHECKS ALL SACS:
SCF STATUS SAC $*

SLSA Status SAC
Name            Owner  State
$ZZLAN.E4SA1.0  1      STARTED
$ZZLAN.E4SA1.1  0      STARTED
$ZZLAN.E4SA1.2  0      STARTED
$ZZLAN.E4SA1.3  1      STARTED

COMMENT THIS CHECKS ALL ADAPTERS
SCF STATUS ADAPTER $*

SLSA Status ADAPTER
Name          State
$ZZLAN.MIOE0  STARTED
$ZZLAN.E4SA0  STARTED
$ZZLAN.MIOE1  STARTED
$ZZLAN.E4SA2  STARTED

COMMENT THIS CHECKS ALL LIFS
SCF STATUS LIF $*

SLSA Status LIF
Name         State    Access State
$ZZLAN.LAN0  STARTED  UP
$ZZLAN.LAN3  STARTED  DOWN

COMMENT THIS CHECKS ALL PIFS
SCF STATUS PIF $*

SLSA Status PIF
Name              State
$ZZLAN.E4SA0.0.A  STARTED
$ZZLAN.E4SA0.0.B  STARTED
$ZZLAN.E4SA0.1.A  STOPPED
$ZZLAN.E4SA0.1.B  STARTED

COMMENT THIS CHECKS THE LINE HANDLERS:
SCF STATUS LINE $*

COMMENT THIS CHECKS THE STATUS OF TMF:
TMFCOM;STATUS TMF

TMF Status:
  System: \SAGE, Time: 12-Jul-1994 14:05:00
  State: Started
  Transaction Rate: 0.25 TPS
AuditTrail Status:
  Master:
    Active audit trail capacity used: 68%
    First pinned file: $TMF1.ZTMFAT.AA000044
      Reason: Active transactions(s).
    Current file: $TMF1.ZTMFAT.AA000045
AuditDump Status:
  Master: State: enabled, Status: active, Process $X545,
    File: $TMF2.ZTMFAT.AA000042
BeginTrans Status: Enabled
Catalog Status:
  Status: Up
Processes Status:
  Dump Files:
    #0: State: InProgress

COMMENT THIS CHECKS THE STATUS OF PATHWAY:
PATHCOM $ZVPT;STATUS PATHWAY;STATUS PATHMON

PATHWAY -- STATE=RUNNING
                 RUNNING
EXTERNALTCPS     0
LINKMONS         0
PATHCOMS         1
SPI              0

                                   FREEZE
                 RUNNING  STOPPED  THAWED  FROZEN  PENDING
SERVERCLASSES    17       0        17      0       0

                 RUNNING  STOPPED  PENDING
SERVERPROCESSES  17       35       0
TCPS             1        0        0

                 RUNNING  STOPPED  PENDING  SUSPENDED
TERMS            0        0        0        0

PATHMON \COMM.$ZVPT -- STATE=RUNNING  CPUS 0:1
PATHCTL (OPEN) $OPER.VIEWPT.PATHCTL
LOG1 SE (OPEN) $0
LOG2 (CLOSED)

REQNUM  FILE     PID    PAID   WAIT
1       PATHCOM  $X0X7  1,254
2       TCP      $Z040
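Output such as Example 3-3 can run to several pages, so a script that highlights the rows worth a second look can save time. The following Python sketch scans captured command-file output and reports data rows that contain a state other than STARTED. It is a rough filter under stated assumptions: the set of keywords (selected SCF states plus the LIF access state DOWN) is a choice made for this illustration, the function name rows_needing_attention is hypothetical, and only rows that begin with an object name ($...) or a logical device number are examined so that column headings such as RUNNING STOPPED PENDING are ignored.

```python
# Keywords treated as worth a second look. This selection is an assumption
# for illustration; adjust it to suit your own operations environment.
ATTENTION = {"ABORTING", "STOPPED", "STOPPING", "SUSPENDED",
             "SUSPENDING", "UNKNOWN", "DOWN"}


def rows_needing_attention(text):
    """Return (line_number, line) pairs for data rows containing an ATTENTION keyword."""
    hits = []
    for number, raw in enumerate(text.splitlines(), start=1):
        tokens = raw.split()
        if not tokens:
            continue
        first = tokens[0]
        # Only data rows are examined: they start with an object name ($...)
        # or an LDev number. Headings, comments, and commands are skipped.
        if not (first.startswith("$") or first.isdigit()):
            continue
        # A leading "*" marks a path in the disk display; strip it before comparing.
        states = {t.lstrip("*") for t in tokens[1:]}
        if states & ATTENTION:
            hits.append((number, raw.strip()))
    return hits


# A few lines taken from Example 3-3.
SAMPLE = """\
SLSA Status PIF
Name State
$ZZLAN.E4SA0.0.A STARTED
$ZZLAN.E4SA0.1.A STOPPED
SLSA Status LIF
Name State Access State
$ZZLAN.LAN3 STARTED DOWN
RUNNING STOPPED PENDING
TCPS 1 0 0
"""

if __name__ == "__main__":
    for number, line in rows_needing_attention(SAMPLE):
        print(number, line)
```

A keyword scan of this kind does not interpret the TMF or Pathway sections, which use their own status vocabulary; those sections still need to be read directly.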
Using the Status LEDs to Monitor the System
Status LEDs on the various enclosures and system components light during certain operations, such as when the system performs a series of power-on self-tests (POSTs) when a server is first powered on. Table 3-4 lists some of the status light-emitting diodes (LEDs) and their functions.
Table 3-4. Status LEDs and Their Functions

Location: Disk drive
LED Name: Power-on. Color: Green. Function: Lights when the disk drive is receiving power.
LED Name: Activity. Color: Yellow or amber. Function: Lights when the disk drive is executing a read or write command.

Location: Disk drive, fibre channel
LED Name: Drive Ready (top green). Color: Green. Function: Flashes when drive is starting. (At the same time, the middle green light is lit and the bottom amber light is lit.)
LED Name: Drive Online (middle green). Color: Green. Function: Flashes when drive is operational and performing a locate function.
LED Name: Drive Failure (bottom amber). Color: Amber. Function: Flashes when drive is inactive or in error condition. When this occurs, verify the loop and replace the drive, if necessary.
LED Name: All. Function: If all lights are on and none are flashing, the drive is not operational. Perform the following actions:
1. Check FCSA. Replace if defective.
2. Check FC-AL I/O module. Replace if defective.
3. Replace drive.

Location: EMU
LED Name: Heartbeat. Color: Left Green. Function:
Flashes when EMU is operational and performing locate. Power might just have been applied to the EMU, or an enclosure fault might exist.
On when an EMU fault exists that is not an enclosure fault.
Off when an EMU fault exists, which could be or might not be an enclosure fault.
LED Name: Power. Color: Middle Green. Function:
Flashes when EMU is operational and performing locate.
On when EMU is operational. An EMU or an enclosure fault might still exist.
Off when power has just been applied to an enclosure, or when an enclosure fault exists.
LED Name: Enclosure Status. Color: Amber. Function:
Flashes when EMU is operational and performing locate.
On when EMU is operational, but an enclosure fault exists.
Off when EMU is operational, or power has just been applied to an enclosure, or when an EMU fault exists that is not an enclosure fault, or when an enclosure fault exists.

Location: FC-AL I/O Module
LED Name: Power-on. Color: Middle Green. Function: Lights when power is on and module is available for normal operation. If light is off, the module is nonoperational: check FCSAs, cables, and power supplies.
LED Name: Port 1. Color: Bottom Green. Function: Lights when carrier on Port 1 is operational.
LED Name: Port 2. Color: Top Green. Function: Lights when carrier on Port 2 is operational.

Location: Fibre Channel ServerNet adapter (FCSA)
LED Name: Power-on. Color: Green. Function: Lights when the adapter is receiving power.
LED Name: Service. Color: Amber. Function: Lights to indicate internal failure or service action required.

Location: Gigabit Ethernet 4-port ServerNet adapter (G4SA)
LED Name: Power-on. Color: Green. Function: Lights when the adapter is receiving power.
LED Name: Service. Color: Amber. Function: Lights to indicate internal failure or service action required.

Location: LSU I/O PIC
LED Name: Power-on. Color: Green. Function: Lights when power is on and adapter is available for normal operation.
LED Name: Service. Color: Amber. Function: Lights when a POST is in progress, board is being reset, or a fault exists.

Location: LSU optics adapter connector
LED Name: Power-on. Color: Green. Function: Lights when NonStop Blade Element optic or ServerNet link is functional.

Location: LSU logic board
LED Name: Power-on. Color: Green. Function: Lights when power is on and adapter is available for normal operation.
LED Name: Service. Color: Amber. Function: Lights when a POST is in progress, board is being reset, or a fault exists.
Table 3-4. Status LEDs and Their Functions (continued)

Location: NonStop Blade Element
LED Name: Power-on. Color: Flashing Green. Function: Lights when power is on and Blade Element is available for normal operation.
Color: Flashing Yellow. Function: Lights when Blade Element is in low power mode.
LED Name: Service. Color: Steady Amber. Function: Lights when a hardware or software fault exists.
LED Name: Locator. Color: Flashing Blue. Function: Lights when the system locator is activated.

Location: P-switch PICs
LED Name: Power-on. Color: Green. Function: Lights when power is on with PIC available for normal operation.
Color: Amber. Function: Lights when a fault exists.

Location: P-switch PIC ServerNet connector
LED Name: Power-on. Color: Green. Function: Lights when a ServerNet link is functional.

Related Reading
For more information about monitoring, see the documentation listed in Table 3-5.

Table 3-5. Related Reading for Monitoring

Task: Monitoring system hardware, including locating failed or failing FRUs
Tool: OSM Service Connection
For information, see: OSM online help; OSM Service Connection User’s Guide

Task: Using SCF, its commands and options, and device types and subtypes
Tool: SCF interface to subsystems
For information, see: SCF Reference Manual for H-Series RVUs; SCF Reference Manual for the Storage Subsystem

Task: Monitoring clustered servers
Tool: OSM Service Connection
For information, see: ServerNet Cluster 6780 Operations Guide; ServerNet Cluster Manual
4. Monitoring EMS Event Messages
When to Use This Section
What Is the Event Management Service (EMS)?
Tools for Monitoring EMS Event Messages
    OSM Event Viewer
    EMSDIST
    ViewPoint
    Web ViewPoint
Related Reading
When to Use This Section
Use this section for a brief description of the Event Management Service (EMS) and the tools used to monitor EMS event messages.
What Is the Event Management Service (EMS)?
The Event Management Service (EMS) is a collection of processes, tools, and interfaces that support the reporting and retrieval of event information. Information retrieved from EMS can help you to:
Monitor your system or network environment
Analyze circumstances that led up to a problem
Detect failure patterns
Adjust for changes in the run-time environment
Recognize and handle critical problems
Perform many other tasks required to maintain a productive computing operation
Tools for Monitoring EMS Event Messages
To view EMS event messages for an Integrity NonStop server, use one of these tools:
OSM Event Viewer
EMSDIST
ViewPoint
Web ViewPoint
OSM Event Viewer
The OSM Event Viewer is a browser-based event viewer. The OSM Event Viewer allows you to retrieve and view events from any EMS formatted log files ($0, $ZLOG, or an alternate collector) for rapid assessment of operating system problems.
To access the OSM Event Viewer, refer to Launching OSM Applications on page 1-11. For details on how to use the OSM Event Viewer, refer to the online help.
EMSDIST
The EMSDIST program is the object program for a printing, forwarding, or consumer distributor, any of which you can start with a TACL RUN command. This guide does not describe using EMSDIST. For more information, see the Guardian User’s Guide.
ViewPoint
ViewPoint displays event messages about current or past events occurring anywhere in the network on a set of block-mode events screens. The messages can be errors, failures, warnings, and requests for operator actions. The events screens allow operators to monitor significant occurrences or problems in the network as they occur. Critical events or events requiring immediate action are highlighted.
Web ViewPoint
Web ViewPoint, a browser-based product, accesses the Event Viewer, Object Manager, and Performance Monitor subsystems. Web ViewPoint monitors and displays EMS events; identifies and lists all supported subsystems; manages NonStop server subsystems and user applications in a secure, automated, and customizable way; monitors and graphs performance attributes and trends; investigates and displays most active system processes; and offers simple navigation and a point-and-click command interface.
Related Reading
For more information about monitoring EMS event messages, see the documentation in Table 4-1.
Table 4-1. Related Reading for Monitoring EMS Event Messages

Task: Viewing event logs
Tool: EMSDIST
For information, see: Guardian User’s Guide

Task: Viewing event logs
Tool: ViewPoint
For information, see: ViewPoint Manual

Task: Viewing event logs
Tool: OSM Event Viewer
For information, see: OSM Event Viewer online help
5. Processes: Monitoring and Recovery
When to Use This Section
Types of Processes
    System Processes
    I/O Processes (IOPs)
    Generic Processes
Monitoring Processes
    Monitoring System Processes
    Monitoring IOPs
    Monitoring Generic Processes
Recovery Operations for Processes
Related Reading
When to Use This Section
This section provides basic information about the different types of processes for Integrity NonStop servers. It gives a brief example of monitoring each type of process and provides information about the commands available for recovery operations.
Types of Processes
Three types of processes are of major concern to a system operator of an Integrity NonStop NS-series server:
System processes
I/O processes (IOPs)
Generic processes
System Processes
A system process is a privileged process that is created during system load and exists continuously for a given configuration for as long as the processor remains operable. Examples of system processes include the memory manager, the monitor, and the I/O control processes.
I/O Processes (IOPs)
An I/O process (IOP) is a system process that manages communications between a processor and I/O devices. IOPs are often configured as fault-tolerant process pairs, and they typically control one or more I/O devices or communications lines. Each IOP is configured in a maximum of two processors, typically a primary processor and a backup processor.
An IOP provides an application program interface (API) that allows access to an I/O interface. A wide area network (WAN) communications line is an example of an I/O interface. IOPs configured using the SCF interface to the WAN subsystem manage the input and output functions for the ServerNet wide area network (SWAN) concentrator. Examples of IOPs include, but are not limited to, line-handler processes for Expand and other communications subsystems.
Generic Processes
Generic processes are configured by the SCF interface to the Kernel subsystem. They can be configured in one or more processors. Although sometimes called system-managed processes, generic processes can be either system processes or user-created processes. Any process that can be started from a TACL prompt can be configured as a generic process. Generic processes can be configured to have persistence; that is, to automatically restart if stopped abnormally.
Examples of generic processes:
The $ZZKRN Kernel subsystem manager process
Other generic processes controlled by $ZZKRN; for example:
° The $ZZSTO storage subsystem manager process
° The $ZZWAN wide area network (WAN) subsystem manager process
° QIO processes
° OSM server processes
° The $ZZLAN ServerNet LAN Systems Access (SLSA) subsystem manager process
° The $FCSMON fibre channel storage monitor
For more information, refer to the SCF Reference Manual for the Kernel Subsystem.
Monitoring Processes
This subsection briefly provides examples of some of the tools available to monitor processes. For some processes, such as IOPs, monitoring is more fully discussed in other manuals. In general, use this method to monitor processes:
1. Develop a list of processes that are crucial to the operation of your system.
2. Determine how each of these processes is configured.
3. Use the appropriate tool to monitor the process.
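The three steps above can be sketched as a small script that compares a list of crucial processes against a captured TACL STATUS * listing. This is only an illustrative sketch: the CRUCIAL set, the SAMPLE rows, and the function names are assumptions made for the example, and the row layout is assumed to match the STATUS * listing shown in this section (process name, optional B flag, then cpu,pin).

```python
import re

# Hypothetical list of processes considered crucial; substitute your own.
CRUCIAL = {"$ZZKRN", "$ZZSTO", "$ZZWAN", "$ZZLAN", "$ZSMP"}

# A few rows in the layout of a TACL STATUS * listing (sample data only).
SAMPLE = """\
$ZZWAN     0,295  180     011 255,255 $SYSTEM.SYS14.WANMGR  $ZHOME
$ZZSTO     0,296  180 P   011 255,255 $SYSTEM.SYS14.TZSTO   $ZHOME
$ZZLAN     0,297  199 P   015 255,255 $SYSTEM.SYS14.LANMAN  $ZHOME
$ZZKRN     0,298  180 P   011 255,255 $SYSTEM.SYS14.OZKRN   $ZHOME
"""


def running_names(listing):
    """Return the set of named processes found in a STATUS * listing."""
    names = set()
    for line in listing.splitlines():
        # A named row starts with $name, optionally a B flag, then cpu,pin.
        m = re.match(r"\s*(\$\w+)\s+(?:B\s+)?\d+,\d+\s", line)
        if m:
            names.add(m.group(1))
    return names


def missing_processes(listing, crucial=CRUCIAL):
    """Return the crucial process names that do not appear in the listing."""
    return sorted(crucial - running_names(listing))


print(missing_processes(SAMPLE))
```

With the sample rows, only $ZSMP is reported, because it is in the crucial list but not in the listing. Unnamed system processes (rows that start with cpu,pin) are ignored by this sketch.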
Monitoring System Processes
Check that the system processes are up and running in the processors as you intended. At a TACL prompt:
> STATUS *
This example shows partial output produced by the TACL STATUS * command:
$SYSTEM STARTUP 2> status *
Process           Pri PFR %WT Userid  Program file              Hometerm
           0,0    201 P R 000 255,255 $SYSTEM.SYS14.NMONTOR     $YMIOP.#CLCI
           0,1    210 P   040 255,255 $SYSTEM.SYS14.NMEMMAN     $YMIOP.#CLCI
           0,2    210 P   051 255,255 $SYSTEM.SYS14.NMSNGERR    $YMIOP.#CLCI
$0         0,3    201 P   011 255,255 $SYSTEM.SYS14.OPCOLL      $YMIOP.#CLCI
           0,4    211 P   017 255,255 $SYSTEM.SYS14.TMFMON      $YMIOP.#CLCI
$YMIOP     0,5    205 P   251 255,255 $SYSTEM.SYS14.TMIOP       $YMIOP.#CLCI
$ZNUP      0,6    200 P   015 255,255 $SYSTEM.SYS14.NZNUP       $YMIOP.#CLCI
$Z0        0,7    200 P   015 255,255 $SYSTEM.SYS14.OCDIST      $YMIOP.#CLCI
$ZOPR      0,8    201 P   011 255,255 $SYSTEM.SYS14.OAUX        $YMIOP.#CLCI
$ZCNF      0,9    200 P   001 255,255 $SYSTEM.SYS14.TZCNF       $YMIOP.#CLCI
$ZTM00     0,11   200 P   017 255,255 $SYSTEM.SYS14.TMFMON2     $YMIOP.#CLCI
$TMP       0,12   204 P   005 255,255 $SYSTEM.SYS14.TMFTMP      $YMIOP.#CLCI
$ZL00      0,13   200 P   001 255,255 $SYSTEM.SYS14.ROUT        $ZHOME
$NCP       0,14   199 P   011 255,255 $SYSTEM.SYS14.NCPOBJ      $ZHOME
$ZEXP      0,15   150 P   001 255,255 $SYSTEM.SYS14.OZEXP       $ZHOME
$CLCI      0,34   199     000 0,0     $SYSTEM.SYS14.TACL        $YMIOP.#CLCI
$TRAK      0,40   146     000 255,255 $SYSTEM.SYSTOOLS.QATRACK  $ZHOME
$Z00Y      0,43   150     015 255,255 $SYSTEM.SYS14.FDIST       $ZHOME
$NULL    B 0,45   147     001 255,255 $SYSTEM.SYSTEM.NULL       $Z01J
$ZNET      0,64   175 P   011 255,255 $SYSTEM.SYS14.SCP         $ZHOME
$Z1RL      0,249  148   R 000 98,98   $SYSTEM.SYS14.TACL        $ZTNT.#PTBY5D
$SYSTEM    0,257  220 P   317 255,255 $SYSTEM.SYS14.TSYSDP2     $YMIOP.#CLCI
$ZHOME     0,292  199 P   001 255,255 $SYSTEM.SYS14.ZHOME       $YMIOP.#CLCI
$ZM00      0,294  201 P   015 255,255 $SYSTEM.SYS14.QIOMON      $ZHOME
$ZZWAN     0,295  180     011 255,255 $SYSTEM.SYS14.WANMGR      $ZHOME
$ZZSTO     0,296  180 P   011 255,255 $SYSTEM.SYS14.TZSTO       $ZHOME
$ZZLAN     0,297  199 P   015 255,255 $SYSTEM.SYS14.LANMAN      $ZHOME
$ZZKRN     0,298  180 P   011 255,255 $SYSTEM.SYS14.OZKRN       $ZHOME
$Z000      0,299  180 P   011 255,255 $SYSTEM.SYS14.TZSTOSRV    $ZHOME
$ZLM00     0,300  200 P   015 255,255 $SYSTEM.SYS14.LANMON      $ZHOME
$IXPOHO    0,301  199 P   355 255,255 $SYSTEM.SYS14.LHOBJ       $ZHOME
$ZTXAE     0,330  145     015 255,255 $SYSTEM.SYS14.SNMPTMUX    $ZHOME
$ZWBAF     0,333  179 P   015 255,255 $SYSTEM.SYS14.WANBOOT     $ZHOME
$ZZW00     0,334  199 P   215 255,255 $SYSTEM.SYS14.CONMGR      $ZHOME
$DSMSCM    0,335  220 P   317 255,255 $SYSTEM.SYS14.TSYSDP2     $ZHOME
$DATA2     0,336  220 P   317 255,255 $SYSTEM.SYS14.TSYSDP2     $ZHOME
$ZLOG      0,340  150     011 255,255 $SYSTEM.SYS14.EMSACOLL    $ZHOME
$ZTH00     0,343  148 P   005 255,255 $SYSTEM.SYS14.TFDSHLP     $YMIOP.#CLCI
$DSMSCM    0,344  220 P   317 255,255 $SYSTEM.SYS14.TSYSDP2     $ZHOME
$Z1RM      1,80   148     005 255,255 $SYSTEM.SYS14.TACL        $ZTNT.#PTBY5D
$ZPP01     1,280  160 P   015 255,255 $SYSTEM.SYS14.OSSPS       $YMIOP.#CLCI
$ZLM01     1,342  200 P   015 255,255 $SYSTEM.SYS14.LANMON      $ZHOME
$ZTC0    B 1,352  200 P   011 255,255 $SYSTEM.SYS14.TCPIP       $ZHOME
$ZTNT    B 1,355  149     001 255,255 $SYSTEM.SYS14.TELSERV     $ZHOME
$ZPORT   B 1,357  149     001 255,255 $SYSTEM.SYS14.LISTNER     $ZHOME
$KLA9E     1,424  147     001 255,255 $DATA2.KMZTT.LOGGER       $ZTNT.#PTBY5D
$ZTM02     2,5    200 P   017 255,255 $SYSTEM.SYS14.TMFMON2     $YMIOP.#CLCI
$GRD2      2,243  147 P   001 255,255 $DATA2.QA9050.RUNNER      $ZTNT.#PTBY5CV
$ZP02A   B 2,300  195     001 255,255 $SYSTEM.ZRPC.PORTMAP      $ZHOME
$ZCMOM   B 2,303  150     001 255,255 $SYSTEM.SYS14.CIMOM       $ZHOME
Monitoring IOPs
For more information about monitoring I/O processes (IOPs), refer to the WAN Subsystem Configuration and Management Manual, the SWAN Concentrator and WAN Subsystem Troubleshooting Guide, and the Expand Configuration and Management Manual.
Monitoring Generic Processes
Because generic processes are configured using the SCF interface to the Kernel subsystem, you specify the $ZZKRN Kernel subsystem manager process when monitoring a generic process. These SCF commands are available for monitoring $ZZKRN and other generic processes:
INFO    Displays configuration information for the specified objects
NAMES   Displays a list of subordinate object types and names for the specified objects
STATUS  Displays current status information about the specified objects
Monitoring the Status of $ZZKRN
To monitor the status of the $ZZKRN Kernel subsystem manager process, at a TACL prompt:
> SCF STATUS SUBSYS $ZZKRN
This example shows the output produced by this command:
1 -> STATUS SUBSYS $ZZKRN
NONSTOP KERNEL - Status SUBSYS \COMM.$ZZKRN
Name            State     Processes
                          (conf/strd)
\COMM.$ZZKRN    STARTED   ( 25/22 )
Monitoring the Status of All Generic Processes
To monitor the status of all generic processes controlled by $ZZKRN, at a TACL prompt:
> SCF STATUS PROCESS $ZZKRN.#*
This example shows the output produced by this command:
1-> STATUS PROCESS $ZZKRN.#*
NONSTOP KERNEL - Status PROCESS \DRP25.$ZZKRN.#CLCI-TACL
Symbolic Name  Name    State    Sub  Primary  Backup   Owner
                                     PID      PID      ID
CLCI-TACL      $CLCI   STOPPED       None     None
MSGMON         $ZIM00  STARTED       0 ,306   None     255,255
MSGMON         $ZIM01  STARTED       1 ,291   None     255,255
MSGMON         $ZIM02  STARTED       2 ,285   None     255,255
MSGMON         $ZIM03  STARTED       3 ,280   None     255,255
MSGMON         $ZIM04  STARTED       4 ,280   None     255,255
MSGMON         $ZIM05  STARTED       5 ,280   None     255,255
MSGMON         $ZIM06  STARTED       6 ,280   None     255,255
MSGMON         $ZIM07  STARTED       7 ,280   None     255,255
MSGMON         $ZIM08  STARTED       8 ,280   None     255,255
MSGMON         $ZIM09  STARTED       9 ,280   None     255,255
MSGMON         $ZIM10  STARTED       10,280   None     255,255
MSGMON         $ZIM11  STOPPED       None     None
MSGMON         $ZIM12  STOPPED       None     None
MSGMON         $ZIM13  STOPPED       None     None
MSGMON         $ZIM14  STOPPED       None     None
MSGMON         $ZIM15  STOPPED       None     None
OSM-APPSRVR    $ZOSM   STARTED       2 ,292   None     255,255
OSM-CIMOM      $ZCMOM  STARTED       2 ,294   3 ,288   255,255
OSM-CONFLH-RD  $ZOLHI  STOPPED       None     None
OSM-OEV        $ZOEV   STARTED       2 ,290   None     255,255
QATRAK         $TRAK   STARTED       0 ,17    None     255,255
QIOMON         $ZM00   STARTED       0 ,290   None     255,255
QIOMON         $ZM01   STARTED       1 ,280   None     255,255
QIOMON         $ZM02   STARTED       2 ,280   None     255,255
QIOMON         $ZM03   STARTED       3 ,279   None     255,255
QIOMON         $ZM04   STARTED       4 ,279   None     255,255
QIOMON         $ZM05   STARTED       5 ,279   None     255,255
QIOMON         $ZM06   STARTED       6 ,279   None     255,255
QIOMON         $ZM07   STARTED       7 ,279   None     255,255
QIOMON         $ZM08   STARTED       8 ,279   None     255,255
QIOMON         $ZM09   STARTED       9 ,279   None     255,255
QIOMON         $ZM10   STARTED       10,279   None     255,255
QIOMON         $ZM11   STOPPED       None     None
QIOMON         $ZM12   STOPPED       None     None
QIOMON         $ZM13   STOPPED       None     None
QIOMON         $ZM14   STOPPED       None     None
QIOMON         $ZM15   STOPPED       None     None
RTACL          $RTACL  STOPPED       None     None
SCP            $ZNET   STARTED       0 ,14    1 ,13    255,255
SP-EVENT       $ZSPE   STARTED       0 ,309   None     255,255
TFDSHLP        $ZTH00  STARTED       0 ,310   None     255,255
TFDSHLP        $ZTH01  STARTED       1 ,292   None     255,255
TFDSHLP        $ZTH02  STARTED       2 ,286   None     255,255
TFDSHLP        $ZTH03  STARTED       3 ,281   None     255,255
TFDSHLP        $ZTH04  STARTED       4 ,281   None     255,255
TFDSHLP        $ZTH05  STARTED       5 ,281   None     255,255
TFDSHLP        $ZTH06  STARTED       6 ,281   None     255,255
TFDSHLP        $ZTH07  STARTED       7 ,281   None     255,255
TFDSHLP        $ZTH08  STARTED       8 ,281   None     255,255
TFDSHLP        $ZTH09  STARTED       9 ,281   None     255,255
TFDSHLP        $ZTH10  STARTED       10,281   None     255,255
TFDSHLP        $ZTH11  STOPPED       None     None
TFDSHLP        $ZTH12  STOPPED       None     None
TFDSHLP        $ZTH13  STOPPED       None     None
TFDSHLP        $ZTH14  STOPPED       None     None
TFDSHLP        $ZTH15  STOPPED       None     None
ZEXP           $ZEXP   STARTED       0 ,13    1 ,15    255,255
ZHOME          $ZHOME  STARTED       0 ,289   1 ,295   255,255
ZLOG           $ZLOG   STARTED       0 ,308   1 ,329   255,255
ZZKRN          $ZZKRN  STARTED       0 ,293   1 ,319   255,255
ZZLAN          $ZZLAN  STARTED       0 ,292   1 ,297   255,255
ZZSCL          $ZZSCL  STARTED       1 ,290   2 ,279   255,255
ZZSMN          $ZZSMN  STARTED       1 ,289   2 ,282   255,255
ZZSTO          $ZZSTO  STARTED       0 ,291   1 ,320   255,255
ZZWAN          $ZZWAN  STARTED       2 ,296   3 ,289   255,255
In nearly all circumstances, items that are essential to system operations and must be running at all times restart automatically if they are stopped for any reason while the NonStop Kernel operating system is running.
Some OSM processes stop after executing a macro that runs during system load or during the reload of processor 0 or 1. Those processes include $ZOLHI.
Optionally, you can also configure other processes such as the Expand subsystem manager process, $ZEXP, and the Safeguard monitor process, $ZSMP, as generic processes.
Recovery Operations for Processes
For recovery operations on generic processes, use the SCF interface to the Kernel subsystem and specify the PROCESS object. These SCF commands are available for controlling generic processes:
ABORT  Terminates operation of a generic process. This command is not supported for the subsystem manager processes.
START  Initiates the operation of a generic process.
Generic processes that are configured to be persistent usually do not require operator intervention for recovery. In most circumstances, persistent generic processes restart automatically.
For recovery operations on IOPs, refer to the WAN Subsystem Configuration and Management Manual, the SWAN Concentrator and WAN Subsystem Troubleshooting Guide, and the Expand Configuration and Management Manual.
For recovery operations on system processes, refer to the Guardian User’s Guide.
Related Reading
For more information about generic processes and the SCF interface to the Kernel subsystem, refer to the SCF Reference Manual for the Kernel Subsystem.
For more information about IOPs, refer to the WAN Subsystem Configuration and Management Manual, the SWAN Concentrator and WAN Subsystem Troubleshooting Guide, and the Expand Configuration and Management Manual.
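As a supplement to the generic-process guidance in this section, the following sketch shows one way to scan a captured STATUS PROCESS $ZZKRN.#* listing for generic processes that are not started, or that are started without a backup. It is a minimal sketch under stated assumptions: the SAMPLE rows, the ROW pattern, and the function names are hypothetical, and the column order is assumed to match the listing in this section (symbolic name, process name, state, primary PID, backup PID).

```python
import re

# Sample rows in the layout of a STATUS PROCESS $ZZKRN.#* listing.
SAMPLE = """\
CLCI-TACL      $CLCI   STOPPED       None     None
MSGMON         $ZIM00  STARTED       0 ,306   None     255,255
OSM-CIMOM      $ZCMOM  STARTED       2 ,294   3 ,288   255,255
ZZKRN          $ZZKRN  STARTED       0 ,293   1 ,319   255,255
"""

# Assumed row shape: symbolic name, $name, state, primary PID, backup PID.
ROW = re.compile(
    r"^(?P<symbolic>\S+)\s+(?P<name>\$\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<primary>None|\d+\s*,\s*\d+)\s+(?P<backup>None|\d+\s*,\s*\d+)"
)


def parse_rows(listing):
    """Return one dict per data row; header and echo lines are skipped."""
    rows = []
    for line in listing.splitlines():
        m = ROW.match(line.strip())
        if m:
            rows.append(m.groupdict())
    return rows


def not_started(rows):
    """Names of processes whose state is anything other than STARTED."""
    return [r["name"] for r in rows if r["state"] != "STARTED"]


def started_without_backup(rows):
    """Names of started processes that show no backup PID."""
    return [r["name"] for r in rows
            if r["state"] == "STARTED" and r["backup"] == "None"]


rows = parse_rows(SAMPLE)
print(not_started(rows))
print(started_without_backup(rows))
```

A STOPPED state or a missing backup is not necessarily a fault. As noted in this section, some OSM processes such as $ZOLHI stop after system load, and many per-processor processes in the sample listing run without a backup. Treat the result as a list of items to review.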
6. Communications Subsystems: Monitoring and Recovery
When to Use This Section
Communications Subsystems
Local Area Networks (LANs) and Wide Area Networks (WANs)
Monitoring Communications Subsystems and Their Objects
  Monitoring the SLSA Subsystem
  Monitoring the WAN Subsystem
  Monitoring the NonStop TCP/IP Subsystem
  Monitoring Line-Handler Process Status
  Tracing a Communications Line
Recovery Operations for Communications Subsystems
Related Reading
When to Use This Section
Use this section to determine where to find more information about monitoring and recovery operations for communications devices such as ServerNet adapters, printers, and spoolers; communications lines; and communications processes such as WAN IOPs.
Communications Subsystems
The software that provides users of Integrity NonStop systems with access to a set of communications services is called a communications subsystem. Because connectivity is an important part of online transaction processing (OLTP), HP offers a variety of communications products that support a wide range of applications.
Communication between specific devices or networks is typically achieved using several communications products or subsystems. These products are related as components in a layered structure. To accomplish the required connection, higher-level components (for example, NonStop TCP/IP processes) use the services of lower-level components such as the ServerNet LAN Systems Access (SLSA) subsystem.
The same higher-level component can often use any of several lower-level components; thus, the Expand subsystem (which consists of multiple processes on a node) can use the NonStop TCP/IP subsystem, the X.25 Access Method (X.25 AM), or other communication interface options to provide data transmissions over local area networks (LANs) or wide area networks (WANs), respectively. Similarly, multiple higher-level components can use the services of a single lower-level component.
Local Area Networks (LANs) and Wide Area Networks (WANs)
Two important communications interfaces for LANs and WANs on Integrity NonStop servers are the SLSA subsystem and the WAN subsystem.
The SLSA subsystem supports parallel LAN I/O operations, allowing Integrity NonStop NS-series servers to communicate across the ServerNet fabrics and access Ethernet devices through various LAN protocols. SLSA also communicates with the appropriate adapter type over the ServerNet fabrics. Adapters supported on Integrity NonStop systems include:
Gigabit Ethernet 4-port adapter (G4SA)
Fibre Channel ServerNet adapter (FCSA) (for the Storage subsystem)
I/O adapter module (IOAM) enclosures enable I/O operations to take place between Integrity NonStop servers and some Fibre Channel storage devices. See the Modular I/O Installation and Configuration Guide for more information.
Adapters supported on NonStop S-series servers that can be accessed through Expand over IP include:
ATM 3 ServerNet adapter (ATM3SA)
Ethernet 4 ServerNet adapter (E4SA)
Fast Ethernet ServerNet adapter (FESA)
Gigabit Ethernet ServerNet adapter (GESA)
Gigabit Ethernet 4-Port ServerNet adapter (G4SA)
Multifunction I/O board (MFIOB) in the processor multifunction (PMF) customer-replaceable unit (CRU) and I/O multifunction (IOMF) CRU
Token-Ring ServerNet adapter (TRSA)
For further information, refer to the Introduction to Networking for NonStop NS-Series Servers.
In addition to the adapters, the SLSA subsystem supports these objects:
Processes
Monitors
ServerNet addressable controllers (SACs)
Logical interfaces (LIFs)
Filters
Physical interfaces (PIFs)
Processes that use the SLSA subsystem to send and receive data on a LAN attached to an Integrity NonStop server are called LAN service providers. Two service providers, the NonStop TCP/IP and NonStop TCP/IPv6 subsystems and the Port Access Method (PAM), are currently supported. They provide access for these subsystems:
LAN Service Provider
    Subsystems Supported
NonStop TCP/IP subsystem, NonStop TCP/IPv6 subsystem
    The Expand subsystem, which provides Expand-over-IP connections.
Port Access Method (PAM)
    Ethernet and token-ring LANs. The OSI/AS, OSI/TS, SNAX/XF, and SNAX/APN subsystems communicate with SLSA through the PAM subsystem.
Processes, user applications, and subsystems that use the SLSA subsystem and related LAN providers to connect to an FCSA or G4SA attached to an Integrity NonStop NS-series server are called LAN clients. For example, the WAN subsystem is a client of the SLSA subsystem because the SLSA subsystem provides the WAN subsystem access to the ServerNet wide area network (SWAN) concentrator through the LAN.
The WAN subsystem is used to control access to the SWAN concentrator. Depending on your configuration, it can be used to configure and manage both WAN and LAN connectivity for these communication subsystem objects:
Object                                       Connectivity By
AM3270                                       Line-handler processes
Asynchronous Terminal Process 6100 (ATP6100) Line-handler processes
Communications Process subsystem (CP6100)    Line-handler processes
EnvoyACP/XF                                  Line-handler processes
Envoy subsystem                              Line-handler processes
Expand                                       Subsystem network control process and line-handler processes
ServerNet cluster (Expand-over-ServerNet)    Line-handler processes
SNAX/APN                                     Subsystem service manager process and line-handler processes
SNAX/XF                                      Subsystem service manager process and line-handler processes
TR3271                                       Line-handler processes
X25AM                                        Line-handler processes
You can define these communications subsystem objects as WAN subsystem devices.
Monitoring Communications Subsystems and Their Objects
Monitoring and recovery operations for communications subsystems can be complex. An error in any of the components—service providers, clients, objects, adapters, processes, and so on—can generate multiple error messages from many interdependent subsystems and processes. Analyzing and solving an error that originates in an object controlled by a LAN or a WAN often requires that you methodically gather status information about the affected services and then eliminate objects that are working normally.
Monitoring and recovery techniques for devices and processes related to communications subsystems are discussed in detail in the manuals for each subsystem. For more information, refer to Related Reading on page 6-13.
This guide provides some basic commands you can use to identify and resolve common problems. Your most powerful tool for monitoring and collecting information about subsystem objects is the SCF facility. You can use SCF commands to get information and status for subsystem objects by name, device type, or device subtype.
Subdevices are defined if a subsystem potentially operates on numerous, separately addressable objects, such as stations on a multipoint line; the line is a device, and the stations are subdevices.
For a list of subsystems with their device type numbers and device subtypes, see Using SCF to Determine Your System Configuration on page 2-5.
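The approach described above, gathering status and then eliminating objects that are working normally, can be sketched in a few lines. This is an illustrative sketch only: the SAMPLE rows and the function name are assumptions, and it presumes a captured SCF STATUS listing whose data rows begin with the object name followed by the State column (as in the ADAPTER, PIF, and LIF listings in this section; the SAC listing has an Owner column before State and would need a different field index).

```python
# Sample rows in the layout of an SCF STATUS PIF listing (sample data only).
SAMPLE = """\
Name                State     Trace Status
$ZZLAN.G11123.0.A   STARTED   ON
$ZZLAN.G11123.0.B   STARTED   ON
$ZZLAN.G11123.0.C   STOPPED   OFF
$ZZLAN.G11123.0.D   STARTED   ON
"""


def objects_needing_attention(listing, normal_state="STARTED"):
    """Return (name, state) pairs for objects not in the normal state."""
    suspects = []
    for line in listing.splitlines():
        fields = line.split()
        # Data rows start with an object name ($...); skip headers and echoes.
        if len(fields) >= 2 and fields[0].startswith("$"):
            name, state = fields[0], fields[1]
            if state != normal_state:
                suspects.append((name, state))
    return suspects


print(objects_needing_attention(SAMPLE))
```

Only the objects that remain after this filtering need closer investigation with the subsystem-specific commands described in the rest of this section.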
Monitoring the SLSA Subsystem
This subsection describes how to obtain the status of adapters, SACs, LIFs, and PIFs. For more information on the SLSA subsystem, refer to the LAN Configuration and Management Manual.
Monitoring the Status of an Adapter and Its Components
1. To monitor the status of an adapter:
> SCF STATUS ADAPTER adapter-name
A listing similar to this example is sent to your home terminal:
->STATUS ADAPTER $ZZLAN.G11123
SLSA Status ADAPTER
Name              State
$ZZLAN.G11123     STARTED
This example shows the listing displayed when checking all adapters on $ZZLAN:
> SCF STATUS ADAPTER $ZZLAN.*
1->STATUS ADAPTER $ZZLAN.*
SLSA Status ADAPTER
Name              State
$ZZLAN.G11121     STARTED
$ZZLAN.G11122     STARTED
$ZZLAN.G11123     STARTED
$ZZLAN.G11124     STARTED
$ZZLAN.G11125     STARTED
$ZZLAN.MIOE0      STARTED
$ZZLAN.MIOE1      STARTED
2. The SAC object corresponds directly to the hardware on an adapter. A SAC is a component of an adapter and can support one or more PIFs. To monitor the status of a SAC:
> SCF STATUS SAC sac-name
A listing similar to this example is sent to your home terminal:
1->STATUS SAC $ZZLAN.G11123.0
SLSA Status SAC
Name              Owner  State    Trace Status
$ZZLAN.G11123.0   1      STARTED  ON
This example shows a listing of the status of all SACs on $ZZLAN.G11123:
> SCF STATUS SAC $ZZLAN.G11123*
->STATUS SAC $ZZLAN.G11123*
SLSA Status SAC
Name              Owner  State    Trace Status
$ZZLAN.G11123.0   1      STARTED  ON
3. The PIF object corresponds directly to hardware on the adapter. A PIF is the physical connection to the LAN. To monitor the status of a PIF:
> SCF STATUS PIF pif-name
A listing similar to this example is sent to your home terminal:
->STATUS PIF $ZZLAN.G11123.0
SLSA Status PIF
Name                State    Trace Status
$ZZLAN.G11123.0.A   STARTED  ON
This example shows a listing of the status of all PIFs on $ZZLAN.G11123:
> SCF STATUS PIF $ZZLAN.G11123.*
->STATUS PIF $ZZLAN.G11123.*
SLSA Status PIF
Name                State    Trace Status
$ZZLAN.G11123.0.A   STARTED  ON
$ZZLAN.G11123.0.B   STARTED  ON
$ZZLAN.G11123.0.C   STOPPED  OFF
$ZZLAN.G11123.0.D   STARTED  ON
4. The LIF provides an interface to the PIF. The LIF object corresponds to logical processes that handle data transferred between the LAN and a system using the ServerNet architecture. To monitor the status of a LIF:
> SCF STATUS LIF lif-name
A listing similar to this example is sent to your home terminal:
->STATUS LIF $ZZLAN.L11021A
SLSA Status LIF
Name              State    Access State
$ZZLAN.L11021A    STARTED  UP
This example shows a detailed listing of the status of the LIF on $ZZLAN.L11021A:
> SCF STATUS LIF $ZZLAN.L11021A , DETAIL
->STATUS LIF $ZZLAN.L11021A , DETAIL
SLSA Detailed Status LIF \SYS.$ZZLAN.L11021A
Access State............. UP
CPUs with Data Path...... ( 0, 1, 2 )
Potential Access CPUs.... ( 0, 1, 2, 3 )
State.................... STARTED
Trace Filename...........
Trace Status.............
Monitoring the WAN Subsystem
This subsection describes how to obtain the status of SWAN concentrators, data communications devices, processes, and CLIPs. For more information on the WAN subsystem, see the WAN Subsystem Configuration and Management Manual.
Monitoring Status for a SWAN Concentrator
To display the current status for a SWAN concentrator:
> SCF STATUS ADAPTER $ZZWAN.#concentrator-name
The system displays a listing similar to:
-> status adapter $zzwan.#s01
WAN Manager STATUS ADAPTER for ADAPTER \TAHITI.$ZZWAN.#S01
State........... STARTED
Number of clips. 3
Clip 1 status : CONFIGURED
Clip 2 status : CONFIGURED
Clip 3 status : CONFIGURED
To display the status for all SWAN concentrators configured for your system:
> SCF STATUS ADAPTER $ZZWAN.*
The system displays a listing similar to:
1-> STATUS ADAPTER $ZZWAN.*
WAN Manager STATUS ADAPTER for ADAPTER \COMM.$ZZWAN.#SWAN1
State........... STARTED
Number of clips. 3
Clip 1 status : CONFIGURED
Clip 2 status : CONFIGURED
Clip 3 status : CONFIGURED
WAN Manager STATUS ADAPTER for ADAPTER \COMM.$ZZWAN.#SWAN2
State........... STARTED
Number of clips. 3
Clip 1 status : CONFIGURED
Clip 2 status : CONFIGURED
Clip 3 status : CONFIGURED
Monitoring Status for a Data Communications Device
To verify that a WAN subsystem device is in the STARTED state:
> SCF STATUS DEVICE $ZZWAN.#device-name
The system displays a listing similar to:
-> status DEVICE $zzwan.#IP01
WAN Manager STATUS DEVICE for DEVICE \COWBOY.$ZZWAN.#IP01
STATE ...........STARTED
LDEV number....173
PPIN...........2, 13   BPIN............3, 11
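When several SWAN concentrators are configured, the STATUS ADAPTER listing repeats the same block for each one. The following sketch shows one way to turn a captured listing into a per-concentrator summary. It is a hedged illustration: SAMPLE, parse_adapters, and the regular expressions are assumptions based only on the listing layout shown in this section.

```python
import re

# Sample text in the layout of a STATUS ADAPTER $ZZWAN.* listing.
SAMPLE = r"""
WAN Manager STATUS ADAPTER for ADAPTER \COMM.$ZZWAN.#SWAN1
State........... STARTED
Number of clips. 3
Clip 1 status : CONFIGURED
Clip 2 status : CONFIGURED
Clip 3 status : CONFIGURED
WAN Manager STATUS ADAPTER for ADAPTER \COMM.$ZZWAN.#SWAN2
State........... STARTED
Number of clips. 3
Clip 1 status : CONFIGURED
Clip 2 status : CONFIGURED
Clip 3 status : CONFIGURED
"""


def parse_adapters(listing):
    """Map each adapter name to its state and per-CLIP status."""
    adapters = {}
    current = None
    for line in listing.splitlines():
        line = line.strip()
        header = re.match(r"WAN Manager STATUS ADAPTER for ADAPTER (\S+)", line)
        if header:
            # A header line starts a new adapter block.
            current = {"state": None, "clips": {}}
            adapters[header.group(1)] = current
            continue
        if current is None:
            continue
        clip = re.match(r"Clip (\d+) status\s*:\s*(\w+)", line)
        state = re.match(r"State[ .:]*(\w+)", line, re.IGNORECASE)
        if clip:
            current["clips"][int(clip.group(1))] = clip.group(2)
        elif state:
            current["state"] = state.group(1)
    return adapters


for name, info in parse_adapters(SAMPLE).items():
    print(name, info["state"], info["clips"])
```

A summary in this form makes it easy to spot a concentrator that is not STARTED or a CLIP whose status differs from the others.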
Monitoring WAN Processes
To display the status of all WAN subsystem processes (configuration managers, TCP/IP processes, WANBoot processes):
> SCF STATUS PROCESS $ZZWAN.*
The system displays a listing similar to:
-> STATUS PROCESS $ZZWAN.*
WAN Manager STATUS PROCESS for PROCESS \COMM.$ZZWAN.#5
State :......... STARTED
LDEV Number..... 66
PPIN............ 5 ,264   Process traced.. NO
WAN Manager STATUS PROCESS for PROCESS \COMM.$ZZWAN.#4
State :......... STARTED
LDEV Number..... 67
PPIN............ 4 ,264   Process traced.. NO
WAN Manager STATUS PROCESS for PROCESS \COMM.$ZZWAN.#ZTF00
State :......... STARTED
PPIN............ 4 ,342
WAN Manager STATUS PROCESS for PROCESS \COMM.$ZZWAN.#SWB1
State :......... STARTED
PPIN............ 4 ,275   BPIN............ 5 ,302
WAN Manager STATUS PROCESS for PROCESS \COMM.$ZZWAN.#ZTF01
State :......... STARTED
PPIN............ 5 ,340
WAN Manager STATUS PROCESS for PROCESS \COMM.$ZZWAN.#SWB0
State :......... STARTED
PPIN............ 4 ,274   BPIN............ 5 ,303
To monitor a single WANBoot process, type:
> SCF STATUS PROCESS $ZZWAN.#boot-process
The system displays a listing similar to:
-> status PROCESS $ZZWAN.#ZB017
WAN Manager STATUS PROCESS for PROCESS \ICEBAT.$ZZWAN.#ZB017
STATE:...........STARTED
PPIN.............0 ,278   BPIN.............0, 282
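The PPIN and BPIN fields in these listings identify the primary and backup of each WAN subsystem process. The sketch below, offered as an illustration rather than a supported tool, extracts those fields from a captured STATUS PROCESS listing so that processes running without a backup can be listed. SAMPLE, parse_processes, and without_backup are hypothetical names, and the patterns assume the field labels shown in this section.

```python
import re

# Sample text in the layout of a STATUS PROCESS $ZZWAN.* listing.
SAMPLE = r"""
WAN Manager STATUS PROCESS for PROCESS \COMM.$ZZWAN.#ZTF00
State :......... STARTED
PPIN............ 4 ,342
WAN Manager STATUS PROCESS for PROCESS \COMM.$ZZWAN.#SWB1
State :......... STARTED
PPIN............ 4 ,275   BPIN............ 5 ,302
"""

PIN = r"[ .]*(\d+)\s*,\s*(\d+)"  # dot leader, then "cpu ,pin"


def parse_processes(listing):
    """Map each process name to its state, primary, and backup (cpu, pin)."""
    procs = {}
    current = None
    for line in listing.splitlines():
        header = re.search(r"STATUS PROCESS for PROCESS (\S+)", line)
        if header:
            current = {"state": None, "primary": None, "backup": None}
            procs[header.group(1)] = current
            continue
        if current is None:
            continue
        state = re.match(r"\s*State[ .:]*(\w+)", line, re.IGNORECASE)
        if state:
            current["state"] = state.group(1)
        # PPIN and BPIN can share one line, so search for each separately.
        for label, key in (("PPIN", "primary"), ("BPIN", "backup")):
            m = re.search(label + PIN, line)
            if m:
                current[key] = (int(m.group(1)), int(m.group(2)))
    return procs


def without_backup(procs):
    """Names of processes for which no BPIN was reported."""
    return [name for name, p in procs.items() if p["backup"] is None]


print(without_backup(parse_processes(SAMPLE)))
```

In the sample listing the TCP/IP processes (#ZTF00, #ZTF01) show no BPIN, so a missing backup is not by itself an error. Compare the result with how each process is configured.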
Monitoring CLIPs
To display the current status for a CLIP:
> SCF STATUS SERVER $ZZWAN.#concentrator-name.clip-num
Values for the CLIP number are 1, 2, or 3. The system displays a listing similar to:
-> status server $zzwan.#s01.1
WAN Manager STATUS SERVER for CLIP \COWBOY.$ZZWAN.#S01.1
STATE :..........STARTED
PATH A...........: CONFIGURED
PATH B...........: CONFIGURED
NUMBER of lines. 2
Line...............0 : $SAT23A
Line...............1 : $SAT23B
Monitoring the NonStop TCP/IP Subsystem
This subsection describes how to obtain the status for NonStop TCP/IP processes, routes, and subnets. For additional information, refer to the TCP/IP Configuration and Management Manual. For NonStop TCP/IPv6, refer to the TCP/IPv6 Configuration and Management Manual.
Monitoring the NonStop TCP/IP Process
To display the dynamic state of a NonStop TCP/IP process, first list the names of all NonStop TCP/IP processes:
> SCF LISTDEV TCPIP
Then type:
> SCF STATUS PROCESS tcp/ip-process-name
where tcp/ip-process-name is the name of the process you want information about.
The system displays a listing similar to this output, which is for process $ZTCO:
-> Status Process $ZTCO
TCPIP Status PROCESS \SYSA.$ZTCO
Status: STARTED
PPID.................( 0,107)   BPID.............. ( 1, 98)
Proto State      Laddr         Lport     Faddr           Fport  SendQ RecvQ
TCP   TIME-WAIT  130.252.12.3  ftp-data  130.252.12.152  11089  0     0
TCP   TIME-WAIT  130.252.12.3  ftp-data  130.252.12.152  63105  0     0
TCP   ESTAB      130.252.12.3  ftp       130.252.12.152  57441  0     0
TCP   TIME-WAIT  130.252.12.3  smtp      130.252.12.8    3309   0     0
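The connection table at the end of the STATUS PROCESS listing can be summarized by connection state. The following sketch assumes the eight-column layout shown above (Proto, State, Laddr, Lport, Faddr, Fport, SendQ, RecvQ) and parses only TCP rows with all eight fields; SAMPLE, FIELDS, and the function names are illustrative assumptions.

```python
from collections import Counter

# Sample rows in the layout of the connection table shown above.
SAMPLE = """\
Proto State      Laddr         Lport     Faddr           Fport  SendQ RecvQ
TCP   TIME-WAIT  130.252.12.3  ftp-data  130.252.12.152  11089  0     0
TCP   TIME-WAIT  130.252.12.3  ftp-data  130.252.12.152  63105  0     0
TCP   ESTAB      130.252.12.3  ftp       130.252.12.152  57441  0     0
TCP   TIME-WAIT  130.252.12.3  smtp      130.252.12.8    3309   0     0
"""

FIELDS = ("proto", "state", "laddr", "lport", "faddr", "fport", "sendq", "recvq")


def parse_connections(listing):
    """Return one dict per TCP row that has all eight columns."""
    rows = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) == len(FIELDS) and parts[0] == "TCP":
            rows.append(dict(zip(FIELDS, parts)))
    return rows


def count_by_state(rows):
    """Count connections per state, for example ESTAB or TIME-WAIT."""
    return Counter(row["state"] for row in rows)


print(count_by_state(parse_connections(SAMPLE)))
```

Nonzero SendQ or RecvQ values could be reported in the same way by filtering the parsed rows on those fields.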
Monitoring NonStop TCP/IP Routes
To display status information for all NonStop TCP/IP routes:
> SCF STATUS ROUTE $ZTCO.*
The system displays a listing similar to:
1-> Status Route $ZTCO.*
TCPIP Status ROUTE \SYSA.$ZTCO.*
Name     Status    RefCnt
#ROU11   STARTED   0
#ROU9    STARTED   0
#ROU12   STARTED   0
#ROU8    STARTED   1
#ROU3    STOPPED   0
Monitoring NonStop TCP/IP Subnets
To obtain the status of all NonStop TCP/IP subnets:
> SCF STATUS SUBNET $ZTC0.*
The system displays a listing similar to:
1-> STATUS SUBNET $ZTC0.*
TCPIP Status SUBNET \SYSA.$ZTC0.*
Name     Status
#LOOP0   STARTED
#EN1     STARTED
Monitoring Line-Handler Process Status
A line-handler process is a component of a data communications subsystem. It is an I/O process that transmits and receives data on a communications line, either directly or by communicating with another I/O process. This subsection explains how to monitor the status of a line-handler process on your system or on another system in your network to which you have remote access.
To check the status of a line-handler process on your system:
> SCF STATUS LINE $line
A listing similar to this example is sent to your home terminal:
1-> STATUS LINE $LHPLIN1
EXPAND Status LINE
Name      State    PPID    BPID   ConMgr-LDEV
$LHCS6S   STARTED  1, 20   2,25   49
This listing shows that the Expand line-handler process being monitored is up and functioning normally.
The data shown in the report means:
Name         Specifies the name of the object
State        Indicates the summary state of the object, which is one of STARTED, STARTING, DIAGNOSING (for SWAN concentrators only), or STOPPED
PPID         Specifies the primary process ID
BPID         Specifies the backup process ID
ConMgr-LDEV  Contains the LDEV of the concentrator manager process. This field applies only to SWAN concentrator lines.
If any state other than STARTED appears, check the meaning of the state in SCF Object States on page 3-14. Depending upon the type of problem, follow your established procedures for problem reporting and escalation.
Examples
To check the detailed status of line $LHCS6S:
> SCF STATUS LINE $LHCS6S, DETAIL
A listing such as this output is sent to your home terminal:
-> STATUS LINE $LHCS6S, DETAIL
PPID.................... ( 3, 24)    BPID................ ( 2, 24)
State................... STOPPED     Path LDEV........... 50
Trace Status............ OFF         Clip Status......... UNLOADED
ConMgr-LDEV............. 49
Path-prim Path-alter
To display the status of all the Expand lines that are currently active on your system, enter this INFO PROCESS command for the Expand manager process $NCP:
-> INFO PROCESS $NCP, LINESET
The system displays a listing similar to this output. The NEIGHBOR field displays the system to which a given line connects, and the STATUS field indicates whether the line is up:
1-> INFO PROCESS $NCP, LINESET
EXPAND Info PROCESS $NCP , LINESET
LINESETS AT \COMM (116)   #LINESETS=35   TIME: JUL 9,2001 19:28:04
LINESET  NEIGHBOR        LDEV  TF    PID             LINE  LDEV  STATUS     FileErr#
1        \CYCLONE (206)  363   200K  ( 0, 287)       1     363   READY
2        \SNAX (118)     353   200K  ( 5, 333)       1     353   READY
3        \TESS (194)     554   200K  ( 8, 279)       1     554   READY
4        \TSII (099)     556   200K  ( 2, 265)       1     556   READY
5        \ESP (163)      365   200K  ( 1, 274)       1     365   READY
6        \SVLDEV (077)   538   200K  ( 7, 265)       1     538   READY
. . .
27       \SIERRA (012)   183   10K   ( 4, 290)       1     183   READY
28       \PRUNE (175)    677   200K  ( 5, 334)       1     677   READY
29       \OPMAN (252)    276   790K  ( 5, 294)  NPT  1     276   READY
30       \SOCIAL (045)   165   790K  ( 8, 280)       1     165   READY
31       \NCCORP2 (080)  295   790K  ( 8, 264)       1     295   READY
32       \CS8 (152)      323   --    -- -----        1     323   NOT READY  (124)
33       \CORE (241)     324   --    -- -----        1     324   NOT READY  (124)
34       \SUNTEC (062)   367   790K  ( 5, 293)  NPT  1     367   READY
35       \CS8 (152)      368   --    -- -----        1     368   NOT READY  (124)
Tracing a Communications Line
Use the SCF TRACE command to trace the operation of a communications line. The line continues normal operation while being traced, but it passes all its message traffic to a trace procedure. Tracing enables you to see the history of a communications line, including its internal processing.
You can display trace files by using the commands available in the PTrace program. For information about PTrace, refer to the PTrace Reference Manual. For information about configuring a trace by using the SCF TRACE command, refer to the configuration and management manual for the communications subsystem you want to trace.
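The INFO PROCESS $NCP, LINESET listing in this section can be long, so it can help to extract only the lines that are not ready. The sketch below is an illustration under stated assumptions: SAMPLE, ROW, parse_linesets, and not_ready are hypothetical names, and the pattern relies only on the leading LINESET, NEIGHBOR, and LDEV columns and the trailing STATUS and FileErr# columns of the listing layout shown in this section.

```python
import re

# Sample rows in the layout of an INFO PROCESS $NCP, LINESET listing.
SAMPLE = r"""
1        \CYCLONE (206)  363   200K  ( 0, 287)       1     363   READY
29       \OPMAN (252)    276   790K  ( 5, 294)  NPT  1     276   READY
32       \CS8 (152)      323   --    -- -----        1     323   NOT READY  (124)
"""

# Lineset number, neighbor name, system number, LDEV, then the trailing
# STATUS and optional FileErr# columns; the middle columns are skipped.
ROW = re.compile(
    r"^\s*(\d+)\s+(\\\w+)\s+\((\d+)\)\s+(\d+)\s+.*?\s"
    r"(NOT READY|READY)(?:\s+\((\d+)\))?\s*$"
)


def parse_linesets(listing):
    """Return one dict per lineset row found in the listing."""
    rows = []
    for line in listing.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "lineset": int(m.group(1)),
                "neighbor": m.group(2),
                "ldev": int(m.group(4)),
                "status": m.group(5),
                "file_error": int(m.group(6)) if m.group(6) else None,
            })
    return rows


def not_ready(rows):
    """Return the rows whose STATUS column is anything other than READY."""
    return [r for r in rows if r["status"] != "READY"]


for row in not_ready(parse_linesets(SAMPLE)):
    print(row["lineset"], row["neighbor"], row["status"], row["file_error"])
```

The neighbor names returned this way identify which connections to investigate first with the STATUS LINE command.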
Recovery Operations for Communications Subsystems
Some general troubleshooting guidelines are:
Examine the contents of the event message log for the subsystem. For example, the WAN subsystem or Kernel subsystem might have issued an event message that provides information about the process failure. Event messages returned by the WAN subsystem and SWAN concentrator are described in the WANMGR and TRAPMUX sections of the Operator Messages Manual, respectively.
HP provides a comprehensive library of troubleshooting guides for the communications subsystems. Attempt to analyze the problems and restart the process or object using the commands described in the appropriate manual listed in Related Reading on page 6-13. If you are unable to start a required process or object, contact your service provider.
Related Reading
For more information about monitoring and performing recovery operations for communications subsystems, see the manuals listed in Table 6-1. The appropriate manual to use depends on how your system is configured.
For example, if a process is configured using the SCF interface to the WAN subsystem and then reconfigured with the SCF interface to another subsystem, only the SCF interface to the other subsystem would provide current information about the configuration. The SCF interface to the WAN subsystem would provide only information about the configuration before it changed.
Table 6-1. Related Reading for Communications Lines and Devices
For Information About...
    Refer to...
General information about communications subsystems
    Introduction to Networking for HP NonStop NS-Series Servers
Using SCF to monitor generic processes
    SCF Reference Manual for the Kernel Subsystem
Using SCF to monitor the SLSA subsystem as well as Ethernet addressable devices, such as ServerNet adapters
    LAN Configuration and Management Manual
Using SCF to monitor WAN communications lines for devices and intersystem communications protocols
    WAN Subsystem Configuration and Management Manual
Using SCF to monitor a specific device or communications protocol product; troubleshooting specific communications subsystems and protocols
    Asynchronous Terminals and Printer Processes Configuration and Management Manual
    ATM Adapter Installation and Support Guide
    ATM Configuration and Management Manual
    CP6100 Configuration and Management Manual
    EnvoyACP/XF Configuration and Management Manual
    Expand Configuration and Management Manual
    Fibre Channel ServerNet Adapter Installation and Support Guide
    Gigabit Ethernet 4-Port Adapter Installation and Support Guide
    PAM Configuration and Management Manual
    QIO Configuration and Management Manual
    SCF Reference Manual for H-Series RVUs
    ServerNet Cluster Manual
    SNAX/XF and SNAX/APN Configuration and Management Manual
    SWAN Concentrator and WAN Subsystem Troubleshooting Guide
    TCP/IPv6 Configuration and Management Manual
    TCP/IP Configuration and Management Manual
    Token-Ring Adapter Installation and Support Guide
    X25AM Configuration and Management Manual
7. ServerNet Resources: Monitoring and Recovery
ServerNet Communications Network on page 7-1
System I/O ServerNet Connections on page 7-4
Monitoring the Status of the ServerNet Fabrics on page 7-4
    Monitoring the ServerNet Fabrics Using OSM on page 7-5
    Monitoring the ServerNet Fabrics Using SCF on page 7-6
Related Reading on page 7-8
When to Use This Section
Use this section to learn about monitoring and performing recovery operations for the internal and external ServerNet fabrics, and to understand how and when an Integrity NonStop NS-series system can be connected to legacy NonStop S-series I/O enclosures.
All Integrity NonStop system I/O is performed through the ServerNet system area network (SAN). LSU logic boards connect the SAN to the replicated four-way microprocessors on Integrity NonStop systems (except for Integrity NonStop NS1000 systems, which have no LSUs; see System I/O ServerNet Connections on page 7-4).
ServerNet Communications Network
The ServerNet communications network is a high-speed network within an Integrity NonStop system that connects processors to each other and to peripheral controllers. This network offers the connectivity of a standard network, but it does not depend on shared resources such as interprocessor buses or I/O channels. Instead, the ServerNet communications network uses the ServerNet architecture, which is wormhole-routed, full-duplex, packet-switched, and point-to-point. This network offers low latency, low software overhead, high bandwidth, and parallel operation.
In the ServerNet architecture, each processor maintains two independent paths to other processors, I/O devices, and ServerNet adapters. These dual paths can be used simultaneously to improve performance and to ensure that no single failure disrupts communications among the remaining system components.
Notes. Integrity NonStop NS16000 systems support connectivity to NonStop S-series I/O enclosures; Integrity NonStop NS14000 and NS1000 systems do not. For more information, see Differences Between Integrity NonStop NS-Series Systems on page 2-2.
An Integrity NonStop NS16000 system can be part of the same ServerNet cluster as NonStop S-series systems; an Integrity NonStop NS14000 system cannot. For more information, see the ServerNet Cluster Supplement for Integrity NonStop NS-Series Servers.
Integrity NonStop NS1000 systems do not support ServerNet clusters.
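For planning scripts or checklists, the model differences stated in the Notes in this section can be held in a small lookup table. The following is a minimal sketch that records only what those Notes state. The names `CAPABILITIES` and `supports` and the capability keys are hypothetical and are chosen here for illustration.

```python
# Capabilities as stated in the Notes in this section. All names are
# illustrative; nothing here queries a real system.
CAPABILITIES = {
    "NS16000": {
        "s_series_io_enclosures": True,   # supports S-series I/O enclosures
        "cluster_with_s_series": True,    # can share a cluster with S-series
    },
    "NS14000": {
        "s_series_io_enclosures": False,
        "cluster_with_s_series": False,
    },
    "NS1000": {
        "s_series_io_enclosures": False,
        "cluster_with_s_series": False,   # no ServerNet cluster support at all
    },
}


def supports(model: str, capability: str) -> bool:
    """Look up one capability; raises KeyError for an unknown model or key."""
    return CAPABILITIES[model][capability]


print(supports("NS16000", "s_series_io_enclosures"))
print(supports("NS14000", "cluster_with_s_series"))
```

An unknown model or capability raises `KeyError` rather than returning a default, so a gap in the table is noticed instead of being read as "not supported".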
A ServerNet adapter provides the interface between a ServerNet fabric and the Fibre Channel and Ethernet links. A ServerNet adapter contains a ServerNet bus interface (SBI) and one or more ServerNet addressable controllers (SACs).
Integrity NonStop NS16000 ServerNet Connectivity
An Integrity NonStop NS16000 system uses the ServerNet fabric for interconnections between the LSUs, p-switches, and IOAMs, enabling an Integrity NonStop system to be connected to legacy NonStop S-series enclosures. Figure 7-1 shows a logical representation of a complete system with the X and Y ServerNet fabrics.
Figure 7-1. Integrity NonStop NS16000 System
Integrity NonStop NS14000 ServerNet Connectivity
ServerNet connections between I/O devices and processors depend on whether the Integrity NonStop NS14000 system has an IOAM enclosure or VIO enclosures.
Figure 7-2 shows an NS14000 system with an IOAM enclosure. For more information on Integrity NonStop NS14000 systems with VIO enclosures, see Integrity NonStop NS14000 Systems on page 2-3, the NonStop NS14000 Planning Guide, or the Versatile I/O (VIO) Manual.
Figure 7-2. Integrity NonStop NS14000 System with IOAM Enclosure
[Figure image not reproduced. Labels in the figure: 4-Processor, Duplex Configuration; Blade Element A; Blade Element B; LSU Enclosure 0; IOAM Enclosure; G4SA; FCSA; 4PSE; X Fabric; Y Fabric; connections to the maintenance switch; connections to 6780 ServerNet cluster switches.]
Integrity NonStop NS1000 ServerNet Connectivity
ServerNet connections between I/O devices and processors depend on whether the Integrity NonStop NS1000 system has an IOAM enclosure or VIO enclosures. For more information on Integrity NonStop NS1000 systems, see the NonStop NS1000 Planning Guide, the NonStop NS1000 Hardware Installation Manual, or the Versatile I/O (VIO) Manual.
System I/O ServerNet Connections
For Integrity NonStop NS16000 systems, ServerNet connections to the system I/O devices (storage disk and tape drive as well as Ethernet communication to networks) radiate out from the p-switches for both the X and Y ServerNet fabrics.
ServerNet cables connected to the p-switch PICs in slots 10 through 13 come from the LSUs and processors. Cables connected to the PICs in slots 4 through 9 connect to one or more IOAM enclosures or to NonStop S-series I/O enclosures equipped with IOMF2 CRUs. Figure 7-3 shows the connections to the PICs in a fully populated p-switch.
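The slot assignments described above can be expressed as a simple lookup, which may be convenient when labeling cables or checking a cabling worksheet. The following is a minimal sketch, assuming only the two slot ranges given in this section. The function name `pswitch_pic_role` is hypothetical, and slots outside the described ranges are rejected because this section does not describe them.

```python
def pswitch_pic_role(slot: int) -> str:
    """Return what the cables on a p-switch PIC slot connect to.

    Covers only the slot ranges described in this section.
    """
    if 10 <= slot <= 13:
        return "LSUs and processors"
    if 4 <= slot <= 9:
        return "IOAM enclosures or NonStop S-series I/O enclosures"
    raise ValueError(f"slot {slot} is not described in this section")


for slot in (4, 9, 10, 13):
    print(slot, pswitch_pic_role(slot))
```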
For Integrity NonStop NS14000 systems, see Integrity NonStop NS14000 ServerNet Connectivity on page 7-3. Like NS14000 systems, Integrity NonStop NS1000 systems use 4PSEs to provide ServerNet connections between I/O devices and processors. However, there are no LSUs; the 4PSEs connect directly to the Blade Elements. For more information, see the NonStop NS1000 Hardware Installation Manual.
Monitoring the Status of the ServerNet Fabrics
The ServerNet fabrics provide the communication paths used for interprocessor messages, for communication between processors and I/O devices, and (in the case of
Figure 7-3. I/O Connections to the PICS in a P-Switch