Cisco ASR 9000 Series Configuration Manuals

Cisco ASR 9000 Series Aggregation Services Router System Monitoring Configuration Guide, Release 4.2.x

First Published: 2011-12-01
Last Modified: 2012-06-01
Americas Headquarters
Cisco Systems, Inc. 170 West Tasman Drive San Jose, CA 95134-1706 USA http://www.cisco.com Tel: 408 526-4000 800 553-NETS (6387) Fax: 408 527-0883
THE SPECIFICATIONS AND INFORMATION REGARDING THE PRODUCTS IN THIS MANUAL ARE SUBJECT TO CHANGE WITHOUT NOTICE. ALL STATEMENTS, INFORMATION, AND RECOMMENDATIONS IN THIS MANUAL ARE BELIEVED TO BE ACCURATE BUT ARE PRESENTED WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. USERS MUST TAKE FULL RESPONSIBILITY FOR THEIR APPLICATION OF ANY PRODUCTS.
THE SOFTWARE LICENSE AND LIMITED WARRANTY FOR THE ACCOMPANYING PRODUCT ARE SET FORTH IN THE INFORMATION PACKET THAT SHIPPED WITH THE PRODUCT AND ARE INCORPORATED HEREIN BY THIS REFERENCE. IF YOU ARE UNABLE TO LOCATE THE SOFTWARE LICENSE OR LIMITED WARRANTY, CONTACT YOUR CISCO REPRESENTATIVE FOR A COPY.
The Cisco implementation of TCP header compression is an adaptation of a program developed by the University of California, Berkeley (UCB) as part of UCB's public domain version of the UNIX operating system. All rights reserved. Copyright©1981, Regents of the University of California.
NOTWITHSTANDING ANY OTHER WARRANTY HEREIN, ALL DOCUMENT FILES AND SOFTWARE OF THESE SUPPLIERS ARE PROVIDED "AS IS" WITH ALL FAULTS. CISCO AND THE ABOVE-NAMED SUPPLIERS DISCLAIM ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, THOSE OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OR ARISING FROM A COURSE OF DEALING, USAGE, OR TRADE PRACTICE.
IN NO EVENT SHALL CISCO OR ITS SUPPLIERS BE LIABLE FOR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, OR INCIDENTAL DAMAGES, INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR LOSS OR DAMAGE TO DATA ARISING OUT OF THE USE OR INABILITY TO USE THIS MANUAL, EVEN IF CISCO OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Any Internet Protocol (IP) addresses and phone numbers used in this document are not intended to be actual addresses and phone numbers. Any examples, command display output, network topology diagrams, and other figures included in the document are shown for illustrative purposes only. Any use of actual IP addresses or phone numbers in illustrative content is unintentional and coincidental.
Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other countries. To view a list of Cisco trademarks, go to this URL: http://www.cisco.com/go/trademarks. Third-party trademarks mentioned are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (1110R)
© 2012 Cisco Systems, Inc. All rights reserved.

CONTENTS

Preface
Preface xv
Changes to This Document xv
Obtaining Documentation and Submitting a Service Request xv
CHAPTER 1
Implementing and Monitoring Alarms and Alarm Log Correlation 1
Prerequisites for Implementing and Monitoring Alarms and Alarm Log Correlation 2
Information About Implementing Alarms and Alarm Log Correlation 2
Alarm Logging and Debugging Event Management System 2
Correlator 3
System Logging Process 4
Alarm Logger 4
Logging Correlation 4
Correlation Rules 4
Types of Correlation 5
Application of Rules and Rule Sets 5
Root Message and Correlated Messages 5
Alarm Severity Level and Filtering 6
Bistate Alarms 6
Capacity Threshold Setting for Alarms 7
Hierarchical Correlation 7
Context Correlation Flag 7
Duration Timeout Flags 8
Reparent Flag 8
Reissue Nonbistate Flag 8
Internal Rules 9
SNMP Alarm Correlation 9
How to Implement and Monitor Alarm Management and Logging Correlation 9
Configuring Logging Correlation Rules 9
Configuring Logging Correlation Rule Sets 10
Configuring Root-cause and Non-root-cause Alarms 11
Configuring Hierarchical Correlation Rule Flags 13
Applying Logging Correlation Rules 14
Applying Logging Correlation Rule Sets 16
Modifying Logging Events Buffer Settings 17
Modifying Logging Correlator Buffer Settings 19
Displaying Alarms by Severity and Severity Range 20
Displaying Alarms According to a Time Stamp Range 22
Displaying Alarms According to Message Group and Message Code 23
Displaying Alarms According to a First and Last Range 24
Displaying Alarms by Location 25
Displaying Alarms by Event Record ID 26
Displaying the Logging Correlation Buffer Size, Messages, and Rules 27
Clearing Alarm Event Records and Resetting Bistate Alarms 28
Defining SNMP Correlation Buffer Size 30
Defining SNMP Rulesets 31
Configuring SNMP Correlation Rules 31
Applying SNMP Correlation Rules 32
Applying SNMP Correlation Ruleset 33
Configuration Examples for Alarm Management and Logging Correlation 34
Increasing the Severity Level for Alarm Filtering to Display Fewer Events and Modifying the Alarm Buffer Size and Capacity Threshold: Example 34
Configuring a Nonstateful Correlation Rule to Permanently Suppress Node Status Messages: Example 34
Configuring a Stateful Correlation Rule for LINK UPDOWN and SONET ALARM Alarms: Example 36
Additional References 37
CHAPTER 2
Configuring and Managing Embedded Event Manager Policies 41
Prerequisites for Configuring and Managing Embedded Event Manager Policies 42
Information About Configuring and Managing Embedded Event Manager Policies 42
Event Management 42
System Event Detection 42
Policy-Based Event Response 43
Reliability Metrics 43
System Event Processing 43
Embedded Event Manager Management Policies 43
Embedded Event Manager Scripts and the Scripting Interface (Tcl) 44
Script Language 45
Regular Embedded Event Manager Scripts 45
Embedded Event Manager Callback Scripts 46
Embedded Event Manager Policy Tcl Command Extension Categories 46
Cisco File Naming Convention for Embedded Event Manager 47
Embedded Event Manager Built-in Actions 48
Application-specific Embedded Event Management 49
Event Detection and Recovery 49
General Flow of EEM Event Detection and Recovery 49
System Manager Event Detector 50
Timer Services Event Detector 51
Syslog Event Detector 51
None Event Detector 51
Watchdog System Monitor Event Detector 52
Distributed Event Detectors 53
Embedded Event Manager Event Scheduling and Notification 53
Reliability Statistics 53
Hardware Card Reliability Metric Data 53
Process Reliability Metric Data 54
How to Configure and Manage Embedded Event Manager Policies 55
Configuring Environmental Variables 55
Environment Variables 55
Registering Embedded Event Manager Policies 56
Embedded Event Manager Policies 56
How to Write Embedded Event Manager Policies Using Tcl 59
Registering and Defining an EEM Tcl Script 59
Displaying EEM Registered Policies 61
Unregistering EEM Policies 61
Suspending EEM Policy Execution 62
Managing EEM Policies 63
Displaying Software Modularity Process Reliability Metrics Using EEM 64
Sample EEM Policies 64
Programming EEM Policies with Tcl 66
Tcl Policy Structure and Requirements 66
EEM Entry Status 68
EEM Exit Status 68
EEM Policies and Cisco Error Number 69
_cerrno: 32-Bit Error Return Values 69
Error Class Encodings for XY 70
Creating an EEM User Tcl Library Index 74
Creating an EEM User Tcl Package Index 77
Configuration Examples for Event Management Policies 80
Environmental Variables Configuration: Example 80
User-Defined Embedded Event Manager Policy Registration: Example 80
Display Available Policies: Example 81
Display Embedded Event Manager Process: Example 81
Configuration Examples for Writing Embedded Event Manager Policies Using Tcl 82
EEM Event Detector Demo: Example 82
EEM Sample Policy Descriptions 82
Event Manager Environment Variables for the Sample Policies 82
Registration of Some EEM Policies 84
Basic Configuration Details for All Sample Policies 85
Using the Sample Policies 85
Running the sl_intf_down.tcl Sample Policy 85
Running the tm_cli_cmd.tcl Sample Policy 86
Running the tm_crash_reporter.tcl Sample Policy 86
Running the tm_fsys_usage.tcl Sample Policy 87
Programming Policies with Tcl: Sample Scripts Example 87
tm_cli_cmd.tcl Sample Policy 87
sl_intf_down.tcl Sample Policy 90
Tracing Tcl set Command Operations: Example 92
Additional References 92
Embedded Event Manager Policy Tcl Command Extension Reference 93
Embedded Event Manager Event Registration Tcl Command Extensions 94
event_register_appl 94
event_register_cli 95
event_register_config 97
event_register_counter 98
event_register_hardware 99
event_register_none 101
event_register_oir 102
event_register_process 103
event_register_snmp 105
event_register_snmp_notification 108
event_register_stat 109
event_register_syslog 112
event_register_timer 114
event_register_timer_subscriber 119
event_register_track 121
event_register_wdsysmon 123
Embedded Event Manager Event Information Tcl Command Extension 129
event_reqinfo 129
event_reqinfo_multi 145
Embedded Event Manager Event Publish Tcl Command Extension 145
event_publish appl 145
Embedded Event Manager Multiple Event Support Tcl Command Extensions 148
Attribute 148
Correlate 148
Trigger 149
Embedded Event Manager Action Tcl Command Extensions 150
action_process 150
action_program 152
action_script 153
action_setver_prior 153
action_setnode 154
action_syslog 154
action_track_read 155
Embedded Event Manager Utility Tcl Command Extensions 156
appl_read 156
appl_reqinfo 157
appl_setinfo 157
counter_modify 158
fts_get_stamp 159
register_counter 160
register_timer 161
timer_arm 163
timer_cancel 165
unregister_counter 166
Embedded Event Manager System Information Tcl Command Extensions 167
sys_reqinfo_cpu_all 167
sys_reqinfo_crash_history 168
sys_reqinfo_mem_all 169
sys_reqinfo_proc 171
sys_reqinfo_proc_all 173
sys_reqinfo_proc_version 173
sys_reqinfo_routername 174
sys_reqinfo_syslog_freq 174
sys_reqinfo_syslog_history 175
sys_reqinfo_stat 176
sys_reqinfo_snmp 177
sys_reqinfo_snmp_trap 178
sys_reqinfo_snmp_trapvar 178
SMTP Library Command Extensions 178
smtp_send_email 179
smtp_subst 180
CLI Library Command Extensions 181
cli_close 181
cli_exec 182
cli_get_ttyname 182
cli_open 183
cli_read 183
cli_read_drain 184
cli_read_line 184
cli_read_pattern 185
cli_write 186
Tcl Context Library Command Extensions 189
context_retrieve 189
context_save 192
CHAPTER 3
Implementing IP Service Level Agreements 195
Prerequisites for Implementing IP Service Level Agreements 196
Restrictions for Implementing IP Service Level Agreements 196
Information About Implementing IP Service Level Agreements 198
About IP Service Level Agreements Technology 198
Service Level Agreements 198
Benefits of IP Service Level Agreements 200
Measuring Network Performance with IP Service Level Agreements 200
Operation Types for IP Service Level Agreements 202
IP SLA Responder and IP SLA Control Protocol 203
Response Time Computation for IP SLA 204
IP SLA VRF Support 204
IP SLA Operation Scheduling 205
IP SLA Proactive Threshold Monitoring 205
IP SLA Reaction Configuration 205
IP SLA Threshold Monitoring and Notifications 205
MPLS LSP Monitoring 205
How MPLS LSP Monitoring Works 206
BGP Next-hop Neighbor Discovery 206
IP SLA LSP Ping and LSP Traceroute Operations 207
Proactive Threshold Monitoring for MPLS LSP Monitoring 208
Multi-operation Scheduling for the LSP Health Monitor 208
LSP Path Discovery 208
How to Implement IP Service Level Agreements 209
Configuring IP Service Levels Using the UDP Jitter Operation 209
Enabling the IP SLA Responder on the Destination Device 209
Configuring and Scheduling a UDP Jitter Operation on the Source Device 210
Prerequisites for Configuring a UDP Jitter Operation on the Source Device 212
Configuring and Scheduling a Basic UDP Jitter Operation on the Source Device 212
Configuring and Scheduling a UDP Jitter Operation with Additional Characteristics 214
Configuring the IP SLA for a UDP Echo Operation 219
Prerequisites for Configuring a UDP Echo Operation on the Source Device 219
Configuring and Scheduling a UDP Echo Operation on the Source Device 219
Configuring and Scheduling a UDP Echo Operation with Optional Parameters on the Source Device 222
Configuring an ICMP Echo Operation 226
Configuring and Scheduling a Basic ICMP Echo Operation on the Source Device 226
Configuring and Scheduling an ICMP Echo Operation with Optional Parameters on the Source Device 229
Configuring the ICMP Path-echo Operation 232
Configuring and Scheduling a Basic ICMP Path-echo Operation on the Source Device 232
Configuring and Scheduling an ICMP Path-echo Operation with Optional Parameters on the Source Device 235
Configuring the ICMP Path-jitter Operation 238
Configuring and Scheduling a Basic ICMP Path-jitter Operation 239
Configuring and Scheduling an ICMP Path-jitter Operation with Additional Parameters 242
Configuring IP SLA MPLS LSP Ping and Trace Operations 246
Configuring and Scheduling an MPLS LSP Ping Operation 246
Configuring and Scheduling an MPLS LSP Trace Operation 250
Configuring IP SLA Reactions and Threshold Monitoring 254
Configuring Monitored Elements for IP SLA Reactions 254
Configuring Triggers for Connection-Loss Violations 254
Configuring Triggers for Jitter Violations 255
Configuring Triggers for Packet Loss Violations 255
Configuring Triggers for Round-Trip Violations 256
Configuring Triggers for Timeout Violations 257
Configuring Triggers for Verify Error Violations 258
Configuring Threshold Violation Types for IP SLA Reactions 259
Generating Events for Each Violation 260
Generating Events for Consecutive Violations 260
Generating Events for X of Y Violations 261
Generating Events for Averaged Violations 262
Specifying Reaction Events 263
Configuring the MPLS LSP Monitoring Instance on a Source PE Router 265
Configuring an MPLS LSP Monitoring Ping Instance 265
Configuring an MPLS LSP Monitoring Trace Instance 269
Configuring the Reaction Conditions for an MPLS LSP Monitoring Instance on a Source PE Router 273
Scheduling an MPLS LSP Monitoring Instance on a Source PE Router 275
LSP Path Discovery 276
Configuring tracking type (rtr) 279
Configuration Examples for Implementing IP Service Level Agreements 280
Configuring IP Service Level Agreements: Example 280
Configuring IP SLA Reactions and Threshold Monitoring: Example 281
Configuring IP SLA MPLS LSP Monitoring: Example 282
Configuring LSP Path Discovery: Example 282
Additional References 282
CHAPTER 4
Implementing Logging Services 285
Prerequisites for Implementing Logging Services 285
Information About Implementing Logging Services 286
System Logging Process 286
Format of System Logging Messages 286
Duplicate Message Suppression 287
Interruption of Message Suppression 287
Syslog Message Destinations 288
Guidelines for Sending Syslog Messages to Destinations Other Than the Console 289
Logging for the Current Terminal Session 289
Syslog Messages Sent to Syslog Servers 289
UNIX System Logging Facilities 289
Hostname Prefix Logging 290
Syslog Source Address Logging 291
UNIX Syslog Daemon Configuration 291
Archiving Logging Messages on a Local Storage Device 291
Setting Archive Attributes 291
Archive Storage Directories 292
Severity Levels 292
Logging History Table 293
Syslog Message Severity Level Definitions 294
Syslog Severity Level Command Defaults 294
How to Implement Logging Services 295
Setting Up Destinations for System Logging Messages 295
Configuring Logging to a Remote Server 296
Configuring the Settings for the Logging History Table 297
Modifying Logging to the Console Terminal and the Logging Buffer 298
Modifying the Format of Time Stamps 299
Disabling Time Stamps 301
Suppressing Duplicate Syslog Messages 302
Disabling the Logging of Link-Status Syslog Messages 302
Displaying System Logging Messages 303
Archiving System Logging Messages to a Local Storage Device 304
Configuration Examples for Implementing Logging Services 306
Configuring Logging to the Console Terminal and the Logging Buffer: Example 306
Setting Up Destinations for Syslog Messages: Example 307
Configuring the Settings for the Logging History Table: Example 307
Modifying Time Stamps: Example 307
Configuring a Logging Archive: Example 307
Where to Go Next 308
Additional References 308
CHAPTER 5
Onboard Failure Logging 311
Prerequisites 312
Information About Implementing OBFL 312
Data Collection Types 312
Baseline Data Collection 312
Event-Driven Data Collection 313
Supported Cards and Platforms 314
How to Implement OBFL 314
Enabling or Disabling OBFL 315
Configuring Message Severity Levels 316
Monitoring and Maintaining OBFL 316
Clearing OBFL Data 317
Configuration Examples for OBFL 318
Enabling and Disabling OBFL: Example 318
Configuring Message Severity Levels: Example 318
Clearing OBFL Messages: Example 319
Displaying OBFL Data: Example 319
Where to Go Next 319
Additional References 319
CHAPTER 6
Implementing Performance Management 323
Prerequisites for Implementing Performance Management 324
Information About Implementing Performance Management 324
PM Functional Overview 324
PM Statistics Server 324
PM Statistics Collector 324
PM Benefits 325
PM Statistics Collection Overview 326
PM Statistics Collection Templates 326
Guidelines for Creating PM Statistics Collection Templates 327
Guidelines for Enabling and Disabling PM Statistics Collection Templates 327
Exporting Statistics Data 328
Binary File Format 328
Binary File ID Assignments for Entity, Subentity, and StatsCounter Names 329
Filenaming Convention Applied to Binary Files 333
PM Entity Instance Monitoring Overview 333
PM Threshold Monitoring Overview 337
Guidelines for Creating PM Threshold Monitoring Templates 337
Guidelines for Enabling and Disabling PM Threshold Monitoring Templates 350
How to Implement Performance Management 351
Configuring an External TFTP Server for PM Statistic Collections 351
Configuring Local Disk Dump for PM Statistics Collections 351
Configuring Instance Filtering by Regular-expression 352
Creating PM Statistics Collection Templates 353
Enabling and Disabling PM Statistics Collection Templates 354
Enabling PM Entity Instance Monitoring 356
Creating PM Threshold Monitoring Templates 356
Enabling and Disabling PM Threshold Monitoring Templates 357
Configuration Examples for Implementing Performance Management 359
Creating and Enabling PM Statistics Collection Templates: Example 359
Creating and Enabling PM Threshold Monitoring Templates: Example 360
Additional References 360

Preface

Starting with Release 6.1.1, Cisco introduces support for the 64-bit Linux-based IOS XR operating system. Extensive feature parity is maintained between the 32-bit and 64-bit environments. Unless explicitly marked otherwise, the contents of this document apply to both environments. For more details on Cisco IOS XR 64 bit, refer to the Release Notes for Cisco ASR 9000 Series Routers, Release 6.1.1 document.
The Cisco ASR 9000 Series Aggregation Services Router System Monitoring Configuration Guide preface contains these sections:
Changes to This Document, page xv
Obtaining Documentation and Submitting a Service Request, page xv

Changes to This Document

This table lists the technical changes made to this document since it was first printed.
Table 1: Changes to This Document
Revision       Date             Change Summary
OL-26513-02    June 2012        Republished for Cisco IOS XR Release 4.2.1.
OL-26513-01    December 2011    Initial release of this document.

Obtaining Documentation and Submitting a Service Request

For information on obtaining documentation, using the Cisco Bug Search Tool (BST), submitting a service request, and gathering additional information, see What's New in Cisco Product Documentation.
To receive new and revised Cisco technical content directly to your desktop, you can subscribe to the What's New in Cisco Product Documentation RSS feed. RSS feeds are a free service.
CHAPTER 1

Implementing and Monitoring Alarms and Alarm Log Correlation

This module describes the concepts and tasks related to configuring alarm log correlation and monitoring alarm logs and correlated event records. Alarm log correlation extends system logging to include the ability to group and filter messages generated by various applications and system servers and to isolate root messages on the router.
This module describes the new and revised tasks you need to perform to implement logging correlation and monitor alarms on your network.
Note
For more information about system logging on Cisco IOS XR Software and complete descriptions of the alarm management and logging correlation commands listed in this module, see the Related Documents, on page 37 section of this module.
To locate documentation for other commands that might appear in the course of performing a configuration task, search online in the Cisco ASR 9000 Series Aggregation Services Router Commands Master List.
Feature History for Implementing and Monitoring Alarms and Alarm Log Correlation
Release          Modification
Release 3.7.2    This feature was introduced.
Release 3.8.0    SNMP alarm correlation feature was added.
Prerequisites for Implementing and Monitoring Alarms and Alarm Log Correlation, page 2
Information About Implementing Alarms and Alarm Log Correlation, page 2
How to Implement and Monitor Alarm Management and Logging Correlation, page 9
Configuration Examples for Alarm Management and Logging Correlation, page 34
Additional References, page 37

Prerequisites for Implementing and Monitoring Alarms and Alarm Log Correlation

You must be in a user group associated with a task group that includes the proper task IDs. The command reference guides include the task IDs required for each command. If you suspect user group assignment is preventing you from using a command, contact your AAA administrator for assistance.

Information About Implementing Alarms and Alarm Log Correlation

Alarm Logging and Debugging Event Management System

Cisco IOS XR Software Alarm Logging and Debugging Event Management System (ALDEMS) is used to monitor and store alarm messages that are forwarded by system servers and applications. In addition, ALDEMS correlates alarm messages forwarded due to a single root cause.
ALDEMS extends the basic logging and monitoring functionality of Cisco IOS XR Software, providing the level of alarm and event management necessary for a highly distributed system.
Cisco IOS XR Software achieves this necessary level of alarm and event management by distributing logging applications across the nodes on the system.
Figure 1: ALDEMS Component Communications, on page 3 illustrates the relationship between the components that constitute ALDEMS.
Figure 1: ALDEMS Component Communications
Correlator
The correlator receives messages from system logging (syslog) helper processes that are distributed across the nodes on the router and forwards syslog messages to the syslog process. If a logging correlation rule is configured, the correlator captures messages and searches for a match with any message specified in the rule. If the correlator finds a match, it starts a timer that corresponds to the timeout interval specified in the rule. The correlator continues searching for a match to messages in the rule until the timer expires. If the root-cause message was received, then a correlation occurs; otherwise, all captured messages are forwarded to the syslog. When a correlation occurs, the correlated messages are stored in the logging correlation buffer. The correlator tags each set of correlated messages with a correlation ID.
Note
For more information about logging correlation, see the Logging Correlation, on page 4 section.
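The capture-and-timeout behavior described above can be sketched in Python. This is an illustrative model only, not the ALDEMS implementation: the class names, the millisecond clock passed in by the caller, and the single-rule, single-window simplification are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    root: tuple        # (category, group, code) of the root-cause message
    non_root: set      # triplets of the non-root-cause messages
    timeout_ms: int    # timeout interval configured in the rule


@dataclass
class Window:
    deadline_ms: int
    captured: list = field(default_factory=list)
    root_seen: bool = False


class Correlator:
    """Hypothetical single-rule correlator used only for illustration."""

    def __init__(self, rule):
        self.rule = rule
        self.window = None
        self.syslog = []              # messages forwarded to the syslog process
        self.correlation_buffer = {}  # correlation ID -> correlated messages
        self._next_id = 1

    def receive(self, now_ms, msg):
        self.expire(now_ms)
        if msg != self.rule.root and msg not in self.rule.non_root:
            self.syslog.append(msg)   # not named in the rule: pass through
            return
        if self.window is None:       # first match starts the timer
            self.window = Window(now_ms + self.rule.timeout_ms)
        self.window.captured.append(msg)
        if msg == self.rule.root and not self.window.root_seen:
            self.window.root_seen = True
            self.syslog.append(msg)   # the root-cause message is always forwarded

    def expire(self, now_ms):
        w = self.window
        if w is None or now_ms < w.deadline_ms:
            return
        if w.root_seen:               # correlation: keep the set under one ID
            self.correlation_buffer[self._next_id] = w.captured
            self._next_id += 1
        else:                         # no root cause: release everything
            self.syslog.extend(w.captured)
        self.window = None
```

In this sketch, a non-root-cause message that arrives before the root-cause message is still correlated, provided the root-cause message arrives before the timer expires.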

System Logging Process
By default, routers are configured to send system logging messages to a system logging (syslog) process. Syslog messages are gathered by syslog helper processes that are distributed across the nodes on the system. The system logging process controls the distribution of logging messages to the various destinations, such as the system logging buffer, the console, terminal lines, or a syslog server, depending on the network device configuration.
Alarm Logger
The alarm logger is the final destination for system logging messages forwarded on the router. The alarm logger stores alarm messages in the logging events buffer. The logging events buffer is circular; that is, when full, it overwrites the oldest messages in the buffer.
Note
Alarms are prioritized in the logging events buffer. When it is necessary to overwrite an alarm record, the logging events buffer overwrites messages in the following order: nonbistate alarms first, then bistate alarms in the CLEAR state, and, finally, bistate alarms in the SET state. For more information about bistate alarms, see the Bistate Alarms, on page 6 section.
When the table becomes full of messages caused by bistate alarms in the SET state, the earliest bistate message (based on the message time stamp, not arrival time) is reclaimed before others. The buffer size for the logging events buffer and the logging correlation buffer, thus, should be adjusted so that memory consumption is within your requirements.
A table-full alarm is generated each time the logging events buffer wraps around. A threshold crossing notification is generated each time the logging events buffer reaches the capacity threshold.
Messages stored in the logging events buffer can be queried by clients to locate records matching specific criteria. The alarm logging mechanism assigns a sequential, unique ID to each alarm message.
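The overwrite priority and sequential record IDs described above can be modeled with a short Python sketch. The class and field names are hypothetical. The sketch also breaks ties by time stamp inside every priority class, whereas the text states this only for bistate alarms in the SET state.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    event_id: int
    timestamp: float
    bistate: bool
    state: str = ""    # "SET" or "CLEAR" for bistate alarms


def eviction_rank(alarm):
    """Lower rank is overwritten first."""
    if not alarm.bistate:
        priority = 0               # nonbistate alarms go first
    elif alarm.state == "CLEAR":
        priority = 1               # then bistate alarms in the CLEAR state
    else:
        priority = 2               # bistate alarms in the SET state go last
    return (priority, alarm.timestamp)


class EventsBuffer:
    """Hypothetical fixed-capacity model of the logging events buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self._next_id = 1          # sequential, unique record IDs

    def add(self, timestamp, bistate, state=""):
        evicted = None
        if len(self.records) >= self.capacity:
            evicted = min(self.records, key=eviction_rank)
            self.records.remove(evicted)
        alarm = Alarm(self._next_id, timestamp, bistate, state)
        self._next_id += 1
        self.records.append(alarm)
        return alarm, evicted
```

With a capacity of three, adding a fourth record evicts a nonbistate alarm if one exists, and only reclaims the oldest SET alarm once nothing else is left.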
Logging Correlation
Logging correlation can be used to isolate the most significant root messages for events affecting system performance. For example, the original message describing the online insertion and removal (OIR) of a card can be isolated so that only the root-cause message is displayed and all subsequent messages related to the same event are correlated. When correlation rules are configured, a common root event that is generating secondary (non-root-cause) messages can be isolated and sent to the syslog, while secondary messages are suppressed. An operator can retrieve all correlated messages from the logging correlator buffer to view correlation events that have occurred.
Correlation Rules
Correlation rules can be configured to isolate root messages that may generate system alarms. Correlation rules prevent unnecessary stress on ALDEMS caused by the accumulation of unnecessary messages. Each correlation rule hinges on a message identification, consisting of a message category, message group name, and message code. The correlator process scans messages for occurrences of the message.
If the correlator receives a root message, the correlator stores it in the logging correlator buffer and forwards it to the syslog process on the RP. From there, the syslog process forwards the root message to the alarm logger in which it is stored in the logging events buffer. From the syslog process, the root message may also be forwarded to destinations such as the console, remote terminals, remote servers, the fault management system, and the Simple Network Management Protocol (SNMP) agent, depending on the network device configuration. Subsequent messages meeting the same criteria (including another occurrence of the root message) are stored in the logging correlation buffer and are forwarded to the syslog process on the router.
If a message matches multiple correlation rules, all matching rules apply and the message becomes a part of all matching correlation queues in the logging correlator buffer.
The following message fields are used to define a message in a logging correlation rule:
Message category
Message group
Message code
Wildcards can be used for any of the message fields to cover a wider set of messages. Configure the appropriate set of messages in a logging correlation rule configuration to achieve correlation with a narrow or wide scope (depending on your objective).
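As a rough illustration of narrow and wide rule scope, the following Python sketch matches message triplets against rule patterns and returns every rule that matches. The "*" wildcard syntax, the rule names, and the message identifiers are assumptions for the example; the wildcard notation accepted by the router is defined in the command reference.

```python
from fnmatch import fnmatchcase


def matches(pattern, message):
    """Compare a (category, group, code) pattern with a message triplet."""
    return all(fnmatchcase(value, pat) for pat, value in zip(pattern, message))


def matching_rules(rules, message):
    """Return the names of all rules with at least one matching pattern."""
    return [name for name, patterns in rules.items()
            if any(matches(p, message) for p in patterns)]


# Hypothetical rules: one narrow, one wide.
rules = {
    "narrow": [("L2", "LINK", "UPDOWN")],
    "wide": [("L2", "*", "*")],
}
print(matching_rules(rules, ("L2", "LINK", "UPDOWN")))
```

A message that satisfies both rules is reported under both names, which mirrors the statement that a message matching multiple rules joins all matching correlation queues.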


Types of Correlation
There are two types of correlation that are configured in rules to isolate root-cause messages:
Nonstateful Correlation—This correlation is fixed after it has occurred, and non-root-cause alarms that are suppressed are never forwarded to the syslog process. All non-root-cause alarms remain buffered in correlation buffers.
Stateful Correlation—This correlation can change after it has occurred, if the bistate root-cause alarm clears. When the alarm clears, all the correlated non-root-cause alarms are sent to syslog and are removed from the correlation buffer. Stateful correlations are useful to detect non-root-cause conditions that continue to exist even if the suspected root cause no longer exists.
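The difference between the two correlation types comes down to what happens when the root-cause alarm clears. The Python sketch below is a simplified model with hypothetical names, not the router's internal logic.

```python
class Correlation:
    """Hypothetical record of one correlation and its suppressed alarms."""

    def __init__(self, stateful, suppressed):
        self.stateful = stateful
        self.suppressed = list(suppressed)   # buffered non-root-cause alarms

    def on_root_clear(self):
        """Return the alarms released to syslog when the root cause clears."""
        if not self.stateful:
            return []                        # nonstateful: correlation is fixed
        released, self.suppressed = self.suppressed, []
        return released                      # stateful: release and unbuffer


nonstateful = Correlation(stateful=False, suppressed=["alarm-a", "alarm-b"])
stateful = Correlation(stateful=True, suppressed=["alarm-a", "alarm-b"])
print(nonstateful.on_root_clear(), stateful.on_root_clear())
```

After the clear event, the nonstateful correlation still holds both alarms in its buffer, while the stateful correlation has handed them to syslog and emptied its buffer.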
Application of Rules and Rule Sets
If a correlation rule is applied to the entire router, then correlation takes place only for those messages that match the configured cause values for the rule, regardless of the context or location setting of that message.
If a correlation rule is applied to a specific set of contexts or locations, then correlation takes place only for those messages that match the configured cause values for the rule and that match at least one of those contexts or locations.
In the case of a rule-set application, the behavior is the same; however, the apply configuration takes place for all rules that are part of the given rule set.
The show logging correlator rule command is used to display apply settings for a given rule, including those settings that have been configured with the logging correlator apply ruleset command.
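The apply logic described in this section can be summarized in a small Python sketch. The dictionary keys, the rule names, and the location string are illustrative assumptions and do not correspond to actual configuration syntax.

```python
def rule_applies(apply, context=None, location=None):
    """Decide whether a rule's apply settings cover a message."""
    if apply.get("all_of_router"):
        return True                      # context and location are ignored
    return (context in apply.get("contexts", ())
            or location in apply.get("locations", ()))


def expand_rule_set(rule_set, apply):
    """Applying a rule set gives every member rule the same apply settings."""
    return {rule_name: apply for rule_name in rule_set}


router_wide = {"all_of_router": True}
scoped = {"locations": {"0/1/CPU0"}}     # hypothetical location value

print(rule_applies(router_wide, location="0/2/CPU0"))   # True
print(rule_applies(scoped, location="0/2/CPU0"))        # False
print(expand_rule_set(["rule1", "rule2"], scoped))
```

This check covers only the apply scope; a message must still match the cause values configured in the rule before correlation takes place.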
Root Message and Correlated Messages
When a correlation rule is configured and applied, the correlator starts searching for a message match as specified in the rule. After a match is found, the correlator starts a timer corresponding to the timeout interval that is also specified in the rule. A message search for a match continues until the timer expires. Correlation occurs after the root-cause message is received.
The first message (with category, group, and code triplet) configured in a correlation rule defines the root-cause message. A root-cause message is always forwarded to the syslog process. See the Correlation Rules, on page
4 section to learn how the root-cause message is forwarded and stored.
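The timer behavior can be illustrated with a simplified Python model. It covers only the first timeout window and rests on several assumptions that are not stated in this guide: times are in milliseconds, an alarm arriving exactly at the timeout is still inside the window, and held non-root-cause alarms are forwarded if the root cause never arrives. The function and parameter names are hypothetical.

```python
def correlate_first_window(root, nonroots, timeout_ms, events):
    """Return (forwarded, buffered) for the first correlation window.

    events: list of (time_ms, cause) tuples sorted by time.
    """
    # Only messages named in the rule take part in correlation.
    matching = [(t, c) for t, c in events if c == root or c in nonroots]
    if not matching:
        return [], []
    start = matching[0][0]  # the first match starts the timer
    window = [c for t, c in matching if t - start <= timeout_ms]
    if root in window:
        # Correlation occurred: the root cause goes to syslog, the rest is buffered.
        return [root], [c for c in window if c != root]
    # Assumption: without a root cause, the held alarms are released to syslog.
    return window, []
```

For example, with a 1000 ms timeout and events `[(0, "N1"), (200, "R"), (900, "N2")]`, the model forwards `R` and buffers `N1` and `N2`.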

Alarm Severity Level and Filtering

Filter settings can be used to display information based on severity level. The alarm filter display indicates the severity level settings used to report alarms, the number of records, and the current and maximum log size.
Alarms can be filtered according to the severity level shown in this table.
Table 2: Alarm Severity Levels for Event Logging
Severity Level    System Condition
0                 Emergencies
1                 Alerts
2                 Critical
3                 Errors
4                 Warnings
5                 Notifications
6                 Informational

Bistate Alarms

Bistate alarms are generated by state changes associated with system hardware, such as a change of interface state from active to inactive, the online insertion and removal (OIR) of a card, or a change in component temperature. Bistate alarm events are reported to the logging events buffer by default; informational and debug messages are not.
Cisco IOS XR software provides the ability to reset and clear alarms. Clients interested in monitoring alarms in the system can register with the alarm logging mechanism to receive asynchronous notifications when a monitored alarm changes state.
Bistate alarm notifications provide the following information:
The alarm state, which may be in the set state or the clear state.

Capacity Threshold Setting for Alarms

The capacity threshold setting determines when the alarm system begins reporting threshold crossing alarms. The capacity threshold for generating warning alarms is generally set at 80 percent of buffer capacity, but individual configurations may require different settings.
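As a rough illustration of the threshold arithmetic, the following Python sketch checks whether a buffer has reached its configured percentage. The function name and the assumption that reporting starts at, rather than strictly above, the threshold are illustrative only.

```python
def threshold_crossed(records_in_buffer, buffer_capacity, threshold_percent=80):
    """Return True when buffer usage has reached the capacity threshold.

    Integer arithmetic avoids rounding issues; "at or above" is an assumption.
    """
    return records_in_buffer * 100 >= buffer_capacity * threshold_percent
```

With the default of 80 percent and a capacity of 100 records, the check becomes true at the 80th record.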

Hierarchical Correlation

Hierarchical correlation takes effect when both of the following conditions are true:
A single alarm is both a root cause for one rule and a non-root cause for another rule.
Alarms are generated that result in successful correlations associated with both rules.
The following example illustrates two hierarchical correlation rules:
Rule 1              Category   Group     Code
Root Cause 1        Cat 1      Group 1   Code 1
Non-root Cause 2    Cat 2      Group 2   Code 2

Rule 2              Category   Group     Code
Root Cause 2        Cat 2      Group 2   Code 2
Non-root Cause 3    Cat 3      Group 3   Code 3
If three alarms are generated for Cause 1, 2, and 3, with all alarms arriving within their respective correlation timeout periods, then the hierarchical correlation appears like this:
Cause 1 -> Cause 2 -> Cause 3
The correlation buffers show two separate correlations: one for Cause 1 and Cause 2 and the second for Cause 2 and Cause 3. However, the hierarchical relationship is implicitly defined.
Note
Stateful behavior, such as reparenting and reissuing of alarms, is supported for rules that are defined as stateful; that is, correlations that can change.
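The implicit hierarchy can be recovered from the separate correlations by following root-cause to non-root-cause links. The Python sketch below is a hypothetical illustration of that idea; it assumes the rules contain no cycles, and the function name and data layout are not part of Cisco IOS XR.

```python
def correlation_chains(correlations):
    """Derive root-to-leaf chains from (root_cause, non_root_cause) pairs.

    Assumes the pairs contain no cycles.
    """
    children = {}
    for root, nonroot in correlations:
        children.setdefault(root, []).append(nonroot)
    nonroots = {n for _, n in correlations}
    # A top-level root cause is never a non-root cause of another correlation.
    tops = [r for r in children if r not in nonroots]

    def walk(node):
        if node not in children:
            return [[node]]
        return [[node] + rest for child in children[node] for rest in walk(child)]

    return [path for top in tops for path in walk(top)]


# The two separate correlations from the example above.
pairs = [("Cause 1", "Cause 2"), ("Cause 2", "Cause 3")]
for chain in correlation_chains(pairs):
    print(" -> ".join(chain))  # prints "Cause 1 -> Cause 2 -> Cause 3"
```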

Context Correlation Flag

The context correlation flag controls whether correlation takes place on a per-context basis.
This flag causes behavior change only if the rule is applied to one or more contexts. It does not go into effect if the rule is applied to the entire router or location nodes.
The following is a scenario of context correlation behavior:
Rule 1 has a root cause A and an associated non-root cause B.
The context correlation flag is not set on Rule 1.
Rule 1 is applied to contexts 1 and 2.
If the context correlation flag is not set on Rule 1, and alarm A is generated from context 1 while alarm B is generated from context 2, the alarms are correlated: the rule applies to both contexts regardless of which context each alarm comes from.
If the context correlation flag is now set on Rule 1 and the same alarms are generated, they are not correlated, because they are from different contexts.
With the flag set, the correlator analyzes alarms against the rule only if the alarms arrive from the same context. In other words, if alarm A is generated from context 1 and alarm B is generated from context 2, then a correlation does not occur.
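The scenario can be condensed into a small decision function. This Python sketch is a hypothetical model of the behavior described above for a rule applied to contexts; the function and parameter names are assumptions.

```python
def may_correlate(context_flag_set, applied_contexts, root_context, nonroot_context):
    """Decide whether a root-cause and a non-root-cause alarm can be correlated.

    Models a rule that has been applied to one or more contexts.
    """
    # Both alarms must come from contexts the rule is applied to.
    if root_context not in applied_contexts or nonroot_context not in applied_contexts:
        return False
    if context_flag_set:
        # Flag set: correlate only alarms from the same context.
        return root_context == nonroot_context
    # Flag not set: any mix of applied contexts is acceptable.
    return True
```

With Rule 1 applied to contexts 1 and 2, alarm A from context 1 and alarm B from context 2 are correlated only when the flag is not set.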

Duration Timeout Flags

The root-cause timeout (if specified) is an alternative rule timeout that is used when a non-root-cause alarm arrives before the root-cause alarm in the given rule. It is typically set shorter than the rule timeout, on the assumption that the root-cause alarm is then less likely to arrive, so that the hold on the non-root-cause alarms is released sooner.
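The choice between the two timeouts can be sketched as follows. This is an illustrative Python model only; the function and parameter names are hypothetical, and the values are in milliseconds to match the timeout command.

```python
def effective_timeout(first_alarm_is_root, timeout_ms, rootcause_timeout_ms=None):
    """Pick the timeout that applies to a newly started correlation window."""
    if not first_alarm_is_root and rootcause_timeout_ms is not None:
        # A non-root-cause alarm arrived first: use the alternative timeout.
        return rootcause_timeout_ms
    # Root cause arrived first, or no root-cause timeout was specified.
    return timeout_ms
```

For a rule with a 60000 ms timeout and a 10000 ms root-cause timeout, a window opened by a non-root-cause alarm would be held for 10000 ms in this model.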

Reparent Flag

The reparent flag specifies what happens to non-root-cause alarms in a hierarchical correlation when their immediate root cause clears.
The following example illustrates context correlation behavior:
Rule 1 has a root cause A and an associated non-root cause B.
The context correlation flag is not set on Rule 1.
Rule 1 is applied to contexts 1 and 2.
In this scenario, if alarm A is generated from context 1 and alarm B is generated from context 2, then a correlation occurs regardless of context.
If the context correlation flag is now set on Rule 1 and the same alarms are generated, they are not correlated, because they are from different contexts.

Reissue Nonbistate Flag

The reissue-nonbistate flag controls whether nonbistate alarms (events) are forwarded from the correlator log if their parent bistate root-cause alarm clears. Active bistate non-root-cause alarms are always forwarded in this situation, because the condition is still present.
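A minimal Python sketch of this selection is shown below. It is a hypothetical model: the tuple layout and function name are assumptions, and the treatment of bistate alarms that are no longer active (they are not forwarded) is also an assumption rather than documented behavior.

```python
def released_on_root_clear(buffered, reissue_nonbistate):
    """Return the names of buffered alarms forwarded when the root cause clears.

    buffered: list of (name, is_bistate, is_active) tuples.
    """
    released = []
    for name, is_bistate, is_active in buffered:
        if is_bistate:
            # Active bistate alarms are always forwarded: the condition persists.
            if is_active:
                released.append(name)
        elif reissue_nonbistate:
            # Nonbistate alarms (events) are forwarded only when the flag is set.
            released.append(name)
    return released
```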

Internal Rules

Internal rules are defined on Cisco IOS XR Software and are used by protocols and processes within Cisco IOS XR Software. These rules are not customer configurable, but you may view them by using the show logging correlator rule command. All internal rule names are prefixed with [INTERNAL].

SNMP Alarm Correlation

In large-scale systems, such as a Cisco IOS XR multi-chassis system, many SNMP traps may be emitted at regular intervals. These traps, in turn, increase the time that Cisco IOS XR spends processing traps.
The additional traps can also slow down troubleshooting and increase the workload for the monitoring systems and the operators. The SNMP alarm correlation feature addresses these issues.
The objectives of the SNMP alarm correlation feature are to:
Extract the generic pieces of correlation functionality from the existing syslog correlator
Create DLLs and APIs suitable for reusing the functionality in other components
Integrate the SNMP agent with the DLLs to enable SNMP trap correlation

How to Implement and Monitor Alarm Management and Logging Correlation

Configuring Logging Correlation Rules

This task explains how to configure logging correlation rules.
The purpose of configuring logging correlation rules is to define the root cause and non-root-cause alarm messages (with message category, group, and code combinations) for logging correlation. The originating root-cause alarm message is forwarded to the syslog process, and all subsequent (non-root-cause) alarm messages are sent to the logging correlation buffer.
The fields inside a message that can be used for configuring correlation rules are as follows:
Message category (for example, PKT_INFRA, MGBL, or OS)
Message group (for example, LINK, LINEPROTO, or OIR)
Message code (for example, UPDOWN or GO_ACTIVE)
The logging correlator mechanism, running on the active route processor, begins queueing messages matching the ones specified in the correlation rules for the time specified in the timeout interval of the correlation rule.
The timeout interval begins when the correlator captures any alarm message specified for a given rule.
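To show how the three rule fields relate to a logged message, the Python sketch below splits a message identifier into its parts. It assumes the identifier has the layout CATEGORY-GROUP-SEVERITY-CODE (for example, PKT_INFRA-LINK-3-UPDOWN), optionally preceded by a percent sign; that layout and the function names are assumptions made for illustration.

```python
def parse_message_id(message_id):
    """Split an identifier such as 'PKT_INFRA-LINK-3-UPDOWN' into its fields.

    Assumes the layout CATEGORY-GROUP-SEVERITY-CODE with an optional leading '%'.
    """
    category, group, severity, code = message_id.lstrip("%").split("-", 3)
    return category, group, int(severity), code


def rule_key(message_id):
    """Return the (category, group, code) triplet a correlation rule matches on."""
    category, group, _severity, code = parse_message_id(message_id)
    return category, group, code
```

In this model, `rule_key("%PKT_INFRA-LINK-3-UPDOWN")` yields the triplet `("PKT_INFRA", "LINK", "UPDOWN")`, which is the combination a rule would name as a root cause or non-root cause.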
SUMMARY STEPS
1. configure
2. logging correlator rule correlation-rule { type { stateful | nonstateful }}
3. timeout [ milliseconds ]
4. commit
5. show logging correlator rule { all | correlation-rule1...correlation-rule14 } [ context context1...context6 ] [ location node-id1...node-id6 ] [ rulesource { internal | user }] [ ruletype { nonstateful | stateful }] [ summary | detail ]
DETAILED STEPS
Step 1
Command or Action: configure
Step 2
Command or Action: logging correlator rule correlation-rule { type { stateful | nonstateful }}
Example:
RP/0/RSP0/CPU0:router(config)# logging correlator rule rule_stateful type stateful
Purpose: Configures a logging correlation rule.
Stateful correlations can change specifically if the root-cause alarm is bistate.
Nonstateful correlations cannot change. All non-root-cause alarms remain in the correlation buffers.
Step 3
Command or Action: timeout [ milliseconds ]
Example:
RP/0/RSP0/CPU0:router(config-corr-rule-st)# timeout 60000
Purpose: Specifies the collection period duration time for the logging correlator rule message.
Timeout begins when the first alarm message identified by the correlation rule is logged.
Step 4
Command or Action: commit
Step 5
Command or Action: show logging correlator rule { all | correlation-rule1...correlation-rule14 } [ context context1...context6 ] [ location node-id1...node-id6 ] [ rulesource { internal | user }] [ ruletype { nonstateful | stateful }] [ summary | detail ]
Example:
RP/0/RSP0/CPU0:router# show logging correlator rule all
Purpose: (Optional) Displays defined correlation rules.
The output describes the configuration of each rule name, including the message category, group, and code information.

Configuring Logging Correlation Rule Sets

This task explains how to configure logging correlation rule sets.
SUMMARY STEPS
1. configure
2. logging correlator ruleset ruleset
3. rulename rulename
4. commit
5. show logging correlator ruleset { all | correlation-ruleset1...correlation-ruleset14 } [ detail | summary ]
DETAILED STEPS
Step 1
Command or Action: configure
Step 2
Command or Action: logging correlator ruleset ruleset
Example:
RP/0/RSP0/CPU0:router(config)# logging correlator ruleset ruleset1
Purpose: Configures a logging correlation rule set.
Step 3
Command or Action: rulename rulename
Example:
RP/0/RSP0/CPU0:router(config-corr-ruleset)# rulename stateful_rule
Purpose: Configures a rule name.
Step 4
Command or Action: commit
Step 5
Command or Action: show logging correlator ruleset { all | correlation-ruleset1...correlation-ruleset14 } [ detail | summary ]
Example:
RP/0/RSP0/CPU0:router# show logging correlator ruleset all
Purpose: (Optional) Displays defined correlation rule sets.

Configuring Root-cause and Non-root-cause Alarms

To correlate a root cause to one or more non-root-cause alarms and configure them to a rule, use the rootcause and nonrootcause commands specified for the correlation rule.
SUMMARY STEPS
1. configure
2. logging correlator rule correlation-rule { type { stateful | nonstateful }}
3. rootcause { msg-category group-name msg-code }
4. nonrootcause
5. alarm msg-category group-name msg-code
6. commit
7. show logging correlator rule { all | correlation-rule1...correlation-rule14 } [ context context1...context6 ] [ location node-id1...node-id6 ] [ rulesource { internal | user }] [ ruletype { nonstateful | stateful }] [ summary | detail ]
DETAILED STEPS
Step 1
Command or Action: configure
Step 2
Command or Action: logging correlator rule correlation-rule { type { stateful | nonstateful }}
Example:
RP/0/RSP0/CPU0:router(config)# logging correlator rule rule_stateful type stateful
Purpose: Configures a logging correlation rule and enters submodes for stateful and nonstateful rule types.
Stateful correlations can change specifically if the root-cause alarm is bistate.
Nonstateful correlations cannot change. All non-root-cause alarms remain in the correlation buffers.
Step 3
Command or Action: rootcause { msg-category group-name msg-code }
Example:
RP/0/RSP0/CPU0:router(config-corr-rule-st)# rootcause CAT_BI_1 GROUP_BI_1 CODE_BI_1
Purpose: Configures a root-cause alarm message.
This example specifies a root-cause alarm under stateful configuration mode.
Step 4
Command or Action: nonrootcause
Example:
RP/0/RSP0/CPU0:router(config-corr-rule-st)# nonrootcause
Purpose: Enters the non-root-cause configuration mode.
Step 5
Command or Action: alarm msg-category group-name msg-code
Example:
RP/0/RSP0/CPU0:router(config-corr-rule-st-nonrc)# alarm CAT_BI_2 GROUP_BI_2 CODE_BI_2
Purpose: Specifies a non-root-cause alarm message.
This command can be issued with the nonrootcause command, such as nonrootcause alarm msg-category group-name msg-code.
Step 6
Command or Action: commit
Step 7
Command or Action: show logging correlator rule { all | correlation-rule1...correlation-rule14 } [ context context1...context6 ] [ location node-id1...node-id6 ] [ rulesource { internal | user }] [ ruletype { nonstateful | stateful }] [ summary | detail ]
Example:
RP/0/RSP0/CPU0:router# show logging correlator rule all
Purpose: (Optional) Displays the correlator rules that are defined.

Configuring Hierarchical Correlation Rule Flags

Hierarchical correlation is when a single alarm is both a root cause for one correlation rule and a non-root cause for another rule, and when alarms are generated resulting in a successful correlation associated with both rules. What happens to a non-root-cause alarm hinges on the behavior of its correlated root-cause alarm.
There are cases in which you want to control the stateful behavior associated with these hierarchies and to implement flags, such as reparenting and reissuing of nonbistate alarms. This task explains how to implement these flags.
See the Reparent Flag, on page 8 and Reissue Nonbistate Flag, on page 8 sections for detailed information about these flags.
SUMMARY STEPS
1. configure
2. logging correlator rule correlation-rule { type { stateful | nonstateful }}
3. reissue-nonbistate
4. reparent
5. commit
6. show logging correlator rule { all | correlation-rule1...correlation-rule14 } [ context context1...context6 ] [ location node-id1...node-id6 ] [ rulesource { internal | user }] [ ruletype { nonstateful | stateful }] [ summary | detail ]
DETAILED STEPS
Step 1
Command or Action: configure
Step 2
Command or Action: logging correlator rule correlation-rule { type { stateful | nonstateful }}
Example:
RP/0/RSP0/CPU0:router(config)# logging correlator rule rule_stateful type stateful
Purpose: Configures a logging correlation rule.
Stateful correlations can change specifically if the root-cause alarm is bistate.
Nonstateful correlations cannot change. All non-root-cause alarms remain in the correlation buffers.
Step 3
Command or Action: reissue-nonbistate
Example:
RP/0/RSP0/CPU0:router(config-corr-rule-st)# reissue-nonbistate
Purpose: Issues nonbistate alarm messages (events) from the correlator log after their root-cause alarm clears.
Step 4
Command or Action: reparent
Example:
RP/0/RSP0/CPU0:router(config-corr-rule-st)# reparent
Purpose: Specifies the behavior of non-root-cause alarms after a root-cause parent clears.
Step 5
Command or Action: commit
Step 6
Command or Action: show logging correlator rule { all | correlation-rule1...correlation-rule14 } [ context context1...context6 ] [ location node-id1...node-id6 ] [ rulesource { internal | user }] [ ruletype { nonstateful | stateful }] [ summary | detail ]
Example:
RP/0/RSP0/CPU0:router# show logging correlator rule all
Purpose: (Optional) Displays the correlator rules that are defined.
What to Do Next
To activate a defined correlation rule and rule set, you must apply them by using the logging correlator apply rule and logging correlator apply ruleset commands.

Applying Logging Correlation Rules

This task explains how to apply logging correlation rules.
Applying a correlation rule activates it and gives it a scope. A single correlation rule can be applied to multiple scopes on the router; that is, a rule can be applied to the entire router, to several locations, or to several contexts.
Note
When a rule is applied or if a rule set that contains this rule is applied, then the rule definition cannot be modified through the configuration until the rule or rule set is once again unapplied.