Cisco ASR 9000 Series Configuration Manuals

Cisco ASR 9000 Series Aggregation Services Router System Monitoring Configuration Guide, Release 4.2.x

First Published: 2011-12-01

Last Modified: 2012-06-01

Americas Headquarters

Cisco Systems, Inc. 170 West Tasman Drive San Jose, CA 95134-1706 USA http://www.cisco.com Tel: 408 526-4000 800 553-NETS (6387) Fax: 408 527-0883

THE SPECIFICATIONS AND INFORMATION REGARDING THE PRODUCTS IN THIS MANUAL ARE SUBJECT TO CHANGE WITHOUT NOTICE. ALL STATEMENTS, INFORMATION, AND RECOMMENDATIONS IN THIS MANUAL ARE BELIEVED TO BE ACCURATE BUT ARE PRESENTED WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. USERS MUST TAKE FULL RESPONSIBILITY FOR THEIR APPLICATION OF ANY PRODUCTS.

THE SOFTWARE LICENSE AND LIMITED WARRANTY FOR THE ACCOMPANYING PRODUCT ARE SET FORTH IN THE INFORMATION PACKET THAT SHIPPED WITH THE PRODUCT AND ARE INCORPORATED HEREIN BY THIS REFERENCE. IF YOU ARE UNABLE TO LOCATE THE SOFTWARE LICENSE OR LIMITED WARRANTY, CONTACT YOUR CISCO REPRESENTATIVE FOR A COPY.

The Cisco implementation of TCP header compression is an adaptation of a program developed by the University of California, Berkeley (UCB) as part of UCB's public domain version of the UNIX operating system. All rights reserved. Copyright © 1981, Regents of the University of California.

NOTWITHSTANDING ANY OTHER WARRANTY HEREIN, ALL DOCUMENT FILES AND SOFTWARE OF THESE SUPPLIERS ARE PROVIDED “AS IS” WITH ALL FAULTS. CISCO AND THE ABOVE-NAMED SUPPLIERS DISCLAIM ALL WARRANTIES, EXPRESSED OR IMPLIED, INCLUDING, WITHOUT LIMITATION, THOSE OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OR ARISING FROM A COURSE OF DEALING, USAGE, OR TRADE PRACTICE.

IN NO EVENT SHALL CISCO OR ITS SUPPLIERS BE LIABLE FOR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, OR INCIDENTAL DAMAGES, INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR LOSS OR DAMAGE TO DATA ARISING OUT OF THE USE OR INABILITY TO USE THIS MANUAL, EVEN IF CISCO OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

Any Internet Protocol (IP) addresses and phone numbers used in this document are not intended to be actual addresses and phone numbers. Any examples, command display output, network topology diagrams, and other figures included in the document are shown for illustrative purposes only. Any use of actual IP addresses or phone numbers in illustrative content is unintentional and coincidental.

Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other countries. To view a list of Cisco trademarks, go to this URL: http://www.cisco.com/go/trademarks. Third-party trademarks mentioned are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (1110R)

Contents

Preface xv

Changes to This Document xv

Obtaining Documentation and Submitting a Service Request xv

CHAPTER 1

Implementing and Monitoring Alarms and Alarm Log Correlation 1

Prerequisites for Implementing and Monitoring Alarms and Alarm Log Correlation 2

Information About Implementing Alarms and Alarm Log Correlation 2

Alarm Logging and Debugging Event Management System 2

Correlator 3

System Logging Process 4

Alarm Logger 4

Logging Correlation 4

Correlation Rules 4

Types of Correlation 5

Application of Rules and Rule Sets 5

Root Message and Correlated Messages 5

Alarm Severity Level and Filtering 6

Bistate Alarms 6

Capacity Threshold Setting for Alarms 7

Hierarchical Correlation 7

Context Correlation Flag 7

Duration Timeout Flags 8

Reparent Flag 8

Reissue Nonbistate Flag 8

Internal Rules 9

SNMP Alarm Correlation 9

How to Implement and Monitor Alarm Management and Logging Correlation 9

Configuring Logging Correlation Rules 9

Configuring Logging Correlation Rule Sets 10

Configuring Root-cause and Non-root-cause Alarms 11

Configuring Hierarchical Correlation Rule Flags 13

Applying Logging Correlation Rules 14

Applying Logging Correlation Rule Sets 16

Modifying Logging Events Buffer Settings 17

Modifying Logging Correlator Buffer Settings 19

Displaying Alarms by Severity and Severity Range 20

Displaying Alarms According to a Time Stamp Range 22

Displaying Alarms According to Message Group and Message Code 23

Displaying Alarms According to a First and Last Range 24

Displaying Alarms by Location 25

Displaying Alarms by Event Record ID 26

Displaying the Logging Correlation Buffer Size, Messages, and Rules 27

Clearing Alarm Event Records and Resetting Bistate Alarms 28

Defining SNMP Correlation Buffer Size 30

Defining SNMP Rulesets 31

Configuring SNMP Correlation Rules 31

Applying SNMP Correlation Rules 32

Applying SNMP Correlation Ruleset 33

Configuration Examples for Alarm Management and Logging Correlation 34

Increasing the Severity Level for Alarm Filtering to Display Fewer Events and Modifying the Alarm Buffer Size and Capacity Threshold: Example 34

Configuring a Nonstateful Correlation Rule to Permanently Suppress Node Status Messages: Example 34

Configuring a Stateful Correlation Rule for LINK UPDOWN and SONET ALARM Alarms: Example 36

Additional References 37

CHAPTER 2

Configuring and Managing Embedded Event Manager Policies 41

Prerequisites for Configuring and Managing Embedded Event Manager Policies 42

Information About Configuring and Managing Embedded Event Manager Policies 42

Event Management 42

System Event Detection 42

Policy-Based Event Response 43

Reliability Metrics 43

System Event Processing 43

Embedded Event Manager Management Policies 43

Embedded Event Manager Scripts and the Scripting Interface (Tcl) 44

Script Language 45

Regular Embedded Event Manager Scripts 45

Embedded Event Manager Callback Scripts 46

Embedded Event Manager Policy Tcl Command Extension Categories 46

Cisco File Naming Convention for Embedded Event Manager 47

Embedded Event Manager Built-in Actions 48

Application-specific Embedded Event Management 49

Event Detection and Recovery 49

General Flow of EEM Event Detection and Recovery 49

System Manager Event Detector 50

Timer Services Event Detector 51

Syslog Event Detector 51

None Event Detector 51

Watchdog System Monitor Event Detector 52

Distributed Event Detectors 53

Embedded Event Manager Event Scheduling and Notification 53

Reliability Statistics 53

Hardware Card Reliability Metric Data 53

Process Reliability Metric Data 54

How to Configure and Manage Embedded Event Manager Policies 55

Configuring Environmental Variables 55

Environment Variables 55

Registering Embedded Event Manager Policies 56

Embedded Event Manager Policies 56

How to Write Embedded Event Manager Policies Using Tcl 59

Registering and Defining an EEM Tcl Script 59

Displaying EEM Registered Policies 61

Unregistering EEM Policies 61

Suspending EEM Policy Execution 62

Managing EEM Policies 63

Displaying Software Modularity Process Reliability Metrics Using EEM 64

Sample EEM Policies 64

Programming EEM Policies with Tcl 66

Tcl Policy Structure and Requirements 66

EEM Entry Status 68

EEM Exit Status 68

EEM Policies and Cisco Error Number 69

_cerrno: 32-Bit Error Return Values 69

Error Class Encodings for XY 70

Creating an EEM User Tcl Library Index 74

Creating an EEM User Tcl Package Index 77

Configuration Examples for Event Management Policies 80

Environmental Variables Configuration: Example 80

User-Defined Embedded Event Manager Policy Registration: Example 80

Display Available Policies: Example 81

Display Embedded Event Manager Process: Example 81

Configuration Examples for Writing Embedded Event Manager Policies Using Tcl 82

EEM Event Detector Demo: Example 82

EEM Sample Policy Descriptions 82

Event Manager Environment Variables for the Sample Policies 82

Registration of Some EEM Policies 84

Basic Configuration Details for All Sample Policies 85

Using the Sample Policies 85

Running the sl_intf_down.tcl Sample Policy 85

Running the tm_cli_cmd.tcl Sample Policy 86

Running the tm_crash_reporter.tcl Sample Policy 86

Running the tm_fsys_usage.tcl Sample Policy 87

Programming Policies with Tcl: Sample Scripts Example 87

tm_cli_cmd.tcl Sample Policy 87

sl_intf_down.tcl Sample Policy 90

Tracing Tcl set Command Operations: Example 92

Additional References 92

Embedded Event Manager Policy Tcl Command Extension Reference 93

Embedded Event Manager Event Registration Tcl Command Extensions 94

event_register_appl 94

event_register_cli 95

event_register_config 97

event_register_counter 98

event_register_hardware 99

event_register_none 101

event_register_oir 102

event_register_process 103

event_register_snmp 105

event_register_snmp_notification 108

event_register_stat 109

event_register_syslog 112

event_register_timer 114

event_register_timer_subscriber 119

event_register_track 121

event_register_wdsysmon 123

Embedded Event Manager Event Information Tcl Command Extension 129

event_reqinfo 129

event_reqinfo_multi 145

Embedded Event Manager Event Publish Tcl Command Extension 145

event_publish appl 145

Embedded Event Manager Multiple Event Support Tcl Command Extensions 148

Attribute 148

Correlate 148

Trigger 149

Embedded Event Manager Action Tcl Command Extensions 150

action_process 150

action_program 152

action_script 153

action_setver_prior 153

action_setnode 154

action_syslog 154

action_track_read 155

Embedded Event Manager Utility Tcl Command Extensions 156

appl_read 156

appl_reqinfo 157

appl_setinfo 157

counter_modify 158

fts_get_stamp 159

register_counter 160

register_timer 161

timer_arm 163

timer_cancel 165

unregister_counter 166

Embedded Event Manager System Information Tcl Command Extensions 167

sys_reqinfo_cpu_all 167

sys_reqinfo_crash_history 168

sys_reqinfo_mem_all 169

sys_reqinfo_proc 171

sys_reqinfo_proc_all 173

sys_reqinfo_proc_version 173

sys_reqinfo_routername 174

sys_reqinfo_syslog_freq 174

sys_reqinfo_syslog_history 175

sys_reqinfo_stat 176

sys_reqinfo_snmp 177

sys_reqinfo_snmp_trap 178

sys_reqinfo_snmp_trapvar 178

SMTP Library Command Extensions 178

smtp_send_email 179

smtp_subst 180

CLI Library Command Extensions 181

cli_close 181

cli_exec 182

cli_get_ttyname 182

cli_open 183

cli_read 183

cli_read_drain 184

cli_read_line 184

cli_read_pattern 185

cli_write 186

Tcl Context Library Command Extensions 189

context_retrieve 189

context_save 192

CHAPTER 3

Implementing IP Service Level Agreements 195

Prerequisites for Implementing IP Service Level Agreements 196

Restrictions for Implementing IP Service Level Agreements 196

Information About Implementing IP Service Level Agreements 198

About IP Service Level Agreements Technology 198

Service Level Agreements 198

Benefits of IP Service Level Agreements 200

Measuring Network Performance with IP Service Level Agreements 200

Operation Types for IP Service Level Agreements 202

IP SLA Responder and IP SLA Control Protocol 203

Response Time Computation for IP SLA 204

IP SLA VRF Support 204

IP SLA Operation Scheduling 205

IP SLA—Proactive Threshold Monitoring 205

IP SLA Reaction Configuration 205

IP SLA Threshold Monitoring and Notifications 205

MPLS LSP Monitoring 205

How MPLS LSP Monitoring Works 206

BGP Next-hop Neighbor Discovery 206

IP SLA LSP Ping and LSP Traceroute Operations 207

Proactive Threshold Monitoring for MPLS LSP Monitoring 208

Multi-operation Scheduling for the LSP Health Monitor 208

LSP Path Discovery 208

How to Implement IP Service Level Agreements 209

Configuring IP Service Levels Using the UDP Jitter Operation 209

Enabling the IP SLA Responder on the Destination Device 209

Configuring and Scheduling a UDP Jitter Operation on the Source Device 210

Prerequisites for Configuring a UDP Jitter Operation on the Source Device 212

Configuring and Scheduling a Basic UDP Jitter Operation on the Source Device 212

Configuring and Scheduling a UDP Jitter Operation with Additional Characteristics 214

Configuring the IP SLA for a UDP Echo Operation 219

Prerequisites for Configuring a UDP Echo Operation on the Source Device 219

Configuring and Scheduling a UDP Echo Operation on the Source Device 219

Configuring and Scheduling a UDP Echo Operation with Optional Parameters on the Source Device 222

Configuring an ICMP Echo Operation 226

Configuring and Scheduling a Basic ICMP Echo Operation on the Source Device 226

Configuring and Scheduling an ICMP Echo Operation with Optional Parameters on the Source Device 229

Configuring the ICMP Path-echo Operation 232

Configuring and Scheduling a Basic ICMP Path-echo Operation on the Source Device 232

Configuring and Scheduling an ICMP Path-echo Operation with Optional Parameters on the Source Device 235

Configuring the ICMP Path-jitter Operation 238

Configuring and Scheduling a Basic ICMP Path-jitter Operation 239

Configuring and Scheduling an ICMP Path-jitter Operation with Additional Parameters 242

Configuring IP SLA MPLS LSP Ping and Trace Operations 246

Configuring and Scheduling an MPLS LSP Ping Operation 246

Configuring and Scheduling an MPLS LSP Trace Operation 250

Configuring IP SLA Reactions and Threshold Monitoring 254

Configuring Monitored Elements for IP SLA Reactions 254

Configuring Triggers for Connection-Loss Violations 254

Configuring Triggers for Jitter Violations 255

Configuring Triggers for Packet Loss Violations 255

Configuring Triggers for Round-Trip Violations 256

Configuring Triggers for Timeout Violations 257

Configuring Triggers for Verify Error Violations 258

Configuring Threshold Violation Types for IP SLA Reactions 259

Generating Events for Each Violation 260

Generating Events for Consecutive Violations 260

Generating Events for X of Y Violations 261

Generating Events for Averaged Violations 262

Specifying Reaction Events 263

Configuring the MPLS LSP Monitoring Instance on a Source PE Router 265

Configuring an MPLS LSP Monitoring Ping Instance 265

Configuring an MPLS LSP Monitoring Trace Instance 269

Configuring the Reaction Conditions for an MPLS LSP Monitoring Instance on a Source PE Router 273

Scheduling an MPLS LSP Monitoring Instance on a Source PE Router 275

LSP Path Discovery 276

Configuring tracking type (rtr) 279

Configuration Examples for Implementing IP Service Level Agreements 280

Configuring IP Service Level Agreements: Example 280

Configuring IP SLA Reactions and Threshold Monitoring: Example 281

Configuring IP SLA MPLS LSP Monitoring: Example 282

Configuring LSP Path Discovery: Example 282

Additional References 282

CHAPTER 4

Implementing Logging Services 285

Prerequisites for Implementing Logging Services 285

Information About Implementing Logging Services 286

System Logging Process 286

Format of System Logging Messages 286

Duplicate Message Suppression 287

Interruption of Message Suppression 287

Syslog Message Destinations 288

Guidelines for Sending Syslog Messages to Destinations Other Than the Console 289

Logging for the Current Terminal Session 289

Syslog Messages Sent to Syslog Servers 289

UNIX System Logging Facilities 289

Hostname Prefix Logging 290

Syslog Source Address Logging 291

UNIX Syslog Daemon Configuration 291

Archiving Logging Messages on a Local Storage Device 291

Setting Archive Attributes 291

Archive Storage Directories 292

Severity Levels 292

Logging History Table 293

Syslog Message Severity Level Definitions 294

Syslog Severity Level Command Defaults 294

How to Implement Logging Services 295

Setting Up Destinations for System Logging Messages 295

Configuring Logging to a Remote Server 296

Configuring the Settings for the Logging History Table 297

Modifying Logging to the Console Terminal and the Logging Buffer 298

Modifying the Format of Time Stamps 299

Disabling Time Stamps 301

Suppressing Duplicate Syslog Messages 302

Disabling the Logging of Link-Status Syslog Messages 302

Displaying System Logging Messages 303

Archiving System Logging Messages to a Local Storage Device 304

Configuration Examples for Implementing Logging Services 306

Configuring Logging to the Console Terminal and the Logging Buffer: Example 306

Setting Up Destinations for Syslog Messages: Example 307

Configuring the Settings for the Logging History Table: Example 307

Modifying Time Stamps: Example 307

Configuring a Logging Archive: Example 307

Where to Go Next 308

Additional References 308

CHAPTER 5

Onboard Failure Logging 311

Prerequisites 312

Information About Implementing OBFL 312

Data Collection Types 312

Baseline Data Collection 312

Event-Driven Data Collection 313

Supported Cards and Platforms 314

How to Implement OBFL 314

Enabling or Disabling OBFL 315

Configuring Message Severity Levels 316

Monitoring and Maintaining OBFL 316

Clearing OBFL Data 317

Configuration Examples for OBFL 318

Enabling and Disabling OBFL: Example 318

Configuring Message Severity Levels: Example 318

Clearing OBFL Messages: Example 319

Displaying OBFL Data: Example 319

Where to Go Next 319

Additional References 319

CHAPTER 6

Implementing Performance Management 323

Prerequisites for Implementing Performance Management 324

Information About Implementing Performance Management 324

PM Functional Overview 324

PM Statistics Server 324

PM Statistics Collector 324

PM Benefits 325

PM Statistics Collection Overview 326

PM Statistics Collection Templates 326

Guidelines for Creating PM Statistics Collection Templates 327

Guidelines for Enabling and Disabling PM Statistics Collection Templates 327

Exporting Statistics Data 328

Binary File Format 328

Binary File ID Assignments for Entity, Subentity, and StatsCounter Names 329

Filenaming Convention Applied to Binary Files 333

PM Entity Instance Monitoring Overview 333

PM Threshold Monitoring Overview 337

Guidelines for Creating PM Threshold Monitoring Templates 337

Guidelines for Enabling and Disabling PM Threshold Monitoring Templates 350

How to Implement Performance Management 351

Configuring an External TFTP Server for PM Statistic Collections 351

Configuring Local Disk Dump for PM Statistics Collections 351

Configuring Instance Filtering by Regular-expression 352

Creating PM Statistics Collection Templates 353

Enabling and Disabling PM Statistics Collection Templates 354

Enabling PM Entity Instance Monitoring 356

Creating PM Threshold Monitoring Templates 356

Enabling and Disabling PM Threshold Monitoring Templates 357

Configuration Examples for Implementing Performance Management 359

Creating and Enabling PM Statistics Collection Templates: Example 359

Creating and Enabling PM Threshold Monitoring Templates: Example 360

Additional References 360

Preface

From Release 6.1.1 onwards, Cisco introduces support for the 64-bit Linux-based IOS XR operating system. Extensive feature parity is maintained between the 32-bit and 64-bit environments. Unless explicitly marked otherwise, the contents of this document are applicable for both the environments. For more details on Cisco IOS XR 64 bit, refer to the Release Notes for Cisco ASR 9000 Series Routers, Release 6.1.1 document.

The Cisco ASR 9000 Series Aggregation Services Router System Monitoring Configuration Guide preface contains these sections:

• Changes to This Document, page xv

• Obtaining Documentation and Submitting a Service Request, page xv

Changes to This Document

This table lists the technical changes made to this document since it was first printed.

Table 1: Changes to This Document

Revision       Date             Change Summary
OL-26513-02    June 2012        Republished for Cisco IOS XR Release 4.2.1.
OL-26513-01    December 2011    Initial release of this document.

Obtaining Documentation and Submitting a Service Request

For information on obtaining documentation, using the Cisco Bug Search Tool (BST), submitting a service request, and gathering additional information, see What's New in Cisco Product Documentation.

To receive new and revised Cisco technical content directly to your desktop, you can subscribe to the What's New in Cisco Product Documentation RSS feed. RSS feeds are a free service.

CHAPTER 1

Implementing and Monitoring Alarms and Alarm Log Correlation

This module describes the concepts and tasks related to configuring alarm log correlation and monitoring alarm logs and correlated event records. Alarm log correlation extends system logging to include the ability to group and filter messages generated by various applications and system servers and to isolate root messages on the router.

This module describes the new and revised tasks you need to perform to implement logging correlation and monitor alarms on your network.

Note

For more information about system logging on Cisco IOS XR Software and complete descriptions of the alarm management and logging correlation commands listed in this module, see the Related Documents section of this module, on page 37.

To locate documentation for other commands that might appear in the course of performing a configuration task, search online in the Cisco ASR 9000 Series Aggregation Services Router Commands Master List.

Feature History for Implementing and Monitoring Alarms and Alarm Log Correlation

Release          Modification
Release 3.7.2    This feature was introduced.
Release 3.8.0    SNMP alarm correlation feature was added.

• Prerequisites for Implementing and Monitoring Alarms and Alarm Log Correlation, page 2

• Information About Implementing Alarms and Alarm Log Correlation, page 2

• How to Implement and Monitor Alarm Management and Logging Correlation, page 9

• Configuration Examples for Alarm Management and Logging Correlation, page 34

• Additional References, page 37

Prerequisites for Implementing and Monitoring Alarms and Alarm Log Correlation

You must be in a user group associated with a task group that includes the proper task IDs. The command reference guides include the task IDs required for each command. If you suspect user group assignment is preventing you from using a command, contact your AAA administrator for assistance.

Information About Implementing Alarms and Alarm Log Correlation

Alarm Logging and Debugging Event Management System

Cisco IOS XR Software Alarm Logging and Debugging Event Management System (ALDEMS) is used to monitor and store alarm messages that are forwarded by system servers and applications. In addition, ALDEMS correlates alarm messages forwarded due to a single root cause.

ALDEMS extends the basic logging and monitoring functionality of Cisco IOS XR Software, providing the level of alarm and event management necessary for a highly distributed system.

Cisco IOS XR Software achieves this necessary level of alarm and event management by distributing logging applications across the nodes on the system.

Figure 1: ALDEMS Component Communications, on page 3 illustrates the relationship between the components that constitute ALDEMS.

Figure 1: ALDEMS Component Communications

Correlator

The correlator receives messages from system logging (syslog) helper processes that are distributed across the nodes on the router and forwards syslog messages to the syslog process. If a logging correlation rule is configured, the correlator captures messages and searches for a match with any message specified in the rule. If the correlator finds a match, it starts a timer that corresponds to the timeout interval specified in the rule. The correlator continues searching for matches to messages in the rule until the timer expires. If the root-cause message was received, a correlation occurs; otherwise, all captured messages are forwarded to the syslog. When a correlation occurs, the correlated messages are stored in the logging correlation buffer. The correlator tags each set of correlated messages with a correlation ID.

Note

For more information about logging correlation, see the Logging Correlation section, on page 4.
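The capture-and-timeout behavior described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the router's implementation: the class names, the millisecond clock passed in by the caller, and the choice to forward the root-cause message when the timer expires are all illustrative, and the message identifiers are sample values.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MsgId:
    """Message identification: category, group name, and code."""
    category: str
    group: str
    code: str


@dataclass
class Rule:
    name: str
    timeout_ms: int
    root: MsgId
    non_root: frozenset  # set of MsgId values treated as non-root-cause


@dataclass
class Window:
    """Messages captured for one rule while its timer is running."""
    deadline_ms: int
    captured: list = field(default_factory=list)


class Correlator:
    """Hypothetical single-rule correlator (names are illustrative)."""

    def __init__(self, rule):
        self.rule = rule
        self.window = None
        self.syslog = []              # messages forwarded to the syslog process
        self.correlation_buffer = {}  # correlation ID -> correlated messages
        self._next_id = 1

    def receive(self, msg, now_ms):
        self.expire(now_ms)
        if msg != self.rule.root and msg not in self.rule.non_root:
            self.syslog.append(msg)   # not named in the rule: pass through
            return
        if self.window is None:       # first match starts the timer
            self.window = Window(deadline_ms=now_ms + self.rule.timeout_ms)
        self.window.captured.append(msg)

    def expire(self, now_ms):
        if self.window is None or now_ms < self.window.deadline_ms:
            return
        captured, self.window = self.window.captured, None
        if self.rule.root in captured:
            # Correlation occurs: store the set under a correlation ID.
            self.correlation_buffer[self._next_id] = captured
            self._next_id += 1
            self.syslog.append(self.rule.root)  # assumption: forwarded at expiry
        else:
            # No root-cause message arrived: release everything captured.
            self.syslog.extend(captured)
```

In this sketch a window that closes without the root-cause message releases its captured messages unchanged, which mirrors the "otherwise" branch in the paragraph above.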

System Logging Process

By default, routers are configured to send system logging messages to a system logging (syslog) process. Syslog messages are gathered by syslog helper processes that are distributed across the nodes on the system. The system logging process controls the distribution of logging messages to the various destinations, such as the system logging buffer, the console, terminal lines, or a syslog server, depending on the network device configuration.

Alarm Logger

The alarm logger is the final destination for system logging messages forwarded on the router. The alarm logger stores alarm messages in the logging events buffer. The logging events buffer is circular; that is, when full, it overwrites the oldest messages in the buffer.

Note

Alarms are prioritized in the logging events buffer. When it is necessary to overwrite an alarm record, the logging events buffer overwrites messages in the following order: nonbistate alarms first, then bistate alarms in the CLEAR state, and, finally, bistate alarms in the SET state. For more information about bistate alarms, see the Bistate Alarms, on page 6 section.

When the table becomes full of messages caused by bistate alarms in the SET state, the earliest bistate message (based on the message time stamp, not arrival time) is reclaimed before others. The buffer sizes for the logging events buffer and the logging correlation buffer should therefore be adjusted so that memory consumption is within your requirements.

A table-full alarm is generated each time the logging events buffer wraps around. A threshold crossing notification is generated each time the logging events buffer reaches the capacity threshold.

Messages stored in the logging events buffer can be queried by clients to locate records matching specific criteria. The alarm logging mechanism assigns a sequential, unique ID to each alarm message.
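The overwrite order and the sequential ID assignment can be illustrated with a small Python model. It is a sketch only: the class and field names are hypothetical, the table-full alarm and capacity threshold notification are not modeled, and breaking ties by time stamp within the nonbistate and CLEAR groups is an assumption (the text states the time-stamp rule only for bistate alarms in the SET state).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    event_id: int
    timestamp: float
    bistate: bool = False
    state: Optional[str] = None  # "SET" or "CLEAR" for bistate alarms


def eviction_rank(alarm):
    """Lower rank is overwritten first: nonbistate, bistate CLEAR, bistate SET."""
    if not alarm.bistate:
        return 0
    return 1 if alarm.state == "CLEAR" else 2


class EventsBuffer:
    """Hypothetical fixed-size logging events buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self._next_id = 1  # sequential, unique ID per alarm message

    def add(self, timestamp, bistate=False, state=None):
        if len(self.records) >= self.capacity:
            # Pick the record with the lowest rank, then the earliest time stamp.
            victim = min(self.records,
                         key=lambda a: (eviction_rank(a), a.timestamp))
            self.records.remove(victim)
        alarm = Alarm(self._next_id, timestamp, bistate, state)
        self._next_id += 1
        self.records.append(alarm)
        return alarm
```

Because the victim is chosen by message time stamp, a late-arriving SET alarm with an old time stamp is reclaimed before SET alarms that arrived earlier but carry newer time stamps.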

Logging Correlation

Logging correlation can be used to isolate the most significant root messages for events affecting system performance. For example, the original message describing the online insertion and removal (OIR) of a card can be isolated so that only the root-cause message is displayed and all subsequent messages related to the same event are correlated. When correlation rules are configured, a common root event that is generating secondary (non-root-cause) messages can be isolated and sent to the syslog, while secondary messages are suppressed. An operator can retrieve all correlated messages from the logging correlator buffer to view correlation events that have occurred.

Correlation Rules

Correlation rules can be configured to isolate root messages that may generate system alarms. Correlation rules prevent unnecessary stress on ALDEMS caused by the accumulation of unnecessary messages. Each correlation rule hinges on a message identification, consisting of a message category, message group name, and message code. The correlator process scans messages for occurrences of the message.

If the correlator receives a root message, the correlator stores it in the logging correlator buffer and forwards it to the syslog process on the RP. From there, the syslog process forwards the root message to the alarm logger, where it is stored in the logging events buffer. From the syslog process, the root message may also be forwarded to destinations such as the console, remote terminals, remote servers, the fault management system, and the Simple Network Management Protocol (SNMP) agent, depending on the network device configuration. Subsequent messages meeting the same criteria (including another occurrence of the root message) are stored in the logging correlation buffer and are forwarded to the syslog process on the router.

If a message matches multiple correlation rules, all matching rules apply and the message becomes a part of all matching correlation queues in the logging correlator buffer.

The following message fields are used to define a message in a logging correlation rule:

• Message category

• Message group

• Message code

Wildcards can be used for any of the message fields to cover a wider set of messages. Configure the appropriate set of messages in a logging correlation rule configuration to achieve correlation with a narrow or wide scope (depending on your objective).
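As a rough illustration of narrow versus wide scope, the following Python sketch matches a message against a category, group, and code triplet and lists every rule that matches, since a message can belong to more than one rule. The "*" wildcard token, the type and function names, and the sample identifiers are assumptions made for the example; this section does not state the wildcard syntax accepted by the router.

```python
from typing import NamedTuple

WILDCARD = "*"  # assumed wildcard token for this sketch


class MsgSpec(NamedTuple):
    category: str
    group: str
    code: str


def field_matches(pattern, value):
    """A field matches if the pattern is the wildcard or equals the value."""
    return pattern == WILDCARD or pattern == value


def spec_matches(spec, msg):
    """Compare category, group, and code field by field."""
    return all(field_matches(p, v) for p, v in zip(spec, msg))


def matching_rules(rules, msg):
    """Return the names of every rule that lists a spec matching msg."""
    return [name for name, specs in rules.items()
            if any(spec_matches(s, msg) for s in specs)]
```

A spec with wildcards in the group and code fields matches every message in a category, while a fully specified triplet matches exactly one message identification.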

Types of Correlation

There are two types of correlation that are configured in rules to isolate root-cause messages:

Nonstateful Correlation—This correlation is fixed after it has occurred, and non-root-cause alarms that are suppressed are never forwarded to the syslog process. All non-root-cause alarms remain buffered in correlation buffers.

Stateful Correlation—This correlation can change after it has occurred, if the bistate root-cause alarm clears. When the alarm clears, all the correlated non-root-cause alarms are sent to syslog and are removed from the correlation buffer. Stateful correlations are useful to detect non-root-cause conditions that continue to exist even if the suspected root cause no longer exists.
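The difference between the two types shows up when the bistate root-cause alarm clears. The Python sketch below models only that moment; the class and method names are hypothetical and the message strings are placeholders.

```python
class Correlation:
    """Hypothetical record of one correlation and its buffered alarms."""

    def __init__(self, stateful, root, non_root):
        self.stateful = stateful
        self.root = root
        self.buffered = list(non_root)  # suppressed non-root-cause alarms

    def on_root_clear(self):
        """Return the alarms released to syslog when the root-cause alarm clears."""
        if not self.stateful:
            # Nonstateful: the correlation is fixed; alarms stay buffered.
            return []
        # Stateful: release the alarms and remove them from the buffer.
        released, self.buffered = self.buffered, []
        return released
```

With a stateful correlation, a non-root-cause condition that outlives its suspected root cause becomes visible in syslog as soon as the root-cause alarm clears.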

Application of Rules and Rule Sets

If a correlation rule is applied to the entire router, then correlation takes place only for those messages that match the configured cause values for the rule, regardless of the context or location setting of that message.

If a correlation rule is applied to a specific set of contexts or locations, then correlation takes place only for those messages that match the configured cause values for the rule and that match at least one of those contexts or locations.

In the case of a rule-set application, the behavior is the same; however, the apply configuration takes place for all rules that are part of the given rule set.

The show logging correlator rule command is used to display apply settings for a given rule, including those settings that have been configured with the logging correlator apply ruleset command.
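The scoping behavior described above can be modeled with a short Python sketch. The RuleApplication class, the in_scope function, and the context and location names are hypothetical and exist only for illustration; matching the configured cause values is a separate check that is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class RuleApplication:
    """Where a correlation rule has been applied (illustrative model)."""
    entire_router: bool = False
    contexts: set = field(default_factory=set)
    locations: set = field(default_factory=set)


def in_scope(app: RuleApplication, context: str, location: str) -> bool:
    """Return True if a message from this context and location is in scope."""
    if app.entire_router:
        # Router-wide application ignores the context and location.
        return True
    # Otherwise the message must match at least one context or location.
    return context in app.contexts or location in app.locations


router_wide = RuleApplication(entire_router=True)
scoped = RuleApplication(contexts={"ctx1"}, locations={"0/1/CPU0"})

print(in_scope(router_wide, "ctx9", "0/5/CPU0"))
print(in_scope(scoped, "ctx1", "0/5/CPU0"))
print(in_scope(scoped, "ctx9", "0/5/CPU0"))
```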

Root Message and Correlated Messages

When a correlation rule is configured and applied, the correlator starts searching for a message match as specified in the rule. After a match is found, the correlator starts a timer corresponding to the timeout interval that is also specified in the rule. A message search for a match continues until the timer expires. Correlation occurs after the root-cause message is received.

The first message (with category, group, and code triplet) configured in a correlation rule defines the root-cause message. A root-cause message is always forwarded to the syslog process. See the Correlation Rules section to learn how the root-cause message is forwarded and stored.

Alarm Severity Level and Filtering

Filter settings can be used to display information based on severity level. The alarm filter display indicates the severity level settings used to report alarms, the number of records, and the current and maximum log size.

Alarms can be filtered according to the severity level shown in this table.

Table 2: Alarm Severity Levels for Event Logging

Severity Level    System Condition
0                 Emergencies
1                 Alerts
2                 Critical
3                 Errors
4                 Warnings
5                 Notifications
6                 Informational

Bistate Alarms

Bistate alarms are generated by state changes associated with system hardware, such as a change of interface state from active to inactive, the online insertion and removal (OIR) of a card, or a change in component temperature. Bistate alarm events are reported to the logging events buffer by default; informational and debug messages are not.

Cisco IOS XR software provides the ability to reset and clear alarms. Clients interested in monitoring alarms in the system can register with the alarm logging mechanism to receive asynchronous notifications when a monitored alarm changes state.

Bistate alarm notifications provide the following information:

• The alarm state, which may be in the set state or the clear state.

Capacity Threshold Setting for Alarms

The capacity threshold setting determines when the alarm system begins reporting threshold crossing alarms. The capacity threshold for generating warning alarms is generally set at 80 percent of buffer capacity, but individual configurations may require different settings.

Hierarchical Correlation

Hierarchical correlation takes effect when the following conditions are true:

• A single alarm is both a root cause for one rule and a non-root cause for another rule.
• Alarms are generated that result in successful correlations associated with both rules.

The following example illustrates two hierarchical correlation rules:

Rule 1              Category    Group      Code
Root Cause 1        Cat 1       Group 1    Code 1
Non-root Cause 2    Cat 2       Group 2    Code 2

Rule 2              Category    Group      Code
Root Cause 2        Cat 2       Group 2    Code 2
Non-root Cause 3    Cat 3       Group 3    Code 3

If three alarms are generated for Cause 1, 2, and 3, with all alarms arriving within their respective correlation timeout periods, then the hierarchical correlation appears like this:

Cause 1 -> Cause 2 -> Cause 3

The correlation buffers show two separate correlations: one for Cause 1 and Cause 2 and the second for Cause 2 and Cause 3. However, the hierarchical relationship is implicitly defined.

Note: Stateful behavior, such as reparenting and reissuing of alarms, is supported for rules that are defined as stateful; that is, correlations that can change.
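A minimal Python sketch can show how two separate correlations imply a hierarchy. It is an illustration under stated assumptions, not the correlator's algorithm: the Rule type and the correlations and hierarchy functions are hypothetical names, timeouts are ignored, and the rules are assumed not to form a cycle.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    root: str
    non_roots: tuple


def correlations(rules, received):
    """Return (root, non-root) pairs for rules whose alarms were all received."""
    pairs = []
    for rule in rules:
        if rule.root in received:
            pairs.extend((rule.root, c) for c in rule.non_roots if c in received)
    return pairs


def hierarchy(pairs):
    """Derive root-to-leaf chains implied by the separate correlations."""
    children = {}
    for parent, child in pairs:
        children.setdefault(parent, []).append(child)
    all_children = {child for _, child in pairs}
    chains = []

    def walk(node, path):
        path = path + [node]
        kids = children.get(node, [])
        if not kids:
            chains.append(" -> ".join(path))
        for kid in kids:
            walk(kid, path)

    # Start only from alarms that are not a non-root cause of another rule.
    for top in (p for p in children if p not in all_children):
        walk(top, [])
    return chains


rules = [
    Rule("Rule 1", "Cause 1", ("Cause 2",)),
    Rule("Rule 2", "Cause 2", ("Cause 3",)),
]
pairs = correlations(rules, {"Cause 1", "Cause 2", "Cause 3"})
print(pairs)             # two separate correlations
print(hierarchy(pairs))  # the implied chain
```

The pairs list corresponds to the two separate correlations held in the buffers, and the chain is the implicit relationship between them.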

Context Correlation Flag

The context correlation flag allows correlations to take place on a “per context” basis or not.

This flag causes behavior change only if the rule is applied to one or more contexts. It does not go into effect if the rule is applied to the entire router or location nodes.

The following is a scenario of context correlation behavior:

• Rule 1 has a root cause A and an associated non-root cause B.
• Context correlation flag is not set on Rule 1.
• Rule 1 is applied to contexts 1 and 2.

If the context correlation flag is not set on Rule 1, alarm A generated from context 1 and alarm B generated from context 2 are correlated, because the rule applies to both contexts regardless of which context each alarm comes from.

If the context correlation flag is now set on Rule 1 and the same alarms are generated, they are not correlated as they are from different contexts.

With the flag set, the correlator analyzes alarms against the rule only if alarms arrive from the same context. In other words, if alarm A is generated from context 1 and alarm B is generated from context 2, then a correlation does not occur.
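The scenario can be expressed as a small Python predicate. This is a sketch for illustration only; the can_correlate function and its parameters are hypothetical names, and the rule is assumed to be applied to contexts rather than to the entire router or to location nodes.

```python
def can_correlate(context_flag, applied_contexts, root_context, non_root_context):
    """Return True if a root and a non-root alarm may be correlated.

    Both alarms must come from contexts the rule is applied to. When the
    context correlation flag is set, they must also share the same context.
    """
    if root_context not in applied_contexts:
        return False
    if non_root_context not in applied_contexts:
        return False
    if context_flag:
        return root_context == non_root_context
    return True


applied = {"context1", "context2"}

# Alarm A from context 1, alarm B from context 2.
print(can_correlate(False, applied, "context1", "context2"))  # flag not set
print(can_correlate(True, applied, "context1", "context2"))   # flag set
print(can_correlate(True, applied, "context1", "context1"))   # same context
```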

Duration Timeout Flags

The root-cause timeout (if specified) is an alternative rule timeout that is used when a non-root-cause alarm arrives before the root-cause alarm for the given rule. It is typically set shorter than the rule timeout, on the assumption that the root-cause alarm is then less likely to arrive, so that the hold on the non-root-cause alarms is released sooner.
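The choice between the two timeouts can be sketched as follows. The function and parameter names are hypothetical, and the millisecond values are example figures, not defaults.

```python
def effective_timeout_ms(rule_timeout_ms, rootcause_timeout_ms, first_alarm_is_root):
    """Pick the timeout that applies once the first matching alarm arrives.

    rootcause_timeout_ms is None when no root-cause timeout is configured.
    """
    if not first_alarm_is_root and rootcause_timeout_ms is not None:
        # A non-root-cause alarm arrived first: use the alternative timeout.
        return rootcause_timeout_ms
    return rule_timeout_ms


print(effective_timeout_ms(60000, 10000, first_alarm_is_root=True))
print(effective_timeout_ms(60000, 10000, first_alarm_is_root=False))
print(effective_timeout_ms(60000, None, first_alarm_is_root=False))
```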

Reparent Flag

The reparent flag specifies what happens to non-root-cause alarms in a hierarchical correlation when their immediate root cause clears.

The following example illustrates context correlation behavior:

• Rule 1 has a root cause A and an associated non-root cause B.
• Context correlation flag is not set on Rule 1.
• Rule 1 is applied to contexts 1 and 2.

In this scenario, if alarm A is generated from context 1 and alarm B is generated from context 2, then a correlation occurs regardless of context.

If the context correlation flag is now set on Rule 1 and the same alarms are generated, they are not correlated, because they are from different contexts.

Reissue Nonbistate Flag

The reissue nonbistate flag controls whether nonbistate alarms (events) are forwarded from the correlator log if their parent bistate root-cause alarm clears. Active bistate non-root-causes are always forwarded in this situation, because the condition is still present.

The reissue-nonbistate flag allows you to control whether non-bistate alarms are forwarded.
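The effect of the flag when a bistate root-cause alarm clears can be modeled in Python. This is a sketch, not the correlator's implementation: the NonRootAlarm type, the function name, and the alarm names are hypothetical, and the handling of a bistate non-root cause that has already cleared (it is not forwarded) is an assumption.

```python
from typing import NamedTuple


class NonRootAlarm(NamedTuple):
    name: str
    bistate: bool
    active: bool = False  # meaningful only for bistate alarms


def forwarded_on_root_clear(non_roots, reissue_nonbistate):
    """Return names of non-root-cause alarms forwarded when the root clears."""
    forwarded = []
    for alarm in non_roots:
        if alarm.bistate:
            # Active bistate non-root causes are always forwarded.
            # Assumption: a bistate alarm that already cleared is not.
            if alarm.active:
                forwarded.append(alarm.name)
        elif reissue_nonbistate:
            # Nonbistate alarms (events) depend on the flag.
            forwarded.append(alarm.name)
    return forwarded


held = [
    NonRootAlarm("LINK_DOWN", bistate=True, active=True),
    NonRootAlarm("CONFIG_EVENT", bistate=False),
]
print(forwarded_on_root_clear(held, reissue_nonbistate=False))
print(forwarded_on_root_clear(held, reissue_nonbistate=True))
```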

Internal Rules

Internal rules are defined on Cisco IOS XR Software and are used by protocols and processes within Cisco IOS XR Software. These rules are not customer configurable, but you may view them by using the show logging correlator rule command. All internal rule names are prefixed with [INTERNAL].

SNMP Alarm Correlation

In large-scale systems, such as a Cisco IOS XR multi-chassis system, many SNMP traps may be emitted at regular intervals. These traps, in turn, add to the time that Cisco IOS XR spends processing traps.

The additional traps can also slow down troubleshooting and increase the workload for monitoring systems and operators. The SNMP alarm correlation feature addresses these issues.

The objectives of the SNMP alarm correlation feature are to:

• Extract the generic pieces of correlation functionality from the existing syslog correlator
• Create DLLs and APIs suitable for reusing the functionality in other components
• Integrate the SNMP agent with the DLLs to enable SNMP trap correlation

How to Implement and Monitor Alarm Management and Logging Correlation

Configuring Logging Correlation Rules

This task explains how to configure logging correlation rules.

The purpose of configuring logging correlation rules is to define the root cause and non-root-cause alarm messages (with message category, group, and code combinations) for logging correlation. The originating root-cause alarm message is forwarded to the syslog process, and all subsequent (non-root-cause) alarm messages are sent to the logging correlation buffer.

The fields inside a message that can be used for configuring correlation rules are as follows:

• Message category (for example, PKT_INFRA, MGBL, or OS)
• Message group (for example, LINK, LINEPROTO, or OIR)
• Message code (for example, UPDOWN or GO_ACTIVE)

The logging correlator mechanism, running on the active route processor, begins queueing messages matching the ones specified in the correlation rules for the time specified in the timeout interval of the correlation rule.

The timeout interval begins when the correlator captures any alarm message specified for a given rule.
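The queueing window can be illustrated with a simplified Python sketch. It assumes timestamps in milliseconds, treats the end of the window as inclusive, and uses hypothetical names (capture_window, the single-letter message identifiers); the actual correlator logic is not described at this level in the guide.

```python
def capture_window(events, rule_messages, timeout_ms):
    """Return the messages a rule captures during its timeout window.

    events is a list of (timestamp_ms, message) tuples sorted by time.
    The window opens when the first message named in the rule arrives.
    """
    start = None
    captured = []
    for timestamp, message in events:
        if message not in rule_messages:
            continue  # not part of this rule, never queued for it
        if start is None:
            start = timestamp  # first matching alarm starts the timer
        if timestamp - start <= timeout_ms:
            captured.append(message)
    return captured


events = [(0, "A"), (100, "X"), (500, "B"), (70000, "B")]
print(capture_window(events, {"A", "B"}, timeout_ms=60000))
```

The message at 70000 ms falls outside the 60000 ms window that opened at time 0, so it is not captured.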

SUMMARY STEPS

1. configure
2. logging correlator rule correlation-rule { type { stateful | nonstateful }}
3. timeout [ milliseconds ]
4. commit
5. show logging correlator rule { all | correlation-rule1...correlation-rule14 } [ context context1...context6 ] [ location node-id1...node-id6 ] [ rulesource { internal | user }] [ ruletype { nonstateful | stateful }] [ summary | detail ]

DETAILED STEPS

Step 1
Command or Action: configure

Step 2
Command or Action: logging correlator rule correlation-rule { type { stateful | nonstateful }}
Example:
RP/0/RSP0/CPU0:router(config)# logging correlator rule rule_stateful type stateful
Purpose: Configures a logging correlation rule.
• Stateful correlations can change specifically if the root-cause alarm is bistate.
• Nonstateful correlations cannot change. All non-root-cause alarms remain in the correlation buffers.

Step 3
Command or Action: timeout [ milliseconds ]
Example:
RP/0/RSP0/CPU0:router(config-corr-rule-st)# timeout 60000
Purpose: Specifies the collection period duration time for the logging correlator rule message.
• Timeout begins when the first alarm message identified by the correlation rule is logged.

Step 4
Command or Action: commit

Step 5
Command or Action: show logging correlator rule { all | correlation-rule1...correlation-rule14 } [ context context1...context6 ] [ location node-id1...node-id6 ] [ rulesource { internal | user }] [ ruletype { nonstateful | stateful }] [ summary | detail ]
Example:
RP/0/RSP0/CPU0:router# show logging correlator rule all
Purpose: (Optional) Displays defined correlation rules.
• The output describes the configuration of each rule name, including the message category, group, and code information.

Configuring Logging Correlation Rule Sets

This task explains how to configure logging correlation rule sets.

SUMMARY STEPS

1. configure
2. logging correlator ruleset ruleset
3. rulename rulename
4. commit
5. show logging correlator ruleset { all | correlation-ruleset1...correlation-ruleset14 } [ detail | summary ]

DETAILED STEPS

Step 1
Command or Action: configure

Step 2
Command or Action: logging correlator ruleset ruleset
Example:
RP/0/RSP0/CPU0:router(config)# logging correlator ruleset ruleset1
Purpose: Configures a logging correlation rule set.

Step 3
Command or Action: rulename rulename
Example:
RP/0/RSP0/CPU0:router(config-corr-ruleset)# rulename stateful_rule
Purpose: Configures a rule name.

Step 4
Command or Action: commit

Step 5
Command or Action: show logging correlator ruleset { all | correlation-ruleset1...correlation-ruleset14 } [ detail | summary ]
Example:
RP/0/RSP0/CPU0:router# show logging correlator ruleset all
Purpose: (Optional) Displays defined correlation rule sets.

Configuring Root-cause and Non-root-cause Alarms

To correlate a root cause to one or more non-root-cause alarms and configure them to a rule, use the rootcause and nonrootcause commands specified for the correlation rule.

SUMMARY STEPS

1. configure
2. logging correlator rule correlation-rule { type { stateful | nonstateful }}
3. rootcause { msg-category group-name msg-code }
4. nonrootcause
5. alarm msg-category group-name msg-code
6. commit
7. show logging correlator rule { all | correlation-rule1...correlation-rule14 } [ context context1...context6 ] [ location node-id1...node-id6 ] [ rulesource { internal | user }] [ ruletype { nonstateful | stateful }] [ summary | detail ]

DETAILED STEPS

Step 1
Command or Action: configure

Step 2
Command or Action: logging correlator rule correlation-rule { type { stateful | nonstateful }}
Example:
RP/0/RSP0/CPU0:router(config)# logging correlator rule rule_stateful type stateful
Purpose: Configures a logging correlation rule and enters submodes for stateful and nonstateful rule types.
• Stateful correlations can change specifically if the root-cause alarm is bistate.
• Nonstateful correlations cannot change. All non-root-cause alarms remain in the correlation buffers.

Step 3
Command or Action: rootcause { msg-category group-name msg-code }
Example:
RP/0/RSP0/CPU0:router(config-corr-rule-st)# rootcause CAT_BI_1 GROUP_BI_1 CODE_BI_1
Purpose: Configures a root-cause alarm message.
• This example specifies a root-cause alarm under stateful configuration mode.

Step 4
Command or Action: nonrootcause
Example:
RP/0/RSP0/CPU0:router(config-corr-rule-st)# nonrootcause
Purpose: Enters the non-root-cause configuration mode.

Step 5
Command or Action: alarm msg-category group-name msg-code
Example:
RP/0/RSP0/CPU0:router(config-corr-rule-st-nonrc)# alarm CAT_BI_2 GROUP_BI_2 CODE_BI_2
Purpose: Specifies a non-root-cause alarm message.
• This command can be issued with the nonrootcause command, such as nonrootcause alarm msg-category group-name msg-code.

Step 6
Command or Action: commit

Step 7
Command or Action: show logging correlator rule { all | correlation-rule1...correlation-rule14 } [ context context1...context6 ] [ location node-id1...node-id6 ] [ rulesource { internal | user }] [ ruletype { nonstateful | stateful }] [ summary | detail ]
Example:
RP/0/RSP0/CPU0:router# show logging correlator rule all
Purpose: (Optional) Displays the correlator rules that are defined.

Configuring Hierarchical Correlation Rule Flags

Hierarchical correlation is when a single alarm is both a root cause for one correlation rule and a non-root cause for another rule, and when alarms are generated resulting in a successful correlation associated with both rules. What happens to a non-root-cause alarm hinges on the behavior of its correlated root-cause alarm.

There are cases in which you want to control the stateful behavior associated with these hierarchies and to implement flags, such as reparenting and reissuing of nonbistate alarms. This task explains how to implement these flags.

See the Reparent Flag and Reissue Nonbistate Flag sections for detailed information about these flags.

SUMMARY STEPS

1. configure
2. logging correlator rule correlation-rule { type { stateful | nonstateful }}
3. reissue-nonbistate
4. reparent
5. commit
6. show logging correlator rule { all | correlation-rule1...correlation-rule14 } [ context context1...context6 ] [ location node-id1...node-id6 ] [ rulesource { internal | user }] [ ruletype { nonstateful | stateful }] [ summary | detail ]

DETAILED STEPS

Step 1
Command or Action: configure

Step 2
Command or Action: logging correlator rule correlation-rule { type { stateful | nonstateful }}
Example:
RP/0/RSP0/CPU0:router(config)# logging correlator rule rule_stateful type stateful
Purpose: Configures a logging correlation rule.
• Stateful correlations can change specifically if the root-cause alarm is bistate.
• Nonstateful correlations cannot change. All non-root-cause alarms remain in the correlation buffers.

Step 3
Command or Action: reissue-nonbistate
Example:
RP/0/RSP0/CPU0:router(config-corr-rule-st)# reissue-nonbistate
Purpose: Issues nonbistate alarm messages (events) from the correlator log after their root-cause alarm clears.

Step 4
Command or Action: reparent
Example:
RP/0/RSP0/CPU0:router(config-corr-rule-st)# reparent
Purpose: Specifies the behavior of non-root-cause alarms after a root-cause parent clears.

Step 5
Command or Action: commit

Step 6
Command or Action: show logging correlator rule { all | correlation-rule1...correlation-rule14 } [ context context1...context6 ] [ location node-id1...node-id6 ] [ rulesource { internal | user }] [ ruletype { nonstateful | stateful }] [ summary | detail ]
Example:
RP/0/RSP0/CPU0:router# show logging correlator rule all
Purpose: (Optional) Displays the correlator rules that are defined.

What to Do Next

To activate a defined correlation rule and rule set, you must apply them by using the logging correlator apply rule and logging correlator apply ruleset commands.

Applying Logging Correlation Rules

This task explains how to apply logging correlation rules.

Applying a correlation rule activates it and gives it a scope. A single correlation rule can be applied to multiple scopes on the router; that is, a rule can be applied to the entire router, to several locations, or to several contexts.

Note: When a rule is applied, or when a rule set that contains the rule is applied, the rule definition cannot be modified through the configuration until the rule or rule set is unapplied.
