Juniper Networks, Inc.
1133 Innovation Way
Sunnyvale, California 94089
USA
408-745-2000
www.juniper.net
Juniper Networks, the Juniper Networks logo, Juniper, and Junos are registered trademarks of Juniper Networks, Inc. in
the United States and other countries. All other trademarks, service marks, registered marks, or registered service marks
are the property of their respective owners.
Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right
to change, modify, transfer, or otherwise revise this publication without notice.
The information in this document is current as of the date on the title page.
YEAR 2000 NOTICE
Juniper Networks hardware and software products are Year 2000 compliant. Junos OS has no known time-related
limitations through the year 2038. However, the NTP application is known to have some difficulty in the year 2036.
END USER LICENSE AGREEMENT
The Juniper Networks product that is the subject of this technical documentation consists of (or is intended for use with)
Juniper Networks software. Use of such software is subject to the terms and conditions of the End User License Agreement
(“EULA”) posted at https://support.juniper.net/support/eula/. By downloading, installing or using such software, you
agree to the terms and conditions of that EULA.
Table of Contents
About the Documentation | ix
Documentation and Release Notes | ix
Documentation Conventions | ix
Documentation Feedback | xii
Requesting Technical Support | xii
Self-Help Online Tools and Resources | xiii
Creating a Service Request with JTAC | xiii
Introduction to HealthBot
HealthBot Overview | 15
Benefits of HealthBot | 15
Closed-Loop Automation | 16
Main Components of HealthBot | 17
HealthBot Health Monitoring | 17
HealthBot Root Cause Analysis | 18
HealthBot Log File Analysis | 19
HealthBot Concepts | 20
HealthBot Data Collection Methods | 21
Data Collection - ’Push’ Model | 21
Data Collection - ’Pull’ Model | 22
HealthBot Topics | 22
HealthBot Rules - Basics | 23
HealthBot Rules - Deep Dive | 25
Rules | 25
Sensors | 28
Fields | 28
Vectors | 30
Variables | 30
Functions | 31
Triggers | 31
Tagging | 34
Rule Properties | 35
HealthBot Playbooks | 35
HealthBot Tagging | 36
Overview | 36
HealthBot Tagging Terminology | 37
How It Works | 41
Examples | 42
HealthBot Time Series Database (TSDB) | 48
Historical Context | 48
TSDB Improvements | 49
Database Sharding | 50
Database Replication | 51
Database Reads and Writes | 52
Manage TSDB Options in the HealthBot GUI | 53
HealthBot CLI Configuration Options | 55
HealthBot Machine Learning (ML) | 56
HealthBot Machine Learning Overview | 56
Understanding HealthBot Anomaly Detection | 57
Field | 57
Algorithm | 58
Learning period | 58
Pattern periodicity | 59
Understanding HealthBot Outlier Detection | 59
Dataset | 60
Algorithm | 60
Sigma coefficient (k-fold-3sigma only) | 62
Sensitivity | 62
Learning period | 62
Understanding HealthBot Predict | 62
Field | 63
Algorithm | 63
Learning period | 63
Pattern periodicity | 63
Prediction offset | 63
HealthBot Rule Examples | 64
HealthBot Anomaly Detection Example | 64
HealthBot Outlier Detection Example | 73
Frequency Profiles and Offset Time | 78
Frequency Profiles | 78
Configuration Using HealthBot GUI | 79
Configuration Using HealthBot CLI | 81
Apply a Frequency Profile Using the HealthBot GUI | 81
Apply a Frequency Profile Using the HealthBot CLI | 83
Offset Time Unit | 83
Offset Used in Formulas | 84
Offset Used in References | 85
Offset Used in Vectors | 86
Offset Used in Triggers | 88
Offset Used in Trigger Reference | 89
HealthBot Licensing | 91
HealthBot Licensing Overview | 91
Managing HealthBot Licenses | 94
Add a License to HealthBot | 94
View Licensing Status in HealthBot | 94
Management and Monitoring
Manage HealthBot Users and Groups | 98
User Management | 98
Group Management | 99
Limitations | 104
Manage Devices, Device Groups, and Network Groups | 104
Adding a Device | 106
Editing a Device | 109
Adding a Device Group | 109
Editing a Device Group | 113
Configuring a Retention Policy for the Time Series Database | 113
Adding a Network Group | 114
Editing a Network Group | 117
HealthBot Rules and Playbooks | 118
Add a Pre-Defined Rule | 119
Create a New Rule Using the HealthBot GUI | 119
Rule Filtering | 121
Sensors | 123
Fields | 125
Vectors | 128
Variables | 130
Functions | 131
Triggers | 133
Rule Properties | 136
Edit a Rule | 136
Add a Pre-Defined Playbook | 137
Create a New Playbook Using the HealthBot GUI | 138
Edit a Playbook | 139
Manage Playbook Instances | 140
View Information About Playbook Instances | 141
Create a Playbook Instance | 143
Manually Pause or Play a Playbook Instance | 145
Create a Schedule to Automatically Play/Pause a Playbook Instance | 146
Monitor Device and Network Health | 148
Dashboard | 149
Health | 152
Network Health | 165
Graph Page | 165
Alarms and Notifications | 181
Generate Alarm Notifications | 181
Manage Alarms Using Alarm Manager | 190
Stream Sensor and Field Data from HealthBot | 195
Generate Reports | 200
Configure a Secure Data Connection for HealthBot Devices | 216
Configure Security Profiles for SSL and SSH Authentication | 217
Configure Security Authentication for a Specific Device or Device Group | 218
Configure Data Summarization | 219
Creating a Data Summarization Profile | 220
Applying Data Summarization Profiles to a Device Group | 221
Modify the UDA and UDF Engines | 222
Overview | 222
How it Works | 223
Usage Notes | 224
Configuration | 225
SIMULATE | 225
MODIFY | 226
ROLLBACK | 227
Logs for HealthBot Services | 227
Configure Service Log Levels for a Device Group or Network Group | 228
Download Logs for HealthBot Services | 229
Troubleshooting | 230
HealthBot Self Test | 230
Overview | 230
Other Uses for the Self Test Tool | 231
Usage Notes | 231
How to Use the Self Test Tool | 232
Device Reachability Test | 232
Overview | 232
Usage Notes | 233
How to Use the Device Reachability Tool | 233
Ingest Connectivity Test | 234
Overview | 234
Usage Notes | 235
How to Use the Ingest Connectivity Tool | 235
Debug No-Data | 236
Overview | 236
Usage Notes | 237
How to Use the Debug No-Data Tool | 238
HealthBot Configuration – Backup and Restore | 240
Back Up the Configuration | 240
Restore the Configuration | 240
Backup or Restore the Time Series Database (TSDB) | 241
About the Documentation
IN THIS SECTION
Documentation and Release Notes | ix
Documentation Conventions | ix
Documentation Feedback | xii
Requesting Technical Support | xii
Use this guide to understand the features you can configure and the tasks you can perform from the
HealthBot web UI.
Documentation and Release Notes
To obtain the most current version of all Juniper Networks® technical documentation, see the product
documentation page on the Juniper Networks website at https://www.juniper.net/documentation/.
If the information in the latest release notes differs from the information in the documentation, follow the
product Release Notes.
Juniper Networks Books publishes books by Juniper Networks engineers and subject matter experts.
These books go beyond the technical documentation to explore the nuances of network architecture,
deployment, and administration. The current list can be viewed at https://www.juniper.net/books.
Documentation Conventions
Table 1 on page x defines notice icons used in this guide.
Table 1: Notice Icons
Meaning | Description
Informational note | Indicates important features or instructions.
Caution | Indicates a situation that might result in loss of data or hardware damage.
Warning | Alerts you to the risk of personal injury or death.
Laser warning | Alerts you to the risk of personal injury from a laser.
Tip | Indicates helpful information.
Best practice | Alerts you to a recommended use or implementation.
Table 2 on page x defines the text and syntax conventions used in this guide.
Table 2: Text and Syntax Conventions

Bold text like this
  Description: Represents text that you type.
  Example: To enter configuration mode, type the configure command:
    user@host> configure

Fixed-width text like this
  Description: Represents output that appears on the terminal screen.
  Example:
    user@host> show chassis alarms
    No alarms currently active

Italic text like this
  Description:
    • Introduces or emphasizes important new terms.
    • Identifies guide names.
    • Identifies RFC and Internet draft titles.
  Examples:
    • A policy term is a named structure that defines match conditions and actions.
    • Junos OS CLI User Guide
    • RFC 1997, BGP Communities Attribute

Italic text like this
  Description: Represents variables (options for which you substitute a value) in commands or configuration statements.
  Example: Configure the machine’s domain name:
    [edit]
    root@# set system domain-name domain-name

Text like this
  Description: Represents names of configuration statements, commands, files, and directories; configuration hierarchy levels; or labels on routing platform components.
  Examples:
    • To configure a stub area, include the stub statement at the [edit protocols ospf area area-id] hierarchy level.
    • The console port is labeled CONSOLE.

< > (angle brackets)
  Description: Encloses optional keywords or variables.
  Example: stub <default-metric metric>;

| (pipe symbol)
  Description: Indicates a choice between the mutually exclusive keywords or variables on either side of the symbol. The set of choices is often enclosed in parentheses for clarity.
  Examples:
    broadcast | multicast
    (string1 | string2 | string3)

# (pound sign)
  Description: Indicates a comment specified on the same line as the configuration statement to which it applies.
  Example: rsvp { # Required for dynamic MPLS only

[ ] (square brackets)
  Description: Encloses a variable for which you can substitute one or more values.
  Example: community name members [ community-ids ]

Indention and braces ( { } )
  Description: Identifies a level in the configuration hierarchy.

; (semicolon)
  Description: Identifies a leaf statement at a configuration hierarchy level.
  Example (indention, braces, and semicolons):
    [edit]
    routing-options {
        static {
            route default {
                nexthop address;
                retain;
            }
        }
    }

GUI Conventions

Bold text like this
  Description: Represents graphical user interface (GUI) items you click or select.
  Examples:
    • In the Logical Interfaces box, select All Interfaces.
    • To cancel the configuration, click Cancel.

> (bold right angle bracket)
  Description: Separates levels in a hierarchy of menu selections.
  Example: In the configuration editor hierarchy, select Protocols>Ospf.
Documentation Feedback
We encourage you to provide feedback so that we can improve our documentation. You can use either
of the following methods:
• Online feedback system—Click TechLibrary Feedback, on the lower right of any page on the Juniper Networks TechLibrary site, and do one of the following:
  • Click the thumbs-up icon if the information on the page was helpful to you.
  • Click the thumbs-down icon if the information on the page was not helpful to you or if you have suggestions for improvement, and use the pop-up form to provide feedback.
• E-mail—Send your comments to techpubs-comments@juniper.net. Include the document or topic name, URL or page number, and software version (if applicable).
Requesting Technical Support
Technical product support is available through the Juniper Networks Technical Assistance Center (JTAC).
If you are a customer with an active Juniper Care or Partner Support Services support contract, or are
covered under warranty, and need post-sales technical support, you can access our tools and resources
online or open a case with JTAC.
• JTAC policies—For a complete understanding of our JTAC procedures and policies, review the JTAC User Guide located at https://www.juniper.net/us/en/local/pdf/resource-guides/7100059-en.pdf.
• JTAC hours of operation—The JTAC centers have resources available 24 hours a day, 7 days a week, 365 days a year.
Self-Help Online Tools and Resources
For quick and easy problem resolution, Juniper Networks has designed an online self-service portal called
the Customer Support Center (CSC) that provides you with the following features:
HealthBot Overview
HealthBot is a highly automated and programmable device-level diagnostics and network analytics tool
that provides consistent and coherent operational intelligence across network deployments. Integrated
with multiple data collection methods (such as Junos Telemetry Interface, NETCONF, syslog, and SNMP),
HealthBot aggregates and correlates large volumes of time-sensitive telemetry data, providing a
multidimensional and predictive view of the network. Additionally, HealthBot translates troubleshooting,
maintenance, and real-time analytics into an intuitive user experience to give network operators actionable
insights into the health of an individual device and the overall network.
Benefits of HealthBot
• Customization—Provides a framework to define and customize health profiles, allowing truly actionable insights for the specific device or network being monitored.
• Automation—Automates root cause analysis and log file analysis, streamlines diagnostic workflows, and provides self-healing and remediation capabilities.
• Greater network visibility—Provides advanced multidimensional analytics across network elements, giving you a clearer understanding of network behavior to establish operational benchmarks, improve resource planning, and minimize service downtime.
• Intuitive graphical user interface—Offers an intuitive web-based GUI for policy management and easy data consumption.
• Open integration—Lowers the barrier of entry for telemetry and analytics by providing open source data pipelines, notification capabilities, and third-party device support.
• Multiple data collection methods—Includes support for Junos Telemetry Interface (JTI), NETCONF, syslog, NetFlow, and SNMP.
Closed-Loop Automation
HealthBot offers closed-loop automation. The automation workflow can be divided into seven main steps
(see Figure 1 on page 17):
1. Define—HealthBot provides tools for the user to define the health parameters of key network elements
through customizable key performance indicators (KPIs), rules, and playbooks.
2. Collect—HealthBot collects rule-based telemetry data from multiple devices using various types of
data transfer methods.
3. Store—HealthBot stores time-sensitive telemetry data in a time-series database (TSDB). This allows
users to query, perform operations on, and write new data back to the database days or even
weeks after initial storage.
4. Analyze—HealthBot analyzes telemetry data based on customizable KPIs, rules, and playbooks.
5. Visualize—HealthBot provides multiple ways for you to visualize the aggregated telemetry data through
the HealthBot web UI to gain actionable and predictive insight into the health of your devices and
overall network.
6. Notify—HealthBot notifies you through the HealthBot web UI and alarm notifications when problems
in the network are detected.
7. Act—HealthBot performs user-defined actions to help resolve and proactively prevent network problems.
Main Components of HealthBot
• Health Monitoring—View an abstracted, hierarchical representation of device and network-level health, and define the health parameters of key network elements through customizable key performance indicators (KPIs), rules, and playbooks.
• Root Cause Analysis—Find the root cause of a device or network-level issue when HealthBot detects a problem with a network element.
• Log File Analysis—Analyze relevant system log messages by filtering out noise.
HealthBot Health Monitoring
The Challenge
With increasing data traffic generated by cloud-native applications and emerging technologies, service
providers and enterprises need a network analytics solution to analyze volumes of telemetry data, offer
insights into overall network health, and produce actionable intelligence. While telemetry-based techniques
have existed for years, the growing number of protocols, data formats, and key performance indicators
(KPIs) from diverse networking devices has made data analysis complex and costly. Traditional CLI-based
interfaces require specialized skills to extract business value from telemetry data, creating a barrier to
entry for network analytics.
The HealthBot Health Monitoring Solution
By aggregating and correlating raw telemetry data from multiple sources, the HealthBot health monitoring
feature provides a multidimensional view of network health that reports current status, as well as projected
threats to the infrastructure and its workloads.
Health status determination is tightly integrated with the HealthBot root cause analysis (RCA) application,
which can make use of syslog log data received from the network and its devices. HealthBot health
monitoring provides status indicators that alert you when, for example, a network resource is currently
operating outside a user-defined performance policy, as well as risk analysis using historical trends to
predict whether a resource may be unhealthy in the future. HealthBot health monitoring not only offers
a fully customizable view of the current health of network elements, it also automatically initiates remedial
actions based on predefined service level agreements (SLAs).
Defining the health of a network element, such as broadband network gateway (BNG), provider edge (PE),
core, and leaf-spine, is highly contextual. Each element plays a different role in a network, with unique key
performance indicators (KPIs) to monitor. Given that there is no single definition for network health across
all use cases, HealthBot provides a highly customizable framework to allow you to define your own health
profiles.
HealthBot Root Cause Analysis
The Challenge
In some cases, it can be challenging for a network operator to figure out what causes a Junos OS networking
device to stop working properly. When this happens, the typical workflow to find the root cause of the
network problem involves contacting a specialist from Juniper Networks, who would then troubleshoot
and triage the unhealthy component based on knowledge built from years of experience. After completing
this time-intensive assessment, the problem would then be reassigned to the relevant engineering team.
The HealthBot RCA Solution
The purpose of the HealthBot root cause analysis (RCA) application is to simplify the process of finding
the root cause of a network issue. HealthBot RCA captures the troubleshooting knowledge of, for example,
the Juniper Networks specialists as part of a knowledge base in the form of HealthBot rules. These rules
are evaluated either on demand by a specific trigger or periodically in the background to ascertain the
health of a networking component, such as routing protocol, system, interface, or chassis, on the device.
To illustrate the benefits of HealthBot RCA, let us consider the problem of OSPF flapping.
Figure 2 on page 19 highlights the workflow sequence involved in debugging OSPF flapping. If a network
operator or Juniper Networks specialist were assigned this troubleshooting task, he or she would need to
perform manual debugging steps for each tile of the workflow sequence in order to find the root cause of
the OSPF flapping. The HealthBot RCA application, on the other hand, delivers this expert service to you
automatically as a bot. The RCA bot tracks all of the telemetry data collected by HealthBot and translates
the information into graphical status indicators (displayed in the HealthBot web UI) that correlate to
different parts of the workflow sequence shown in Figure 2 on page 19.
Figure 2: High-level workflow to debug OSPF-flapping (image not reproduced; tiles include RPD-OSPF, RPD-Infra, Kernel-System, Chassis, Interfaces, PPM-OSPF, PFE-Ukern, PFE-HostPath, RE-HostPath, PFE-Interface, PFE-System, PFE-jnh, XM-ASIC, and LU-ASIC, arranged across Control Plane, Host Path, and Data Plane layers)
When configuring HealthBot, each tile of the workflow sequence shown in Figure 2 on page 19 can be
defined by one or more rules. For example, the RPD-OSPF tile could be defined as two rule conditions:
one to check if "hello-transmitted" counters are incrementing and the other to check if "hello-received"
counters are incrementing. Based on these user-defined rules, HealthBot provides status indicators, alarm
notifications, and an alarm management tool through the web UI to inform and alert you of specific network
conditions that could, for example, lead to OSPF flapping. By isolating a problem area in the workflow,
HealthBot RCA proactively guides you in determining the appropriate corrective action to take to fix a
pending issue or avoid a potential one.
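The two RPD-OSPF conditions described above can be sketched in plain Python. This is a minimal illustration, not HealthBot rule syntax: it assumes each hello counter arrives as a list of samples ordered oldest to newest that does not wrap, and the green/yellow/red mapping is a hypothetical choice made for the example.

```python
from typing import Sequence


def is_incrementing(samples: Sequence[int]) -> bool:
    """Return True if a counter grew between its oldest and newest sample."""
    if len(samples) < 2:
        return False  # a single sample cannot show movement
    return samples[-1] > samples[0]


def rpd_ospf_tile_status(hello_transmitted: Sequence[int],
                         hello_received: Sequence[int]) -> str:
    """Combine the two hello-counter conditions into one tile color.

    The color mapping is an assumption for illustration only.
    """
    tx_ok = is_incrementing(hello_transmitted)
    rx_ok = is_incrementing(hello_received)
    if tx_ok and rx_ok:
        return "green"   # hellos are flowing in both directions
    if tx_ok or rx_ok:
        return "yellow"  # one direction has stalled
    return "red"         # neither counter is moving


if __name__ == "__main__":
    print(rpd_ospf_tile_status([100, 110, 120], [98, 108, 118]))
    print(rpd_ospf_tile_status([100, 110, 120], [98, 98, 98]))
```

Here the first call reports green and the second reports yellow, because the received-hello counter has stopped incrementing.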
HealthBot Log File Analysis
The Challenge
Networking devices can generate a large number of log messages. Some of these messages are arcane, and
others create noise and clutter that drown out the more significant, meaningful messages. Network
operators need an easy way to sort through and organize all of these log messages, as well as make sense
of the information in order to take action, if necessary.
The HealthBot Log File Analysis Solution
Fully integrated with the HealthBot health monitoring and RCA features, HealthBot log file analysis can
be implemented with the use of log patterns and pattern sets within the syslog ingest settings. The pattern
sets can be applied to rules to automatically filter out unnecessary log messages and help highlight only
the relevant, actionable messages. HealthBot log file analysis consists of two main components:
1. An ingest engine that lets HealthBot receive syslog messages from networks and devices.
2. Pre-defined and customizable search patterns and pattern sets that can be applied to rules.
See Syslog Ingest for more information about syslog ingest.
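The idea of applying a pattern set to keep only actionable messages can be sketched with Python regular expressions. This is an illustration under stated assumptions, not HealthBot's syslog ingest configuration: the pattern set, the function name filter_logs, and the sample messages are all hypothetical, although RPD_OSPF_NBRDOWN and RPD_OSPF_NBRUP are used here as example Junos message tags.

```python
import re
from typing import Iterable, List

# Hypothetical pattern set. In HealthBot, patterns and pattern sets are
# configured in the syslog ingest settings; these regexes only illustrate
# keeping actionable messages and dropping the rest.
OSPF_ADJACENCY_PATTERNS: List[re.Pattern] = [
    re.compile(r"\bRPD_OSPF_NBRDOWN\b"),
    re.compile(r"\bRPD_OSPF_NBRUP\b"),
]

# Made-up sample messages; only the first and third match the set.
SAMPLE_LOGS = [
    "rpd[1234]: RPD_OSPF_NBRDOWN: OSPF neighbor 10.0.0.2 state changed to Down",
    "sshd[99]: Accepted publickey for admin",
    "rpd[1234]: RPD_OSPF_NBRUP: OSPF neighbor 10.0.0.2 state changed to Full",
]


def filter_logs(messages: Iterable[str], pattern_set: List[re.Pattern]) -> List[str]:
    """Return only the messages that match at least one pattern in the set."""
    return [msg for msg in messages if any(p.search(msg) for p in pattern_set)]


if __name__ == "__main__":
    for line in filter_logs(SAMPLE_LOGS, OSPF_ADJACENCY_PATTERNS):
        print(line)
```

Keeping the patterns in a list mirrors the notion of a reusable pattern set: the same set can be applied to several message streams without redefining each pattern.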
RELATED DOCUMENTATION
HealthBot Getting Started Guide
HealthBot Concepts
IN THIS SECTION
HealthBot Data Collection Methods | 21
HealthBot Topics | 22
HealthBot Rules - Basics | 23
HealthBot Rules - Deep Dive | 25
HealthBot Playbooks | 35
HealthBot is a highly programmable telemetry-based analytics application. With it, you can diagnose and
root cause network issues, detect network anomalies, predict potential network issues, and create real-time
remedies for any issues that come up.
To accomplish this, network devices and HealthBot have to be configured to send and receive large amounts
of data, respectively. Device configuration is covered throughout this and other sections of the guide.
Configuring HealthBot, or any application, to read and react to incoming telemetry data requires a language
that describes several elements that are specific to the systems and data under analysis. This type of
language is called a Domain Specific Language (DSL), i.e., a language that is specific to one domain. Any
DSL is built to help answer questions. For HealthBot, these questions are:
• Q: What components make up the systems that are sending data?
  A: Network devices are made up of memory, CPU, interfaces, protocols, and so on. In HealthBot, these are called topics (see “HealthBot Topics” on page 22).
• Q: How do we gather, filter, process, and analyze all of this incoming telemetry data?
  A: HealthBot uses rules (see “HealthBot Rules - Basics” on page 23) that consist of information blocks called sensors, fields, variables, triggers, and more.
• Q: How do we determine what to look for?
  A: It depends on the problem you want to solve or the question you want to answer. HealthBot uses playbooks (see “HealthBot Playbooks” on page 35) to create collections of specific rules and apply them to specific groups of devices in order to accomplish specific goals. For example, part of the system-kpis-playbook can alert a user when system memory usage crosses a user-defined threshold.
This section covers these key concepts and more, which you need to understand before using HealthBot.
HealthBot Data Collection Methods
In order to provide visibility into the state of your network devices, HealthBot first needs to collect their
telemetry data and other status information. It does this using sensors.
HealthBot supports sensors that “push” data from the device to HealthBot and sensors that require
HealthBot to “pull” data from the device using periodic polling.
Data Collection - ’Push’ Model
As the number of objects in the network, and the metrics they generate, have grown, gathering operational
statistics for monitoring the health of a network has become an ever-increasing challenge. Traditional ’pull’
data-gathering models, like SNMP and the CLI, require additional processing to periodically poll the network
element, and can directly limit scaling.
The ’push’ model overcomes these limits by delivering data asynchronously, which eliminates polling. With
this model, the HealthBot server can make a single request to a network device to stream periodic updates.
As a result, the ’push’ model is highly scalable and can support the monitoring of thousands of objects in
a network. Junos devices support this model in the form of the Junos Telemetry Interface (JTI).
HealthBot currently supports four ‘push’ ingest types.
• Native GPB
• NetFlow
• OpenConfig
• Syslog
These push-model data collection—or ingest—methods are explained in detail in the HealthBot Data Ingest
Guide.
Data Collection - ’Pull’ Model
While the ’push’ model is the preferred approach for its efficiency and scalability, there are still cases where
the ’pull’ data collection model is appropriate. With the ’pull’ model, HealthBot requests data from network
devices at periodic intervals.
HealthBot currently supports two ‘pull’ ingest types.
• iAgent (CLI/NETCONF)
• SNMP
These pull-model data collection—or ingest—methods are explained in detail in the HealthBot Data Ingest
Guide.
HealthBot Topics
Network devices are made up of a number of components and systems, from CPUs and memory to interfaces
and protocol stacks and more. In HealthBot, a topic is the construct used to address those different device
components. The Topic block is used to create namespaces that define what needs to be modeled. Each
Topic block is made up of one or more Rule blocks which, in turn, consist of the Field blocks, Function
blocks, Trigger blocks, etc. See “HealthBot Rules - Deep Dive” on page 25 for details. Each rule created
in HealthBot must be part of a topic. Juniper has curated a number of these system components into a list
of topics such as:
• chassis
• class-of-service
• external
• firewall
• interfaces
• kernel
• linecard
• logical-systems
• protocol
• routing-options
• security
• service
• system
You can create sub-topics underneath any of the Juniper topic names by appending .<sub-topic> to the
topic name. For example, kernel.tcpip or system.cpu.
Any pre-defined rules provided by Juniper fit within one of the Juniper topics, with the exception of external.
The external topic is reserved for user-created rules. In the HealthBot web GUI, when you create a new
rule, the Topics field is automatically populated with the external topic name.
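The topic naming convention above can be illustrated with a small Python helper. The function names make_topic and is_user_topic are hypothetical and not part of HealthBot; the sketch only shows how a sub-topic is appended to a root topic with a dot and how the reserved external topic can be recognized. It also assumes the listed root topics are a sample, since the list is introduced with "such as".

```python
# Root topic names listed above. The list is introduced with "such as",
# so treat it as a sample rather than a complete catalog.
JUNIPER_TOPICS = {
    "chassis", "class-of-service", "external", "firewall", "interfaces",
    "kernel", "linecard", "logical-systems", "protocol", "routing-options",
    "security", "service", "system",
}


def make_topic(root: str, *sub_topics: str) -> str:
    """Append dot-separated sub-topics to a Juniper root topic name."""
    if root not in JUNIPER_TOPICS:
        raise ValueError(f"unknown root topic: {root!r}")
    return ".".join((root, *sub_topics))


def is_user_topic(topic: str) -> bool:
    """True for topics under 'external', which is reserved for user-created rules."""
    return topic.split(".", 1)[0] == "external"


if __name__ == "__main__":
    print(make_topic("kernel", "tcpip"))
    print(make_topic("system", "cpu"))
    print(is_user_topic("external"))
```

The two printed topic names match the examples in the text, kernel.tcpip and system.cpu.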
HealthBot Rules - Basics
HealthBot’s primary function is collecting and reacting to telemetry data from network devices. Defining
how to collect the data, and how to react to it, is the role of a rule.
HealthBot ships with a set of default rules, which can be seen on the Configuration > Rules page of the
HealthBot GUI, as well as in GitHub in the healthbot-rules repository. You can also create your own rules.
The structure of a HealthBot rule is described below.
To keep rules organized, HealthBot organizes them into topics. Topics can be very general, like system, or
they can be more granular, like protocol.bgp. Each topic contains one or more rules.
As described above, a rule contains all the details and instructions to define how to collect and handle the
data. Each rule contains the following required elements:
• The sensor defines the parameters for collecting the data. This typically includes which data collection method to use (as discussed above in “HealthBot Data Collection Methods” on page 21), some guidance on which data to ingest, and how often to push or pull the data. In any given rule, a sensor can be defined directly within the rule or it can be referenced from another rule.
  • Example: Using the SNMP sensor, poll the network device every 60 seconds to collect all the device data in the Juniper SNMP MIB table jnxOperatingTable.
• The sensor typically ingests a large set of data, so fields provide a way to filter or manipulate that data, allowing you to identify and isolate the specific pieces of information you care about. Fields can also act as placeholder values, like a static threshold value, to help the system perform data analysis.
  • Example: Extract, isolate, and store the jnxOperating15MinLoadAvg (CPU 15-minute average utilization) value from the SNMP table specified above in the sensor.
• Triggers periodically bring together the fields with other elements to compare data and determine current device status. A trigger includes one or more ’when-then’ statements, which include the parameters that define how device status is visualized on the health pages.
  • Example: Every 90 seconds, check the CPU 15-minute average utilization value, and if it goes above a defined threshold, set the device’s status to red on the device health page and display a message showing the current value.
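The sensor, field, and trigger examples above can be walked through as a short Python sketch. This is not HealthBot's rule engine or DSL: the SNMP rows are made up, only two jnxOperatingTable columns are shown, the threshold of 80 is a hypothetical value, and the 60-second polling and 90-second trigger scheduling are left out so that only one evaluation is shown.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Made-up rows in the shape an SNMP sensor might return for jnxOperatingTable.
SENSOR_ROWS: List[Dict[str, Any]] = [
    {"jnxOperatingDescr": "Routing Engine 0", "jnxOperating15MinLoadAvg": 35},
    {"jnxOperatingDescr": "Routing Engine 1", "jnxOperating15MinLoadAvg": 91},
]

CPU_THRESHOLD = 80  # hypothetical static threshold (a field or variable in a real rule)


@dataclass
class Status:
    color: str
    message: str


def extract_field(rows: List[Dict[str, Any]], column: str) -> Dict[str, Any]:
    """Field step: isolate one column from the sensor data, keyed by component."""
    return {row["jnxOperatingDescr"]: row[column] for row in rows}


def cpu_trigger(value: int, threshold: int = CPU_THRESHOLD) -> Status:
    """Trigger step: a single when-then term evaluated against the field value."""
    if value > threshold:
        return Status("red", f"CPU 15-minute load average is {value}, above {threshold}")
    return Status("green", f"CPU 15-minute load average is {value}")


if __name__ == "__main__":
    loads = extract_field(SENSOR_ROWS, "jnxOperating15MinLoadAvg")
    for component, value in loads.items():
        status = cpu_trigger(value)
        print(component, status.color, status.message)
```

Separating extract_field from cpu_trigger follows the division of labor in the text: fields narrow the sensor data, and triggers decide the status.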
The rule can also contain the following optional elements:
• Vectors allow you to leverage existing elements to avoid the need to repeatedly configure the same elements across multiple rules.
  • Examples: A rule with a configured sensor, plus a vector to a second sensor from another rule; a rule with no sensors, and vectors to fields from other rules.
• Variables can be used to provide additional supporting parameters needed by the required elements above.
  • Examples: The string “ge-0/0/0”, used within a field collecting status for all interfaces, to filter the data down to just the one interface; an integer, such as “80”, referenced in a field to use as a static threshold value.
• Functions allow you to provide instructions (in the form of a Python script) on how to further interact with data, and how to react to certain events.
  • Examples: A rule that monitors input and output packet counts, using a function to compare the count values; a rule that monitors system storage, invoking a function to clean up temporary and log files if storage utilization goes above a defined threshold.
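The packet-count example for functions, combined with the “ge-0/0/0” variable example, might look like the following plain Python. This is a sketch only: the function names, the 5 percent tolerance, and the counter data are hypothetical, and it does not reflect the calling convention HealthBot expects for user-defined functions, which is not described in this section.

```python
from typing import Dict, Tuple

# Variable: limits the check to one interface, like the "ge-0/0/0" example above.
INTERFACE = "ge-0/0/0"
# Hypothetical tolerance; a real rule might expose this as a variable too.
TOLERANCE_PCT = 5.0


def packet_count_mismatch(input_packets: int, output_packets: int,
                          tolerance_pct: float = TOLERANCE_PCT) -> bool:
    """Return True when the two counts differ by more than tolerance_pct percent."""
    larger = max(input_packets, output_packets)
    if larger == 0:
        return False  # no traffic in either direction, nothing to compare
    diff_pct = abs(input_packets - output_packets) * 100.0 / larger
    return diff_pct > tolerance_pct


def check_interface(counters: Dict[str, Tuple[int, int]],
                    interface: str = INTERFACE) -> bool:
    """Look up (input, output) counts for one interface and compare them."""
    input_packets, output_packets = counters[interface]
    return packet_count_mismatch(input_packets, output_packets)


if __name__ == "__main__":
    counters = {"ge-0/0/0": (1000, 800), "ge-0/0/1": (500, 498)}
    print(check_interface(counters))
    print(check_interface(counters, "ge-0/0/1"))
```

With these made-up counts, ge-0/0/0 differs by 20 percent and is flagged, while ge-0/0/1 differs by 0.4 percent and is not.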
NOTE: Rules, on their own, don’t actually do anything. To make use of rules you need to add
them to “HealthBot Playbooks” on page 35.
HealthBot Rules - Deep Dive
IN THIS SECTION
Rules | 25
Sensors | 28
Fields | 28
Vectors | 30
Variables | 30
Functions | 31
Triggers | 31
Tagging | 34
Rule Properties | 35
A rule is a package of components, or blocks, needed to extract specific information from the network or
from a Junos device. Rules conform to a specifically tailored domain specific language (DSL) for analytics
applications. The DSL is designed to allow rules to capture:
• The minimum set of input data that the rule needs to be able to operate
• The minimum set of telemetry sensors that need to be configured on the device(s)
• The fields of interest from the configured sensors
• The reporting or polling frequency
• The set of triggers that operate on the collected data
• The conditions or evaluations needed for triggers to kick in
• The actions or notifications that need to be performed when a trigger kicks in
The details around rules, topics and playbooks are presented in the following sections.
Rules
Rules are meant to be free of any hard coding. Consider threshold values: if a threshold is hard coded, there
is no easy way to customize it for a different customer or device that has different requirements. Therefore,
rules are defined using parameterization to set the default values. This allows the parameters to be left at
their defaults or be customized by the operator at the time of deployment. Customization can be done at the
device group or individual device level when applying the playbooks (see “HealthBot Playbooks” on page 35)
that contain the individual rules.
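Parameterization can be pictured as layering values on top of a rule's defaults. The sketch below is an illustration, not HealthBot's implementation: the function name resolve_parameters and the cpu-threshold parameter are hypothetical, and it assumes that a device-level value takes precedence over a device-group value, which this section does not state explicitly.

```python
from typing import Any, Dict, Optional


def resolve_parameters(rule_defaults: Dict[str, Any],
                       group_values: Optional[Dict[str, Any]] = None,
                       device_values: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Start from the rule's defaults, then apply group values and device values.

    Assumption: a device-level value overrides a device-group value.
    """
    resolved = dict(rule_defaults)  # copy so the rule's defaults stay untouched
    for layer in (group_values, device_values):
        if not layer:
            continue
        unknown = set(layer) - set(rule_defaults)
        if unknown:
            raise KeyError(f"not parameters of this rule: {sorted(unknown)}")
        resolved.update(layer)
    return resolved


if __name__ == "__main__":
    defaults = {"cpu-threshold": 80}  # hypothetical rule parameter
    print(resolve_parameters(defaults))
    print(resolve_parameters(defaults, {"cpu-threshold": 70}))
    print(resolve_parameters(defaults, {"cpu-threshold": 70}, {"cpu-threshold": 90}))
```

Rejecting unknown keys reflects the idea that operators customize parameters the rule already declares rather than inventing new ones.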
Rules that are device-centric are called device rules. Device components such as chassis, system, linecards,
and interfaces are all addressed as topics in the rule definition (see “HealthBot Topics” on page 22).
Generally, device rules make use of sensors on the devices.
Rules that span multiple devices are called network rules. Network rules:
• must have a rule-frequency configured
• must not contain sensors
• cannot be mixed with device rules in a playbook
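The three network-rule constraints listed above can be expressed as a validation check. The sketch below is a hypothetical, simplified model (the `Rule` dataclass, its fields, and the example rule names are illustrative, not a HealthBot API); it only encodes the constraints stated in this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rule:
    # Hypothetical, simplified rule model for illustration only.
    name: str
    is_network_rule: bool
    sensors: List[str] = field(default_factory=list)
    rule_frequency: Optional[str] = None  # e.g. "60s"


def network_rule_errors(rule: Rule) -> List[str]:
    """Check one network rule against the documented constraints."""
    errors = []
    if not rule.rule_frequency:
        errors.append(f"{rule.name}: network rules must have a rule-frequency")
    if rule.sensors:
        errors.append(f"{rule.name}: network rules must not contain sensors")
    return errors


def playbook_errors(rules: List[Rule]) -> List[str]:
    """Validate a playbook: device and network rules must not be mixed."""
    errors = []
    if len({r.is_network_rule for r in rules}) > 1:
        errors.append("playbook mixes device rules and network rules")
    for r in rules:
        if r.is_network_rule:
            errors.extend(network_rule_errors(r))
    return errors


# Usage with two hypothetical rules placed in the same playbook
device = Rule("check-interface-flaps", False, sensors=["interfaces"])
network = Rule("check-global-bgp", True, rule_frequency="60s")
print(playbook_errors([device, network]))
# ['playbook mixes device rules and network rules']
```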
To deploy either type of rule, include the rule in a playbook and then apply the playbook to a device group
or network group.
NOTE: HealthBot comes with a set of pre-defined rules.
Not all of the blocks that make up a rule are required for every rule. Whether a specific block is required in a rule definition depends on the information you are trying to collect. Additionally, some rule components are not valid for network rules. Table 3 on page 26 lists the components of a rule and provides a brief description of each one.
Table 3: Rule Components

Block: “Sensors” on page 28
What it Does: The Sensors block acts as the access method for the data. There are multiple types of sensors available in HealthBot: OpenConfig, Native GPB, iAgent, SNMP, and syslog. The block defines which sensors need to be active on the device to obtain the data fields on which the triggers eventually operate. Sensor names are referenced by the Fields. OpenConfig and iAgent sensors require that a frequency be set for the push interval or polling interval, respectively. SNMP sensors also require you to set a frequency.
Required in Device Rules? No. Rules can be created that only use a field reference from another rule or a vector with references from another rule. In these cases, rule-frequency must be explicitly defined.
Valid for Network Rules? No

Block: “Fields” on page 28
What it Does: The source for the Fields block can be a pointer to a sensor, a reference to a field defined in another rule, a constant, or a formula. The field can be a string, integer, or floating point. The default field type is string.
Required in Device Rules? Yes. Fields contain the data on which the triggers operate. Starting in HealthBot release 3.1.0, regular fields and key-fields can be added to rules based on conditional tagging profiles. See “Tagging” on page 34.
Valid for Network Rules? Yes

Block: “Vectors” on page 30
What it Does: The Vectors block allows handling of lists, creating sets, and comparing elements among different sets. A vector is used to hold multiple values from one or more fields.
Required in Device Rules? No
Valid for Network Rules? Yes

Block: “Variables” on page 30
What it Does: The Variables block allows you to pass values into rules. Invariant rule definitions are achieved through mustache-style templating such as {{<placeholder-variable>}}. The placeholder-variable value is set in the rule by default or can be user-defined at deployment time.
Required in Device Rules? No
Valid for Network Rules? No

Block: “Functions” on page 31
What it Does: The Functions block allows you to extend fields, triggers, and actions by creating prototype methods in external files written in languages such as Python. The Functions block includes details on the file path, the method to be accessed, and any arguments, including each argument's description and whether it is mandatory.
Required in Device Rules? No
Valid for Network Rules? No

Block: “Triggers” on page 31
What it Does: The Triggers block operates on fields and is defined by one or more Terms. When the conditions of a Term are met, the action defined in the Term is taken. By default, triggers are evaluated every 10 seconds, unless explicitly configured for a different frequency. By default, all triggers defined in a rule are evaluated in parallel.
Required in Device Rules? Yes. Triggers enable rules to take action.
Valid for Network Rules? Yes

Block: “Rule Properties” on page 35
What it Does: The Rule Properties block allows you to specify metadata for a HealthBot rule, such as hardware dependencies, software dependencies, and version history.
Required in Device Rules? No
Valid for Network Rules? Yes
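The Variables row of Table 3 describes mustache-style placeholders whose values come from a rule default or from a user-supplied value at deployment time. A minimal Python sketch of that substitution follows; the `render` function, the placeholder character set, and the `$utilization` expression are assumptions for illustration, and HealthBot's own template engine may handle whitespace and missing values differently.

```python
import re
from typing import Dict, Mapping, Optional

# Matches {{name}}, allowing optional inner whitespace such as {{ high-threshold }}
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_-]+)\s*\}\}")


def render(template: str, defaults: Mapping[str, str],
           deployment_values: Optional[Mapping[str, str]] = None) -> str:
    """Replace each placeholder with its deployment value, else its rule default."""
    values: Dict[str, str] = {**defaults, **(deployment_values or {})}

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder {name!r}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


# Usage: the same hypothetical rule expression rendered two ways
expr = "$utilization > {{high-threshold}}"
print(render(expr, {"high-threshold": "80"}))                          # $utilization > 80
print(render(expr, {"high-threshold": "80"}, {"high-threshold": "90"}))  # $utilization > 90
```

Because the rule text itself never changes, the definition stays invariant while each deployment supplies its own values.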
Sensors
When defining a sensor, you must specify information such as the sensor name, sensor type, and data collection frequency. As mentioned in Table 3 on page 26, sensors can be one of the following types:
• OpenConfig: For information on OpenConfig JTI sensors, see the Junos Telemetry Interface User Guide.
• Native GPB: For information on Native GPB JTI sensors, see the Junos Telemetry Interface User Guide.
• iAgent: The iAgent sensors use NETCONF and YAML-based PyEZ tables and views to fetch the necessary data. Both structured (XML) and unstructured (VTY commands and CLI output) data are supported. For information on Junos PyEZ, see the Junos PyEZ Documentation.
• SNMP: Simple Network Management Protocol.
• syslog: System log.
• BYOI: Bring your own ingest. Allows you to define your own ingest types.
• Flow: NetFlow traffic flow analysis protocol.
• sFlow: sFlow packet sampling protocol.
When different rules define the same sensor, only one subscription is made for that sensor. A key, consisting of the sensor-path for OpenConfig and Native GPB sensors or the tuple of file and table for iAgent sensors, is used to identify the associated rule.
When multiple sensors with the same sensor-path key have different frequencies defined, the lowest
frequency is chosen for the sensor subscription.
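The subscription sharing described above can be sketched as grouping sensor definitions by key and keeping one subscription per key. This is a hypothetical model, not HealthBot internals: `SensorDef`, the type strings, and the file and table names are illustrative. The sketch also stores each frequency as an interval in seconds and keeps the smallest configured value; treating "lowest frequency" as the smallest configured value is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Optional, Tuple


@dataclass
class SensorDef:
    # Hypothetical model of a sensor definition inside a rule.
    rule: str
    sensor_type: str                 # "openconfig", "native-gpb", or "iagent"
    frequency_s: int                 # configured interval in seconds (assumption)
    sensor_path: Optional[str] = None  # OpenConfig / Native GPB key
    file: Optional[str] = None         # iAgent table file
    table: Optional[str] = None        # iAgent table name


def subscription_key(s: SensorDef) -> Hashable:
    """sensor-path for OpenConfig/Native GPB, (file, table) for iAgent."""
    if s.sensor_type in ("openconfig", "native-gpb"):
        return s.sensor_path
    if s.sensor_type == "iagent":
        return (s.file, s.table)
    raise ValueError(f"unsupported sensor type: {s.sensor_type}")


def plan_subscriptions(sensors: List[SensorDef]) -> Dict[Hashable, Tuple[int, List[str]]]:
    """Map each key to (chosen frequency, rules sharing that one subscription)."""
    plan: Dict[Hashable, Tuple[int, List[str]]] = {}
    for s in sensors:
        key = subscription_key(s)
        if key in plan:
            freq, rules = plan[key]
            plan[key] = (min(freq, s.frequency_s), rules + [s.rule])
        else:
            plan[key] = (s.frequency_s, [s.rule])
    return plan


# Usage: two rules share one OpenConfig subscription
sensors = [
    SensorDef("rule-a", "openconfig", 60, sensor_path="/interfaces/"),
    SensorDef("rule-b", "openconfig", 10, sensor_path="/interfaces/"),
    SensorDef("rule-c", "iagent", 120, file="hypothetical.yml", table="ExampleTable"),
]
for key, (freq, rules) in plan_subscriptions(sensors).items():
    print(key, freq, rules)
```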
Fields
There are four types of field sources, as listed in Table 3 on page 26. Table 4 on page 29 describes the
four field ingest types in more detail.
Table 4: Field Ingest Type Details

Field Type: Sensor
Details: Subscribing to a sensor typically provides access to multiple columns of data. For instance, subscribing to the OpenConfig interface sensor provides access to a range of information, including counter-related information such as:
/interfaces/counters/tx-bytes,
/interfaces/counters/rx-bytes,
/interfaces/counters/tx-packets,
/interfaces/counters/rx-packets,
/interfaces/counters/oper-state, etc.
Given the rather long path names in OpenConfig sensors, the Sensor definition within Fields allows for aliasing and filtering. For single-sensor rules, the required set of Sensors for the Fields table is automatically imported from the raw table based on the triggers defined in the rule.

Field Type: Reference
Details: Triggers can only operate on Fields defined within that rule. In some cases, a Field might need to reference another Field or Trigger output defined in another Rule. This is achieved by referencing the other field or trigger and applying additional filters. The referenced field or trigger is treated as a stream notification to the referencing field. References aren’t supported within the same rule.
References can also take a time-range option, which picks the value, if available, from the time range provided. Field references must always be unambiguous, so filter the result carefully to get just one value. If a reference receives multiple data points, or values, only the latest one is used. For example, if you reference the values contained in a field over the last 3 minutes, you might end up with 6 values in that field over that time range. HealthBot uses only the latest value in a situation like this.

Field Type: Constant
Details: A field defined as a constant is a fixed value that cannot be altered during the course of execution. HealthBot constant types can be strings, integers, and doubles.

Field Type: Formula
Details: Raw sensor fields are the starting point for defining triggers. However, triggers often work on derived fields defined through formulas that apply mathematical transformations.
Formulas can be pre-defined or user-defined (UDF). Pre-defined formulas include: Min, Max, Mean, Sum, Count, Rate of Change, Elapsed Time, Standard Deviation, Microburst, Dynamic Threshold, Anomaly Detection, Outlier Detection, and Predict.
Some pre-defined formulas can operate on time ranges in order to work with historical data. If a time range is not specified, the formula works on current data, specified as now.
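The Reference and Formula rows of Table 4 both deal with values over a time range. The Python sketch below illustrates two of those behaviors under stated assumptions: `latest_in_range` keeps the points inside the window and returns only the latest one, as the Reference row describes; `rate_of_change` uses (last − first) / elapsed seconds, which is an assumed definition, since this section does not give HealthBot's formula. Both function names and the `(timestamp, value)` data layout are hypothetical.

```python
from typing import List, Optional, Tuple

DataPoint = Tuple[float, float]  # (timestamp in seconds, value)


def latest_in_range(points: List[DataPoint], now: float,
                    time_range_s: Optional[float] = None) -> Optional[float]:
    """Resolve a reference: keep points inside the window, return only the latest."""
    start = now - time_range_s if time_range_s is not None else float("-inf")
    window = [p for p in points if start <= p[0] <= now]
    if not window:
        return None  # no value available in the time range
    return max(window, key=lambda p: p[0])[1]


def rate_of_change(points: List[DataPoint]) -> float:
    """Assumed definition: (last value - first value) / elapsed seconds."""
    ordered = sorted(points)
    if len(ordered) < 2 or ordered[-1][0] == ordered[0][0]:
        raise ValueError("need at least two distinct timestamps")
    (t0, v0), (t1, v1) = ordered[0], ordered[-1]
    return (v1 - v0) / (t1 - t0)


# Usage: six samples over three minutes (one every 30 seconds)
samples = [(t, 100.0 + 5 * i) for i, t in enumerate(range(0, 180, 30))]
print(latest_in_range(samples, now=180, time_range_s=180))  # 125.0
print(rate_of_change(samples))                              # 0.16666666666666666
```

As in the table's example, six values fall inside the 3-minute window, yet the reference resolves to a single value: the latest one.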
Vectors
Vectors are useful for gathering multiple elements into a single rule. For example, using a vector, you could gather all of the interface error fields. The syntax for a vector is:
vector <vector-name>{
path [$field-1 $field-2 .. $field-n];
filter <list of specific element(s) to filter out from vector>;
append <list of specific element(s) to be added to vector>;
}
$field-n can be a field of type reference.
The fields used in defining vectors can be direct references to fields defined in other rules:
vector <vector-name>{
path [/device-group[device-group-name=<device-group>]\
/device[device-name=<device>]/topic[topic-name=<topic>]\
/rule[rule-name=<rule>]/field[<field-name>=<field-value>\
AND|OR ...]/<field-name> ...];
filter <list of specific element(s) to filter out from vector>;
append <list of specific element(s) to be added to vector>;
}
This syntax allows for optional filtering through the <field-name>=<field-value> portion of the construct.
Vectors can also take a time-range option that picks the values from the time-range provided. When
multiple values are returned over the given time-range, they are all selected as an array.
The following pre-defined formulas are supported on vectors:
• unique @vector1: Returns the unique set of elements from vector1.
• @vector1 and @vector2: Returns the intersection of unique elements in vector1 and vector2.
• @vector1 or @vector2: Returns the total set of unique elements in the two vectors.
• @vector1 unless @vector2: Returns the unique set of elements in vector1 but not in vector2.
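The four vector formulas listed above are set operations, and a Python sketch makes their semantics concrete. The function names `v_and`, `v_or`, and `v_unless` are hypothetical (`and` and `or` are Python keywords), and preserving first-seen order is a choice made here; this section does not specify how HealthBot orders the results.

```python
from typing import Hashable, Iterable, List


def unique(v: Iterable[Hashable]) -> List[Hashable]:
    """unique @v: distinct elements, in first-seen order."""
    return list(dict.fromkeys(v))


def v_and(a: Iterable[Hashable], b: Iterable[Hashable]) -> List[Hashable]:
    """@a and @b: unique elements present in both vectors."""
    in_b = set(b)
    return [x for x in unique(a) if x in in_b]


def v_or(a: Iterable[Hashable], b: Iterable[Hashable]) -> List[Hashable]:
    """@a or @b: all unique elements across the two vectors."""
    return unique(list(a) + list(b))


def v_unless(a: Iterable[Hashable], b: Iterable[Hashable]) -> List[Hashable]:
    """@a unless @b: unique elements of a that do not appear in b."""
    in_b = set(b)
    return [x for x in unique(a) if x not in in_b]


# Usage with hypothetical interface names collected from error fields
errors_now = ["ge-0/0/1", "ge-0/0/2", "ge-0/0/1"]
errors_before = ["ge-0/0/2", "xe-1/0/0"]
print(unique(errors_now))                   # ['ge-0/0/1', 'ge-0/0/2']
print(v_and(errors_now, errors_before))     # ['ge-0/0/2']
print(v_or(errors_now, errors_before))      # ['ge-0/0/1', 'ge-0/0/2', 'xe-1/0/0']
print(v_unless(errors_now, errors_before))  # ['ge-0/0/1']
```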
Variables
Variables are defined during rule creation on the Variables page. This part of variable definition creates
the default value that gets used if no specific value is set in the device group or on the device during