VMware vCenter Operations Manager - 5.7.1 Getting Started Guide

VMware vCenter Operations Manager
Getting Started Guide
Custom User Interface
vCenter Operations Manager 5.7.1
This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new edition. To check for more recent editions of this document, see http://www.vmware.com/support/pubs.
VMware vCenter Operations Manager Getting Started Guide
You can find the most up-to-date technical documentation on the VMware Web site at:
http://www.vmware.com/support/
The VMware Web site also provides the latest product updates.
If you have comments about this documentation, submit your feedback to:
docfeedback@vmware.com
Copyright © 2013 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at
http://www.vmware.com/go/patents.
VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.
VMware, Inc.
3401 Hillview Ave. Palo Alto, CA 94304 www.vmware.com
2 VMware, Inc.

Contents

VMware vCenter Operations Manager Getting Started Guide 5
Introducing Custom User Interface Features and Concepts 7
1
Viewing Resources 7
Understanding How vCenter Operations Manager Collects Data 8
Understanding Alerts 8
Understanding Health Ratings 10
Monitoring Virtual Resources 11
Analyzing Performance and Capacity 13
Introducing Common Tasks 17
2
Logging In and Using vCenter Operations Manager 17
Monitoring Day-to-Day Operations 19
Handling Alerts 24
Optimizing Your Resources 33
Designing Your Workspace 41
3
Working with Dashboards 41
Working with Widgets 49
Using and Configuring Widgets 53
4
Edit a Widget Configuration 54
Supported Widget Interactions 55
Configure Widget Interactions 61
Advanced Health Tree Widget 62
Alerts Widget 64
Application Detail Widget 66
Application Overview Widget 67
Configuration Overview Widget 68
Custom Relationship Widget 69
Data Distribution Analysis Widget 71
Generic Scoreboard Widget 73
GEO Widget 76
Health Status Widget 77
Health Tree Widget 78
Health-Workload Scoreboard Widget 80
Heat Map Widget 81
Mashup Charts Widget 84
Metric Graph Widget 86
Metric Graph (Rolling View) Widget 90
Metric Selector Widget 93
VMware, Inc.
3
VMware vCenter Operations Manager Getting Started Guide
Metric Sparklines Widget 94
Metric Weather Map Widget 97
Resources Widget 99
Root Cause Ranking Widget 101
Tag Selector Widget 102
Text Widget 103
Top-N Analysis Widget 104
VC Relationship Widget 107
VC Relationship (Planning) Widget 108
Define Metric Sets for a Widget 109
Index 111
4 VMware, Inc.

VMware vCenter Operations Manager Getting Started Guide

The VMware vCenter Operations Manager Getting Started Guide (Custom User Interface) introduces the VMware® vCenter™ Operations Manager Custom user interface, explains key terms and concepts, and describes how to set up dashboards and configure widgets.
Intended Audience
This information is intended for anyone who uses or administers vCenter Operations Manager by using the Custom user interface.
VMware, Inc. 5
VMware vCenter Operations Manager Getting Started Guide
6 VMware, Inc.
Introducing Custom User Interface
Features and Concepts 1
vCenter Operations Manager collects performance data from monitored software and hardware resources in your enterprise and provides predictive analysis and real-time information about problems. The Custom user interface presents data and analysis through alerts, in configurable dashboards, on predefined pages, and in several predefined reports.
Before you start using the Custom user interface, become familiar with key features, concepts, and terminology.
This chapter includes the following topics:
“Viewing Resources,” on page 7
n
“Understanding How vCenter Operations Manager Collects Data,” on page 8
n
“Understanding Alerts,” on page 8
n
“Understanding Health Ratings,” on page 10
n
“Monitoring Virtual Resources,” on page 11
n
“Analyzing Performance and Capacity,” on page 13
n

Viewing Resources

In vCenter Operations Manager, a resource is an entity in your computing environment for which vCenter Operations Manager collects data. A resource can be a single entity, such as a router or database, or a container that holds other resources.
Applications and tiers are types of container resources. An application defines an interdependent set of hardware and software components. A tier is a group of resources that performs a specific task in an application. An application can contain multiple tiers. With applications and tiers, you can combine, track, and analyze metrics for related resources over a period of time.
To make resources easier to find in the Custom user interface, an administrator assigns resources to resource tag values. A resource tag is a type of information, and a resource tag value is an individual instance of that type of information.
You can view resources in many places in the Custom user interface, including the Environment Overview page. The list on the left side of the Environment Overview page contains the resource tags and resource tag values. The List tab contains the resources.
In the example, the resource tag value called VMware vCenter Operations is selected in the left pane under the Application resource tag, and the List tab shows the resources assigned to the VMware vCenter Operations resource tag value. The resources are tiers in the VMware vCenter Operations application.
VMware, Inc.
7
VMware vCenter Operations Manager Getting Started Guide
Figure 11. Example of Resources on the Environment Overview Page

Understanding How vCenter Operations Manager Collects Data

vCenter Operations Manager can collect several types of data for a single resource. For example, for a database server it might receive data on free disk space, CPU use, and average response time. Each type of data that vCenter Operations Manager collects is called an attribute.
A metric is an instance of an attribute for a particular resource. For each metric, vCenter Operations Manager collects and stores multiple readings over time. Each piece of data that vCenter Operations Manager collects is called a metric observation or value.
A vCenter Operations Manager administrator creates attribute packages to define combinations of attributes and assigns attribute packages to resources. An attribute package specifies the attributes to collect for the resource to which it is assigned.
An administrator identifies the attributes that are most important in your enterprise as key performance indicators (KPIs). A KPI is a high-priority metric that might indicate a severe problem in your infrastructure if it exceeds its normal value range. vCenter Operations Manager treats KPIs differently from other attributes.
A vCenter Operations Manager administrator might also create super metrics and super metric packages. A super metric is useful when a single metric cannot tell you what you need to know about the behavior of your enterprise. For example, you might have a super metric that tracks the average free disk space for all of the database servers in your enterprise by averaging the free disk space metric for all servers. A super metric package is similar to an attribute package, except that it defines combinations of super metrics.

Understanding Alerts

For each attribute, vCenter Operations Manager maintains thresholds of normal behavior and generates anomalies when a metric violates a threshold. If vCenter Operations Manager determines that the current combination of anomalies indicates a real problem, or if a KPI violates a threshold, it generates an alert.
An alert is a notification that informs you of an abnormal condition that might require attention. An alert can describe a problem in a resource, including applications and tiers. Different combinations of conditions cause different types of alerts.
For example, if CPU use for all of the servers in a tier exceeds a threshold, vCenter Operations Manager generates an anomaly for each out-of-threshold metric value and sends an alert to notify you of the problem. The alert lists all of the anomalies for each metric.
8 VMware, Inc.
Chapter 1 Introducing Custom User Interface Features and Concepts
You can view alerts in several places in the Custom user interface, including the Alerts Overview page. If a vCenter Operations Manager administrator sets up the alert notification feature, you might also receive alerts in email messages.
On the Alerts Overview page, the list on the left side of the page contains resource tags and resource tag values and the right pane contains alerts. By default, the alert list includes alerts for all resources. If you select a resource tag value in the left pane, the alert list contains only the alerts for the resources that have the selected resource tag value.
Figure 12. Example of Alerts on the Alerts Overview Page

Using Thresholds to Identify Abnormal Behavior

A threshold is a value that marks the boundary between normal and abnormal behavior for a metric. When a metric crosses a threshold, vCenter Operations Manager generates an anomaly. vCenter Operations Manager uses dynamic thresholds and hard thresholds.
With dynamic thresholds, vCenter Operations Manager determines whether to generate an anomaly based on how often a metric has violated its thresholds and by the amount of the violation. vCenter Operations Manager calculates and continually adjusts a degree of abnormality for the metric. If the metric value is within this degree of abnormality, vCenter Operations Manager does not generate an anomaly, even if the value is outside of the dynamic threshold.
A hard threshold is a value that a vCenter Operations Manager administrator defines for a metric. A hard threshold changes only when an administrator changes it.
The dynamic threshold for a metric appears as a shaded area in a metric graph. Out-of-range values that generated anomalies appear as yellow areas. You can view metric graphs in several places in the Custom user interface, including the Metric Graph widget.
Figure 13. Example of a Dynamic Threshold for a Metric
VMware, Inc. 9
VMware vCenter Operations Manager Getting Started Guide

Understanding KPI Alert Generation

When a KPI violates a threshold, vCenter Operations Manager always generates an alert. When they configure attribute packages, vCenter Operations Manager administrators can identify any attribute for any resource as a KPI.
The type of alert that vCenter Operations Manager generates depends on the type of threshold that the KPI uses. When a KPI violates an internally calculated dynamic threshold, vCenter Operations Manager sends a Smart KPI Breach alert. When a KPI violates a user-defined hard threshold, vCenter Operations Manager sends a Classic KPI HT Breach alert.

Identifying the Root Causes of Alert Symptoms

The root cause of an alert is the condition or symptom that was the first step in the chain of events that led to the alert. For example, a slowdown in network traffic through a particular router could lead to an increased time per transaction for users of your Web site, which in turn could cause vCenter Operations Manager to generate an alert for the Web server resource.
For each alert, vCenter Operations Manager lists the most likely root causes of the symptoms that caused the alert and ranks the causes in order of importance.
You can view root causes in the Root Cause column on the Alerts Overview page and in the Root Cause Ranking widget.
Figure 14. Example of Root Causes on the Alerts Overview Page

Understanding Health Ratings

vCenter Operations Manager examines internally generated metrics and uses its proprietary analytics formulas to determine an overall health rating for a resource. The health rating, which ranges from 0 to 100, gives you a quick overview of the current state of a resource.
vCenter Operations Manager generates and stores internally generated metrics for every resource. Internally generated metrics include the total number of alerts and anomalies and the number of active alerts.
The health rating appears as a numeric rating and as a colored indicator in the Custom user interface. The color is based on the range of the health rating. You can view the health rating for a resource anywhere that a resource is listed and in the Health widget. For resources that VMware vCenter Server™ manages, health ratings appear in the VC Relationship widget and on the Resource Detail page.
10 VMware, Inc.
Chapter 1 Introducing Custom User Interface Features and Concepts
Figure 15. Example of Health Ratings on the Environment Overview Page

Monitoring Virtual Resources

vCenter Operations Manager provides performance, relationship, and capacity data for objects in your virtual environment. It uses badges to represent derived metrics and give you a high-level, broad view of the performance and condition of your virtual environment.

Interpreting Workload Scores for Virtual Resources

vCenter Operations Manager combines the metrics that show demand on virtual machines and other virtual objects in a single value called a workload. These metrics include CPU use and memory use. vCenter Operations Manager indicates workload as a numeric score and as a colored circle. The color is based on the range of the workload score.
Workload scores typically range from 0 to 100. A score of 0 indicates that an object is not being used. A score greater than 100 indicates that an object is trying to access more resources than are currently available. When an object's workload score is greater than 100, you might need to allocate more resources to the object or move some tasks to other objects.
You can view workload scores for virtual objects in the VC Relationship widget and on the Resource Detail page.
VMware, Inc. 11
VMware vCenter Operations Manager Getting Started Guide
Figure 16. Example of a Workload Score on the Resource Detail Page

Interpreting Capacity Scores for Virtual Resources

The capacity score indicates how close a virtual object is to exhausting its available computing resources. The computing resources include disk space, memory size, and network capacity.
To calculate the capacity score, vCenter Operations Manager combines applicable metrics from the object and applies its analytics algorithms to determine long-term cycles and trends. It uses these results to calculate when an object might run out of a type of resource.
vCenter Operations Manager indicates capacity as a numerical score and a colored hexagon. The color is based on the range of the capacity score. You can view capacity scores for virtual objects in the VC Relationship (Planning) widget.
Figure 17. Example of a Capacity Score in the VC Relationship (Planning) Widget

Viewing Change Events for Virtual Resources

A change event is any change to the virtual infrastructure. It can include changes on virtual machines or ESX hosts, such as adding, removing, connecting, or disconnecting an ESX host, and starting, stopping, or reconfiguring a virtual machine.
vCenter Operations Manager can show change events on its performance graph on the Resource Detail page for a virtual object. A vCenter Operations Manager administrator can configure whether change events appear on performance graphs.
12 VMware, Inc.

Analyzing Performance and Capacity

With vCenter Operations Manager forensics features, you can analyze the performance and capacity of your resources and use this information to balance the resources in your environment. Forensics features include cross-silo analysis, top-n analysis, the problem fingerprint library, capacity analysis, and VC analysis.

Examining Anomalies with Cross-Silo Analysis

With cross-silo analysis, you can examine graphs that show the number of anomalies over time for particular resources. You can zoom in on a graph to focus on a specific period of time, such as immediately before an alert. You can also click on a location in a graph to see a ranking of the likely root causes for the anomalies at that specific time.
Cross-silo analysis information appears on the Cross-Silo Analysis page. For applications, the anomaly graph includes a line that shows the internally determined 90 percent threshold. If the number of anomalies exceeds this threshold, vCenter Operations Manager generates an early warning Smart Alert.
Figure 18. Example of Cross-Silo Analysis
Chapter 1 Introducing Custom User Interface Features and Concepts

Using Top-N Analysis to Determine Best and Worst Performers

With top-n analysis, you can identify the top or bottom resources, metrics, or alerts in selected categories, such as the five most or least healthy resources in a tier. You can also select the resource tag and time frame to analyze and the categories to show.
Top-n analysis information appears on the Top-N Analysis page and in the Top-N Analysis widget. The Resource Health pane shows the health of the resource. The information that you select in the Tools pane appears in the right pane.
VMware, Inc. 13
VMware vCenter Operations Manager Getting Started Guide
Figure 19. Example of Top-N Analysis

Isolating and Resolving Issues with Problem Fingerprinting

When a KPI for an application or tier violates a threshold, vCenter Operations Manager examines the events that preceded the violation. If it finds enough related information, it captures the set of events that preceded the violation as a fingerprint. If vCenter Operations Manager finds a similar series of events in the future, it can issue a predictive alert to warn of a likely KPI violation.
Fingerprinting helps you to quickly isolate and resolve problems by reducing the number of possible silos and tiers in which a problem might have occurred, capturing the events that precede a problem for root­cause analysis, and notifying you of problems before they occur.
You can see fingerprint information on the Problem Fingerprint Library page. If you select a fingerprint branch in the left pane, the details for that branch appear in the right pane.
Figure 110. Example of Problem Fingerprints
14 VMware, Inc.
Chapter 1 Introducing Custom User Interface Features and Concepts

Performing Capacity Analysis

Some metrics, such as disk space use or network traffic measurements, frequently show long-term trends. Short-term fluctuations can hide this data or make it difficult to calculate.
The capacity analysis feature looks at the overall value trend for a selected metric, tells you when a metric is likely to reach a threshold, and indicates how confident vCenter Operations Manager is of the prediction. This information can help you plan for infrastructure upgrades, such as adding additional storage and network routers.
Capacity analysis information appears on the Capacity Analysis page.
Figure 111. Example of Capacity Analysis Information

Analyzing and Balancing Virtual Resources

With the VC analysis feature, you can compare the metric values of different objects in your virtual environment by using predefined heat maps or by creating your own custom heat maps.
A heat map contains rectangles of different colors and sizes, and each rectangle represents an object in your virtual environment. The color of a rectangle represents the value of one metric, and the size of a rectangle represents the value of another metric. For example, one of the predefined heat maps shows the total memory and percentage of memory use for each virtual machine. Larger rectangles are virtual machines that have more total memory. Green indicates low memory use and red indicates high memory use.
vCenter Operations Manager updates heat maps in real time as its collects new values for each object and metric. The colored bar below a heat map is the legend. The legend identifies the values that the endpoints represent and the midpoint of the color range.
Heat map objects are grouped by parent. A heat map that shows virtual machine performance groups virtual machines by the ESX hosts on which they run.
VMware, Inc. 15
VMware vCenter Operations Manager Getting Started Guide
16 VMware, Inc.

Introducing Common Tasks 2

With vCenter Operations Manager, you can perform a large number of monitoring and troubleshooting tasks, including tracking anomalies, handling alerts, and optimizing resources based on predictive information that vCenter Operations Manager generates. The best way to learn the full feature set of vCenter Operations Manager is to get hands-on experience with the system.
Before you can perform these tasks, a vCenter Operations Manager administrator must configure the resources that vCenter Operations Manager monitors and start data collection. For information about configuring vCenter Operations Manager, see the VMware vCenter Operations Manager Administration Guide (Custom User Interface).
For information about all monitoring and troubleshooting tasks, see the vCenter Operations Manager online help.
NOTE Your ability to use any vCenter Operations Manager feature depends on the access rights that a vCenter Operations Manager administrator assigns to you. If you cannot use a feature, contact your administrator to find out if your access rights should be adjusted.
This chapter includes the following topics:
“Logging In and Using vCenter Operations Manager,” on page 17
n
“Monitoring Day-to-Day Operations,” on page 19
n
“Handling Alerts,” on page 24
n
“Optimizing Your Resources,” on page 33
n

Logging In and Using vCenter Operations Manager

The vCenter Operations Manager client is a Web-based application. You use vCenter Operations Manager by typing the URL of the Custom user interface in a Web browser.

Log In to the Custom User Interface

After you install vCenter Operations Manager and the vCenter Operations Manager services are running, you can use a supported Web browser to connect to the vCenter Operations Manager server or vApp and use the Custom user interface.
Prerequisites
Verify that you have a supported Web browser.
VMware, Inc.
17
VMware vCenter Operations Manager Getting Started Guide
Procedure
1 In your Web browser, type the URL for the Custom user interface.
Option Description
Standalone version
vApp version
ip_address is the IP address or fully qualified host name of the vCenter Operations Manager server or vApp.
2 Type your user name and password.
3 Click Login.
After you log in, the Home page appears in your browser window.
NOTE If your session is inactive for 30 minutes, it times out and you must log in again.

Using the Custom User Interface

After you log in to the Custom user interface, your Home page appears in the browser window.
Figure 21. Example Home Page
https://ip_address
https://ip_address/vcops-custom
The Home page contains the following components.
Dashboards
The tabs near the top of the Home page are your dashboards. In the example, the dashboards are Operations and Alerts by Type. The user groups to which you belong determine which dashboards are available to you. A vCenter Operations Manager administrator assigns you to one or more user groups when he or she creates your user account. You can switch to a different dashboard by clicking its tab or selecting it from the Dashboards menu. You can click Home at any time to return to your Home page.
Widgets
The panes on a dashboard are called widgets. A widget is a collection of related information about attributes, resources, applications, or the overall processes in your environment. Each dashboard contains one or more widgets. In the example, the Operations dashboard contains the Resource Selector and Active Alerts widgets. If your user account has the necessary access rights, you can customize dashboards and widgets.
18 VMware, Inc.
Chapter 2 Introducing Common Tasks
Menus
You use the menus at the top of your Home page to select and use vCenter Operations Manager features. For example, you use the Reports menu to generate reports, the Alerts menu to view alerts, and the Forensics menu to use forensics features.
Icons
You click icons on pages and widgets to perform tasks in the Custom user interface. For example, you can click the icons in a widget to expand, collapse, resize, or remove the widget. When you point to an icon, a tooltip appears that describes the function of the icon.
Using Breadcrumbs
When you move away from your Home page, a breadcrumb appears in the top left corner of the page under the menu bar. Breadcrumbs help you navigate the Custom user interface and determine where you are.
You click the link in a breadcrumb to return to a specific page in the user interface.

Modify User Preferences

You can change the color scheme for your workspace, specify the time zone that vCenter Operations Manager uses when it displays times in your workspace, or edit the password for your user account.
Procedure
1 Click User Preferences at the top of your Home page.
2 Modify your user preferences on the Manage User Account Settings window.
Option Action
Password
Scheme
Time Preference
Click change to change your password.
Select light or dark to change the color scheme for your workspace.
Select Browser to use the time settings on your computer or Host to sync your vCenter Operations Manager session with the time on the vCenter Operations Manager server.
You cannot change your user name, first name, last name, email address, or account description on the Manage User Account Settings window. Only a vCenter Operations Manager administrator can change these values for a user account.
3 Click OK to save your changes.

Monitoring Day-to-Day Operations

Monitoring day-to-day operations involves evaluating the overall health of your enterprise and identifying health problems for specific resources.
For each resource, vCenter Operations Manager determines a health score, which is a 0 to 100 ranking. One of the ways that vCenter Operations Manager indicates the health of a resource is to show a colored indicator. The color is based on the range of the health score.
VMware, Inc. 19
VMware vCenter Operations Manager Getting Started Guide
Table 21. Default Health Color Ranges
Color Range
Green 76 to 100
Yellow 51 to 75
Orange 26 to 50
Red 1 to 25
Blue 0

View an Overview of Resource Health

You can view a graphical representation of the health of all vCenter Operations Manager resources that have a specific resource tag value on the Environment Overview page.
Procedure
1 Select Environment > Environment Overview to view the Environment Overview page.
2 Select one or more resource tag values in the left pane.
3 Click the Group tab.
The Group tab shows a colored icon that represents the current health of each resource that has the selected tag value. If you point to a colored icon, a tooltip appears that describes the resource that the icon represents.
Option Action
View the health icons for any time in the past six hours
View detailed information for a resource
Move the slider at the bottom of the Group tab to the left. The slider moves in five-minute increments.
Click the icon for the resource and click the Show Detail icon on the toolbar.

Identify Health Problems for a Specific Resource

You can use the Resource Detail page to identify health problems for a specific resource. For a global resource, the Resource Detail page contains information about the current state of the resource, its metrics, and its place in the resource tree. For a virtual resource, the Resource Detail page shows information about the main performance characteristics, key metrics, and events for the virtual resource.
Procedure
1 Select Environment > Environment Overview.
2 (Optional) In the left pane, expand a resource tag and select a tag value that is assigned to the resource.
Selecting a tag value can shorten the resource list and make it easier to find a specific resource.
3 Select the resource on the List tab.
4 Click the Show Detail icon.
The Resource Detail page appears for the resource.
20 VMware, Inc.
Chapter 2 Introducing Common Tasks
Resource Detail Information for Global Resources
For global resources, the Resource Detail page contains information about the current state of the resource, its metrics, and its place in the resource tree. Because problems might be related to or caused by the behavior of related resources, the Resource Detail page also shows details for child or parent resources.
For complete information about using the Resource Detail page, see the online help.
Health Status Pane
Located in the upper left of the Resource Detail page, this pane shows the current health score of the resource. A health score is a 0 to 100 ranking that vCenter Operations Manager determines for each resource. The Health Status pane shows the current health score of a global resource and a graph that shows how the health score has changed over the last six hours.
NOTE If vCenter Operations Manager does not receive metrics for a resource at a particular time, it cannot calculate the health score for that time and it shows a score of -1 on the graph. A -1 score can occur if collection is turned off for a resource or if vCenter Operations Manager encounters a data gathering problem.
Root Cause Ranking Pane
This pane shows information about metrics on related resources that contributed to alerts, including the percentage likelihood that the metrics contributed to a root cause. vCenter Operations Manager bases the percentage likelihood that a metric contributed to a root cause on the number of symptoms and when the symptoms occurred relative to the alert.
For more information about the information in the Root Cause Ranking pane, see “Understanding Health
Symptoms,” on page 23.
Health Tree Pane
This pane shows the section of the resource hierarchy around a global resource, including all of the parent container resources that hold the resource. If the resource is a container, the health tree also shows all of the child resources that the container resource holds.
Metric Selector and Metric Graph Panes
When you click a resource in the Health Tree pane, the metric groups for the resource appear in the Metric Selector pane. You can expand a metric group to view the individual metrics in the group. You can double­click a metric in the Metric Selector pane to view a graph for the metric in the Metric Graph pane. Metric graphs show the recent performance and predicted future performance of metrics for a resource.
Resource Detail Information for Virtual Resources
For virtual resources, the Resource Detail page shows information about performance characteristics, key metrics, and events. The information on the Resource Detail page is organized in several panes.
For resource pools and folders, the Resource Detail page shows the same information that it does for global resources.
Status Pane
Located in the upper left of the Resource Detail page, this pane contains aggregated information about the health, workload, anomalies, and faults of the selected resource. The default history graph period is six hours. You can click metric icons to switch the metric to view.
Metric Details Pane
This pane occupies the middle of the Resource Detail page. The information that appears in this pane depends on the metric that you select to view in the Status pane.
VMware, Inc. 21
VMware vCenter Operations Manager Getting Started Guide
Table 22. Information in the Metric Details Pane
Metric Description
Health Shows information for workload, anomalies, and faults.
Workload Shows information for CPU and memory used by the currently selected resource, by the
hypervisor, and by the child objects of the selected resource. You can point to colored sections in the bars to view information about the objects that they represent.
Anomalies Contains a list of symptoms for all child container objects of the currently selected resource. A
symptom is a metric that contributes to the health state of an objects. See “Understanding Health
Symptoms,” on page 23.
Faults Shows information for faults. A fault score indicates the degree of problems that the object is
experiencing. It includes events such as loss of redundancy in NICs or HBAs, memory checksum errors, HA failover problems, and CIM events.
Workload Pane
This pane shows information about space use and read and write capacity. The Space bar represents the amount of space used. The IOPs (input/output operations per second), Throughput, and Latency bars provide read and write capacity information.
Key Metrics Pane
This pane contains the metrics of greatest interest related to the performance characteristic that you select.
In the Default key metrics view, vCenter Operations Manager selects the four most interesting metrics, through bubbling, by following these criteria.
From all metrics, vCenter Operations Manager selects KPIs that are violating their thresholds. It sorts
n
KPIs by display order (CPU, memory, network I/O, and disk I/O).
From all remaining metrics, vCenter Operations Manager adds non-KPI metrics that are violating their
n
thresholds to the list by display order.
vCenter Operations Manager adds other metrics to the list by display order.
n
You can click All Metrics to view health status and resource hierarchy information, identify root causes of health degradation, and view metric graphs that show recent performance and predicted performance of metrics for the selected resource.
Resources Pane
This pane shows the current list of properties for the resource. You can click the More Details link to show more available properties for the selected resource. The More Details link is not provided for certain virtual resources.
Related Objects Pane
This pane shows values for the currently selected performance characteristic of all objects that are related to the selected resource. Depending on the resource type that you selected (virtual machine, datacenter, or vCenter Server system), this pane shows parent objects, peer objects, and child objects.
Events and Health Pane
This pane appears in the lower third of the Resource Detail page when you click Health in the Status pane. You can expand this pane to view the graph of the current health metric values. If an administrator configures it, the graph contains events that might affect the selected resource. You can use the icons at the top of the pane to change the display.
22 VMware, Inc.
Chapter 2 Introducing Common Tasks
Events and Workload Pane
This pane appears in the lower third of the Resource Detail page when you click Workload in the Status pane. You can expand this pane to view the graph of recent workload metric values. If an administrator configures it, the graph shows events that might affect the selected resource. You can use the icons at the top of the pane to change the display.
Events and Anomalies Pane
This pane appears in the lower third of the Resource Detail page when you click Anomalies in the Status pane. You can expand this pane to view the graph of anomalies. If an administrator configures it, the graph shows corresponding events that might affect the selected resource. You can use the icons at the top of the pane to change the display.
Events and Faults Pane
This pane appears in the lower third of the Resource Detail page when you click Faults in the Status pane. You can expand this pane to view a graph of faults. If an administrator configures it, the graph shows corresponding events that might affect the selected resource. You can use the icons at the top of the pane to change the display.
Storage and Network Pane
For objects that have storage and network resources, this pane shows basic storage-related metrics. The pie chart uses both volume and color to present information. The volume of the pie chart represents the amount of used disk space. The color coding visualizes the nearness of the moment when disk space is exhausted.

Understanding Health Symptoms

A symptom is a metric that contributes to the health state of an object. For a global resource, you view health symptoms in the Root Cause Ranking pane on the Resource Detail page. For a virtual resource, you view health symptoms in the Metric Details pane on the Resource Detail page.
The Resource Detail page lists symptoms by child resource kinds. The parentheses after the resource kind name contain information about the number of symptoms that are violating their thresholds for the resource group.
Figure 22. Example of a Symptom Group
The example shows a portion of the type of information that you might see when you view health symptoms. Metrics that are violating their thresholds appear in metric groups. The parentheses after the metric group name contain the number of violations for the metrics in the metric group.
When you expand a metrics group, the list of metrics that are violating their thresholds appears. In each metric row, you can check the percentage of objects that have threshold violations for the metric. A vertical blue line represents the point in time when the first symptom became active.
Figure 23. Example of an Expanded Symptom Group
VMware, Inc. 23
VMware vCenter Operations Manager Getting Started Guide
The icons in a metric row add information about the metric values that the row contains. When you point to an icon, a tooltip appears that describes the meaning of the icon. You can double-click a metric row to view details about the selected symptom.

Handling Alerts

An alert is a notification that informs you of an abnormal condition that might require attention. Handling alerts involves viewing alerts, determining which alerts to respond to, managing alerts in the Custom user interface, and identifying alert trends.

View Alerts on the Alerts Overview Page

By default, the Alerts Overview page shows alerts for all resources. You can filter the alert list by resource, alert type and subtype, and alert status. You can also search for alerts generated for particular resources and during a specific time period.
Procedure
1 Select Alerts > Alerts Overview.
2 (Optional) Filter the alert list.
Option Action
Show alerts for resources that have a specific value
Show alerts for resources that do not have a specific value
Show alerts for resources that have a specific resource name or resource kind
Show alerts for a specific time period
Show alerts that have a specific type or subtype
Show alerts that have a specific status
Remove an alert type or alert status filter
Remove all resource filters
3 (Optional) Click a column header and use the controls in the pop-up menu to sort the alert list or add or
remove columns from the display.
Select one or more resource tag values in the resource tag list. If you select more than one value for the same tag, the list includes resources that have either value. If you select values for two or more different tags, the list includes only resources that have all of the selected values.
Select one or more resource tag values in the resource tag list and click the Invert Result icon. For example, if you select New York and London, alerts for all resources that are not in either city appear in the list.
Type a full or partial name in the Search text box and click the right angle bracket (>).
Select a date from the Start Date and End Date menus and click the right angle bracket (>).
Click one or more of the alert type icons at the top of the alert list. For example, click the Smart (Early Warning), Smart (KPI Breach), and Smart (KPI Prediction) icons to show all smart alerts.
Click one or more of the alert status icons at the top of the alert list. For example, click the Active Alerts icon and the Own Alerts icon to show the active alerts assigned to you.
Click the icon again to toggle it off.
Click the Deselect All icon at the top of the resource tag list.
Alerts Overview Page Information
The information on the Alerts Overview page is organized in multiple columns.
Table 23. Columns on the Alerts Overview Page
Column Name Description
Critical Level A color-coded icon that indicates the criticality level of the alert. See “Alert Criticality Levels,” on
page 25.
Sub-Type An icon that indicates the subtype of the alert. See “Alert Types and Subtypes,” on page 26.
24 VMware, Inc.
Chapter 2 Introducing Common Tasks
Table 23. Columns on the Alerts Overview Page (Continued)
Column Name Description
Start Time The date and time that vCenter Operations Manager first generated the alert.
Duration How long the alert lasted.
Status An icon that indicates whether the alert is active or canceled.
Resource Name The resource that is associated with the alert.
Resource Kind The kind of resource for which vCenter Operations Manager generated the alert.
Metric The metric that violated a threshold and triggered the alert.
Root Cause The symptom that most likely indicates the start of the chain of events that led to the alert.
Worst Sub Containers
Id A sequential alert ID number.
Type An icon that indicates the type of alert. See “Alert Types and Subtypes,” on page 26.
Cancel Time The date and time that a user canceled the alert.
Info For external notification alerts, this column might contain additional information that was sent
Update Time The date and time that the alert was most recently updated.
Resource Identifiers
Control State An icon that indicates whether the alert is open, assigned, suspended, or suppressed.
Resource Id The ID of the resource that is associated with the alert.
User Name If the alert is assigned to a user, the name of the user.
If the alert is for a resource that contains other container resources, such as an application that contains tiers, this column contains icons that represent the subcontainers that most likely contributed to the alert.
along with the alert message.
Up to five identifiers that uniquely identify the resource. Resource identifiers are often the same as the resource name. Multiple resources can have the same name. The adapter type determines the identifiers that appear.
Alert Criticality Levels
Every alert has a criticality level that specifies how serious the problem is and how quickly you should take action.
Table 24. Alert Criticality Levels
Level Color Value
Critical Red 4
Immediate Orange 3
Warning Yellow 2
Info Blue 1
None 0
vCenter Operations Manager predictive Smart Alerts, KPI prediction, and early warning alerts are always critical alerts. Other types of alerts might be of any criticality, depending on the alert type and how the attribute that trigged the alert is configured.
You can color code alerts by criticality on the Alerts Overview page and in the Alerts widget.
VMware, Inc. 25
VMware vCenter Operations Manager Getting Started Guide
Alert Types and Subtypes
vCenter Operations Manager generates several types of alerts, and each alert type has its own triggers and contents. An alert might be caused by abnormal behavior of one or more metrics for a resource, including a tier or application, or when a fingerprint predicts an upcoming problem.
By default, vCenter Operations Manager generates all types of alerts except for classic abnormality alerts. Because they do not involve KPIs, classic abnormality alerts are unlikely to require any action.
Smart Alerts
A Smart Alert is triggered when an internally calculated value indicates that a problem occurs, or soon will occur. The internally calculated value can be a correlated prediction of future behavior or a dynamic threshold breach on a KPI. Except for metrics that are designated as KPIs, Smart Alerts do not depend on user-defined values. Smart Alerts have certain subtypes.
Table 25. Smart Alert Subtypes
Subtype Description Determined By
KPI DT Breach A KPI breached one of its internally
calculated dynamic thresholds.
KPI Prediction vCenter Operations Manager predicts that a
KPI will soon breach a threshold. The current combination of metrics might match a stored metric fingerprint that predicts the breach, or a metric that is correlated with the KPI has breached its threshold.
KPI Prediction alerts are always critical.
Early Warning vCenter Operations Manager correlation
algorithms calculate a greater than 90 percent chance above the noise threshold that there is a problem with an application.
Early Warning alerts are always critical.
Analytics based on a user-defined KPI.
Analytics based on a user-defined KPI.
Analytics based on application topology.
Classic Alerts
A classic alert is an alert that is generated by classic monitoring software. It relies on user-defined hard thresholds, not vCenter Operations Manager dynamic thresholds or other advanced calculations. Classic alerts have certain subtypes.
Table 26. Classic Alert Subtypes
Subtype Description Reason
KPI HT Breach A user-defined KPI breached a user-
defined hard threshold.
Abnormality A non-KPI metric breached one of its
internally calculated dynamic thresholds.
Notification A non-KPI metric breached a user-
defined hard threshold. Typical system and network management applications send this type of alert.
A user-defined KPI breached one of its user-defined hard thresholds.
A single non-KPI metric breached one of its dynamic thresholds.
A single non-KPI metric breached one of its internally calculated dynamic thresholds.
Administrative Alerts
An administrative alert indicates a problem with vCenter Operations Manager, the monitoring software from which it collects data, or the enterprise network. Administrative alerts have certain subtypes.
26 VMware, Inc.
Chapter 2 Introducing Common Tasks
Table 27. Administrative Alert Subtypes
Subtype Description
System A vCenter Operations Manager component failed.
Environment vCenter Operations Manager stopped receiving data from
one or more resources. A problem might exist with the resource, the monitoring software, or the network infrastructure.

Resolve an Alert

How you resolve an alert depends on the alert's type and criticality level and your organization's rules, procedures, and priorities. You can view information about the event that triggered an alert, its effects, and its likely causes, on the Alert Summary page. The Alert Summary page usually contains enough information to determine who in your enterprise should respond to the alert.
The top of the Alert Summary page shows the alert type, when the alert started, the duration of the alert, and the associated resource or metric. The rest of the page is divided into separate panes.
NOTE Do not follow this procedure to resolve administrative system alerts.
Procedure
1 Select Alerts > Alerts Overview and double-click the row for the alert in the alert list.
2 Examine the information in the Reason pane on the Alert Summary page and determine the action to
take.
The Reason pane contains specific information about the alert, including the type of trigger that caused it, the resource or metric for the trigger, and details about the trigger. You can click More next to the resource name to view all of the properties for the resource.
Option Action
The alert is for an application, tier, or container
The alert is for a KPI
Identify the owner of the resource.
Identify the service level agreement (SLA) that is associated with the KPI.
3 Examine the information in the Impact pane on the Alert Summary page.
The Impact pane shows health information for the last six hours for the resource for which the alert was generated, including any subcontainers in the resource. For a tier, the pane shows a health graph for the application that contains the tier and the current health of all of the tiers in the application. The pane also shows a graph for up to five metrics. Breaching KPIs appear first, followed by breaching super metrics, non-breaching KPIs, and non-breaching super metrics.
4 If the information in the Impact pane indicates that KPIs were breached, identify the SLAs that are
associated with the KPIs.
VMware, Inc. 27
VMware vCenter Operations Manager Getting Started Guide
5 Examine the information in the Root Cause pane on the Alert Summary page.
The Root Cause pane shows the likely root cause container resources, ranked by analytical algorithm on the container resource. You can perform actions on the Root Cause pane to view symptom information.
Option Action
Show the symptom groups for a resource
Show the top five individual symptoms in a symptom group
List the individual anomalies that comprise a symptom
6 Identify the owner of the root cause resource.
Option Action
The Root Cause pane indicates multiple tiers
The Root Cause pane indicates a single resource
7 Hand off resolution of the alert to the owner that you identified in Step 6.
Double-click the resource. Symptom groups are ranked by percentage of possible resources that exhibit the symptoms in the group.
Double-click the symptom group. Symptoms are ranked by percentage of possible resources that exhibit the symptom.
Double-click the symptom. A pop-up window opens that lists the individual anomalies.
To copy the list to the clipboard, click the Copy to Clipboard icon.
n
This feature is useful if you are opening a problem ticket or sending an email message about the alert.
To graph an anomaly, select an anomaly in the pop-up window and
n
click the Dynamic Dashboard icon.
Examine the first tier and identify its owner. Select all metrics that have 50 percent or greater probability.
Select all metrics that have 50 percent or greater probability.
Provide the highest-probability abnormal metrics from the Root Cause pane and the URL of the Alert Summary page. For predictive alerts, also provide due time and probability information from the Reason pane.
8 Notify the owner that you identified in Step 2 and any other interested parties.
Provide SLA and KPI breach information, the URL of the Alert Summary page, and the name of the person responsible for resolving the alert.
Resolving Administrative System Alerts
An administrative system alert indicates a problem with one of the vCenter Operations Manager components. When you resolve an administrative system alert, follow certain recommended procedures.
For information about resolving administrative system alerts, see the VMware vCenter Operations Manager Administration Guide (Custom User Interface) or the online help.

Troubleshoot an Alert

The person responsible for fixing the condition that caused an alert typically uses the Alert Detail page to troubleshoot the problem.
The Alert Detail page contains information that can help you diagnose the cause of the behavior that resulted in the alert and determine how to prevent the alert from happening again.
Procedure
1 Select Alerts > Alerts Overview and double-click the row for the alert in the alert list.
2 On the Alert Summary page, click the Troubleshoot button.
The Alert Detail page appears for the alert.
28 VMware, Inc.
Chapter 2 Introducing Common Tasks
Alert Detail Information
The Alert Detail page contains information that can help you diagnose the cause of the behavior that resulted in the alert and determine how to prevent the alert from happening again.
For detailed information about using each pane and tab on the Alert Detail page, see the online help.
Reason Pane
This pane contains specific information about the alert, including the type of trigger that caused it, the resource or metric for the trigger, and details about the trigger.
Impact Pane
This pane shows health information for the last six hours for the resource for which the alert was generated, including any subcontainers in the resource. For a tier, the pane shows a health graph for the application that contains the tier and the current health of all of the tiers in the application. The pane also shows a graph for up to five metrics. Breaching KPIs appear first, followed by breaching super metrics, non-breaching KPIs, and non-breaching super metrics.
Mashup Tab
This tab contains mashup charts, which show different aspects of the behavior of a resource. Mashup charts include a health chart, an anomaly count graph, and metric graphs.
The health chart includes each alert for the specified time period.
n
The anomaly count graph shows the number of anomalies for the resource and its children at a specific
n
time. For an application, the anomaly graph shows the anomaly count for tiers that contain root cause metrics for the resource. A red line marks the noise threshold for the resource. An anomaly count higher than the red line indicates a 90 percent probability of a problem and triggers an early warning alert.
Metric graphs appear for all of the KPIs for any resource that is listed as a root-cause resource. For an
n
application, the root-cause resource is the application and tiers that contain root causes.
Timeline Tab
This tab shows all of the anomalies that contributed to an alert. The anomalies appear in a dual-drag timeline that starts with the beginning of the first anomaly and ends with the current time or, if the alert was canceled, the cancel time.
Relationships Tab
This tab shows a resource relationship chart for the resource for which an alert was generated. Resource relationship charts show the structure of the topography around a specific resource, including parent and child resources.
Metric Charts Tab
This tab shows metric graphs for the resource for which an alert was generated. Metric graphs show the recent performance and predicted future performance of metrics.
Correlations Tab
You can show the behavior or anomaly correlations to the KPI metrics that contributed to an alert on this tab. Each metric has a percentage correlation. The larger the number, the more closely the metrics are related.
Notes Tab
You can add and view notes for an alert on this tab.
VMware, Inc. 29
VMware vCenter Operations Manager Getting Started Guide

Managing Alerts

You can suspend, suppress, and take or release ownership of multiple alerts on the Alerts Overview page. When you perform an action on an alert, the alert's status changes. You can also manage a single alert on the Alert Detail page.
Take Ownership of an Alert
When you take ownership of an alert, you acknowledge that the alert is yours. Taking ownership of an alert is important if multiple operators manage alerts in your environment. You can take ownership of multiple alerts on the Alerts Overview page.
You can also take ownership of alerts in the Alerts widget.
Procedure
1 Select Alerts > Alerts Overview and select the alert or alerts in the alert list.
You can press Ctrl+click to select multiple alerts or Shift+click to select a range of alerts.
2 Click the Take Ownership icon.
3 Click Yes in the confirmation dialog box.
When you own the alert, the Assigned Alerts icon appears in the Control State column and your user name appears in the User Name column in the alert list.
Suspend an Alert
When you suspend an alert, you can specify a number of minutes. If the problem condition exists after the time elapses, vCenter Operations Manager reactivates the alert. You can suspend multiple alerts on the Alerts Overview page.
When you suspend an alert, you take ownership of it. You cannot suspend an alert that another user owns.
You can also suspend alerts in the Alerts widget.
NOTE Suspending an alert does not cancel its cancel cycle. The alert is still canceled according to the cancel cycle value set when the resource or application was configured.
Procedure
1 Select Alerts > Alerts Overview and select the alert or alerts in the alert list.
You can press Ctrl+click to select multiple alerts or Shift+click to select a range of alerts.
2 Click the Suspend icon.
3 Type the number of minutes to suspend the alert and click OK.
When the alert is suspended, the Suspended Alerts icon appears in the Control State column in the alert list.
Suppress an Alert
When you suppress an alert, you can specify a specific number of days. If the problem condition exists after the time has elapsed, vCenter Operations Manager reactivates the alert. You can suppress multiple alerts on the Alerts Overview page.
When you suppress an alert, you take ownership of it. You cannot suppress an alert that another user owns.
30 VMware, Inc.
Loading...
+ 82 hidden pages