3401 Hillview Ave.
Palo Alto, CA 94304
www.vmware.com
2 VMware, Inc.
Contents
About This User Guide 5
1 Monitoring Objects in Your Managed Environment by Using vRealize Operations Manager 7
What to Do When... 7
User Scenario: A User Calls With a Problem 8
User Scenario: An Alert Arrives in Your Inbox 12
User Scenario: You See Problems as You Monitor the State of Your Objects 19
Monitoring and Responding to Alerts 31
Monitoring Alerts in vRealize Operations Manager 31
Monitoring and Responding to Problems 35
Evaluating Object Information Using Badge Alerts and the Summary Tab 35
Investigating Object Alerts 38
Evaluating Metric Information 40
Analyzing the Resources in Your Environment 46
Using Troubleshooting Tools to Resolve Problems 46
Creating and Using Object Details 47
Examining Relationships in Your Environment 52
User Scenario: Investigate the Root Cause of a Problem by Using the Troubleshooting Tab Options 53
Running Actions from vRealize Operations Manager 56
Run Actions From Toolbars in vRealize Operations Manager 57
Troubleshoot Actions in vRealize Operations Manager 57
Monitor Recent Task Status 59
Troubleshoot Failed Tasks 60
Viewing Your Inventory 66
2 Planning the Capacity for Your Managed Environment Using vRealize Operations Manager 67
Right-Sizing Capacity for Stress-Free Demand and Value 70
User Scenario: Planning Capacity for an Increase in Workload 74
Create a Sample Project to Increase Workload Capacity 74
Create a Sample Project to Add a Host and Virtual Machines 75
View the Result of Your Capacity Projects 76
Planning Hardware Projects in vRealize Operations Manager 77
Create a Project to Plan for Hardware Changes 77
Planning Virtual Machine Projects and Scenarios 78
Create a Virtual Machine Project Using Populated Metrics 79
Create a Sample Project for a New Virtual Machine 80
Create a Sample Project to Simulate Removing a Virtual Machine 80
Custom Profiles in VMware vRealize Operations Manager 81
Custom Datacenters in VMware vRealize Operations Manager 81
vRealize Operations Manager User Guide
Index 83
About This User Guide
The VMware® vRealize Operations Manager User Guide describes what to do when users experience
performance problems in your managed environment.
As a system administrator, you might become aware of a problem with an object in your environment when
vRealize Operations Manager generates an alert, or when a user contacts you. To help ensure optimal
performance, this information describes how you use vRealize Operations Manager to monitor,
troubleshoot, and take action to address problems. It also provides information on how to assess whether
problems due to over demand or lack of capacity require a system change or upgrade.
Intended Audience
This information is intended for vRealize Operations Manager administrators, virtual infrastructure
administrators, and operations engineers who track and maintain object performance in your managed
environment.
VMware Technical Publications Glossary
VMware Technical Publications provides a glossary of terms that might be unfamiliar to you. For definitions
of terms as they are used in VMware technical documentation, go to
http://www.vmware.com/support/pubs.
Chapter 1 Monitoring Objects in Your Managed Environment by Using vRealize Operations Manager
You can use vRealize Operations Manager to resolve problems that your customers raise, respond to alerts
that identify problems before your customers report problems, and generally monitor your environment for
problems.
When your customers experience performance problems and call you to resolve the problem, the data that
vRealize Operations Manager collects and analyzes is presented to you in graphical forms so that you can
compare and contrast objects, understand the relationship between objects, and determine the root cause of
problems.
To manage your environment as a proactive rather than reactive administrator, you monitor and respond to
alerts. A generated alert notifies you when objects in your environment are experiencing problems. If you
resolve the problem based on the alert before your customers notice, then you avoid service interruptions.
You can investigate the problems that generate alerts or that result in calls by using the Alerts, Events,
Details, and Environment tabs.
If you find the root cause of the problem, you might be able to resolve the problem by running an action.
The actions make changes to objects in the target system, for example, the VMware vCenter Server® system,
from vRealize Operations Manager.
This chapter includes the following topics:
- “What to Do When...,” on page 7
- “Monitoring and Responding to Alerts,” on page 31
- “Monitoring and Responding to Problems,” on page 35
- “Running Actions from vRealize Operations Manager,” on page 56
- “Viewing Your Inventory,” on page 66
What to Do When...
As a virtual infrastructure administrator, network operations center engineer, or other IT professional, you
use vRealize Operations Manager to monitor objects in your environment so that you can ensure service to
your customers and resolve any problems that occur.
Your vRealize Operations Manager administrator has configured vRealize Operations Manager to manage
two vCenter Server instances that manage multiple hosts and virtual machines. It is your first day using
vRealize Operations Manager to manage your environment.
- User Scenario: A User Calls With a Problem on page 8
  The vice president of sales telephones the help desk reporting that her virtual machine, VPSALES4632,
  is running slow. She is working on sales reports for an upcoming meeting and is running behind
  schedule because of the slow performance of her virtual machine.
- User Scenario: An Alert Arrives in Your Inbox on page 12
  You return from lunch to find an alert notification in your inbox. You can use
  vRealize Operations Manager to investigate and resolve the alert.
- User Scenario: You See Problems as You Monitor the State of Your Objects on page 19
  As you investigate your objects in the context of this scenario, vRealize Operations Manager provides
  details to help you resolve the problems. You analyze the state of your environment, examine current
  problems, investigate solutions, and take action to resolve the problems.
User Scenario: A User Calls With a Problem
The vice president of sales telephones the help desk reporting that her virtual machine, VPSALES4632, is
running slow. She is working on sales reports for an upcoming meeting and is running behind schedule
because of the slow performance of her virtual machine.
As a network operations engineer, you were just reviewing the morning alerts and did not see any problems
with her virtual machine, so you begin troubleshooting the problem.
Procedure
1. Search for a Specific Object on page 8
   As a network operations engineer, you must locate the customer's virtual machine in
   vRealize Operations Manager so that you can begin troubleshooting the reported problem.
2. Review Alerts Related to Reported Problems on page 9
   The sales vice president reports degraded performance in a virtual machine. To determine if the
   virtual machine has any alerts indicating the cause, review alerts for the virtual machine.
3. Use the Troubleshooting Tabs to Investigate a Reported Problem on page 10
   To troubleshoot problems with the VPSALES4632 virtual machine, as an example, you evaluate the
   symptoms, examine time line information, consider events, and create metric charts to find the root
   cause of the problem.
Search for a Specific Object
As a network operations engineer, you must locate the customer's virtual machine in
vRealize Operations Manager so that you can begin troubleshooting the reported problem.
You use vRealize Operations Manager to monitor three vCenter Server instances with a total of 360 hosts
and 18,000 virtual machines. The easiest way to locate a particular virtual machine is to search for it.
Procedure
1. In the Search text box, located on the vRealize Operations Manager title bar, type the name of the
   virtual machine.
   The Search text box displays all the objects that contain the string you type in the text box. If your
   customer knows that her virtual machine name contains SALES, you can type the string and the virtual
   machine is included in the list.
2. Select the object in the list.
   The main pane displays the object name and the Summary tab. The left pane displays the related
   objects, including the host system and vCenter Server instance.
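The search behavior described in this procedure can be pictured as a substring match over object names. The following is a minimal sketch, not the product's implementation: the inventory list and the search_objects helper are hypothetical, and case-insensitive matching is an assumption made for illustration.

```python
def search_objects(names, query):
    """Return every name that contains the query string, ignoring case (assumed behavior)."""
    needle = query.casefold()
    return [name for name in names if needle in name.casefold()]

# Hypothetical inventory; only VPSALES4632 comes from the scenario.
inventory = ["VPSALES4632", "HR-APP-01", "sales-db-02", "host-17"]

matches = search_objects(inventory, "SALES")
print(matches)
```

A partial string such as SALES can return more than one object, so you still select the intended virtual machine from the list.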
What to do next
Look for alerts related to the reported problem for the object. See “Review Alerts Related to Reported
Problems,” on page 9.
Review Alerts Related to Reported Problems
The sales vice president reports degraded performance in a virtual machine. To determine if the virtual
machine has any alerts indicating the cause, review alerts for the virtual machine.
Alerts on an object can give you an insight into problems beyond the specific problem reported by the user.
Prerequisites
Locate the customer's virtual machine so that you can review related alerts. See “Search for a Specific
Object,” on page 8.
Procedure
1. Click the Summary tab for the object generating alerts.
   The Summary tab displays active alerts for the object.
2. Review the top alerts for Health, Risk, and Efficiency.
   Top alerts identify the primary contributors to the current state of the object. Do any of them appear to
   contribute to the slow response time? For example, any ballooning or swapping alerts indicate that you
   must add memory to the virtual machine. Are any alerts related to memory contention? Contention can
   be an indicator that you must add memory to the host.
3. If the Summary tab does not include top problems that appear to explain the reported problem, click
   the Alerts tab.
   The Alerts tab displays all active alerts for the current object.
4. Review the alerts for problems that are similar to or contribute to the reported problem.
   a. To view the active and canceled alerts, click Status: Active to clear the filter and display active and
      inactive alerts.
      The canceled alerts might provide information about the problem.
   b. So that you can locate alerts generated on or before the time when your customer reported the
      problem, click the Created On column to sort the alerts.
   c. To view alerts for the parent objects in the same list with the alert for the virtual machine, click
      View From, then select, for example, Host System under Parents.
      The system adds these object types to the list so that you can determine if alerts among the parent
      objects are contributing to the reported problem.
5. If you locate an alert that appears to explain the reported problem, click the alert name in the alerts list.
6. On the Alert > Symptoms tabs, review the triggered symptoms and recommendations to determine if
   the alert indicates the root cause of the reported problem.
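The sorting and filtering in step 4 can be sketched outside the user interface. The following is a minimal illustration that assumes alerts are available as simple records. The alert names, object names, and timestamps are hypothetical and are not taken from vRealize Operations Manager.

```python
from datetime import datetime

def alerts_on_or_before(alerts, reported_at):
    """Keep alerts created on or before reported_at, newest first."""
    earlier = [a for a in alerts if a["created_on"] <= reported_at]
    return sorted(earlier, key=lambda a: a["created_on"], reverse=True)

# Hypothetical alert records for the virtual machine and its parent host.
alerts = [
    {"name": "Host memory contention", "object": "host-17",
     "created_on": datetime(2014, 6, 28, 9, 15)},
    {"name": "VM CPU demand high", "object": "VPSALES4632",
     "created_on": datetime(2014, 7, 1, 14, 0)},
    {"name": "VM memory ballooning", "object": "VPSALES4632",
     "created_on": datetime(2014, 6, 30, 16, 40)},
]

reported_at = datetime(2014, 7, 1, 10, 0)  # assumed time of the customer's call
candidates = alerts_on_or_before(alerts, reported_at)
for alert in candidates:
    print(alert["created_on"], alert["object"], alert["name"])
```

Alerts created after the reported time are excluded because they cannot explain a problem that the customer had already noticed.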
What to do next
- If the alert appears to indicate the source of the problem, follow the recommendations and verify the
  resolution with your customer. For an example, see “Run a Recommendation on a Datastore to Resolve
  an Alert,” on page 18.
- If you cannot locate the cause of the reported problem among the alerts, begin more in-depth
  troubleshooting. See “Use the Troubleshooting Tabs to Investigate a Reported Problem,” on page 10.
Use the Troubleshooting Tabs to Investigate a Reported Problem
To troubleshoot problems with the VPSALES4632 virtual machine, as an example, you evaluate the
symptoms, examine time line information, consider events, and create metric charts to find the root cause of
the problem.
If a review of the alerts did not help you identify the cause of the problem reported for the virtual machine,
use the Troubleshooting tabs: Alert > Symptoms, Event > Timeline, and All Metrics to troubleshoot the
history and current state of the virtual machine.
Prerequisites
- Locate the object for which the problem was reported. See “Search for a Specific Object,” on page 8.
- Review the alerts for the virtual machine to determine if the problem is already identified and
  recommendations made. See “Review Alerts Related to Reported Problems,” on page 9.
Procedure
1. In the menu, click Environment, then click Inventory and select VPSALES4632 from the tree.
   The main pane updates to display the object Summary tab.
2. Click the Alerts tab, click the Symptoms tab, and review the symptoms to determine if one of the
   symptoms is related to the reported problem.
   Depending on how your alerts are configured, some symptoms might be triggered but not sufficient to
   generate an alert.
   a. Review symptom names to determine if one or more symptoms are related to the reported
      problem.
      The Information column provides the triggering condition, trend, and current value. What are the
      most common symptoms that affect response time? Do you see any symptoms related to CPU or
      memory usage?
   b. Sort by the Created On date so that you can focus on the time frame in which your customer
      reported the problem.
   c. Click the Status: Active filter button to disable the filter so that you can review active and inactive
      symptoms.
   Based on symptoms, you think the problem is related to CPU or memory use. But you do not know if
   the problem is with the virtual machine or with the host.
3. Click the Events > Timeline tabs and review the alerts, symptoms, and change events over time that
   might help you identify common trends that are contributing to the reported problem.
   a. To determine if other virtual machines had symptoms triggered and alerts generated at the same
      time as your reported problem, click View From > Peer.
      Other virtual machine alerts are added to the time line. If you see that multiple virtual machines
      triggered symptoms in the same time frame, then you can investigate parent objects.
   b. Click View From and select Host System from the Parent list.
      The alerts and symptoms that are associated with the host on which the virtual machine is
      deployed are added to the time line. Use the information to determine if a correlation exists
      between the reported problem and the alerts on the host.
4. Click the Events > Events tab to view changes in the collected metrics for the problematic virtual
   machine that could direct you toward the cause of the reported problem.
   a. Use the Date Controls option to view events for the approximate time when your customer
      reported the problem.
   b. Use the Filters to filter on event criticality and status. Select the Symptoms options if you want to
      include these in your analysis.
   c. Click an Event to view the details about the event.
   d. Click View From, select Host System under Parents, and repeat the analysis.
   Comparing events on the virtual machine and the host, and evaluating those results, indicates that CPU
   or memory issues are the likely cause of the problem.
5. If you can identify that the problem is related to, for example, CPU or memory use, click the All Metrics
   tab to create your own metric charts so that you can determine whether it is one or the other, or a
   combination.
   a. If the host is still the focus, then start by working with host metrics.
   b. In the metric list, double-click the CPU Usage (%) and the Memory Usage (%) metrics to add them
      to the workspace on the right.
   c. In the map, click the VPSALES4632 object.
      The metric list now displays the virtual machine metrics.
   d. In the metric list, double-click the CPU Usage (%) and the Memory Usage (%) metrics to add them
      to the workspace on the right.
   e. Review the host and virtual machine charts to see if you can identify a pattern that indicates the
      cause of the reported problem.
   In this scenario, comparing the four charts reveals that CPU use is normal on both the host and the
   virtual machine, and the memory use is normal on the virtual machine. However, the memory use on
   the host began going consistently high three days before the reported problem on the VPSALES4632
   virtual machine.
The host memory is running consistently high, affecting the response time for the virtual machines. The
number of virtual machines it is running is well within the supported amounts. The possible cause might be
too many high process applications on the virtual machines. You can move some of the virtual machines to
other hosts, distribute the workload, or power off idle virtual machines.
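The chart comparison in this scenario amounts to finding when a metric began to stay above its normal range. The following is a minimal sketch of that reasoning, assuming one sample per day. The sample values, the 90 percent threshold, and the sustained_high_start helper are hypothetical and only illustrate the idea.

```python
def sustained_high_start(samples, threshold, min_run):
    """Return the index where the series rose above threshold and stayed there
    through the last sample, or None if that final run is shorter than min_run."""
    run_start = None
    for index, value in enumerate(samples):
        if value > threshold:
            if run_start is None:
                run_start = index
        else:
            run_start = None  # the run was broken, so start over
    if run_start is not None and len(samples) - run_start >= min_run:
        return run_start
    return None

# Hypothetical daily Memory Usage (%) samples; the last value is the day of the report.
host_memory = [62, 65, 60, 64, 91, 93, 95]
vm_memory = [55, 58, 54, 57, 56, 59, 55]

host_onset = sustained_high_start(host_memory, threshold=90, min_run=3)
vm_onset = sustained_high_start(vm_memory, threshold=90, min_run=3)
print("host onset index:", host_onset)
print("virtual machine onset index:", vm_onset)
```

In this sample data only the host series has a sustained run, which mirrors the conclusion that the host, not the virtual machine, is the place to look.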
What to do next
- In this example, you can use vRealize Operations Manager to power off virtual machines on the host so
  that you can improve the performance of the virtual machines that are in use. See “Run Actions From
  Toolbars in vRealize Operations Manager,” on page 57.
- If the combination of charts that you created on the All Metrics tab is something that you might want
  to use again, click Generate Dashboard.
User Scenario: An Alert Arrives in Your Inbox
You return from lunch to find an alert notification in your inbox. You can use vRealize Operations Manager
to investigate and resolve the alert.
As a network operations engineer, you are responsible for several hosts and their datastores and virtual
machines, and you receive emails when an alert is generated for your monitored objects. In addition to
alerting you to problems in your environment, alerts should provide viable recommendations to resolve
those problems. As you investigate this alert, you are evaluating the data to determine if one or more of the
recommendations can resolve the problem.
This scenario assumes that you configured the outbound alerts to send standard email using SMTP and that
you configured notifications to send you alert notifications using the standard email plug-in. When
outbound alerts and notifications are configured, vRealize Operations Manager sends you messages when
an alert is generated so that you can begin responding to problems as quickly as possible.
Prerequisites
- Verify that outbound alerts are configured for standard email alerts. See vRealize Operations Manager
  Customization and Administration Guide.
Procedure
1. Respond to an Alert in Your Email on page 13
   As a network operations engineer, you receive an email message from vRealize Operations Manager
   with information about one of the data stores for which you are responsible. The email notification
   informs you about the problem even when you are not presently working in
   vRealize Operations Manager.
2. Evaluate Other Triggered Symptoms for the Affected Data Store on page 14
   Because you need more information about the data store before you decide on the best response, you
   examine the Symptoms tab to see other triggered symptoms for the data store.
3. Compare Alerts and Events Over Time in Response to a Datastore Alert on page 15
   To evaluate an alert over time, compare the current alert and symptoms to other alerts and symptoms,
   other events, other objects, and over time.
4. View the Affected Datastore in Relation to Other Objects on page 16
   To view the object for which the alert was generated as it relates to other objects, use the topological
   map on the Relationships tab.
5. Construct Metric Charts to Investigate the Cause of the Data Store Alert on page 17
   To analyze the capacity metrics related to the generated alert, you create charts that compare different
   metrics. These comparisons help identify when something changed in your environment and what
   effect it had on the datastore.
6. Run a Recommendation on a Datastore to Resolve an Alert on page 18
   As a network operations engineer, you investigated the alert regarding datastore disk space and
   determined that the provided recommendations can resolve the problem. The recommendation to delete
   unused snapshots is especially useful. Use vRealize Operations Manager to delete the snapshots.
Respond to an Alert in Your Email
As a network operations engineer, you receive an email message from vRealize Operations Manager with
information about one of the data stores for which you are responsible. The email notification informs you
about the problem even when you are not presently working in vRealize Operations Manager.
In your email client, you receive an alert similar to the following message.
Alert was updated at Tue Jul 01 16:34:04 MDT :
Info:datastore1 Datastore is acting abnormally since Mon Jun 30 10:21:07 MDT and was last
updated at Tue Jul 01 16:34:04 MDT
Alert Definition Name: Datastore is running out of disk space
Alert Definition Description: Datastore is running out of disk space
Object Name : datastore1
Object Type : Datastore
Alert Impact: risk
Alert State : critical
Alert Type : Storage
Alert Sub-Type : Capacity
Object Health State: info
Object Risk State: critical
Object Efficiency State: info
Symptoms:
SYMPTOM SET - self
Symptom Name | Object Name | Object ID | Metric | Message Info
Datastore space usage reaching critical limit | datastore1 | b0885859-e0c5-4126-8eba-6a21c895fe1b | Capacity|Used Space | HT above 99.20800922575977 > 95
Recommendations:
- Storage VMotion some Virtual Machines to a different Datastore
- Delete unused snapshots of Virtual Machines
- Add more capacity to the Datastore
Notification Rule Name: All alerts -- datastores
Notification Rule Description:
Alert ID : a9d6cf35-a332-4028-90f0-d1876459032b
Operations Manager Server - 192.0.2.0
Alert details
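If you want to triage such messages with a script, the labeled lines and the recommendation list in the sample message are regular enough to extract. The following is a minimal sketch that assumes the layout shown above. The parse_alert_email helper and the FIELDS set are hypothetical, and the message format in your environment might differ.

```python
# Field labels copied from the sample message; extend the set as needed.
FIELDS = {"Alert Definition Name", "Object Name", "Object Type",
          "Alert Impact", "Alert State", "Alert ID"}

def parse_alert_email(text):
    """Extract known 'Label : value' lines and the recommendation list."""
    fields = {}
    recommendations = []
    in_recommendations = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Recommendations:":
            in_recommendations = True
            continue
        if in_recommendations and line.startswith("- "):
            recommendations.append(line[2:])
            continue
        in_recommendations = False
        key, sep, value = line.partition(":")
        if sep and key.strip() in FIELDS:
            fields[key.strip()] = value.strip()
    return fields, recommendations

# Excerpt of the sample message shown above.
message = """Alert Definition Name: Datastore is running out of disk space
Object Name : datastore1
Object Type : Datastore
Alert Impact: risk
Alert State : critical
Recommendations:
- Storage VMotion some Virtual Machines to a different Datastore
- Delete unused snapshots of Virtual Machines
- Add more capacity to the Datastore
Notification Rule Name: All alerts -- datastores
Alert ID : a9d6cf35-a332-4028-90f0-d1876459032b
"""

fields, recommendations = parse_alert_email(message)
print(fields["Object Name"], fields["Alert State"])
print(recommendations)
```

Only labels listed in FIELDS are kept, so free-text lines that happen to contain a colon, such as the timestamp lines, are ignored.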
Prerequisites
- Verify that outbound alerts are configured for standard email alerts. See vRealize Operations Manager
  Customization and Administration Guide.
- Verify that the notifications are configured to send messages to your users for the alert definition. For an
  example of how to create an alert notification, see vRealize Operations Manager Customization and
  Administration Guide.
Procedure
1. In your email client, review the message so that you understand the state of the affected objects and
   determine if you must begin investigating immediately.
   Look for the alert name, the alert state to determine the current level of criticality, and the affected
   objects.
2. In the email message, click Alert Details.
   vRealize Operations Manager opens on the Summary tab in the alert details for the generated alert and
   affected object.
3. Review the Summary tab information.
   Option | Evaluation Process
   Alert name and description | Review the name and description and verify that you are evaluating the alert for which you received an email message.
   Recommendations | Review the top recommendation, and if available, other recommendations, to understand the steps that you must take to resolve the issue. If implemented, will the prioritized recommendations resolve the problem?
   What is Causing the Issue? | Which symptoms were triggered? Which were not triggered? What effect does this evaluation have on your investigation? In this example, the alert that the datastore is running out of space is configured so that the criticality is symptom based. If you received a critical alert, then it is likely that the symptoms are already at a critical level, having moved up from Warning and Immediate. Look at the sparkline or metric graph chart for each symptom to determine when the problem escalated on the datastore object.
What to do next
- If you determine that the recommendations will resolve the problem, implement them. See “Run a
  Recommendation on a Datastore to Resolve an Alert,” on page 18.
- If you need more information about the affected objects, continue your investigation. Begin by looking
  at other triggered symptoms for the data store. See “Evaluate Other Triggered Symptoms for the
  Affected Data Store,” on page 14.
Evaluate Other Triggered Symptoms for the Affected Data Store
Because you need more information about the data store before you decide on the best response, you
examine the Symptoms tab to see other triggered symptoms for the data store.
If other symptoms are triggered for the object besides the symptom included in the alert, evaluate them to
determine what the symptoms reflect about the state of the object, and to decide whether the related
recommendations might resolve the problem.
Prerequisites
Verify that you are addressing the alert for which you received an alert message in your email. See
“Respond to an Alert in Your Email,” on page 13.
Procedure
1. In the menu, click Alerts and select the alert name in the data grid.
   The center pane view changes to display the alert detail tabs.
2. Click View additional metrics > Alerts > Symptoms and review the active symptoms.
   Option | Evaluation Process
   Criticality | Are other symptoms of similar criticality present that are affecting the object?
   Symptom | Are any of the triggered symptoms related to the symptoms that triggered the current alert? Are any symptoms related to time remaining, capacity, or stress that could indicate storage problems?
   Created On | Do the date and time stamps for the symptoms indicate that they were triggered before the alert you are investigating, indicating that it might be a related symptom? Were the symptoms triggered after the alert was generated, indicating that the alert symptoms contributed to these other symptoms?
   Information | Can you identify a correlation between the alert symptoms and the other symptoms based on the triggering metric values?
What to do next
- If your review of the symptoms and the provided information clearly indicates that the
  recommendations will solve the problem, implement one or more of the recommendations. For an
  example of implementing one of the recommendations, see “Run a Recommendation on a Datastore to
  Resolve an Alert,” on page 18.
- If your review of the symptoms did not convince you that the recommendations will resolve the
  problem or provide you with enough information to identify the root cause, continue your investigation
  using the Events > Timeline tab. See “Compare Alerts and Events Over Time in Response to a
  Datastore Alert,” on page 15.
Compare Alerts and Events Over Time in Response to a Datastore Alert
To evaluate an alert over time, compare the current alert and symptoms to other alerts and symptoms, other
events, other objects, and over time.
As a network operations engineer, you use the Events > Timeline tab to compare this alert to other alerts
and events in your environment. This way, you can determine if you can resolve the problem of the
datastore running out of disk space by applying one or more alert recommendations.
Prerequisites
Verify that you are addressing the alert for which you received an alert message in your email. See
“Respond to an Alert in Your Email,” on page 13.
Procedure
1. In the menu, click Alerts and select the alert name in the data grid.
   The alert details appear to the right.
2. Click View Events > Timeline.
   The Timeline tab displays the generated alert and the triggered symptoms for the affected object in a
   scrollable timeline format, starting when the alert was generated.
3. Scroll through the timeline using the week timeline at the bottom.
4. To view events that might contribute to the alert, click Event Filters and click the check box for each
   event type.
   Events related to the object are added to the timeline. You add the events to your evaluation of the
   current state of the object and determine whether the recommendations can resolve the problem.
5. Click View From and select Host under Parents.
   Because the alert is related to disk space, adding the host to the timeline enables you to see what alerts
   and symptoms are generated for the host. As you scroll through the timeline, ask: when did some of the
   related alerts begin? When are they no longer on the timeline? What was the effect on the state of the
   datastore object?
6. Click View From and select Peer under Parents.
   If other datastores have alerts related to the alert you are currently investigating, seeing when the alerts
   for the other datastores were generated can help you determine what resource problems you are
   experiencing.
7. To remove canceled alerts from your timeline, click Filters and deselect the Canceled check box.
   Removing the canceled alerts and symptoms from the timeline clears the view and enables you to focus
   on current alerts.
What to do next
- If your evaluation of alerts in the timeline indicated that one or more of the recommendations to resolve
  the alert are valid, implement the recommendations. See “Run a Recommendation on a Datastore to
  Resolve an Alert,” on page 18.
- If you need more information about the affected object, continue your investigation. See “View the
  Affected Datastore in Relation to Other Objects,” on page 16.
View the Affected Datastore in Relation to Other Objects
To view the object for which the alert was generated as it relates to other objects, use the topological map on
the Relationships tab.
As a network operations engineer, you view a datastore and the related objects in a map to further your
understanding of the problem. The map view helps determine if implementing the alert recommendations
can resolve the problem.
Prerequisites
Evaluate the alert over time and in comparison to related objects. See “Compare Alerts and Events Over
Time in Response to a Datastore Alert,” on page 15.
Procedure
1. In the menu, click Alerts, select the alert name in the data grid, and click View additional metrics > All
   Metrics.
2. Click Show Object Relationships.
   The Relationships tab displays the datastore in a map with the related objects. By default, the badge
   that this alert affects is selected only on the toolbar. Objects in the tree show a colored square to indicate
   the current state of the badge.
3. To view the alert status of the objects for the other badges, click the Health button and then the
   Efficiency button.
   As you click each badge button, the squares on each object indicate whether an alert is generated and
   the criticality of the alert.
4. To view alerts for an object, select the object and click Alerts.
   The alert list dialog box appears, enabling you to search and sort for alerts for the object.
5. To view a list of the child objects for an object in the map, click the object.
   A list of the number of children by object type appears at the bottom of the center pane.
6. Use the options to evaluate the datastore.
   For example, what does the map tell you about the number of virtual machines that are associated with
   the datastore? If many virtual machines are associated with a datastore, moving them might free
   datastore disk space.
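The child count in step 5 is a grouping of related objects by type. The following is a minimal sketch of that grouping, assuming that virtual machines appear as children of the datastore in the map. The relationship tuples and the children_by_type helper are hypothetical, and only datastore1 comes from the scenario.

```python
from collections import Counter

def children_by_type(edges, parent):
    """Count the direct children of parent, grouped by object type."""
    return Counter(child_type for p, _name, child_type in edges if p == parent)

# Hypothetical (parent, child, child type) relationships.
edges = [
    ("datastore1", "vm-a", "Virtual Machine"),
    ("datastore1", "vm-b", "Virtual Machine"),
    ("datastore1", "vm-c", "Virtual Machine"),
    ("datastore2", "vm-d", "Virtual Machine"),
]

counts = children_by_type(edges, "datastore1")
print(dict(counts))
```

A high virtual machine count relative to peer datastores supports the recommendation to move some virtual machines to a different datastore.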
What to do next
- If your review of the map provided enough information to indicate that one or more of the
  recommendations to resolve the alert are valid, implement the recommendations. See “Run a
  Recommendation on a Datastore to Resolve an Alert,” on page 18.
- If you need more information about the affected object, continue your investigation. See “Construct
  Metric Charts to Investigate the Cause of the Data Store Alert,” on page 17.
Construct Metric Charts to Investigate the Cause of the Data Store Alert
To analyze the capacity metrics related to the generated alert, you create charts that compare different
metrics. These comparisons help identify when something changed in your environment and what effect it
had on the datastore.
As a network operations engineer, you create custom charts so that you can further investigate the problem,
and to determine if implementing the alert recommendations will resolve the problem that the alert
identifies.
Prerequisites
View the topological map for the data store to determine if related objects are contributing to the alert or if
triggering symptoms indicate that the data store is contributing to other problems in your environment. See
“View the Affected Datastore in Relation to Other Objects,” on page 16.
Procedure
1. In the menu, click Alerts, select the alert name in the data grid, and click View additional metrics > All
Metrics.
The Metric Charts tab does not include charts. You must add the charts to compare.
2. To analyze the first recommendation, Add more capacity to the Datastore Storage, add related charts to
the workspace.
a. Enter capacity in the metric list search text box.
The list displays metrics that contain the search term.
b. Double-click the following metrics to add the following charts to the workspace:
- Capacity | Used Space (GB)
- Disk Space | Capacity (GB)
- Summary | Number of Capacity Consumers
c. Compare the charts.
For example, if the Capacity | Used Space (%) chart shows an increase in used space, but the Disk
Space | Capacity (GB) did not increase and the Summary | Number of Capacity Consumers did
not decrease, then adding capacity is a solution, but it does not address the root cause.
3. To analyze the second recommendation, vMotion some Virtual Machines to a different Datastore,
add related charts to the workspace.
a. Enter vm in the metric list search text box.
b. Double-click the Summary | Total Number of VMs metric to add it to the workspace.
c. Compare the 4 charts.
For example, if the Summary | Total Number of VMs chart shows that the number of virtual
machines did not increase enough to negatively affect the data store, then moving some of the
virtual machines is a solution, but it does not address the root cause.
4. To analyze the third recommendation, Delete unused snapshots of virtual machines, add related charts
to the workspace.
a. Enter snapshot in the metric list search text box.
b. Double-click the following metrics to add the charts to the workspace:
- Disk Space | Snapshot Space (GB)
- Disk Space Reclaimable | Snapshot Space | Waste Value (GB)
c. Compare the charts.
For example, if the amount of Disk Space | Snapshot Space (GB) increased and the Disk Space
Reclaimable | Snapshot Space | Waste Value (GB) indicates an area where space can be reclaimed,
then deleting unused snapshots will positively affect the data store disk space problem and resolve
the alert.
5. If this is a problematic data store that you must continue to monitor, you can create a dashboard.
a. Click the Generate Dashboard button on the workspace toolbar.
b. Enter a name for the dashboard and click OK.
In this example, use a name like Datastore disk space.
The dashboard is added to your available dashboards.
You compared metric charts to determine if the recommendations are valid and which recommendation to
implement first. In this example, the Delete unused snapshots of Virtual Machines recommendation appears
to be the most likely way to resolve the alert.
What to do next
Implement the alert recommendations. See “Run a Recommendation on a Datastore to Resolve an Alert,” on
page 18.
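The comparison in step 2 can be read as a simple decision rule. The following Python sketch is illustrative only and is not part of vRealize Operations Manager. The MetricTrend class and the function name are hypothetical, and the start and end values stand in for what you read from the charts over the period you are comparing.

```python
from dataclasses import dataclass


@dataclass
class MetricTrend:
    """Hypothetical container: a metric value at the start and end of the charted period."""
    start: float
    end: float

    @property
    def delta(self) -> float:
        # Positive when the metric increased over the period.
        return self.end - self.start


def adding_capacity_misses_root_cause(
    used_space: MetricTrend,
    capacity: MetricTrend,
    consumers: MetricTrend,
) -> bool:
    """Return True when used space grew while capacity did not increase
    and the number of capacity consumers did not decrease.

    In that case, adding capacity relieves the alert but leaves the
    underlying cause of the growth unexplained.
    """
    return used_space.delta > 0 and capacity.delta <= 0 and consumers.delta >= 0


if __name__ == "__main__":
    # Assumed sample readings, not values from a real datastore.
    result = adding_capacity_misses_root_cause(
        used_space=MetricTrend(start=400.0, end=480.0),
        capacity=MetricTrend(start=500.0, end=500.0),
        consumers=MetricTrend(start=12, end=12),
    )
    print("Adding capacity would not address the root cause:", result)
```

The same pattern applies to the other recommendations: compare the direction of change in each chart, and treat a recommendation as a root-cause fix only when the related metric actually moved.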
Run a Recommendation on a Datastore to Resolve an Alert
As a network operations engineer, you investigated the alert regarding datastore disk space and determined
that the provided recommendations can resolve the problem. The recommendation to delete unused snapshots is
especially useful. Use vRealize Operations Manager to delete the snapshots.
If you have not enabled actions in the vCenter adapter, you can manually delete the snapshots on your
vCenter Server instance.
Prerequisites
- Compare the metric charts to identify the likely root cause of the alert. See “Compare Alerts and Events
Over Time in Response to a Datastore Alert,” on page 15.
Procedure
1. In the menu, click Alerts and select the alert name in the data grid. The alerts detail information
appears on the right.
2. Review the Recommendations.
Recommendations include the Storage vMotion some virtual machines to a different datastore
recommendation and the Delete unused snapshots for virtual machines recommendation. The delete
unused snapshot recommendation includes an action button.
3. Click Delete Unused Snapshots for Datastore.
4. In the Days Old text box, select or enter the number of days old the snapshot must be to be retrieved for
deletion and click OK.
For example, enter 30 to retrieve all snapshots on the datastore that are 30 days old or older.
5. In the Delete Unused Snapshots for Datastore dialog box, review the Snapshot Space, Snapshot Create
Time, and the VM Name. Determine which snapshots to delete and select the check box for each one to
delete.
6. Click OK.
The dialog box that appears provides a link to Recent Tasks and a link to the task.
7. To verify that the task ran successfully, click Recent Tasks.
The Recent Tasks page appears. The Delete Unused Snapshots action includes two tasks, one to retrieve
the snapshots and one to delete the snapshots.
8. Select the Delete Unused Snapshot task that has the more recent finish time.
This is the delete task. The status should be Completed.
In this example, you ran an action on the datastore in vCenter Server. The other recommendations might
also be valid.
What to do next
- Verify that the recommendations resolve the alert. Run a few collection cycles after you run the action
and verify that the alert is canceled. Alerts are canceled when the conditions that generated them are no
longer true.
- Implement the other recommendations. The other recommendations for this alert require you to use
other applications. You cannot implement the recommendations from vRealize Operations Manager.
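To make the Days Old setting concrete, the following Python sketch shows one way to reason about which snapshots a value of 30 would retrieve. It is an illustration under stated assumptions, not the product's implementation. The Snapshot type, its field names, and both functions are hypothetical, and the sample data is invented.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Snapshot(NamedTuple):
    """Hypothetical record mirroring the columns shown in the dialog box."""
    vm_name: str
    created: datetime
    space_gb: float


def snapshots_at_least_days_old(
    snapshots: List[Snapshot], days_old: int, now: datetime
) -> List[Snapshot]:
    """Return the snapshots created on or before the cutoff (now minus days_old)."""
    cutoff = now - timedelta(days=days_old)
    return [s for s in snapshots if s.created <= cutoff]


def total_space_gb(snapshots: List[Snapshot]) -> float:
    """Sum the snapshot space that deleting the given snapshots would free."""
    return sum(s.space_gb for s in snapshots)


if __name__ == "__main__":
    now = datetime(2024, 3, 31)
    inventory = [
        Snapshot("vm-a", datetime(2024, 3, 1), 12.0),   # exactly 30 days old
        Snapshot("vm-b", datetime(2024, 3, 15), 4.0),   # 16 days old
        Snapshot("vm-c", datetime(2023, 12, 1), 30.0),  # older than 30 days
    ]
    candidates = snapshots_at_least_days_old(inventory, days_old=30, now=now)
    for snap in candidates:
        print(snap.vm_name, snap.created.date(), snap.space_gb)
    print("Space that could be reclaimed (GB):", total_space_gb(candidates))
```

Retrieval and deletion stay separate in this sketch, which matches the two tasks that appear in Recent Tasks: you review the retrieved list before you choose what to delete.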
User Scenario: You See Problems as You Monitor the State of Your Objects
As you investigate your objects in the context of this scenario, vRealize Operations Manager provides details
to help you resolve the problems. You analyze the state of your environment, examine current problems,
investigate solutions, and take action to resolve the problems.
As a virtual infrastructure administrator, you regularly browse through vRealize Operations Manager at
various levels so that you know the general state of the objects in your managed environment. Although no
one has called or complained, and you do not see any new alerts, you are starting to see that your cluster is
running out of capacity.
This scenario refers to objects that are associated with the VMware vSphere Solution, which connects
vRealize Operations Manager to one or more vCenter Server instances. The objects in your environment
include multiple vCenter Server instances, data centers, clusters (cluster compute resources), host systems,
resource pools, and virtual machines.
As you perform the steps in this scenario, and progress through the stages of troubleshooting, you learn
how to use vRealize Operations Manager to help you resolve problems. You will analyze the state of the
objects in your environment, examine current problems, investigate solutions, and take action to resolve the
problems.
This scenario shows you how to evaluate the problems that occur on your objects, and take action to resolve
problems.
- With the Analysis tab, you view the settings for object resources, click the links provided to further
analyze the problem, and examine the policy settings and thresholds.
- Using the Events tab, you examine the symptoms that triggered on the objects, determine when the
problems that triggered those symptoms occurred, identify the events associated with those problems,
and examine the metric values involved.
- On the Details tab, you investigate the metric activity as a graph, list, or distribution chart, and view the
heat maps to examine the criticality levels of your objects.
- With the Environment tab, you evaluate the health, risk, and efficiency of various objects as they relate
to your overall object hierarchy. You view the object relationships to determine how an object that is in a
critical state might be affecting other objects.
To support future troubleshooting and ongoing maintenance, you can create a new alert definition, and
create a dashboard and one or more views and reports. To plan for growth and account for newly approved
projects, you can create and commit capacity projects. To enforce the rules used to monitor your objects, you
can create and customize operational policies.
Prerequisites
Verify that you are monitoring one or more vCenter Server instances. See the vRealize Operations Manager
Customization and Administration Guide.
Procedure
1. Analyze the State of Your Environment on page 21
The Analysis tabs help you analyze your objects in multiple ways. As a Virtual Infrastructure
Administrator, you use the Analysis tabs to evaluate the details about the state of your objects to help
you resolve problems.
2. Troubleshoot Problems with a Host System on page 21
You use the Troubleshooting tabs to identify the root cause of problems that are not resolved by alert
recommendations or simple analysis.
3. Examine the Environment Details on page 23
Examine the status of your objects in the views and heatmaps so that you can identify the trends and
spikes that are occurring with the resources on your cluster and objects. To determine whether any
deviations have occurred, you can display overall summaries for an object, such as for the cluster disk
space usage breakdown.
4. Examine the Environment Relationships on page 25
You use the Environment Overview and List to examine the status of the badges as they relate to the
objects in your environment hierarchy, and determine which objects are in a critical state for a
particular badge. To view the relationships between your objects to determine whether an ancestor
object that has a critical problem might be causing problems with the descendants of the object, you
use the Environment Map.
5. Fix the Problem on page 27
You use the analysis and troubleshooting features of vRealize Operations Manager to examine
problems that put your objects in a critical state, and identify solutions. To resolve the problems,
where actions exist for the object type, you select an object and an available action that is specific to the
object. Or, you can open the object in the vSphere Web Client and modify the object settings to resolve
the problem.
6. Create a New Alert Definition on page 28
Based on the root cause of the problem, and the solutions that you used to fix the problem, you can
create a new alert definition for vRealize Operations Manager to alert you. When the alert is triggered
on your host system, vRealize Operations Manager alerts you and provides recommendations on how
to solve the problem.
7. Create Dashboards and Views on page 29
To help you investigate and troubleshoot problems with your cluster and host systems that might
occur in the future, you can create dashboards and views that apply the troubleshooting tools and
solutions that you used to research and solve the problems with your host system, to make those
troubleshooting tools and solutions available for future use.
Analyze the State of Your Environment
The Analysis tabs help you analyze your objects in multiple ways. As a Virtual Infrastructure Administrator,
you use the Analysis tabs to evaluate the details about the state of your objects to help you resolve problems.
As you browse through the inventory tree, you notice that one of your clusters, named USA-Cluster, is
experiencing capacity problems. You use the Analysis tabs to begin to investigate the cause of the problem
on USA-Cluster, and you start to see problems reported with the capacity on one of your host systems and
other objects.
Prerequisites
Verify that you understand the context of this scenario. See “User Scenario: You See Problems as You
Monitor the State of Your Objects,” on page 19.
Procedure
1. In the menu, click Environment, then in the left pane click vSphere Hosts and Clusters and select the
object.
2. Click the Analysis tab.
You see red icons on the Capacity Remaining and Time Remaining tabs.
3. Click the Time Remaining tab.
You see that the memory allocation is severely constrained.
4. View the time remaining breakdown for the cluster.
The icons indicate that zero days remain, with no planned capacity projects considered.
5. Scroll down until you see the Time Remaining in Related Objects pane.
The parent object is the data center, and the peer represents another cluster. The child objects include
the resource pool and host systems. The data center and one of the host systems are experiencing
critical memory problems.
6. Hover your mouse over the red parent and child icons.
The memory capacity has expired on the data center and one of the host systems.
The memory capacity problem on the cluster is affecting the memory capacity of the related objects.
What to do next
Use the Troubleshooting tab to further troubleshoot the capacity problems on your cluster and host system.
See “Troubleshoot Problems with a Host System,” on page 21.
Troubleshoot Problems with a Host System
You use the Troubleshooting tabs to identify the root cause of problems that are not resolved by alert
recommendations or simple analysis.
To further troubleshoot the symptoms of the capacity problems that are occurring on the cluster and host
system, and determine when those problems occurred, you use the Troubleshooting tabs to continue to
investigate the memory problem.
Prerequisites
Use the Analysis tabs to analyze your environment. See “Analyze the State of Your Environment,” on
page 21.
Procedure
1. In the menu, click Environment, then in the left pane click vSphere Hosts and Clusters and select the
object. For example, USA-Cluster.
2. Click the Alerts tab and review the symptoms.
The Symptoms tab displays the symptoms that triggered on the selected cluster. You notice that several
critical symptoms exist.
- Cluster Compute Resource Time Remaining with committed projects is critically low
- Cluster Compute Resource Time Remaining is critically low
- Capacity remaining is critically low
3. Analyze the critical symptoms.
a. Hover your mouse over each critical symptom to identify the metric used.
b. To view only the symptoms that affect the cluster, enter cluster in the quick filter text box.
When you hover over Cluster Compute Resource Time Remaining is critically low, the metric
Badge|Time Remaining with committed projects (%) appears. You notice that its value is less than
or equal to zero, which caused the capacity symptom to trigger and generate an alert on USA-Cluster.
4. Click the Events > Timeline tab to review the triggered symptoms, alerts, and events that occurred on
USA-Cluster over time, and identify when the problems occurred.
a. Click the calendar and select Last 7 Days as the range.
Several events appear in red.
b. Hover your mouse over each event to view the details.
c. To display the events that occurred on the cluster's data center, click View From, and select
Datacenter.
Warning events for the data center appear in yellow.
d. Hover your mouse over the warning events.
You notice that the density is starting to get low, and that a hard threshold violation occurred on
the data center late in the evening. The hard threshold violation shows that the Badge|Density
metric value was under the acceptable value of 25, and that the violation triggered with a value of
14.89.
e. To view the affected child objects, click View From and select Host System.
5. Click the Events tab to examine the changes that occurred on USA-Cluster, and determine whether a
change occurred that contributed to the root cause of the alert or other problems with the cluster.
a. Review the graph.
By reviewing the graph, you can determine whether a reoccurring event has caused the errors.
Each event indicates that the guest file system is out of disk space. The affected objects appear in
the pane below the graph.
b. Click each red triangle to identify the affected object and highlight it in the pane below.
6. Click the All Metrics tab to evaluate the objects in their context in the environment topology to help
identify the possible cause of a problem.
a. In the top view, select USA-Cluster.
b. In the metrics pane, expand Badge and double-click Badge|Capacity Remaining (%).
The Badge|Capacity Remaining (%) calculation is added to the lower right pane.
c. In the metrics pane, double-click Density.
d. In the metrics pane, double-click Workload.
e. On the toolbar, click Date Controls and select Last 7 Days.
The metric chart indicates that the capacity for the cluster remained at a steady level for the past
week, but that the cluster density increased to its maximum value in the last several days. The
Badge|Workload (%) calculation displays the workload extremes that correspond to the density
problem.
You have analyzed the symptoms, timeline, events, and metrics related to the problems on your cluster, and
determined that the heavy workload on the cluster has decreased the cluster density in the last several days,
which indicates that the cluster is starting to run out of capacity.
What to do next
Examine the Details views and heatmaps to interpret the properties, metrics, and alerts to look for trends
and spikes that occur in the resources for your objects, the distributions of resources across your objects, and
data maps to examine the use of various resource types across your objects. See “Examine the Environment
Details,” on page 23.
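The symptom conditions described in this procedure reduce to simple comparisons. The following Python sketch restates them using the numbers from the example. It is a hypothetical illustration, not how vRealize Operations Manager evaluates symptoms. The function names are invented, and the strict less-than comparison for the density threshold is an assumption based on the phrase "under the acceptable value."

```python
def violates_hard_threshold(value: float, acceptable_minimum: float) -> bool:
    """Return True when a metric value falls under its acceptable minimum.

    Assumption: "under" is treated as strictly less than the threshold.
    """
    return value < acceptable_minimum


def time_remaining_symptom_triggers(time_remaining_pct: float) -> bool:
    """Return True when the time remaining value is less than or equal to zero."""
    return time_remaining_pct <= 0


if __name__ == "__main__":
    # Values taken from the example: density of 14.89 against an acceptable value of 25.
    print("Density violation:", violates_hard_threshold(14.89, 25.0))
    # A time remaining value of zero triggers the capacity symptom in the example.
    print("Time remaining symptom:", time_remaining_symptom_triggers(0.0))
```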
Examine the Environment Details
Examine the status of your objects in the views and heatmaps so that you can identify the trends and spikes
that are occurring with the resources on your cluster and objects. To determine whether any deviations have
occurred, you can display overall summaries for an object, such as for the cluster disk space usage
breakdown.
To examine the problems with your USA-Cluster further, use the Details views to display the metrics and
collected capacity data for your cluster. Each view includes specific metrics data collected from your objects.
For example, trend views use data collected from objects over time to generate trends and forecasts for
resources such as memory, CPU, disk space, and so on.
Use the heatmaps to examine the capacity levels on the cluster, host systems, and virtual machines. The
block sizes and colors are based on the metrics selected in the heatmap configuration. For example, the
heatmap that shows the most abnormal workload for virtual machines is sized by the Badge|Workload (%)
metric, and is colored by the Badge|Anomaly metric.
Prerequisites
Use the Troubleshooting tabs to look for root causes. See “Troubleshoot Problems with a Host System,” on
page 21.
Procedure
1. Click Environment > vSphere Hosts and Clusters > USA-Cluster.
2. Examine the detailed information about USA-Cluster in the views.
a. Click the Details tab and click Views.
The views provide multiple ways to look at different types of collected data by using trends, lists,
distributions, and summaries.
b. In the search text box, enter capacity.
The list filters and displays the capacity views for clusters and other objects.
c. Click the view named Cluster Capacity Risk Forecast, and examine the number of virtual
machines for USA-Cluster in the lower pane.
Even though the USA-Cluster has two host systems and 30 virtual machines, no capacity exists.
3. Examine the host systems in the cluster, and reclaim capacity from the descendant virtual machines.
a. Click the Analysis tab, and click Capacity Remaining.
b. In the inventory tree, expand USA-Cluster, and click each of the host systems.
The host system named w2-vcopsqe2-009 is in a critical state, with no capacity remaining.
c. In the lower pane, expand Memory, and expand Allocation.
The stress free value is zero, and the amount of memory available is zero, which indicates that the
capacity of the host system has been depleted.
d. Click the Details tab, and click Views, and click the Virtual Machine Reclaimable Capacity view.
e. In the lower pane, click the title of the Reclaimable Memory column to sort the list of virtual
machines so that the largest amount of reclaimable capacity is on top.
f. To reclaim capacity from several virtual machines, click to the right of the first virtual machine
name, then press Shift and click to the right of the last virtual machine that has capacity to reclaim.
The virtual machines that have reclaimable capacity are highlighted.
g. Click the gear icon, and select Set CPU Count and Memory for VM.
h. Click the Current CPU column title to sort the list according to the highest number of CPUs.
Based on the actual use of the virtual machines listed, the New CPU column recommends fewer
CPUs for each virtual machine.
i. Click the check box next to each virtual machine that has a recommended lower CPU count, and
click OK.
By reducing the number of CPUs for each virtual machine, you free up capacity on your host
system, and improve the USA-Cluster capacity and workload.
4. Examine the heatmaps for the host system and virtual machine objects in USA-Cluster.
a. In the inventory tree, click USA-Cluster.
b. Click Details, click Heatmaps, and click through the list of heatmap views.
c. Click Which VMs currently have the highest CPU demand and contention?
The heatmap displays blocks that represent the objects in USA-Cluster. The block for a virtual
machine appears in red, which indicates that it has a critical problem.
d. Hover over the red block and examine the details.
The cluster, host system, and virtual machine names appear, with links to more information about
the object.
e. Click Show Sparkline to display the activity trend on the virtual machine.
f. Click each of the Details links to display more information.
To verify that freeing up memory on the virtual machines has improved the workload of the host system
and the cluster, you can now examine the status of the host system and cluster.
You used views and heatmaps to evaluate the status of your objects and identify trends and spikes, and free
up capacity for your host system and USA-Cluster. To further narrow in on problems, you can examine the
other views and heatmaps. You can also create your own views and heatmaps.
What to do next
Examine the badge status for the objects in your environment hierarchy to determine which objects are in a
critical state, and examine the object relationships to determine whether a problem on one object is affecting
one or more other objects. See “Examine the Environment Relationships,” on page 25.
Examine the Environment Relationships
You use the Environment Overview and List to examine the status of the badges as they relate to the objects
in your environment hierarchy, and determine which objects are in a critical state for a particular badge. To
view the relationships between your objects to determine whether an ancestor object that has a critical
problem might be causing problems with the descendants of the object, you use the Environment Map.
As you click each of the badges in the Environment Overview, you see that several objects are experiencing
critical problems with health, workload, and faults. Others are reporting critical risk status, and many are in
critical time remaining and capacity remaining states.
Several objects are experiencing stress. You notice that you can reclaim capacity from multiple virtual
machines and a host system, but the overall efficiency status for your environment displays no problems.
Prerequisites
Examine the status of your objects in views and heatmaps. See “Examine the Environment Details,” on
page 23.
Procedure
1. Click Environment > vSphere Hosts and Clusters > USA-Cluster.
2. Examine the USA-Cluster environment overview to evaluate the badge states of the objects in a
hierarchical view.
a. In the inventory tree, click USA-Cluster, and click Environment > Overview.
b. On the Badge toolbar, click through the badges and look for red icons to identify critical problems.
Option | Evaluation Process
Status icons | When the status of my object is critical, what must I do to resolve the problem? How can I be notified before serious problems occur?
Badges: Health, Workload, Anomalies, and Faults | How might the health and workload of my host systems be affecting my virtual machines? Are anomalies and faults on my host systems and virtual machines affecting other objects?
Badges: Risk, Time Remaining, Capacity Remaining, Stress | How does the stress level of my cluster and host systems affect the virtual machine descendants?
Badges: Efficiency, Reclaimable Capacity, Density | To improve efficiency, how can I reclaim capacity from the cluster, host systems, resource pool, and virtual machines, and apply the reclaimed capacity to other objects in my environment?
As you click through the badges, you notice that your vCenter Server and other top level objects
appear to be healthy, but you see that a host system and several virtual machines are in a critical
state for health, workload, and faults. Several objects also have critical problems with time
remaining and capacity remaining.
c. Hover your mouse over the red icon for the host system to display the IP address.
d. Enter the IP address in the search text box, and click the link that appears.
The host system is highlighted in the inventory tree. You can then look for recommendations or
alerts for the host system on the Summary tab.
3. Examine the environment list and view the badge status for your objects to determine which objects are
in a critical state.
a. Click Environment > List.
b. Examine the badge states for the objects in USA-Cluster.
c. Click the Capacity Remaining badge column name to sort the object list and display the objects
that are in a critical state.
Many of the objects that are at risk for capacity remaining also display critical states for time
remaining, risk, and health. You notice that multiple virtual machines and a host system named
w2-vropsqe2-009 are critically affected. Because the host system is experiencing the most critical
problems, and is likely affecting other objects, you must focus on resolving the problems with the
host system.
d. Click the host system named w2-vropsqe2-009, which is in a critical state, to locate it in the
inventory tree.
e. Click w2-vropsqe2-009 in the inventory tree, and click the Summary tab to look for
recommendations and alerts so that you can take action.