Dell Server Pro Management Pack 2.0 User Manual

Dell™ Server PRO Management Pack 2.0
For Microsoft® System Center Virtual Machine Manager
User’s Guide
www.dell.com | support.dell.com
NOTE: A NOTE indicates important information that helps you make better use of your computer.
CAUTION: A CAUTION indicates potential damage to hardware or loss of data if instructions are not followed.
____________________
Information in this document is subject to change without notice. © 2009 Dell Inc. All rights reserved.
Reproduction of these materials in any manner whatsoever without the written permission of Dell Inc. is strictly forbidden.
Trademarks used in this text: Dell, the DELL logo, PowerEdge, and OpenManage are trademarks of Dell Inc.; Hyper-V, Microsoft, Windows, Windows Vista, and Windows Server are either trademarks or registered trademarks of Microsoft Corporation in the United States and/or other countries.
Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell Inc. disclaims any proprietary interest in trademarks and trade names other than its own.
December 2009
Contents
1 Introduction
  What’s New in this Release?
  Overview
  Related Terms
  What is a PRO Tip?
  Feature Highlights
  Understanding PRO Tip Management
  Supported Operating Systems
  Other Documents You May Need
  Obtaining Technical Assistance
2 Getting Started with Dell PRO Pack
  Minimum Requirements
  Installing SCOM/SCE and SCVMM Agents
  Integrating Operations Manager with SCVMM
  Importing Dell PRO Pack
  Configuring PRO Tips
  Testing the Setup Using Scenarios
  Uninstalling PRO Pack
  Security Considerations
3 Using Dell PRO Pack
  Monitoring Using SCVMM
  Implementation of Recovery Actions
  Monitoring Using PRO Specific Alerts on SCOM/SCE
  Using Health Explorer to Reset Alerts
  Recovery Action Overrides
  Alert Cause and Recovery Action
1

Introduction

This document is intended for system administrators who use the Dell™ Server PRO Management Pack (Dell PRO Pack) to monitor Dell systems and take remedial action when an inefficient system is identified.
The Dell PRO Pack integrates with the following:
• Microsoft® System Center Operations Manager (SCOM) 2007 SP1
• SCOM 2007 R2
• System Center Essentials (SCE) 2007 SP1
• System Center Virtual Machine Manager (SCVMM) 2008
• SCVMM 2008 R2
This integration enables you to proactively manage virtual environments and ensure high availability of your Dell systems.
To implement PRO Pack, see "Getting Started with Dell PRO Pack".
CAUTION: Due to the possibility of data corruption and/or data loss, it is recommended that the procedures in this document be performed only by personnel with knowledge of and experience with the Microsoft® Windows operating system and System Center Operations Manager 2007/System Center Essentials 2007.
NOTE: The readme file, DellPROMP2.0_Readme.txt, contains the latest information about the software and management station requirements, as well as information about known issues. It is posted on the Systems Management documentation page on the Dell Support website at support.dell.com/manuals. The readme file is also packaged in the self-extracting executable Dell_PROPack_2.0.0_A00.exe.

What’s New in this Release?

This release of PRO Pack supports the following:
• SCOM 2007 R2
• SCVMM 2008 R2
• Virtual machine live migration with no downtime
• Overriding the default Dell PRO Pack recovery actions
• Additional Dell OpenManage™ alerts
• Renamed recovery actions: "Maintenance mode" and "VM Migration" in PRO Pack 1.0 are now "Restrict" and "Restrict and Migrate"
• Improved resolutions for some existing alerts
For more information on the alerts and their resolutions, see "Alert Cause and Recovery Action."

Overview

SCOM 2007/SCE 2007 uses a PRO-enabled management pack to collect and store information about Dell hardware, along with a description of its health status. Dell PRO Pack works with SCOM/SCE (henceforth referred to as Operations Manager) and SCVMM 2008 to manage Dell physical devices and their hosted virtual machines using this health information. By leveraging the monitoring and alerting capabilities of Operations Manager and the remediation capabilities of SCVMM, Dell PRO Pack recommends remedial actions when monitored objects transition to an unhealthy state (for example, virtual disk failure or predictive drive error).

Related Terms

• A managed system is a Dell system running Dell™ OpenManage™ Server Administrator, which is monitored and managed using Operations Manager and SCVMM. It can be managed locally or remotely using supported tools.
• A management station (or managing station) can be a Microsoft Windows-based Dell system that has Operations Manager and SCVMM installed to manage virtual workloads.

What is a PRO Tip?

PRO (Performance and Resource Optimization) Tip is a feature that monitors your virtualized infrastructure and alerts you when there is an opportunity to optimize the usage of these resources. A PRO Tip window contains the description of the event that produced the PRO Tip and the suggested remedial action. This feature allows you to load-balance virtual machines between physical hosts when specific threshold values are reached, or to migrate virtual machines when a hardware failure is detected.
The PRO Tip window in the SCVMM Administrator console enables you to view active PRO Tips for the host groups. The Operations Manager console displays the corresponding alerts as well, to ensure a consistent monitoring experience.
You can implement the recommended action in the PRO Tip manually, or you can configure PRO Tips to implement the recommended action automatically.

Feature Highlights

Dell PRO Pack:
• Performs PRO management of Dell PowerEdge™ systems running Microsoft Hyper-V™ platforms by continually monitoring the health of your physical and virtual infrastructure.
• Works with Operations Manager and SCVMM to detect events such as loss of power supply redundancy, temperatures above threshold values, system storage battery errors, virtual disk failure, and so on. For more information on events supported by Dell PRO Pack, see "Alert Cause and Recovery Action".
• Generates a PRO Tip when the monitored hardware moves to an unhealthy state.
• Minimizes downtime by implementing the remedial action provided in PRO Tips. The two remedial actions are:
  • Restrict: In this mode, it is recommended that the server be made temporarily unavailable for placement of new virtual machines until the maintenance tasks have been completed.
  • Restrict and Migrate: In this mode, to prevent loss of service from the virtual workloads, it is recommended that all running virtual machines be migrated from the server to another healthy server immediately.

Understanding PRO Tip Management

To help you understand how Dell PRO Pack works, this section explains a typical setup and the sequence of events involved.
Figure 1-1. Interaction of Components
In the figure, a group of PowerEdge systems act as the managed systems, and two PowerEdge systems act as management stations hosting Operations Manager and SCVMM. When a component transitions to an unhealthy state, Dell OpenManage Server Administrator generates an alert with the corresponding severity, and Dell PRO Pack monitors these alerts for PRO.
Dell PRO Pack contains a mapping between Server Administrator alerts and the associated remedial action.
The following table describes the sequence of events involved in generating and handling a typical PRO Tip.
Table 1-1. Sequence of Events
Sequence Number  Event
1  Operations Manager agents on the host detect the warning, error, or failure alerts that are logged by Dell OpenManage Server Administrator.
2  The alert is sent to Operations Manager.
3  The Operations Manager console displays active PRO-specific alerts.
4  Operations Manager notifies SCVMM of the alert and the associated PRO Tip ID.
5  SCVMM displays a corresponding entry, with the remedial action, in the PRO Tip window.
6  You implement the PRO Tip to trigger the recovery action on the managed system. The recovery action either places the managed system in Restrict mode, or restricts it and migrates its virtual machines.
7  SCVMM notifies Operations Manager that the recovery action completed successfully.
8  The SCVMM console displays the status of the PRO Tip as "Resolved" after it is successfully implemented.
9  The PRO Tip disappears from the SCVMM PRO Tip window.
10 The PRO-active alert disappears from SCOM.
For more information on the types of events and the associated remedial actions, see "Alert Cause and Recovery Action".
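The event sequence in Table 1-1 can be pictured as a simple state model. The following Python sketch is purely illustrative. The class ProTipLifecycle and its methods are hypothetical names and are not part of Dell PRO Pack, Operations Manager, or SCVMM, which exchange this information through their own connectors. The Failed branch reflects the failure case described later under "Implementation of Recovery Actions".

```python
from dataclasses import dataclass, field


@dataclass
class ProTipLifecycle:
    """Hypothetical model of the PRO Tip sequence in Table 1-1 (not a real API)."""
    event_id: int                      # OMSA event ID, for example 1053
    scom_alert_active: bool = False    # PRO-specific alert shown in Operations Manager
    tip_state: str = "None"            # state shown in the SCVMM PRO Tip window
    log: list = field(default_factory=list)

    def detect(self) -> None:
        # Steps 1-3: the agent detects the OMSA alert, sends it to
        # Operations Manager, and the console shows an active PRO alert.
        self.scom_alert_active = True
        self.log.append(f"Operations Manager alert for event {self.event_id}")

    def notify_scvmm(self) -> None:
        # Steps 4-5: Operations Manager passes the alert and PRO Tip ID to
        # SCVMM, which lists the tip with its remedial action.
        self.tip_state = "Active"
        self.log.append("PRO Tip listed in SCVMM")

    def implement(self, recovery_succeeded: bool = True) -> None:
        # Step 6: run the recovery action (Restrict, or Restrict and Migrate).
        if self.tip_state != "Active":
            raise RuntimeError("there is no active PRO Tip to implement")
        if not recovery_succeeded:
            # For example, no healthy host was available for migration.
            self.tip_state = "Failed"
            self.log.append("Recovery action failed")
            return
        # Steps 7-10: SCVMM reports success, the tip is marked Resolved and
        # leaves the PRO Tip window, and the Operations Manager alert clears.
        self.tip_state = "Resolved"
        self.scom_alert_active = False
        self.log.append("PRO Tip resolved; alert cleared")


tip = ProTipLifecycle(event_id=1053)
tip.detect()
tip.notify_scvmm()
tip.implement()
print(tip.tip_state, tip.scom_alert_active)  # Resolved False
```

In this model, a failed recovery leaves the Operations Manager alert active, which matches the behavior described for failed PRO Tips.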

Supported Operating Systems

For the detailed operating systems support matrix, see the Dell PRO Pack readme file, DellPROMP2.0_Readme.txt. You can find the readme packaged in the self-extracting executable Dell_PROPack_2.0.0_A00.exe. It is also posted on the Systems Management documentation page on the Dell Support website at support.dell.com/manuals.

Other Documents You May Need

Besides this guide, you can find the following guides on the Systems Management and Systems documentation pages on the Dell Support website at support.dell.com/manuals:
• The Dell OpenManage Server Administrator CIM Reference Guide documents the Common Information Model (CIM) provider, an extension of the standard management object format (MOF) file. The CIM provider MOF documents supported classes of management objects.
• The Dell OpenManage Server Administrator Messages Reference Guide lists the messages that are displayed in your Server Administrator home page Alert log or in your operating system’s event viewer. This guide explains the text, severity, and cause of each service alert message that Server Administrator issues.
• The Dell OpenManage Server Administrator Command Line Interface User's Guide documents the complete command line interface for Server Administrator, including an explanation of the command line interface (CLI) commands to view system status, access logs, create reports, configure various component parameters, and set critical thresholds.
• The Dell OpenManage Server Administrator Storage Management User's Guide is a comprehensive reference guide for configuring and managing local and remote storage attached to a system. This document is also available in HTML and PDF formats on the Dell Systems Management Tools and Documentation DVD and from the Storage Management console as online help.
The Dell Systems Management Tools and Documentation DVD contains a readme file for Server Administrator and additional readme files for other systems management software applications found on the DVD.
For documentation on virtualization solutions, see the Dell Support website at support.dell.com/manuals.

Obtaining Technical Assistance

If at any time you do not understand a procedure described in this guide, or if your product does not perform as expected, different types of help are available. For more information, see "Getting Help" in your system’s Installation and Troubleshooting Guide or the Hardware Owner’s Manual.
Additionally, Dell Enterprise Training and Certification is available; see www.dell.com/training for more information. This service might not be offered in all locations.
2

Getting Started with Dell PRO Pack

Minimum Requirements

To implement Dell™ PRO Pack, ensure that the following minimum execution environment exists:
Management Station:
• Microsoft® System Center Operations Manager (SCOM) 2007 SP1/R2 or System Center Essentials (SCE) 2007, installed on supported hardware and operating system
• System Center Virtual Machine Manager (SCVMM) 2008/R2, installed on supported hardware and operating system
• Integration of SCOM and SCVMM
Managed System:
• Microsoft Hyper-V™ hosts on any Dell PowerEdge™ systems ranging from x9xx to xx1x (both inclusive)
• Dell OpenManage™ Server Administrator (including the Server Administrator Storage Management Service). It is recommended that you install the latest version of Dell OpenManage Server Administrator (OMSA), 6.2. The minimum supported version of OMSA is 5.3.
Live Migration:
• SCVMM R2 with Windows Server 2008 R2 or Microsoft Hyper-V Server 2008 R2
• OpenManage 6.2
You can download the latest version of OMSA from the Dell Support website at support.dell.com.
NOTE: For the list of supported operating systems for Operations Manager and SCVMM, see the Microsoft website at http://technet.microsoft.com/hi-in/library/bb309428(en-us).aspx.
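If you track many hosts, you might script a pre-flight check against these requirements. The following Python sketch assumes you have already collected each host's OMSA version and operating system name by some other means. The function check_host and its inputs are hypothetical. Only the version thresholds come from the list above: OMSA 5.3 as the minimum, and OMSA 6.2 with SCVMM R2 on Windows Server 2008 R2 or Hyper-V Server 2008 R2 for live migration.

```python
# Thresholds taken from the Minimum Requirements list; everything else is hypothetical.
MIN_OMSA = (5, 3)
LIVE_MIGRATION_OMSA = (6, 2)
LIVE_MIGRATION_HOST_OS = {"Windows Server 2008 R2", "Microsoft Hyper-V Server 2008 R2"}


def parse_version(text: str) -> tuple:
    """Convert a dotted version string such as '6.2' or '5.3.0' to a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def check_host(omsa_version: str, host_os: str, scvmm_r2: bool) -> dict:
    """Report whether a Hyper-V host meets the PRO Pack and live migration minimums."""
    version = parse_version(omsa_version)
    return {
        # Tuple comparison: (5, 3, 0) >= (5, 3) is True, (5, 2) >= (5, 3) is False.
        "pro_pack_supported": version >= MIN_OMSA,
        "live_migration_ready": (
            version >= LIVE_MIGRATION_OMSA
            and scvmm_r2
            and host_os in LIVE_MIGRATION_HOST_OS
        ),
    }


print(check_host("6.2", "Windows Server 2008 R2", scvmm_r2=True))
```

The sketch compares versions as integer tuples rather than as strings, so that "5.10" would correctly sort above "5.3".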

Installing SCOM/SCE and SCVMM Agents

When you use the setup to monitor your infrastructure, the SCOM/SCE (Operations Manager) and SCVMM agents installed on the managed hosts enable data transfer between the managed system and the management stations. Agents of both SCVMM and Operations Manager are installed on all Hyper-V hosts, either manually or automatically during the discovery process.

Integrating Operations Manager with SCVMM

For the setup to support Dell PRO Pack, Operations Manager must be integrated with SCVMM. For a detailed description of the steps, see the Microsoft TechNet Library.
For SCOM & VMM 2008 integration, see http://technet.microsoft.com/hi-in/library/cc956099(en-us).aspx.
For SCE & VMM 2008 integration, see http://go.microsoft.com/fwlink/?LinkId=148206.
For SCOM & VMM R2 integration, see http://technet.microsoft.com/hi-in/library/ee236463(en-us).aspx.

Importing Dell PRO Pack

Dell PRO Pack version 2.0 is provided in a sealed format as a .mp file. To import Dell PRO Pack:
1 Download Dell_PROPack_2.0.0_A00.exe from the Dell Support website to removable media or a local repository.
2 Extract the contents of the file to a suitable folder on your system.
3 Launch the Operations Manager console.
4 Right-click Management Packs in the Administration tab.
The Select Management Pack to import screen is displayed.
5 Browse to the location where you extracted the Dell_PROPack_2.0.0_A00.exe file and select the Dell.Connections.hyperv.PROPack.mp file.
6 Click Open.
The Import Management Packs screen is displayed with a warning message in the Management Pack Details section, as shown in Figure 2-1. Operations Manager displays this generic warning as part of the security process when you manually install a management pack. For more information on how you can change the security settings for installing management packs manually, see the Microsoft TechNet Library.
Figure 2-1. Security Warning Message
7 Click Install.
A confirmation dialog box is displayed.
8 Click Yes.
For alerts and PRO Tips to be generated, ensure that SCVMM discovers the managed objects and displays them in the State View.

Configuring PRO Tips

The Dell systems and virtual infrastructure are monitored for either Critical alerts only, or both Critical and Warning alerts.
• A Warning alert is generated when a reading for the component is above or below the acceptable level. For example, the component may still be functioning but could potentially fail, or the component may be functioning in an impaired state.
• A Critical alert is generated when the component has either failed or a failure is imminent.
By default, the monitoring level is set to "Warning and Critical".
To enable PRO Tips for both Warning and Critical alerts and automatic implementation of PRO Tips:
1 Launch the SCVMM console.
2 In the Host Groups section, right-click All Hosts and select Properties.
The Host Groups Properties for All Hosts window appears, as shown in Figure 2-2.
Figure 2-2. Configuring PRO Tips
3 Select the PRO tab and select the Enable PRO on this Host Group option.
4 By default, the monitoring level is set to Warning and Critical, which means that the application displays PRO Tips generated for both Warning and Critical alerts. To restrict the PRO Tips to Critical alerts only, select the Critical only option.
5 Select the Automatically implement PRO tips on this Host Group option.
NOTE: By default, the automation level is set to Critical only, which means that PRO Tips with a Critical severity level are automatically implemented. To implement all PRO Tips automatically, select the Warning and Critical option.
6 Click OK to save your settings.
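The interaction between the monitoring level and the automation level can be summarized as a small decision function. This Python sketch is a hypothetical model of the settings described above and is not an SCVMM API. The defaults mirror the documented defaults. It assumes that a PRO Tip can be auto-implemented only if it is also shown under the current monitoring level, and that auto_implement reflects the Automatically implement PRO tips on this Host Group check box.

```python
from enum import Enum


class Severity(Enum):
    WARNING = "Warning"
    CRITICAL = "Critical"


# Option names as shown on the PRO tab of the host group properties.
LEVELS = {
    "Critical only": {Severity.CRITICAL},
    "Warning and Critical": {Severity.WARNING, Severity.CRITICAL},
}


def handle_pro_tip(severity: Severity,
                   monitoring_level: str = "Warning and Critical",
                   auto_implement: bool = False,
                   automation_level: str = "Critical only") -> tuple:
    """Return (shown, auto_implemented) for a PRO Tip of the given severity.

    Defaults mirror the documented defaults: monitoring level "Warning and
    Critical", automation level "Critical only".
    """
    shown = severity in LEVELS[monitoring_level]
    # Assumption: a tip that is not generated cannot be auto-implemented.
    auto = shown and auto_implement and severity in LEVELS[automation_level]
    return shown, auto


# A Warning tip with automation enabled is shown but, by default, not auto-implemented.
print(handle_pro_tip(Severity.WARNING, auto_implement=True))
```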

Testing the Setup Using Scenarios

To confirm that the imported Dell PRO Pack is fully functional, create the scenarios listed in the two tables and check whether the activities listed under Expected System Response are carried out.
Scenario 1 - The backplane board temperature exceeds its warning threshold value on a managed system.
Table 2-1. Checking Recovery Action for Warning Alert Conditions
Your Action: Generate a temperature warning alert on the managed system, such that the backplane board temperature exceeds its warning threshold limit. The event ID is 1053 and the source is OpenManage Server Administrator.
Expected System Response:
• Dell PRO Pack generates the corresponding alert in Operations Manager.
• Operations Manager passes an alert associated with the PRO Tip to SCVMM.
• The PRO Tip appears in the SCVMM PRO Tip window.
Your Action: Select the Implement option in the PRO Tip window.
Expected System Response: SCVMM places the host in Restrict mode.
Your Action: Verify that the host is placed in Restrict mode and that the PRO Tip resolved the alert.
Expected System Response:
• After successful implementation of the PRO Tip, the status changes to "Resolved" and the PRO Tip entry is moved out of the PRO Tip window.
• The corresponding alert disappears from the Operations Manager Alert View.
Your Action: Select the Dismiss option instead of the Implement option in the PRO Tip window.
Expected System Response: The PRO Tip is dismissed and no recovery task is performed. The corresponding PRO Tip entry is moved out of the PRO Tip window.
Scenario 2 - The backplane board temperature exceeds its failure threshold value on a managed system.
Table 2-2. Checking Recovery Action for Failure Alert Conditions
Your Action: Generate a temperature alert on the managed system, such that the backplane board temperature exceeds its failure threshold limit. The event ID is 1054 and the source is OpenManage Server Administrator.
Expected System Response:
• Dell PRO Pack generates the corresponding alert in Operations Manager.
• Operations Manager passes an alert associated with the PRO Tip to SCVMM.
• The PRO Tip appears in the SCVMM PRO Tip window.
Your Action: Select the Implement option in the PRO Tip window.
Expected System Response: SCVMM performs the following recovery actions:
a Sets the host in Restrict mode.
b Determines the list of virtual systems running on the unhealthy host.
c Determines the best-rated healthy host.
d Migrates the virtual machine to the best-rated host.
e Repeats this action until all the running virtual machines are migrated from the unhealthy host.
Your Action: Verify that the virtual systems are moved to a healthy host and that the PRO Tip resolved the alert.
Expected System Response:
• After successful implementation of the PRO Tip, the status changes to "Resolved" and the PRO Tip entry is moved out of the PRO Tip window.
• The corresponding alert disappears from the Operations Manager Alert View.
Your Action: Select the Dismiss option instead of the Implement option in the PRO Tip window.
Expected System Response: No action is taken and the virtual systems are not moved. The corresponding PRO Tip entry is moved out of the PRO Tip window. For more information, see "Using Health Explorer to Reset Alerts".

Uninstalling PRO Pack

You can uninstall PRO Pack by deleting it in the Operations Manager console. When you delete PRO Pack, all the settings and thresholds associated with it are removed from Operations Manager.
To uninstall PRO Pack:
1 Launch the Operations Manager console.
2 Select Administration→ Management Packs.
3 In the Management Packs pane, right-click Dell PRO-enabled Management Pack and click Delete.

Security Considerations

Operations Console access privileges are handled internally by SCOM/SCE. You can set them up using the User Roles option under Administration→ Security on the SCOM/SCE console. The profile of the role assigned to you determines what actions you can perform and which objects you can manage. For more information on security considerations, see the Microsoft System Center Operations Manager 2007 SP1/R2 and Microsoft System Center Essentials 2007 online help.
3

Using Dell PRO Pack

Monitoring Using SCVMM

You can manage the health of your virtualized environment using PRO Tips displayed on the SCVMM console.
To see the PRO Tip window, click the PRO Tips menu on the toolbar located below the main menu, as shown in Figure 3-1. The menu also shows the number of active PRO Tips in brackets.
Figure 3-1. PRO Tip Button on the SCVMM Console
Alternatively, if you select the Show this window when new PRO Tips are created option in the PRO Tip window, the window opens automatically on the SCVMM console when a PRO Tip is generated.
The PRO Tip window displays information in a tabular format about the source, tip (a concise statement of the problem associated with the host machine), and state. Below the table, you can see a description of the problem that triggered the alert, its cause, and the suggested remedial action for recovery.
Figure 3-2. PRO Tip Window

Implementation of Recovery Actions

The PRO Tip window provides an option to either implement or dismiss the recommended action. If you select the Implement option, SCVMM implements one of the recovery tasks described below, based on the nature of the alert.
Placing the host in Restrict mode
Placing a host in Restrict mode prevents future assignment of workload to the host until the problem is resolved.
When a host is placed in Restrict mode, it still receives alerts in Operations Manager and associated PRO Tips in SCVMM.
The system health conditions that can trigger Restrict mode tasks are non-critical hardware alerts on the virtualization host, such as an ambient chassis temperature warning alert on a Dell™ PowerEdge™ virtualization host system.
Migration of virtual machines
The PRO Tip management pack uses SCVMM algorithms to move virtual machines from the affected system to a healthy one. The two SCVMM algorithms are Load Balance and Resource Maximization.
Select the Load Balance algorithm if you want SCVMM to evenly distribute virtual machines (VMs) across a pool of hosts.
Select the Resource Maximization algorithm if you prefer to saturate the host completely before moving to a new one.
The placement requirements for identifying a healthy system and moving the virtual machines are as follows:
• Hardware requirements are requirements that a host must meet for the virtual machines to run on it, such as sufficient memory and storage.
• Software requirements are requirements that, if met by the host, allow a virtual machine to perform more optimally, such as CPU allocation, network bandwidth, network availability, disk I/O bandwidth, and free memory.
SCVMM assigns a star rating to hosts in a range of zero to five. If a hardware requirement is not met, for example, not enough hard disk and memory capacity, the host automatically gets zero stars and SCVMM does not allow you to place a VM on that host.
The system health conditions that trigger migration of VMs are hardware failure alerts on a virtualization host, such as virtual disk failure and predictive drive error. Dell PRO Pack migrates only VMs with the Running status. It does not migrate VMs with other statuses, such as Stop, Pause, and Saved.
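To make the placement rules concrete, the following Python sketch picks a target host for each running VM. It is a simplified, hypothetical model. SCVMM's actual star-rating formula weighs many more factors. The Host and VM classes, their capacity fields, and the scoring used here for Load Balance and Resource Maximization are assumptions made for illustration. The sketch does follow the rules stated above: a host that fails a hardware requirement gets zero stars, and only VMs with the Running status are migrated.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    healthy: bool
    free_memory_gb: float
    free_storage_gb: float
    vm_count: int


@dataclass
class VM:
    name: str
    status: str          # for example "Running", "Stop", "Pause", "Saved"
    memory_gb: float
    storage_gb: float


def star_rating(host: Host, vm: VM, algorithm: str) -> int:
    """Hypothetical 0-5 rating; SCVMM's real rating considers many more factors."""
    if not host.healthy:
        return 0  # the unhealthy source host is never a target
    # A hardware requirement that is not met gives zero stars.
    if host.free_memory_gb < vm.memory_gb or host.free_storage_gb < vm.storage_gb:
        return 0
    if algorithm == "Load Balance":
        return max(1, 5 - host.vm_count)   # favor lightly loaded hosts
    if algorithm == "Resource Maximization":
        return min(5, 1 + host.vm_count)   # favor hosts that are already busy
    raise ValueError(f"unknown placement algorithm: {algorithm}")


def plan_migration(vms, hosts, algorithm="Load Balance"):
    """Map each Running VM to its best-rated host, or None if no host qualifies."""
    plan = {}
    for vm in vms:
        if vm.status != "Running":
            continue  # VMs that are not running are left in place
        best = max(hosts, key=lambda h: star_rating(h, vm, algorithm))
        if star_rating(best, vm, algorithm) == 0:
            plan[vm.name] = None  # the PRO Tip would end in the Failed state
            continue
        plan[vm.name] = best.name
        # Reserve capacity so the next VM is rated against the updated host.
        best.free_memory_gb -= vm.memory_gb
        best.free_storage_gb -= vm.storage_gb
        best.vm_count += 1
    return plan
```

Given the same healthy hosts, the two algorithms can choose different targets. Load Balance favors the emptier host, while Resource Maximization keeps filling the busier one.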
After you successfully implement the recovery task, the following changes take place:
• The status of the PRO Tip changes to Resolved and the PRO Tip entry moves out of the PRO Tip window.
• The corresponding alert disappears from the Operations Manager Alert View.
• An entry is displayed in the Jobs section on the SCVMM console. This entry shows the status of the job as Completed, as shown in Figure 3-3.
Figure 3-3. Completed Job
PRO Tip implementation of moving VMs can fail if no other healthy hosts are available in the host group or host cluster. In such a case, the PRO Tip window displays the state of the corresponding PRO Tip as Failed, and the reason is elaborated in the Error section. The status of the corresponding entry in the Jobs section on the SCVMM console also displays as Failed.
NOTE: In the PRO Tip window, the failure message is updated dynamically. However, to refresh the data, you must click outside the PRO Tip window and then click the window again to bring it into focus.
If you select the Dismiss option, the PRO Tip is not executed and the following changes take place:
• The PRO Tip is removed from the SCVMM PRO Tip console.
• The alert in Operations Manager is removed from the Dell Server PRO Alerts view.
For more information, see "Using Health Explorer to Reset Alerts."
VM Live Migration
With live migration, you can migrate a VM from one node of a Windows Server 2008 R2 failover cluster to another node in the same cluster without any downtime. As a connected user, you will not experience any interruption during live migration.
The difference between quick migration and live migration is that quick migration involves some downtime, whereas live migration involves none.
NOTE: Windows Server 2008 Hyper-V supports Quick Migration. Windows Server 2008 R2 Hyper-V supports both Quick Migration and Live Migration.
Figure 3-4. Live Migration
For more information about Hyper-V live migration, see http://go.microsoft.com/fwlink/?LinkId=147115.

Monitoring Using PRO Specific Alerts on SCOM/SCE

You can monitor the physical devices in your network using the Operations Manager console.
The Operations Manager console provides the following views:
• Alert View - Displays Dell PRO specific alerts in a tabular format with information on the severity level, source, name, and resolution state, along with the date and time of creation. To access the Alert View, do the following:
  a Launch the Operations Manager console.
  b Select the Monitoring tab.
  c Select Dell Server PRO Alerts from Dell Server PRO Pack.
  The alerts are displayed on the right side of the screen, as shown in Figure 3-5.
Figure 3-5. Alert View
• State View - Displays the Dell system objects discovered in a tabular format. The State View displays objects with the name, path, storage health of the Dell system, and so on. You can personalize the State View by defining which objects you want displayed and customizing how the data looks.
Figure 3-6. State View
For more information on creating a State View, see the Microsoft website.

Using Health Explorer to Reset Alerts

Health Explorer enables you to view and take action on alerts. When you select the Dismiss option in the PRO Tip window, the alert is removed from the PRO Tip window. To reset this alert manually in Health Explorer, do the following:
1 Launch the Health Explorer window from the Actions pane.
2 Right-click the alert that you want to close.
3 Select Reset Health.

Recovery Action Overrides

PRO Pack 2.0 supports two recovery actions. The following flag values trigger the respective recovery action:
• 1: Migration recovery action
• 2: Placing the server in Restrict mode
You can override the default recovery action by changing the default recovery action flag value. For example, use the overrides option provided in SCOM to change the recovery flag value from '2' to '1'.
After you override the default value to '1', implementing the PRO Tip triggers migration of virtual machines from the host.
PRO Pack 2.0 supports only two override values, '1' and '2'. If you enter any other value, PRO Tip implementation fails and an error message is displayed.
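The flag logic can be sketched as follows. In this hypothetical Python model, DEFAULT_FLAGS holds a few default actions taken from Table 3-1. Flag 1 is assumed to correspond to the Restrict and Migrate action. The function name recovery_action is illustrative and does not correspond to any SCOM or PRO Pack interface. Only the RecoveryOverrideFlag name and its two valid values come from this guide.

```python
# Flag values from the text: 1 = migration, 2 = restricted mode.
# Assumption: flag 1 corresponds to the "Restrict and Migrate" action.
ACTIONS = {1: "Restrict and Migrate", 2: "Restrict"}

# A few default actions taken from Table 3-1 (event ID -> default flag).
DEFAULT_FLAGS = {
    1053: 2,  # temperature sensor warning -> Restrict
    1054: 1,  # temperature sensor failure -> Restrict and Migrate
    1404: 1,  # memory device error -> Restrict and Migrate
    1703: 2,  # battery sensor warning -> Restrict
}


def recovery_action(event_id, override_flag=None):
    """Return the action a PRO Tip would run, honoring a RecoveryOverrideFlag value."""
    flag = DEFAULT_FLAGS[event_id] if override_flag is None else override_flag
    if flag not in ACTIONS:
        # Mirrors the documented behavior: any value other than 1 or 2 makes
        # PRO Tip implementation fail with an error.
        raise ValueError(f"unsupported RecoveryOverrideFlag value: {flag!r}")
    return ACTIONS[flag]


print(recovery_action(1053))                   # Restrict
print(recovery_action(1053, override_flag=1))  # Restrict and Migrate
```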
To override the recovery action:
1 Click the Authoring tab in SCOM.
2 Search for the Dell PRO Pack monitors.
3 Select the monitor that you intend to override.
4 Right-click and select Override Recovery.
5 Select the Override check box.
NOTE: When you select Enable, SCOM performs an auto-implementation for the unit monitor. Because this involves VMM migration, review and set the values accordingly.
6 Change the value of RecoveryOverrideFlag.
7 Select the Enforce check box.
8 Click Apply.
CAUTION: Saving the settings in the default management pack creates a dependency between PRO Pack and the default management pack. When you remove or delete PRO Pack, you must delete the default management pack as well, as it contains default settings for SCOM. Hence, it is recommended that you save settings using a new MP.
9 Click Save overrides.
10 Generate an alert and PRO Tip.
11 Select Implement PRO Tip.
This verifies that the overridden recovery action is successful.
Figure 3-7. Override Recovery Action

Alert Cause and Recovery Action

The following table lists the alerts and the corresponding recommended remedial actions:
• Restrict: It is recommended that the server be made temporarily unavailable for placement of new VMs until the maintenance tasks have been completed.
• Restrict and Migrate: In this mode, to prevent loss of service from the virtual workloads, it is recommended that all running VMs be migrated from the server to another healthy server immediately.
Table 3-1. Alert Cause and Recovery Action
Dell
Alert Description
Event
in SCOM/ SCE &
ID
PRO Tip in SCVMM
1053 Temperature
sensor detected a warning value
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Warning A temperature sensor
on the backplane board, system board, CPU, or drive carrier in the specified system exceeded its warning threshold value.
Using Dell PRO Pack 29
Restrict
Page 30
| 1054 | Temperature sensor detected a failure value | Error | A temperature sensor on the backplane board, system board, CPU, or drive carrier in the specified system exceeded its failure threshold value. | Restrict and Migrate |
| 1104 | Fan sensor detected a failure value. | Error | A fan sensor in the specified system detected the failure of one or more fans. | Restrict |
| 1154 | Voltage sensor detected a failure value. | Error | A voltage sensor in the specified system exceeded its failure threshold value. | Restrict and Migrate |
| 1203 | Current sensor detected a warning value. | Warning | A current sensor in the specified system exceeded its warning threshold value. | Restrict |
| 1204 | Current sensor detected a failure value. | Error | A current sensor in the specified system exceeded its failure threshold value. | Restrict and Migrate |
| 1305 | Redundancy degraded. | Warning | A power supply sensor reading in the specified system exceeded a warning threshold. | Restrict |
| 1306 | Redundancy lost. | Error | A power supply has been disconnected or has failed. | Restrict |
| 1353 | Power supply detected a warning. | Warning | A power supply sensor reading in the specified system exceeded a definable warning threshold. | Restrict |
| 1354 | Power supply detected a failure. | Error | A power supply has been disconnected or has failed. | Restrict |
| 1403 | Memory Device Status Warning | Warning | A memory device correction rate exceeded an acceptable value. | Restrict |
| 1404 | Memory Device Error. | Error | A memory device correction rate exceeded an acceptable value, a memory spare bank was activated, or a multibit ECC error occurred. | Restrict and Migrate |
| 1703 | Battery sensor detected a warning value. | Warning | A battery sensor in the specified system detected that a battery is in a predictive failure state. | Restrict |
| 2048 | Device Failed Error. | Critical | A storage component such as a physical disk or an enclosure has failed. The failed component may have been identified by the controller while performing a task such as a rescan or a check consistency. | Restrict and Migrate |
| 2056 | Virtual Disk Failed. | Critical | One or more physical disks included in the virtual disk have failed. | Restrict and Migrate |
| 2057 | Virtual Disk Degraded Warning | Warning | This alert message occurs when a physical disk included in a redundant virtual disk fails. | Restrict |
| 2076 | Virtual Disk Check Consistency Failed. | Critical | A physical disk included in the virtual disk failed or there is an error in the parity information. | Restrict |
| 2082 | Virtual Disk Rebuild Failure | Error | A physical disk included in the virtual disk has failed or is corrupt. | Restrict |
| 2083 | Physical Disk Rebuild Failed | Critical | A physical disk included in the virtual disk has failed or is corrupt. | Restrict |
| 2094 | Predictive Failure reported | Warning | The physical disk is predicted to fail. | Restrict |
| 2100 | Temperature exceeded Maximum Warning Threshold | Warning | The physical disk enclosure is too hot. A variety of factors can cause the excessive temperature. | Restrict |
| 2101 | Temperature dropped below Minimum Warning Threshold | Warning | The physical disk enclosure is too cool. | Restrict |
| 2102 | Temperature exceeded Maximum Failure Threshold. | Critical | The physical disk enclosure is too hot. A variety of factors can cause the excessive temperature. | Restrict and Migrate |
| 2103 | Temperature dropped below the Minimum Failure Threshold. | Critical | The physical disk enclosure is too cool. | Restrict and Migrate |
| 2112 | Enclosure shutdown | Critical | The physical disk enclosure is either hotter or cooler than the maximum or minimum allowable temperature range. | Restrict and Migrate |
| 2122 | Redundancy degraded | Warning | One or more of the enclosure components has failed. For example, a fan or power supply may have failed. | Restrict |
| 2123 | Redundancy Lost | Warning | A virtual disk or an enclosure has lost data redundancy. | Restrict and Migrate |
| 2125 | Controller cache pinned for missing or offline VD | Warning | The controller was disconnected from its virtual disk (VD) while I/O was in progress. | Restrict |
| 2129 | BGI (Background Initialization) Failed Error | Critical | BGI of a virtual disk has failed. | Restrict |
| 2137 | Communication Time-out Warning | Warning | The controller is unable to communicate with an enclosure. | Restrict and Migrate |
| 2145 | Controller battery low | Warning | The controller battery charge is low. | Restrict |
| 2169 | The controller battery needs to be replaced | Critical | The controller battery cannot recharge. The battery may already have been recharged the maximum number of times. Alternatively, the battery charger may not be working. | Restrict and Migrate |
| 2171 | The controller battery temperature is above normal. | Warning | The room temperature may be too high. The system fan may also be degraded or may have failed. | Restrict |
| 2174 | The controller battery has been removed. | Warning | The controller cannot communicate with the battery. The battery may have been removed or the contact point may be degraded. | Restrict and Migrate |
| 2178 | The controller battery Learn cycle has timed out | Warning | The controller battery must be fully charged before the Learn cycle can begin. | Restrict |
| 2187 | Single-bit ECC error limit exceeded on the controller DIMM | Warning | The controller memory is malfunctioning. | Restrict and Migrate |
| 2201 | A global hot spare failed | Warning | The controller is not able to communicate with a disk that is assigned as a global hot spare. The disk may have failed or been removed. | Restrict |
| 2203 | A dedicated hot spare failed | Warning | The controller is not able to communicate with a disk that is assigned as a dedicated hot spare. | Restrict |
| 2206 | The only hot spare available is a SATA disk. SATA disks cannot replace SAS disks | Warning | The only physical disk available to be assigned as a hot spare is using SATA technology. | Restrict |
| 2207 | The only hot spare available is a SAS disk. SAS disks cannot replace SATA disks | Warning | The only physical disk available to be assigned as a hot spare is using SAS technology. | Restrict |
| 2213 | Recharge count maximum exceeded | Warning | A virtual disk or an enclosure has lost data redundancy. In the case of a virtual disk, one or more physical disks included in the virtual disk have failed. | Restrict |
| 2246 | The controller battery is degraded | Warning | The temperature of the battery is high. This may be due to the battery being charged. | Restrict |
| 2264 | A device is missing | Warning | The controller cannot communicate with a device. The device may have been removed. | Restrict |
| 2265 | A device is in an unknown state | Warning | The controller cannot communicate with a device. The state of the device cannot be determined. | Restrict and Migrate |
| 2268 | Storage Management communication Error | Critical | Storage Management has lost communication with a controller. This may occur if the controller driver or firmware is experiencing a problem. | Restrict and Migrate |
| 2272 | Patrol Read found an uncorrectable media error | Critical | The Patrol Read task has encountered an error that cannot be corrected. There may be a bad disk block that cannot be remapped. | Restrict and Migrate |
| 2273 | A block on the physical disk has been punctured by the controller | Critical | The controller encountered an unrecoverable medium error when attempting to read a block on the physical disk and marked that block as invalid. | Restrict and Migrate |
| 2282 | Hot spare SMART polling failed | Critical | The controller firmware attempted to do SMART polling on the hot spare but was not able to complete the SMART polling. | Restrict and Migrate |
| 2283 | A redundant path is broken | Warning | The controller has two connectors that are connected to the same enclosure. | Restrict and Migrate |
| 2289 | Multi-bit ECC error on controller DIMM | Critical | An error involving multiple bits has been encountered during a read or write operation. | Restrict and Migrate |
| 2290 | Single-bit ECC error on controller DIMM | Warning | An error involving a single bit has been encountered during a read or write operation. | Restrict |
| 2292 | Communication with the enclosure has been lost | Critical | The controller has lost communication with an enclosure management module (EMM). The cables may be loose or defective. | Restrict and Migrate |
| 2293 | EMM (Enclosure Management Module) Failure | Error | The failure may be caused by a loss of power to the EMM. | Restrict and Migrate |
| 2298 | The enclosure has a bad sensor | Warning | The enclosure has a bad sensor. The enclosure sensors monitor the fan speeds, temperature probes, and so on. | Restrict |
| 2299 | Bad PHY | Critical | There is a problem with a physical connection or PHY. | Restrict |
| 2300 | Unstable Enclosure Failure | Critical | The controller is not receiving a consistent response from the enclosure. | Restrict and Migrate |
| 2301 | Enclosure Hardware Error | Critical | The enclosure or an enclosure component is in a Failed or Degraded state. | Restrict and Migrate |
| 2302 | The enclosure is not responding | Critical | The enclosure or an enclosure component is in a Failed or Degraded state. | Restrict and Migrate |
| 2306 | Bad block table is full | Warning | The bad block table is the table used for remapping bad disk blocks. This table fills as bad disk blocks are remapped. | Restrict |
| 2307 | Bad block table is full. | Critical | The bad block table is the table used for remapping bad disk blocks. | Restrict |
| 2310 | A virtual disk is permanently degraded | Critical | A redundant virtual disk has lost redundancy. This may occur when the virtual disk suffers the failure of more than one physical disk. | Restrict and Migrate |
| 2312 | A power supply in the enclosure has an AC failure | Warning | The power supply has an AC failure. | Restrict |
| 2313 | A power supply in the enclosure has a DC failure | Warning | The power supply has a DC failure. | Restrict |
| 2314 | The initialization sequence of SAS components failed during system startup. SAS management and monitoring is not possible. | Critical | Storage Management is unable to monitor or manage SAS devices. | Restrict and Migrate |
| 2318 | Problems with the battery or the battery charger have been detected. The battery health is poor. | Warning | The battery or the battery charger is not functioning properly. | Restrict |
| 2319 | Single-bit ECC error on controller DIMM. | Warning | The dual in-line memory module (DIMM) is beginning to malfunction. | Restrict and Migrate |
| 2320 | Single-bit ECC error. | Critical | The dual in-line memory module (DIMM) is malfunctioning. | Restrict and Migrate |
| 2321 | Single-bit ECC error. The controller DIMM is nonfunctional. There will be no further reporting. | Critical | The dual in-line memory module (DIMM) is malfunctioning. Data loss or data corruption is imminent. | Restrict and Migrate |
| 2322 | The DC power supply is switched off. | Critical | The power supply unit is switched off. Either a user switched off the power supply unit or it is defective. | Restrict and Migrate |
| 2324 | The AC power supply cable has been removed. | Critical | The power cable may be pulled out or removed. The power cable may also have overheated and become warped and nonfunctional. | Restrict and Migrate |
| 2327 | The NVRAM has corrupted data. The controller is reinitializing the NVRAM. | Warning | The NVRAM has corrupted data. This may occur after a power surge, a battery failure, or for other reasons. The controller is reinitializing the NVRAM. | Restrict and Migrate |
| 2328 | The NVRAM has corrupt data. | Warning | The NVRAM has corrupt data. The controller is unable to correct the situation. | Restrict and Migrate |
| 2329 | SAS port report | Warning | The text for this alert is generated by the controller and can vary depending on the situation. | Restrict and Migrate |
| 2337 | The controller is unable to recover cached data from the battery backup unit (BBU). | Critical | The controller was unable to recover data from the cache. | Restrict |
| 2340 | The background initialization (BGI) completed with uncorrectable errors. | Critical | The background initialization task encountered errors that cannot be corrected. | Restrict and Migrate |
| 2342 | The Check Consistency found inconsistent parity data. Data redundancy may be lost. | Warning | The data on a source disk and the redundant data on a target disk are inconsistent. | Restrict and Migrate |
| 2349 | A bad disk block could not be reassigned during a write operation. | Critical | A write operation could not complete because the disk contains bad disk blocks that could not be reassigned. Data loss may have occurred. | Restrict |
| 2350 | There was an unrecoverable disk media error during the rebuild or recovery operation | Critical | The rebuild or recovery operation encountered an unrecoverable disk media error. | Restrict |
| 2356 | SAS SMP communications error. | Critical | The text for this alert is generated by the firmware and can vary depending on the situation. The reference to SMP in this text refers to SAS Management Protocol. | Restrict |
| 2357 | SAS expander error | Critical | There may be a problem with the enclosure. Verify the health of the enclosure and its components. | Restrict |
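Table 3-1 shows that severity alone does not determine the remedial action. For example, event 2187 is a Warning but calls for Restrict and Migrate, while event 2350 is Critical but calls only for Restrict. Any script that triages these alerts should therefore key on the Dell event ID. The following is a minimal sketch, assuming you hand-copy rows from Table 3-1 into a lookup table. `TABLE_3_1_SUBSET`, `recommended_action`, and `needs_migration` are hypothetical names, not an API provided by PRO Pack, SCOM, or SCVMM. Only a few rows are included here.

```python
from enum import Enum
from typing import Dict, NamedTuple


class Action(Enum):
    RESTRICT = "Restrict"
    RESTRICT_AND_MIGRATE = "Restrict and Migrate"


class TableRow(NamedTuple):
    description: str
    severity: str
    action: Action


# Hypothetical lookup table: a hand-copied subset of Table 3-1, keyed by Dell event ID.
TABLE_3_1_SUBSET: Dict[int, TableRow] = {
    1053: TableRow("Temperature sensor detected a warning value", "Warning", Action.RESTRICT),
    1054: TableRow("Temperature sensor detected a failure value", "Error", Action.RESTRICT_AND_MIGRATE),
    1104: TableRow("Fan sensor detected a failure value.", "Error", Action.RESTRICT),
    2048: TableRow("Device Failed Error.", "Critical", Action.RESTRICT_AND_MIGRATE),
    2187: TableRow("Single-bit ECC error limit exceeded on the controller DIMM",
                   "Warning", Action.RESTRICT_AND_MIGRATE),
    2350: TableRow("There was an unrecoverable disk media error during the rebuild "
                   "or recovery operation", "Critical", Action.RESTRICT),
}


def recommended_action(event_id: int) -> Action:
    """Return the recommended remedial action for a Dell event ID."""
    row = TABLE_3_1_SUBSET.get(event_id)
    if row is None:
        raise KeyError(f"Dell event ID {event_id} is not in this subset of Table 3-1")
    return row.action


def needs_migration(event_id: int) -> bool:
    # Decide on the mapped action, never on the severity column.
    return recommended_action(event_id) is Action.RESTRICT_AND_MIGRATE
```

Event IDs outside the copied subset raise `KeyError` instead of silently defaulting to an action, so a missing row is noticed rather than mis-triaged.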