Reproduction of these materials in any manner whatsoever without the written permission of Dell Inc.
is strictly forbidden.
Trademarks used in this text: Dell, the DELL logo, PowerEdge, and OpenManage are trademarks of
Dell Inc.; Hyper-V, Microsoft, Windows, Windows Vista, and Windows Server are either trademarks or
registered trademarks of Microsoft Corporation in the United States and/or other countries.
Other trademarks and trade names may be used in this document to refer to either the entities claiming
the marks and names or their products. Dell Inc. disclaims any proprietary interest in trademarks and
trade names other than its own.
1
Introduction
This document is intended for system administrators who use the
Dell™ Server PRO Management Pack (Dell PRO Pack) to monitor
Dell systems and take remedial action when an inefficient system
is identified.
The Dell PRO Pack integrates with the following:
•Microsoft® System Center Operations Manager (SCOM) 2007 SP1
•SCOM 2007 R2
•System Center Essentials (SCE) 2007 version1
•System Center Virtual Machine Manager (SCVMM) 2008
•SCVMM 2008 R2
This integration enables you to proactively manage virtual environments and
ensure high availability of your Dell systems.
To implement PRO Pack, see "Getting Started with Dell PRO Pack".
CAUTION: Due to the possibility of data corruption and/or data loss, it is
recommended that the procedures in this document be performed only by
personnel with knowledge and experience of using the Microsoft Windows
operating system and System Center Operations Manager 2007/System Center
Essentials 2007.
NOTE: The Readme file, DellPROMP2.0_Readme.txt contains the latest information
about the software and management station requirements, as well as information
about known issues. It is posted on the Systems Management documentation page
on the Dell Support website at support.dell.com/manuals. The readme file is also
packaged in the self-extracting executable Dell_PROPack_2.0.0_A00.exe.
What’s New in this Release?
This release of PRO Pack supports the following:
•SCOM 2007 R2
•SCVMM 2008 R2
•Virtual machine live migration with no downtime
•Feature to Override Dell PRO Pack default recovery actions
•Additional Dell OpenManage™ alerts
•Change in the names of recovery actions from "Maintenance mode" and
"VM Migration" in PRO Pack 1.0 to "Restrict" and "Restrict and Migrate"
•Improvements on the resolutions of some old alerts
For more information on the alerts and their resolutions, see "Alert Cause and
Recovery Action."
Overview
SCOM 2007/SCE 2007 use the PRO-enabled Management Pack to collect and
store information on Dell hardware along with a description of its health
status. Dell PRO Pack works with SCOM/SCE (henceforth, referred to as
Operations Manager) and SCVMM 2008 to manage Dell physical devices and
their hosted virtual machines using this available health information. Dell PRO
Pack recommends remedial actions when monitored objects transition to an
unhealthy state (for example, virtual disk failure or predictive drive error),
by leveraging the monitoring and alerting capabilities of Operations Manager
and remediation capabilities of SCVMM.
Related Terms
•A managed system is a Dell system running Dell™ OpenManage™ System
Administrator, which is monitored and managed using Operations
Manager and SCVMM. It can be managed locally or remotely using
supported tools.
•A management station (or) managing station can be a Microsoft Windows-
based Dell System that has the Operations Manager and SCVMM
installed to manage virtual workloads.
What is a PRO Tip?
PRO (Performance and Resource Optimization) Tip is a feature that enables
monitoring of your virtualized infrastructure and alerting when there is an
opportunity to optimize the usage of these resources. A PRO Tip window
contains the description of the event that produced the PRO Tip and the
suggested remedial action. This feature allows you to perform a load-balance
of virtual machines between physical hosts when specific threshold values are
reached. Alternatively, you can migrate virtual machines when a hardware
failure is detected.
The PRO Tip window in the SCVMM Administrator console enables you to
view active PRO Tips for the host groups. The Operations Manager console
displays the corresponding alerts as well, to ensure a consistent monitoring
experience.
You can implement the recommended action mentioned in the PRO Tip
manually. You can also configure the PRO Tip to implement the recommended
action automatically.
Feature Highlights
Dell PRO Pack:
•Performs PRO-management of Dell PowerEdge™ systems running
Microsoft Hyper-V™ platforms, by continually monitoring the health of
your physical and virtual infrastructure.
•Works with Operations Manager and SCVMM to detect events such as
loss of power supply redundancy, higher temperature than threshold
values, system storage battery error, virtual disk failure, and so on. For more
information on events supported by Dell PRO Pack, see "Alert Cause and
Recovery Action".
•Generates PRO Tip when the monitored hardware moves to an unhealthy
state.
•Minimizes downtime by implementing the remedial action provided on
PRO Tips. The two remedial actions are:
  •Restrict: In this mode, it is recommended that the server should be
  temporarily unavailable for placement of new virtual machines until
  the maintenance tasks have been completed.
  •Restrict and migrate: In this mode, in order to prevent loss of service
  from the virtual workloads, it is recommended that all running virtual
  machines be migrated from the server to another healthy server
  immediately.
Understanding PRO Tip Management
To help you understand how Dell PRO Pack works, this section explains a
typical setup and the sequence of events involved.
Figure 1-1. Interaction of Components
In the figure, a group of PowerEdge systems are the managed systems. Two
PowerEdge systems act as management stations hosting the Operations
Manager and SCVMM. Dell OpenManage Server Administrator generates
alerts with corresponding severity when there is a transition to an unhealthy
state and the same alerts are monitored by Dell PRO Pack for PRO.
Dell PRO Pack contains a mapping between Server Administrator alerts and
the associated remedial action.
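As an illustration of such a mapping, the following Python sketch associates a
few Server Administrator event IDs with the remedial actions listed for them in
Table 3-1 ("Alert Cause and Recovery Action"). The names REMEDIAL_ACTIONS and
remedial_action_for are hypothetical and are not part of Dell PRO Pack, which
defines its mapping inside the management pack itself.

```python
# Hypothetical illustration only: the event IDs and actions come from
# Table 3-1; the dictionary and function names are assumptions.
RESTRICT = "Restrict"
RESTRICT_AND_MIGRATE = "Restrict and Migrate"

REMEDIAL_ACTIONS = {
    1053: RESTRICT,              # Temperature sensor detected a warning value
    1054: RESTRICT_AND_MIGRATE,  # Temperature sensor detected a failure value
    1353: RESTRICT,              # Power supply detected a warning
    2048: RESTRICT_AND_MIGRATE,  # Device Failed
    2094: RESTRICT,              # Predictive Failure reported
}


def remedial_action_for(event_id):
    """Return the recommended action, or None if the event ID is not mapped."""
    return REMEDIAL_ACTIONS.get(event_id)
```

An event ID that is absent from the mapping returns None here, which stands in
for "no PRO Tip is generated" in this simplified model.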
The following table describes the sequence of events that occur in generating
and handling of a typical PRO Tip.
Table 1-1. Sequence of Events With Description
Sequence Number | Event
1 | Operations Manager agents on the host detect the warning, error, or failure alerts that are logged by Dell OpenManage Server Administrator.
2 | The alert is sent to Operations Manager.
3 | The Operations Manager console displays active PRO specific alerts.
4 | Operations Manager passes the alert and the associated PRO Tip ID to SCVMM.
5 | SCVMM displays a corresponding entry in the PRO Tip window with the remedial action.
6 | You implement the PRO Tip to enable the recovery action on the managed system, that is, either placing the managed system in Restrict mode, or restricting it and migrating virtual machines from it.
7 | SCVMM notifies Operations Manager about the successful completion of the recovery action.
8 | The SCVMM console displays the status of the PRO Tip as "Resolved" after it is successfully implemented.
9 | The PRO Tip disappears from the SCVMM PRO Tip window.
10 | The PRO active alert disappears from SCOM.
For more information on the types of events and the associated remedial
actions, see "Alert Cause and Recovery Action".
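The sequence in Table 1-1 can be pictured as an ordered list of states that a
PRO Tip passes through. The sketch below is only a mental model, assuming the
successful "Implement" path; the TipState names and the next_state helper are
hypothetical and do not correspond to any SCVMM or Operations Manager API.

```python
from enum import Enum


class TipState(Enum):
    # Each state groups one or more rows of Table 1-1 (assumed grouping).
    ALERT_LOGGED = "alert logged by Server Administrator"               # row 1
    ALERT_IN_OPERATIONS_MANAGER = "alert shown in Operations Manager"   # rows 2-3
    TIP_ACTIVE = "PRO Tip shown in SCVMM"                               # rows 4-5
    RESOLVED = "recovery action completed"                              # rows 6-8
    CLEARED = "PRO Tip and alert removed"                               # rows 9-10


# Enum members iterate in definition order, which matches the table order.
ORDER = list(TipState)


def next_state(state):
    """Return the state that follows `state`, or None at the end of the sequence."""
    index = ORDER.index(state)
    return ORDER[index + 1] if index + 1 < len(ORDER) else None
```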
Supported Operating Systems
For the detailed Operating Systems support matrix, see the Dell PRO Pack
readme file, DellPROMP2.0_Readme.txt. You can find the readme packaged in
the self-extracting executable - Dell_PROPack_2.0.0_A00.exe. It is also
posted on the Systems Management documentation page on the
Dell Support website at support.dell.com/manuals.
Other Documents You May Need
Besides this guide, you can find the following guides on the Systems
Management and Systems documentation pages on the Dell Support website
at support.dell.com/manuals:
•The Dell OpenManage Server Administrator CIM Reference Guide
documents the Common Information Model (CIM) provider, an extension
of the standard management object format (MOF) file. The CIM provider
MOF documents supported classes of management objects.
•The Dell OpenManage Server Administrator Messages Reference Guide lists
the messages that are displayed in your Server Administrator home page
Alert log or on your operating system’s event viewer. This guide explains
the text, severity, and cause of each service alert message that Server
Administrator issues.
•The Dell OpenManage Server Administrator Command Line Interface
User's Guide documents the complete command line interface for Server
Administrator, including an explanation of the command line interface
(CLI) commands to view system status, access logs, create reports,
configure various component parameters, and set critical thresholds.
•The Dell OpenManage Server Administrator Storage Management User's Guide
is a comprehensive reference guide for configuring and managing local and
remote storage attached to a system. This document is also available in
HTML and PDF formats on the Dell Systems Management Tools and
Documentation DVD and from the Storage Management console as
online help.
The Dell Systems Management Tools and Documentation DVD contains
a readme file for Server Administrator and additional readme files for
other systems management software applications found on the DVD.
For documentation on virtualization solutions, see the Dell Support website at
support.dell.com/manuals.
Obtaining Technical Assistance
If at any time you do not understand a procedure described in this guide, or
if your product does not perform as expected, different types of help are
available. For more information see "Getting Help" in your system’s
Installation and Troubleshooting Guide or the Hardware Owner’s Manual.
Additionally, Dell Enterprise Training and Certification is available;
see www.dell.com/training for more information. This service might not
be offered in all locations.
2
Getting Started with Dell PRO Pack
Minimum Requirements
To implement the Dell™ PRO Pack, you must ensure that the following
minimum execution environment exists:
•Management Station:
  •Microsoft® System Center Operations Manager (SCOM) 2007
  SP1/R2 or System Center Essentials (SCE) 2007 installed on
  supported hardware and operating system
  •System Center Virtual Machine Manager (SCVMM) 2008/R2
  installed on supported hardware and operating system
  •Integration of SCOM and SCVMM
•Managed System:
  •Microsoft Hyper-V™ hosts on any Dell PowerEdge™ systems ranging
  from x9xx to xx1x (both inclusive)
  •Dell OpenManage™ Server Administrator (including the Server
  Administrator Storage Management Service)
    •It is recommended that you install the latest version of Dell
    OpenManage Server Administrator (OMSA) 6.2
    •Minimum supported version of OMSA is 5.3
•Live Migration:
  •SCVMM R2 with Windows Server 2008 R2 or Microsoft Hyper-V
  Server 2008 R2
  •OpenManage 6.2
You can download the latest version of OMSA from the Dell Support
website at support.dell.com.
NOTE: For the list of supported operating systems for Operations Manager and
SCVMM, see the Microsoft website at
http://technet.microsoft.com/hi-in/library/bb309428(en-us).aspx.
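The OMSA version requirements above lend themselves to a simple comparison.
The following Python sketch is an illustration only: it assumes you already
have the installed OMSA version as a dotted string (how you obtain it is not
covered here), and it reads "OpenManage 6.2" for live migration as a minimum
version, which is an assumption. The function names are hypothetical.

```python
# Version thresholds taken from the Minimum Requirements list.
MINIMUM_OMSA = (5, 3)          # minimum supported OMSA version
LIVE_MIGRATION_OMSA = (6, 2)   # assumed to be a minimum for live migration


def parse_version(text):
    """Convert a dotted version string such as '6.2' or '6.2.0' to a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def check_omsa(version_text):
    """Report whether a given OMSA version meets the documented thresholds."""
    version = parse_version(version_text)
    return {
        "supported": version >= MINIMUM_OMSA,
        "live_migration_ready": version >= LIVE_MIGRATION_OMSA,
    }
```

For example, check_omsa("5.3") reports a supported version that does not meet
the live migration threshold.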
Installing SCOM/SCE and SCVMM Agents
When you use the setup to monitor your infrastructure, SCOM/SCE
(Operations Manager) and SCVMM agents installed on the managed hosts
enable data transfer between the managed system and management stations.
Agents of both SCVMM and Operations Manager are installed manually or
automatically during the discovery process on all Hyper-V hosts.
Integrating Operations Manager with SCVMM
For the setup to support Dell PRO Pack, Operations Manager must be integrated
with SCVMM. For detailed description of the steps, see the Microsoft TechNet
Library.
For SCOM & VMM 2008 Integration, see
http://technet.microsoft.com/hi-in/library/cc956099(en-us).aspx.
For SCE & VMM 2008 Integration, see
http://go.microsoft.com/fwlink/?LinkId=148206.
For SCOM & VMM R2 Integration, see
http://technet.microsoft.com/hi-in/library/ee236463(en-us).aspx.
Importing Dell PRO Pack
Dell PRO Pack version 2.0 is provided in a sealed format as a .mp file. To
import Dell PRO Pack:
1
Download the
website to a removable media or a local repository.
2
Extract the contents of the file to a suitable folder on your system.
section, as shown in Figure 2-1.
Operations Manager displays this generic warning as a part of the security
process when you manually install a management pack. For more
information on how you can change the security settings for installing
Management Packs manually, see the Microsoft TechNet Library.
Figure 2-1. Security Warning Message
7. Click Install.
   A confirmation dialog box is displayed.
8. Click Yes.
For alerts and PRO Tips to be generated, ensure that SCVMM discovers the
managed objects and displays them in the State View.
Configuring PRO Tips
The Dell systems and virtual infrastructure are monitored for either Critical
only, or both Critical and Warning alerts.
•A Warning alert is generated when a reading for the component is above or
below the acceptable level. For example, the component may still be
functioning, but it could potentially fail, or the component may be
functioning in an impaired state.
•A Critical alert is generated when the component has either failed or
failure is imminent. By default, the monitoring level is set to "Warning and
Critical".
To enable PRO Tips for both Warning and Critical alerts and automatic
implementation of PRO Tips:
1. Launch the SCVMM console.
2. In the Host Groups section, right-click All Hosts and select Properties.
   The Host Groups Properties for All Hosts window appears, as shown in
   Figure 2-2.
Figure 2-2. Configuring PRO Tips
3. Select the PRO tab and select the Enable PRO on this Host Group option.
4. By default, the monitoring level is set to Warning and Critical, which
   means that the application will display PRO Tips generated for both
   Warning and Critical alerts. To restrict the PRO Tips to Critical alerts only,
   select the Critical only option.
5. Select the Automatically implement PRO tips on this Host Group option.
   NOTE: By default, the automation level is set to Critical only, which means
   that PRO Tips with a Critical severity level are automatically implemented.
   To implement all PRO Tips automatically, select the Warning and Critical
   option.
6. Click OK to save your settings.
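The interaction between the monitoring level and the automation level can be
summarized in a few lines of Python. This is a simplified model of the settings
described above, not SCVMM code; the function name tip_handling and its
parameters are hypothetical.

```python
# Severity ranks and level thresholds used only by this illustration.
SEVERITY_RANK = {"Warning": 1, "Critical": 2}
LEVEL_THRESHOLD = {"Warning and Critical": 1, "Critical only": 2}


def tip_handling(severity,
                 monitoring_level="Warning and Critical",
                 automation_level="Critical only",
                 auto_implement=True):
    """Return (shown, automatic) for a PRO Tip of the given severity.

    shown     -- the PRO Tip is displayed for the host group
    automatic -- the PRO Tip is implemented without user action
    The defaults mirror the default levels named in the procedure above.
    """
    rank = SEVERITY_RANK[severity]
    shown = rank >= LEVEL_THRESHOLD[monitoring_level]
    automatic = (shown and auto_implement
                 and rank >= LEVEL_THRESHOLD[automation_level])
    return shown, automatic
```

With the defaults, a Warning tip is displayed but waits for you to select
Implement, while a Critical tip is both displayed and implemented automatically.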
Testing the Setup Using Scenarios
To confirm that the imported Dell PRO Pack is fully functional, create the
scenarios listed in the two tables and check if the activities listed under
Expected System Response column are carried out.
Scenario 1 - The backplane board temperature exceeds its warning threshold
value on a managed system.
Table 2-1. Checking recovery action for warning alert conditions.
Your Actions | Expected System Response
Generate a temperature alert warning on the managed system, such that the backplane board temperature exceeds its warning threshold limit. The event id is 1053 with the source being OpenManage Server Administrator. | Dell PRO Pack generates the corresponding alert in Operations Manager. Operations Manager passes an alert associated with the PRO Tip to SCVMM. PRO Tip appears in the SCVMM PRO Tip window.
Select the Implement option in the PRO Tip window. | Places the host in Restrict mode.
Verify that the host is placed in the Restrict mode and the PRO Tip resolved the alert. | After successful implementation of the PRO Tip, the status changes to "Resolved" and the PRO Tip entry is moved out of the PRO Tip window. Corresponding alert disappears in the Operations Manager Alert View.
Select the Dismiss option instead of the Implement option in the PRO Tip window. | The PRO Tip is dismissed. No recovery task is performed. The corresponding PRO Tip entry is moved out of the PRO Tip window.
Scenario 2 - The backplane board temperature exceeds its failure threshold
value on a managed system.
Table 2-2. Checking recovery action for failure alert conditions.
Your Actions | Expected System Response
Generate a temperature alert on the managed system, such that the backplane board temperature exceeds its failure threshold limit. The event id is 1054 with the source being OpenManage Server Administrator. | Dell PRO Pack generates the corresponding alert in Operations Manager. Operations Manager passes an alert associated with the PRO Tip to SCVMM. PRO Tip appears on the SCVMM PRO Tip window.
Select the Implement option in the PRO Tip window. | SCVMM generates the following recovery actions: (a) Sets the host in Restrict mode. (b) Determines the list of virtual systems running on the unhealthy host. (c) Determines the best-rated healthy host. (d) Migrates the virtual machine to best-rated host. (e) Repeats this action until all the running virtual machines are migrated from the unhealthy host.
Verify that the virtual systems are moved to a healthy host and PRO Tip resolved the alert. | After successful implementation of the PRO Tip, the status changes to "Resolved" and the PRO Tip entry is moved out of the PRO Tip window. Corresponding alert disappears in the Operations Manager Alert View.
Select the Dismiss option instead of the Implement option in the PRO Tip window. | No action is taken and virtual systems are not moved. The corresponding PRO Tip entry is moved out of the PRO Tip window. For more information, see "Using Health Explorer to Reset Alerts".
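The recovery actions that Scenario 2 expects on Implement (restrict the host,
list its running virtual systems, pick the best-rated healthy host, migrate,
and repeat) can be sketched as a loop. This Python model is an illustration
under stated assumptions: the data layout and the function name
restrict_and_migrate are hypothetical, and the host ratings are fixed inputs
here, whereas SCVMM calculates ratings itself.

```python
def restrict_and_migrate(unhealthy, hosts, ratings):
    """Simulate the Restrict and Migrate recovery steps for one host.

    hosts   -- dict: host name -> {"restricted": bool, "vms": [vm names]}
    ratings -- dict: host name -> star rating from 0 to 5 (assumed given)
    Returns a list of (vm, destination host) pairs in migration order.
    """
    hosts[unhealthy]["restricted"] = True            # set host in Restrict mode
    moves = []
    for vm in list(hosts[unhealthy]["vms"]):         # VMs on the unhealthy host
        candidates = [
            name for name in hosts
            if name != unhealthy
            and not hosts[name]["restricted"]
            and ratings.get(name, 0) > 0             # zero stars: not placeable
        ]
        if not candidates:
            raise RuntimeError("no healthy host available for " + vm)
        best = max(candidates, key=lambda name: ratings[name])  # best-rated host
        hosts[unhealthy]["vms"].remove(vm)           # migrate the VM
        hosts[best]["vms"].append(vm)
        moves.append((vm, best))
    return moves                                     # loop repeats per VM
```

The RuntimeError stands in for the failed PRO Tip state that results when no
other healthy host is available.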
Uninstalling PRO Pack
You can uninstall PRO Pack by deleting it in the Operations Manager console.
When you delete PRO Pack, all the settings and thresholds associated with it
are removed from Operations Manager.
To uninstall PRO Pack:
1. Launch the Operations Manager console.
2. Select Administration→ Management Packs.
3. In the Management Packs pane, right-click Dell PRO-enabled
   Management Pack and click Delete.
Security Considerations
Operations Console access privileges are handled internally by SCOM/SCE.
This can be set up using the User Roles option under the Administration →
Security feature on the SCOM/SCE console. The profile of the role assigned
to you determines what actions you can perform and which objects you are
able to manage. For more information on security considerations, see the
Microsoft System Center Operations Manager SP1/R2 and Microsoft System
Center Essentials 2007 online help.
3
Using Dell PRO Pack
Monitoring Using SCVMM
You can manage the health of your virtualized environment using PRO Tips
displayed on the SCVMM console.
To see the PRO Tip window, click the PRO Tips menu on the toolbar located
below the main menu, as shown in Figure 3-1. The menu also shows the
number of active PRO Tips in brackets.
Figure 3-1. PRO Tip Button on the SCVMM Console
Alternatively, if you select the Show this window when new PRO Tips are
created option in the PRO Tip window, the window opens automatically on
the SCVMM console when a PRO Tip is generated.
The PRO Tip window displays information in a tabular format about
the source, tip (a concise statement of the problem associated with the host
machine), and state.
You can see a description of the problem that triggered the alert, the cause,
and suggested remedial action for recovery below the table.
Figure 3-2. PRO Tip Window
Implementation of Recovery Actions
The PRO Tip window provides an option to either implement or dismiss the
recommended action. If you select the Implement option, SCVMM
implements one of the recovery tasks described below, based on the nature of
the alert.
Placing the host in Restrict mode
Placing a host in Restrict mode prevents future assignment of workload to the
host until the problem is resolved.
When a host is placed in the Restrict mode, it still receives alerts in the
Operations Manager and associated PRO Tips in SCVMM.
The system health conditions that can trigger the Restrict mode task
are non-critical hardware alerts on the virtualization host, such as an ambient
chassis temperature warning alert on a Dell™ PowerEdge™ virtualization host
system.
Migration of virtual machines
The PRO Tip management pack uses SCVMM algorithms to move virtual
machines from the affected system to a healthy one. The two SCVMM
algorithms are Load Balance and Resource Maximization.
Select the Load Balance algorithm if you want SCVMM to evenly distribute
virtual machines (VMs) across a pool of hosts.
Select the Resource Maximization algorithm if you prefer to saturate the
host completely before moving to a new one.
The placement requirements for identifying a healthy system and moving the
virtual machines are as follows:
•Hardware requirements are requirements that a machine hosting
the virtual machines must meet in order to run them, such as sufficient
memory and storage.
•Software requirements are requirements that, if met by the host, allow a
virtual machine to perform more optimally, such as CPU allocation,
network bandwidth, network availability, disk IO bandwidth, and free
memory.
SCVMM assigns a star rating to hosts in a range of zero to five. If a hardware
requirement is not met, for example, not enough hard disk and memory
capacity, the host automatically gets zero stars and SCVMM does not allow
you to place a VM on that host.
The system health conditions that trigger migration of VMs are hardware
failure alerts on a virtualization host, such as virtual disk failure and predictive
drive error. Dell PRO Pack migrates VMs with the Running status. It does not
migrate VMs with status such as Stop, Pause, and Saved.
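To make the placement rules concrete, the sketch below models two points from
this section: a host that fails a hardware requirement gets zero stars, and
only VMs with the Running status are migrated. The scoring formula is a toy
stand-in and is not the SCVMM rating algorithm, which this guide does not
describe; the field names and the functions rate_host and vms_to_migrate are
hypothetical.

```python
def rate_host(host, vm, algorithm="Load Balance"):
    """Return an illustrative star rating from 0 to 5 for placing vm on host.

    host -- {"total_memory_mb", "free_memory_mb", "free_disk_gb"}
    vm   -- {"memory_mb", "disk_gb"}
    """
    # Hardware requirements: not enough memory or disk means zero stars.
    if (host["free_memory_mb"] < vm["memory_mb"]
            or host["free_disk_gb"] < vm["disk_gb"]):
        return 0
    utilization = 1 - host["free_memory_mb"] / host["total_memory_mb"]
    if algorithm == "Load Balance":
        score = 1 - utilization   # favor emptier hosts to spread VMs evenly
    else:
        score = utilization       # Resource Maximization: favor fuller hosts
    return 1 + round(score * 4)   # map the 0..1 score onto 1..5 stars


def vms_to_migrate(vms):
    """Keep only VMs with the Running status; others are not migrated."""
    return [vm for vm in vms if vm["status"] == "Running"]
```

The two algorithm branches differ only in which direction of memory utilization
they reward, which is the essential contrast between Load Balance and Resource
Maximization as described above.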
After you successfully implement the recovery task the following changes
take place:
•The status of PRO Tip changes to Resolved and the PRO Tip entry moves
out of the PRO Tip window.
•Corresponding alert disappears in the Operations Manager Alert View.
•An entry is displayed in the Jobs section on the SCVMM console.
This entry shows the status of the job as Completed, as shown in
Figure 3-3.
Figure 3-3. Completed Job
PRO Tip implementation of moving VMs can fail if no other healthy hosts are
available in the host group or host cluster. In such a case, the PRO Tip
window displays the state of the corresponding PRO Tip as Failed, and the
reason is elaborated in the Error section. The status of the corresponding
entry in the Jobs section on the SCVMM console also displays as Failed.
NOTE: In the PRO Tip window the failure message is updated dynamically.
However, to refresh the data you have to click outside the PRO Tip window and
then click again to bring the window in focus.
If you select the Dismiss option, the PRO Tip is not executed and the
following changes take place:
•The PRO Tip is removed from the SCVMM PRO Tip console.
•The alert in Operations Manager is removed from the Dell Server PRO
Alerts.
For more information, see "Using Health Explorer to Reset Alerts."
VM Live Migration
With live migration, you can migrate a VM from one node of a Windows
Server 2008 R2 failover cluster to another node in the same cluster without
any downtime. As a connected user, you will not experience any interruption
during live migration.
The difference between quick migration and live migration is that quick
migration involves downtime, whereas live migration does not.
NOTE: Windows Server 2008 Hyper-V supports Quick Migration. Windows Server
2008 R2 Hyper-V supports both Quick Migration and Live Migration.
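The support matrix in the note above can be expressed as a small lookup. In
this Python sketch, preferring Live Migration whenever it is supported is an
assumption made for illustration (because it avoids downtime); the names
SUPPORTED_MIGRATIONS and preferred_migration are hypothetical.

```python
# Migration types per Hyper-V host operating system, as stated in the note.
SUPPORTED_MIGRATIONS = {
    "Windows Server 2008": {"Quick Migration"},
    "Windows Server 2008 R2": {"Quick Migration", "Live Migration"},
}


def preferred_migration(host_os):
    """Return the migration type to use on host_os, or None if unknown."""
    supported = SUPPORTED_MIGRATIONS.get(host_os, set())
    if "Live Migration" in supported:
        return "Live Migration"    # assumed preference: no downtime
    if "Quick Migration" in supported:
        return "Quick Migration"
    return None
```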
Figure 3-4. Live Migration
For more information about Hyper-V live migration, see
http://go.microsoft.com/fwlink/?LinkId=147115.
Monitoring Using PRO Specific Alerts on SCOM/SCE
You can monitor the physical devices in your network using the Operations
Manager console.
The Operations Manager console provides the following views:
•Alert View - Displays Dell PRO specific alerts in a tabular format with
information on the severity level, source, name, resolution state, along
with the date and time of creation. To access the Alert View do the
following:
  a. Launch the Operations Manager console.
  b. Select the Monitoring tab.
  c. Select Dell Server PRO Alerts from Dell Server PRO Pack.
  The alerts are displayed on the right side of the screen, as shown in
  Figure 3-5.
Figure 3-5. Alert View
•State View - Displays the Dell system objects discovered in a tabular
format. The State View displays objects with the name, path, storage
health of the Dell system, and so on. You can personalize the State View by
defining which objects you want displayed and customizing how the data
looks.
Figure 3-6. State View
For more information on creating a State view see the Microsoft website.
Using Health Explorer to Reset Alerts
Health Explorer enables you to view and take action on alerts. When you
select the Dismiss option in the PRO Tip window the alert is removed from
the PRO Tip window. However, to reset this alert manually in the Health
Explorer do the following:
1. Launch the Health Explorer window from the Actions pane.
2. Right-click the alert that you want to close.
3. Select Reset Health.
Recovery Action Overrides
PRO Pack 2.0 supports two recovery actions. The following flag values trigger
the respective recovery action:
•1: For migration recovery action
•2: For placing the server in restricted mode
You can override the default recovery actions by changing the default recovery
action flag value. For example, change the recovery flag value from '2' to '1'
with the overrides option provided in SCOM.
After overriding the default value to '1', and on implementation of PRO Tip,
recovery action will trigger migration of virtual machines from the host.
PRO Pack 2.0 supports only two override values, '1' and '2'. If you enter any
other value, PRO Tip implementation fails and an error message is displayed.
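The flag semantics above can be restated as a short validation routine. This
Python sketch is a model of the described behavior, not the management pack's
implementation; the function name resolve_recovery_action is hypothetical, and
the ValueError stands in for the failed PRO Tip and its error message.

```python
# Flag values as documented: 1 triggers migration, 2 triggers restricted mode.
RECOVERY_ACTIONS = {
    1: "Restrict and Migrate",
    2: "Restrict",
}


def resolve_recovery_action(default_flag, override_flag=None):
    """Return the recovery action after applying an optional override flag."""
    flag = default_flag if override_flag is None else override_flag
    if flag not in RECOVERY_ACTIONS:
        # Only '1' and '2' are supported; anything else fails the PRO Tip.
        raise ValueError("unsupported recovery flag value: %r" % (flag,))
    return RECOVERY_ACTIONS[flag]
```

For instance, a monitor whose default flag is 2 resolves to Restrict, and
overriding the flag to 1 changes the result to Restrict and Migrate.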
To override the recovery action:
1. Click the Authoring tab in SCOM.
2. Search for the Dell PRO Pack monitors.
3. Select the monitor which you intended to override.
4. Right click and select Override Recovery.
5. Select the Override check box.
   NOTE: When you select Enable, SCOM performs an auto-implementation for
   the unit monitor. Since, this involves VMM migration, review and set the
   values accordingly.
6. Change the value of RecoveryOverrideFlag.
7. Select Enforce check box.
8. Click Apply.
   CAUTION: Saving the settings in the default management pack, creates a
   dependency between PRO Pack and the management pack. When you remove or
   delete PRO Pack, you must delete the default management pack as well, as it
   contains default settings for SCOM. Hence, it is recommended that you save
   settings using a new MP.
9. Click Save overrides.
10. Generate an alert and PRO Tip.
11. Select Implement PRO Tip.
This verifies that the overridden recovery action is successful.
Figure 3-7. Override Recovery Action
Alert Cause and Recovery Action
The following table lists the alerts and the corresponding recommended
remedial action:
Restrict: It is recommended that the server should be temporarily unavailable
for placement of new VMs until the maintenance tasks have been completed.
Restrict and Migrate: In this mode, in order to prevent loss of service from
the virtual workloads, it is recommended that all running VMs be migrated
from the server to another healthy server immediately.
Table 3-1. Alert Cause and Recovery Action
Dell
Alert Description
Event
in SCOM/ SCE &
ID
PRO Tip in
SCVMM
1053Temperature
sensor detected
a warning value
Severity Alert CauseDell PRO Tip
Recommended
Remedial Action
WarningA temperature sensor
on the backplane
board, system board,
CPU, or drive carrier in
the specified system
exceeded its warning
threshold value.
Using Dell PRO Pack29
Restrict
Page 30
Table 3-1. Alert Cause and Recovery Action
(continued)
Dell
Event
ID
1054Temperature
1104Fan sensor
1154Voltage sensor
1203Current sensor
1204Current sensor
1305Redundancy
1306Redundancy
Alert Description
in SCOM/ SCE &
PRO Tip in
SCVMM
sensor detected
a failure value
detected a
failure value.
detected a
failure value.
detected a
warning value.
detected a
failure value.
degraded.
lost.
Severity Alert CauseDell PRO Tip
Recommended
Remedial Action
ErrorA temperature sensor
on the backplane
board, system board,
CPU, or drive carrier in
the specified system
exceeded its failure
threshold value.
ErrorA fan sensor in the
specified system
detected the failure of
one or more fans.
ErrorA voltage sensor in the
specified system
exceeded its failure
threshold value.
WarningA current sensor in the
specified system
exceeded its warning
threshold value.
ErrorA current sensor in the
specified system
exceeded its failure
threshold value.
WarningA power supply sensor
reading in the specified
system exceeded a
warning threshold.
ErrorA power supply has
been disconnected or
has failed.
Restrict and Migrate
Restrict
Restrict and Migrate
Restrict
Restrict and Migrate
Restrict
Restrict
30Using Dell PRO Pack
Page 31
Table 3-1. Alert Cause and Recovery Action
Dell
Alert Description
Event
in SCOM/ SCE &
ID
PRO Tip in
SCVMM
1353Power supply
detected a
warning.
1354Power supply
detected a
failure.
1403Memory Device
Status Warning
1404Memory Device
Error.
1703Battery sensor
detected a
warning value.
Severity Alert CauseDell PRO Tip
WarningA power supply sensor
reading in the specified
system exceeded
definable warning
threshold.
ErrorA power supply has
been disconnected or
has failed.
WarningA memory device
correction rate
exceeded an acceptable
value.
ErrorA memory device
correction rate
exceeded an acceptable
value, a memory spare
bank w as act ivated, or a
multibit ECC error
occurred.
WarningA battery sensor in the
specified system
detected that a battery
is in a predictive failure
state.
(continued)
Recommended
Remedial Action
Restrict
Restrict
Restrict
Restrict and Migrate
Restrict
Using Dell PRO Pack31
Page 32
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2048 | Device Failed Error. | Critical | A storage component such as a physical disk or an enclosure has failed. The failed component may have been identified by the controller while performing a task such as a rescan or a check consistency. | Restrict and Migrate |
| 2056 | Virtual Disk Failed. | Critical | One or more physical disks included in the virtual disk have failed. | Restrict and Migrate |
| 2057 | Virtual Disk Degraded Warning | Warning | This alert message occurs when a physical disk included in a redundant virtual disk fails. | Restrict |
| 2076 | Virtual Disk Check Consistency Failed. | Critical | A physical disk included in the virtual disk failed or there is an error in the parity information. | Restrict |
| 2082 | Virtual Disk Rebuild Failure | Error | A physical disk included in the virtual disk has failed or is corrupt. | Restrict |
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2083 | Physical Disk Rebuild Failed | Critical | A physical disk included in the virtual disk has failed or is corrupt. | Restrict |
| 2094 | Predictive Failure reported | Warning | The physical disk is predicted to fail. | Restrict |
| 2100 | Temperature exceeded Maximum Warning Threshold | Warning | The physical disk enclosure is too hot. A variety of factors can cause the excessive temperature. | Restrict |
| 2101 | Temperature dropped below Minimum Warning Threshold | Warning | The physical disk enclosure is too cool. | Restrict |
| 2102 | Temperature exceeded Maximum Failure Threshold. | Critical | The physical disk enclosure is too hot. A variety of factors can cause the excessive temperature. | Restrict and Migrate |
| 2103 | Temperature dropped below the Minimum Failure Threshold. | Critical | The physical disk enclosure is too cool. | Restrict and Migrate |
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2112 | Enclosure shutdown | Critical | The physical disk enclosure is either hotter or cooler than the maximum or minimum allowable temperature range. | Restrict and Migrate |
| 2122 | Redundancy degraded | Warning | One or more of the enclosure components has failed. For example, a fan or power supply may have failed. | Restrict |
| 2123 | Redundancy Lost | Warning | A virtual disk or an enclosure has lost data redundancy. | Restrict and Migrate |
| 2125 | Controller cache pinned for missing or offline VD | Warning | The controller was disconnected from its virtual disk (VD) while I/O was in progress. | Restrict |
| 2129 | BGI (Back Ground Initialization) Failed Error | Critical | BGI of a virtual disk has failed. | Restrict |
| 2137 | Communication Time-out Warning | Warning | The controller is unable to communicate with an enclosure. | Restrict and Migrate |
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2145 | Controller battery low | Warning | The controller battery charge is low. | Restrict |
| 2169 | The controller battery needs to be replaced | Critical | The controller battery cannot recharge. The battery may have been already recharged the maximum number of times. In addition, the battery charger may not be working. | Restrict and Migrate |
| 2171 | The controller battery temperature is above normal. | Warning | The room temperature may be too hot. The system fan may also be degraded or failed. | Restrict |
| 2174 | The controller battery has been removed. | Warning | The controller cannot communicate with the battery. The battery may be removed or the contact point may be degraded. | Restrict and Migrate |
| 2178 | The controller battery Learn cycle has timed out | Warning | The controller battery must be fully charged before the Learn cycle can begin. | Restrict |
| 2187 | Single-bit ECC error limit exceeded on the controller DIMM | Warning | The controller memory is malfunctioning. | Restrict and Migrate |
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2201 | A global hot spare failed | Warning | The controller is not able to communicate with a disk that is assigned as a global hot spare. The disk may have failed or been removed. | Restrict |
| 2203 | A dedicated hot spare failed | Warning | The controller is not able to communicate with a disk that is assigned as a dedicated hot spare. | Restrict |
| 2206 | The only hot spare available is a SATA disk. SATA disks cannot replace SAS disks | Warning | The only physical disk available to be assigned as a hot spare is using SATA technology. | Restrict |
| 2207 | The only hot spare available is a SAS disk. SAS disks cannot replace SATA disks | Warning | The only physical disk available to be assigned as a hot spare is using SAS technology. | Restrict |
| 2213 | Recharge count maximum exceeded | Warning | A virtual disk or an enclosure has lost data redundancy. In the case of a virtual disk, one or more physical disks included in the virtual disk have failed. | Restrict |
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2246 | The controller battery is degraded | Warning | The temperature of the battery is high. This may be due to the battery being charged. | Restrict |
| 2264 | A device is missing | Warning | The controller cannot communicate with a device. The device may be removed. | Restrict |
| 2265 | A device is in an unknown state | Warning | The controller cannot communicate with a device. The state of the device cannot be determined. | Restrict and Migrate |
| 2268 | Storage Management communication Error | Critical | Storage Management has lost communication with a controller. This may occur if the controller driver or firmware is experiencing a problem. | Restrict and Migrate |
| 2272 | Patrol Read found an uncorrectable media error | Critical | The Patrol Read task has encountered an error that cannot be corrected. There may be a bad disk block that cannot be remapped. | Restrict and Migrate |
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2273 | A block on the physical disk has been punctured by the controller | Critical | The controller encountered an unrecoverable medium error when attempting to read a block on the physical disk and marked that block as invalid. | Restrict and Migrate |
| 2282 | Hot spare SMART polling failed | Critical | The controller firmware attempted to do SMART polling on the hot spare but was not able to complete the SMART polling. | Restrict and Migrate |
| 2283 | A redundant path is broken | Warning | The controller has two connectors that are connected to the same enclosure. | Restrict and Migrate |
| 2289 | Multi-bit ECC error on controller DIMM | Critical | An error involving multiple bits has been encountered during a read or write operation. | Restrict and Migrate |
| 2290 | Single-bit ECC error on controller DIMM | Warning | An error involving a single bit has been encountered during a read or write operation. | Restrict |
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2292 | Communication with the enclosure has been lost | Critical | The controller has lost communication with an enclosure management module (EMM). The cables may be loose or defective. | Restrict and Migrate |
| 2293 | EMM (Enclosure Management Module) Failure | Error | The failure may be caused by a loss of power to the EMM. | Restrict and Migrate |
| 2298 | The enclosure has a bad sensor | Warning | The enclosure has a bad sensor. The enclosure sensors monitor the fan speeds, temperature probes, and so on. | Restrict |
| 2299 | Bad PHY | Critical | There is a problem with a physical connection or PHY. | Restrict |
| 2300 | Unstable Enclosure Failure | Critical | The controller is not receiving a consistent response from the enclosure. | Restrict and Migrate |
| 2301 | Enclosure Hardware Error | Critical | The enclosure or an enclosure component is in a Failed or Degraded state. | Restrict and Migrate |
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2302 | The enclosure is not responding | Critical | The enclosure or an enclosure component is in a Failed or Degraded state. | Restrict and Migrate |
| 2306 | Bad block table is full | Warning | The bad block table is the table used for remapping bad disk blocks. This table fills as bad disk blocks are remapped. | Restrict |
| 2307 | Bad block table is full. | Critical | The bad block table is the table used for remapping bad disk blocks. | Restrict |
| 2310 | A virtual disk is permanently degraded | Critical | A redundant virtual disk has lost redundancy. This may occur when the virtual disk suffers the failure of more than one physical disk. | Restrict and Migrate |
| 2312 | A power supply in the enclosure has an AC failure | Warning | The power supply has an AC failure. | Restrict |
| 2313 | A power supply in the enclosure has a DC failure | Warning | The power supply has a DC failure. | Restrict |
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2314 | The initialization sequence of SAS components failed during system startup. SAS management and monitoring is not possible. | Critical | Storage Management is unable to monitor or manage SAS devices. | Restrict and Migrate |
| 2318 | Problems with the battery or the battery charger have been detected. The battery health is poor. | Warning | The battery or the battery charger is not functioning properly. | Restrict |
| 2319 | Single-bit ECC error on controller DIMM. | Warning | The dual in-line memory module (DIMM) is beginning to malfunction. | Restrict and Migrate |
| 2320 | Single-bit ECC error. | Critical | The dual in-line memory module (DIMM) is malfunctioning. | Restrict and Migrate |
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2321 | Single-bit ECC error. The controller DIMM is nonfunctional. There will be no further reporting. | Critical | The dual in-line memory module (DIMM) is malfunctioning. Data loss or data corruption is imminent. | Restrict and Migrate |
| 2322 | The DC power supply is switched off. | Critical | The power supply unit is switched off. Either a user switched off the power supply unit or it is defective. | Restrict and Migrate |
| 2324 | The AC power supply cable has been removed. | Critical | The power cable may be pulled out or removed. The power cable may also have overheated and become warped and nonfunctional. | Restrict and Migrate |
| 2327 | The NVRAM has corrupted data. The controller is reinitializing the NVRAM | Warning | The NVRAM has corrupted data. This may occur after a power surge, a battery failure, or for other reasons. The controller is reinitializing the NVRAM. | Restrict and Migrate |
| 2328 | The NVRAM has corrupt data. | Warning | The NVRAM has corrupt data. The controller is unable to correct the situation. | Restrict and Migrate |
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2329 | SAS port report | Warning | The text for this alert is generated by the controller and can vary depending on the situation. | Restrict and Migrate |
| 2337 | The controller is unable to recover cached data from the battery backup unit (BBU). | Critical | The controller was unable to recover data from the cache. | Restrict |
| 2340 | The background initialization (BGI) completed with uncorrectable errors. | Critical | The background initialization task encountered errors that cannot be corrected. | Restrict and Migrate |
| 2342 | The Check Consistency found inconsistent parity data. Data redundancy may be lost. | Warning | The data on a source disk and the redundant data on a target disk is inconsistent. | Restrict and Migrate |
| 2349 | A bad disk block could not be reassigned during a write operation. | Critical | A write operation could not complete because the disk contains bad disk blocks that could not be reassigned. Data loss may have occurred. | Restrict |
Table 3-1. Alert Cause and Recovery Action (continued)

| Dell Event ID | Alert Description in SCOM/SCE & PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action |
|---|---|---|---|---|
| 2350 | There was an unrecoverable disk media error during the rebuild or recovery operation | Critical | The rebuild or recovery operation encountered an unrecoverable disk media error. | Restrict |
| 2356 | SAS SMP communications error. | Critical | The text for this alert is generated by the firmware and can vary depending on the situation. The reference to SMP in this text refers to SAS Management Protocol. | Restrict |
| 2357 | SAS expander error | Critical | There may be a problem with the enclosure. Verify the health of the enclosure and its components. | Restrict |
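
For administrators who keep their own runbooks or scripts, the mapping in Table 3-1 can be treated as a simple lookup from Dell event ID to severity and recommended remedial action. The following Python sketch is illustrative only and is not part of the Dell PRO Pack: the names TABLE_3_1, AlertRow, and recommended_action are hypothetical, and only four rows of the table are copied in. Table 3-1 remains the authoritative source.

```python
from typing import NamedTuple, Optional


class AlertRow(NamedTuple):
    """One row of Table 3-1, reduced to severity and remedial action."""
    severity: str
    action: str


# Hypothetical excerpt: four rows copied from Table 3-1.
# Extend it from the table as needed; nothing here is read from SCOM or SCVMM.
TABLE_3_1 = {
    2342: AlertRow("Warning", "Restrict and Migrate"),
    2349: AlertRow("Critical", "Restrict"),
    2350: AlertRow("Critical", "Restrict"),
    2357: AlertRow("Critical", "Restrict"),
}


def recommended_action(event_id: int) -> Optional[str]:
    """Return the remedial action listed for event_id, or None if it is not in the excerpt."""
    row = TABLE_3_1.get(event_id)
    return row.action if row is not None else None


if __name__ == "__main__":
    for event_id in (2342, 2350, 9999):
        print(event_id, recommended_action(event_id))
```

Returning None for an unknown event ID, rather than a default action, keeps the sketch from suggesting a remedial action that the table does not list.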