Dell Server Pro Management Pack 2.1 User Manual

Dell Server PRO Management Pack 2.1 for Microsoft System Center Virtual Machine Manager
User’s Guide
Notes and Cautions
NOTE: A NOTE indicates important information that helps you make better use of
CAUTION: A CAUTION indicates potential damage to hardware or loss of data if
instructions are not followed.
____________________
Information in this document is subject to change without notice. © 2011 Dell Inc. All rights reserved.
Reproduction of these materials in any manner whatsoever without the written permission of Dell Inc. is strictly forbidden.
Trademarks used in this text: Dell™, the DELL logo, PowerEdge™, and OpenManage™ are trademarks of Dell Inc. Hyper-V®, Microsoft®, Windows®, and Windows Server® are either trademarks or registered trademarks of Microsoft Corporation in the United States and/or other countries.
Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell Inc. disclaims any proprietary interest in trademarks and trade names other than its own.
2011 – 10
Contents
1 Introduction
What’s New
Overview
Related Terms
What is a PRO Tip?
Features and Functionalities
Understanding PRO Tip Management
Supported Operating Systems
2 Using Dell Performance Resource Optimization Pack
Monitoring Using SCVMM
Implementing Recovery Actions
Migration of Virtual Machines
Monitoring Using PRO Specific Alerts on SCOM/SCE
Using Health Explorer to Reset Alerts
Overriding Recovery Actions
Alerts and Recovery Actions
3 Related Documentation and Resources
Security Considerations
Other Documents You May Need
Getting Technical Assistance

Introduction

This document is intended for system administrators who use the Dell Server PRO Management Pack (Dell PRO Pack) to monitor Dell systems and take remedial action when an inefficient system is identified.
The Dell PRO Pack integrates with the following:
Microsoft System Center Operations Manager (SCOM) 2007 SP1
SCOM 2007 R2
System Center Essentials (SCE) 2007 with SP1
SCE 2010
System Center Virtual Machine Manager (SCVMM) 2008
SCVMM 2008 R2
SCVMM 2008 R2 with SP1
This integration enables you to proactively manage virtual environments and ensure high availability of your Dell systems.
To implement PRO Pack, see the Dell Server PRO Management Pack 2.1 for Microsoft System Center Virtual Machine Manager Installation Guide.
Also, see Features and Functionalities.
CAUTION: Due to the possibility of data corruption and/or data loss, it is recommended that the procedures in this document be performed only by personnel with knowledge and experience of using the Microsoft Windows operating system and System Center Operations Manager 2007 SP1/R2, or System Center Essentials 2007 SP1/2010.
NOTE: The readme file, DellMPv21_PROPack_Readme.txt, contains information about the software and management station requirements, and known issues. It is available at support.dell.com/manuals and is also packaged in the self-extracting executable Dell_PROPack_v2.1.0_A00.exe.

What’s New

This release of PRO Pack supports the following:
SCE 2010
SCVMM 2008 R2 SP1
New hardware support
Additional Dell OpenManage alerts and Network Interface Card (NIC) alerts
Improved resolutions for some existing alerts
For more information on the alerts and their resolutions, see Alerts and Recovery Actions.

Overview

SCOM/SCE use a PRO-enabled management pack to collect and store information on Dell hardware along with a description of its health status. Dell PRO Pack works with SCOM/SCE (henceforth referred to as Operations Manager) and SCVMM 2008 R2/SCVMM 2008 R2 SP1 to manage Dell physical devices and their hosted virtual machines (VMs) using this health information. When monitored objects transition to an unhealthy state (for example, virtual disk failure or predictive drive error), Dell PRO Pack recommends remedial actions by leveraging the monitoring and alerting capabilities of Operations Manager and the remediation capabilities of SCVMM.
Also see:
Features and Functionalities
Understanding PRO Tip Management

Related Terms

A managed system is a Dell system running the Dell OpenManage Server Administrator (OMSA), which is monitored and managed using Operations Manager and SCVMM. It can be managed locally or remotely using supported tools.
A management station or managing station is a Microsoft Windows-based Dell system that has the Operations Manager and SCVMM installed to manage virtual workloads.

What is a PRO Tip?

PRO (Performance and Resource Optimization) Tip is a feature that monitors your virtual infrastructure and provides alerts when there is an opportunity to optimize the usage of these resources. A PRO Tip window contains the description of the event that generated the PRO Tip and the suggested remedial action. This feature allows you to load-balance VMs between physical hosts when specific threshold values are reached. Alternatively, you can migrate VMs when a hardware failure is detected.
The PRO Tip window in the SCVMM Administrator console enables you to view active PRO Tips for the host groups. The Operations Manager console also displays the corresponding alerts, to ensure a consistent monitoring experience.
You can manually implement the recommended action mentioned in the PRO Tip. You can also configure the PRO Tip to automatically implement the recommended action.

Features and Functionalities

Dell PRO Pack:
Performs PRO-management of Dell PowerEdge systems running Microsoft Hyper-V platforms, by continually monitoring the health of your physical and virtual infrastructure.
Works with Operations Manager and SCVMM to detect events such as loss of power supply redundancy, temperatures above threshold values, system storage battery error, virtual disk failure, and so on. For more information on events supported by Dell PRO Pack, see Alerts and Recovery Actions.
Generates a PRO Tip when the monitored hardware moves to an unhealthy state.
Performs VM live migration with no downtime. For more information, see VM Live Migration.
Lets you override the Dell PRO Pack default recovery actions. For more information, see Overriding Recovery Actions.
Minimizes downtime by implementing the remedial action provided in PRO Tips. The two remedial actions are:
Restrict: In this mode, it is recommended that servers are made temporarily unavailable for placing new VMs until the maintenance tasks are complete.
Restrict and migrate: In this mode, it is recommended that all running VMs are migrated from an unhealthy server to a healthy server to prevent loss of service. For more information, see Implementing Recovery Actions.

Understanding PRO Tip Management

This section explains a typical Dell PRO Pack setup and the sequence of events involved in PRO Tip management.
Figure 1-1. Interaction of Components
[Figure labels: Managed System 1 and Managed System 2 (Dell PowerEdge, Hyper-V Hypervisor, Dell OMSA, Management Agents (SCOM/SCE & SCVMM), VM); two Management Stations (Dell PowerEdge with SCE 2007 SP1/2010 or SCOM 2007 SP1/R2; Dell PowerEdge with SCVMM 2008/R2, R2 SP1); Dell PRO Pack; arrows labeled Alerts, Notifies, and Implements Resolution.]
In the figure, a group of PowerEdge systems act as the managed systems and two PowerEdge systems act as management stations hosting Operations Manager and SCVMM. OMSA generates alerts with a corresponding severity when a monitored component transitions to an unhealthy state. Dell PRO Pack monitors the same alerts and maps each OMSA alert to its remedial action.
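The mapping from an OMSA alert to a remedial action can be pictured as a simple lookup. The following Python sketch is illustrative only and is not part of Dell PRO Pack: the dictionary and function names are hypothetical, and the four event IDs are a small sample taken from Table 2-1.

```python
from typing import Optional

# Illustrative lookup only; not part of Dell PRO Pack.
# Sample rows from Table 2-1 (Dell event ID -> recommended remedial action).
RECOMMENDED_ACTION = {
    1053: "Restrict",              # Temperature sensor detected a warning value
    1054: "Restrict and Migrate",  # Temperature sensor detected a failure value
    2056: "Restrict and Migrate",  # Virtual Disk Failed
    2094: "Restrict",              # Predictive Failure reported
}


def recommended_action(event_id: int) -> Optional[str]:
    """Return the sampled remedial action, or None if the ID is not in the sample."""
    return RECOMMENDED_ACTION.get(event_id)


if __name__ == "__main__":
    for event_id in (1053, 2056, 9999):
        print(event_id, recommended_action(event_id))
```

An ID that is absent from the sample returns None rather than a default action, which keeps the sketch from implying an action that the table does not list.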
Table 1-1 describes the sequence of events that occur in PRO Tip management.
Table 1-1. Sequence of events with description
Sequence Number | Event
1 | Operations Manager agents on the host are enabled to detect the warning, error, or failure alerts that are generated by OMSA.
2 | Alert is sent to Operations Manager.
3 | Operations Manager console displays active PRO alerts.
4 | Operations Manager notifies SCVMM of the alert and the associated PRO Tip ID.
5 | SCVMM displays a corresponding entry in the PRO Tip window with remedial action.
6 | Implement the PRO Tip to enable recovery action on the managed system by placing the managed system in the Restrict mode, or Restrict and Migrate mode.
7 | SCVMM notifies Operations Manager about the successful completion of the recovery action.
8 | The SCVMM console displays the status of the PRO Tip as Resolved after it is successfully implemented.
9 | PRO Tip disappears from SCVMM PRO Tip window.
10 | PRO Active alert disappears from SCOM.
For more information on the type of events and the associated remedial actions, see Alerts and Recovery Actions.

Supported Operating Systems

The Dell PRO Pack supported operating systems on the managed system and management station are as follows:
Managed system:
The managed system for PRO Pack is a Virtual Machine Manager Server. For more information, see technet.microsoft.com/en-us/library/cc764213.aspx
Management station:
For the list of supported configurations of SCOM, SCE, and SCVMM, see the following:
SCOM 2007 R2 -
SCOM 2007 SP1 -
SCE 2007 SP1 -
SCE 2010 -
SCVMM 2008/R2/R2 with SP1 - technet.microsoft.com/en-us/library/cc764231.aspx
technet.microsoft.com/en-us/library/bb309428.aspx
technet.microsoft.com/en-us/library/dd819933.aspx
technet.microsoft.com/en-us/library/ff741762.aspx
technet.microsoft.com/en-us/library/bb422876.aspx

Using Dell Performance Resource Optimization Pack

Monitoring Using SCVMM

You can manage the health of your virtualized environment using PRO Tips displayed on the SCVMM console.
To see the PRO Tip window, click the PRO Tips menu on the toolbar, as shown in Figure 2-1. The menu also displays the number of active PRO Tips in parentheses.
Figure 2-1. PRO Tip Button on the SCVMM Console
Alternatively, if you select the Show this window when new PRO Tips are created option in the PRO Tip window, the window opens automatically on the SCVMM console when a PRO Tip is generated.
The PRO Tip window displays information such as the source, tip, and state of the PRO Tip in a tabular format. The window also displays a description of the problem that triggered the alert, the cause, and the suggested remedial action for recovery.

Implementing Recovery Actions

The PRO Tip window provides an option to either implement or dismiss the recommended action. If you select Implement, SCVMM implements one of the recovery tasks described below, based on the nature of the alert.
Placing the Host in Restrict Mode
Placing a host in Restrict mode prevents assignment of workload to the host until the problem is solved. In this mode, the host still receives alerts on the Operations Manager and associated PRO Tips on the SCVMM.
The system health conditions that can trigger this action are noncritical hardware alerts on the virtualization host, such as an ambient chassis temperature warning on a Dell PowerEdge virtualization host system.

Migration of Virtual Machines

The PRO Tip management pack uses SCVMM algorithms to move VMs from the affected system to a healthy one. Select the Load Balance algorithm if you want SCVMM to evenly distribute VMs across a pool of hosts, or the Resource Maximization algorithm if you prefer to saturate the host completely before moving to a new one.
The requirements for identifying a healthy system and moving the VMs are as follows:
Hardware requirements: Requirements that a host must meet in order to run VMs. For example, sufficient memory and storage.
Software requirements: Requirements that, if met by the host, allow a virtual machine to perform more optimally. For example, CPU allocation, network bandwidth, network availability, disk IO bandwidth, and free memory.
SCVMM assigns a star rating to hosts in a range of zero to five. If a hardware requirement is not met, for example, insufficient hard disk and memory capacity, the host automatically gets a zero-star rating and SCVMM does not allow you to place a VM on that host.
The system health conditions that trigger migration of VMs are hardware failure alerts on a virtualization host, such as virtual disk failure and predictive drive error. Dell PRO Pack migrates VMs with the Running status. It does not migrate VMs with a status such as Stop, Pause, or Saved. Migration targets are selected based on the star rating of the associated servers.
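The selection rules above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SCVMM's placement algorithm: the Host and VM classes and both function names are hypothetical, and picking the highest-rated host is an assumption standing in for whichever algorithm (Load Balance or Resource Maximization) is configured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    stars: int  # SCVMM star rating, from 0 to 5


@dataclass
class VM:
    name: str
    status: str  # for example "Running", "Stop", "Pause", or "Saved"


def vms_to_migrate(vms: List[VM]) -> List[VM]:
    """Only VMs with the Running status are candidates for migration."""
    return [vm for vm in vms if vm.status == "Running"]


def pick_target(hosts: List[Host]) -> Optional[Host]:
    """Assumed rule: choose the highest-rated host; zero-star hosts are never eligible."""
    eligible = [host for host in hosts if host.stars > 0]
    return max(eligible, key=lambda host: host.stars) if eligible else None


if __name__ == "__main__":
    hosts = [Host("hostA", 0), Host("hostB", 4), Host("hostC", 2)]
    vms = [VM("vm1", "Running"), VM("vm2", "Saved")]
    target = pick_target(hosts)
    print(target.name if target else "no healthy host")
    print([vm.name for vm in vms_to_migrate(vms)])
```

When every host has a zero-star rating, pick_target returns None, which corresponds to the case where no healthy host is available for the migration.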
After you successfully implement the recovery task, the following changes take place:
• The status of the PRO Tip changes to Resolved and the PRO Tip entry moves out of the PRO Tip window.
• The corresponding alert disappears from the Operations Manager Alert View.
• An entry is displayed in the Jobs section on the SCVMM console. This entry displays the status of the job as Completed, as shown in Figure 2-2.
Figure 2-2. Completed Job
PRO Tip implementation of moving VMs can fail if no other healthy hosts are available in the host group or host cluster. In such a case, the PRO Tip window displays the state of the corresponding PRO Tip as Failed, and the reason is elaborated in the Error section. The status of the corresponding entry in the Jobs section on the SCVMM console is also displayed as Failed.
NOTE: In the PRO Tip window the failure message is updated dynamically. However, to refresh the data you have to click outside the PRO Tip window and then click again to bring the window in focus.
If you select Dismiss, the PRO Tip is not executed and the following changes take place:
• The PRO Tip is removed from the SCVMM PRO Tip console.
• The alert in Operations Manager is removed from Alerts.
For more information, see Using Health Explorer to Reset Alerts.
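The outcomes of acting on a PRO Tip can be summarized as a small state function. The Python sketch below is a hypothetical model for illustration only: the TipState names and the next_state function are not part of SCVMM, and "Dismissed" is a label chosen here for a tip that has been removed from the console.

```python
from enum import Enum


class TipState(Enum):
    ACTIVE = "Active"
    RESOLVED = "Resolved"
    FAILED = "Failed"
    DISMISSED = "Dismissed"  # hypothetical label for a tip removed without execution


def next_state(action: str, healthy_host_available: bool = True) -> TipState:
    """Return the state of an active PRO Tip after the administrator acts on it.

    healthy_host_available only matters when implementing the tip requires
    moving VMs to another host.
    """
    if action == "Dismiss":
        return TipState.DISMISSED
    if action == "Implement":
        return TipState.RESOLVED if healthy_host_available else TipState.FAILED
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    print(next_state("Implement").value)
    print(next_state("Implement", healthy_host_available=False).value)
    print(next_state("Dismiss").value)
```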
VM Live Migration
Live migration lets you move a VM from one node of a Windows Server 2008 R2 failover cluster to another node in the same cluster without any downtime or interruption for connected users.
The difference between quick migration and live migration is that quick migration involves downtime, whereas live migration does not.
NOTE: Windows Server 2008 Hyper-V supports Quick Migration. Windows Server 2008 R2 Hyper-V supports both Quick Migration and Live Migration.

Monitoring Using PRO Specific Alerts on SCOM/SCE

You can monitor the physical devices in your network using the Operations Manager console.
The Operations Manager console provides the following views:
Alert View: Displays Dell PRO specific alerts in a tabular format with information on the severity level, source, name, resolution state, and date and time of creation. To access the Alert View:
a Launch the Operations Manager console.
b Select the Monitoring tab.
c Select Dell Server PRO Alerts from Dell Server PRO Pack.
The alerts are displayed on the right-side of the screen, as shown in Figure 2-3.
Figure 2-3. Alert View
State View: Displays the discovered Dell system objects in a tabular format. The State View displays objects with the name, path, storage health of the Dell system, and so on. You can personalize the State View by defining which objects you want to display and how the data is displayed.
Figure 2-4. State View

Using Health Explorer to Reset Alerts

Health Explorer enables you to view and take action on alerts. When you select Dismiss in the PRO Tip window, the alert is removed from that window. To reset the alert manually in Health Explorer, do the following:
1 On the Actions menu, click Health Explorer.
2 Right-click the alert that you want to close.
3 Select Reset Health. The alert disappears from the PRO Tip window.

Overriding Recovery Actions

PRO Pack 2.1 supports two recovery actions. The following flag values trigger the respective recovery action:
1: For migration
2: For placing the server in restricted mode
You can override the default recovery action by changing the default recovery action flag value. For example, change the recovery flag value from 2 to 1 using the overrides option provided in the SCOM console. After you override the default value to 1 and implement the PRO Tip, the recovery action triggers migration of VMs from the host. If you enter any value other than 1 or 2, PRO Tip implementation fails, and an error message is displayed.
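The flag semantics described above can be sketched as a small validation function. This Python example is a hypothetical illustration, not PRO Pack code: the function name and the returned strings are assumptions, and raising an exception stands in for the error message shown when the PRO Tip implementation fails.

```python
MIGRATE = 1   # flag value that triggers migration of VMs
RESTRICT = 2  # flag value that places the server in restricted mode


def recovery_action(recovery_override_flag: int) -> str:
    """Map a recovery flag value to the recovery action it triggers.

    Any value other than 1 or 2 is rejected, mirroring the failed
    PRO Tip implementation described in the text.
    """
    if recovery_override_flag == MIGRATE:
        return "migrate"
    if recovery_override_flag == RESTRICT:
        return "restrict"
    raise ValueError(
        f"invalid recovery flag {recovery_override_flag!r}: expected 1 or 2"
    )


if __name__ == "__main__":
    print(recovery_action(2))  # default in the example above
    print(recovery_action(1))  # after overriding the flag from 2 to 1
```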
To override the recovery action:
1 Click the Authoring tab in SCOM.
2 Search for the Dell PRO Pack monitors.
3 Select the monitor which you intend to override.
4 Right-click and select Override Recovery.
5 Select the Override check box.
NOTE: When you select Enable, SCOM performs an auto-implementation for the unit monitor. Since this involves VMM migration, review and set the values accordingly.
6 Change the value of RecoveryOverrideFlag.
7 Select the Enforce check box.
8 Click Apply.
CAUTION: Saving the settings in the default management pack creates a dependency between PRO Pack and the management pack. When you remove or delete PRO Pack, you must delete the default management pack as well, as it contains default settings for SCOM. Hence, it is recommended that you save settings using a new MP.
9 Click Save Overrides.
10 Generate an alert and PRO Tip.
11 Select Implement PRO Tip.
This verifies that the overridden recovery action is successful.
Figure 2-5. Overriding Recovery Action

Alerts and Recovery Actions

Table 2-1 lists the alerts and the corresponding recommended remedial actions:
Table 2-1. Alert Cause and Recovery Action
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
1053 Temperature
sensor detected a warning value
Using Dell Performance Resource Optimization Pack 19
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Warning A temperature sensor
on the backplane board, system board, CPU, or drive carrier in the specified system exceeded its warning threshold value.
Restrict
Page 20
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
1054 Temperature
sensor detected a failure value
1104 Fan sensor
detected a failure value
1154 Voltage sensor
detected a failure value
1203 Current
sensor detected a warning value
1204 Current
sensor detected a failure value
1305 Redundancy
degraded
1306 Redundancy
lost
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Error A temperature sensor
on the backplane board, system board, CPU, or drive carrier in the specified system exceeded its failure threshold value.
Error A fan sensor in the
specified system detected the failure of one or more fans.
Error A voltage sensor in the
specified system exceeded its failure threshold value.
Warning A current sensor in the
specified system exceeded its warning threshold value.
Error A current sensor in the
specified system exceeded its failure threshold value.
Warning A power supply sensor
reading in the specified system exceeded a warning threshold.
Error A power supply has
been disconnected or has failed.
Restrict and Migrate
Restrict
Restrict and Migrate
Restrict
Restrict and Migrate
Restrict
Restrict
20 Using Dell Performance Resource Optimization Pack
Page 21
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
1353 Power supply
detected a warning
1354 Power supply
detected a failure
1403 Memory
Device Status Wa rn i n g
1404 Memory
Device Error
1703 Battery sensor
detected a warning value
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Warning A power supply sensor
reading in the specified system exceeded definable warning threshold.
Error A power supply has
been disconnected or has failed.
Warning A memory device
correction rate exceeded an acceptable value.
Error A memory device
correction rate exceeded an acceptable value, a memory spare bank was activated, or a multibit ECC error occurred.
Warning A battery sensor in the
specified system detected that a battery is in a predictive failure state.
Restrict
Restrict
Restrict
Restrict and Migrate
Restrict
Using Dell Performance Resource Optimization Pack 21
Page 22
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
2048 Device Failed
Error
2056 Virtual Disk
Failed
2057 Virtual Disk
Degraded Wa r ni n g
2076 Virtual Disk
Check Consistency Failed
2082 Virtual Disk
Rebuild Failure
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Critical A storage component
such as a physical disk or an enclosure has failed. The failed component may have been identified by the controller while performing a task such as a rescan or a check consistency.
Critical One or more physical
disks included in the virtual disk have failed.
Warning This alert message
occurs when a physical disk included in a redundant virtual disk fails.
Critical A physical disk
included in the virtual disk failed or there is an error in the parity information.
Error A physical disk
included in the virtual disk has failed or is corrupt.
Restrict and Migrate
Restrict and Migrate
Restrict
Restrict
Restrict
22 Using Dell Performance Resource Optimization Pack
Page 23
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
2083 Physical Disk
Rebuild Failed
2094 Predictive
Failure reported
2100 Temperature
exceeded Maximum Wa rn i n g Threshold
2101 Temperature
dropped below Minimum Wa rn i n g Threshold
2102 Temperature
exceeded Maximum Failure Threshold
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Critical A physical disk
included in the virtual disk has failed or is corrupt.
Warning The physical disk is
predicted to fail.
Warning The physical disk
enclosure is too hot. A variety of factors can cause the excessive temperature.
Warning The physical disk
enclosure is too cool.
Critical The physical disk
enclosure is too hot. A variety of factors can cause the excessive temperature.
Restrict
Restrict
Restrict
Restrict
Restrict and Migrate
2103 Temperature
dropped below the Minimum Failure Threshold
Using Dell Performance Resource Optimization Pack 23
Critical The physical disk
enclosure is too cool.
Restrict and Migrate
Page 24
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
2112 Enclosure
shutdown
2122 Redundancy
degraded
2123 Redundancy
Lost
2125 Controller
cache pinned for missing or offline VD
2129 BGI (Back
Ground Initialization) Failed Error
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Critical The physical disk
enclosure is either hotter or cooler than the maximum or minimum allowable temperature range.
Warning One or more of the
enclosure components has failed. For example, a fan or power supply may have failed.
Warning A virtual disk or an
enclosure has lost data redundancy.
Warning Controller getting
disconnected from its VD, while IO is happening
Critical BGI of a virtual disk
has failed.
Restrict and Migrate
Restrict
Restrict and Migrate
Restrict
Restrict
2137 Communicati
on Time-out Wa r ni n g
Warning The controller is
unable to communicate with an enclosure.
24 Using Dell Performance Resource Optimization Pack
Restrict and Migrate
Page 25
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
2145 Controller
battery low
2169 The controller
battery needs to be replaced
2171 The controller
battery temperature is above normal
2174 The controller
battey has been removed
2178 The controller
battery Learn cycle has timed out
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Warning The controller battery
charge is low.
Critical The controller battery
cannot recharge. The battery may have been already recharged the maximum number of times. In addition, the battery charger may not be working.
Warning The room temperature
may be too hot. The system fan may also be degraded or failed.
Warning The controller cannot
communicate with the battery. The battery may be removed or the contact point maye degraded
Warning The controller battery
must be fully charged before the Learn cycle can begin.
Restrict
Restrict and Migrate
Restrict
Restrict and Migrate
Restrict
Using Dell Performance Resource Optimization Pack 25
Page 26
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
2187 Single-bit
ECC error limit exceeded on the controller DIMM
2201 A global hot
spare failed
2203 A dedicated
hot spare failed
2206 The only hot
spare available is a SATA disk. SATA disks cannot replace SAS disks
2207 The only hot
spare available is a SAS disk. SAS disks cannot replace SATA disks
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Warning The controller
memory is malfunctioning.
Warning The controller is not
able to communicate with a disk that is assigned as a global hot spare. The disk may have failed or been removed.
Warning The controller is not
able to communicate with a disk that is assigned as a dedicated hot spare.
Warning The only physical disk
available to be assigned as a hot spare is using SATA technology.
Warning The only physical disk
available to be assigned as a hot spare is using SAS technology.
Restrict and Migrate
Restrict
Restrict
Restrict
Restrict
26 Using Dell Performance Resource Optimization Pack
Page 27
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
2213 Recharge
count maximum exceeded
2246 The controller
battery is degraded
2264 A device is
missing
2265 A device is in
an unknown state
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Warning A virtual disk or an
enclosure has lost data redundancy. In the case of a virtual disk, one or more physical disks included in the virtual disk have failed.
Warning The temperature of
the the battery is high. This maybe due to the battery being charged.
Warning The controller cannot
communicate with a device. The device may be removed.
Warning The controller cannot
communicate with a device. The state of the device cannot be determined.
Restrict
Restrict
Restrict
Restrict and Migrate
2268 Storage
Management communicatio n Error
Using Dell Performance Resource Optimization Pack 27
Critical Storage Management
has lost communication with a controller. This may occur if the controller driver or firmware is experiencing a problem.
Restrict and Migrate
Page 28
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
2272 Patrol Read
found an uncorrectable media error
2273 A block on the
physical disk has been punctured by the controller
2282 Hot spare
SMART polling failed
2283 A redundant
path is broken
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Critical The Patrol Read task
has encounted an error that cannot be corrected. There may be a bad disk block that cannot be remapped.
Critical The controller
encountered an unrecoverable medium error when attempting to read a block on the physical disk and marked that block as invalid.
Critical The controller
firmware attempted to do SMART polling on the hot spare but was not able to complete the SMART polling.
Warning The controller has two
connectors that are connected to the same enclosure.
Restrict and Migrate
Restrict and Migrate
Restrict and Migrate
Restrict and Migrate
2289 Multi-bit
ECC error on controller DIMM
Critical An error involving
multiple bits has been encountered during a read or write operation.
28 Using Dell Performance Resource Optimization Pack
Restrict and Migrate
Page 29
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
2290 Single-bit
ECC error on controller DIMM
2292 Communicati
on with the enclosure has been lost
2293 EMM
(Enclosure Management Module) Failure
2298 The enclosure
has a bad sensor
2299 Bad PHY Critical There is a problem
Severity Alert Cause Dell PRO Tip
Warning An error involving a
single bit has been encountered during a read or write operation.
Critical The controller has lost
communication with an enclosure management module (EMM). The cables may be loose or defective.
Error The failure may be
caused by a loss of power to the EMM.
Warning The enclosure has a
bad sensor. The enclosure sensors monitor the fan speeds, temperature probes, and so on.
with a physical connection or PHY.
Recommended Remedial Action
Restrict
Restrict and Migrate
Restrict and Migrate
Restrict
Restrict
Using Dell Performance Resource Optimization Pack 29
Page 30
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
2300 Unstable
Enclosure Failure
2301 Enclosure
Hardware Error
2302 The enclosure
is not responding
2306 Bad block
table is full
2307 Bad block
table is full.
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Critical The controller is not
receiving a consistent response from the enclosure.
Critical The enclosure or an
enclosure component is in a Failed or Degraded state.
Critical The enclosure or an
enclosure component is in a Failed or Degraded state.
Warning The bad block table is
the table used for remapping bad disk blocks. This table fills as bad disk blocks are remapped.
Critical The bad block table is
the table used for remapping bad disk blocks.
Restrict and Migrate
Restrict and Migrate
Restrict and Migrate
Restrict
Restrict
30 Using Dell Performance Resource Optimization Pack
Page 31
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID Alert
Description on SCOM/ SCE and PRO Tip in SCVMM
2310 A virtual disk
is permanently degraded
2312 A power
supply in the enclosure has an AC failure
2313 A power
supply in the enclosure has a DC failure
2314 The
initialization sequence of SAS components failed during system startup. SAS management and monitoring is not possible.
Severity Alert Cause Dell PRO Tip
Recommended Remedial Action
Critical A redundant virtual
disk has lost redundancy. This may occur when the virtual disk suffers the failure of more than one physical disk.
Warning The power supply has
an AC failure
Warning The power supply has
a DC failure.
Critical Storage Management
is unable to monitor or manage SAS devices.
Restrict and Migrate
Restrict
Restrict
Restrict and Migrate
Using Dell Performance Resource Optimization Pack 31
Page 32
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID | Alert Description on SCOM/SCE and PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action
2318 | Problems with the battery or the battery charger have been detected. The battery health is poor. | Warning | The battery or the battery charger is not functioning properly. | Restrict
2319 | Single-bit ECC error on controller DIMM. | Warning | The dual in-line memory module (DIMM) is beginning to malfunction. | Restrict and Migrate
2320 | Single-bit ECC error | Critical | The dual in-line memory module (DIMM) is malfunctioning. | Restrict and Migrate
2321 | Single-bit ECC error. The controller DIMM is nonfunctional. There will be no further reporting | Critical | The dual in-line memory module (DIMM) is malfunctioning. Data loss or data corruption is imminent. | Restrict and Migrate
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID | Alert Description on SCOM/SCE and PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action
2322 | The DC power supply is switched off | Critical | The power supply unit is switched off. Either a user switched off the power supply unit or it is defective. | Restrict and Migrate
2324 | The AC power supply cable has been removed | Critical | The power cable may be pulled out or removed. The power cable may also have overheated and become warped and nonfunctional. | Restrict and Migrate
2327 | The NVRAM has corrupted data. The controller is reinitializing the NVRAM | Warning | The NVRAM has corrupted data. This may occur after a power surge, a battery failure, or for other reasons. The controller is reinitializing the NVRAM. | Restrict and Migrate
2328 | The NVRAM has corrupt data | Warning | The NVRAM has corrupt data. The controller is unable to correct the situation. | Restrict and Migrate
2329 | SAS port report | Warning | The text for this alert is generated by the controller and can vary depending on the situation. | Restrict and Migrate
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID | Alert Description on SCOM/SCE and PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action
2337 | The controller is unable to recover cached data from the battery backup unit (BBU) | Critical | The controller was unable to recover data from the cache. | Restrict
2340 | The background initialization (BGI) completed with uncorrectable errors | Critical | The background initialization task encountered errors that cannot be corrected. | Restrict and Migrate
2342 | The Check Consistency found inconsistent parity data. Data redundancy may be lost | Warning | The data on a source disk and the redundant data on a target disk is inconsistent. | Restrict and Migrate
2349 | A bad disk block could not be reassigned during a write operation | Critical | A write operation could not complete because the disk contains bad disk blocks that could not be reassigned. Data loss may have occurred. | Restrict
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID | Alert Description on SCOM/SCE and PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action
2350 | There was an unrecoverable disk media error during the rebuild or recovery operation | Critical | The rebuild or recovery operation encountered an unrecoverable disk media error. | Restrict
2356 | SAS SMP communications error | Critical | The text for this alert is generated by the firmware and can vary depending on the situation. The reference to SMP in this text refers to SAS Management Protocol. | Restrict
2357 | SAS expander error | Critical | There may be a problem with the enclosure. Verify the health of the enclosure and its components. | Restrict
2387 | Virtual disk bad block medium error is detected | Critical | Virtual disk bad blocks are due to the presence of unrecoverable bad blocks on one or more member physical disks. | Restrict and Migrate
2396 | The Check Consistency detected uncorrectable multiple medium errors | Critical | Medium errors in the physical drives. | Restrict and Migrate
Table 2-1. Alert Cause and Recovery Action
(continued)
Dell Event ID | Alert Description on SCOM/SCE and PRO Tip in SCVMM | Severity | Alert Cause | Dell PRO Tip Recommended Remedial Action
2397 | The Check Consistency completed with uncorrectable errors | Critical | Medium errors in the physical drives. | Restrict and Migrate
2416 | Disk medium error detected | Warning | Disk medium error detected. | Restrict
2417 | There is an unrecoverable medium error detected on virtual disk | Critical | Unrecoverable medium error detected on virtual disk. | Restrict and Migrate
2, 4 (Driver Name: b06bdrv, ebdrv, b57w2k, b57nd60x, b57nd60a, l2nd) | Dell OMNIC Broadcom Network Interface Link Down | Critical | The network link is down. | Restrict
13, 27, 29, 70 (Driver Name: e1express, e1qexpress, ixgbe, e1000) | Dell OMNIC Intel Network Interface Link Down | Critical | Link has been disconnected. | Restrict
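The mapping from Dell event ID to severity and remedial action can be treated as a simple lookup. The following is a minimal sketch only, assuming you want to consult a few rows of Table 2-1 from a script. The names Recovery, RECOVERY_BY_EVENT_ID, recommended_action, and requires_migration are hypothetical and are not part of the PRO Pack, SCOM/SCE, or SCVMM.

```python
from typing import NamedTuple, Optional


class Recovery(NamedTuple):
    """Severity and remedial action for one row of Table 2-1 (hypothetical type)."""
    severity: str
    action: str


# Three rows copied from Table 2-1; extend with further rows as needed.
RECOVERY_BY_EVENT_ID = {
    2397: Recovery("Critical", "Restrict and Migrate"),
    2416: Recovery("Warning", "Restrict"),
    2417: Recovery("Critical", "Restrict and Migrate"),
}


def recommended_action(event_id: int) -> Optional[Recovery]:
    """Return the table entry for an event ID, or None if it is not listed."""
    return RECOVERY_BY_EVENT_ID.get(event_id)


def requires_migration(event_id: int) -> bool:
    """True when the listed remedial action includes migrating virtual machines."""
    entry = recommended_action(event_id)
    return entry is not None and "Migrate" in entry.action
```

For example, requires_migration(2416) is false because the table lists only Restrict for that event, while an event ID that is absent from the mapping returns None from recommended_action.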
3 Related Documentation and Resources

This chapter lists the documents and resources that help you work with the PRO Pack 2.1.

Security Considerations

Operations Console access privileges are handled internally by SCOM/SCE. You can set them up using the User Roles option under the Administration → Security feature on the SCOM/SCE console. The profile of the role assigned to you determines what actions you can perform and which objects you can manage. For more information on security considerations, see the Microsoft System Center Operations Manager SP1/R2 and Microsoft System Center Essentials 2007/2010 online help.

Other Documents You May Need

In addition to this guide, you can access the following guides available at support.dell.com/manuals. On the Manuals page, click Software → Systems Management. Click the appropriate product link on the right side to access the documents.
• The Dell OpenManage Server Administrator CIM Reference Guide documents the Common Information Model (CIM) provider, an extension of the standard management object format (MOF) file. The CIM provider MOF documents supported classes of management objects.
• The Dell OpenManage Server Administrator Messages Reference Guide lists the messages that are displayed in your Server Administrator home page Alert log or on your operating system’s event viewer. This guide explains the text, severity, and cause of each service alert message that Server Administrator issues.
• The Dell OpenManage Server Administrator Command Line Interface User's Guide documents the complete command line interface for Server Administrator, including an explanation of the command line interface (CLI) commands to view system status, access logs, create reports, configure various component parameters, and set critical thresholds.
For information on terms used in this document, see the Glossary at support.dell.com/manuals.

Getting Technical Assistance

For customers in the United States, call 800-WWW-DELL (800-999-3355).
NOTE: If you do not have an active Internet connection, you can find contact information on your purchase invoice, packing slip, bill, or Dell product catalog.
For information on technical support, visit dell.com/contactus. Additionally, Dell Enterprise Training and Certification is available; see www.dell.com/training for more information. This service might not be offered in all locations.