Dell OpenManage Server Administrator Version 5.2 Messages Reference Guide


Dell OpenManage™ Server Administrator Messages Reference Guide

www.dell.com | support.dell.com


Notes and Notices

NOTE: A NOTE indicates important information that helps you make better use of your computer.

NOTICE: A NOTICE indicates either potential damage to hardware or loss of data and tells you how to avoid the problem.

____________________

Reproduction in any manner whatsoever without the written permission of Dell Inc. is strictly forbidden.

Trademarks used in this text: Dell, the DELL logo and Dell OpenManage are trademarks of Dell Inc.; Microsoft and Windows are registered trademarks and Windows Server is a trademark of Microsoft Corporation; Red Hat is a registered trademark of Red Hat, Inc.; SUSE is a registered trademark of Novell, Inc. in the United States and other countries.

Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products. Dell Inc. disclaims any proprietary interest in trademarks and trade names other than its own.

February 2007

Contents

1 Introduction
What’s New in this Release
Messages Not Described in This Guide
Understanding Event Messages
Sample Event Message Text
Viewing Alerts and Event Messages
Viewing Events in Windows 2000 Advanced Server and Windows Server 2003
Viewing Events in Red Hat Enterprise Linux and SUSE Linux Enterprise Server
Viewing the Event Information
Understanding the Event Description

2 Event Message Reference
Miscellaneous Messages
Temperature Sensor Messages
Cooling Device Messages
Voltage Sensor Messages
Current Sensor Messages
Chassis Intrusion Messages
Redundancy Unit Messages
Power Supply Messages
Memory Device Messages
Fan Enclosure Messages
AC Power Cord Messages
Hardware Log Sensor Messages
Processor Sensor Messages
Pluggable Device Messages
Battery Sensor Messages

3 System Event Log Messages for IPMI Systems
Temperature Sensor Events
Voltage Sensor Events
Fan Sensor Events
Processor Status Events
Power Supply Events
Memory ECC Events
BMC Watchdog Events
Memory Events
Hardware Log Sensor Events
Drive Events
Intrusion Events
BIOS Generated System Events
R2 Generated System Events
Cable Interconnect Events
Battery Events
Entity Presence Events

4 Storage Management Message Reference
Alert Monitoring and Logging
Alert Message Format with Substitution Variables
Alert Message Change History
Alert Descriptions and Corrective Actions

Index

Introduction

Dell OpenManage™ Server Administrator produces event messages stored primarily in the operating system or Server Administrator event logs and sometimes in SNMP traps. This document describes the event messages created by Server Administrator version 5.2 or later and displayed in the Server Administrator Alert log.

Server Administrator creates events in response to sensor status changes and other monitored parameters. The Server Administrator event monitor uses these status change events to add descriptive messages to the operating system event log or the Server Administrator Alert log.

Each event message that Server Administrator adds to the Alert log consists of a unique identifier called the event ID for a specific event source category and a descriptive message. The event message includes the severity, cause of the event, and other relevant information, such as the event location and the monitored item’s previous state.

Tables provided in this guide list all Server Administrator event IDs in numeric order. Each entry includes the event ID’s corresponding description, severity level, and cause. Message text in angle brackets (for example, <State>) describes the event-specific information provided by the Server Administrator.

What’s New in this Release

Modifications have been made to the Storage Management Service events. For more information, see "Alert Message Change History."

Messages Not Described in This Guide

This guide describes only event messages created by Server Administrator and displayed in the Server Administrator Alert log. For information on other messages produced by your system, consult one of the following sources:

• Your system’s Installation and Troubleshooting Guide

• Other system documentation

• Operating system documentation

• Application program documentation

Understanding Event Messages

This section describes the various types of event messages generated by the Server Administrator. When an event occurs on your system, the Server Administrator sends information about one of the following event types to the systems management console:

Table 1-1. Understanding Event Messages

Alert Severity: OK/Normal
Component Status: An event that describes the successful operation of a unit. The alert is provided for informational purposes and does not indicate an error condition. For example, the alert may indicate the normal start or stop of an operation, such as power supply or sensor reading returning to normal.

Alert Severity: Warning/Non-critical
Component Status: An event that is not necessarily significant, but may indicate a possible future problem. For example, a Warning/Non-critical alert may indicate that a component (such as a temperature probe in an enclosure) has crossed a warning threshold.

Alert Severity: Critical/Failure/Error
Component Status: A significant event that indicates actual or imminent loss of data or loss of function. For example, crossing a failure threshold or a hardware failure such as an array disk.

Server Administrator generates events based on status changes in the following sensors:

• Temperature Sensor: Helps protect critical components by alerting the systems management console when temperatures become too high inside a chassis; also monitors a variety of locations in the chassis and in any attached systems.

• Fan Sensor: Monitors fans in various locations in the chassis and in any attached systems.

• Voltage Sensor: Monitors voltages across critical components in various chassis locations and in any attached systems.

• Current Sensor: Monitors the current (or amperage) output from the power supply (or supplies) in the chassis and in any attached systems.

• Chassis Intrusion Sensor: Monitors intrusion into the chassis and any attached systems.

• Redundancy Unit Sensor: Monitors redundant units (critical units such as fans, AC power cords, or power supplies) within the chassis; also monitors the chassis and any attached systems. For example, redundancy allows a second or nth fan to keep the chassis components at a safe temperature when another fan has failed. Redundancy is normal when the intended number of critical components are operating. Redundancy is degraded when a component fails, but others are still operating. Redundancy is lost when there is one less critical redundancy device than required.

• Power Supply Sensor: Monitors power supplies in the chassis and in any attached systems.

• Memory Prefailure Sensor: Monitors memory modules by counting the number of Error Correction Code (ECC) memory corrections.

• Fan Enclosure Sensor: Monitors protective fan enclosures by detecting their removal from and insertion into the system, and by measuring how long a fan enclosure is absent from the chassis. This sensor monitors the chassis and any attached systems.

• AC Power Cord Sensor: Monitors the presence of AC power for an AC power cord.

• Hardware Log Sensor: Monitors the size of a hardware log.

• Processor Sensor: Monitors the processor status in the system.

• Pluggable Device Sensor: Monitors the addition, removal, or configuration errors for some pluggable devices, such as memory cards.

• Battery Sensor: Monitors the status of one or more batteries in the system.

Sample Event Message Text

The following example shows the format of the event messages logged by Server Administrator.

EventID: 1000

Source: Server Administrator

Category: Instrumentation Service

Type: Information

Date and Time: Mon Oct 21 10:38:00 2002

Computer:

Description:

Server Administrator starting

Data: Bytes in Hex
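As an illustration only, a record in the format above could be split into its labeled fields with a short script. The following Python sketch is not part of Server Administrator; the function name parse_event_message and the continuation-line handling are assumptions made for this example.

```python
# Minimal sketch: split a "Label: value" event record into a dictionary.
# parse_event_message is a hypothetical helper, not a Server Administrator API.

SAMPLE_EVENT = """\
EventID: 1000
Source: Server Administrator
Category: Instrumentation Service
Type: Information
Date and Time: Mon Oct 21 10:38:00 2002
Computer:
Description:
Server Administrator starting
Data: Bytes in Hex
"""


def parse_event_message(text):
    """Return a dict mapping each field label to its value."""
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        # A field line is "Label:" or "Label: value"; anything else
        # (including text such as 10:38:00) continues the previous field.
        if sep and (value == "" or value.startswith(" ")):
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            fields[last_key] = (fields[last_key] + " " + line).strip()
    return fields


event = parse_event_message(SAMPLE_EVENT)
print(event["EventID"], event["Type"], event["Description"])
```

The sketch treats a line without a label as a continuation of the preceding field, which is how the Description text appears in the sample.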

Viewing Alerts and Event Messages

An event log is used to record information about important events.

Server Administrator generates alerts that are added to the operating system event log and to the Server Administrator Alert log. To view these alerts in Server Administrator:

1. Select the System object in the tree view.

2. Select the Logs tab.

3. Select the Alert subtab.

You can also view the event log using your operating system’s event viewer. Each operating system’s event viewer accesses the applicable operating system event log.

The location of the event log file depends on the operating system you are using.

• In the Microsoft® Windows® 2000 Advanced Server and Windows Server™ 2003 operating systems, messages are logged to the system event log and optionally to a unicode text file, dcsys32.log (viewable using Notepad), that is located in the install_path\omsa\log directory. The default install_path is C:\Program Files\Dell\SysMgt.

• In the Red Hat Enterprise Linux and SUSE® Linux Enterprise Server operating systems, messages are logged to the system log file. The default name of the system log file is /var/log/messages. You can view the messages file using a text editor such as vi or emacs.

NOTE: Logging messages to a unicode text file is optional. By default, the feature is disabled. To enable this feature, modify the Event Manager section of the dcemdy32.ini file as follows:

• In Windows, locate the file at <install_path>\dataeng\ini and set UnitextLog.enabled=True. The default install_path is C:\Program Files\Dell\SysMgt. Restart the DSM SA Event Manager service.

• In Red Hat Enterprise Linux and SUSE Linux Enterprise Server, locate the file at <install_path>/dataeng/ini and set UnitextLog.enabled=True. The default install_path is /opt/dell/srvadmin. Issue the "/etc/init.d/dataeng restart" command to restart the Server Administrator event manager service. This will also restart the Server Administrator data manager and SNMP services.

The following subsections explain how to open the Windows 2000 Advanced Server, Windows Server 2003, and the Red Hat Enterprise Linux and SUSE Linux Enterprise Server event viewers.

Viewing Events in Windows 2000 Advanced Server and Windows Server 2003

1. Click the Start button, point to Settings, and click Control Panel.

2. Double-click Administrative Tools, and then double-click Event Viewer.

3. In the Event Viewer window, click the Tree tab and then click System Log.

The System Log window displays a list of recently logged events.

4. To view the details of an event, double-click one of the event items.

NOTE: You can also look up the dcsys32.log file, in the install_path\omsa\log directory, to view the separate event log file. The default install_path is C:\Program Files\Dell\SysMgt.

Viewing Events in Red Hat Enterprise Linux and SUSE Linux Enterprise Server

1. Log in as root.

2. Use a text editor such as vi or emacs to view the file named /var/log/messages.

The following example shows the Red Hat Enterprise Linux (and SUSE Linux Enterprise Server) message log, /var/log/messages. The text in boldface type indicates the message text.

NOTE: These messages are typically displayed as one long line. In the following example, the message is displayed using line breaks to help you see the message text more clearly.

...

Feb 6 14:20:51 server01 Server Administrator: Instrumentation Service EventID: 1000

Server Administrator starting

Feb 6 14:20:51 server01 Server Administrator: Instrumentation Service EventID: 1001

Server Administrator startup complete

Feb 6 14:21:21 server01 Server Administrator: Instrumentation Service EventID: 1254 Chassis intrusion detected Sensor location: Main chassis

intrusion Chassis location: Main System Chassis Previous state was: OK (Normal) Chassis intrusion state: Open

Feb 6 14:21:51 server01 Server Administrator: Instrumentation Service EventID: 1252 Chassis intrusion returned to normal Sensor location: Main

chassis intrusion Chassis location: Main System Chassis Previous state was: Critical (Failed) Chassis intrusion state: Closed
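For readers who want to pull Server Administrator entries out of /var/log/messages with a script, the following Python sketch shows one possible approach. It assumes each message is on a single line, as the NOTE above describes, and that the line layout matches the example shown here; the regular expression and the name parse_syslog_line are illustrative assumptions, not a documented format specification.

```python
import re

# Assumed layout, taken from the example above:
# <Mon> <day> <hh:mm:ss> <host> Server Administrator: <category> EventID: <id> <text>
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+Server Administrator:\s+"
    r"(?P<category>.+?)\s+EventID:\s+(?P<event_id>\d+)\s*"
    r"(?P<description>.*)$"
)


def parse_syslog_line(line):
    """Return a dict for a Server Administrator line, or None for other lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["event_id"] = int(record["event_id"])
    return record


example = (
    "Feb 6 14:21:21 server01 Server Administrator: Instrumentation Service "
    "EventID: 1254 Chassis intrusion detected Sensor location: Main chassis "
    "intrusion Chassis location: Main System Chassis Previous state was: "
    "OK (Normal) Chassis intrusion state: Open"
)
record = parse_syslog_line(example)
print(record["host"], record["event_id"], record["description"])
```

Lines written by other programs do not match the pattern and are skipped by returning None.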

Viewing the Event Information

The event log for each operating system contains some or all of the following information:

• Date: The date the event occurred.

• Time: The local time the event occurred.

• Type: A classification of the event severity: Information, Warning, or Error.

• User: The name of the user on whose behalf the event occurred.

• Computer

• Source

Understanding the Event Description

Table 1-2 lists in alphabetical order each line item that may appear in the event description.

Table 1-2. Event Description Reference

Description Line Item
    Explanation

Action performed was:
    Specifies the action that was performed, for example:
    Action performed was: Power cycle

Action requested was:
    Specifies the action that was requested, for example:
    Action requested was: Reboot, shutdown OS first

Additional Details: <Additional details for the event>
    Specifies additional details available for the hot plug event, for example:
    Memory device: DIMM1_A Serial number: FFFF30B1

(unlabeled status text)
    Specifies information pertaining to the event, for example:
    Power supply input AC is off, Power supply POK (power OK) signal is not normal, Power supply is turned off

Chassis intrusion state:
    Specifies the chassis intrusion state (open or closed), for example:
    Chassis intrusion state: Open

Chassis location: <Name of chassis>
    Specifies the name of the chassis that generated the message, for example:
    Chassis location: Main System Chassis

Configuration error type:
    Specifies the type of configuration error that occurred, for example:
    Configuration error type: Revision mismatch

Current sensor value (in Amps):
    Specifies the current sensor value in amps, for example:
    Current sensor value (in Amps): 7.853

Date and time of action:
    Specifies the date and time the action was performed, for example:
    Date and time of action: Sat Jun 12 16:20:33 2004

Device location: <Location in chassis>
    Specifies the location of the device in the specified chassis, for example:
    Device location: Memory Card A

Discrete current state: <State>
    Specifies the state of the current sensor, for example:
    Discrete current state: Good

Discrete temperature state: <State>
    Specifies the state of the temperature sensor, for example:
    Discrete temperature state: Good

Discrete voltage state: <State>
    Specifies the state of the voltage sensor, for example:
    Discrete voltage state: Good

Fan sensor value:
    Specifies the fan speed in revolutions per minute (RPM) or On/Off, for example:
    Fan sensor value (in RPM): 2600
    Fan sensor value: Off

Log type:
    Specifies the type of hardware log, for example:
    Log type: ESM

Memory device bank location:
    Specifies the name of the memory bank in the system that generated the message, for example:
    Memory device bank location: Bank_1

Memory device location:
    Specifies the location of the memory module in the chassis, for example:
    Memory device location: DIMM_A

Number of devices required for full redundancy:
    Specifies the number of power supply or cooling devices required to achieve full redundancy, for example:
    Number of devices required for full redundancy: 4

Possible memory module event cause:
    Specifies a list of possible causes for the memory module event, for example:
    Possible memory module event cause: Single bit warning error rate exceeded
    Single bit error logging disabled

Power Supply type: <type of power supply>
    Specifies the type of power supply, for example:
    Power Supply type: VRM

Previous redundancy state was: <State>
    Specifies the status of the previous redundancy message, for example:
    Previous redundancy state was: Lost

Previous state was: <State>
    Specifies the previous state of the sensor, for example:
    Previous state was: OK (Normal)

Processor sensor status:
    Specifies the status of the processor sensor, for example:
    Processor sensor status: Configuration error

Redundancy unit: <Redundancy location in chassis>
    Specifies the location of the redundant power supply or cooling unit in the chassis, for example:
    Redundancy unit: Fan Enclosure

Sensor location: <Location in chassis>
    Specifies the location of the sensor in the specified chassis, for example:
    Sensor location: CPU1

Temperature sensor value:
    Specifies the temperature in degrees Celsius, for example:
    Temperature sensor value (in degrees Celsius): 30

Voltage sensor value (in Volts):
    Specifies the voltage sensor value in volts, for example:
    Voltage sensor value (in Volts): 1.693
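Because the description line items use fixed labels, a one-line event description (such as those in the Linux message log) can be split back into labeled values. The Python sketch below illustrates the idea under stated assumptions: the KNOWN_LABELS list is only a subset of the Table 1-2 line items, and split_description is a hypothetical helper name, not part of Server Administrator.

```python
import re

# Subset of the Table 1-2 line-item labels (assumption: extend as needed).
KNOWN_LABELS = [
    "Action performed was",
    "Action requested was",
    "Chassis intrusion state",
    "Chassis location",
    "Current sensor value (in Amps)",
    "Date and time of action",
    "Discrete temperature state",
    "Fan sensor value (in RPM)",
    "Fan sensor value",
    "Previous redundancy state was",
    "Previous state was",
    "Sensor location",
    "Temperature sensor value (in degrees Celsius)",
    "Voltage sensor value (in Volts)",
]

# Longest labels first so that a longer label is preferred over a shorter one.
_LABEL_RE = re.compile(
    "("
    + "|".join(re.escape(label) for label in sorted(KNOWN_LABELS, key=len, reverse=True))
    + "):"
)


def split_description(description):
    """Return (summary, items): the leading message text and a label-to-value dict."""
    matches = list(_LABEL_RE.finditer(description))
    if not matches:
        return description.strip(), {}
    summary = description[: matches[0].start()].strip()
    items = {}
    for index, match in enumerate(matches):
        end = matches[index + 1].start() if index + 1 < len(matches) else len(description)
        items[match.group(1)] = description[match.end():end].strip()
    return summary, items


summary, items = split_description(
    "Chassis intrusion detected Sensor location: Main chassis intrusion "
    "Chassis location: Main System Chassis Previous state was: OK (Normal) "
    "Chassis intrusion state: Open"
)
print(summary)
print(items)
```

Matching is case-sensitive, so the lowercase words "chassis intrusion" inside a sensor location value are not mistaken for the "Chassis intrusion state" label.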

Event Message Reference

The following tables list in numerical order each event ID and its corresponding description, along with its severity and cause.

NOTE: For corrective actions, see the appropriate documentation.

Miscellaneous Messages

Miscellaneous messages in Table 2-1 indicate that certain alert systems are up and working.

Table 2-1. Miscellaneous Messages

Event ID: 0000
Description: Log was cleared
Severity: Information
Cause: User cleared the log from Server Administrator.

Event ID: 0001
Description: Log backup created
Severity: Information
Cause: The log was full, copied to backup, and cleared.

Event ID: 1000
Description: Server Administrator starting
Severity: Information
Cause: Server Administrator is beginning to initialize.

Event ID: 1001
Description: Server Administrator startup complete
Severity: Information
Cause: Server Administrator completed its initialization.

Event ID: 1002
Description: A system BIOS update has been scheduled for the next reboot
Severity: Information
Cause: The user has chosen to update the flash basic input/output system (BIOS).

Event ID: 1003
Description: A previously scheduled system BIOS update has been canceled
Severity: Information
Cause: The user decides to cancel the flash BIOS update, or an error occurs during the flash.

Event ID: 1004
Description: Thermal shutdown protection has been initiated
Severity: Error
Cause: This message is generated when a system is configured for thermal shutdown due to an error event. If a temperature sensor reading exceeds the error threshold for which the system is configured, the operating system shuts down and the system powers off. This event may also be initiated on certain systems when a fan enclosure is removed from the system for an extended period of time.

Event ID: 1005
Description: SMBIOS data is absent
Severity: Warning
Cause: The system does not contain the required systems management BIOS version 2.2 or higher, or the BIOS is corrupted.

Event ID: 1006
Description: Automatic System Recovery (ASR) action was performed
    Action performed was:
    Date and time of action: <Date and time>
Severity: Error
Cause: This message is generated when an automatic system recovery action is performed due to a hung operating system. The action performed and the time of action are provided.

Event ID: 1007
Description: User initiated host system control action
    Action requested was:
Severity: Information
Cause: User requested a host system control action to reboot, power off, or power cycle the system. Alternatively, the user had indicated protective measures to be initiated in the event of a thermal shutdown.

Event ID: 1008
Description: Systems Management Data Manager Started
Severity: Information
Cause: Systems Management Data Manager services were started.

Event ID: 1009
Description: Systems Management Data Manager Stopped
Severity: Information
Cause: Systems Management Data Manager services were stopped.

Event ID: 1011
Description: RCI table is corrupt
Severity: Warning
Cause: This message is generated when the BIOS Remote Configuration Interface (RCI) table is corrupted or cannot be read by the systems management software.

Event ID: 1012
Description: IPMI Status
    Interface: <the IPMI interface being used>, <additional information if available and applicable>
Severity: Information
Cause: This message is generated to indicate the Intelligent Platform Management Interface (IPMI) status of the system. Additional information, when available, includes Baseboard Management Controller (BMC) not present, BMC not responding, System Event Log (SEL) not present, and Sensor Data Record (SDR) not present.

Temperature Sensor Messages

Temperature sensors listed in Table 2-2 help protect critical components by alerting the systems management console when temperatures become too high inside a chassis. The temperature sensor messages use additional variables: sensor location, chassis location, previous state, and temperature sensor value or state.

Table 2-2. Temperature Sensor Messages

Event ID: 1050
Description: Temperature sensor has failed
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    If sensor type is not discrete:
    Temperature sensor value (in degrees Celsius):
    If sensor type is discrete:
    Discrete temperature state: <State>
Severity: Information
Cause: A temperature sensor on the backplane board, system board, or the carrier in the specified system failed. The sensor location, chassis location, previous state, and temperature sensor value are provided.

Event ID: 1051
Description: Temperature sensor value unknown
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    If sensor type is not discrete:
    Temperature sensor value (in degrees Celsius):
    If sensor type is discrete:
    Discrete temperature state: <State>
Severity: Information
Cause: A temperature sensor on the backplane board, system board, or drive carrier in the specified system could not obtain a reading. The sensor location, chassis location, previous state, and a nominal temperature sensor value are provided.

Event ID: 1052
Description: Temperature sensor returned to a normal value
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    If sensor type is not discrete:
    Temperature sensor value (in degrees Celsius):
    If sensor type is discrete:
    Discrete temperature state: <State>
Severity: Information
Cause: A temperature sensor on the backplane board, system board, or drive carrier in the specified system returned to a valid range after crossing a failure threshold. The sensor location, chassis location, previous state, and temperature sensor value are provided.

Event ID: 1053
Description: Temperature sensor detected a warning value
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    If sensor type is not discrete:
    Temperature sensor value (in degrees Celsius):
    If sensor type is discrete:
    Discrete temperature state: <State>
Severity: Warning
Cause: A temperature sensor on the backplane board, system board, CPU, or drive carrier in the specified system exceeded its warning threshold. The sensor location, chassis location, previous state, and temperature sensor value are provided.

Event ID: 1054
Description: Temperature sensor detected a failure value
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    If sensor type is not discrete:
    Temperature sensor value (in degrees Celsius):
    If sensor type is discrete:
    Discrete temperature state: <State>
Severity: Error
Cause: A temperature sensor on the backplane board, system board, or drive carrier in the specified system exceeded its failure threshold. The sensor location, chassis location, previous state, and temperature sensor value are provided.

Event ID: 1055
Description: Temperature sensor detected a non-recoverable value
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    If sensor type is not discrete:
    Temperature sensor value (in degrees Celsius):
    If sensor type is discrete:
    Discrete temperature state: <State>
Severity: Error
Cause: A temperature sensor on the backplane board, system board, or drive carrier in the specified system detected an error from which it cannot recover. The sensor location, chassis location, previous state, and temperature sensor value are provided.

Cooling Device Messages

Cooling device sensors listed in Table 2-3 monitor how well a fan is functioning. Cooling device messages provide status and warning information for fans in a particular chassis.

Table 2-3. Cooling Device Messages

Event ID: 1100
Description: Fan sensor has failed
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    Fan sensor value:
Severity: Information
Cause: A fan sensor in the specified system is not functioning. The sensor location, chassis location, previous state, and fan sensor value are provided.

Event ID: 1101
Description: Fan sensor value unknown
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    Fan sensor value:
Severity: Information
Cause: A fan sensor in the specified system could not obtain a reading. The sensor location, chassis location, previous state, and a nominal fan sensor value are provided.

Event ID: 1102
Description: Fan sensor returned to a normal value
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    Fan sensor value:
Severity: Information
Cause: A fan sensor reading on the specified system returned to a valid range after crossing a warning threshold. The sensor location, chassis location, previous state, and fan sensor value are provided.

Event ID: 1103
Description: Fan sensor detected a warning value
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    Fan sensor value:
Severity: Warning
Cause: A fan sensor reading in the specified system exceeded a warning threshold. The sensor location, chassis location, previous state, and fan sensor value are provided.

Event ID: 1104
Description: Fan sensor detected a failure value
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    Fan sensor value:
Severity: Error
Cause: A fan sensor in the specified system detected the failure of one or more fans. The sensor location, chassis location, previous state, and fan sensor value are provided.

Event ID: 1105
Description: Fan sensor detected a non-recoverable value
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    Fan sensor value:
Severity: Error
Cause: A fan sensor detected an error from which it cannot recover. The sensor location, chassis location, previous state, and fan sensor value are provided.

Voltage Sensor Messages

Voltage sensors listed in Table 2-4 monitor the number of volts across critical components. Voltage sensor messages provide status and warning information for voltage sensors in a particular chassis.

Table 2-4. Voltage Sensor Messages

Event ID: 1150
Description: Voltage sensor has failed
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    If sensor type is not discrete:
    Voltage sensor value (in Volts):
    If sensor type is discrete:
    Discrete voltage state: <State>
Severity: Information
Cause: A voltage sensor in the specified system failed. The sensor location, chassis location, previous state, and voltage sensor value are provided.

Event ID: 1151
Description: Voltage sensor value unknown
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    If sensor type is not discrete:
    Voltage sensor value (in Volts):
    If sensor type is discrete:
    Discrete voltage state: <State>
Severity: Information
Cause: A voltage sensor in the specified system could not obtain a reading. The sensor location, chassis location, previous state, and a nominal voltage sensor value are provided.

Event ID: 1152
Description: Voltage sensor returned to a normal value
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    If sensor type is not discrete:
    Voltage sensor value (in Volts):
    If sensor type is discrete:
    Discrete voltage state: <State>
Severity: Information
Cause: A voltage sensor in the specified system returned to a valid range after crossing a failure threshold. The sensor location, chassis location, previous state, and voltage sensor value are provided.

Event ID: 1153
Description: Voltage sensor detected a warning value
    Sensor location: <Location in chassis>
    Chassis location: <Name of chassis>
    Previous state was: <State>
    If sensor type is not discrete:
    Voltage sensor value (in Volts):
    If sensor type is discrete:
    Discrete voltage state: <State>
Severity: Warning
Cause: A voltage sensor in the specified system exceeded its warning threshold. The sensor location, chassis location, previous state, and voltage sensor value are provided.

Table 2-4. Voltage Sensor Messages (continued)

Event ID Description Severity Cause

1154 Voltage sensor detected a

failure value

Sensor location: <Location in chassis>

Chassis location: <Name of chassis>

Previous state was: <State>

If sensor type is not discrete:

Voltage sensor value (in Volts):

If sensor type is discrete:

Discrete voltage state:

1155 Voltage sensor detected a

non-recoverable value

Sensor location: <Location in chassis>

Chassis location: <Name of chassis>

Previous state was: <State>

If sensor type is not discrete:

Voltage sensor value (in Volts):

If sensor type is discrete:

Discrete voltage state:

<State>

<State>

Error A voltage sensor in the specified system

exceeded its failure threshold. The sensor location, chassis location, previous state, and voltage sensor value are provided.

Error A voltage sensor in the specified system

detected an error from which it cannot recover. The sensor location, chassis location, previous state, and voltage sensor value are provided.


Current Sensor Messages

Current sensors listed in Table 2-5 measure the amount of current (in amperes) that is traversing critical components. Current sensor messages provide status and warning information for current sensors in a particular chassis.

Table 2-5. Current Sensor Messages

Event ID Description Severity Cause

Event ID: 1200
Description: Current sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Current sensor value (in Amps):
  If sensor type is discrete:
  Discrete current state: <State>
Severity: Information
Cause: A current sensor on the power supply for the specified system failed. The sensor location, chassis location, previous state, and current sensor value are provided.

Event ID: 1201
Description: Current sensor value unknown
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Current sensor value (in Amps):
  If sensor type is discrete:
  Discrete current state: <State>
Severity: Information
Cause: A current sensor on the power supply for the specified system could not obtain a reading. The sensor location, chassis location, previous state, and a nominal current sensor value are provided.

Event ID: 1202
Description: Current sensor returned to a normal value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Current sensor value (in Amps):
  If sensor type is discrete:
  Discrete current state: <State>
Severity: Information
Cause: A current sensor on the power supply for the specified system returned to a valid range after crossing a failure threshold. The sensor location, chassis location, previous state, and current sensor value are provided.

Event ID: 1203
Description: Current sensor detected a warning value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Current sensor value (in Amps):
  If sensor type is discrete:
  Discrete current state: <State>
Severity: Warning
Cause: A current sensor on the power supply for the specified system exceeded its warning threshold. The sensor location, chassis location, previous state, and current sensor value are provided.

Event ID: 1204
Description: Current sensor detected a failure value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Current sensor value (in Amps):
  If sensor type is discrete:
  Discrete current state: <State>
Severity: Error
Cause: A current sensor on the power supply for the specified system exceeded its failure threshold. The sensor location, chassis location, previous state, and current sensor value are provided.

Event ID: 1205
Description: Current sensor detected a non-recoverable value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Current sensor value (in Amps):
  If sensor type is discrete:
  Discrete current state: <State>
Severity: Error
Cause: A current sensor in the specified system detected an error from which it cannot recover. The sensor location, chassis location, previous state, and current sensor value are provided.
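The descriptions in Table 2-5 follow a "label: value" layout, so a monitoring script could split a logged description into named fields. The following is a minimal Python sketch under stated assumptions: the function name parse_event_fields is hypothetical, and the sample text (sensor name, chassis name, state, and reading) is invented for illustration and is not taken from a real log.

```python
def parse_event_fields(description: str) -> dict[str, str]:
    """Split the 'Label: value' lines of an event description into a dict.

    Lines without a colon (such as the leading summary line) and labels
    with no value after the colon are skipped.
    """
    fields = {}
    for line in description.splitlines():
        label, sep, value = line.partition(":")
        if sep and value.strip():
            fields[label.strip()] = value.strip()
    return fields


# Invented sample text that follows the layout shown in Table 2-5.
sample = """Current sensor detected a warning value
Sensor location: PS 1 Current
Chassis location: Main System Chassis
Previous state was: OK (Normal)
Current sensor value (in Amps): 7.853"""

fields = parse_event_fields(sample)
print(fields["Sensor location"])
print(fields["Current sensor value (in Amps)"])
```

The same approach would apply to the other sensor tables in this chapter, because they use the same label layout; only the label names differ.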


Chassis Intrusion Messages

Chassis intrusion messages listed in Table 2-6 are a security measure. Chassis intrusion means that someone is opening the cover to a system’s chassis. Alerts are sent to prevent unauthorized removal of parts from a chassis.

Table 2-6. Chassis Intrusion Messages

Event ID Description Severity Cause

Event ID: 1250
Description: Chassis intrusion sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Chassis intrusion state:
Severity: Information
Cause: A chassis intrusion sensor in the specified system failed. The sensor location, chassis location, previous state, and chassis intrusion state are provided.

Event ID: 1251
Description: Chassis intrusion sensor value unknown
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Chassis intrusion state:
Severity: Information
Cause: A chassis intrusion sensor in the specified system could not obtain a reading. The sensor location, chassis location, previous state, and chassis intrusion state are provided.

Event ID: 1252
Description: Chassis intrusion returned to normal
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Chassis intrusion state:
Severity: Information
Cause: A chassis intrusion sensor in the specified system detected that a cover was opened while the system was operating but has since been replaced. The sensor location, chassis location, previous state, and chassis intrusion state are provided.

Event ID: 1253
Description: Chassis intrusion in progress
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Chassis intrusion state:
Severity: Warning
Cause: A chassis intrusion sensor in the specified system detected that a system cover is currently being opened and the system is operating. The sensor location, chassis location, previous state, and chassis intrusion state are provided.

Event ID: 1254
Description: Chassis intrusion detected
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Chassis intrusion state:
Severity: Error
Cause: A chassis intrusion sensor in the specified system detected that the system cover was opened while the system was operating. The sensor location, chassis location, previous state, and chassis intrusion state are provided.

Event ID: 1255
Description: Chassis intrusion sensor detected a non-recoverable value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Chassis intrusion state:
Severity: Error
Cause: A chassis intrusion sensor in the specified system detected an error from which it cannot recover. The sensor location, chassis location, previous state, and chassis intrusion state are provided.

Redundancy Unit Messages

Redundancy means that a system chassis has more than one of certain critical components. Fans and power supplies, for example, are so important for preventing damage or disruption of a computer system that a chassis may have “extra” fans or power supplies installed. Redundancy allows a second or nth fan to keep the chassis components at a safe temperature when the primary fan has failed. Redundancy is normal when the intended number of critical components are operating. Redundancy is degraded when a component fails but others are still operating. Redundancy is lost when the number of components functioning falls below the redundancy threshold.

Table 2-7 lists the redundancy unit messages.

The number of devices required for full redundancy is provided as part of the message, when applicable, for the redundancy unit and the platform. For details on redundancy computation, see the respective platform documentation.
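The three redundancy states described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name redundancy_state and its parameters are hypothetical, and the real computation is platform specific, as noted above.

```python
def redundancy_state(functional: int, intended: int, threshold: int) -> str:
    """Classify a redundancy unit as 'normal', 'degraded', or 'lost'.

    functional: number of components currently operating
    intended:   number of components the unit is meant to run with
    threshold:  assumed minimum count below which redundancy is lost
    """
    if functional >= intended:
        return "normal"      # every intended component is operating
    if functional >= threshold:
        return "degraded"    # a component failed, but others still operate
    return "lost"            # fewer components than the redundancy threshold


# Hypothetical unit with three fans that needs at least two to stay redundant.
for working in (3, 2, 1):
    print(working, redundancy_state(working, intended=3, threshold=2))
```

In this sketch the threshold is passed in explicitly because the number of devices required for redundancy differs by redundancy unit and platform.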

Table 2-7. Redundancy Unit Messages

Event ID Description Severity Cause

Event ID: 1300
Description: Redundancy sensor has failed
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Information
Cause: A redundancy sensor in the specified system failed. The redundancy unit location, chassis location, previous redundancy state, and the number of devices required for full redundancy are provided.

Event ID: 1301
Description: Redundancy sensor value unknown
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Information
Cause: A redundancy sensor in the specified system could not obtain a reading. The redundancy unit location, chassis location, previous redundancy state, and the number of devices required for full redundancy are provided.

Event ID: 1302
Description: Redundancy not applicable
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Information
Cause: A redundancy sensor in the specified system detected that a unit was not redundant. The redundancy location, chassis location, previous redundancy state, and the number of devices required for full redundancy are provided.

Event ID: 1303
Description: Redundancy is offline
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Information
Cause: A redundancy sensor in the specified system detected that a redundant unit is offline. The redundancy unit location, chassis location, previous redundancy state, and the number of devices required for full redundancy are provided.

Event ID: 1304
Description: Redundancy regained
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Information
Cause: A redundancy sensor in the specified system detected that a “lost” redundancy device has been reconnected or replaced; full redundancy is in effect. The redundancy unit location, chassis location, previous redundancy state, and the number of devices required for full redundancy are provided.

Event ID: 1305
Description: Redundancy degraded
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Warning
Cause: A redundancy sensor in the specified system detected that one of the components of the redundancy unit has failed but the unit is still redundant. The redundancy unit location, chassis location, previous redundancy state, and the number of devices required for full redundancy are provided.

Event ID: 1306
Description: Redundancy lost
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Warning or Error (depending on the number of units that are functional)
Cause: A redundancy sensor in the specified system detected that one of the components in the redundant unit has been disconnected, has failed, or is not present. The redundancy unit location, chassis location, previous redundancy state, and the number of devices required for full redundancy are provided.


Power Supply Messages

Power supply sensors monitor how well a power supply is functioning. Power supply messages listed in Table 2-8 provide status and warning information for power supplies present in a particular chassis.

Table 2-8. Power Supply Messages

Event ID Description Severity Cause

Event ID: 1350
Description: Power supply sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Power Supply type: <type of power supply>
  If in configuration error state:
  Configuration error type: <type of configuration error>
Severity: Information
Cause: A power supply sensor in the specified system failed. The sensor location, chassis location, previous state, and additional power supply status information are provided.

Event ID: 1351
Description: Power supply sensor value unknown
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Power Supply type: <type of power supply>
  If in configuration error state:
  Configuration error type: <type of configuration error>
Severity: Information
Cause: A power supply sensor in the specified system could not obtain a reading. The sensor location, chassis location, previous state, and additional power supply status information are provided.

Event ID: 1352
Description: Power supply returned to normal
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Power Supply type: <type of power supply>
  If in configuration error state:
  Configuration error type: <type of configuration error>
Severity: Information
Cause: A power supply has been reconnected or replaced. The sensor location, chassis location, previous state, and additional power supply status information are provided.

Event ID: 1353
Description: Power supply detected a warning
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Power Supply type: <type of power supply>
  If in configuration error state:
  Configuration error type: <type of configuration error>
Severity: Warning
Cause: A power supply sensor reading in the specified system exceeded a user-definable warning threshold. The sensor location, chassis location, previous state, and additional power supply status information are provided.

Event ID: 1354
Description: Power supply detected a failure
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Power Supply type: <type of power supply>
  If in configuration error state:
  Configuration error type: <type of configuration error>
Severity: Error
Cause: A power supply has been disconnected or has failed. The sensor location, chassis location, previous state, and additional power supply status information are provided.

Event ID: 1355
Description: Power supply sensor detected a non-recoverable value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Power Supply type: <type of power supply>
  If in configuration error state:
  Configuration error type: <type of configuration error>
Severity: Error
Cause: A power supply sensor in the specified system detected an error from which it cannot recover. The sensor location, chassis location, previous state, and additional power supply status information are provided.


Memory Device Messages

Memory device messages listed in Table 2-9 provide status and warning information for memory modules present in a particular system. Memory devices determine health status by monitoring the ECC memory correction rate and the type of memory events that have occurred.

NOTE: A critical status does not always indicate a system failure or loss of data. In some instances, the system has exceeded the ECC correction rate. Although the system continues to function, you should perform system maintenance as described in Table 2-9.

NOTE: In Table 2-9, <status> can be either critical or non-critical.

Table 2-9. Memory Device Messages

Event ID Description Severity Cause

Event ID: 1403
Description: Memory device status is <status>
  Memory device location:
  Possible memory module event cause: <list of causes>
Severity: Warning
Cause: A memory device correction rate exceeded an acceptable value. The memory device status and location are provided.

Event ID: 1404
Description: Memory device status is <status>
  Memory device location:
  Possible memory module event cause: <list of causes>
Severity: Error
Cause: A memory device correction rate exceeded an acceptable value, a memory spare bank was activated, or a multibit ECC error occurred. The system continues to function normally (except for a multibit error). Replace the memory module identified in the message during the system’s next scheduled maintenance. Clear the memory error on multibit ECC error. The memory device status and location are provided.


Fan Enclosure Messages

Some systems are equipped with a protective enclosure for fans. Fan enclosure messages listed in Table 2-10 monitor whether foreign objects are present in an enclosure and how long a fan enclosure is missing from a chassis.

Table 2-10. Fan Enclosure Messages

Event ID Description Severity Cause

Event ID: 1450
Description: Fan enclosure sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Information
Cause: The fan enclosure sensor in the specified system failed. The sensor location and chassis location are provided.

Event ID: 1451
Description: Fan enclosure sensor value unknown
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Information
Cause: The fan enclosure sensor in the specified system could not obtain a reading. The sensor location and chassis location are provided.

Event ID: 1452
Description: Fan enclosure inserted into system
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Information
Cause: A fan enclosure has been inserted into the specified system. The sensor location and chassis location are provided.

Event ID: 1453
Description: Fan enclosure removed from system
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Warning
Cause: A fan enclosure has been removed from the specified system. The sensor location and chassis location are provided.

Event ID: 1454
Description: Fan enclosure removed from system for an extended amount of time
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Error
Cause: A fan enclosure has been removed from the specified system for a user-definable length of time. The sensor location and chassis location are provided.

Event ID: 1455
Description: Fan enclosure sensor detected a non-recoverable value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Error
Cause: A fan enclosure sensor in the specified system detected an error from which it cannot recover. The sensor location and chassis location are provided.

AC Power Cord Messages

AC power cord messages listed in Table 2-11 provide status and warning information for power cords that are part of an AC power switch, if your system supports AC switching.

Table 2-11. AC Power Cord Messages

Event ID Description Severity Cause

Event ID: 1500
Description: AC power cord sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Information
Cause: An AC power cord sensor in the specified system failed. The AC power cord status cannot be monitored. The sensor location and chassis location information are provided.

Event ID: 1501
Description: AC power cord is not being monitored
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Information
Cause: The AC power cord status is not being monitored. This occurs when a system’s expected AC power configuration is set to nonredundant. The sensor location and chassis location information are provided.

Event ID: 1502
Description: AC power has been restored
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Information
Cause: An AC power cord that did not have AC power has had the power restored. The sensor location and chassis location information are provided.

Event ID: 1503
Description: AC power has been lost
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Warning
Cause: An AC power cord has lost its power, but there is sufficient redundancy to classify this as a warning. The sensor location and chassis location information are provided.

Event ID: 1504
Description: AC power has been lost
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Error
Cause: An AC power cord has lost its power, and lack of redundancy requires this to be classified as an error. The sensor location and chassis location information are provided.

Event ID: 1505
Description: AC power has been lost
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Error
Cause: An AC power cord sensor in the specified system failed. The AC power cord status cannot be monitored. The sensor location and chassis location information are provided.

Hardware Log Sensor Messages

Hardware logs provide hardware status messages to systems management software. On certain systems, the hardware log is implemented as a circular queue. When the log becomes full, the oldest status messages are overwritten when new status messages are logged. On some systems, the log is not circular. On these systems, when the log becomes full, subsequent hardware status messages are lost. Hardware log sensor messages listed in Table 2-12 provide status and warning information about the noncircular logs that may fill up, resulting in lost status messages.
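The difference between a circular and a noncircular log can be illustrated with a small Python sketch. It is a toy model only: the capacity of three entries and the message names are invented, and it does not reflect how any particular system stores its hardware log.

```python
from collections import deque

CAPACITY = 3  # invented capacity, chosen small for illustration

circular_log = deque(maxlen=CAPACITY)  # full deque discards its oldest entry
noncircular_log = []                   # full list accepts nothing further
lost_messages = []                     # messages the noncircular log dropped

for message in ["msg1", "msg2", "msg3", "msg4", "msg5"]:
    circular_log.append(message)
    if len(noncircular_log) < CAPACITY:
        noncircular_log.append(message)
    else:
        lost_messages.append(message)

print(list(circular_log))  # ['msg3', 'msg4', 'msg5']
print(noncircular_log)     # ['msg1', 'msg2', 'msg3']
print(lost_messages)       # ['msg4', 'msg5']
```

The circular log always holds the newest messages, while the noncircular log keeps the oldest ones and loses everything logged after it fills, which is the condition the hardware log sensor messages warn about.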


Table 2-12. Hardware Log Sensor Messages

Event ID Description Severity Cause

Event ID: 1550
Description: Log monitoring has been disabled
  Log type:
Severity: Information
Cause: A hardware log sensor in the specified system is disabled. The log type information is provided.

Event ID: 1551
Description: Log status is unknown
  Log type:
Severity: Information
Cause: A hardware log sensor in the specified system could not obtain a reading. The log type information is provided.

Event ID: 1552
Description: Log size is no longer near or at capacity
  Log type:
Severity: Information
Cause: The hardware log on the specified system is no longer near or at its capacity, usually as the result of clearing the log. The log type information is provided.

Event ID: 1553
Description: Log size is near or at capacity
  Log type:
Severity: Warning
Cause: The size of a hardware log on the specified system is near or at the capacity of the hardware log. The log type information is provided.

Event ID: 1554
Description: Log size is full
  Log type:
Severity: Error
Cause: The size of a hardware log on the specified system is full. The log type information is provided.

Event ID: 1555
Description: Log sensor has failed
  Log type:
Severity: Error
Cause: A hardware log sensor in the specified system failed. The hardware log status cannot be monitored. The log type information is provided.


Processor Sensor Messages

Processor sensors monitor how well a processor is functioning. Processor messages listed in Table 2-13 provide status and warning information for processors in a particular chassis.

Table 2-13. Processor Sensor Messages

Event ID Description Severity Cause

Event ID: 1600
Description: Processor sensor has failed
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Processor sensor status:
Severity: Information
Cause: A processor sensor in the specified system is not functioning. The sensor location, chassis location, previous state and processor sensor status are provided.

Event ID: 1601
Description: Processor sensor value unknown
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Processor sensor status:
Severity: Information
Cause: A processor sensor in the specified system could not obtain a reading. The sensor location, chassis location, previous state and processor sensor status are provided.

Event ID: 1602
Description: Processor sensor returned to a normal value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Processor sensor status:
Severity: Information
Cause: A processor sensor in the specified system transitioned back to a normal state. The sensor location, chassis location, previous state and processor sensor status are provided.

Event ID: 1603
Description: Processor sensor detected a warning value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Processor sensor status:
Severity: Warning
Cause: A processor sensor in the specified system is in a throttled state. The sensor location, chassis location, previous state and processor sensor status are provided.

Event ID: 1604
Description: Processor sensor detected a failure value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Processor sensor status:
Severity: Error
Cause: A processor sensor in the specified system is disabled, has a configuration error, or experienced a thermal trip. The sensor location, chassis location, previous state and processor sensor status are provided.

Event ID: 1605
Description: Processor sensor detected a non-recoverable value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Processor sensor status:
Severity: Error
Cause: A processor sensor in the specified system has failed. The sensor location, chassis location, previous state and processor sensor status are provided.


Pluggable Device Messages

The pluggable device messages listed in Table 2-14 provide status and error information when some devices, such as memory cards, are added or removed.

Table 2-14. Pluggable Device Messages

Event ID Description Severity Cause

Event ID: 1650
Description:
  Device location: <Location in chassis, if available>
  Chassis location: <Name of chassis, if available>
  Additional details: <Additional details for the events, if available>
Severity: Information
Cause: A pluggable device event message of unknown type was received. The device location, chassis location, and additional event details, if available, are provided.

Event ID: 1651
Description: Device added to system
  Device location: <Location in chassis>
  Chassis location: <Name of chassis>
  Additional details: <Additional details for the events>
Severity: Information
Cause: A device was added in the specified system. The device location, chassis location, and additional event details, if available, are provided.

Event ID: 1652
Description: Device removed from system
  Device location: <Location in chassis>
  Chassis location: <Name of chassis>
  Additional details: <Additional details for the events>
Severity: Information
Cause: A device was removed from the specified system. The device location, chassis location, and additional event details, if available, are provided.

Event ID: 1653
Description: Device configuration error detected
  Device location: <Location in chassis>
  Chassis location: <Name of chassis>
  Additional details: <Additional details for the events>
Severity: Error
Cause: A configuration error was detected for a pluggable device in the specified system. The device may have been added to the system incorrectly.


Battery Sensor Messages

Battery sensors monitor how well a battery is functioning. Battery messages listed in Table 2-15 provide status and warning information for batteries in a particular chassis.

Table 2-15. Battery Sensor Messages

Event ID Description Severity Cause

Event ID: 1700
Description: Battery sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Battery sensor status:
Severity: Information
Cause: A battery sensor in the specified system is not functioning. The sensor location, chassis location, previous state, and battery sensor status are provided.

Event ID: 1701
Description: Battery sensor value unknown
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Battery sensor status:
Severity: Information
Cause: A battery sensor in the specified system could not retrieve a reading. The sensor location, chassis location, previous state, and battery sensor status are provided.

Event ID: 1702
Description: Battery sensor returned to a normal value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Battery sensor status:
Severity: Information
Cause: A battery sensor in the specified system detected that a battery transitioned back to a normal state. The sensor location, chassis location, previous state, and battery sensor status are provided.

Event ID: 1703
Description: Battery sensor detected a warning value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Battery sensor status:
Severity: Warning
Cause: A battery sensor in the specified system detected that a battery is in a predictive failure state. The sensor location, chassis location, previous state, and battery sensor status are provided.

Event ID: 1704
Description: Battery sensor detected a failure value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Battery sensor status:
Severity: Error
Cause: A battery sensor in the specified system detected that a battery has failed. The sensor location, chassis location, previous state, and battery sensor status are provided.

Event ID: 1705
Description: Battery sensor detected a non-recoverable value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Battery sensor status:
Severity: Error
Cause: A battery sensor in the specified system detected that a battery has failed. The sensor location, chassis location, previous state, and battery sensor status are provided.
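Each table in this chapter covers its own block of event IDs, so a script that receives a bare event ID could look up the message family it belongs to. The Python sketch below shows one way to do this. The ranges are copied from the event IDs listed in Tables 2-4 through 2-15 of this section; the names EVENT_FAMILIES and event_family are hypothetical and are not part of Server Administrator.

```python
from typing import Optional

# (ID range, message family), taken from the event IDs listed in this section.
# Python ranges exclude the end value, so range(1200, 1206) covers 1200-1205.
EVENT_FAMILIES = [
    (range(1151, 1156), "Voltage sensor"),
    (range(1200, 1206), "Current sensor"),
    (range(1250, 1256), "Chassis intrusion"),
    (range(1300, 1307), "Redundancy unit"),
    (range(1350, 1356), "Power supply"),
    (range(1403, 1405), "Memory device"),
    (range(1450, 1456), "Fan enclosure"),
    (range(1500, 1506), "AC power cord"),
    (range(1550, 1556), "Hardware log sensor"),
    (range(1600, 1606), "Processor sensor"),
    (range(1650, 1654), "Pluggable device"),
    (range(1700, 1706), "Battery sensor"),
]


def event_family(event_id: int) -> Optional[str]:
    """Return the message family for an event ID, or None if it is not listed."""
    for id_range, family in EVENT_FAMILIES:
        if event_id in id_range:
            return family
    return None


print(event_family(1306))  # Redundancy unit
print(event_family(1999))  # None
```

A lookup table is used instead of arithmetic on the ID because the severity and meaning of each ID within a block are not the same for every family.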


System Event Log Messages for IPMI Systems

The following tables list the system event log (SEL) messages, their severity, and cause.

NOTE: For corrective actions, see the appropriate documentation.

Temperature Sensor Events

The temperature sensor event messages help protect critical components by alerting the systems management console when the temperature rises inside the chassis. These event messages use additional variables, such as sensor location, chassis location, previous state, and temperature sensor value or state.

Table 3-1. Temperature Sensor Events

Event Message Severity Cause

Event Message: <Sensor Name/Location> temperature sensor detected a failure <Reading>
  where <Sensor Name/Location> is the entity that this sensor is monitoring. For example, "PROC Temp" or "Planar Temp."
  Reading is specified in degrees Celsius. For example, 100 C.
Severity: Critical
Cause: Temperature of the backplane board, system board, or the carrier in the specified system <Sensor Name/Location> exceeded the critical threshold.

Event Message: <Sensor Name/Location> temperature sensor detected a warning <Reading>
Severity: Warning
Cause: Temperature of the backplane board, system board, or the carrier in the specified system <Sensor Name/Location> exceeded the non-critical threshold.

Event Message: <Sensor Name/Location> temperature sensor returned to warning state <Reading>
Severity: Warning
Cause: Temperature of the backplane board, system board, or the carrier in the specified system <Sensor Name/Location> returned from critical state to non-critical state.

Event Message: <Sensor Name/Location> temperature sensor returned to normal state <Reading>
Severity: Information
Cause: Temperature of the backplane board, system board, or the carrier in the specified system <Sensor Name/Location> returned to normal operating range.
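The message templates in Table 3-1 are regular enough that a script could split a logged message into its sensor name, event, and reading. The Python sketch below is a minimal illustration under stated assumptions: the name parse_temperature_event is hypothetical, the sample message is assembled from the template and the guide's own examples ("PROC Temp", 100 C), and text in a real system event log may be formatted differently.

```python
import re

# Event phrase -> severity, as listed in Table 3-1.
SEVERITY = {
    "detected a failure": "Critical",
    "detected a warning": "Warning",
    "returned to warning state": "Warning",
    "returned to normal state": "Information",
}

# Assumed layout: "<Sensor Name/Location> temperature sensor <event> <Reading> C"
PATTERN = re.compile(
    r"^(?P<sensor>.+?) temperature sensor "
    r"(?P<event>" + "|".join(re.escape(e) for e in SEVERITY) + r") "
    r"(?P<reading>-?\d+(?:\.\d+)?) C$"
)


def parse_temperature_event(message: str):
    """Return a dict of the message parts, or None if the layout does not match."""
    match = PATTERN.match(message)
    if match is None:
        return None
    event = match.group("event")
    return {
        "sensor": match.group("sensor"),
        "event": event,
        "reading_celsius": float(match.group("reading")),
        "severity": SEVERITY[event],
    }


parsed = parse_temperature_event(
    "PROC Temp temperature sensor detected a failure 100 C"
)
print(parsed["sensor"], parsed["severity"], parsed["reading_celsius"])
```

Returning None for unmatched text keeps the sketch safe to run against log lines that belong to other sensor types.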


Voltage Sensor Events

The voltage sensor event messages monitor the number of volts across critical components. These messages provide status and warning information for voltage sensors for a particular chassis.

Table 3-2. Voltage Sensor Events

Event Message Severity Cause

Event Message: <Sensor Name/Location> voltage sensor detected a failure <Reading>
  where <Sensor Name/Location> is the entity that this sensor is monitoring.
  Reading is specified in volts. For example, 3.860 V.
Severity: Critical
Cause: The voltage of the monitored device has exceeded the critical threshold.

Event Message: <Sensor Name/Location> voltage sensor state asserted.
Severity: Critical
Cause: The voltage specified by <Sensor Name/Location> is in critical state.

Event Message: <Sensor Name/Location> voltage sensor state de-asserted.
Severity: Information
Cause: The voltage of a previously reported <Sensor Name/Location> is returned to normal state.

Event Message: <Sensor Name/Location> voltage sensor detected a warning <Reading>
Severity: Warning
Cause: Voltage of the monitored entity <Sensor Name/Location> exceeded the warning threshold.

Event Message: <Sensor Name/Location> voltage sensor returned to normal <Reading>
Severity: Information
Cause: The voltage of a previously reported <Sensor Name/Location> is returned to normal state.


Fan Sensor Events

The cooling device sensors monitor how well a fan is functioning. These messages provide status, warning, and failure information for the fans in a particular chassis.

Table 3-3. Fan Sensor Events

Event Message | Severity | Cause

<Sensor Name/Location> Fan Sensor detected a failure <Reading> where <Sensor Name/Location> is the entity that this sensor is monitoring. For example "BMC Back Fan" or "BMC Front Fan." Reading is specified in RPM. For example, 100 RPM. | Critical | The speed of the specified <Sensor Name/Location> fan is not sufficient to provide enough cooling to the system.

<Sensor Name/Location> Fan Sensor returned to normal state <Reading> | Information | The fan specified by <Sensor Name/Location> has returned to its normal operating speed.

<Sensor Name/Location> Fan Sensor detected a warning <Reading> | Warning | The speed of the specified <Sensor Name/Location> fan may not be sufficient to provide enough cooling to the system.

<Sensor Name/Location> Fan Redundancy sensor redundancy degraded. | Information | The fan specified by <Sensor Name/Location> may have failed and hence, the redundancy has been degraded.

<Sensor Name/Location> Fan Redundancy sensor redundancy lost. | Critical | The fan specified by <Sensor Name/Location> may have failed and hence, the redundancy that was degraded previously has been lost.

<Sensor Name/Location> Fan Redundancy sensor redundancy regained | Information | The fan specified by <Sensor Name/Location> may have started functioning again and hence, the redundancy has been regained.


Processor Status Events

The processor status messages monitor the functionality of the processors in a system. These messages provide health and warning information for the processors in a system.

Table 3-4. Processor Status Events

Event Message | Severity | Cause

<Processor Entity> status processor sensor IERR, where <Entity> is the processor that generated the event. For example, PROC for a single processor system and PROC # for multiprocessor system. | Critical | IERR internal error generated by the <Processor Entity>.

<Processor Entity> status processor sensor Thermal Trip. | Critical | The processor generates this event before it shuts down because of excessive heat caused by lack of cooling or heat synchronization.

<Processor Entity> status processor sensor recovered from IERR. | Information | This event is generated when a processor recovers from the internal error.

<Processor Entity> status processor sensor disabled. | Warning | This event is generated for all processors that are disabled.

<Processor Entity> status processor sensor terminator not present. | Information | This event is generated if the terminator is missing on an empty processor slot.

<Processor Entity> presence was deasserted. | Critical | This event is generated when the system could not detect the processor.

<Processor Entity> presence was asserted. | Information | This event is generated when the earlier processor detection error was corrected.

<Processor Entity> thermal tripped was deasserted. | Information | This event is generated when the processor has recovered from an earlier thermal condition.

<Processor Entity> configuration error was asserted. | Critical | This event is generated when the processor configuration is incorrect.

<Processor Entity> configuration error was deasserted. | Information | This event is generated when the earlier processor configuration error was corrected.

<Processor Entity> throttled was asserted. | Warning | This event is generated when the processor slows down to prevent overheating.

<Processor Entity> throttled was deasserted. | Information | This event is generated when the earlier processor throttled event was corrected.


Power Supply Events

The power supply sensors monitor the functionality of the power supplies. These messages provide status and warning information for power supplies for a particular system.

Table 3-5. Power Supply Events

Event Message | Severity | Cause

<Power Supply Sensor Name> power supply sensor removed. | Critical | This event is generated when the power supply sensor is removed.

<Power Supply Sensor Name> power supply sensor AC recovered. | Information | This event is generated when the power supply has been replaced.

<Power Supply Sensor Name> power supply sensor returned to normal state. | Information | This event is generated when the power supply that failed or was removed has been replaced and the state has returned to normal.

<Entity Name> PS Redundancy sensor redundancy degraded. | Information | Power supply redundancy is degraded if one of the power supply sources is removed or failed.

<Entity Name> PS Redundancy sensor redundancy lost. | Critical | Power supply redundancy is lost if only one power supply is functional.

<Entity Name> PS Redundancy sensor redundancy regained. | Information | This event is generated if the power supply has been reconnected or replaced.

predictive failure was asserted | Warning | This event is generated when the power supply is about to fail.

input lost was asserted | Critical | This event is generated when the power supply is unplugged.

predictive failure was deasserted | Information | This event is generated when the power supply has recovered from an earlier predictive failure event.

input lost was deasserted | Information | This event is generated when the power supply is plugged in.


Memory ECC Events

The memory ECC event messages monitor the memory modules in a system. These messages monitor the ECC memory correction rate and the type of memory events that occurred.

Table 3-6. Memory ECC Events

Event Message | Severity | Cause

ECC error correction detected on Bank # DIMM [A/B]. | Information | This event is generated when there is a memory error correction on a particular Dual Inline Memory Module (DIMM).

ECC uncorrectable error detected on Bank # [DIMM]. | Critical | This event is generated when the chipset is unable to correct the memory errors. Usually, a bank number is provided and DIMM may or may not be identifiable, depending on the error.

Correctable memory error logging disabled. | Critical | This event is generated when the ECC error correction rate in the chipset exceeds a predefined limit.

BMC Watchdog Events

The BMC watchdog operations are performed when the system hangs or crashes. These messages monitor the status and occurrence of these events in a system.

Table 3-7. BMC Watchdog Events

Event Message | Severity | Cause

BMC OS Watchdog timer expired. | Information | This event is generated when the BMC watchdog timer expires and no action is set.

BMC OS Watchdog performed system reboot. | Critical | This event is generated when the BMC watchdog detects that the system has crashed (timer expired because no response was received from Host) and the action is set to reboot.

BMC OS Watchdog performed system power off. | Critical | This event is generated when the BMC watchdog detects that the system has crashed (timer expired because no response was received from Host) and the action is set to power off.

BMC OS Watchdog performed system power cycle. | Critical | This event is generated when the BMC watchdog detects that the system has crashed (timer expired because no response was received from Host) and the action is set to power cycle.


Memory Events

The memory modules can be configured in different ways in particular systems. These messages monitor the status, warning, and configuration information about the memory modules in the system.

Table 3-8. Memory Events

Event Message | Severity | Cause

Memory RAID redundancy degraded. | Information | This event is generated when there is a memory failure in a RAID-configured memory configuration.

Memory RAID redundancy lost. | Critical | This event is generated when redundancy is lost in a RAID-configured memory configuration.

Memory RAID redundancy regained. | Information | This event is generated when the redundancy lost or degraded earlier is regained in a RAID-configured memory configuration.

Memory Mirrored redundancy degraded. | Information | This event is generated when there is a memory failure in a mirrored memory configuration.

Memory Mirrored redundancy lost. | Critical | This event is generated when redundancy is lost in a mirrored memory configuration.

Memory Mirrored redundancy regained. | Information | This event is generated when the redundancy lost or degraded earlier is regained in a mirrored memory configuration.

Memory Spared redundancy degraded. | Information | This event is generated when there is a memory failure in a spared memory configuration.

Memory Spared redundancy lost. | Critical | This event is generated when redundancy is lost in a spared memory configuration.

Memory Spared redundancy regained. | Information | This event is generated when the redundancy lost or degraded earlier is regained in a spared memory configuration.

Hardware Log Sensor Events

The hardware logs provide hardware status messages to the system management software. On particular systems, the subsequent hardware messages are not displayed when the log is full. These messages provide status and warning messages when the logs are full.

Table 3-9. Hardware Log Sensor Events

Event Message | Severity | Cause

Log full detected. | Critical | This event is generated when the SEL device detects that only one entry can be added to the SEL before it is full.

Log cleared. | Information | This event is generated when the SEL is cleared.


Drive Events

The drive event messages monitor the health of the drives in a system. These events are generated when there is a fault in the drives indicated.

Table 3-10. Drive Events

Event Message | Severity | Cause

Drive <Drive #> asserted fault state. | Critical | This event is generated when the specified drive in the array is faulty.

Drive <Drive #> de-asserted fault state. | Information | This event is generated when the specified drive recovers from a faulty condition.

Drive drive presence was asserted | Informational | This event is generated when the drive is installed.

Drive predictive failure was asserted | Warning | This event is generated when the drive is about to fail.

Drive predictive failure was deasserted | Informational | This event is generated when the drive's earlier predictive failure condition is corrected.

Drive hot spare was asserted | Warning | This event is generated when the drive is placed in a hot spare.

Drive hot spare was deasserted | Informational | This event is generated when the drive is taken out of hot spare.

Drive consistency check in progress was asserted | Warning | This event is generated when the drive is placed in consistency check.

Drive consistency check in progress was deasserted | Informational | This event is generated when the consistency check of the drive is completed.

Drive in critical array was asserted | Critical | This event is generated when the drive is placed in critical array.

Drive in critical array was deasserted | Informational | This event is generated when the drive is removed from critical array.

Drive in failed array was asserted | Critical | This event is generated when the drive is placed in the fail array.

Drive in failed array was deasserted | Informational | This event is generated when the drive is removed from the fail array.

Drive rebuild in progress was asserted | Informational | This event is generated when the drive is rebuilding.

Drive rebuild aborted was asserted | Warning | This event is generated when the drive rebuilding process is aborted.

Intrusion Events

The chassis intrusion messages are a security measure. Chassis intrusion alerts are generated when the system's chassis is opened. Alerts are sent to prevent unauthorized removal of parts from the chassis.

Table 3-11. Intrusion Events

Event Message | Severity | Cause

<Intrusion sensor Name> sensor detected an intrusion. | Critical | This event is generated when the intrusion sensor detects an intrusion.

<Intrusion sensor Name> sensor returned to normal state. | Information | This event is generated when the earlier intrusion has been corrected.

sensor intrusion was asserted while system was ON | Critical | This event is generated when the intrusion sensor detects an intrusion while the system is on.

sensor intrusion was asserted while system was OFF | Critical | This event is generated when the intrusion sensor detects an intrusion while the system is off.


BIOS Generated System Events

The BIOS generated messages monitor the health and functionality of the chipsets, I/O channels, and other BIOS-related functions. These system events are generated by the BIOS.

Table 3-12. BIOS Generated System Events

Event Message | Severity | Cause

System Event I/O channel chk. | Critical | This event is generated when a critical interrupt is generated in the I/O Channel.

System Event PCI Parity Err. | Critical | This event is generated when a parity error is detected on the PCI bus.

System Event Chipset Err. | Critical | This event is generated when a chip error is detected.

System Event PCI System Err. | Information | This event indicates historical data, and is generated when the system has crashed and recovered.

System Event PCI Fatal Err. | Critical | This error is generated when a fatal error is detected on the PCI bus.

System Event PCIE Fatal Err. | Critical | This error is generated when a fatal error is detected on the PCIE bus.

POST Err POST fatal error #<number> | Critical | This event is generated when an error occurs during system boot. See the system documentation for more information on the error code.

Memory Spared redundancy lost | Critical | This event is generated when memory spare is no longer redundant.

Memory Mirrored redundancy lost | Critical | This event is generated when memory mirroring is no longer redundant.

Memory RAID redundancy lost | Critical | This event is generated when memory RAID is no longer redundant.

Err Reg Pointer OEM Diagnostic data event was asserted | Information | This event is generated when an OEM event occurs.

System Board PFault Fail Safe state asserted | Critical | This event is generated when the system board voltages are not at normal levels.

System Board PFault Fail Safe state deasserted | Information | This event is generated when the earlier PFault Fail Safe system voltages return to a normal level.

Memory Add (BANK# DIMM#) presence was asserted | Information | This event is generated when memory is added to the system.

Memory Removed (BANK# DIMM#) presence was asserted | Information | This event is generated when memory is removed from the system.

Memory Cfg Err configuration error (BANK# DIMM#) was asserted | Critical | This event is generated when memory configuration is incorrect for the system.

Mem Redun Gain redundancy regained | Information | This event is generated when memory redundancy is regained.

Mem ECC Warning transition to non-critical from OK | Warning | This event is generated when correctable ECC errors have increased from a normal rate.

Mem ECC Warning transition to critical from less severe | Critical | This event is generated when correctable ECC errors reach a critical rate.

Mem CRC Err transition to non-recoverable | Critical | This event is generated when CRC errors enter a non-recoverable state.

Mem Fatal SB CRC uncorrectable ECC was asserted | Critical | This event is generated when CRC errors occur while storing to memory.

Mem Fatal NB CRC uncorrectable ECC was asserted | Critical | This event is generated when CRC errors occur while removing from memory.

Mem Overtemp critical over temperature was asserted | Critical | This event is generated when system memory reaches critical temperature.

USB Over-current transition to non-recoverable | Critical | This event is generated when the USB exceeds a predefined current level.

Hdwr version err hardware incompatibility (BMC Firmware and CPU mismatch) was asserted | Critical | This event is generated when there is a mismatch between the BMC firmware and the processor in use or vice versa.

Hdwr version err hardware incompatibility (BMC Firmware and CPU mismatch) was deasserted | Information | This event is generated when the earlier mismatch between the BMC firmware and the processor is corrected.

Hdwr version err hardware incompatibility (BMC Firmware and other mismatch) was asserted | Critical | This event is generated when there is a mismatch between the BMC firmware and the processor in use or vice versa.

Hdwr version err hardware incompatibility (BMC Firmware and CPU mismatch) was deasserted | Information | This event is generated when an earlier hardware mismatch is corrected.

SBE Log Disabled correctable memory error logging disabled was asserted | Critical | This event is generated when the ECC single bit error rate is exceeded.

CPU Protocol Err transition to non-recoverable | Critical | This event is generated when the processor protocol enters a non-recoverable state.

CPU Bus PERR transition to non-recoverable | Critical | This event is generated when the processor bus PERR enters a non-recoverable state.

CPU Init Err transition to non-recoverable | Critical | This event is generated when the processor initialization enters a non-recoverable state.

CPU Machine Chk transition to non-recoverable | Critical | This event is generated when the processor machine check enters a non-recoverable state.

Logging Disabled all event logging disabled was asserted | Critical | This event is generated when all event logging is disabled.

Unknown system event sensor unknown system hardware failure was asserted | Critical | This event is generated when an unknown hardware failure is detected.


R2 Generated System Events

Table 3-13. R2 Generated Events

Description | Severity | Cause

System Event: OS stop event OS graceful shutdown detected | Information | The OS was shut down or restarted normally.

OEM Event data record (after OS graceful shutdown/restart event) | Information | Comment string accompanying an OS shutdown/restart.

System Event: OS stop event runtime critical stop | Critical | The OS encountered a critical error and was stopped abnormally.

OEM Event data record (after OS bugcheck event) | Information | OS bugcheck code and parameters.

Cable Interconnect Events

The cable interconnect messages are used for detecting errors in the hardware cabling.

Table 3-14. Cable Interconnect Events

Description | Severity | Cause

Configuration error was asserted. | Critical | This event is generated when the cable is not connected or is incorrectly connected.

Connection was asserted. | Information | This event is generated when the earlier cable connection error was corrected.

Battery Events

Table 3-15. Battery Events

Description | Severity | Cause

Failed was asserted | Critical | This event is generated when the sensor detects a failed or missing battery.

Failed was deasserted | Information | This event is generated when the earlier failed battery was corrected.

is low was asserted | Warning | This event is generated when the sensor detects a low battery condition.

is low was deasserted | Information | This event is generated when the earlier low battery condition was corrected.


Entity Presence Events

The entity presence messages are used for detecting different hardware devices.

Table 3-16. Entity Presence Events

Description | Severity | Cause

presence was asserted | Information | This event is generated when the device was detected.

absent was asserted | Critical | This event is generated when the device was not detected.


Storage Management Message Reference

The Dell OpenManage™ Server Administrator Storage Management’s alert or event management features let you monitor the health of storage resources such as controllers, enclosures, physical disks, and virtual disks.

Alert Monitoring and Logging

The Storage Management Service performs alert monitoring and logging. By default, the Storage Management Service starts when the managed system starts up. If you stop the Storage Management Service, the alert monitoring and logging stops. Alert monitoring does the following:

• Updates the status of the storage object that generated the alert.

• Propagates the storage object’s status to all the related higher objects in the storage hierarchy. For example, the status of a lower-level object will be propagated up to the status displayed on the Health tab for the top-level storage object.

• Logs an alert in the Alert log and the operating system (OS) application log.

• Sends an SNMP trap if the operating system’s SNMP service is installed and enabled.
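The status propagation described in the list above can be pictured with a short Python sketch. It is only an illustration: the class and field names are hypothetical, and the rule that a parent displays the worst status found among itself and its descendants is an assumption, since this guide does not state the exact roll-up rule that Storage Management uses.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List


class Status(IntEnum):
    """Hypothetical severity ordering; a higher value is worse."""
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass
class StorageObject:
    """Hypothetical node in the storage hierarchy (controller, disk, ...)."""
    name: str
    status: Status = Status.OK
    children: List["StorageObject"] = field(default_factory=list)

    def rolled_up_status(self) -> Status:
        # Assumption: the displayed status is the worst status of this
        # object and everything below it in the hierarchy.
        worst = self.status
        for child in self.children:
            worst = max(worst, child.rolled_up_status())
        return worst


disk = StorageObject("Physical Disk 0:0:14", Status.CRITICAL)
controller = StorageObject("Controller 1", children=[disk])
storage = StorageObject("Storage", children=[controller])
print(storage.rolled_up_status().name)
```

Under this assumption, a failed physical disk would surface as a critical status on the top-level storage object even though the controller itself reports no fault.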

NOTE: Dell OpenManage Server Administrator Storage Management does not log alerts regarding the data I/O path. These alerts are logged by the respective RAID drivers in the system alert log.

See the Storage Management Online Help and the Dell OpenManage Server Administrator Storage Management User’s Guide for updated information.

Alert Message Format with Substitution Variables

When you view an alert in the Server Administrator alert log, the alert identifies the specific components such as the controller name or the virtual disk name to which the alert applies. In an actual operating environment, a storage system can have many combinations of controllers and disks as well as user-defined names for virtual disks and other components. Because each environment is unique in its storage configuration and user-defined names, an accurate alert message requires that the Storage Management Service be able to insert the environment-specific names of storage components into an alert message.

This environment-specific information is inserted after the alert message text as shown for alert 2127 in Table 4-1.


For other alerts, the alert message text is constructed from information passed directly from the controller (or another storage component) to the Alert Log. In these cases, the variable information is represented with a % (percent sign) in the Storage Management documentation. An example of such an alert is shown for alert 2334 in Table 4-1.

Table 4-1. Alert Message Format

Alert ID | Message Text Displayed in the Storage Management Service Documentation | Message Text Displayed in the Alert Log with Variable Information Supplied

2127 | Background Initialization started | Background Initialization started: Virtual Disk 3 (Virtual Disk 3) Controller 1 (PERC 5/E Adapter)

2334 | Controller event log % | Controller event log: Current capacity of the battery is above threshold.: Controller 1 (PERC 5/E Adapter)
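As a rough illustration of the two cases in Table 4-1, the following Python sketch assembles the Alert Log text from the documented message text. The function name and parameters are hypothetical, and the way the % placeholder is expanded (a colon, a space, then the controller-supplied detail) is inferred only from the alert 2334 example, so treat it as an assumption rather than the actual Storage Management behavior.

```python
def build_alert_text(doc_text: str, object_id: str, detail: str = "") -> str:
    """Combine documented message text with environment-specific information.

    doc_text  -- message text as printed in the documentation
    object_id -- storage object identifier appended after the message
    detail    -- text passed from the controller, used when doc_text has a %
    """
    if "%" in doc_text:
        # Assumption: the % stands for detail text supplied by the controller.
        message = doc_text.replace("%", "").rstrip() + ": " + detail
    else:
        message = doc_text
    return message + ": " + object_id


print(build_alert_text(
    "Background Initialization started",
    "Virtual Disk 3 (Virtual Disk 3) Controller 1 (PERC 5/E Adapter)",
))
print(build_alert_text(
    "Controller event log %",
    "Controller 1 (PERC 5/E Adapter)",
    detail="Current capacity of the battery is above threshold.",
))
```

The two calls reproduce the Alert Log text shown for alerts 2127 and 2334 in Table 4-1.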

The variables required to complete the message vary depending on the type of storage object and whether the storage object is in a SCSI or SAS configuration. The following table identifies the possible variables used to identify each storage object.

NOTE: Some alert messages relating to an enclosure or an enclosure component, such as a fan or EMM, are generated by the controller when the enclosure or enclosure component ID cannot be determined.

Table 4-2. Message Format with Variables for Each Storage Object

Storage Object Message Variables

A, B, C and X, Y, Z in the following examples are variables representing the storage object name or number.

Controller Message Format: Controller A (Name)

Message Format: Controller A

Example: 2326 A foreign configuration has been detected.: Controller 1 (PERC 5/E Adapter)

NOTE: The controller name is not always displayed.

Battery Message Format: Battery X Controller A

Example: 2174 The controller battery has been removed: Battery 0 Controller 1

SCSI Physical Disk Message Format: Physical Disk X:Y Controller A, Connector B

Example: 2049 Physical disk removed: Physical Disk 0:14 Controller 1, Connector 0

SAS Physical Disk Message Format: Physical Disk X:Y:Z Controller A, Connector B

Example: 2049 Physical disk removed: Physical Disk 0:0:14 Controller 1, Connector 0


Virtual Disk Message Format: Virtual Disk X (Name) Controller A (Name)

Message Format: Virtual Disk X Controller A

Example: 2057 Virtual disk degraded: Virtual Disk 11 (Virtual Disk 11) Controller 1 (PERC 5/E Adapter)

NOTE: The virtual disk and controller names are not always displayed.

Enclosure: Message Format: Enclosure X:Y Controller A, Connector B

Example: 2112 Enclosure shutdown: Enclosure 0:2 Controller 1, Connector 0

SCSI Power Supply Message Format: Power Supply X Controller A, Connector B, Target ID C

where "C" is the SCSI ID number of the enclosure management module (EMM) managing the power supply.

Example: 2122 Redundancy degraded: Power Supply 1, Controller 1, Connector 0, Target ID 6

SAS Power Supply Message Format: Power Supply X Controller A, Connector B, Enclosure C

Example: 2312 A power supply in the enclosure has an AC failure.: Power Supply 1, Controller 1, Connector 0, Enclosure 2

SCSI Temperature Probe Message Format: Temperature Probe X Controller A, Connector B, Target ID C

where "C" is the SCSI ID number of the EMM managing the temperature probe.

Example: 2101 Temperature dropped below the minimum warning threshold: Temperature Probe 1, Controller 1, Connector 0, Target ID 6

SAS Temperature Probe Message Format: Temperature Probe X Controller A, Connector B, Enclosure C

Example: 2101 Temperature dropped below the minimum warning threshold: Temperature Probe 1, Controller 1, Connector 0, Enclosure 2

SCSI Fan Message Format: Fan X Controller A, Connector B, Target ID C

where "C" is the SCSI ID number of the EMM managing the fan.

Example: 2121 Device returned to normal: Fan 1, Controller 1, Connector 0, Target ID 6

SAS Fan Message Format: Fan X Controller A, Connector B, Enclosure C

Example: 2121 Device returned to normal: Fan 1, Controller 1, Connector 0, Enclosure 2

SCSI EMM Message Format: EMM X Controller A, Connector B, Target ID C

where "C" is the SCSI ID number of the EMM.

Example: 2121 Device returned to normal: EMM 1, Controller 1, Connector 0, Target ID 6


SAS EMM Message Format: EMM X Controller A, Connector B, Enclosure C

Example: 2121 Device returned to normal: EMM 1, Controller 1, Connector 0, Enclosure 2
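The message formats in Table 4-2 are regular enough to be picked apart by a script, for example when filtering an exported Alert Log. The sketch below is a minimal Python illustration, not a tool shipped with Storage Management. The function name is hypothetical, and it assumes that the storage object identifier follows the last colon-and-space in the line and that user-defined names appear only inside parentheses.

```python
import re

# Object labels taken from the message formats in Table 4-2.
_LABELS = [
    "Physical Disk", "Virtual Disk", "Temperature Probe", "Power Supply",
    "Target ID", "Controller", "Connector", "Enclosure", "Battery", "Fan", "EMM",
]
# A label followed by a number such as 1, 0:14, or 0:0:14.
_OBJECT = re.compile(r"\b(" + "|".join(_LABELS) + r") (\d+(?::\d+)*)")


def parse_alert(line: str):
    """Split an alert line into (alert ID, message text, object identifiers)."""
    alert_id, _, rest = line.partition(" ")
    # Assumption: the object identifier follows the last ": " in the line.
    message, _, objects = rest.rpartition(": ")
    # Drop parenthesized names such as "(PERC 5/E Adapter)" before matching.
    objects = re.sub(r"\([^)]*\)", "", objects)
    return int(alert_id), message, dict(_OBJECT.findall(objects))


print(parse_alert(
    "2049 Physical disk removed: Physical Disk 0:0:14 Controller 1, Connector 0"
))
```

For the SAS physical disk example, the returned dictionary holds the disk, controller, and connector identifiers as strings, which keeps multi-part IDs such as 0:0:14 intact.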

Alert Message Change History

The following table describes changes made to the Storage Management alerts from the previous release of Storage Management to the current release.

Table 4-3. Alert Message Change History

Storage Management 2.2

Product Versions to which Changes Apply: Storage Management 2.2, Server Administrator 3.2, Dell OpenManage™ 5.2

Reduction of unnecessary alert generation: Enhancements to Storage Management avoid numerous redundant or inappropriate alerts posted to the Alert Log after an unexpected system shutdown. In previous versions of Storage Management, an unexpected system shutdown may have caused the controller to repost a large number of alerts to the Alert Log when restarting the system.

Modified Alerts:

2095: Severity changed to Informational. SNMP trap changed to 901.

2153: Severity changed to Informational. SNMP trap changed to 851.

2188: Severity changed to Informational. SNMP trap changed to 1151.

2192: Changed documentation for cause and corrective action.

2202: Severity changed to Informational. SNMP trap changed to 901.

2204: Severity changed to Informational. SNMP trap

2205: Severity changed to Informational. SNMP trap changed to 901.

2266: SNMP traps changed to 751, 801, 851, 901, 951, 1001, 1051, 1101, 1151, 1201.

2272: Severity changed to Critical. SNMP trap changed to 904. Changed corrective action information in the documentation.

2273: Changed alert message text and documentation for cause and corrective action.

2279: Changed alert message text.

2299: Changed corrective action information in the documentation.

2305: Changed severity to Warning. Changed SNMP trap number to 903.

2331: Changed severity to Informational. Changed SNMP trap number to 901.

2367: Changed severity to Warning. Changed SNMP trap number to 903.

Obsolete Alerts:

2333

2354: 2354 replaced by 2368.

2355

2365

2370

Documentation Changes:

Severity for alert 2163 changed from Ok/Normal to Critical/Failure/Error. Comments: Documentation change only made in the Dell OpenManage Server Administrator Messages Reference Guide to reflect the severity displayed in the Server Administrator Alert Log and documented in the Storage Management online help.

Severity for alert 2318 changed from Critical/Failure/Error to Warning/Noncritical. Comments: Documentation change only made in the Dell OpenManage Server Administrator Messages Reference Guide to reflect the severity displayed in the Server Administrator Alert Log and documented in the Storage Management online help.

Removed alert 2344. Replaced by alert 2070. Comments: Documentation change only made in the Dell OpenManage Server Administrator Messages Reference Guide to reflect existing Storage Management online help.

Removed alert 2345. Replaced by alert 2079. Comments: Documentation change only made in the Dell OpenManage Server Administrator Messages Reference Guide to reflect existing Storage Management online help.

Storage Management 2.1

Product Versions to which Changes Apply: Storage Management 2.1, Server Administrator 2.4, Dell OpenManage™ 5.1

New Alerts: 2062 (see note), 2173, 2195, 2196, 2212, 2213, 2214, 2215, 2260 (see note), 2370, 2371

Comments: The alert numbers for the new alerts 2062–2260 were previously unassigned. Alert numbers 2370 and 2371 are new.

NOTE: Alerts 2062 and 2260 were previously undocumented in the Storage Management online help, Dell OpenManage Server Administrator Storage Management User’s Guide, and the Dell OpenManage Server Administrator Messages Reference Guide.

Modified Alerts: 2049, 2050, 2051, 2052, 2065, 2074, 2080, 2083, 2089, 2092, 2141, 2158, 2249, 2251, 2252, 2255, 2269, 2270, 2274, 2303, 2305, 2309, 2361, 2362, 2363

Comments: The term “array disk” has been changed to “physical disk” throughout Storage Management. This change affects the message text of the modified alerts.

Obsolete Alerts: 2160, 2161

Comments: 2160 replaced by 2195. 2161 replaced by 2196.

Documentation Changes:

Documentation updated to indicate clear alert status.

Reference to SNMP trap variables removed.

Corresponding Array Manager event numbers removed (see comments).

Comments: Starting with Dell OpenManage 5.0, Array Manager is no longer an installable option. If you have an Array Manager installation and wish to see how the Array Manager events correspond to the Storage Management alerts, refer to the product documentation prior to Storage Management 2.1 or Dell OpenManage 5.1.

Alert Descriptions and Corrective Actions

The following sections describe alerts generated by the RAID or SCSI controllers supported by Storage Management. The alerts are displayed in the Server Administrator Alert subtab or through Windows Event Viewer. These alerts can also be forwarded as SNMP traps to other applications.

SNMP traps are generated for the alerts listed in the following sections. These traps are included in the Dell OpenManage Server Administrator Storage Management management information base (MIB). The SNMP traps for these alerts use all of the SNMP trap variables. For more information on SNMP support and the MIB, see the SNMP Reference Guide.
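When a trap arrives at a management station, it can help to work backward from the trap number to the alerts that use it. The sketch below is one possible approach in Python, using a few trap numbers transcribed from the Storage Management Messages table; the names ALERT_TRAPS and alerts_by_trap are assumptions made for illustration and do not come from the MIB.

```python
from collections import defaultdict

# Alert number -> SNMP trap numbers, transcribed from a few table rows.
# This is a partial, illustrative subset, not the full MIB.
ALERT_TRAPS = {
    2048: [754, 804, 854, 904, 954, 1004, 1054, 1104, 1154, 1204],
    2049: [903],
    2050: [903],
    2052: [901],
    2056: [1204],
}


def alerts_by_trap(alert_traps):
    """Invert the mapping: SNMP trap number -> sorted list of alert numbers."""
    index = defaultdict(list)
    for alert, traps in alert_traps.items():
        for trap in traps:
            index[trap].append(alert)
    return {trap: sorted(alerts) for trap, alerts in index.items()}


TRAP_INDEX = alerts_by_trap(ALERT_TRAPS)
print(TRAP_INDEX[903])   # [2049, 2050]
print(TRAP_INDEX[1204])  # [2048, 2056]
```

Because several alerts can share one trap number, the lookup returns a list rather than a single alert.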

To locate an alert, scroll through the following table to find the alert number displayed on the Server Administrator Alert tab or search this file for the alert message text or number. See "Understanding Event Messages" for more information on severity levels.

For more information regarding alert descriptions and the appropriate corrective actions, see the online help.
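Searching by alert number or by message text, as described above, can also be scripted. The following Python sketch assumes the table rows have been copied into a list by hand; the Alert class, the ALERTS list, and find_alerts are hypothetical names, and only four rows are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    event_id: int
    description: str
    severity: str


# A few rows copied from the Storage Management Messages table.
ALERTS = [
    Alert(2048, "Device failed", "Critical / Failure / Error"),
    Alert(2049, "Physical disk removed", "Warning / Non-critical"),
    Alert(2052, "Physical disk inserted", "Ok / Normal"),
    Alert(2053, "Virtual disk created", "Ok / Normal"),
]


def find_alerts(query, alerts=ALERTS):
    """Match an alert number exactly, or message text case-insensitively."""
    q = str(query).strip().lower()
    if not q:
        return []
    if q.isdigit():
        return [a for a in alerts if str(a.event_id) == q]
    return [a for a in alerts if q in a.description.lower()]


for alert in find_alerts("physical disk"):
    print(alert.event_id, alert.description, "-", alert.severity)
```

A numeric query is treated as an alert number and must match exactly; any other query is matched against the description text.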

Table 4-4. Storage Management Messages

Event ID | Description | Severity | Cause and Action | Clear Event Number | SNMP Trap Numbers

2048 Device failed
Severity: Critical / Failure / Error
Cause: A storage component such as a physical disk or an enclosure has failed. The failed component may have been identified by the controller while performing a task such as a rescan or a check consistency.
Action: Replace the failed component. You can identify which disk has failed by locating the disk that has a red “X” for its status. Perform a rescan after replacing the disk.
Clear Event Number: 2121
SNMP Trap Numbers: 754, 804, 854, 904, 954, 1004, 1054, 1104, 1154, 1204

2049 Physical disk removed
Severity: Warning / Non-critical
Cause: A physical disk has been removed from the disk group. This alert can also be caused by loose or defective cables or by problems with the enclosure.
Action: If a physical disk was removed from the disk group, either replace the disk or restore the original disk. On some controllers, a removed disk has a red “X” for its status. On other controllers, a removed disk may have an Offline status or may not be displayed on the user interface. Perform a rescan after replacing or restoring the disk. If a disk has not been removed from the disk group, then check for problems with the cables. See the online help for more information on checking the cables. Make sure that the enclosure is powered on. If the problem persists, check the enclosure documentation for further diagnostic information.
Clear Event Number: 2052
SNMP Trap Numbers: 903

2050 Physical disk offline
Severity: Warning / Non-critical
Cause: A physical disk in the disk group is offline. A user may have manually put the physical disk offline.
Action: Perform a rescan. You can also select the offline disk and perform a Make Online operation.
Clear Event Number: 2158
SNMP Trap Numbers: 903

2051 Physical disk degraded
Severity: Warning / Non-critical
Cause: A physical disk has reported an error condition and may be degraded. The physical disk may have reported the error condition in response to a consistency check or other operation.
Action: Replace the degraded physical disk. You can identify which disk is degraded by locating the disk that has a red “X” for its status. Perform a rescan after replacing the disk.
Clear Event Number: None
SNMP Trap Numbers: 903

2052 Physical disk inserted
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 901

2053 Virtual disk created
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 1201

2054 Virtual disk deleted
Severity: Warning / Non-critical
Cause: A virtual disk has been deleted. "Performing a Reset Configuration" may detect that a virtual disk has been deleted and generate this alert.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 1203

2055 Virtual disk configuration changed
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 1201

2056 Virtual disk failed
Severity: Critical / Failure / Error
Cause: One or more physical disks included in the virtual disk have failed. If the virtual disk is non-redundant (does not use mirrored or parity data), then the failure of a single physical disk can cause the virtual disk to fail. If the virtual disk is redundant, then more physical disks have failed than can be rebuilt using mirrored or parity information.
Action: Create a new virtual disk and restore from a backup.
Clear Event Number: None
SNMP Trap Numbers: 1204

2057 Virtual disk degraded
Severity: Warning / Non-critical
Cause 1: This alert message occurs when a physical disk included in a redundant virtual disk fails. Because the virtual disk is redundant (uses mirrored or parity information) and only one physical disk has failed, the virtual disk can be rebuilt.
Action 1: Configure a hot spare for the virtual disk if one is not already configured. Rebuild the virtual disk. When using a PowerEdge Expandable RAID Controller (PERC) 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4e/DC, 4/Di, CERC ATA100/4ch, PERC 5/E, PERC 5/i, or a Serial Attached SCSI (SAS) 5/iR controller, rebuild the virtual disk by first configuring a hot spare for the disk, and then initiating a write operation to the disk. The write operation will initiate a rebuild of the disk.
Cause 2: A physical disk in the disk group has been removed.
Action 2: If a physical disk was removed from the disk group, either replace the disk or restore the original disk. You can identify which disk has been removed by locating the disk that has a red “X” for its status. Perform a rescan after replacing the disk.
Clear Event Number: None
SNMP Trap Numbers: 1203

2058 Virtual disk check consistency started
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: 2085
SNMP Trap Numbers: 1201

2059 Virtual disk format started
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: 2086
SNMP Trap Numbers: 1201

2061 Virtual disk initialization started
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: 2088
SNMP Trap Numbers: 1201

2062 Physical disk initialization started
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: 2089
SNMP Trap Numbers: 901

2063 Virtual disk reconfiguration started
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: 2090
SNMP Trap Numbers: 1201

2064 Virtual disk rebuild started
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: 2091
SNMP Trap Numbers: 1201

2065 Physical disk rebuild started
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: 2092
SNMP Trap Numbers: 901

2067 Virtual disk check consistency cancelled
Severity: Ok / Normal
Cause: The check consistency operation was cancelled because a physical disk in the array has failed or because a user cancelled the check consistency operation.
Action: If the physical disk failed, then replace the physical disk. You can identify which disk failed by locating the disk that has a red “X” for its status. Perform a rescan after replacing the disk. When performing a consistency check, be aware that the consistency check can take a long time. The time it takes depends on the size of the physical disk or the virtual disk.
Clear Event Number: None
SNMP Trap Numbers: 1201

2070 Virtual disk initialization cancelled
Severity: Ok / Normal
Cause: The virtual disk initialization was cancelled because a physical disk included in the virtual disk has failed or because a user cancelled the virtual disk initialization.
Action: If a physical disk failed, then replace the physical disk. You can identify which disk has failed by locating the disk that has a red “X” for its status. Perform a rescan after replacing the disk. Restart the format physical disk operation. Restart the virtual disk initialization.
Clear Event Number: None
SNMP Trap Numbers: 1201

2074 Physical disk rebuild cancelled
Severity: Ok / Normal
Cause: A user has cancelled the rebuild operation.
Action: Restart the rebuild operation.
Clear Event Number: None
SNMP Trap Numbers: 901

2076 Virtual disk check consistency failed
Severity: Critical / Failure / Error
Cause: A physical disk included in the virtual disk failed or there is an error in the parity information. A failed physical disk can cause errors in parity information.
Action: Replace the failed physical disk. You can identify which disk has failed by locating the disk that has a red “X” for its status. Rebuild the physical disk. When finished, restart the check consistency operation.
Clear Event Number: None
SNMP Trap Numbers: 1204

2077 Virtual disk format failed
Severity: Critical / Failure / Error
Cause: A physical disk included in the virtual disk failed.
Action: Replace the failed physical disk. You can identify which physical disk has failed by locating the disk that has a red “X” for its status. Rebuild the physical disk. When finished, restart the virtual disk format operation.
Clear Event Number: None
SNMP Trap Numbers: 1204

2079 Virtual disk initialization failed
Severity: Critical / Failure / Error
Cause: A physical disk included in the virtual disk has failed or a user has cancelled the initialization.
Action: If a physical disk has failed, then replace the physical disk.
Clear Event Number: None
SNMP Trap Numbers: 1204

2080 Physical disk initialize failed
Severity: Critical / Failure / Error
Cause: The physical disk has failed or is corrupt.
Action: Replace the failed or corrupt disk. You can identify a disk that has failed by locating the disk that has a red “X” for its status. Restart the initialization.
Clear Event Number: None
SNMP Trap Numbers: 904

2081 Virtual disk reconfiguration failed
Severity: Critical / Failure / Error
Cause: A physical disk included in the virtual disk has failed or is corrupt. A user may also have cancelled the reconfiguration.
Action: Replace the failed or corrupt disk. You can identify a disk that has failed by locating the disk that has a red “X” for its status. If the physical disk is part of a redundant array, then rebuild the physical disk. When finished, restart the reconfiguration.
Clear Event Number: None
SNMP Trap Numbers: 1204

2082 Virtual disk rebuild failed
Severity: Critical / Failure / Error
Cause: A physical disk included in the virtual disk has failed or is corrupt. A user may also have cancelled the rebuild.
Action: Replace the failed or corrupt disk. You can identify a disk that has failed by locating the disk that has a red “X” for its status. Restart the virtual disk rebuild.
Clear Event Number: None
SNMP Trap Numbers: 1204

2083 Physical disk rebuild failed
Severity: Critical / Failure / Error
Cause: A physical disk included in the virtual disk has failed or is corrupt. A user may also have cancelled the rebuild.
Action: Replace the failed or corrupt disk. You can identify a disk that has failed by locating the disk that has a red “X” for its status. Restart the rebuild.
Clear Event Number: None
SNMP Trap Numbers: 904

2085 Virtual disk check consistency completed
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 1201

2086 Virtual disk format completed
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 1201

2088 Virtual disk initialization completed
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 1201

2089 Physical disk initialize completed
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 901

2090 Virtual disk reconfiguration completed
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 1201

2091 Virtual disk rebuild completed
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 1201

2092 Physical disk rebuild completed
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 901

2094 Predictive Failure reported
Severity: Warning / Non-critical
Cause: The physical disk is predicted to fail. Many physical disks contain Self Monitoring Analysis and Reporting Technology (SMART). When enabled, SMART monitors the health of the disk based on indications such as the number of write operations that have been performed on the disk.
Action: Replace the physical disk. Even though the disk may not have failed yet, it is strongly recommended that you replace the disk. If this disk is part of a redundant virtual disk, perform the Offline task on the disk; replace the disk; and then assign a hot spare and the rebuild will start automatically. If this disk is a hot spare, then unassign the hot spare; perform the Prepare to Remove task on the disk; replace the disk; and assign the new disk as a hot spare.
NOTICE: If this disk is part of a nonredundant disk, back up your data immediately. If the disk fails, you will not be able to recover the data.
Clear Event Number: None
SNMP Trap Numbers: 903

2095 SCSI sense data
Severity: Ok / Normal
Cause: A physical disk has experienced a temporary error.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 901

2098 Global hot spare assigned
Severity: Ok / Normal
Cause: A user has assigned a physical disk as a global hot spare. This alert is for informational purposes.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 901

2099 Global hot spare unassigned
Severity: Ok / Normal
Cause: A user has unassigned a physical disk as a global hot spare. This alert is for informational purposes.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 901

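2100 Temperature exceeded the maximum warning threshold
Severity: Warning / Non-critical
Cause: The physical disk enclosure is too hot. A variety of factors can cause the excessive temperature. For example, a fan may have failed, the thermostat may be set too high, or the room temperature may be too hot.
Action: Check for factors that may cause overheating. For example, verify that the enclosure fan is working. You should also check the thermostat settings and examine whether the enclosure is located near a heat source. Make sure the enclosure has enough ventilation and that the room temperature is not too hot. See the physical disk enclosure documentation for more diagnostic information.
Clear Event Number: 2353
SNMP Trap Numbers: 1053

2101 Temperature dropped below the minimum warning threshold
Severity: Warning / Non-critical
Cause: The physical disk enclosure is too cool.
Action: Check if the thermostat setting is too low and if the room temperature is too cool.
Clear Event Number: 2353
SNMP Trap Numbers: 1053

2102 Temperature exceeded the maximum failure threshold
Severity: Critical / Failure / Error
Cause: The physical disk enclosure is too hot. A variety of factors can cause the excessive temperature. For example, a fan may have failed, the thermostat may be set too high, or the room temperature may be too hot.
Action: Check for factors that may cause overheating. For example, verify that the enclosure fan is working. You should also check the thermostat settings and examine whether the enclosure is located near a heat source. Make sure the enclosure has enough ventilation and that the room temperature is not too hot. See the physical disk enclosure documentation for more diagnostic information.
Clear Event Number: None
SNMP Trap Numbers: 1054

2103 Temperature dropped below the minimum failure threshold
Severity: Critical / Failure / Error
Cause: The physical disk enclosure is too cool.
Action: Check if the thermostat setting is too low and if the room temperature is too cool.
Clear Event Number: None
SNMP Trap Numbers: 1054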

2104 Controller battery is reconditioning
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: 2105
SNMP Trap Numbers: 1151

2105 Controller battery recondition is completed
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 1151

2106 Smart FPT exceeded
Severity: Warning / Non-critical
Cause: A disk on the specified controller has received a SMART alert (predictive failure) indicating that the disk is likely to fail in the near future.
Action: Replace the disk that has received the SMART alert. If the physical disk is a member of a non-redundant virtual disk, then back up the data before replacing the disk.
NOTICE: Removing a physical disk that is included in a non-redundant virtual disk will cause the virtual disk to fail and may cause data loss.
Clear Event Number: None
SNMP Trap Numbers: 903

2107 Smart configuration change
Severity: Critical / Failure / Error
Cause: A disk has received a SMART alert (predictive failure) after a configuration change. The disk is likely to fail in the near future.
Action: Replace the disk that has received the SMART alert. If the physical disk is a member of a non-redundant virtual disk, then back up the data before replacing the disk.
NOTICE: Removing a physical disk that is included in a non-redundant virtual disk will cause the virtual disk to fail and may cause data loss.
Clear Event Number: None
SNMP Trap Numbers: 904

2108 Smart warning
Severity: Warning / Non-critical
Cause: A disk has received a SMART alert (predictive failure). The disk is likely to fail in the near future.
Action: Replace the disk that has received the SMART alert. If the physical disk is a member of a non-redundant virtual disk, then back up the data before replacing the disk.
NOTICE: Removing a physical disk that is included in a non-redundant virtual disk will cause the virtual disk to fail and may cause data loss.
Clear Event Number: None
SNMP Trap Numbers: 903

2109 SMART warning temperature
Severity: Warning / Non-critical
Cause: A disk has reached an unacceptable temperature and received a SMART alert (predictive failure). The disk is likely to fail in the near future.
Action 1: Determine why the physical disk has reached an unacceptable temperature. A variety of factors can cause the excessive temperature. For example, a fan may have failed, the thermostat may be set too high, or the room temperature may be too hot or cold. Verify that the fans in the server or enclosure are working. If the physical disk is in an enclosure, you should check the thermostat settings and examine whether the enclosure is located near a heat source. Make sure the enclosure has enough ventilation and that the room temperature is not too hot. See the physical disk enclosure documentation for more diagnostic information.
Action 2: If you cannot identify why the disk has reached an unacceptable temperature, then replace the disk. If the physical disk is a member of a non-redundant virtual disk, then back up the data before replacing the disk.
NOTICE: Removing a physical disk that is included in a non-redundant virtual disk will cause the virtual disk to fail and may cause data loss.
Clear Event Number: None
SNMP Trap Numbers: 903

2110 SMART warning degraded
Severity: Warning / Non-critical
Cause: A disk is degraded and has received a SMART alert (predictive failure). The disk is likely to fail in the near future.
Action: Replace the disk that has received the SMART alert. If the physical disk is a member of a non-redundant virtual disk, then back up the data before replacing the disk.
NOTICE: Removing a physical disk that is included in a non-redundant virtual disk will cause the virtual disk to fail and may cause data loss.
Clear Event Number: None
SNMP Trap Numbers: 903

2111 Failure prediction threshold exceeded due to test - No action needed
Severity: Warning / Non-critical
Cause: A disk has received a SMART alert (predictive failure) due to test conditions.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 903

2112 Enclosure was shut down
Severity: Critical / Failure / Error
Cause: The physical disk enclosure is either hotter or cooler than the maximum or minimum allowable temperature range.
Action: Check for factors that may cause overheating or excessive cooling. For example, verify that the enclosure fan is working. You should also check the thermostat settings and examine whether the enclosure is located near a heat source. Make sure the enclosure has enough ventilation and that the room temperature is not too hot or too cold. See the enclosure documentation for more diagnostic information.
Clear Event Number: None
SNMP Trap Numbers: 854

2114 A consistency check on a virtual disk has been paused (suspended)
Severity: Ok / Normal
Cause: The check consistency operation on a virtual disk was paused by a user.
Action: To resume the check consistency operation, right-click the virtual disk in the tree view and select Resume Check Consistency.
Clear Event Number: 2115
SNMP Trap Numbers: 1201

2115 A consistency check on a virtual disk has been resumed
Severity: Ok / Normal
Cause: This alert is for informational purposes. The check consistency operation on a virtual disk has resumed processing after being paused by a user.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 1201

2116 A virtual disk and its mirror have been split
Severity: Ok / Normal
Cause: This alert is for informational purposes. A user has caused a mirrored virtual disk to be split. When a virtual disk is mirrored, its data is copied to another virtual disk in order to maintain redundancy. After being split, both virtual disks retain a copy of the data, although because the mirror is no longer intact, updates to the data are no longer copied to the mirror.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 1201

2117 A mirrored virtual disk has been unmirrored
Severity: Ok / Normal
Cause: This alert is for informational purposes. A user has caused a mirrored virtual disk to be unmirrored. When a virtual disk is mirrored, its data is copied to another virtual disk in order to maintain redundancy. After being unmirrored, the disk formerly used as the mirror returns to being a physical disk and becomes available for inclusion in another virtual disk.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 1201

2118 Change write policy
Severity: Ok / Normal
Cause: This alert is for informational purposes. A user has changed the write policy for a virtual disk.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 1201

2120 Enclosure firmware mismatch
Severity: Warning / Non-critical
Cause: The firmware on the EMMs is not the same version. It is required that both modules have the same version of the firmware. This alert may be caused when a user attempts to insert an EMM module that has a different firmware version than an existing module.
Action: Download the same version of the firmware to both EMM modules.
Clear Event Number: None
SNMP Trap Numbers: 853

2121 Device returned to normal
Severity: Ok / Normal
Cause: This alert is for informational purposes. A device that was previously in an error state has returned to a normal state. For example, if an enclosure became too hot and subsequently cooled down, then you may receive this alert.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 752, 802, 852, 902, 952, 1002, 1052, 1102, 1152, 1202

2122 Redundancy degraded
Severity: Warning / Non-critical
Cause: One or more of the enclosure components has failed. For example, a fan or power supply may have failed. Although the enclosure is currently operational, the failure of additional components could cause the enclosure to fail.
Action: Identify and replace the failed component. To identify the failed component, select the enclosure in the tree view and click the Health subtab. Any failed component will be identified with a red “X” on the enclosure’s Health subtab. Alternatively, you can select the Storage object and click the Health subtab. The controller status displayed on the Health subtab indicates whether a controller has a failed or degraded component. See the enclosure documentation for information on replacing enclosure components and for other diagnostic information.
Clear Event Number: 2124
SNMP Trap Numbers: 1305

2123 Redundancy lost
Severity: Warning / Non-critical
Cause: A virtual disk or an enclosure has lost data redundancy. In the case of a virtual disk, one or more physical disks included in the virtual disk have failed. Due to the failed physical disk or disks, the virtual disk is no longer maintaining redundant (mirrored or parity) data. The failure of an additional physical disk will result in lost data. In the case of an enclosure, more than one enclosure component has failed. For example, the enclosure may have suffered the loss of all fans or all power supplies.
Action: Identify and replace the failed components. To identify the failed component, select the Storage object and click the Health subtab. The controller status displayed on the Health subtab indicates whether a controller has a failed or degraded component. Click the controller that displays a Warning or Failed status. This action displays the controller Health subtab, which displays the status of the individual controller components. Continue clicking the components with a Warning or Failed status until you identify the failed component. See the online help for more information. See the enclosure documentation for information on replacing enclosure components and for other diagnostic information.
Clear Event Number: 2124
SNMP Trap Numbers: 1306

2124 Redundancy normal
Severity: Ok / Normal
Cause: This alert is for informational purposes. Data redundancy has been restored to a virtual disk or an enclosure that previously suffered a loss of redundancy.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 1304

2126 SCSI sense sector reassign
Severity: Warning / Non-critical
Cause: A sector of the physical disk is corrupted and data cannot be maintained on this portion of the disk. This alert is for informational purposes.
NOTICE: Any data residing on the corrupt portion of the disk may be lost and you may need to restore your data from backup.
Action: If the physical disk is part of a nonredundant virtual disk, then back up the data and replace the physical disk. If the disk is part of a redundant virtual disk, then any data residing on the corrupt portion of the disk will be reallocated elsewhere in the virtual disk.
NOTICE: Removing a physical disk that is included in a nonredundant virtual disk will cause the virtual disk to fail and may cause data loss.
Clear Event Number: None
SNMP Trap Numbers: 903

2127 Background initialization (BGI) started
Severity: Ok / Normal
Cause: BGI of a virtual disk has started. This alert is for informational purposes.
Action: None
Clear Event Number: 2130
SNMP Trap Numbers: 1201

2128 BGI cancelled
Severity: Ok / Normal
Cause: BGI of a virtual disk has been cancelled. A user or the firmware may have stopped BGI.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 1201

2129 BGI failed
Severity: Critical / Failure / Error
Cause: BGI of a virtual disk has failed.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 1204

2130 BGI completed
Severity: Ok / Normal
Cause: BGI of a virtual disk has completed. This alert is for informational purposes.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 1201

2131 Firmware version mismatch
Severity: Warning / Non-critical
Cause: The firmware on the controller is not a supported version.
Action: Install a supported version of the firmware. If you do not have a supported version of the firmware available, it can be downloaded from the Dell support site at support.dell.com. If you do not have a supported version of the firmware available, check with your support provider for information on how to obtain the most current firmware.
Clear Event Number: None
SNMP Trap Numbers: 753

2132 Driver version mismatch
Severity: Warning / Non-critical
Cause: The controller driver is not a supported version.
Action: Install a supported version of the driver. If you do not have a supported driver version available, it can be downloaded from the Dell support site at support.dell.com. If you do not have a supported version of the driver available, check with your support provider for information on how to obtain the most current driver.
Clear Event Number: None
SNMP Trap Numbers: 753

2135 Array Manager is installed on the system
Severity: Warning / Non-critical
Cause: Storage Management has been installed on a system that has an Array Manager installation.
Action: Installing Storage Management and Array Manager on the same system is not a supported configuration. Uninstall either Storage Management or Array Manager.
Clear Event Number: None
SNMP Trap Numbers: 103

2136 Virtual disk initialization
Severity: Ok / Normal
Cause: This alert is for informational purposes. Virtual disk initialization is in progress.
Action: None
Clear Event Number: 2088
SNMP Trap Numbers: 1201

2137 Communication timeout
Severity: Warning / Non-critical
Cause: The controller is unable to communicate with an enclosure. There are several reasons why communication may be lost. For example, there may be a bad or loose cable. An unusual amount of I/O may also interrupt communication with the enclosure. In addition, communication loss may be caused by software, hardware, or firmware problems, bad or failed power supplies, and enclosure shutdown. When viewed in the Alert Log, the description for this event displays several variables. These variables are: Controller and enclosure names, type of communication problem, return code, and SCSI status.
Action: Check for problems with the cables. See the online help for more information on checking the cables. You should also check to see if the enclosure has degraded or failed components. To do so, select the enclosure object in the tree view and click the Health subtab. The Health subtab displays the status of the enclosure components. Verify that the controller has supported driver and firmware versions installed and that the EMMs are each running the same version of supported firmware.
Clear Event Number: 2162
SNMP Trap Numbers: 853

2138 Enclosure alarm enabled
Severity: Ok / Normal
Cause: This alert is for informational purposes. A user has enabled the enclosure alarm.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 851

2139 Enclosure alarm disabled
Severity: Ok / Normal
Cause: A user has disabled the enclosure alarm.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 851

2140 Dead disk segments restored
Severity: Ok / Normal
Cause: This alert is for informational purposes. Disk space that was formerly “dead” or inaccessible to a redundant virtual disk has been restored.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 1201

2141 Physical disk dead segments recovered
Severity: Ok / Normal
Cause: This alert is for informational purposes. Portions of the physical disk were formerly inaccessible. The disk space from these dead segments has been recovered and is now usable. Any data residing on these dead segments has been lost.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 901

2142 Controller rebuild rate has changed
Severity: Ok / Normal
Cause: This alert is for informational purposes. A user has changed the controller rebuild rate.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 751

2143 Controller alarm enabled
Severity: Ok / Normal
Cause: This alert is for informational purposes. A user has enabled the controller alarm.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 751

2144 Controller alarm disabled
Severity: Ok / Normal
Cause: This alert is for informational purposes. A user has disabled the controller alarm.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 751

2145 Controller battery low
Severity: Warning / Non-critical
Cause: The controller battery charge is low.
Action: Recondition the battery. See the online help for more information.
Clear Event Number: None
SNMP Trap Numbers: 1153

2146 Bad block replacement error
Severity: Warning / Non-critical
Cause: A portion of a physical disk is damaged.
Action: See the Dell OpenManage Server Administrator Storage Management online help or the Dell OpenManage Server Administrator Storage Management User's Guide for more information.
Clear Event Number: None
SNMP Trap Numbers: 753

2147 Bad block sense error
Severity: Warning / Non-critical
Cause: A portion of a physical disk is damaged.
Action: See the Dell OpenManage Server Administrator Storage Management online help for more information.
Clear Event Number: None
SNMP Trap Numbers: 753

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2148 Bad block medium

error

2149 Bad block extended

sense error

2150 Bad block extended

medium error

2151 Asset tag changed Ok / Normal Cause: This alert is for informational purposes.

2152 Asset name changed Ok / Normal Cause: This alert is for informational purposes.

2153 Service tag changed Ok / Normal Cause: An enclosure service tag was changed.

2154 Maximum

temperature probe warning threshold value changed

Warning / Non-critical

Ok / Normal Cause: This alert is for informational purposes.

Cause: A portion of a physical disk is

damaged.

Action: See the Dell OpenManage Server

Administrator Storage Management online

help for more information.

Cause: A portion of a physical disk is

damaged.

Action: See the Dell OpenManage Server

Administrator Storage Management online

help for more information.

Cause: A portion of a physical disk is

damaged.

Action: See the Dell OpenManage Server

Administrator Storage Management online

help for more information.

A user has changed the enclosure asset tag.

Action: None

A user has changed the enclosure asset name.

Action: None

In most circumstances, this service tag

should only be changed by Dell™ support or

your service provider.

Action: Ensure that the tag was changed

under authorized circumstances.

A user has changed the value for the

maximum temperature probe warning

threshold.

Action: None

None 753

None 851

None 1051

SNMP Trap Numbers

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2155 Minimum

temperature probe warning threshold value changed

2156 Controller alarm has

been tested

2157 Controller

configuration has been reset

2158 Physical disk online Ok / Normal Cause: This alert is for informational purposes.

2159 Virtual disk renamed Ok / Normal Cause: This alert is for informational purposes.

2162 Communication

regained

2163 Rebuild completed

with errors

Ok / Normal Cause: This alert is for informational purposes.

A user has changed the value for the

minimum temperature probe warning

threshold.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

The controller alarm test has run successfully.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

A user has reset the controller configuration.

See the online help for more information.

Action: None

An offline physical disk has been made

online.

Action: None

A user has renamed a virtual disk.

When renaming a virtual disk on a PERC

3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC,

4e/DC, 4/Di, CERC ATA100/4ch, PERC 5/E,

PERC 5/i or SAS 5/iR controller, this alert

displays the new virtual disk name.

On the PERC 3/SC, 3/DCL, 3/DC, 3/QC,

4/SC, 4/DC, 4e/DC, 4/Di, 4/IM, 4e/Si, 4e/Di,

and CERC ATA100/4ch controllers, this alert

displays the original virtual disk name.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Communication with an enclosure has been

restored.

Action: None

Critical / Failure / Error

Cause: This alert is documented in the Storage

Management online help.

Action: See the online help for more

information.

None 1051

None 751

Clear event

None 1201

Clear event

None 904

SNMP Trap Numbers

901

851

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2164 See the Readme file

for a list of validated controller driver versions

2165 The RAID controller

firmware and driver validation was not performed. The configuration file cannot be opened.

2166 The RAID controller

firmware and driver validation was not performed. The configuration file is out of date or corrupted.

Ok / Normal Cause: This alert is for informational purposes.

Storage Management is unable to determine

whether the system has the minimum

required versions of the RAID controller

drivers.

Action: See the Readme file for driver and

firmware requirements. In particular, if

Storage Management experiences

performance problems, you should verify that

you have the minimum supported versions of

the drivers and firmware installed.

Warning / Non-critical

Cause: Storage Management is unable to

determine whether the system has the

minimum required versions of the RAID

controller firmware and drivers. This

situation may occur for a variety of reasons.

For example, the installation directory path

to the configuration file may not be correct.

The configuration file may also have been

removed or renamed.

Action: Reinstall Storage Management.

Cause: Storage Management is unable to

determine whether the system has the

minimum required versions of the

RAID controller firmware and drivers. This

situation has occurred because a configuration

file is unreadable or missing data. The

configuration file may be corrupted.

Action: Reinstall Storage Management.

None 101

None 753

SNMP Trap Numbers

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2167 The current kernel

version and the non-RAID SCSI driver version are older than the minimum required levels. See readme.txt for a list of validated kernel and driver versions.

2168 The non-RAID SCSI

driver version is older than the minimum required level. See readme.txt for the validated driver version.

2169 The controller

battery needs to be replaced.

2170 The controller

battery charge level is normal.

Warning / Non-critical

Critical / Failure / Error

Ok / Normal Cause: This alert is for informational purposes.

Cause: The versions of the kernel and the

driver do not meet the minimum requirements.

Storage Management may not be able to

display the storage or perform storage

management functions until you have

updated the system to meet the minimum

requirements.

Action: See the Readme file for a list of

validated kernel and driver versions. Update

the system to meet the minimum

requirements and then reinstall Storage

Management.

Cause: The version of the driver does not

meet the minimum requirements. Storage

Management may not be able to display the

storage or perform storage management

functions until you have updated the system

to meet the minimum requirements.

Action: See the Readme file for the validated

driver version. Update the system to meet the

minimum requirements and then reinstall

Storage Management.

Cause: The controller battery cannot recharge.

The battery may be old or it may have been

already recharged the maximum number of

times. In addition, the battery charger may

not be working.

Action: Replace the battery pack.

Action: None

None 103

None 1154

None 1151

SNMP Trap Numbers

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2171 The controller

battery temperature is above normal.

Warning / Non-critical

Cause: The battery may be recharging, the

room temperature may be too hot, or the fan

in the system may be degraded or failed.

2172 1153

Action: If this alert was generated due to a

battery recharge, the situation will correct

when the recharge is complete. You should

also check if the room temperature is normal

and that the system components are

functioning properly.

2172 The controller

battery temperature is normal.

2173 Unsupported

configuration detected. The SCSI rate of the enclosure management modules (EMMs) is not the same. EMM0 %1 EMM1 %2

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Warning / Non-critical

Cause: The EMMs in the enclosure have a

different SCSI rate. This is an unsupported

configuration. All EMMs in the enclosure

should have the same SCSI rate. The %

(percent sign) indicates a substitution

variable. The text for this substitution

variable is displayed with the alert in the Alert

Log and can vary depending on the situation.

Action: The EMMs in the enclosure have a different SCSI rate. This is an unsupported configuration. All EMMs in the enclosure should have the same SCSI rate.

Clear event

None 853

2174 The controller

battery has been removed.

Warning / Non-critical

Cause: The controller cannot communicate with the battery, the battery may be removed, or the contact point between the controller and the battery may be burnt or corroded.

None 1153

Action: Replace the battery if it has been

removed. If the contact point between the

battery and the controller is burnt or corroded,

you will need to replace either the battery or

the controller, or both. See the hardware

documentation for information on how to

safely access, remove, and replace the battery.

2175 The controller

battery has been replaced.

Ok / Normal Cause: This alert is for informational purposes.

Action: None

None 1151

SNMP Trap Numbers

1151
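The substitution variables described for alert 2173 (%1, %2) can be illustrated with a small sketch. This is not how Storage Management itself formats alerts; the function name render_alert and the sample values "U160" and "U320" are hypothetical and chosen only for illustration.

```python
import re


def render_alert(template: str, values: list[str]) -> str:
    """Replace %1, %2, ... in an alert description with the given values."""

    def substitute(match: re.Match) -> str:
        index = int(match.group(1)) - 1  # %1 maps to values[0]
        # Leave the placeholder untouched when no value was supplied for it.
        return values[index] if 0 <= index < len(values) else match.group(0)

    return re.sub(r"%(\d+)", substitute, template)


# Hypothetical SCSI rate values; the real text is supplied with the alert.
print(render_alert("EMM0 %1 EMM1 %2", ["U160", "U320"]))
```

With these sample values the call prints "EMM0 U160 EMM1 U320". A placeholder with no matching value is left as-is, so a missing substitution stays visible in the rendered text.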

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2176 The controller

battery Learn cycle has started.

2177 The controller

battery Learn cycle has completed.

2178 The controller

battery Learn cycle has timed out.

2179 The controller

battery Learn cycle has been postponed.

2180 The controller

battery Learn cycle will start in %1 days.

2181 The controller

battery Learn cycle will start in %1 hours.

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Warning / Non-critical

Ok / Normal Cause: This alert is for informational purposes.

Ok / Normal Cause: This alert is for informational

Cause: The controller battery must be fully

charged before the Learn cycle can begin.

The battery may be unable to maintain a full

charge, causing the Learn cycle to time out.

Additionally, the battery must be able to

maintain cached data for a specified period of

time in the event of a power loss. For example,

some batteries maintain cached data for

24 hours. If the battery is unable to maintain

cached data for the required period of time,

then the Learn cycle will time out.

Action: Replace the battery pack as the

battery is unable to maintain a full charge.

Action: None

purposes. The %1 indicates a substitution

variable. The text for this substitution

variable is displayed with the alert in the Alert

Log and can vary depending on the situation.

Action: None

purposes. The %1 indicates a substitution

variable. The text for this substitution

variable is displayed with the alert in the Alert

Log and can vary depending on the situation.

Action: None

2177 1151

Clear event

None 1153

None 1151

SNMP Trap Numbers

1151

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2182 An invalid SAS

configuration has been detected.

2186 The controller cache

has been discarded.

2187 Single-bit ECC error

limit exceeded.

2188 The controller write

policy has been changed to Write Through.

2189 The controller write

policy has been changed to Write Back.

Critical / Failure / Error

Warning / Non-critical

Ok / Normal Cause: The controller battery is unable to

Ok / Normal Cause: This alert is for informational purposes.

Cause: The controller and attached

enclosures are not cabled correctly.

Action: See the hardware documentation for

information on correct cabling

configurations.

Cause: The controller has flushed the cache

and any data in the cache has been lost. This

may happen if the system has memory or

battery problems that cause the controller to

distrust the cache. Although user data may have

been lost, this alert does not always indicate

that relevant or user data has been lost.

Action: Verify that the battery and memory

are functioning properly.

Cause: The system memory is

malfunctioning.

Action: Replace the battery pack.

maintain cached data for the required period

of time. For example, if the required period of

time is 24 hours, the battery is unable to

maintain cached data for 24 hours. It is

normal to receive this alert during the battery

Learn cycle as the Learn cycle discharges the

battery before recharging it. When

discharged, the battery cannot maintain

cached data.

Action: Check the health of the battery. If the

battery is weak, replace the battery pack.

Action: None

None 754

None 753

None 1151

SNMP Trap Numbers

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2191 Multiple enclosures

are attached to the controller. This is an unsupported configuration.

2192 The virtual disk

Check Consistency has made corrections and completed.

2193 The virtual disk

reconfiguration has resumed.

2194 The virtual disk

Read policy has changed.

2195 Dedicated hot spare

assigned. Physical disk %1

2196 Dedicated hot spare

unassigned. Physical disk %1

2199 The virtual disk

cache policy has changed.

Critical / Failure / Error

Ok / Normal Cause: This alert is for informational purposes.

Cause: Too many enclosures are attached to the

controller port. When the enclosure limit is

exceeded, the controller loses contact with all

enclosures attached to the port.

Action: Remove the last enclosure. You must

remove the enclosure that was added last

and is causing the enclosure limit to be exceeded.

The virtual disk Check Consistency has

identified errors and made corrections. For

example, the Check Consistency may have

encountered a bad disk block and remapped

the disk block to restore data consistency.

Action: This alert is for informational

purposes only and no additional action is

required. As a precaution, monitor the Alert

Log for other errors related to this virtual

disk. If problems persist, contact Dell

Technical Support.

Action: None

Action: None.

Action: None

None 854

None 1203

None 1201

2196 1201

Clear event

None 1201

SNMP Trap Numbers

1201

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2201 A global hot spare

failed.

2202 A global hot spare

has been removed.

2203 A dedicated hot

spare failed.

2204 A dedicated hot

spare has been removed.

Warning / Non-critical

Ok / Normal Cause: The controller is unable to

Warning / Non-critical

Ok / Normal Cause: The controller is unable to communicate

Cause: The controller is not able to

communicate with a disk that is assigned as a

dedicated hot spare. The disk may have been

removed. There may also be a bad or loose

cable.

Action: Check if the disk is healthy and that

it has not been removed. Check the cables. If

necessary, replace the disk and reassign the

hot spare.

communicate with a disk that is assigned as a

global hot spare. The disk may have been

removed. There may also be a bad or loose

cable.

Action: Check if the disk is healthy and that

it has not been removed. Check the cables. If

necessary, replace the disk and reassign the

hot spare.

Cause: The controller is unable to

communicate with a disk that is assigned as a

dedicated hot spare. The disk may have failed

or been removed. There may also be a bad or

loose cable.

Action: Check if the disk is healthy and that

it has not been removed. Check the cables. If

necessary, replace the disk and reassign the

hot spare.

with a disk that is assigned as a dedicated hot

spare. The disk may have been removed.

There may also be a bad or loose cable.

Action: Check if the disk is healthy and that

it has not been removed. Check the cables. If

necessary, replace the disk and reassign the

hot spare.

None 903

None 901

None 903

None 901

SNMP Trap Numbers

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2205 A dedicated hot

spare has been automatically unassigned.

2206 The only hot spare

available is a SATA disk. SATA disks cannot replace SAS disks.

2207 The only hot spare

available is a SAS disk. SAS disks cannot replace SATA disks.

2211 The physical disk is

not supported.

2212 The controller

battery temperature is above normal.

2213 Recharge count

maximum exceeded

Ok / Normal Cause: The hot spare is no longer required

because the virtual disk it was assigned to has

been deleted.

Action: None.

Warning / Non-critical

Ok / Normal Cause: This alert is for informational purposes.

Warning / Non-critical

Cause: The only physical disk available to

be assigned as a hot spare is using

SATA technology. The physical disks in the

virtual disk are using SAS technology.

Because of this difference in technology, the

hot spare cannot rebuild data if one of the

physical disks in the virtual disk fails.

Action: Add a SAS disk that is large enough

to be used as the hot spare and assign the new

disk as a hot spare.

Cause: The only physical disk available to be

assigned as a hot spare is using SAS technology.

The physical disks in the virtual disk are using

SATA technology. Because of this difference

in technology, the hot spare cannot rebuild

data if one of the physical disks in the virtual

disk fails.

Action: Add a SATA disk that is large enough

to be used as the hot spare and assign the new

disk as a hot spare.

Cause: The physical disk may not have a

supported version of the firmware or the disk

may not be supported by Dell.

Action: If the disk is supported by Dell,

update the firmware to a supported version.

If the disk is not supported by Dell, replace

the disk with one that is supported.

Action: None

Cause: The battery has been recharged more

times than the battery recharge limit allows.

Action: Replace the battery pack.

None 901

None 903

None 1151

None 1153

SNMP Trap Numbers

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2214 Battery charge in

progress

2215 Battery charge

process interrupted

2232 The controller alarm

is silenced.

2233 The background

initialization (BGI) rate has changed.

2234 The Patrol Read rate

has changed.

2235 The Check

Consistency rate has changed.

2237 A controller rescan

has been initiated.

2238 The controller debug

log file has been exported.

2239 A foreign

configuration has been cleared.

2240 A foreign

configuration has been imported.

2241 The Patrol Read

mode has changed.

2242 The Patrol Read has

started.

2243 The Patrol Read has

stopped.

2244 A virtual disk blink

has been initiated.

Ok / Normal Cause: This alert is for informational purposes.

Action: None.

Ok / Normal Cause: This alert is for informational purposes.

Action: None.

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

None 1151

None 751

2243 751

Clear event

None 1201

SNMP Trap Numbers

751

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2245 A virtual disk blink

has ceased.

2246 The controller

battery is degraded.

2247 The controller

battery is charging.

2248 The controller

battery is executing a Learn cycle.

2249 The physical disk

Clear operation has started.

2251 The physical disk

blink has initiated.

2252 The physical disk

blink has ceased.

2254 The Clear operation

has cancelled.

2255 The physical disk has

been started.

2259 An enclosure blink

operation has initiated.

2260 An enclosure blink

has ceased

2261 A global rescan has

initiated.

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Warning / Non-critical

Ok / Normal Cause: This alert is for informational purposes.

Ok / Normal Cause: This alert is for informational purposes.

Ok / Normal Cause: This alert is for informational purposes.

Cause: The controller battery charge is weak.

Action: As the charge weakens, the charger

should automatically recharge the battery.

If the battery has reached its recharge limit,

replace the battery pack. Monitor the battery

to make sure that it recharges successfully.

If the battery does not recharge, replace the

battery pack.

Action: None

Action: None.

Action: None

None 1201

None 1153

2358 1151

None 1151

None 901

2260 851

None 851

None 101

SNMP Trap Numbers

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2262 SMART thermal

shutdown is enabled.

2263 SMART thermal

shutdown is disabled.

2264 A device is missing.

Ok / Normal Cause: This alert is for informational purposes.

Action: None

Ok / Normal Cause: This alert is for informational purposes.

Action: None

None 101

Warning / Non-critical

Cause: The controller cannot communicate with a device. The device may be removed. There may also be a bad or loose cable.

Action: Check that the device is present and has not been removed. If it is present, check the cables. You should also check the connection to the controller battery and the battery health. A battery with a weak or depleted charge may cause this alert.

None 753

2265 A device is in an

unknown state.

Warning / Non-critical

Cause: The controller cannot communicate with a device. The state of the device cannot be determined. There may be a bad or loose cable. The system may also be experiencing problems with the application programming interface (API). There could also be a problem with the driver or firmware.

Action: Check the cables. Check if the controller has a supported version of the driver and firmware. You can download the most current version of the driver and firmware from support.dell.com. Rebooting the system may also resolve this problem.

None 753

2266 Controller log file

entry: %1

Ok / Normal Cause: This alert is for informational purposes. The %1 indicates a substitution variable. The text for this substitution variable is generated by the controller and is displayed with the alert in the Alert Log. This text can vary depending on the situation.

Action: None

None 751, 801,

2267 The controller

reconstruct rate has changed.

Ok / Normal Cause: This alert is for informational purposes.

Action: None

None 751

SNMP Trap Numbers

803 853 903 953 1003 1053 1103 1153 1203

851, 901, 951, 1001, 1051, 1101, 1151, 1201
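For scripted lookups, each table row can be held as a small record keyed by event ID. The sketch below is only an illustration: the AlertEntry type and the describe function are hypothetical names, and the three rows are transcribed from the entries for alerts 2264, 2265, and 2267 as they appear on this page.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AlertEntry:
    """One row of the table: the columns named in the table header."""
    event_id: int
    description: str
    severity: str
    clear_event: Optional[int]  # None when the table lists "None"
    snmp_traps: Tuple[int, ...]


# Rows transcribed from this page; extend as needed.
ENTRIES = {
    entry.event_id: entry
    for entry in (
        AlertEntry(2264, "A device is missing.", "Warning / Non-critical", None, (753,)),
        AlertEntry(2265, "A device is in an unknown state.", "Warning / Non-critical", None, (753,)),
        AlertEntry(2267, "The controller reconstruct rate has changed.", "Ok / Normal", None, (751,)),
    )
}


def describe(event_id: int) -> str:
    """Return a one-line summary for an event ID, or a note if it is not loaded."""
    entry = ENTRIES.get(event_id)
    if entry is None:
        return f"{event_id}: not in this table excerpt"
    clear = entry.clear_event if entry.clear_event is not None else "None"
    traps = ", ".join(str(trap) for trap in entry.snmp_traps)
    return (
        f"{entry.event_id} [{entry.severity}] {entry.description} "
        f"(clear event: {clear}; SNMP traps: {traps})"
    )


print(describe(2265))
```

Keeping the trap numbers as a tuple allows for rows such as alert 2266, which lists several trap numbers.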

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2268 %1, Storage

Management has lost communication with the controller. An immediate reboot is strongly recommended to avoid further problems. If the reboot does not restore communication, then contact technical support for more information.

2269 The physical disk

Clear operation has completed.

2270 The physical disk

Clear operation failed.

2271 The Patrol Read

corrected a media error.

Critical / Failure / Error

Ok / Normal Cause: This alert is for informational purposes.

Critical / Failure / Error

Ok / Normal Cause: This alert is for informational purposes.

Cause: Storage Management has lost

communication with a controller. This may

occur if the controller driver or firmware is

experiencing a problem. The %1 indicates a

substitution variable. The text for this

substitution variable is displayed with the

alert in the Alert Log and can vary depending

on the situation.

Action: Reboot the system. If the problem is

not resolved, contact technical support. See

your system documentation for information

about contacting technical support by using

telephone, fax, and Internet services.

Action: None

Cause: A Clear task was being performed on a

physical disk but the task was interrupted and

did not complete successfully. The controller

may have lost communication with the disk.

The disk may have been removed or the

cables may be loose or defective.

Action: Verify that the disk is present and not

in a Failed state. Make sure the cables are

attached securely. See the online help for

more information on checking the cables.

Restart the Clear task.

Action: None

None 104

None 901

None 904

None 901

SNMP Trap Numbers

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2272 Patrol Read found an

uncorrectable media error.

2273 A block on the

physical disk has been punctured by the controller.

2274 The physical disk

rebuild has resumed.

2276 The dedicated hot

spare is too small.

2277 The global hot spare

is too small.

Critical / Failure / Error

Ok / Normal Cause: This alert is for informational purposes.

Warning / Non-critical

Cause: The Patrol Read task has encountered

an error that cannot be corrected. There may

be a bad disk block that cannot be remapped.

Action: Back up your data. If you are able to

back up the data successfully, then fully

initialize the disk and then restore from

backup.

Cause: The controller encountered an

unrecoverable medium error when

attempting to read a block on the physical

disk and marked that block as invalid. If the

controller encountered the unrecoverable

medium error on a source physical disk

during a rebuild or reconfigure operation, it

will also puncture the corresponding block on

the target physical disk. The invalid block will

be cleared on a write operation.

Action: Back up your data. If you are able to

back up the data successfully, then fully

initialize the disk and then restore from

backup.

Action: None

Cause: The dedicated hot spare is not large

enough to protect all virtual disks that reside

on the disk group.

Action: Assign a larger disk as the dedicated

hot spare.

Cause: The global hot spare is not large enough

to protect all virtual disks that reside on the

controller.

Action: Assign a larger disk as the global

hot spare.

None 904

None 901

None 903

SNMP Trap Numbers

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2278 The controller

battery charge level is below a normal threshold.

2279 The controller

battery charge level is operating within normal limits.

2280 A disk media error

has been corrected.

2281 Virtual disk has

inconsistent data.

Ok / Normal Cause: The battery is discharging. A battery

discharge is a normal activity during the

battery Learn cycle. Before completing, the

battery Learn cycle recharges the battery. You

should receive alert 2279 when the recharge

occurs.

Action: Check if the battery Learn cycle is in

progress. Alert 2176 indicates that the battery

Learn cycle has initiated. The battery also

displays the Learn state while the Learn cycle

is in progress. If a Learn cycle is not in

progress, replace the battery pack.

Ok / Normal Cause: This alert is provided for

informational purposes. This alert indicates

that the battery is recharging during the

battery Learn cycle.

Action: None

Ok / Normal Cause: A disk media error was detected

while the controller was completing a

background task. A bad disk block was

identified. The disk block has been

remapped.

Action: Consider replacing the disk. If you

receive this alert frequently, be sure to replace

the disk. You should also routinely back up

your data.

Ok / Normal Cause: This alert is for informational purposes.

Action: None

None 1154

None 1151

None 1201

SNMP Trap Numbers
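The action for alert 2278 depends on whether a battery Learn cycle is in progress. A minimal sketch of that check follows, assuming a script has the alert IDs from the Alert Log in chronological order. The function names are hypothetical, and treating alerts 2177 (Learn cycle completed) and 2178 (Learn cycle timed out) as the end of a cycle is an assumption based on their descriptions in this table.

```python
LEARN_STARTED = 2176         # "The controller battery Learn cycle has started."
LEARN_ENDED = {2177, 2178}   # completed, timed out (assumed to end the cycle)


def learn_cycle_in_progress(event_ids):
    """Return True if the most recent Learn cycle event is a start."""
    in_progress = False
    for event_id in event_ids:
        if event_id == LEARN_STARTED:
            in_progress = True
        elif event_id in LEARN_ENDED:
            in_progress = False
    return in_progress


def advice_for_low_charge(event_ids):
    """Suggest the documented action for alert 2278 given earlier alerts."""
    if learn_cycle_in_progress(event_ids):
        return "No action: discharge is expected during the Learn cycle."
    return "Replace the battery pack."


print(advice_for_low_charge([2176, 2278]))
```

The battery also displays the Learn state while a Learn cycle is in progress, so checking that state directly is an alternative to scanning the log.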

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2282 Hot spare SMART

polling failed.

2283 A redundant path is

broken.

2284 A redundant path

has been restored.

2285 A disk media error

was corrected during recovery.

2286 A Learn cycle start is

pending while the battery charges.

2287 The Patrol Read is

paused.

2288 The patrol read has

resumed.

Critical / Failure / Error

Warning / Non-critical

Ok / Normal Cause: This alert is for informational purposes.

Cause: The controller firmware attempted a

SMART polling on the hot spare but was

unable to complete it. The controller has lost

communication with the hot spare.

Action: Check the health of the disk assigned

as a hot spare. You may need to replace the

disk and reassign the hot spare. Make sure the

cables are attached securely. See the Cables

Attached Correctly section in the

Dell OpenManage Server Administrator

Storage Management User’s Guide for more

information on checking the cables.

Cause: The controller has two connectors

that are connected to the same enclosure.

The communication path on one connector

has lost connection with the enclosure. The

communication path on the other connector

is reporting this loss.

Action: Make sure the cables are attached

securely. Make sure both EMMs are healthy.

Action: None

None 904

2284 903

Clear event

None 901

None 1151

2288 751

Clear event

SNMP Trap Numbers

901

751
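The Clear Event Number column pairs an alert with the later alert that clears it. The sketch below tracks which alerts are still outstanding, using only the two pairs read from this page (2283 cleared by 2284, and 2287 cleared by 2288). The function name and the list-of-IDs input are hypothetical; this is not a description of how Storage Management processes clear events internally.

```python
# Alert ID -> clear event ID, as read from the Clear Event Number column.
CLEAR_EVENT_FOR = {
    2283: 2284,  # redundant path broken -> redundant path restored
    2287: 2288,  # Patrol Read paused -> Patrol Read resumed
}


def outstanding_alerts(event_ids):
    """Return alert IDs still awaiting their clear event, in arrival order."""
    alert_for_clear = {clear: alert for alert, clear in CLEAR_EVENT_FOR.items()}
    active = []
    for event_id in event_ids:
        if event_id in CLEAR_EVENT_FOR and event_id not in active:
            active.append(event_id)
        elif event_id in alert_for_clear and alert_for_clear[event_id] in active:
            active.remove(alert_for_clear[event_id])
    return active


print(outstanding_alerts([2283, 2287, 2288]))
```

For the sample sequence this prints [2283]: the Patrol Read pause was cleared by 2288, while the broken redundant path is still waiting for 2284.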

Table 4-4. Storage Management Messages (continued)

Event ID  Description  Severity  Cause and Action  Clear Event Number

2289 Multi-bit ECC error. Critical / Failure / Error

2290 Single-bit ECC error. Warning / Non-critical

2291 An EMM has been

discovered.

2292 Communication

with the enclosure has been lost.

Ok / Normal Cause: This alert is for informational purposes.

Critical / Failure / Error

Cause: An error involving multiple bits has

been encountered during a read or write

operation. The error correction algorithm

recalculates parity data during read and write

operations. If an error involves only a single

bit, it may be possible for the error correction

algorithm to correct the error and maintain

parity data. An error involving multiple bits,

however, usually indicates data loss. In some

cases, if the multi-bit error occurs during a

read operation, the data on the disk may be

correct/valid. If the multi-bit error occurs

during a write operation, data loss has

occurred.

Action: Replace the dual in-line memory

module (DIMM). The DIMM is a part of the

controller battery pack. See your hardware

documentation for information on replacing

the DIMM. You may need to restore data

from backup.

Cause: An error involving a single bit has

been encountered during a read or write

operation. The error correction algorithm has

corrected this error.

Action: None

Cause: The controller has lost communication

with an EMM. The cables may be loose or

defective.

Action: Make sure the cables are attached

securely. Reboot the system.

None 754

None 753

None 851

2162 854

SNMP Trap Numbers
