Reproduction in any manner whatsoever without the written permission of Dell Inc. is strictly forbidden.
Trademarks used in this text: Dell, the DELL logo and Dell OpenManage are trademarks of Dell Inc.; Microsoft and Windows are registered
trademarks and Windows Server is a trademark of Microsoft Corporation; Red Hat is a registered trademark of Red Hat, Inc.; SUSE is a
registered trademark of Novell, Inc. in the United States and other countries.
Other trademarks and trade names may be used in this document to refer to either the entities claiming the marks and names or their products.
Dell Inc. disclaims any proprietary interest in trademarks and trade names other than its own.
Dell OpenManage™ Server Administrator produces event messages stored primarily in the operating system or Server Administrator event logs and sometimes in SNMP traps. This document describes the event messages created by Server Administrator version 5.2 or later and displayed in the Server Administrator Alert log.
Server Administrator creates events in response to sensor status changes and other monitored parameters. The Server Administrator event monitor uses these status change events to add descriptive messages to the operating system event log or the Server Administrator Alert log.
Each event message that Server Administrator adds to the Alert log consists of a unique identifier called the event ID for a specific event source category and a descriptive message. The event message includes the severity, cause of the event, and other relevant information, such as the event location and the monitored item’s previous state.
Tables provided in this guide list all Server Administrator event IDs in numeric order. Each entry includes the event ID’s corresponding description, severity level, and cause. Message text in angle brackets (for example, <State>) describes the event-specific information provided by the Server Administrator.
What’s New in this Release
Modifications have been made to the Storage Management Service events. For more information, see "Alert Message Change History".
Messages Not Described in This Guide
This guide describes only event messages created by Server Administrator and displayed in the
Server Administrator Alert log. For information on other messages produced by your system, consult
one of the following sources:
• Your system’s Installation and Troubleshooting Guide
• Other system documentation
• Operating system documentation
• Application program documentation
Understanding Event Messages
This section describes the various types of event messages generated by the Server Administrator. When an event occurs on your system, the Server Administrator sends information about one of the following event types to the systems management console:
Table 1-1. Understanding Event Messages

Alert Severity: OK/Normal
Component Status: An event that describes the successful operation of a unit. The alert is provided for informational purposes and does not indicate an error condition. For example, the alert may indicate the normal start or stop of an operation, such as power supply or a sensor reading returning to normal.

Alert Severity: Warning/Non-critical
Component Status: An event that is not necessarily significant, but may indicate a possible future problem. For example, a Warning/Non-critical alert may indicate that a component (such as a temperature probe in an enclosure) has crossed a warning threshold.

Alert Severity: Critical/Failure/Error
Component Status: A significant event that indicates actual or imminent loss of data or loss of function. For example, crossing a failure threshold or a hardware failure such as an array disk.

Server Administrator generates events based on status changes in the following sensors:
• Temperature Sensor: Helps protect critical components by alerting the systems management console when temperatures become too high inside a chassis; also monitors a variety of locations in the chassis and in any attached systems.
• Fan Sensor: Monitors fans in various locations in the chassis and in any attached systems.
• Voltage Sensor: Monitors voltages across critical components in various chassis locations and in any attached systems.
• Current Sensor: Monitors the current (or amperage) output from the power supply (or supplies) in the chassis and in any attached systems.
• Chassis Intrusion Sensor: Monitors intrusion into the chassis and any attached systems.
• Redundancy Unit Sensor: Monitors redundant units (critical units such as fans, AC power cords, or power supplies) within the chassis; also monitors the chassis and any attached systems. For example, redundancy allows a second or nth fan to keep the chassis components at a safe temperature when another fan has failed. Redundancy is normal when the intended number of critical components are operating. Redundancy is degraded when a component fails, but others are still operating. Redundancy is lost when there is one less critical redundancy device than required.
• Power Supply Sensor: Monitors power supplies in the chassis and in any attached systems.
• Memory Prefailure Sensor: Monitors memory modules by counting the number of Error Correction Code (ECC) memory corrections.
• Fan Enclosure Sensor: Monitors protective fan enclosures by detecting their removal from and insertion into the system, and by measuring how long a fan enclosure is absent from the chassis. This sensor monitors the chassis and any attached systems.
• AC Power Cord Sensor: Monitors the presence of AC power for an AC power cord.
• Hardware Log Sensor: Monitors the size of a hardware log.
• Processor Sensor: Monitors the processor status in the system.
• Pluggable Device Sensor: Monitors the addition, removal, or configuration errors for some pluggable devices, such as memory cards.
• Battery Sensor: Monitors the status of one or more batteries in the system.
Sample Event Message Text
The following example shows the format of the event messages logged by Server Administrator.
EventID: 1000
Source: Server Administrator
Category: Instrumentation Service
Type: Information
Date and Time: Mon Oct 21 10:38:00 2002
Computer: <computer name>
Description: Server Administrator starting
Data: Bytes in Hex
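As an illustration of how such a record could be consumed by a script, the following minimal sketch splits an event record into named fields. It assumes each field sits on one line in "Key: value" form; the function name parse_event_record and the SAMPLE text are hypothetical and are not part of Server Administrator.

```python
def parse_event_record(text):
    """Split 'Key: value' lines of an event record into a dictionary."""
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first colon only, so values such as timestamps
        # ("10:38:00") keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    return record


# Hypothetical input laid out like the sample event message.
SAMPLE = """\
EventID: 1000
Source: Server Administrator
Category: Instrumentation Service
Type: Information
Date and Time: Mon Oct 21 10:38:00 2002
Computer: <computer name>
Description: Server Administrator starting
Data: Bytes in Hex
"""

record = parse_event_record(SAMPLE)
print(record["EventID"], record["Type"], record["Description"])
```

Splitting on the first colon only is a deliberate choice here, because the Date and Time value itself contains colons.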
Viewing Alerts and Event Messages
An event log is used to record information about important events.
Server Administrator generates alerts that are added to the operating system event log and to the Server Administrator Alert log. To view these alerts in Server Administrator:
1 Select the System object in the tree view.
2 Select the Logs tab.
3 Select the Alert subtab.
You can also view the event log using your operating system’s event viewer. Each operating system’s event viewer accesses the applicable operating system event log.
The location of the event log file depends on the operating system you are using.
• In the Microsoft® Windows® 2000 Advanced Server and Windows Server™ 2003 operating systems, messages are logged to the system event log and optionally to a unicode text file, dcsys32.log (viewable using Notepad), that is located in the install_path\omsa\log directory. The default install_path is C:\Program Files\Dell\SysMgt.
• In the Red Hat® Enterprise Linux and SUSE® Linux Enterprise Server operating systems, messages are logged to the system log file. The default name of the system log file is /var/log/messages. You can view the messages file using a text editor such as vi or emacs.
NOTE: Logging messages to a unicode text file is optional. By default, the feature is disabled. To enable this feature, modify the Event Manager section of the dcemdy32.ini file as follows:
• In Windows, locate the file at <install_path>\dataeng\ini and set UnitextLog.enabled=True. The default install_path is C:\Program Files\Dell\SysMgt. Restart the DSM SA Event Manager service.
• In Red Hat Enterprise Linux and SUSE Linux Enterprise Server, locate the file at <install_path>/dataeng/ini and set UnitextLog.enabled=True. The default install_path is /opt/dell/srvadmin. Issue the "/etc/init.d/dataeng restart" command to restart the Server Administrator event manager service. This will also restart the Server Administrator data manager and SNMP services.
The following subsections explain how to open the Windows 2000 Advanced Server, Windows Server 2003,
and the Red Hat Enterprise Linux and SUSE Linux Enterprise Server event viewers.
Viewing Events in Windows 2000 Advanced Server and Windows Server 2003
1 Click the Start button, point to Settings, and click Control Panel.
2 Double-click Administrative Tools, and then double-click Event Viewer.
3 In the Event Viewer window, click the Tree tab and then click System Log.
The System Log window displays a list of recently logged events.
4 To view the details of an event, double-click one of the event items.
NOTE: You can also look up the dcsys32.log file, in the install_path\omsa\log directory, to view the separate
event log file. The default install_path is C:\Program Files\Dell\SysMgt.
Viewing Events in Red Hat Enterprise Linux and SUSE Linux Enterprise Server
1 Log in as root.
2 Use a text editor such as vi or emacs to view the file named /var/log/messages.
The following example shows the Red Hat Enterprise Linux (and SUSE Linux Enterprise Server) message log, /var/log/messages. In each entry, the text that follows the event ID is the message text.
NOTE: These messages are typically displayed as one long line. In the following example, the message is displayed using line breaks to help you see the message text more clearly.
...
Feb 6 14:20:51 server01 Server Administrator: Instrumentation Service
EventID: 1000
Server Administrator starting
Feb 6 14:20:51 server01 Server Administrator: Instrumentation Service
EventID: 1001
Server Administrator startup complete
Feb 6 14:21:21 server01 Server Administrator: Instrumentation Service
EventID: 1254 Chassis intrusion detected Sensor location: Main chassis
intrusion Chassis location: Main System Chassis Previous state was: OK
(Normal) Chassis intrusion state: Open
Feb 6 14:21:51 server01 Server Administrator: Instrumentation Service
EventID: 1252 Chassis intrusion returned to normal Sensor location: Main
chassis intrusion Chassis location: Main System Chassis Previous state
was: Critical (Failed) Chassis intrusion state: Closed
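For readers who want to filter these entries with a script, the following minimal sketch extracts the event ID and message text from one log line. It assumes each entry is a single line laid out as in the example above (timestamp, host name, "Server Administrator:", category, "EventID:", message). The regular expression and the name parse_line are illustrative assumptions, not a documented log format specification.

```python
import re

# Pattern inferred from the example log entries above (an assumption).
LINE_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+Server Administrator:\s+"
    r"(?P<category>.+?)\s+EventID:\s+(?P<event_id>\d+)\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of a Server Administrator log line, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # Not a Server Administrator entry in the assumed layout.
    fields = match.groupdict()
    fields["event_id"] = int(fields["event_id"])
    return fields


entry = parse_line(
    "Feb 6 14:20:51 server01 Server Administrator: Instrumentation Service "
    "EventID: 1000 Server Administrator starting"
)
print(entry["event_id"], entry["message"])
```

Lines written by other programs do not match the pattern and yield None, so the function can be applied to every line of the messages file.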
Viewing the Event Information
The event log for each operating system contains some or all of the following information:
• Date: The date the event occurred.
• Time: The local time the event occurred.
• Type: A classification of the event severity: Information, Warning, or Error.
• User: The name of the user on whose behalf the event occurred.
• Computer: The name of the system where the event occurred.
• Source: The software that logged the event.
• Category: The classification of the event by the event source.
• Event ID: The number identifying the particular event type.
• Description: A description of the event. The format and contents of the event description vary, depending on the event type.
Understanding the Event Description
Table 1-2 lists in alphabetical order each line item that may appear in the event description.
Table 1-2. Event Description Reference

Line item: Action performed was: <Action>
Explanation: Specifies the action that was performed, for example:
    Action performed was: Power cycle

Line item: Action requested was: <Action>
Explanation: Specifies the action that was requested, for example:
    Action requested was: Reboot, shutdown OS first

Line item: Additional Details: <Additional details for the event>
Explanation: Specifies additional details available for the hot plug event, for example:
    Memory device: DIMM1_A Serial number: FFFF30B1

Line item: <Additional power supply status information>
Explanation: Specifies information pertaining to the event, for example:
    Power supply input AC is off, Power supply POK (power OK) signal is not normal, Power supply is turned off

Line item: Chassis intrusion state: <Intrusion state>
Explanation: Specifies the chassis intrusion state (open or closed), for example:
    Chassis intrusion state: Open

Line item: Chassis location: <Name of chassis>
Explanation: Specifies the name of the chassis that generated the message, for example:
    Chassis location: Main System Chassis

Line item: Configuration error type: <type of configuration error>
Explanation: Specifies the type of configuration error that occurred, for example:
    Configuration error type: Revision mismatch

Line item: Current sensor value (in Amps): <Reading>
Explanation: Specifies the current sensor value in amps, for example:
    Current sensor value (in Amps): 7.853

Line item: Date and time of action: <Date and time>
Explanation: Specifies the date and time the action was performed, for example:
    Date and time of action: Sat Jun 12 16:20:33 2004

Line item: Device location: <Location in chassis>
Explanation: Specifies the location of the device in the specified chassis, for example:
    Device location: Memory Card A

Line item: Discrete current state: <State>
Explanation: Specifies the state of the current sensor, for example:
    Discrete current state: Good

Line item: Discrete temperature state: <State>
Explanation: Specifies the state of the temperature sensor.

Line item: Redundancy unit: <Redundancy location in chassis>
Explanation: Specifies the location of the redundant power supply or cooling unit in the chassis, for example:
    Redundancy unit: Fan Enclosure

Line item: Sensor location: <Location in chassis>
Explanation: Specifies the location of the sensor in the specified chassis, for example:
    Sensor location: CPU1

Line item: Temperature sensor value (in degrees Celsius): <Reading>
Explanation: Specifies the temperature in degrees Celsius, for example:
    Temperature sensor value (in degrees Celsius): 30

Line item: Voltage sensor value (in Volts): <Reading>
Explanation: Specifies the voltage sensor value in volts, for example:
    Voltage sensor value (in Volts): 1.693
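Because the line items of an event description are often logged as one long line, a script may need to split them apart again. The following sketch shows one possible approach under stated assumptions: the LABELS list is a hand-picked subset of the line items in Table 1-2 and the sensor tables, and split_description is a hypothetical helper, not a Server Administrator interface.

```python
import re

# Subset of line-item labels taken from this guide; extend as needed.
LABELS = [
    "Sensor location",
    "Chassis location",
    "Previous state was",
    "Chassis intrusion state",
    "Temperature sensor value (in degrees Celsius)",
    "Voltage sensor value (in Volts)",
    "Current sensor value (in Amps)",
    "Discrete temperature state",
    "Discrete voltage state",
    "Discrete current state",
    "Redundancy unit",
    "Device location",
    "Action performed was",
    "Action requested was",
    "Date and time of action",
    "Configuration error type",
]

# Longest labels first so that a longer label is never shadowed by a shorter one.
LABEL_RE = re.compile(
    "(" + "|".join(re.escape(label) for label in sorted(LABELS, key=len, reverse=True)) + "):"
)


def split_description(text):
    """Return (summary, fields) for a one-line event description."""
    matches = list(LABEL_RE.finditer(text))
    summary = text[: matches[0].start()].strip() if matches else text.strip()
    fields = {}
    for index, match in enumerate(matches):
        # A value runs until the next known label or the end of the text.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        fields[match.group(1)] = text[match.end():end].strip()
    return summary, fields


summary, fields = split_description(
    "Chassis intrusion detected Sensor location: Main chassis intrusion "
    "Chassis location: Main System Chassis Previous state was: OK (Normal) "
    "Chassis intrusion state: Open"
)
print(summary)
for name, value in fields.items():
    print(f"{name} = {value}")
```

Matching only known labels, rather than splitting on every colon, keeps values that contain colons (such as a date and time) intact.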
Event Message Reference
The following tables list, in numerical order, each event ID and its corresponding description, along with its severity and cause.
NOTE: For corrective actions, see the appropriate documentation.
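The sensor tables in this section (Tables 2-2 through 2-6) number their events in blocks of six that start at 1050, 1100, 1150, 1200, and 1250, and the severity within each block follows the same order. The following sketch encodes that observed pattern as a lookup. It is a reading aid derived from those tables only. The names SENSOR_BLOCKS, SEVERITY_BY_OFFSET, and classify_event are hypothetical, and the pattern should not be assumed to hold for other ID ranges.

```python
# Block start -> sensor category, as listed in Tables 2-2 through 2-6.
SENSOR_BLOCKS = {
    1050: "Temperature sensor",
    1100: "Cooling device (fan sensor)",
    1150: "Voltage sensor",
    1200: "Current sensor",
    1250: "Chassis intrusion",
}

# Offset within a block -> severity shown in those tables.
SEVERITY_BY_OFFSET = {
    0: "Information",  # sensor has failed
    1: "Information",  # value unknown
    2: "Information",  # returned to normal
    3: "Warning",      # warning value / intrusion in progress
    4: "Error",        # failure value / intrusion detected
    5: "Error",        # non-recoverable value
}


def classify_event(event_id):
    """Return (category, severity) for a sensor event ID, or None."""
    block_start = event_id - (event_id % 50)
    offset = event_id - block_start
    if block_start in SENSOR_BLOCKS and offset in SEVERITY_BY_OFFSET:
        return SENSOR_BLOCKS[block_start], SEVERITY_BY_OFFSET[offset]
    return None  # Outside the ranges covered by this sketch.


for event_id in (1053, 1254, 1000):
    print(event_id, classify_event(event_id))
```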
Miscellaneous Messages
Miscellaneous messages in Table 2-1 indicate that certain alert systems are up and working.
Table 2-1. Miscellaneous Messages
Event ID: 0000
Description: Log was cleared
Severity: Information
Cause: User cleared the log from Server Administrator.

Event ID: 0001
Description: Log backup created
Severity: Information
Cause: The log was full, copied to backup, and cleared.

Event ID: 1000
Description: Server Administrator starting
Severity: Information
Cause: Server Administrator is beginning to initialize.

Event ID: 1001
Description: Server Administrator startup complete
Severity: Information
Cause: Server Administrator completed its initialization.

Event ID: 1002
Description: A system BIOS update has been scheduled for the next reboot
Severity: Information
Cause: The user has chosen to update the flash basic input/output system (BIOS).

Event ID: 1003
Description: A previously scheduled system BIOS update has been canceled
Severity: Information
Cause: The user decides to cancel the flash BIOS update, or an error occurs during the flash.

Event ID: 1004
Description: Thermal shutdown protection has been initiated
Severity: Error
Cause: This message is generated when a system is configured for thermal shutdown due to an error event. If a temperature sensor reading exceeds the error threshold for which the system is configured, the operating system shuts down and the system powers off. This event may also be initiated on certain systems when a fan enclosure is removed from the system for an extended period of time.

Event ID: 1005
Description: SMBIOS data is absent
Severity: Warning
Cause: The system does not contain the required systems management BIOS version 2.2 or higher, or the BIOS is corrupted.

Event ID: 1006
Description: Automatic System Recovery (ASR) action was performed
  Action performed was: <Action>
  Date and time of action: <Date and time>
Severity: Error
Cause: This message is generated when an automatic system recovery action is performed due to a hung operating system. The action performed and the time of action are provided.

Event ID: 1007
Description: User initiated host system control action
  Action requested was: <Action>
Severity: Information
Cause: User requested a host system control action to reboot, power off, or power cycle the system. Alternatively the user had indicated protective measures to be initiated in the event of a thermal shutdown.

Event ID: 1008
Description: Systems Management Data Manager Started
Severity: Information
Cause: Systems Management Data Manager services were started.

Event ID: 1009
Description: Systems Management Data Manager Stopped
Severity: Information
Cause: Systems Management Data Manager services were stopped.

Event ID: 1011
Description: RCI table is corrupt
Severity: Warning
Cause: This message is generated when the BIOS Remote Configuration Interface (RCI) table is corrupted or cannot be read by the systems management software.

Event ID: 1012
Description: IPMI Status
  Interface: <the IPMI interface being used>, <additional information if available and applicable>
Severity: Information
Cause: This message is generated to indicate the Intelligent Platform Management Interface (IPMI) status of the system. Additional information, when available, includes Baseboard Management Controller (BMC) not present, BMC not responding, System Event Log (SEL) not present, and Sensor Data Record (SDR) not present.
Temperature Sensor Messages
Temperature sensors listed in Table 2-2 help protect critical components by alerting the systems
management console when temperatures become too high inside a chassis. The temperature sensor
messages use additional variables: sensor location, chassis location, previous state, and temperature
sensor value or state.
Table 2-2. Temperature Sensor Messages
Event ID: 1050
Description: Temperature sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Temperature sensor value (in degrees Celsius): <Reading>
  If sensor type is discrete:
  Discrete temperature state: <State>
Severity: Information
Cause: A temperature sensor on the backplane board, system board, or the carrier in the specified system failed. The sensor location, chassis location, previous state, and temperature sensor value are provided.

Event ID: 1051
Description: Temperature sensor value unknown
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  If sensor type is not discrete:
  Temperature sensor value (in degrees Celsius): <Reading>
  If sensor type is discrete:
  Discrete temperature state: <State>
Severity: Information
Cause: A temperature sensor on the backplane board, system board, or drive carrier in the specified system could not obtain a reading. The sensor location, chassis location, previous state, and a nominal temperature sensor value are provided.

Event ID: 1052
Description: Temperature sensor returned to a normal value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Temperature sensor value (in degrees Celsius): <Reading>
  If sensor type is discrete:
  Discrete temperature state: <State>
Severity: Information
Cause: A temperature sensor on the backplane board, system board, or drive carrier in the specified system returned to a valid range after crossing a failure threshold. The sensor location, chassis location, previous state, and temperature sensor value are provided.

Event ID: 1053
Description: Temperature sensor detected a warning value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Temperature sensor value (in degrees Celsius): <Reading>
  If sensor type is discrete:
  Discrete temperature state: <State>
Severity: Warning
Cause: A temperature sensor on the backplane board, system board, CPU, or drive carrier in the specified system exceeded its warning threshold. The sensor location, chassis location, previous state, and temperature sensor value are provided.

Event ID: 1054
Description: Temperature sensor detected a failure value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Temperature sensor value (in degrees Celsius): <Reading>
  If sensor type is discrete:
  Discrete temperature state: <State>
Severity: Error
Cause: A temperature sensor on the backplane board, system board, or drive carrier in the specified system exceeded its failure threshold. The sensor location, chassis location, previous state, and temperature sensor value are provided.

Event ID: 1055
Description: Temperature sensor detected a non-recoverable value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Temperature sensor value (in degrees Celsius): <Reading>
  If sensor type is discrete:
  Discrete temperature state: <State>
Severity: Error
Cause: A temperature sensor on the backplane board, system board, or drive carrier in the specified system detected an error from which it cannot recover. The sensor location, chassis location, previous state, and temperature sensor value are provided.
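To illustrate how threshold crossings relate to the temperature event IDs in Table 2-2, the following sketch maps a series of readings to events. It is a simplified model, not Server Administrator's actual logic. It assumes upper thresholds only, an initial state of normal, and one event per status change; the threshold values and all function names are hypothetical.

```python
# Event IDs from Table 2-2 for the three states modeled here.
TEMPERATURE_EVENT_FOR_STATE = {
    "normal": 1052,   # returned to a normal value
    "warning": 1053,  # detected a warning value
    "failure": 1054,  # detected a failure value
}


def classify_reading(value_c, warning_c, failure_c):
    """Classify a reading against assumed upper thresholds (degrees Celsius)."""
    if value_c > failure_c:
        return "failure"
    if value_c > warning_c:
        return "warning"
    return "normal"


def events_for_readings(readings, warning_c, failure_c):
    """Return (event_id, previous_state, new_state, value) per status change."""
    events = []
    previous = "normal"  # Assumed starting state.
    for value in readings:
        state = classify_reading(value, warning_c, failure_c)
        if state != previous:
            events.append((TEMPERATURE_EVENT_FOR_STATE[state], previous, state, value))
            previous = state
    return events


# Hypothetical thresholds: warning above 45, failure above 50.
for event in events_for_readings([30, 46, 47, 52, 40], warning_c=45, failure_c=50):
    print(event)
```

Readings that stay in the same state (46 followed by 47 in the example) produce no additional event, which mirrors the guide's statement that events are created in response to status changes.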
Cooling Device Messages
Cooling device sensors listed in Table 2-3 monitor how well a fan is functioning. Cooling device messages
provide status and warning information for fans in a particular chassis.
Table 2-3. Cooling Device Messages
Event ID: 1100
Description: Fan sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Fan sensor value: <Reading>
Severity: Information
Cause: A fan sensor in the specified system is not functioning. The sensor location, chassis location, previous state, and fan sensor value are provided.

Event ID: 1101
Description: Fan sensor value unknown
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Fan sensor value: <Reading>
Severity: Information
Cause: A fan sensor in the specified system could not obtain a reading. The sensor location, chassis location, previous state, and a nominal fan sensor value are provided.

Event ID: 1102
Description: Fan sensor returned to a normal value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Fan sensor value: <Reading>
Severity: Information
Cause: A fan sensor reading on the specified system returned to a valid range after crossing a warning threshold. The sensor location, chassis location, previous state, and fan sensor value are provided.

Event ID: 1103
Description: Fan sensor detected a warning value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Fan sensor value: <Reading>
Severity: Warning
Cause: A fan sensor reading in the specified system exceeded a warning threshold. The sensor location, chassis location, previous state, and fan sensor value are provided.

Event ID: 1104
Description: Fan sensor detected a failure value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Fan sensor value: <Reading>
Severity: Error
Cause: A fan sensor in the specified system detected the failure of one or more fans. The sensor location, chassis location, previous state, and fan sensor value are provided.

Event ID: 1105
Description: Fan sensor detected a non-recoverable value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Fan sensor value: <Reading>
Severity: Error
Cause: A fan sensor detected an error from which it cannot recover. The sensor location, chassis location, previous state, and fan sensor value are provided.
Voltage Sensor Messages
Voltage sensors listed in Table 2-4 monitor the number of volts across critical components. Voltage sensor
messages provide status and warning information for voltage sensors in a particular chassis.
Table 2-4. Voltage Sensor Messages
Event ID: 1150
Description: Voltage sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Voltage sensor value (in Volts): <Reading>
  If sensor type is discrete:
  Discrete voltage state: <State>
Severity: Information
Cause: A voltage sensor in the specified system failed. The sensor location, chassis location, previous state, and voltage sensor value are provided.

Event ID: 1151
Description: Voltage sensor value unknown
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Voltage sensor value (in Volts): <Reading>
  If sensor type is discrete:
  Discrete voltage state: <State>
Severity: Information
Cause: A voltage sensor in the specified system could not obtain a reading. The sensor location, chassis location, previous state, and a nominal voltage sensor value are provided.

Event ID: 1152
Description: Voltage sensor returned to a normal value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Voltage sensor value (in Volts): <Reading>
  If sensor type is discrete:
  Discrete voltage state: <State>
Severity: Information
Cause: A voltage sensor in the specified system returned to a valid range after crossing a failure threshold. The sensor location, chassis location, previous state, and voltage sensor value are provided.

Event ID: 1153
Description: Voltage sensor detected a warning value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Voltage sensor value (in Volts): <Reading>
  If sensor type is discrete:
  Discrete voltage state: <State>
Severity: Warning
Cause: A voltage sensor in the specified system exceeded its warning threshold. The sensor location, chassis location, previous state, and voltage sensor value are provided.

Event ID: 1154
Description: Voltage sensor detected a failure value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Voltage sensor value (in Volts): <Reading>
  If sensor type is discrete:
  Discrete voltage state: <State>
Severity: Error
Cause: A voltage sensor in the specified system exceeded its failure threshold. The sensor location, chassis location, previous state, and voltage sensor value are provided.

Event ID: 1155
Description: Voltage sensor detected a non-recoverable value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Voltage sensor value (in Volts): <Reading>
  If sensor type is discrete:
  Discrete voltage state: <State>
Severity: Error
Cause: A voltage sensor in the specified system detected an error from which it cannot recover. The sensor location, chassis location, previous state, and voltage sensor value are provided.
Current Sensor Messages
Current sensors listed in Table 2-5 measure the amount of current (in amperes) that is traversing critical
components. Current sensor messages provide status and warning information for current sensors in a
particular chassis.
Table 2-5. Current Sensor Messages
Event ID: 1200
Description: Current sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Current sensor value (in Amps): <Reading>
  If sensor type is discrete:
  Discrete current state: <State>
Severity: Information
Cause: A current sensor on the power supply for the specified system failed. The sensor location, chassis location, previous state, and current sensor value are provided.

Event ID: 1201
Description: Current sensor value unknown
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Current sensor value (in Amps): <Reading>
  If sensor type is discrete:
  Discrete current state: <State>
Severity: Information
Cause: A current sensor on the power supply for the specified system could not obtain a reading. The sensor location, chassis location, previous state, and a nominal current sensor value are provided.

Event ID: 1202
Description: Current sensor returned to a normal value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Current sensor value (in Amps): <Reading>
  If sensor type is discrete:
  Discrete current state: <State>
Severity: Information
Cause: A current sensor on the power supply for the specified system returned to a valid range after crossing a failure threshold. The sensor location, chassis location, previous state, and current sensor value are provided.

Event ID: 1203
Description: Current sensor detected a warning value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Current sensor value (in Amps): <Reading>
  If sensor type is discrete:
  Discrete current state: <State>
Severity: Warning
Cause: A current sensor on the power supply for the specified system exceeded its warning threshold. The sensor location, chassis location, previous state, and current sensor value are provided.

Event ID: 1204
Description: Current sensor detected a failure value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Current sensor value (in Amps): <Reading>
  If sensor type is discrete:
  Discrete current state: <State>
Severity: Error
Cause: A current sensor on the power supply for the specified system exceeded its failure threshold. The sensor location, chassis location, previous state, and current sensor value are provided.

Event ID: 1205
Description: Current sensor detected a non-recoverable value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  If sensor type is not discrete:
  Current sensor value (in Amps): <Reading>
  If sensor type is discrete:
  Discrete current state: <State>
Severity: Error
Cause: A current sensor in the specified system detected an error from which it cannot recover. The sensor location, chassis location, previous state, and current sensor value are provided.
Chassis Intrusion Messages
Chassis intrusion messages listed in Table 2-6 are a security measure. Chassis intrusion means that
someone is opening the cover to a system’s chassis. Alerts are sent to prevent unauthorized removal of
parts from a chassis.
Table 2-6. Chassis Intrusion Messages
Event ID: 1250
Description: Chassis intrusion sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Chassis intrusion state: <Intrusion state>
Severity: Information
Cause: A chassis intrusion sensor in the specified system failed. The sensor location, chassis
  location, previous state, and chassis intrusion state are provided.

Event ID: 1251
Description: Chassis intrusion sensor value unknown
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Chassis intrusion state: <Intrusion state>
Severity: Information
Cause: A chassis intrusion sensor in the specified system could not obtain a reading. The sensor
  location, chassis location, previous state, and chassis intrusion state are provided.

Event ID: 1252
Description: Chassis intrusion returned to normal
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Chassis intrusion state: <Intrusion state>
Severity: Information
Cause: A chassis intrusion sensor in the specified system detected that a cover was opened while
  the system was operating but has since been replaced. The sensor location, chassis location,
  previous state, and chassis intrusion state are provided.

Event ID: 1253
Description: Chassis intrusion in progress
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Chassis intrusion state: <Intrusion state>
Severity: Warning
Cause: A chassis intrusion sensor in the specified system detected that a system cover is currently
  being opened and the system is operating. The sensor location, chassis location, previous
  state, and chassis intrusion state are provided.

Event ID: 1254
Description: Chassis intrusion detected
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Chassis intrusion state: <Intrusion state>
Severity: Error
Cause: A chassis intrusion sensor in the specified system detected that the system cover was
  opened while the system was operating. The sensor location, chassis location, previous
  state, and chassis intrusion state are provided.

Event ID: 1255
Description: Chassis intrusion sensor detected a non-recoverable value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Chassis intrusion state: <Intrusion state>
Severity: Error
Cause: A chassis intrusion sensor in the specified system detected an error from which it cannot
  recover. The sensor location, chassis location, previous state, and chassis intrusion state are
  provided.
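As an illustration of how a script might consume these alerts, the following minimal sketch splits a chassis intrusion description into its summary line and labeled fields, and looks up the severity from the table above. It assumes the description arrives as plain text with one labeled field per line; the function name parse_alert, the dictionary layout, and the sample field values are hypothetical and are not part of Server Administrator.

```python
# Severities for the chassis intrusion events, copied from the table above.
CHASSIS_INTRUSION_SEVERITY = {
    1250: "Information",
    1251: "Information",
    1252: "Information",
    1253: "Warning",
    1254: "Error",
    1255: "Error",
}


def parse_alert(event_id, text):
    """Split an alert description into a summary line and 'Label: value' fields.

    Assumes one labeled field per line (an assumption of this sketch).
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    summary = lines[0]
    fields = {}
    for line in lines[1:]:
        label, sep, value = line.partition(":")
        if sep:  # keep only lines that actually contain a label
            fields[label.strip()] = value.strip()
    return {
        "event_id": event_id,
        "severity": CHASSIS_INTRUSION_SEVERITY.get(event_id, "Unknown"),
        "summary": summary,
        "fields": fields,
    }


# Sample text with made-up field values, for illustration only.
sample = """Chassis intrusion detected
Sensor location: Main chassis intrusion
Chassis location: Main System Chassis
Previous state was: OK (Normal)
Chassis intrusion state: Open"""

alert = parse_alert(1254, sample)
print(alert["severity"], alert["fields"]["Chassis intrusion state"])
```

Event IDs outside the 1250 to 1255 range fall back to "Unknown" here; a fuller lookup would merge in the severities from the other tables.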
Redundancy Unit Messages
Redundancy means that a system chassis has more than one of certain critical components. Fans and
power supplies, for example, are so important for preventing damage or disruption of a computer system
that a chassis may have “extra” fans or power supplies installed. Redundancy allows a second or nth fan
to keep the chassis components at a safe temperature when the primary fan has failed. Redundancy is
normal when the intended number of critical components are operating. Redundancy is degraded when
a component fails but others are still operating. Redundancy is lost when the number of components
functioning falls below the redundancy threshold.
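The three redundancy states described above can be sketched as a simple comparison of component counts. This is only an illustration: the parameter names (functional, intended, threshold) are hypothetical, and the actual redundancy computation is platform-specific.

```python
def redundancy_state(functional, intended, threshold):
    """Classify a redundancy unit from its component counts.

    functional: components currently operating
    intended:   components expected for full redundancy
    threshold:  minimum operating components for the unit to stay redundant
    All three names are assumptions made for this sketch.
    """
    if functional < threshold:
        return "lost"
    if functional < intended:
        return "degraded"
    return "normal"


# Example: three fans intended, at least two needed to remain redundant.
for working in (3, 2, 1):
    print(working, redundancy_state(working, intended=3, threshold=2))
```

With these example numbers, three working fans report normal, two report degraded, and one reports lost.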
Table 2-7 lists the redundancy unit messages.
The number of devices required for full redundancy is provided as part of the message, when applicable,
for the redundancy unit and the platform. For details on redundancy computation, see the respective
platform documentation.
Table 2-7. Redundancy Unit Messages
Event ID: 1300
Description: Redundancy sensor has failed
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Information
Cause: A redundancy sensor in the specified system failed. The redundancy unit location, chassis
  location, previous redundancy state, and the number of devices required for full
  redundancy are provided.

Event ID: 1301
Description: Redundancy sensor value unknown
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Information
Cause: A redundancy sensor in the specified system could not obtain a reading. The redundancy
  unit location, chassis location, previous redundancy state, and the number of devices
  required for full redundancy are provided.

Event ID: 1302
Description: Redundancy not applicable
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Information
Cause: A redundancy sensor in the specified system detected that a unit was not redundant.
  The redundancy location, chassis location, previous redundancy state, and the number
  of devices required for full redundancy are provided.

Event ID: 1303
Description: Redundancy is offline
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Information
Cause: A redundancy sensor in the specified system detected that a redundant unit is offline.
  The redundancy unit location, chassis location, previous redundancy state, and the
  number of devices required for full redundancy are provided.

Event ID: 1304
Description: Redundancy regained
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Information
Cause: A redundancy sensor in the specified system detected that a “lost” redundancy device has
  been reconnected or replaced; full redundancy is in effect. The redundancy unit location,
  chassis location, previous redundancy state, and the number of devices required for full
  redundancy are provided.

Event ID: 1305
Description: Redundancy degraded
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Warning
Cause: A redundancy sensor in the specified system detected that one of the components of the
  redundancy unit has failed but the unit is still redundant. The redundancy unit
  location, chassis location, previous redundancy state, and the number of devices required
  for full redundancy are provided.

Event ID: 1306
Description: Redundancy lost
  Redundancy unit: <Redundancy location in chassis>
  Chassis location: <Name of chassis>
  Previous redundancy state was: <State>
Severity: Warning or Error (depending on the number of units that are functional)
Cause: A redundancy sensor in the specified system detected that one of the components in the
  redundant unit has been disconnected, has failed, or is not present. The redundancy
  unit location, chassis location, previous redundancy state, and the number of devices
  required for full redundancy are provided.
Power Supply Messages
Power supply sensors monitor how well a power supply is functioning. Power supply messages listed in
Table 2-8 provide status and warning information for power supplies present in a particular chassis.
Table 2-8. Power Supply Messages
Event ID: 1350
Description: Power supply sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Power Supply type: <type of power supply>
  <Additional power supply status information>
  If in configuration error state:
  Configuration error type: <type of configuration error>
Severity: Information
Cause: A power supply sensor in the specified system failed. The sensor location, chassis
  location, previous state, and additional power supply status information are provided.

Event ID: 1351
Description: Power supply sensor value unknown
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Power Supply type: <type of power supply>
  <Additional power supply status information>
  If in configuration error state:
  Configuration error type: <type of configuration error>
Severity: Information
Cause: A power supply sensor in the specified system could not obtain a reading. The sensor
  location, chassis location, previous state, and additional power supply status information
  are provided.

Event ID: 1352
Description: Power supply returned to normal
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Power Supply type: <type of power supply>
  <Additional power supply status information>
  If in configuration error state:
  Configuration error type: <type of configuration error>
Severity: Information
Cause: A power supply has been reconnected or replaced. The sensor location, chassis
  location, previous state, and additional power supply status information are provided.

Event ID: 1353
Description: Power supply detected a warning
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Power Supply type: <type of power supply>
  <Additional power supply status information>
  If in configuration error state:
  Configuration error type: <type of configuration error>
Severity: Warning
Cause: A power supply sensor reading in the specified system exceeded a user-definable
  warning threshold. The sensor location, chassis location, previous state, and
  additional power supply status information are provided.

Event ID: 1354
Description: Power supply detected a failure
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Power Supply type: <type of power supply>
  <Additional power supply status information>
  If in configuration error state:
  Configuration error type: <type of configuration error>
Severity: Error
Cause: A power supply has been disconnected or has failed. The sensor location, chassis
  location, previous state, and additional power supply status information are provided.

Event ID: 1355
Description: Power supply sensor detected a non-recoverable value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Power Supply type: <type of power supply>
  <Additional power supply status information>
  If in configuration error state:
  Configuration error type: <type of configuration error>
Severity: Error
Cause: A power supply sensor in the specified system detected an error from which it cannot
  recover. The sensor location, chassis location, previous state, and additional power supply
  status information are provided.
Memory Device Messages
Memory device messages listed in Table 2-9 provide status and warning information for memory
modules present in a particular system. Memory devices determine health status by monitoring the ECC
memory correction rate and the type of memory events that have occurred.
NOTE: A critical status does not always indicate a system failure or loss of data. In some instances, the system has
exceeded the ECC correction rate. Although the system continues to function, you should perform system
maintenance as described in Table 2-9.
NOTE: In Table 2-9, <status> can be either critical or non-critical.
Table 2-9. Memory Device Messages
Event ID: 1403
Description: Memory device status is <status>
  Memory device location: <location in chassis>
  Possible memory module event cause: <list of causes>
Severity: Warning
Cause: A memory device correction rate exceeded an acceptable value. The memory device status
  and location are provided.

Event ID: 1404
Description: Memory device status is <status>
  Memory device location: <location in chassis>
  Possible memory module event cause: <list of causes>
Severity: Error
Cause: A memory device correction rate exceeded an acceptable value, a memory spare bank was
  activated, or a multibit ECC error occurred. The system continues to function normally
  (except for a multibit error). Replace the memory module identified in the message during
  the system’s next scheduled maintenance. Clear the memory error on multibit ECC error.
  The memory device status and location are provided.
Fan Enclosure Messages
Some systems are equipped with a protective enclosure for fans. Fan enclosure messages listed in
Table 2-10 monitor whether foreign objects are present in an enclosure and how long a fan enclosure is
missing from a chassis.
Table 2-10. Fan Enclosure Messages
Event ID: 1450
Description: Fan enclosure sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Information
Cause: The fan enclosure sensor in the specified system failed. The sensor location and chassis
  location are provided.

Event ID: 1451
Description: Fan enclosure sensor value unknown
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Information
Cause: The fan enclosure sensor in the specified system could not obtain a reading. The sensor
  location and chassis location are provided.

Event ID: 1452
Description: Fan enclosure inserted into system
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Information
Cause: A fan enclosure has been inserted into the specified system. The sensor location and
  chassis location are provided.

Event ID: 1453
Description: Fan enclosure removed from system
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Warning
Cause: A fan enclosure has been removed from the specified system. The sensor location and
  chassis location are provided.

Event ID: 1454
Description: Fan enclosure removed from system for an extended amount of time
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Error
Cause: A fan enclosure has been removed from the specified system for a user-definable length of
  time. The sensor location and chassis location are provided.

Event ID: 1455
Description: Fan enclosure sensor detected a non-recoverable value
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Error
Cause: A fan enclosure sensor in the specified system detected an error from which it cannot recover.
  The sensor location and chassis location are provided.
AC Power Cord Messages
AC power cord messages listed in Table 2-11 provide status and warning information for power cords that
are part of an AC power switch, if your system supports AC switching.
Table 2-11. AC Power Cord Messages
Event ID: 1500
Description: AC power cord sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Information
Cause: An AC power cord sensor in the specified system failed. The AC power cord status
  cannot be monitored. The sensor location and chassis location information are provided.

Event ID: 1501
Description: AC power cord is not being monitored
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Information
Cause: The AC power cord status is not being monitored. This occurs when a system’s
  expected AC power configuration is set to nonredundant. The sensor location and
  chassis location information are provided.

Event ID: 1502
Description: AC power has been restored
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Information
Cause: An AC power cord that did not have AC power has had the power restored.
  The sensor location and chassis location information are provided.

Event ID: 1503
Description: AC power has been lost
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Warning
Cause: An AC power cord has lost its power, but there is sufficient redundancy to classify
  this as a warning. The sensor location and chassis location information are provided.

Event ID: 1504
Description: AC power has been lost
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Error
Cause: An AC power cord has lost its power, and lack of redundancy requires this to be
  classified as an error. The sensor location and chassis location information are provided.

Event ID: 1505
Description: AC power has been lost
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
Severity: Error
Cause: An AC power cord sensor in the specified system failed. The AC power cord status
  cannot be monitored. The sensor location and chassis location information are provided.
Hardware Log Sensor Messages
Hardware logs provide hardware status messages to systems management software. On certain systems,
the hardware log is implemented as a circular queue. When the log becomes full, the oldest status
messages are overwritten when new status messages are logged. On some systems, the log is not circular.
On these systems, when the log becomes full, subsequent hardware status messages are lost. Hardware
log sensor messages listed in Table 2-12 provide status and warning information about the noncircular
logs that may fill up, resulting in lost status messages.
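The difference between the two log behaviors described above can be sketched as follows. This is an illustrative model only, not how any particular hardware log is implemented; the class names and the capacity of three entries are assumptions chosen for the example.

```python
from collections import deque


class CircularLog:
    """A full log overwrites its oldest entry."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest item when a new one is appended.
        self.entries = deque(maxlen=capacity)

    def append(self, message):
        self.entries.append(message)
        return True


class NonCircularLog:
    """A full log drops new entries until it is cleared."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []

    def append(self, message):
        if len(self.entries) >= self.capacity:
            return False  # the new status message is lost
        self.entries.append(message)
        return True


circular, noncircular = CircularLog(3), NonCircularLog(3)
for n in range(1, 6):
    circular.append(f"event {n}")
    noncircular.append(f"event {n}")

print(list(circular.entries))   # ['event 3', 'event 4', 'event 5']
print(noncircular.entries)      # ['event 1', 'event 2', 'event 3']
```

The noncircular case is the one the hardware log sensor messages warn about: once the log is full, every later message is discarded until the log is cleared.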
Table 2-12. Hardware Log Sensor Messages
Event ID: 1550
Description: Log monitoring has been disabled
  Log type: <Log type>
Severity: Information
Cause: A hardware log sensor in the specified system is disabled. The log type information
  is provided.

Event ID: 1551
Description: Log status is unknown
  Log type: <Log type>
Severity: Information
Cause: A hardware log sensor in the specified system could not obtain a reading. The log
  type information is provided.

Event ID: 1552
Description: Log size is no longer near or at capacity
  Log type: <Log type>
Severity: Information
Cause: The hardware log on the specified system is no longer near or at its capacity, usually as
  the result of clearing the log. The log type information is provided.

Event ID: 1553
Description: Log size is near or at capacity
  Log type: <Log type>
Severity: Warning
Cause: The size of a hardware log on the specified system is near or at the capacity of the
  hardware log. The log type information is provided.

Event ID: 1554
Description: Log size is full
  Log type: <Log type>
Severity: Error
Cause: The size of a hardware log on the specified system is full. The log type information is
  provided.

Event ID: 1555
Description: Log sensor has failed
  Log type: <Log type>
Severity: Error
Cause: A hardware log sensor in the specified system failed. The hardware log status
  cannot be monitored. The log type information is provided.
Processor Sensor Messages
Processor sensors monitor how well a processor is functioning. Processor messages listed in Table 2-13
provide status and warning information for processors in a particular chassis.
Table 2-13. Processor Sensor Messages
Event ID: 1600
Description: Processor sensor has failed
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Processor sensor status: <status>
Severity: Information
Cause: A processor sensor in the specified system is not functioning. The sensor location, chassis
  location, previous state and processor sensor status are provided.

Event ID: 1601
Description: Processor sensor value unknown
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Processor sensor status: <status>
Severity: Information
Cause: A processor sensor in the specified system could not obtain a reading. The sensor
  location, chassis location, previous state and processor sensor status are provided.

Event ID: 1602
Description: Processor sensor returned to a normal value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Processor sensor status: <status>
Severity: Information
Cause: A processor sensor in the specified system transitioned back to a normal state.
  The sensor location, chassis location, previous state and processor sensor status
  are provided.

Event ID: 1603
Description: Processor sensor detected a warning value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Processor sensor status: <status>
Severity: Warning
Cause: A processor sensor in the specified system is in a throttled state. The sensor location,
  chassis location, previous state and processor sensor status are provided.

Event ID: 1604
Description: Processor sensor detected a failure value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Processor sensor status: <status>
Severity: Error
Cause: A processor sensor in the specified system is disabled, has a configuration error, or
  experienced a thermal trip. The sensor location, chassis location, previous state and
  processor sensor status are provided.

Event ID: 1605
Description: Processor sensor detected a non-recoverable value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Processor sensor status: <status>
Severity: Error
Cause: A processor sensor in the specified system has failed. The sensor location, chassis
  location, previous state and processor sensor status are provided.
Pluggable Device Messages
The pluggable device messages listed in Table 2-14 provide status and error information when some
devices, such as memory cards, are added or removed.
Table 2-14. Pluggable Device Messages
Event ID: 1650
Description: <Device plug event type unknown>
  Device location: <Location in chassis, if available>
  Chassis location: <Name of chassis, if available>
  Additional details: <Additional details for the events, if available>
Severity: Information
Cause: A pluggable device event message of unknown type was received. The device location,
  chassis location, and additional event details, if available, are provided.

Event ID: 1651
Description: Device added to system
  Device location: <Location in chassis>
  Chassis location: <Name of chassis>
  Additional details: <Additional details for the events>
Severity: Information
Cause: A device was added in the specified system. The device location, chassis location, and
  additional event details, if available, are provided.

Event ID: 1652
Description: Device removed from system
  Device location: <Location in chassis>
  Chassis location: <Name of chassis>
  Additional details: <Additional details for the events>
Severity: Information
Cause: A device was removed from the specified system. The device location, chassis
  location, and additional event details, if available, are provided.

Event ID: 1653
Description: Device configuration error detected
  Device location: <Location in chassis>
  Chassis location: <Name of chassis>
  Additional details: <Additional details for the events>
Severity: Error
Cause: A configuration error was detected for a pluggable device in the specified
  system. The device may have been added to the system incorrectly.
Battery Sensor Messages
Battery sensors monitor how well a battery is functioning. Battery messages listed in Table 2-15 provide
status and warning information for batteries in a particular chassis.
Table 2-15. Battery Sensor Messages
Event ID: 1700
Description: Battery sensor has failed
  Sensor location: <Location in chassis>
  Chassis location: <Name of chassis>
  Previous state was: <State>
  Battery sensor status: <status>
Severity: Information
Cause: A battery sensor in the specified system is not functioning. The sensor location, chassis
  location, previous state, and battery sensor status are provided.

Event ID: 1701
Description: Battery sensor value unknown
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Battery sensor status: <status>
Severity: Information
Cause: A battery sensor in the specified system could not retrieve a reading. The sensor
  location, chassis location, previous state, and battery sensor status are provided.

Event ID: 1702
Description: Battery sensor returned to a normal value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Battery sensor status: <status>
Severity: Information
Cause: A battery sensor in the specified system detected that a battery transitioned back to a
  normal state. The sensor location, chassis location, previous state, and battery sensor
  status are provided.

Event ID: 1703
Description: Battery sensor detected a warning value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Battery sensor status: <status>
Severity: Warning
Cause: A battery sensor in the specified system detected that a battery is in a predictive
  failure state. The sensor location, chassis location, previous state, and battery sensor
  status are provided.

Event ID: 1704
Description: Battery sensor detected a failure value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Battery sensor status: <status>
Severity: Error
Cause: A battery sensor in the specified system detected that a battery has failed.
  The sensor location, chassis location, previous state, and battery sensor status are
  provided.

Event ID: 1705
Description: Battery sensor detected a non-recoverable value
  Sensor Location: <Location in chassis>
  Chassis Location: <Name of chassis>
  Previous state was: <State>
  Battery sensor status: <status>
Severity: Error
Cause: A battery sensor in the specified system detected that a battery has failed.
  The sensor location, chassis location, previous state, and battery sensor status are
  provided.
System Event Log Messages for IPMI Systems
The following tables list the system event log (SEL) messages, their severity, and cause.
NOTE: For corrective actions, see the appropriate documentation.
Temperature Sensor Events
The temperature sensor event messages help protect critical components by alerting the systems
management console when the temperature rises inside the chassis. These event messages use
additional variables, such as sensor location, chassis location, previous state, and temperature
sensor value or state.
Table 3-1. Temperature Sensor Events
Event Message: <Sensor Name/Location> temperature sensor detected a failure <Reading>
  where <Sensor Name/Location> is the entity that this sensor is monitoring. For example,
  "PROC Temp" or "Planar Temp." Reading is specified in degrees Celsius. For example, 100 C.
Severity: Critical
Cause: Temperature of the backplane board, system board, or the carrier in the specified system
  <Sensor Name/Location> exceeded the critical threshold.

Event Message: <Sensor Name/Location> temperature sensor detected a warning <Reading>.
Severity: Warning
Cause: Temperature of the backplane board, system board, or the carrier in the specified system
  <Sensor Name/Location> exceeded the non-critical threshold.

Event Message: <Sensor Name/Location> temperature sensor returned to warning state <Reading>.
Severity: Warning
Cause: Temperature of the backplane board, system board, or the carrier in the specified system
  <Sensor Name/Location> returned from critical state to non-critical state.

Event Message: <Sensor Name/Location> temperature sensor returned to normal state <Reading>.
Severity: Information
Cause: Temperature of the backplane board, system board, or the carrier in the specified system
  <Sensor Name/Location> returned to normal operating range.
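The relationship between threshold crossings and the four messages in Table 3-1 can be sketched as a small state-transition lookup. This is a hedged illustration only: the threshold values, the function names, and the exact way the reading is appended to the message are assumptions, and real thresholds are platform-specific.

```python
# Hypothetical thresholds in degrees Celsius; real values depend on the platform.
WARNING_C = 70
CRITICAL_C = 85


def state_for(reading_c):
    """Map a reading to normal, warning (non-critical), or critical."""
    if reading_c > CRITICAL_C:
        return "critical"
    if reading_c > WARNING_C:
        return "warning"
    return "normal"


# (previous state, new state) -> (severity, message phrase), following Table 3-1.
TRANSITIONS = {
    ("normal", "critical"): ("Critical", "detected a failure"),
    ("warning", "critical"): ("Critical", "detected a failure"),
    ("normal", "warning"): ("Warning", "detected a warning"),
    ("critical", "warning"): ("Warning", "returned to warning state"),
    ("warning", "normal"): ("Information", "returned to normal state"),
    ("critical", "normal"): ("Information", "returned to normal state"),
}


def temperature_event(sensor, previous_c, current_c):
    """Return (severity, message) for a state change, or None if the state is unchanged."""
    key = (state_for(previous_c), state_for(current_c))
    if key not in TRANSITIONS:
        return None
    severity, phrase = TRANSITIONS[key]
    return severity, f"{sensor} temperature sensor {phrase} {current_c} C"


print(temperature_event("PROC Temp", 60, 100))
print(temperature_event("PROC Temp", 100, 80))
print(temperature_event("PROC Temp", 80, 60))
```

Note that a reading falling from the critical range into the warning range still produces a Warning, which matches the "returned to warning state" row of the table.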
Voltage Sensor Events
The voltage sensor event messages monitor the number of volts across critical components. These
messages provide status and warning information for voltage sensors for a particular chassis.
Table 3-2. Voltage Sensor Events
Event Message: <Sensor Name/Location> voltage sensor detected a failure <Reading>
  where <Sensor Name/Location> is the entity that this sensor is monitoring.
  Reading is specified in volts. For example, 3.860 V.
Severity: Critical
Cause: The voltage of the monitored device has exceeded the critical threshold.

Event Message: <Sensor Name/Location> voltage sensor state asserted.
Severity: Critical
Cause: The voltage specified by <Sensor Name/Location> is in critical state.

Event Message: <Sensor Name/Location> voltage sensor state de-asserted.
Severity: Information
Cause: The voltage of a previously reported <Sensor Name/Location> is returned to normal state.

Event Message: <Sensor Name/Location> voltage sensor detected a warning <Reading>.
Severity: Warning
Cause: Voltage of the monitored entity <Sensor Name/Location> exceeded the warning threshold.

Event Message: <Sensor Name/Location> voltage sensor returned to normal <Reading>.
Severity: Information
Cause: The voltage of a previously reported <Sensor Name/Location> is returned to normal state.
Fan Sensor Events
The cooling device sensors monitor how well a fan is functioning. These messages provide status, warning,
and failure information for fans in a particular chassis.
Table 3-3. Fan Sensor Events
Event Message: <Sensor Name/Location> Fan sensor detected a failure <Reading>
  where <Sensor Name/Location> is the entity that this sensor is monitoring. For example,
  "BMC Back Fan" or "BMC Front Fan." Reading is specified in RPM. For example, 100 RPM.
Severity: Critical
Cause: The speed of the specified <Sensor Name/Location> fan is not sufficient to provide enough
  cooling to the system.

Event Message: <Sensor Name/Location> Fan sensor returned to normal state <Reading>.
Severity: Information
Cause: The fan specified by <Sensor Name/Location> has returned to its normal operating speed.

Event Message: <Sensor Name/Location> Fan sensor detected a warning <Reading>.
Severity: Warning
Cause: The speed of the specified <Sensor Name/Location> fan may not be sufficient to provide
  enough cooling to the system.

Event Message: <Sensor Name/Location> Fan Redundancy sensor redundancy degraded.
Severity: Information
Cause: The fan specified by <Sensor Name/Location> may have failed and hence, the redundancy
  has been degraded.

Event Message: <Sensor Name/Location> Fan Redundancy sensor redundancy lost.
Severity: Critical
Cause: The fan specified by <Sensor Name/Location> may have failed and hence, the redundancy
  that was degraded previously has been lost.

Event Message: <Sensor Name/Location> Fan Redundancy sensor redundancy regained
Severity: Information
Cause: The fan specified by <Sensor Name/Location> may have started functioning again and hence,
  the redundancy has been regained.
Processor Status Events
The processor status messages monitor the functionality of the processors in a system. These messages
provide processor health and warning information for a system.
Table 3-4. Processor Status Events
Event Message: <Processor Entity> status processor sensor IERR,
  where <Processor Entity> is the processor that generated the event. For example,
  PROC for a single processor system and PROC # for multiprocessor system.
Severity: Critical
Cause: IERR internal error generated by the <Processor Entity>.

Event Message: <Processor Entity> status processor sensor Thermal Trip.
Severity: Critical
Cause: The processor generates this event before it shuts down because of excessive heat caused
  by lack of cooling or heat synchronization.

Event Message: <Processor Entity> status processor sensor recovered from IERR.
Severity: Information
Cause: This event is generated when a processor recovers from the internal error.

Event Message: <Processor Entity> status processor sensor disabled.
Severity: Warning
Cause: This event is generated for all processors that are disabled.

Event Message: <Processor Entity> status processor sensor terminator not present.
Severity: Information
Cause: This event is generated if the terminator is missing on an empty processor slot.

Event Message: <Processor Entity> presence was deasserted.
Severity: Critical
Cause: This event is generated when the system could not detect the processor.

Event Message: <Processor Entity> presence was asserted.
Severity: Information
Cause: This event is generated when the earlier processor detection error was corrected.

Event Message: <Processor Entity> thermal tripped was deasserted.
Severity: Information
Cause: This event is generated when the processor has recovered from an earlier thermal condition.

Event Message: <Processor Entity> configuration error was asserted.
Severity: Critical
Cause: This event is generated when the processor configuration is incorrect.

Event Message: <Processor Entity> configuration error was deasserted.
Severity: Information
Cause: This event is generated when the earlier processor configuration error was corrected.

Event Message: <Processor Entity> throttled was asserted.
Severity: Warning
Cause: This event is generated when the processor slows down to prevent overheating.

Event Message: <Processor Entity> throttled was deasserted.
Severity: Information
Cause: This event is generated when the earlier processor throttled event was corrected.
Power Supply Events
The power supply sensors monitor the functionality of the power supplies. These messages provide status
and warning information for power supplies for a particular system.
Table 3-5. Power Supply Events
Event Message: <Power Supply Sensor Name> power supply sensor removed.
Severity: Critical
Cause: This event is generated when the power supply sensor is removed.

Event Message: <Power Supply Sensor Name> power supply sensor AC recovered.
Severity: Information
Cause: This event is generated when the power supply has been replaced.

Event Message: <Power Supply Sensor Name> power supply sensor returned to normal state.
Severity: Information
Cause: This event is generated when the power supply that failed or was removed has been replaced
  and the state has returned to normal.

Event Message: <Entity Name> PS Redundancy sensor redundancy degraded.
Severity: Information
Cause: Power supply redundancy is degraded if one of the power supply sources is removed or has failed.

Event Message: <Entity Name> PS Redundancy sensor redundancy lost.
Severity: Critical
Cause: Power supply redundancy is lost if only one power supply is functional.

Event Message: <Entity Name> PS Redundancy sensor redundancy regained.
Severity: Information
Cause: This event is generated if the power supply has been reconnected or replaced.

Event Message: <Power Supply Sensor Name> predictive failure was asserted
Severity: Warning
Cause: This event is generated when the power supply is about to fail.

Event Message: <Power Supply Sensor Name> input lost was asserted
Severity: Critical
Cause: This event is generated when the power supply is unplugged.

Event Message: <Power Supply Sensor Name> predictive failure was deasserted
Severity: Information
Cause: This event is generated when the power supply has recovered from an earlier predictive
  failure event.

Event Message: <Power Supply Sensor Name> input lost was deasserted
Severity: Information
Cause: This event is generated when the power supply is plugged in.
Memory ECC Events
The memory ECC event messages monitor the memory modules in a system. These messages monitor
the ECC memory correction rate and the type of memory events that occurred.
Table 3-6. Memory ECC Events
Event Message: ECC error correction detected on Bank # DIMM [A/B].
Severity: Information
Cause: This event is generated when there is a memory error correction on a particular Dual Inline
  Memory Module (DIMM).

Event Message: ECC uncorrectable error detected on Bank # [DIMM].
Severity: Critical
Cause: This event is generated when the chipset is unable to correct the memory errors. Usually, a
  bank number is provided and DIMM may or may not be identifiable, depending on the error.

Event Message: Correctable memory error logging disabled.
Severity: Critical
Cause: This event is generated when the chipset in the ECC error correction rate exceeds a
  predefined limit.
BMC Watchdog Events
The BMC watchdog operations are performed when the system hangs or crashes. These messages
monitor the status and occurrence of these events in a system.
Table 3-7. BMC Watchdog Events
Event Message | Severity | Cause
BMC OS Watchdog timer expired. | Information | This event is generated when the BMC watchdog timer expires and no action is set.
BMC OS Watchdog performed system reboot. | Critical | This event is generated when the BMC watchdog detects that the system has crashed (timer expired because no response was received from Host) and the action is set to reboot.
BMC OS Watchdog performed system power off. | Critical | This event is generated when the BMC watchdog detects that the system has crashed (timer expired because no response was received from Host) and the action is set to power off.
BMC OS Watchdog performed system power cycle. | Critical | This event is generated when the BMC watchdog detects that the system has crashed (timer expired because no response was received from Host) and the action is set to power cycle.
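The relationship between the configured watchdog action and the logged message in Table 3-7 can be sketched as a simple lookup. The following Python fragment is illustrative only: the action keys (none, reboot, power_off, power_cycle) and the function name watchdog_event are hypothetical and are not part of Server Administrator or of any BMC interface.

```python
# Hypothetical lookup built from Table 3-7. The action keys are assumptions
# chosen for this sketch; only the messages and severities come from the table.
WATCHDOG_EVENTS = {
    "none": ("BMC OS Watchdog timer expired.", "Information"),
    "reboot": ("BMC OS Watchdog performed system reboot.", "Critical"),
    "power_off": ("BMC OS Watchdog performed system power off.", "Critical"),
    "power_cycle": ("BMC OS Watchdog performed system power cycle.", "Critical"),
}


def watchdog_event(action):
    """Return the (message, severity) pair logged when the watchdog timer expires."""
    try:
        return WATCHDOG_EVENTS[action]
    except KeyError:
        raise ValueError(f"unknown watchdog action: {action!r}") from None


message, severity = watchdog_event("power_cycle")
```

Only the case in which no action is set produces an Information event; every case in which the watchdog acts on the system is logged as Critical.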
Memory Events
The memory modules can be configured in different ways in particular systems. These messages monitor
the status, warning, and configuration information about the memory modules in the system.
Table 3-8. Memory Events
Event Message | Severity | Cause
Memory RAID redundancy degraded. | Information | This event is generated when there is a memory failure in a RAID-configured memory configuration.
Memory RAID redundancy lost. | Critical | This event is generated when redundancy is lost in a RAID-configured memory configuration.
Memory RAID redundancy regained. | Information | This event is generated when the redundancy lost or degraded earlier is regained in a RAID-configured memory configuration.
Memory Mirrored redundancy degraded. | Information | This event is generated when there is a memory failure in a mirrored memory configuration.
Memory Mirrored redundancy lost. | Critical | This event is generated when redundancy is lost in a mirrored memory configuration.
Memory Mirrored redundancy regained. | Information | This event is generated when the redundancy lost or degraded earlier is regained in a mirrored memory configuration.
Memory Spared redundancy degraded. | Information | This event is generated when there is a memory failure in a spared memory configuration.
Memory Spared redundancy lost. | Critical | This event is generated when redundancy is lost in a spared memory configuration.
Memory Spared redundancy regained. | Information | This event is generated when the redundancy lost or degraded earlier is regained in a spared memory configuration.
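The nine messages in Table 3-8 follow one pattern: a memory configuration (RAID, Mirrored, or Spared) combined with a redundancy state (degraded, lost, or regained). The Python sketch below rebuilds that list from the two dimensions. The names MODES, STATE_SEVERITY, and memory_redundancy_events are hypothetical and exist only for this illustration.

```python
# Sketch only: regenerates the Table 3-8 messages from their two dimensions.
MODES = ("RAID", "Mirrored", "Spared")

# Severity depends only on the redundancy state, as listed in Table 3-8.
STATE_SEVERITY = {
    "degraded": "Information",
    "lost": "Critical",
    "regained": "Information",
}


def memory_redundancy_events():
    """Map each memory redundancy message to its severity."""
    return {
        f"Memory {mode} redundancy {state}.": severity
        for mode in MODES
        for state, severity in STATE_SEVERITY.items()
    }


events = memory_redundancy_events()
```

In this table the severity is determined by the redundancy state alone, so a filter that looks only for the word "lost" would select all three Critical messages.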
Hardware Log Sensor Events
The hardware logs provide hardware status messages to the system management software. On particular
systems, the subsequent hardware messages are not displayed when the log is full. These messages
provide status and warning messages when the logs are full.
Table 3-9. Hardware Log Sensor Events
Event Message | Severity | Cause
Log full detected. | Critical | This event is generated when the SEL device detects that only one entry can be added to the SEL before it is full.
Log cleared. | Information | This event is generated when the SEL is cleared.
Drive Events
The drive event messages monitor the health of the drives in a system. These events are generated when
there is a fault in the drives indicated.
Table 3-10. Drive Events
Event Message | Severity | Cause
Drive <Drive #> asserted fault state. | Critical | This event is generated when the specified drive in the array is faulty.
Drive <Drive #> de-asserted fault state. | Information | This event is generated when the specified drive recovers from a faulty condition.
Drive <Drive #> drive presence was asserted | Informational | This event is generated when the drive is installed.
Drive <Drive #> predictive failure was asserted | Warning | This event is generated when the drive is about to fail.
Drive <Drive #> predictive failure was deasserted | Informational | This event is generated when the drive from earlier predictive failure is corrected.
Drive <Drive #> hot spare was asserted | Warning | This event is generated when the drive is placed in a hot spare.
Drive <Drive #> hot spare was deasserted | Informational | This event is generated when the drive is taken out of hot spare.
Drive <Drive #> consistency check in progress was asserted | Warning | This event is generated when the drive is placed in consistency check.
Drive <Drive #> consistency check in progress was deasserted | Informational | This event is generated when the consistency check of the drive is completed.
Drive <Drive #> in critical array was asserted | Critical | This event is generated when the drive is placed in critical array.
Drive <Drive #> in critical array was deasserted | Informational | This event is generated when the drive is removed from critical array.
Drive <Drive #> in failed array was asserted | Critical | This event is generated when the drive is placed in the fail array.
Drive <Drive #> in failed array was deasserted | Informational | This event is generated when the drive is removed from the fail array.
Drive <Drive #> rebuild in progress was asserted | Informational | This event is generated when the drive is rebuilding.
Drive <Drive #> rebuild aborted was asserted | Warning | This event is generated when the drive rebuilding process is aborted.
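Most drive messages in Table 3-10 share the form "Drive <Drive #> <condition> was asserted" or "was deasserted", which makes them easy to split into fields. The following Python sketch shows one way to do so. It assumes that <Drive #> appears in the logged text as a decimal number; that substitution, and the names DRIVE_EVENT and parse_drive_event, are assumptions made for this illustration.

```python
import re

# Assumption: "<Drive #>" is logged as a decimal drive number, for example
# "Drive 3 predictive failure was asserted".
DRIVE_EVENT = re.compile(
    r"^Drive (?P<drive>\d+) (?P<condition>.+?) was (?P<state>asserted|deasserted)$"
)


def parse_drive_event(message):
    """Split a drive event message into drive number, condition, and state.

    Returns None for text that does not follow the "was asserted" or
    "was deasserted" form, such as the two fault state messages.
    """
    match = DRIVE_EVENT.match(message.strip())
    if match is None:
        return None
    return {
        "drive": int(match["drive"]),
        "condition": match["condition"],
        "asserted": match["state"] == "asserted",
    }


parsed = parse_drive_event("Drive 3 predictive failure was asserted")
```

The two fault state messages at the top of the table use a different word order, so this pattern deliberately leaves them unmatched rather than guessing at their fields.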
Intrusion Events
The chassis intrusion messages are a security measure. Chassis intrusion alerts are generated when the
system's chassis is opened. Alerts are sent to prevent unauthorized removal of parts from the chassis.
Table 3-11. Intrusion Events
Event Message | Severity | Cause
<Intrusion sensor Name> sensor detected an intrusion. | Critical | This event is generated when the intrusion sensor detects an intrusion.
<Intrusion sensor Name> sensor returned to normal state. | Information | This event is generated when the earlier intrusion has been corrected.
<Intrusion sensor Name> sensor intrusion was asserted while system was ON | Critical | This event is generated when the intrusion sensor detects an intrusion while the system is on.
<Intrusion sensor Name> sensor intrusion was asserted while system was OFF | Critical | This event is generated when the intrusion sensor detects an intrusion while the system is off.
BIOS Generated System Events
The BIOS generated messages monitor the health and functionality of the chipsets, I/O channels, and
other BIOS-related functions. These system events are generated by the BIOS.
Table 3-12. BIOS Generated System Events
Event Message | Severity | Cause
System Event I/O channel chk. | Critical | This event is generated when a critical interrupt is generated in the I/O Channel.
System Event PCI Parity Err. | Critical | This event is generated when a parity error is detected on the PCI bus.
System Event Chipset Err. | Critical | This event is generated when a chip error is detected.
System Event PCI System Err. | Information | This event indicates historical data, and is generated when the system has crashed and recovered.
System Event PCI Fatal Err. | Critical | This error is generated when a fatal error is detected on the PCI bus.
System Event PCIE Fatal Err. | Critical | This error is generated when a fatal error is detected on the PCIE bus.
POST Err POST fatal error #<number> | Critical | This event is generated when an error occurs during system boot. See the system documentation for more information on the error code.
Memory Spared redundancy lost | Critical | This event is generated when memory spare is no longer redundant.
Memory Mirrored redundancy lost | Critical | This event is generated when memory mirroring is no longer redundant.
Memory RAID redundancy lost | Critical | This event is generated when memory RAID is no longer redundant.
Err Reg Pointer OEM Diagnostic data event was asserted | Information | This event is generated when an OEM event occurs.
System Board PFault Fail Safe state asserted | Critical | This event is generated when the system board voltages are not at normal levels.
System Board PFault Fail Safe state deasserted | Information | This event is generated when the earlier PFault Fail Safe system voltages return to a normal level.
Memory Add (BANK# DIMM#) presence was asserted | Information | This event is generated when memory is added to the system.
Table 3-12. BIOS Generated System Events (continued)
Event Message | Severity | Cause
Memory Removed (BANK# DIMM#) presence was asserted | Information | This event is generated when memory is removed from the system.
Memory Cfg Err configuration error (BANK# DIMM#) was asserted | Critical | This event is generated when memory configuration is incorrect for the system.
Mem Redun Gain redundancy regained | Information | This event is generated when memory redundancy is regained.
Mem ECC Warning transition to non-critical from OK | Warning | This event is generated when correctable ECC errors have increased from a normal rate.
Mem ECC Warning transition to critical from less severe | Critical | This event is generated when correctable ECC errors reach a critical rate.
Mem CRC Err transition to non-recoverable | Critical | This event is generated when CRC errors enter a non-recoverable state.
Mem Fatal SB CRC uncorrectable ECC was asserted | Critical | This event is generated when CRC errors occur while storing to memory.
Mem Fatal NB CRC uncorrectable ECC was asserted | Critical | This event is generated when CRC errors occur while removing from memory.
Mem Overtemp critical over temperature was asserted | Critical | This event is generated when system memory reaches critical temperature.
USB Over-current transition to non-recoverable | Critical | This event is generated when the USB exceeds a predefined current level.
Hdwr version err hardware incompatibility (BMC Firmware and CPU mismatch) was asserted | Critical | This event is generated when there is a mismatch between the BMC firmware and the processor in use or vice versa.
Table 3-12. BIOS Generated System Events (continued)
Event Message | Severity | Cause
Hdwr version err hardware incompatibility (BMC Firmware and CPU mismatch) was deasserted | Information | This event is generated when the earlier mismatch between the BMC firmware and the processor is corrected.
Hdwr version err hardware incompatibility (BMC Firmware and other mismatch) was asserted | Critical | This event is generated when there is a mismatch between the BMC firmware and the processor in use or vice versa.
Hdwr version err hardware incompatibility (BMC Firmware and CPU mismatch) was deasserted | Information | This event is generated when an earlier hardware mismatch is corrected.
SBE Log Disabled correctable memory error logging disabled was asserted | Critical | This event is generated when the ECC single bit error rate is exceeded.
CPU Protocol Err transition to non-recoverable | Critical | This event is generated when the processor protocol enters a non-recoverable state.
CPU Bus PERR transition to non-recoverable | Critical | This event is generated when the processor bus PERR enters a non-recoverable state.
CPU Init Err transition to non-recoverable | Critical | This event is generated when the processor initialization enters a non-recoverable state.
CPU Machine Chk transition to non-recoverable | Critical | This event is generated when the processor machine check enters a non-recoverable state.
Logging Disabled all event logging disabled was asserted | Critical | This event is generated when all event logging is disabled.
Unknown system event sensor unknown system hardware failure was asserted | Critical | This event is generated when an unknown hardware failure is detected.
R2 Generated System Events
Table 3-13. R2 Generated Events
Description | Severity | Cause
System Event: OS stop event OS graceful shutdown detected | Information | The OS was shut down or restarted normally.
OEM Event data record (after OS graceful shutdown/restart event) | Information | Comment string accompanying an OS shutdown/restart.
System Event: OS stop event runtime critical stop | Critical | The OS encountered a critical error and was stopped abnormally.
OEM Event data record (after OS bugcheck event) | Information | OS bugcheck code and parameters.
Cable Interconnect Events
The cable interconnect messages are used for detecting errors in the hardware cabling.
Table 3-14. Cable Interconnect Events
Description | Severity | Cause
<Cable sensor Name/Location> Configuration error was asserted. | Critical | This event is generated when the cable is not connected or is incorrectly connected.
<Cable sensor Name/Location> Connection was asserted. | Information | This event is generated when the earlier cable connection error was corrected.
Battery Events
Table 3-15. Battery Events
Description | Severity | Cause
<Battery sensor Name/Location> Failed was asserted | Critical | This event is generated when the sensor detects a failed or missing battery.
<Battery sensor Name/Location> Failed was deasserted | Information | This event is generated when the earlier failed battery was corrected.
<Battery sensor Name/Location> is low was asserted | Warning | This event is generated when the sensor detects a low battery condition.
<Battery sensor Name/Location> is low was deasserted | Information | This event is generated when the earlier low battery condition was corrected.
Entity Presence Events
The entity presence messages are used for detecting different hardware devices.
Table 3-16. Entity Presence Events
Description | Severity | Cause
<Device Name> presence was asserted | Information | This event is generated when the device was detected.
<Device Name> absent was asserted | Critical | This event is generated when the device was not detected.
Storage Management Message Reference
The Dell OpenManage™ Server Administrator Storage Management’s alert or event management
features let you monitor the health of storage resources such as controllers, enclosures, physical
disks, and virtual disks.
Alert Monitoring and Logging
The Storage Management Service performs alert monitoring and logging. By default, the Storage
Management Service starts when the managed system starts up. If you stop the Storage
Management Service, the alert monitoring and logging stops. Alert monitoring does the following:
• Updates the status of the storage object that generated the alert.
• Propagates the storage object’s status to all the related higher objects in the storage hierarchy. For
example, the status of a lower-level object will be propagated up to the status displayed on the
Health tab for the top-level storage object.
• Logs an alert in the Alert log and the operating system (OS) application log.
• Sends an SNMP trap if the operating system’s SNMP service is installed and enabled.
NOTE: Dell OpenManage Server Administrator Storage Management does not log alerts regarding the data
I/O path. These alerts are logged by the respective RAID drivers in the system alert log.
See the Storage Management Online Help and the Dell OpenManage Server Administrator Storage
Management User’s Guide for updated information.
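The status propagation described in this section can be pictured as a roll-up over a tree of storage objects. The Python sketch below is a simplified illustration and not the Storage Management Service implementation. It assumes that a parent displays the worst status found among itself and its descendants; the severity ranking, the StorageObject class, and the three-level hierarchy are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption for this sketch: a higher rank means a worse status.
SEVERITY_RANK = {"Ok": 0, "Warning": 1, "Critical": 2}


@dataclass
class StorageObject:
    name: str
    status: str = "Ok"  # status reported by the object itself
    children: List["StorageObject"] = field(default_factory=list)

    def rolled_up_status(self) -> str:
        """Return the worst status among this object and all descendants."""
        worst = self.status
        for child in self.children:
            child_status = child.rolled_up_status()
            if SEVERITY_RANK[child_status] > SEVERITY_RANK[worst]:
                worst = child_status
        return worst


# Hypothetical hierarchy: a failed physical disk under one controller.
disk = StorageObject("Physical Disk 0:0:14", status="Critical")
controller = StorageObject("Controller 1", children=[disk])
storage = StorageObject("Storage", children=[controller])
top_level_status = storage.rolled_up_status()
```

Under this assumption, a single failed lower-level object is enough to change the status shown for the top-level storage object, which matches the behavior described for the Health tab.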
Alert Message Format with Substitution Variables
When you view an alert in the Server Administrator alert log, the alert identifies the specific
components such as the controller name or the virtual disk name to which the alert applies. In an
actual operating environment, a storage system can have many combinations of controllers and disks
as well as user-defined names for virtual disks and other components. Because each environment is
unique in its storage configuration and user-defined names, an accurate alert message requires that
the Storage Management Service be able to insert the environment-specific names of storage
components into an alert message.
This environment-specific information is inserted after the alert message text as shown for
alert 2127 in Table 4-1.
For other alerts, the alert message text is constructed from information passed directly from the
controller (or another storage component) to the Alert Log. In these cases, the variable information is
represented with a % (percent sign) in the Storage Management documentation. An example of such an
alert is shown for alert 2334 in Table 4-1.
Table 4-1. Alert Message Format
Alert ID | Message Text Displayed in the Storage Management Service Documentation | Message Text Displayed in the Alert Log with Variable Information Supplied
2127 | Background Initialization started | Background Initialization started: Virtual Disk 3 (Virtual Disk 3) Controller 1 (PERC 5/E Adapter)
2334 | Controller event log % | Controller event log: Current capacity of the battery is above threshold.: Controller 1 (PERC 5/E Adapter)
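The two rows of Table 4-1 can be reproduced with a few lines of Python. This is a sketch that mimics the documented examples only; it does not describe how the Storage Management Service builds messages internally, and the function render_alert and its parameter names are hypothetical.

```python
def render_alert(doc_text, storage_object, controller_text=None):
    """Build Alert Log text from the documented message text.

    doc_text        message text as printed in the documentation
    storage_object  environment-specific names appended after the message
    controller_text text supplied by the controller for a "%" placeholder
    """
    if "%" in doc_text:
        if controller_text is None:
            raise ValueError("message contains % but no controller text was given")
        # Replace the placeholder with the text passed from the controller.
        prefix = doc_text.split("%", 1)[0].rstrip()
        body = f"{prefix}: {controller_text}"
    else:
        body = doc_text
    return f"{body}: {storage_object}"


alert_2127 = render_alert(
    "Background Initialization started",
    "Virtual Disk 3 (Virtual Disk 3) Controller 1 (PERC 5/E Adapter)",
)
alert_2334 = render_alert(
    "Controller event log %",
    "Controller 1 (PERC 5/E Adapter)",
    controller_text="Current capacity of the battery is above threshold.",
)
```

Because the controller-supplied text in alert 2334 ends with a period, the separator that follows it produces the ".:" sequence seen in the Alert Log column of the table.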
The variables required to complete the message vary depending on the type of storage object and
whether the storage object is in a SCSI or SAS configuration. The following table identifies the possible
variables used to identify each storage object.
NOTE: Some alert messages relating to an enclosure or an enclosure component, such as a fan or EMM, are
generated by the controller when the enclosure or enclosure component ID cannot be determined.
Table 4-2. Message Format with Variables for Each Storage Object
Storage Object | Message Variables
A, B, C and X, Y, Z in the following examples are variables representing the storage object
name or number.

Controller
  Message Format: Controller A (Name)
  Message Format: Controller A
  Example: 2326 A foreign configuration has been detected.: Controller 1 (PERC 5/E Adapter)
  NOTE: The controller name is not always displayed.

Battery
  Message Format: Battery X Controller A
  Example: 2174 The controller battery has been removed: Battery 0 Controller 1

SCSI Physical Disk
  Message Format: Physical Disk X:Y Controller A, Connector B
  Example: 2049 Physical disk removed: Physical Disk 0:14 Controller 1, Connector 0

SAS Physical Disk
  Message Format: Physical Disk X:Y:Z Controller A, Connector B
  Example: 2049 Physical disk removed: Physical Disk 0:0:14 Controller 1, Connector 0

Virtual Disk
  Message Format: Virtual Disk X (Name) Controller A (Name)
  Message Format: Virtual Disk X Controller A
  Example: 2057 Virtual disk degraded: Virtual Disk 11 (Virtual Disk 11) Controller 1 (PERC 5/E Adapter)
  NOTE: The virtual disk and controller names are not always displayed.

Enclosure
  Message Format: Enclosure X:Y Controller A, Connector B

SCSI Power Supply
  Message Format: Power Supply X Controller A, Connector B, Target ID C
  where "C" is the SCSI ID number of the enclosure management module (EMM) managing the power supply.
  Example: 2122 Redundancy degraded: Power Supply 1, Controller 1, Connector 0, Target ID 6

SAS Power Supply
  Message Format: Power Supply X Controller A, Connector B, Enclosure C
  Example: 2312 A power supply in the enclosure has an AC failure.: Power Supply 1, Controller 1, Connector 0, Enclosure 2

SCSI Temperature Probe
  Message Format: Temperature Probe X Controller A, Connector B, Target ID C
  where "C" is the SCSI ID number of the EMM managing the temperature probe.
  Example: 2101 Temperature dropped below the minimum warning threshold: Temperature Probe 1, Controller 1, Connector 0, Target ID 6

SAS Temperature Probe
  Message Format: Temperature Probe X Controller A, Connector B, Enclosure C
  Example: 2101 Temperature dropped below the minimum warning threshold: Temperature Probe 1, Controller 1, Connector 0, Enclosure 2

SCSI Fan
  Message Format: Fan X Controller A, Connector B, Target ID C
  where "C" is the SCSI ID number of the EMM managing the fan.
  Example: 2121 Device returned to normal: Fan 1, Controller 1, Connector 0, Target ID 6

SAS Fan
  Message Format: Fan X Controller A, Connector B, Enclosure C
  Example: 2121 Device returned to normal: Fan 1, Controller 1, Connector 0, Enclosure 2

SCSI EMM
  Message Format: EMM X Controller A, Connector B, Target ID C
  where "C" is the SCSI ID number of the EMM.
  Example: 2121 Device returned to normal: EMM 1, Controller 1, Connector 0, Target ID 6

SAS EMM
  Message Format: EMM X Controller A, Connector B, Enclosure C
  Example: 2121 Device returned to normal: EMM 1, Controller 1, Connector 0, Enclosure 2
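The message formats in Table 4-2 are regular enough to be parsed when alert text is processed by a script. The Python sketch below handles only the two physical disk formats and uses the number of colon-separated fields (X:Y for SCSI, X:Y:Z for SAS) to tell them apart. The pattern PHYSICAL_DISK and the function parse_physical_disk are hypothetical names for this illustration, and the sketch assumes the alert text matches the documented examples exactly.

```python
import re

# Matches "Physical Disk X:Y Controller A, Connector B" (SCSI) and
# "Physical Disk X:Y:Z Controller A, Connector B" (SAS) at the end of an alert.
PHYSICAL_DISK = re.compile(
    r"Physical Disk (?P<disk_id>\d+(?::\d+){1,2}) "
    r"Controller (?P<controller>\d+), Connector (?P<connector>\d+)$"
)


def parse_physical_disk(alert_text):
    """Extract the physical disk location from alert text, or return None."""
    match = PHYSICAL_DISK.search(alert_text.strip())
    if match is None:
        return None
    parts = [int(p) for p in match["disk_id"].split(":")]
    return {
        "interface": "SAS" if len(parts) == 3 else "SCSI",
        "disk_id": parts,
        "controller": int(match["controller"]),
        "connector": int(match["connector"]),
    }


sas_disk = parse_physical_disk(
    "2049 Physical disk removed: Physical Disk 0:0:14 Controller 1, Connector 0"
)
```

The pattern is case-sensitive, so the lowercase "Physical disk removed" in the alert description is not mistaken for the storage object that follows it.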
Alert Message Change History
The following table describes changes made to the Storage Management alerts from the previous release
of Storage Management to the current release.
Table 4-3. Alert Message Change History
Alert Message Change History: Storage Management 2.2
Change | Details | Comments
Product Versions to which Changes Apply | Storage Management 2.2, Server Administrator 3.2, Dell OpenManage™ 5.2 |
Reduction of unnecessary alert generation | Enhancements to Storage Management avoid numerous redundant or inappropriate alerts posted to the Alert Log after an unexpected system shutdown. | In previous versions of Storage Management, an unexpected system shutdown may have caused the controller to repost a large number of alerts to the Alert Log when restarting the system.
Modified Alerts | 2095 | Severity changed to Informational. SNMP trap changed to 901.
Modified Alerts | 2153 | Severity changed to Informational. SNMP trap changed to 851.
Modified Alerts | 2188 | Severity changed to Informational. SNMP trap changed to 1151.
Modified Alerts | 2192 | Changed documentation for cause and corrective action.
Modified Alerts | 2202 | Severity changed to Informational. SNMP trap changed to 901.
Modified Alerts | 2204 | Severity changed to Informational. SNMP trap changed to 901.
Modified Alerts | 2205 | Severity changed to Informational. SNMP trap changed to 901.
Modified Alerts | 2266 | SNMP traps changed to 751, 801, 851, 901, 951, 1001, 1051, 1101, 1151, 1201.
Modified Alerts | 2272 | Severity changed to Critical. SNMP trap changed to 904. Changed corrective action information in the documentation.
Modified Alerts | 2273 | Changed alert message text and documentation for cause and corrective action.
Modified Alerts | 2279 | Changed alert message text.
Modified Alerts | 2299 | Changed corrective action information in the documentation.
Modified Alerts | 2305 | Changed severity to Warning. Changed SNMP trap number to 903.
Modified Alerts | 2331 | Changed severity to Informational. Changed SNMP trap number to 901.
Modified Alerts | 2367 | Changed severity to Warning. Changed SNMP trap number to 903.
Obsolete Alerts | 2333 |
Obsolete Alerts | 2354 | 2354 replaced by 2368.
Obsolete Alerts | 2355 |
Obsolete Alerts | 2365 |
Obsolete Alerts | 2370 |
Documentation Changes | Severity for alert 2163 changed from Ok/Normal to Critical/Failure/Error. | Documentation change only made in the Dell OpenManage Server Administrator Messages Reference Guide to reflect the severity displayed in the Server Administrator Alert Log and documented in the Storage Management online help.
Documentation Changes | Severity for alert 2318 changed from Critical/Failure/Error to Warning/Noncritical. | Documentation change only made in the Dell OpenManage Server Administrator Messages Reference Guide to reflect the severity displayed in the Server Administrator Alert Log and documented in the Storage Management online help.
OpenManage Server Administrator Messages
Reference Guide to reflect existing Storage
Management online help.
Documentation change only made in the Dell
OpenManage Server Administrator Messages
Reference Guide to reflect existing Storage
Management online help.
The alert numbers for the new alerts
2062–2260 were previously unassigned.
Alert numbers 2370 and 2371 are new.
NOTE: Alerts 2062 and 2260 were previously
undocumented in the Storage Management
online help, Dell OpenManage Server
Administrator Storage Management User’s
Guide, and the Dell OpenManage Server
Administrator Messages Reference Guide.
The term “array disk” has been changed to
“physical disk” throughout Storage
Management. This change affects the message
text of the modified alerts.
2160 replaced by 2195.
2161 replaced by 2196.
Table 4-3. Alert Message Change History (continued)
Change | Details | Comments
Documentation Changes | Documentation updated to indicate clear alert status. |
Documentation Changes | Reference to SNMP trap variables removed. |
Documentation Changes | Corresponding Array Manager event numbers removed (see comments). | Starting with Dell OpenManage 5.0, Array Manager is no longer an installable option. If you have an Array Manager installation and wish to see how the Array Manager events correspond to the Storage Management alerts, refer to the product documentation prior to Storage Management 2.1 or Dell OpenManage 5.1.
Alert Descriptions and Corrective Actions
The following sections describe alerts generated by the RAID or SCSI controllers supported by Storage
Management. The alerts are displayed in the Server Administrator Alert subtab or through Windows
Event Viewer. These alerts can also be forwarded as SNMP traps to other applications.
SNMP traps are generated for the alerts listed in the following sections. These traps are included in the
Dell OpenManage Server Administrator Storage Management management information base (MIB).
The SNMP traps for these alerts use all of the SNMP trap variables. For more information on SNMP
support and the MIB, see the SNMP Reference Guide.
To locate an alert, scroll through the following table to find the alert number displayed on the Server
Administrator Alert tab or search this file for the alert message text or number. See "Understanding
Event Messages" for more information on severity levels.
For more information regarding alert descriptions and the appropriate corrective actions, see the online
help.
Table 4-4. Storage Management Messages
Event ID | Description | Severity | Cause and Action | Clear Event Number

2048 | Device failed | Critical / Failure / Error
Cause: A storage component such as a physical disk or an enclosure has failed. The
failed component may have been identified by the controller while performing a task such
as a rescan or a check consistency.
Action: Replace the failed component. You can identify which disk has failed by locating
the disk that has a red “X” for its status. Perform a rescan after replacing the disk.
Ok / Normal Cause: This alert is for informational purposes.
Cause: A physical disk has been removed
from the disk group. This alert can also be
caused by loose or defective cables or by
problems with the enclosure.
Action: If a physical disk was removed from
the disk group, either replace the disk or
restore the original disk. On some controllers,
a removed disk has a red "X" for its status. On
other controllers, a removed disk may have an
Offline status or is not displayed on the
user interface. Perform a rescan after
replacing or restoring the disk. If a disk has
not been removed from the disk group, then
check for problems with the cables. See the
online help for more information on checking
the cables. Make sure that the enclosure is
powered on. If the problem persists, check
the enclosure documentation for further
diagnostic information.
Cause: A physical disk in the disk group is
offline. A user may have manually put the
physical disk offline.
Action: Perform a rescan. You can also select
the offline disk and perform a Make Online
operation.
Cause: A physical disk has reported an error
condition and may be degraded. The physical
disk may have reported the error condition in
response to a consistency check or other
operation.
Action: Replace the degraded physical disk.
You can identify which disk is degraded by
locating the disk that has a red "X" for its
status. Perform a rescan after replacing the
disk.
2053 | Virtual disk created | Ok / Normal
Cause: This alert is for informational purposes.
Action: None

2054 | Virtual disk deleted | Warning / Non-critical
Cause: A virtual disk has been deleted. "Performing a Reset Configuration" may
detect that a virtual disk has been deleted and generate this alert.
Action: None

2055 | Virtual disk configuration changed | Ok / Normal
Cause: This alert is for informational purposes.
Action: None

2056 | Virtual disk failed | Critical / Failure / Error
Cause: One or more physical disks included in the virtual disk have failed. If the virtual
disk is non-redundant (does not use mirrored or parity data), then the failure of a single
physical disk can cause the virtual disk to fail. If the virtual disk is redundant, then more
physical disks have failed than can be rebuilt using mirrored or parity information.
Ok / Normal Cause: This alert is for informational purposes.
Ok / Normal Cause: This alert is for informational purposes.
Ok / Normal Cause: This alert is for informational purposes.
Ok / Normal Cause: This alert is for informational purposes.
Ok / Normal Cause: This alert is for informational purposes.
Cause 1: This alert message occurs when a
physical disk included in a redundant virtual
disk fails. Because the virtual disk is redundant
(uses mirrored or parity information) and
only one physical disk has failed, the virtual
disk can be rebuilt.
Action 1: Configure a hot spare for the virtual
disk if one is not already configured. Rebuild
the virtual disk. When using an Expandable
RAID Controller (PERC) PERC 3/SC,
3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4e/DC,
4/Di, CERC ATA100/4ch, PERC 5/E, PERC
5/i or a Serial Attached SCSI (SAS) 5/iR
controller, rebuild the virtual disk by first
configuring a hot spare for the disk, and then
initiating a write operation to the disk. The
write operation will initiate a rebuild of the
disk.
Cause 2: A physical disk in the disk group has
been removed.
Action 2: If a physical disk was removed from
the disk group, either replace the disk or
restore the original disk. You can identify
which disk has been removed by locating the
disk that has a red “X” for its status. Perform
a rescan after replacing the disk.
Ok / Normal Cause: This alert is for informational purposes.
Action: None
Ok / Normal Cause: This alert is for informational purposes.
Action: None
Ok / Normal Cause: The check consistency operation was
cancelled because a physical disk in the array
has failed or because a user cancelled the
check consistency operation.
Action: If the physical disk failed, then replace
the physical disk. You can identify which disk
failed by locating the disk that has a red “X”
for its status. Perform a rescan after replacing
the disk. When performing a consistency
check, be aware that the consistency check
can take a long time. The time it takes
depends on the size of the physical disk or
the virtual disk.
Ok / Normal Cause: The virtual disk initialization was cancelled
because a physical disk included in the virtual
disk has failed or because a user cancelled the
virtual disk initialization.
Action: If a physical disk failed, then replace
the physical disk. You can identify which disk
has failed by locating the disk that has a
red “X” for its status. Perform a rescan after
replacing the disk. Restart the format
physical disk operation. Restart the virtual
disk initialization.
Ok / Normal Cause: A user has cancelled the rebuild
Cause: A physical disk included in the virtual
disk failed or there is an error in the parity
information. A failed physical disk can cause
errors in parity information.
Action: Replace the failed physical disk. You
can identify which disk has failed by locating
the disk that has a red “X” for its status.
Rebuild the physical disk. When finished,
restart the check consistency operation.
Cause: A physical disk included in the virtual
disk failed.
Action: Replace the failed physical disk. You
can identify which physical disk has failed by
locating the disk that has a red "X" for its
status. Rebuild the physical disk. When
finished, restart the virtual disk format
operation.
Cause: A physical disk included in the virtual
disk has failed or a user has cancelled the
initialization.
Action: If a physical disk has failed, then
replace the physical disk.
Cause: The physical disk has failed or is
corrupt.
Action: Replace the failed or corrupt disk.
You can identify a disk that has failed by
locating the disk that has a red “X” for its
status. Restart the initialization.
Cause: A physical disk included in the virtual
disk has failed or is corrupt. A user may also
have cancelled the reconfiguration.
Action: Replace the failed or corrupt disk.
You can identify a disk that has failed by
locating the disk that has a red “X” for its
status.
If the physical disk is part of a redundant
array, then rebuild the physical disk. When
finished, restart the reconfiguration.
Cause: A physical disk included in the virtual
disk has failed or is corrupt. A user may also
have cancelled the rebuild.
None 1204
Action: Replace the failed or corrupt disk.
You can identify a disk that has failed by
locating the disk that has a red “X” for its
status. Restart the virtual disk rebuild.
2083 Physical disk rebuild failed
Critical / Failure / Error
Cause: A physical disk included in the virtual
disk has failed or is corrupt. A user may also
have cancelled the rebuild.
None 904
Action: Replace the failed or corrupt disk.
You can identify a disk that has failed by
locating the disk that has a red “X” for its
status. Restart the rebuild.
2085 Virtual disk check consistency completed
2086 Virtual disk format completed
2088 Virtual disk initialization completed
2089 Physical disk initialize completed
2090 Virtual disk reconfiguration completed
2091 Virtual disk rebuild completed
2092 Physical disk rebuild completed
Ok / Normal Cause: This alert is for informational purposes.
Action: None
Ok / Normal Cause: This alert is for informational purposes.
Action: None
Ok / Normal Cause: This alert is for informational purposes.
Action: None
Ok / Normal Cause: This alert is for informational purposes.
Action: None
Ok / Normal Cause: This alert is for informational purposes.
Action: None
Ok / Normal Cause: This alert is for informational purposes.
Action: None
Ok / Normal Cause: This alert is for informational purposes.
Cause: The physical disk is predicted to fail.
Many physical disks contain Self-Monitoring,
Analysis, and Reporting Technology (SMART).
When enabled, SMART monitors the health
of the disk based on indications such as the
number of write operations that have been
performed on the disk.
Action: Replace the physical disk. Even
though the disk may not have failed yet, it is
strongly recommended that you replace the
disk.
If this disk is part of a redundant virtual disk,
perform the Offline task on the disk, replace
the disk, and then assign a hot spare. The
rebuild starts automatically.
If this disk is a hot spare, then unassign the
hot spare; perform the Prepare to Remove
task on the disk; replace the disk; and assign
the new disk as a hot spare.
NOTICE: If this disk is part of a
nonredundant virtual disk, back up your data
immediately. If the disk fails, you will not
be able to recover the data.
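The replacement procedure above depends on the role of the disk that is predicted to fail. The following Python sketch only restates those steps as a lookup. The names `DiskRole` and `replacement_steps` are hypothetical and are not part of Server Administrator or Storage Management.

```python
from enum import Enum
from typing import List


class DiskRole(Enum):
    """Hypothetical labels for the three cases described in the action text."""
    REDUNDANT_MEMBER = "member of a redundant virtual disk"
    NONREDUNDANT_MEMBER = "member of a nonredundant virtual disk"
    HOT_SPARE = "hot spare"


def replacement_steps(role: DiskRole) -> List[str]:
    """Return the ordered steps for replacing a disk that is predicted to fail."""
    if role is DiskRole.REDUNDANT_MEMBER:
        return [
            "Perform the Offline task on the disk",
            "Replace the disk",
            "Assign a hot spare (the rebuild starts automatically)",
        ]
    if role is DiskRole.HOT_SPARE:
        return [
            "Unassign the hot spare",
            "Perform the Prepare to Remove task on the disk",
            "Replace the disk",
            "Assign the new disk as a hot spare",
        ]
    # Nonredundant member: the data cannot be recovered after a failure,
    # so the backup comes first.
    return [
        "Back up your data immediately",
        "Replace the disk",
    ]


if __name__ == "__main__":
    for number, step in enumerate(replacement_steps(DiskRole.HOT_SPARE), start=1):
        print(f"{number}. {step}")
```

The nonredundant case is kept separate because it is the only one in which a disk failure means unrecoverable data loss.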
A physical disk has experienced a
temporary error.
Action: None.
None 903
None 901
2098 Global hot spare assigned
2099 Global hot spare unassigned
Ok / Normal Cause: A user has assigned a physical disk as a
global hot spare. This alert is for
informational purposes.
Action: None
Ok / Normal Cause: A user has unassigned a physical disk
as a global hot spare. This alert is for
informational purposes.
Cause: The physical disk enclosure is too hot.
A variety of factors can cause the excessive
temperature. For example, a fan may have
failed, the thermostat may be set too high,
or the room temperature may be too hot.
Action: Check for factors that may cause
overheating. For example, verify that the
enclosure fan is working. You should also
check the thermostat settings and examine
whether the enclosure is located near a heat
source. Make sure the enclosure has enough
ventilation and that the room temperature is
not too hot. See the physical disk enclosure
documentation for more diagnostic
information.
Cause: The physical disk enclosure is too
cool.
Action: Check if the thermostat setting is too
low and if the room temperature is too cool.
Cause: The physical disk enclosure is too hot.
A variety of factors can cause the excessive
temperature. For example, a fan may have
failed, the thermostat may be set too high, or
the room temperature may be too hot.
Action: Check for factors that may cause
overheating. For example, verify that the
enclosure fan is working. You should also
check the thermostat settings and examine
whether the enclosure is located near a heat
source. Make sure the enclosure has enough
ventilation and that the room temperature is
not too hot. See the physical disk enclosure
documentation for more diagnostic
information.
Cause: The physical disk enclosure is too
cool.
Action: Check if the thermostat setting is too
low and if the room temperature is too cool.
Ok / Normal Cause: This alert is for informational purposes.
Action: None
Ok / Normal Cause: This alert is for informational purposes.
Action: None
Non-critical
Critical / Failure / Error
Cause: A disk on the specified controller has
received a SMART alert (predictive failure)
indicating that the disk is likely to fail in the
near future.
Action: Replace the disk that has received the
SMART alert. If the physical disk is a
member of a non-redundant virtual disk,
then back up the data before replacing the
disk.
NOTICE: Removing a physical disk that is
included in a non-redundant virtual disk
will cause the virtual disk to fail and may
cause data loss.
Cause: A disk has received a SMART alert
(predictive failure) after a configuration
change. The disk is likely to fail in the near
future.
Action: Replace the disk that has received the
SMART alert. If the physical disk is a
member of a non-redundant virtual disk,
then back up the data before replacing the
disk.
2105 1151
Clear event
None 903
None 904
SNMP Trap Numbers
1151
NOTICE: Removing a physical disk that is
included in a non-redundant virtual disk
will cause the virtual disk to fail and may
cause data loss.
Cause: A disk has received a SMART alert
(predictive failure). The disk is likely to fail in
the near future.
Action: Replace the disk that has received the
SMART alert. If the physical disk is a
member of a non-redundant virtual disk, then
back up the data before replacing the disk.
NOTICE: Removing a physical disk that is
included in a non-redundant virtual disk
will cause the virtual disk to fail and may
cause data loss.
Cause: A disk has reached an unacceptable
temperature and received a SMART alert
(predictive failure). The disk is likely to fail in
the near future.
Action 1: Determine why the physical disk
has reached an unacceptable temperature.
A variety of factors can cause the excessive
temperature. For example, a fan may have
failed, the thermostat may be set too high, or
the room temperature may be too hot or cold.
Verify that the fans in the server or enclosure
are working. If the physical disk is in an
enclosure, you should check the thermostat
settings and examine whether the enclosure
is located near a heat source. Make sure the
enclosure has enough ventilation and that
the room temperature is not too hot. See the
physical disk enclosure documentation for
more diagnostic information.
Action 2: If you cannot identify why the disk
has reached an unacceptable temperature,
then replace the disk. If the physical disk is a
member of a non-redundant virtual disk,
then back up the data before replacing the
disk.
None 903
NOTICE: Removing a physical disk that is
included in a non-redundant virtual disk
will cause the virtual disk to fail and may
cause data loss.
Cause: A disk is degraded and has received a
SMART alert (predictive failure). The disk is
likely to fail in the near future.
Action: Replace the disk that has received the
SMART alert. If the physical disk is a
member of a non-redundant virtual disk,
then back up the data before replacing the
disk.
NOTICE: Removing a physical disk that is
included in a non-redundant virtual disk
will cause the virtual disk to fail and may
cause data loss.
Cause: A disk has received a SMART alert
(predictive failure) due to test conditions.
Action: None
Cause: The physical disk enclosure is either
hotter or cooler than the maximum or
minimum allowable temperature range.
Action: Check for factors that may cause
overheating or excessive cooling. For example,
verify that the enclosure fan is working. You
should also check the thermostat settings and
examine whether the enclosure is located
near a heat source. Make sure the enclosure
has enough ventilation and that the room
temperature is not too hot or too cold. See
the enclosure documentation for more
diagnostic information.
Ok / Normal Cause: The check consistency operation on a
virtual disk was paused by a user.
Action: To resume the check consistency
operation, right-click the virtual disk in the
tree view and select Resume Check Consistency.
2118 Change write policy Ok / Normal Cause: This alert is for informational purposes.
2120 Enclosure firmware mismatch
Ok / Normal Cause: This alert is for informational purposes.
The check consistency operation on a virtual
disk has resumed processing after being
paused by a user.
Action: None
Ok / Normal Cause: This alert is for informational purposes.
A user has caused a mirrored virtual disk to be
split. When a virtual disk is mirrored, its data
is copied to another virtual disk in order to
maintain redundancy. After being split, both
virtual disks retain a copy of the data,
although because the mirror is no longer
intact, updates to the data are no longer copied
to the mirror.
Action: None
Ok / Normal Cause: This alert is for informational purposes.
A user has caused a mirrored virtual disk to be
unmirrored. When a virtual disk is mirrored,
its data is copied to another virtual disk in
order to maintain redundancy. After being
unmirrored, the disk formerly used as the
mirror returns to being a physical disk and
becomes available for inclusion in another
virtual disk.
Action: None
A user has changed the write policy for a
virtual disk.
Action: None
Warning /
Non-critical
Cause: The firmware versions on the EMMs do
not match. Both modules must have the same
version of the firmware. This alert may occur
when a user attempts to insert an EMM that
has a different firmware version than an
existing module.
Action: Download the same version of the
firmware to both EMMs.
Ok / Normal Cause: This alert is for informational purposes.
A device that was previously in an error state
has returned to a normal state.
For example, if an enclosure became too hot
and subsequently cooled down, then you may
receive this alert.
Action: None
Warning /
Non-critical
Cause: One or more of the enclosure
components has failed.
For example, a fan or power supply may have
failed. Although the enclosure is currently
operational, the failure of additional
components could cause the enclosure to fail.
Action: Identify and replace the failed
component. To identify the failed component,
select the enclosure in the tree view and click
the Health subtab. Any failed component will
be identified with a red "X" on the enclosure’s
Health subtab. Alternatively, you can select
the Storage object and click the Health
subtab. The controller status displayed on the
Health subtab indicates whether a controller
has a failed or degraded component.
See the enclosure documentation for
information on replacing enclosure
components and for other diagnostic
information.
2124 Redundancy normal Ok / Normal Cause: This alert is for informational purposes.
Cause: A virtual disk or an enclosure has lost
data redundancy. In the case of a virtual disk,
one or more physical disks included in the
virtual disk have failed. Due to the failed
physical disk or disks, the virtual disk is no
longer maintaining redundant (mirrored or
parity) data. The failure of an additional
physical disk will result in lost data. In the
case of an enclosure, more than one enclosure
component has failed. For example, the
enclosure may have suffered the loss of all
fans or all power supplies.
Action: Identify and replace the failed
components. To identify the failed component,
select the Storage object and click the Health
subtab. The controller status displayed on the
Health subtab indicates whether a controller
has a failed or degraded component. Click
the controller that displays a Warning or
Failed status. This action displays the controller
Health subtab, which shows the status of
the individual controller components.
Continue clicking the components with a
Warning or Failed status until you identify
the failed component.
See the online help for more information. See
the enclosure documentation for information
on replacing enclosure components and for
other diagnostic information.
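The drill-down described above (start at the Storage object, then keep following components that show a Warning or Failed status) amounts to a walk down a status tree. The Python sketch below illustrates that idea only. The `Component` class, the status strings, and the sample tree are hypothetical and do not reflect any Server Administrator interface.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Component:
    """Hypothetical node in the storage tree view."""
    name: str
    status: str = "Ok"  # assumed labels: "Ok", "Warning", or "Failed"
    children: List["Component"] = field(default_factory=list)


def find_failed_components(node: Component, path: Tuple[str, ...] = ()) -> List[str]:
    """Follow non-Ok statuses down the tree and return the path to each
    deepest unhealthy component."""
    if node.status == "Ok":
        return []
    here = path + (node.name,)
    unhealthy = [child for child in node.children if child.status != "Ok"]
    if not unhealthy:
        # Nothing below this node is unhealthy, so this is the failed component.
        return [" > ".join(here)]
    found: List[str] = []
    for child in unhealthy:
        found.extend(find_failed_components(child, here))
    return found


if __name__ == "__main__":
    storage = Component("Storage", "Warning", [
        Component("Controller 0", "Warning", [
            Component("Enclosure 0", "Warning", [
                Component("Fan 1", "Failed"),
                Component("Fan 2"),
            ]),
            Component("Virtual Disk 0"),
        ]),
        Component("Controller 1"),
    ])
    for failed in find_failed_components(storage):
        print(failed)
```

Each printed path corresponds to the sequence of objects you would click in the tree view to reach the failed component.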
Data redundancy has been restored to a
virtual disk or an enclosure that previously
suffered a loss of redundancy.
Cause: A sector of the physical disk is
corrupted and data cannot be maintained on
this portion of the disk. This alert is for
informational purposes.
NOTICE: Any data residing on the
corrupt portion of the disk may be lost
and you may need to restore your data
from backup.
Action: If the physical disk is part of a
nonredundant virtual disk, then back up the
data and replace the physical disk.
None 903
NOTICE: Removing a physical disk that is
included in a nonredundant virtual disk
will cause the virtual disk to fail and may
cause data loss.
If the disk is part of a redundant virtual disk,
then any data residing on the corrupt portion
of the disk will be reallocated elsewhere in the
virtual disk.
2127 Background initialization (BGI) started
Ok / Normal Cause: BGI of a virtual disk has started. This
alert is for informational purposes.
Action: None
2128 BGI cancelled
Ok / Normal Cause: BGI of a virtual disk has been
cancelled. A user or the firmware may have
stopped BGI.
Action: None
2129 BGI failed
Critical / Failure / Error
Cause: BGI of a virtual disk has failed.
2130 BGI completed
Ok / Normal Cause: BGI of a virtual disk has completed.
version and the non-RAID SCSI driver version are older than the
minimum required levels. See readme.txt for a list of validated
kernel and driver versions.
2168 The non-RAID SCSI driver version is older than the minimum
required level. See readme.txt for the validated driver version.
2169 The controller battery needs to be replaced.
2170 The controller battery charge level is normal.
Warning / Non-critical
Warning / Non-critical
Critical / Failure / Error
Ok / Normal Cause: This alert is for informational purposes.
Management has lost communication with the controller. An
immediate reboot is strongly recommended to avoid further
problems. If the reboot does not restore communication, then
contact technical support for more information.
2269 The physical disk Clear operation has completed.
2270 The physical disk Clear operation failed.
2271 The Patrol Read corrected a media error.
Critical / Failure / Error
Ok / Normal Cause: This alert is for informational purposes.
Critical / Failure / Error
Ok / Normal Cause: This alert is for informational purposes.