INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY
ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN
INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS
ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES
RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER
INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for
future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
The Intel® NetStructure™ MPCMM0001 Chassis Management Module may contain design defects or errors known as errata which may cause the
product to deviate from published specifications. Current characterized errata are available on request.
This Software Technical Product Specification as well as the software described in it is furnished under license and may only be used or copied in
accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to change without notice,
and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or
inaccuracies that may appear in this document or any software that may be provided in association with this document.
Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any
means without the express written consent of Intel Corporation.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel literature may be obtained by calling
1-800-548-4725 or by visiting Intel's website at http://www.intel.com.
AnyPoint, AppChoice, BoardWatch, BunnyPeople, CablePort, Celeron, Chips, CT Media, Dialogic, DM3, EtherExpress, ETOX, FlashFile, i386, i486,
i960, iCOMP, InstantIP, Intel, Intel Centrino, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Create & Share, Intel GigaBlade,
Intel InBusiness, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel Play, Intel Play logo, Intel SingleDriver, Intel
SpeedStep, Intel StrataFlash, Intel TeamStation, Intel Xeon, Intel XScale, IPLink, Itanium, MCS, MMX, MMX logo, Optimizer logo, OverDrive,
Paragon, PC Dads, PC Parents, PDCharm, Pentium, Pentium II Xeon, Pentium III Xeon, Performance at Your Command, RemoteExpress, SmartDie,
Solutions960, Sound Mark, StorageExpress, The Computer Inside., The Journey Inside, TokenExpress, VoiceBrick, VTune, and Xircom are
trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
The Intel® NetStructure™ MPCMM0001 Chassis Management Module is a 4U, single-slot CMM
intended for use with AdvancedTCA* PICMG* 3.0 platforms. This document details the software
features and specifications of the CMM. For information on hardware features of the CMM, refer
to the Intel® NetStructure™ MPCMM0001 Hardware Technical Product Specification. Links to
specifications and other material can be found in Appendix B, “Data Sheet Reference.”
The CMM plugs into a dedicated slot in compatible systems. It provides centralized management
and alarming for up to 16 node and/or fabric slots as well as for system power supplies, fans and
power entry modules. The CMM may be paired with a backup CMM for redundant use in high-availability applications.
The CMM is a special purpose single board computer (SBC) with its own CPU, memory, PCI bus,
operating system, and peripherals. The CMM monitors and configures IPMI-based components in
the chassis. When thresholds (such as temperature and voltage) are crossed or a failure occurs, the
CMM captures these events, stores them in an event log, sends SNMP traps, and drives the Telco
alarm relays and alarm LEDs. The CMM can query FRU information (such as serial number,
model number, manufacture date, etc.), detect presence of components (such as fan tray, CPU
board, etc.), perform health monitoring of each component, control the power-up sequencing of
each device, and control power to each slot via IPMI.
Assumptions: This document assumes some basic Linux* knowledge and the ability to use Linux
text editors such as vi.
1.2 Terms Used in this Document
Table 1. Glossary (Sheet 1 of 2)

Acronym   Description
BIST      Built-In Self Test
CDM       Chassis Data Module
CLI       Command Line Interface
CMM       Chassis Management Module
DHCP      Dynamic Host Configuration Protocol
FFS       Flash File System
FIS       Flash Image System
FPGA      Field-Programmable Gate Array
FRU       Field Replaceable Unit
HS        Hot Swap
IPMI      Intelligent Platform Management Interface
IPMB      Intelligent Platform Management Bus
2.1 Red Hat* Embedded Debug and Bootstrap (RedBoot)
Upon initial power-on, the CMM enters the RedBoot firmware to bootstrap the embedded
environment. Upon execution, RedBoot acts as a TFTP server and checks for a TFTP connection from
a client. If a TFTP connection exists, RedBoot accepts a firmware update that is pushed down
from the client, checks the firmware update for data integrity, and then writes the update to flash.
Note: Firmware updates using the RedBoot TFTP method are supported for backwards compatibility.
However, updating from within the OS using the CLI is the preferred method of updating CMM
firmware. For information on the firmware update process, refer to Section 23, “Updating CMM
Software” on page 204.
Under normal circumstances, RedBoot runs through its standard diagnostics and memory setup,
decompresses the OS kernel, and boots into that kernel.
2.2 Operating System
The CMM runs a customized version of embedded BlueCat* Linux* 4.0 on an Intel® 80321
processor with Intel XScale® technology. Development support for BlueCat Linux is available on
the web at http://www.lynuxworks.com.
2.3 Command Line Interface (CLI)
The Command Line Interface (CLI) connects to and communicates with the intelligent
management devices of the chassis, boards, and the CMM itself. The CLI is an IPMI-based library
of commands that can be accessed directly or through a higher-level management application.
Administrators can access the CLI through Telnet, SSH, or the CMM’s serial port. Using the CLI,
users can view the current state of the system, including current sensor values, threshold settings,
recent events, and overall chassis health; access and modify shelf and CMM configurations; set fan
speeds; and perform actions on a FRU. The CLI is covered in Section 8,
“The Command Line Interface (CLI)” on page 71.
2.4 SNMP/UDP
The chassis management module supports both queries and traps on SNMP (Simple Network
Management Protocol) v1 or v3. The SNMP version can be configured through the CLI; the
default is SNMP v1. A MIB for the entire platform is included with the CMM. The CMM
can send out SNMP traps to up to five trap receivers.
Along with SNMP traps, the CMM sends UDP (User Datagram Protocol) alerts to port 10000. The
content of these UDP alerts is the same as the SNMP traps. SNMP is covered in Section 17.
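As a sketch of how a management station might capture these alerts, the following Python snippet binds a UDP socket to port 10000 (the alert port named above) and collects incoming datagrams. The function name and structure are illustrative assumptions, not part of the CMM software:

```python
import socket

def receive_cmm_alerts(port=10000, max_alerts=1, timeout=None):
    """Collect UDP alert datagrams sent by the CMM (illustrative sketch).

    Each alert's text mirrors the corresponding SNMP trap content.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))            # accept alerts on all interfaces
    if timeout is not None:
        sock.settimeout(timeout)
    alerts = []
    try:
        for _ in range(max_alerts):
            data, sender = sock.recvfrom(4096)
            alerts.append((sender[0], data.decode(errors="replace")))
    finally:
        sock.close()
    return alerts
```

In practice a receiver like this would run alongside, not instead of, the configured SNMP trap receivers.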
2.5 Remote Procedure Calls (RPC)
In addition to the console command-line interface, the CMM can be administered by custom
remote applications via remote procedure calls (RPC). RPC is covered in Section 19, “Remote
Procedure Calls (RPC)” on page 174.
2.6 RMCP
RMCP (Remote Management Control Protocol) is a protocol that defines a method for sending IPMI
packets over the LAN. The RMCP server on the CMM can decode RMCP packets and forward the
IPMI messages to the appropriate channels, including SBC blades, PEMs, and fan trays, or to a local
destination within the CMM. When a response IPMI message from an SBC blade, PEM, or fan tray
is destined for an RMCP client, the RMCP server formats the IPMI message into an RMCP message
and sends it back to the originator through the designated LAN interface. RMCP is
covered in Section 20, “RMCP” on page 190.
2.7 Ethernet Interfaces
The CMM contains two Ethernet ports. The software can configure each of these ports to connect to
the front panel, the backplane, or the rear transition module (RTM). Information on configuring
the Ethernet interfaces is covered in Section 8.3.1, “Setting IP Address Properties” on page 72.
Software Specifications
2.8 Sensor Event Logs (SEL)
The AdvancedTCA CMM implements system event logs according to Section 3.5 of the PICMG
3.0 Specification. The SEL contained on the CMM is fully IPMI compliant.
2.8.1 CMM SEL Architecture
The MPCMM0001 uses a single flat SEL file stored locally in the /etc/cmm directory. The SEL
maintains a list of all the sensor events in the shelf. Each of the managed devices may keep its own
SEL records in local SELs, but the master copy for the shelf is maintained by the CMM.
The SEL is limited to 65536 bytes. To keep the SEL from filling up, which could cause loss
of error logging, the CMM checks the SEL every 15 minutes; if the size of the cmm_sel file
is greater than 40000 bytes, the SEL is archived in gzip format and saved in /home/log/SEL. The
saved logs are named cmm_sel.0.gz, cmm_sel.1.gz, and so on, up to a maximum of 16 logs,
after which they are rolled over.
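The archiving behavior described above can be sketched in Python as follows. The paths, the 40000-byte threshold, and the 16-archive limit come from the text; the exact roll-over policy and the truncation of the live SEL after archiving are assumptions for illustration:

```python
import gzip
import os
import shutil

SEL_PATH = "/etc/cmm/cmm_sel"    # live SEL path, per the text above
ARCHIVE_DIR = "/home/log/SEL"    # archive directory, per the text above
SIZE_LIMIT = 40000               # archive when the SEL exceeds this size
MAX_ARCHIVES = 16                # cmm_sel.0.gz .. cmm_sel.15.gz, then roll over

def archive_sel(sel_path=SEL_PATH, archive_dir=ARCHIVE_DIR):
    """Archive the SEL if it exceeds SIZE_LIMIT; return the archive path."""
    if os.path.getsize(sel_path) <= SIZE_LIMIT:
        return None
    # Pick the next slot; wrap to 0 once all 16 slots are in use (assumed policy).
    used = [n for n in range(MAX_ARCHIVES)
            if os.path.exists(os.path.join(archive_dir, "cmm_sel.%d.gz" % n))]
    slot = len(used) % MAX_ARCHIVES
    target = os.path.join(archive_dir, "cmm_sel.%d.gz" % slot)
    with open(sel_path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    open(sel_path, "wb").close()   # truncate the live SEL after archiving (assumed)
    return target
```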
Note: Archived files should NEVER be decompressed on the CMM as the resulting prolonged flash file
writing could disrupt normal CMM operation and behavior. Using FTP, transfer the files to a
different system before decompressing the archive using utilities such as gzip.
2.8.2 Retrieving a SEL
To retrieve a SEL from the CMM, issue the following command:
Where location is one of {cmm, blade[1-14], fantray1, PEM[1-2]}. Even though the CMM uses a
single flat SEL for system events, the ‘cmmget’ command will filter the SEL and return only
events associated with the provided location. Also, some individual FRUs (e.g., blades) may keep
their own local SELs.
2.8.3 Clearing the SEL
The following command clears the SEL on both the active and the standby CMM:
cmmset -d clearsel -v clear
Note: Since the CMM uses a single flat SEL for system events, this command clears the entire shelf SEL,
not just a filtered subset.
2.8.4 Retrieving the Raw SEL
To retrieve the SEL in its raw format from a location, issue the following command:
cmmget -l [location] -d rawsel
2.9 Blade OverTemp Shutdown Script
The CMM software includes predefined script settings specifically for the MPCBL0001 board,
which will automatically shut down a board when the “baseboard temp” sensor on that board
crosses the upper critical threshold. This is done to prevent a runaway thermal event on the board
from occurring. If this functionality is needed when using boards other than the MPCBL0001, the
user will need to associate the name of the thermal sensor and the threshold with the board
shutdown script:
Please refer to Section 18, “CMM Scripting” on page 164 for more information on associating a
script with an event.
When using the CMM with boards other than the MPCBL0001, as long as no sensor named
"baseboard temp" is associated with the particular board being used, these settings can safely be
left intact. If needed, deactivate these settings for each physical slot with the
command:
where bladeN is the blade, corresponding to the physical slot number, on which to remove the
automatic shutdown setting (blade[1-16]). Please refer to Section 18, “CMM Scripting” on
page 164 for more information on removing script actions.
The CMM supports redundant operation with automatic failover in a chassis using redundant
CMM slots. In systems where two CMMs are present, one acts as the active shelf manager and the
other as standby. Both CMMs monitor each other, and either can trigger a failover if necessary.
Data from the active CMM is synchronized to the standby CMM whenever any changes occur.
Data on the standby CMM is overwritten. A full synchronization between active and standby
CMMs occurs on initial power up, or any insertion of a new CMM.
The active CMM is responsible for shelf FRU information management when CMMs are in
redundant mode.
3.2 Synchronization
To ensure critical files on the standby CMM match the data on the active CMM, the active CMM
synchronizes its data with the standby CMM, overwriting any existing data on the standby CMM.
An exception to this is the password reset procedure, detailed in Section 9, “Resetting the
Password” on page 99. When the password reset switch is activated on the standby CMM, the
password will be synchronized to the active CMM.
The CMMs initially perform a full synchronization of data from the active to the standby CMM just
after booting. Insertion of a new CMM also causes a full synchronization from the active to the
newly inserted standby. Date and time are synched every hour. Partial synchronization also
occurs any time files are modified or touched via the Linux* “touch” command, with the exception
of all *.sif and *.bin files in the /etc/cmm directory.
The *.sif (all SIF files) and *.bin (SDR files) files under /etc/cmm are synchronized only once,
when the CMMs establish communication. A ‘touch’ on those files at any later time will not
trigger a sync operation. Also, any updates to these files always happen as part of software
updates and not in isolation.
Note: During synchronization, the health event LEDs on the standby CMM may blink on and off as the
health events that were logged in the SEL are synchronized.
Below is a list of items that are synchronized between CMMs. During a full synchronization, all of
these files and data are synchronized. A change to any of these files results in that file being
synched. The active CMM overwrites these files on the standby CMM.
There are two "levels" of files that get synchronized. To manage the chassis normally, the
priority 1 files must be synchronized after power-up or installation of a brand-new CMM into the
chassis. It is absolutely necessary that a standby CMM has the priority 1 files synched before a
successful failover can occur. When a brand-new CMM boots the first time as a standby, if a CMM
failover is forced before all priority 1 data items are synchronized to the standby CMM, the standby
CMM can still become the active CMM but may not be able to properly manage the FRUs in the
chassis.
Table 2. CMM Synchronization (Sheet 1 of 2)

File(s) or Data                              Description                                   Path      Priority
date and time                                Date and time                                 IPMB      1
IP Address Settings                          IP Address Settings                           Ethernet  1
/etc/cmm.cfg                                 CMM’s main configuration file                 Ethernet  1
/etc/cmm/cmm_sel                             System SEL                                    Ethernet  1
/etc/cmm/sensors.ini                         Sensor Set Values                             Ethernet  1
Ekey Controller Structures                   Ekey Controller Structures                    Ethernet  1
Bused EKey Token info                        Bused EKey Token info                         Ethernet  1
IPMB User States                             IPMB User States                              Ethernet  1
Fan States                                   Fan States                                    Ethernet  1
Cooling State                                Cooling State Information                     Ethernet  1
User LED States                              User LED States                               Ethernet  1
SDR structures and SIPI Controller Info      SDR structures and SIPI Controller Info       Ethernet  1
PHM FRU state, Power Usage and Power Info    PHM FRU state, Power Usage and Power Info     Ethernet  1
FIM FRU Cache (Local and Temp)               FIM FRU Cache (Local and Temp)                Ethernet  1
SEL Time                                     SEL Time                                      IPMB      1
SEL Events                                   Individual SEL Events                         IPMB      1
/etc/cmm/fantray.cfg                         Fantray settings needed by cooling manager    Ethernet  1
Recovery action and escalation action for all monitored processes except monitor process   Ethernet  2
Recovery action and escalation action for monitor process                                  Ethernet  2
Note: The /.rhosts file is used for synchronization and should NEVER be modified.
3.3 Heterogeneous Synchronization
Beginning with version 5.2 firmware, the CMM can synchronize data between differing CMM
versions. The firmware decouples synchronization from firmware versioning, allowing
seamless synchronization between all CMM versions. A form of internal data versioning
maintained by the CMM makes this possible.
Note: SDR/SIF and user scripts differ slightly in synchronization architecture, as described below.
3.3.1 SDR/SIF Synchronization
Sensor Data Records (SDRs) and Sensor Information Files (SIFs) will be synchronized only
between CMMs having the same version for this data item (even if the CMM firmware versions
differ).
3.3.2 User Scripts Synchronization and Configuration
By default, user scripts are synchronized only between CMMs with the same firmware version. Users
can control user script synchronization irrespective of CMM version differences by modifying
the value of the "SyncUserScripts" configuration flag in the CMM configuration file, /etc/cmm.cfg.
The configuration flag can be modified using the cmmget/cmmset commands. This
flag can be read or set through any of the CMM interfaces (CLI, SNMP, and RPC).
Only when CMM firmware versions differ does the value of this flag determine whether user scripts
are synchronized. Between identical firmware versions, the user scripts directory continues to be
synchronized and this flag is ignored.
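The version-dependent decision described above, together with the flag values defined in the next section (upgrade, downgrade, always), can be sketched as follows. The function and parameter names are illustrative, and representing firmware versions as comparable tuples is an assumption:

```python
def should_sync_user_scripts(local_version, peer_version, flag):
    """Decide whether to sync user scripts from the peer CMM (sketch).

    Versions are comparable tuples, e.g. (5, 2). The flag is consulted
    only when firmware versions differ; equal versions always sync.
    """
    if local_version == peer_version:
        return True                       # same firmware: flag is ignored
    if flag == "always":
        return True                       # sync irrespective of versions
    if flag == "upgrade":
        return peer_version > local_version   # only from a newer peer
    if flag == "downgrade":
        return peer_version < local_version   # only from an older peer
    return False
```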
3.3.2.1 Setting User Scripts Sync Configuration Flag
To set the value of the Scripts Synchronization configuration flag, the following CMM command is
used:
upgrade: Synchronizes user scripts only when the other CMM has a newer firmware version.
downgrade: Synchronizes user scripts only when the other CMM has an older firmware version.
always: Synchronizes user scripts irrespective of version differences.
3.3.2.2 Retrieving User Scripts Sync Configuration Flag
To retrieve the value of the Scripts Synchronization configuration flag, the following CMM
command is used:
cmmget -l cmm -d syncuserscripts
The value returned will be one of: Equal, Upgrade, Downgrade, Always, or Error on failure.
3.3.3 Synchronization Requirements
For synchronization to occur:
• The CMMs must be able to communicate with each other over their dedicated IPMB. The
CMMs use a heartbeat via their dedicated IPMB to determine if they can communicate with
each other over IPMB.
• An Ethernet connection must exist between the two CMMs. The CMMs must be able to ping
each other via Ethernet for synchronization to be successful. This can be a connection through
the Ethernet switches in the chassis (which requires both switches to be present), through an
external Ethernet switch connected to the front ports of the CMM pair, or through a crossover
cable connecting the two front ports of the CMM pair. If synchronization fails on eth1, it is
attempted on eth0. If the CMMs cannot successfully ping each other via eth0 or eth1,
synchronization between the CMMs cannot occur.
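The eth1-then-eth0 fallback can be sketched as below. The function name is illustrative, and the default helper assumes the Linux ping utility's -c (count) and -I (interface) options; a test hook is provided so the logic can be exercised without real interfaces:

```python
def reachable_interface(peer_ip, ping=None):
    """Return the first interface over which the peer answers, trying
    eth1 then eth0 as described above; None if neither works (sketch)."""
    if ping is None:
        import subprocess
        def ping(iface, ip):
            # Single ping bound to the given interface (Linux ping syntax).
            return subprocess.call(
                ["ping", "-c", "1", "-I", iface, ip],
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0
    for iface in ("eth1", "eth0"):
        if ping(iface, peer_ip):
            return iface
    return None
```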
A failure of any priority 1 synchronization will result in a health event being logged in the CMM
SEL and will inhibit a failover from occurring.
3.4 Initial Data Synchronization
It is absolutely necessary that a standby CMM has the priority 1 files synched before a successful
failover can occur. A standby CMM can still become active if all priority 1 synchronization has
not been completed, but it may not be able to properly manage all the FRUs in the chassis.
The CMM implements the “Datasync Status” sensor to determine the state of synchronization and
whether synchronization has completed successfully.
3.4.1 Initial Data Sync Failure
If the CMM encounters any failure during data synchronization, it marks the data synchronization
failure, logs a SEL event, and sends an SNMP trap. Duplicate failures are not reported multiple
times. As soon as the CMM is out of the failure condition, it resets the data synchronization failure state.
The CMM will continue trying to synchronize as long as there are two CMMs present in the
chassis and they are able to communicate via their cross-connected IPMB.
3.5 Datasync Status Sensor
A sensor named “Datasync Status” exists to make the Datasync state information available
to the user. This sensor tracks the status of the Datasync module and makes its status available
through the various CMM interfaces. The sensor is used to query the data synchronization states
and to log a SEL event when initial synchronization completes. It is a discrete OEM sensor with
status bits representing the state of different parts of the Datasync module.
Note: The Datasync Status sensor can only be queried through the active CMM.
3.5.1 Sensor Bitmap
When the Datasync starts the first time through in a dual CMM system and whenever the CMM
changes between Active and Standby, the status bits are all cleared to 0x0000.
• Bit 0 (Running) is set when the datasync module is active.
• Bit 1 (P1Done) is set when the priority 1 data syncs are done, and cleared when priority 1 data
needs to be synced.
• Bit 2 (P2Done) is set when the priority 2 data syncs are done, and cleared when a priority 2
data needs to be synced.
• Bit 3 (InitSyncDone) is set when both priority 1 and priority 2 data syncs are done, and stays
set (latches) until the CMM changes between Active and Standby, or loses contact with the
partner CMM.
• Bit 4 (SyncError) is set if an error was detected, and cleared when no data items have errors.
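The bitmap above can be decoded as in the following sketch. The flag names mirror the bit labels in the list; the function itself is illustrative:

```python
# Bit positions and names of the Datasync Status discrete sensor, per the list above.
DATASYNC_BITS = {
    0: "Running",        # datasync module is active
    1: "P1Done",         # priority 1 data syncs are done
    2: "P2Done",         # priority 2 data syncs are done
    3: "InitSyncDone",   # latched: both priorities complete
    4: "SyncError",      # an error was detected
}

def decode_datasync_status(raw):
    """Return the set of flag names asserted in a raw sensor reading."""
    return {name for bit, name in DATASYNC_BITS.items() if raw & (1 << bit)}
```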
Redundancy, Synchronization, and Failover
3.5.2 Event IDs
The “Datasync Status” sensor uses event IDs 0x420 to 0x42f. The following new event IDs are
used to log various events for these requirements. These event IDs can be used to associate
scripts with the respective events.

Event                                     Event ID
Initial Data Synchronization complete     0x420 (1056)
3.5.3 Querying the Datasync Status
The status of the Datasync Status sensor can be queried using the following CLI command:
• When initial data synchronization is complete, the following SNMP trap is generated:
[Month] [Date] [Time] [hostname] snmptrapd[xxxxx]: [IP Address]:
Enterprise Specific Trap (25) Uptime: [Time], SNMPv2-SMI::enterprises.343.2.14.1.5 = STRING: "Time : [Day] [Month] [Date] [Time] [Year], Location : [location] , Chassis Serial # : [xxxxxxxx],
Board : CMM[x] , Sensor : CMM[x]:Datasync Status , Event : Initial Data
Synchronization is complete. Asserted "
3.5.6 System Health
The “Datasync Status” sensor does not contribute to the system health. However, sync failures are
captured by the “File Sync Failure” sensor, which does contribute to the system health.
3.6 CMM Failover
Once information is synchronized between the redundant CMMs, the active CMM constantly
monitors its own health as well as the health of the standby CMM. In the event of one of the
scenarios listed in the sections that follow, the active CMM automatically fails over to the
standby CMM so that no management functionality is lost at any time.
3.6.1 Scenarios That Prevent Failover
The following are reasons a failover can NOT occur:
• The active CMM can NOT communicate with the standby CMM via their IPMB bus.
• Not all priority 1 data has been completely synchronized between the CMMs.
To determine the active CMM at any time, use the CLI command:
cmmget -l cmm –d redundancy
This command will output a list stating if both CMMs are present, which one is the active CMM,
and which CMM you are logged in to. CMM1 is the CMM on the left when looking from the front
of the chassis, and CMM2 is on the right.
3.6.2 Scenarios That Failover to a Healthier Standby CMM
The scenarios listed below can only cause a failover if the standby CMM is in a healthier state than
the active CMM. The health of a CMM is determined by computing a CMM health score, equal to
the sum of the weights of the three conditions listed below that are currently active. A health score
is computed for each CMM whenever any of these conditions occurs on the active CMM. Each
condition has a default weight of 1, giving all conditions equal importance in causing a failover.
To determine if a failover is necessary when one of these conditions occurs, the active CMM
computes its CMM health score, and requests the health score of the standby CMM. If the score of
the standby CMM is LESS than the score of the active CMM, a failover will occur. If a failover
does not occur, the CMM SEL will contain an entry indicating the reason failover did not occur.
1. The active CMM will fail over to the standby CMM if the active CMM cannot ping its first
SNMP trap address (SNMPTrapAddress1) over any of the available Ethernet ports, but the
standby CMM can. The trap address is set using the command:
cmmset –l cmm –d snmptrapaddress1 –v [ip address]
Only a ping failure of the first SNMP trap address (SNMPTrapAddress1) can cause a failover.
SNMPTrapAddress2 through SNMPTrapAddress5 are not subject to this ping test.
Note: The frequency of the ping to the first trap address can vary from one second to approximately 20
seconds.
2. Critical events on the active CMM:
The active CMM has critical events for any of the CMM sensors (not critical chassis or blade
events) and the standby CMM does not. If both CMMs have critical CMM events, the
number of major and minor CMM events is examined to decide whether a failover should occur:
the number of major events is compared first, and if they are equal, the number of minor events is used.
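The health comparison described in this section can be sketched as follows. The dictionary layout and function name are illustrative assumptions; the score and event counts stand in for the values the CMMs exchange:

```python
def failover_needed(active, standby):
    """Sketch of the failover decision described above (names illustrative).

    Each CMM is summarized as a dict with a 'score' (sum of the weights of
    its active conditions) and counts of 'critical', 'major', and 'minor'
    CMM events.
    """
    # Failover when the standby's health score is lower (i.e., healthier).
    if standby["score"] < active["score"]:
        return True
    # Critical CMM events on the active CMM only.
    if active["critical"] and not standby["critical"]:
        return True
    # Both CMMs critical: compare major event counts, then minor.
    if active["critical"] and standby["critical"]:
        if active["major"] != standby["major"]:
            return active["major"] > standby["major"]
        return active["minor"] > standby["minor"]
    return False
```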
3.6.3 Manual Failover
The following command can be issued to the active CMM to manually cause a failover to the
standby CMM:
cmmset -l cmm -d failover -v [1/any]
Where:
1: Fails over only to a CMM with the same or newer version of firmware.
any: Fails over to a CMM with any version of firmware.
A manual failover can only be initiated on the active CMM. A failover will only occur if the
standby CMM is at least as healthy as the active CMM. Once the command executes, the former
standby CMM immediately becomes the active CMM.
If the failover could not occur, the CLI will indicate the reason why the failover could not occur,
and a SEL event will be recorded.
In addition, opening the ejector latch on the active CMM will initiate a failover, but only if the
standby is at least as healthy as the active.
3.6.4 Scenarios That Force a Failover
The following scenarios cause a failover as long as the standby CMM is operational, even when it
is less healthy than the active:
• The active CMM is pulled out of the chassis.
• The active CMM’s healthy signal is de-asserted.
• A “reboot” command is issued to the active CMM.
• The front panel alarm quiet switch button on the active CMM is pushed for more than five
seconds. If the button continues to be pressed for more than 10 seconds, the CMM does not
reset.
3.7 CMM Ready Event
The CMM Ready Event is a notification mechanism that informs the user when all CMM modules
are fully up and running. The CMM is ready to process any request after receiving this event.
The CMM uses the "CMM Status" sensor when generating the CMM Not Ready event. Please
refer to Table 46, “CMM Status Event Strings (CMM Status)” on page 118 for CMM status event
strings.
Table 3. CMM Status Event Strings (CMM Status)

Event String             Event Code   Event Severity
“CMM is not ready.”      1024         Minor
“CMM is ready.”          1025         OK
“CMM is Active”          1026         OK
“CMM is Standby”         1027         OK
“CMM ready timed out”    1028         Minor
A CMM Not Ready Assertion SEL event is generated on a CMM when it transitions from standby
mode to active mode during a failover or on the active CMM on power up. The event is only
generated on the newly active CMM. The “CMM is Ready” event is generated after all CMM
modules (board wrapper processes) are up and running and the SNMP daemon is active.
The CMM provides a Built-In Self Test (BIST), which runs automatically after power-up.
This test detects flash corruption as well as other critical hardware failures.
Results of the BIST are displayed on the console through the serial port during boot time. Results
of BIST are also available through the CLI if the OS successfully boots. If the BIST detects a fatal
error, the CMM is not allowed to function as an active CMM.
4.1 BIST Test Flow
The following state diagram shows the order of the tests RedBoot runs following a power-up or
front-panel reset. In every state before reaching active CMM, if there is an error, RedBoot logs
the error event into the EEPROM, routes the error message to the serial port, and continues booting.
If execution hangs before the OS loads due to the nature of the error, the CMM hangs. If the OS
successfully boots, it alerts users to any errors that occurred during boot.
The BIST has been broken down into stages consisting of groups of tests that run at certain times
throughout the boot process. The following table shows the different BIST stages and the tests
associated with each stage:
Table 4. BIST Implementation

Boot-BIST                 Early-BIST    Mid-BIST             Late-BIST
RedBoot image checksum                  FPGA version check   IPMB bus test
FPGA image checksum                     DS1307 RTC test
Base memory test
4.2 Boot-BIST
The code in Boot-BIST executes at the very early stage of the RedBoot bootstrap, just before
FPGA programming and memory module initialization. Boot-BIST performs checksum checking over
the RedBoot image and the FPGA image. A checksum error is detected if there is a mismatch between
the calculated checksum and the stored checksum in the FIS directory.
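The checksum comparison can be sketched as below. The actual algorithm RedBoot uses is not specified here; a simple 32-bit additive checksum stands in for it, and the function name is illustrative:

```python
def verify_image(image_bytes, stored_checksum):
    """Illustrative checksum verification in the spirit of Boot-BIST.

    A 32-bit additive checksum stands in for RedBoot's actual algorithm,
    which this document does not specify.
    """
    calculated = sum(image_bytes) & 0xFFFFFFFF   # fold into 32 bits
    return calculated == stored_checksum          # mismatch => checksum error
```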
Boot-BIST also performs a Base Memory Test on the first 1 MByte of memory. Whenever there is
an error, BIST informs the user with a warning message through the console terminal
and logs the event to the event-log area.
4.3 Early-BIST
The Early-BIST stage extends the reset timeout period on the watchdog timer (MAX6374) by
strobing GPIO7 on FPGA1. This prevents any possible hardware reset during the BIST process.
The watchdog timer is enabled after the ADM1026 GPIO initialization and disabled once execution
reaches the RedBoot console. The OS enables the watchdog timer again and starts the strobing thread
at the kernel level.
Built-In Self Test (BIST)
4.4 Mid-BIST
This stage of BIST performs the Extended Memory Test to scan for and diagnose possible bit
errors in memory. It scans from 1 MByte to 128 MBytes. It does not test the memory below
1 MByte because a portion of RedBoot is already loaded and resident there.
The memory test includes the walking-ones test, a 32-bit address test, and a 32-bit inverse address test.
Furthermore, voltage and temperature readings are verified to lie within the hardware-tolerable
ranges. The FPGA firmware version is checked, and an alert is raised if an older version of an FPGA
image is detected. Also, the system date and time are read from the real-time clock and displayed
on the console terminal. NIC presence is also checked here, though the NIC self-test happens
later when the driver is loaded.
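The walking-ones portion of the memory test can be sketched as follows. Here `write` and `read` stand in for the raw memory accessors the firmware would use; the function itself is an illustration, not the CMM's implementation:

```python
def walking_ones_test(write, read, width=32):
    """Walking-ones pattern test over one memory word (sketch).

    A single 1 bit walks across the word; a stuck or shorted bit shows
    up as a read-back mismatch.
    """
    for bit in range(width):
        pattern = 1 << bit
        write(pattern)
        if read() != pattern:
            return False          # fault detected at this bit position
    return True
```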
4.5 Late-BIST
Late-BIST disables the watchdog timer once RedBoot is fully loaded. It then verifies the checksum
of the OS image against a stored checksum at the top of flash memory before proceeding with
boot script execution.
The following diagram shows when, during the boot cycle, the various stages of BIST are
performed.
4.6 QuickBoot
Once enabled, this feature skips all the diagnostic tests in Mid-BIST and Late-BIST. However,
the Flash Test and Base Memory Test in Boot-BIST still execute even with this feature enabled.
The default setting is QuickBoot enabled.
When the QuickBoot feature is disabled, the user can optionally enable or disable the
Extended Memory Test (in Mid-BIST) and the OS Image Checksum Test (in Late-BIST)
individually.
4.6.1 Configuring QuickBoot
RedBoot> fconfig
...
Enable QuickBoot during BIST: false
Execute extended memory test: true
OS image checksum at boot: true
...
Update RedBoot non-volatile configuration - are you sure (y/n)? y
The default for 'Enable QuickBoot during BIST' is true. When 'Enable QuickBoot during BIST' is
set to false, two additional options are displayed in the configuration menu: 'Execute extended
memory test' and 'OS image checksum at boot'. The user can selectively enable one or both tests
while QuickBoot is disabled. Neither option is shown in the configuration menu when QuickBoot
is enabled. These options take effect on the next boot.
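The decision logic implied by these configuration flags can be sketched as follows. The structure and function names here are illustrative only, not the actual RedBoot source:

```c
#include <stdbool.h>

/* Hypothetical in-memory view of the RedBoot fconfig flags shown above. */
struct bist_config {
    bool quickboot_enabled;  /* "Enable QuickBoot during BIST" */
    bool ext_memory_test;    /* "Execute extended memory test" */
    bool os_checksum_test;   /* "OS image checksum at boot" */
};

/* Decide which optional BIST stages run on the next boot.  Boot-BIST
 * (Flash Test, Base Memory Test) always runs, so it is not modeled. */
void select_bist_stages(const struct bist_config *cfg,
                        bool *run_mid_bist, bool *run_late_bist)
{
    if (cfg->quickboot_enabled) {
        /* QuickBoot skips every mid-BIST and late-BIST diagnostic. */
        *run_mid_bist  = false;
        *run_late_bist = false;
    } else {
        /* With QuickBoot disabled, the two tests are individually
         * selectable via the remaining fconfig options. */
        *run_mid_bist  = cfg->ext_memory_test;
        *run_late_bist = cfg->os_checksum_test;
    }
}
```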
4.7 Event Log Area and Event Management
Errors detected by the BIST are stored in an event log. The event-log area holds up to 269 entries
of 14 bytes each and is located in EEPROM on the CMM. The BIST places entries into the event
log until it becomes full; once full, any new entries are lost. The BIST event log is cleared by the
OS once the OS has logged the BIST errors into the SEL.
At OS start-up, the CMM reads the BIST results from the reserved event-log area and stores the
errors as entries in the CMM SEL. This allows the CMM application to take the appropriate action
based upon the SEL events resulting from the RedBoot BIST tests. If there is not enough space to
log the events in the CMM SEL, no results are logged to the CMM SEL.
The BIST event log is erased only after the event log is stored into the CMM SEL. Event strings for
BIST events are listed in Section 11, “Health Events” on page 104.
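The capacity behavior described above (fixed size, new entries lost when full, cleared after transfer to the SEL) can be modeled with a small sketch. The EEPROM area is represented here as a RAM buffer and all names are hypothetical:

```c
#include <string.h>

enum { LOG_ENTRIES = 269, ENTRY_SIZE = 14 };

/* Hypothetical RAM model of the EEPROM event-log area on the CMM. */
struct event_log {
    unsigned char entry[LOG_ENTRIES][ENTRY_SIZE];
    int count;                      /* number of valid entries */
};

/* Append one 14-byte BIST entry.  Returns 0 on success, or -1 when
 * the log is already full and the new entry is lost. */
int event_log_append(struct event_log *log, const unsigned char *e)
{
    if (log->count >= LOG_ENTRIES)
        return -1;                  /* log full: entry is dropped */
    memcpy(log->entry[log->count++], e, ENTRY_SIZE);
    return 0;
}

/* Called by the OS after the entries have been copied into the SEL. */
void event_log_clear(struct event_log *log)
{
    log->count = 0;
}
```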
4.8 OS Flash Corruption Detection and Recovery Design
The OS is responsible for flash content integrity at runtime. Flash monitoring under the OS
environment can be divided into two parts: monitoring static images and monitoring dynamic
images.
Static images refer to the RedBoot image, FPGA image and BlueCat image in flash. These images
should not change throughout the lifetime of the CMM unless they are purposely updated or
corrupted. The checksum for these files is written into flash when the images are uploaded.
Dynamic image refers to the OS Flash File System (JFFS2). This image dynamically changes
throughout the runtime of the OS.
4.8.1 Monitoring the Static Images
A static test is run every 24 hours during CMM operation. The static test reads each static image
(RedBoot, FPGA, BlueCat), calculates the image checksum, and compares with the checksum in
the RedBoot configuration area (FIS). If the checksum test fails, the error is logged to the CMM
SEL.
4.8.2 Monitoring the Dynamic Images
For monitoring the dynamic images, the CMM leverages the corruption detection ability of the
JFFS(2) flash file system. At OS start-up, the CMM executes an initialization script to mount the
JFFS(2) flash partitions (/etc and /home). If a flash corruption is detected, an event is logged to the
CMM SEL.
During normal OS operation, flash corruption during file access can also be detected by the
JFFS(2) and/or the flash driver. If a flash corruption is detected, an event is logged to the CMM
SEL.
4.8.3 CMM Failover
If during normal OS operation a critical error occurs on the active CMM, such as a flash
corruption, the standby CMM is checked to see if it is in a healthier state. If the standby CMM is in
a healthier state, then a failover will occur. See Section 3, “Redundancy, Synchronization, and
Failover” on page 21.
4.9 BIST Test Descriptions
4.9.1 Flash Checksum Test
This test verifies that the RedBoot image and the FPGA image are not corrupted. It calculates the
CRC32 checksum of the RedBoot image and compares it with the image checksum stored in the
FIS directory. If the checksums do not match, BIST switches to the backup image. If a checksum
mismatch is found in the FPGA image, BIST loads the backup image to program the FPGA
device.
4.9.2 Base Memory Test
This test writes the data pattern 55AA55AAh into every 4 bytes of the memory below 1 MByte.
Its objective is to verify the wire connectivity of the address and data pins between the memory
module and the processor. The test first writes the data pattern into the complete first 1 MByte,
then verifies the written pattern by reading it back from the memory module. If any read-back
value mismatches, the test logs the error event into the event-log area and routes the error message
to the serial port.
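The fill-then-verify procedure can be sketched as follows. This is an illustrative model run over a caller-supplied buffer, not the actual BIST code (which targets the first 1 MByte of physical RAM):

```c
#include <stdint.h>
#include <stddef.h>

#define BASE_TEST_PATTERN 0x55AA55AAu

/* Write the pattern into every 32-bit word of the region, then read it
 * back.  Returns the byte offset of the first mismatch, or -1 if the
 * whole region passes. */
long base_memory_test(volatile uint32_t *base, size_t nwords)
{
    size_t i;

    for (i = 0; i < nwords; i++)        /* fill pass */
        base[i] = BASE_TEST_PATTERN;
    for (i = 0; i < nwords; i++)        /* verify pass */
        if (base[i] != BASE_TEST_PATTERN)
            return (long)(i * sizeof(uint32_t));
    return -1;
}
```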
4.9.3 Extended Memory Tests
Walking Ones Test
This test verifies the data bus wiring by testing the bus one bit at a time. The data bus passes the
test if each data bit can be set to 0 and 1 independently of the other data bits.
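A common implementation of this data-bus check walks a single 1 bit across one memory location. The following is a generic sketch of the technique, not the CMM's actual code:

```c
#include <stdint.h>

/* Walk a single 1 bit across the 32-bit data bus at one location.
 * Returns 0 if every bit can be set independently, or the first
 * failing pattern (which identifies the suspect data line). */
uint32_t walking_ones_test(volatile uint32_t *addr)
{
    uint32_t pattern;

    for (pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;   /* a data line is stuck or shorted */
    }
    return 0;
}
```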
32-Bit Address Test
This test verifies the address bus wiring. The smallest set of addresses that covers all possible
combinations is the set of power-of-two addresses. These addresses are analogous to the set of
data values used in the walking-ones test; the corresponding memory locations are 0001h, 0002h,
0004h, 0008h, 0010h, 0020h, and so on. In addition, address 0000h must also be tested. To
confirm that no two memory locations overlap, an initial data value is first written at each
power-of-two offset within the device. Then a new value, an inverted copy of the initial value, is
written to the first test offset, and it is verified that the initial data value is still stored at every
other power-of-two offset. If a location other than the one just written is found to contain the new
data value, there is a problem with the current address bit. If no overlap is found, the procedure is
repeated for each of the remaining offsets.
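The procedure above can be sketched as follows. This is a generic illustration of the power-of-two technique (assuming the region size is a power of two), not the actual BIST implementation:

```c
#include <stdint.h>
#include <stddef.h>

/* Power-of-two address test over nwords 32-bit words (nwords must be
 * a power of two).  Seed offset 0 and every power-of-two offset with
 * an initial value; then, for each test offset, write the inverted
 * value and confirm that no other seeded location changed.  Returns
 * the overlapping offset on failure, or -1 on pass. */
long address_test(volatile uint32_t *base, size_t nwords)
{
    const uint32_t initial  = 0xAAAAAAAAu;
    const uint32_t inverted = ~initial;     /* 0x55555555 */
    size_t offset, check;

    base[0] = initial;
    for (offset = 1; offset < nwords; offset <<= 1)
        base[offset] = initial;

    for (offset = 0; offset < nwords;
         offset = (offset == 0) ? 1 : offset << 1) {
        base[offset] = inverted;
        /* Offset 0 is checked separately since it is not a power of two. */
        if (base[0] != ((offset == 0) ? inverted : initial))
            return 0;
        for (check = 1; check < nwords; check <<= 1)
            if (check != offset && base[check] != initial)
                return (long)check;  /* two locations overlap */
        base[offset] = initial;      /* restore before the next round */
    }
    return -1;
}
```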
32-Bit Inverse Address Test
This test behaves similarly to the address test described above, except that the addresses are tested
in the inverse direction. This helps identify a broader scope of possible addressing errors inherent
in the memory modules.
4.9.4 FPGA Version Check
This test verifies that the correct FPGA image is programmed into both FPGA chips. It displays
the FPGA version of both FPGAs; the two versions should be the same. If the programmed
version is older than expected, an event is logged to the SEL.
4.9.5 DS1307 RTC (Real-Time Clock) Test
This test verifies the functionality of the DS1307 RTC chip. It displays the date/time settings from
the RTC and validates the readings; if any reading is not in BCD format, an event is logged to the
SEL. The test also captures the current time, sleeps briefly, and compares the previously captured
time with the new time. If they differ, the RTC is working; if not, an event is logged to the SEL.
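The BCD validation step can be sketched as follows. The DS1307's time and date registers hold packed BCD, so any nibble above 9 indicates a bad read or an uninitialized clock; the helper names here are illustrative:

```c
#include <stdint.h>

/* Return 1 if both nibbles of a packed-BCD byte are valid decimal
 * digits.  Note: some DS1307 registers carry control bits in the high
 * nibble (e.g., the 12/24-hour flag), which a real test would mask
 * off before this check. */
int is_valid_bcd(uint8_t v)
{
    return ((v & 0x0F) <= 9) && (((v >> 4) & 0x0F) <= 9);
}

/* Convert a packed-BCD byte to binary (valid input assumed). */
unsigned bcd_to_bin(uint8_t v)
{
    return (unsigned)((v >> 4) & 0x0F) * 10u + (v & 0x0F);
}
```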
4.9.6 NIC Presence/Local PCI Bus Test
This test generates PCI bus transactions by scanning the PCI buses available on the board. It
detects the two Ethernet devices and verifies that each device has a valid Vendor ID and Device
ID in its PCI configuration space. The NIC internal self-test is not performed here; it is executed
when the Ethernet driver is loaded.
4.9.7 OS Image Checksum Test
This test verifies that the OS image stored in flash is not corrupted. It calculates the CRC32
checksum of the OS image and compares it with the image checksum stored in the FIS directory.
If the checksums do not match, BIST logs an error event to the SEL.
4.9.8 CRC32 Checksum
CRC32 is the 32-bit version of the Cyclic Redundancy Check technique, which is designed to
verify the validity and integrity of the bits within the data. It first generates the diffusion table,
which consists of 256 double-word entries; each entry is known as a unique diffusion code. The
checksum calculation starts by fetching the first byte in the data buffer and exclusive-ORing it
with the temporary checksum value. The result is ANDed with 0xFF to restrict the index to the
range 0 to 255 (decimal). That index is used to fetch a diffusion code from the table. Next, the
newly fetched diffusion code is exclusive-ORed with the most significant 24 bits of the temporary
checksum value (that is, the checksum value shifted right by 8 bits). The result becomes the new
temporary checksum value. The process repeats through the last byte in the data buffer, and the
final temporary checksum value becomes the final checksum.
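The algorithm described above corresponds to the standard table-driven (reflected) CRC32. The sketch below assumes the standard 0xEDB88320 polynomial and the conventional 0xFFFFFFFF initial value and final inversion, which the text does not spell out:

```c
#include <stdint.h>
#include <stddef.h>

static uint32_t crc_table[256];     /* the "diffusion table" */

/* Build the 256-entry table from the standard reflected polynomial. */
void crc32_init(void)
{
    uint32_t c;
    int i, bit;

    for (i = 0; i < 256; i++) {
        c = (uint32_t)i;
        for (bit = 0; bit < 8; bit++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[i] = c;
    }
}

uint32_t crc32(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;     /* temporary checksum value */

    while (len--) {
        /* XOR the next byte in, mask to an index 0..255, then combine
         * the table entry with the upper 24 bits of the running value. */
        crc = crc_table[(crc ^ *p++) & 0xFFu] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;       /* final inversion */
}
```

With these parameters, the checksum of the ASCII string "123456789" is the well-known check value 0xCBF43926.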
The Chassis Management Module can re-enumerate the devices in the chassis in the event that the
chassis loses and then regains CMM management. This allows the CMM, at startup, to query
information on all devices in the chassis when no active CMM in that chassis already holds that
information from which it could be received via a regular synchronization. This is achieved
without having to restart the individual blades already present in the chassis.
Re-enumeration provides a way to recover from situations such as double failures, where both
CMMs have failed or been accidentally removed from the chassis. To identify the contents of the
chassis, the CMM first determines whether it should perform this function. The Standby CMM
does not re-enumerate; it relies on the information synchronized from the Active CMM in case a
failover occurs. After startup, the Active CMM determines which Entities are present. Then, for
each of these Entities, the CMM queries it for the state and other information needed to properly
manage the Entity as well as the entire chassis. The CMM stays in the M2 state until
re-enumeration is complete.
The CMM re-enumeration process obtains the following information for each FRU in the chassis:
— Presence
— M-State
— Power Usage
— Sensor Data Records
— Health Events
— Board EKey Usage
— Bused EKey Usage
5.2 Re-enumeration on Failover
In the case of a forced failover, the newly Active CMM performs re-enumeration if the following
conditions are satisfied:
• Re-enumeration has not completed on the Active CMM.
• Active CMM has not yet synchronized the re-enumerated data over to the Standby CMM.
If the newly Active CMM has to re-enumerate, it switches to the M2 state before starting
re-enumeration. The Blue LED uses long blinks to provide a visual indication of the state of the
CMM. It is recommended that the Entities in the chassis not be activated or deactivated while
re-enumeration is in progress.
If, during re-enumeration, the CMM discovers that a FRU is requesting deactivation (State M5), it
denies the request and informs the FRU to return to the Active (M4) state if no frucontrol script is
present (refer to Section 18.5, “FRU Control Script” on page 169). Otherwise, the CMM executes
the frucontrol script and lets it handle the deactivation of the FRU.
5.4 Resolution of EKeys
During re-enumeration, the CMM determines the status of the EKeys of the Boards present in the
chassis. If there are interfaces that can be enabled with respect to the other end-point, the CMM
completes the EKeying process as per Section 14.1. If EKeys are enabled to a slot but the CMM
was unable to discover a Board in that slot, it assumes that the Board in that slot is in the M7
(Communication Lost) state.
5.5 Events Regeneration
The re-enumeration agent sends the "Set Event Receiver" command to all the Entities in the
chassis. On receiving the command, the Entities re-arm event generation for all their internal
sensors. This causes them to transmit event messages based on the current event conditions, and
these events are logged in the SEL.
Note: The regeneration of events may cause events to be logged into the SEL twice. This could result in
configured eventaction scripts running twice.
During the process of identifying the chassis contents, once the CMM determines that an Entity is
a fan tray, it automatically sets the fan speeds to the critical level. The speeds are not brought back
to the normal level until the CMM has determined that there are no thermal events in the chassis.
The Chassis Management Module monitors the general health of processes running on the CMM
and can take recovery actions upon detection of failed processes. This is handled by the Process
Monitoring Service (PMS).
Upon detecting unhealthy processes, the PMS will take a configurable recovery action. Examples
of recovery actions include restarting the process, failing over to the standby CMM, etc.
The PMS itself is also monitored to ensure that it is operating correctly. The PMS is monitored in
both a single CMM configuration and a redundant CMM configuration. When faults are detected in
the PMS, corrective actions are taken.
The PMS also provides dynamic configuration and status information through the CLI, RPC, and
SNMP interfaces. For example, users can administratively lock/disable monitoring of a process
while the PMS is running to suit their particular needs. The PMS also provides static configuration
to allow customers the ability to tune the static system parameters for the given platform. Examples
of these parameters may include monitoring interval, retries, and ramp-up times.
6.1.1 Process Existence Monitoring
Process existence monitoring utilizes the operating system's process table to determine the
existence of the process. When the CMM software is started, the PMS initializes and determines
the set of processes to monitor for process existence. The PMS periodically queries the operating
system for the existence of that set of processes. When a monitored process is found not to exist,
the PMS will generate a SEL entry and take a recovery action.
Process existence monitoring can be utilized on all permanent processes (processes which exist for
the life of the CMM software as a whole). It is particularly useful when monitoring processes that
were not specifically developed for running on the CMM. Applications that are provided by the
operating system vendor are examples of these types of processes. For the Linux* operating
system, processes like syslogd and crond would be good examples.
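One simple way to query the operating system for process existence on Linux is signal 0, which performs existence and permission checks without delivering anything. Whether the PMS uses this mechanism or walks the process table directly is not specified, so treat this as an illustrative sketch:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if the given PID currently exists, 0 otherwise. */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;             /* process exists and is signalable */
    /* EPERM means the process exists but belongs to another user;
     * ESRCH means no such process. */
    return errno == EPERM;
}
```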
6.1.2 Thread Watchdog Monitoring
Thread watchdog monitoring requires that the monitored process notify the PMS of its continued
operation. This notification allows the PMS to monitor the process both for existence and for
conditions where the process locks up. Each thread requiring monitoring within a process using
the thread watchdog registers with the PMS. The PMS loops through its list of registered threads
and determines whether each registered thread is operating. When any thread is determined to be
unresponsive (i.e., not notifying the PMS of its continued operation), the PMS generates a SEL
entry and takes a recovery action.
Thread watchdog monitoring can be used on all processes that are instrumented with the PMS
thread watchdog API. It provides more functionality than process existence monitoring and can be
used in conjunction with process integrity monitoring to provide a comprehensive solution.
Thread watchdog monitoring is relatively lightweight and can be done every second, although the
process being monitored may dictate a (much) lower frequency depending on how often it is
capable of feeding the watchdog.
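The register/feed/sweep cycle described above can be sketched as follows. The data structure and function names are hypothetical, not the actual PMS thread watchdog API:

```c
#include <time.h>

#define MAX_WD_THREADS 16

/* Hypothetical registry: each monitored thread periodically "feeds"
 * its slot, and the PMS sweep flags any slot that has gone quiet. */
struct wd_slot {
    int in_use;
    time_t last_feed;
    time_t timeout;            /* seconds of silence tolerated */
};

static struct wd_slot wd[MAX_WD_THREADS];

/* Register a thread; returns a slot id, or -1 when the table is full. */
int wd_register(time_t timeout)
{
    for (int i = 0; i < MAX_WD_THREADS; i++) {
        if (!wd[i].in_use) {
            wd[i].in_use = 1;
            wd[i].timeout = timeout;
            wd[i].last_feed = time(NULL);
            return i;
        }
    }
    return -1;
}

/* Called by the monitored thread to signal continued operation. */
void wd_feed(int id)
{
    wd[id].last_feed = time(NULL);
}

/* PMS sweep: returns the id of the first unresponsive thread, or -1
 * when every registered thread has fed its watchdog in time. */
int wd_check(void)
{
    time_t now = time(NULL);

    for (int i = 0; i < MAX_WD_THREADS; i++)
        if (wd[i].in_use && now - wd[i].last_feed > wd[i].timeout)
            return i;
    return -1;
}
```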
6.1.3 Process Integrity Monitoring
A Process Integrity Executable (PIE) is responsible for determining the health of a process or
processes. When a PIE finds an unhealthy process, it notifies the PMS of the errant process so that
the PMS can take the appropriate action. An example of a PIE would be one that monitors the
Simple Network Management Protocol (SNMP) process: the PIE could use SNMP get operations
to query the SNMP process, and if the SNMP process cannot respond to the queries with the
appropriate information, the process is considered unhealthy and the PIE notifies the PMS.
Process integrity monitoring may be used in conjunction with existence monitoring to provide a
comprehensive solution.
6.2 Processes Monitored
Below is a list of processes that are monitored for Process Existence on the CMM by the Process
Monitoring Service.
Table 5. Processes Monitored
Process Monitored | Process Command Line / Process Name | Target Name | Monitoring Level
Pms Monitor | ./PmsMonitor | PmsProc3 | Existence and TWL
Pms Shadow | ./PmsMonitor shadow | PmsProc2 | Existence and TWL
CMM Wrapper Process | ./WrapperProcess 23 | PmsProc23 | Existence and Integrity
CMM Wrapper Process | ./WrapperProcess 255 | PmsProc50 | Existence and Integrity
SNMP Daemon | /usr/sbin/snmpd -c /etc/snmpd.conf | PmsProc51 | Existence and Integrity
CLI Server | ./cli_svr | PmsProc52 | Existence
CMM Command Handler | ./cmd_hand | PmsProc53 | Existence
CMM Blade Process Manager | ./BPM | PmsProc54 | Existence and Integrity
CMM Wrapper Process [0-39] | ./WrapperProcess[#] (0-39) | PmsProc[#] (60-99) | Integrity
Cron Daemon | /bin/crond | PmsProc100 | Existence
Inet Daemon | xinetd -stayalive -reuse | PmsProc101 | Existence
Syslog Daemon | /sbin/syslogd | PmsProc102 | Existence
6.3 Process Monitoring Targets
The following targets are provided for the Process Monitoring Service under the cmm location:
Use the following CLI command to view the targets for the processes being monitored.
cmmget -l cmm -d listtargets
The particular processes being monitored will be listed (e.g., PmsProc23, PmsProc100). To view
the name of the process being monitored, use the following example command:
cmmget -l cmm -t PmsProc34 -d ProcessName
Table 5, “Processes Monitored” contains the list of monitored processes along with their
command lines and target names. The ProcessName dataitem returns the Process Command Line.
6.4 Process Monitoring Dataitems
Process Monitoring and Integrity
The following dataitems are used to retrieve information on and configure the Process Monitoring
Service (used with PmsGlobal or PmsProc[#] targets on the cmm location).
• AdminState
• RecoveryAction
• EscalationAction
• ProcessName
• OpState
More information on the usage and descriptions of these dataitems can be found in Section 8, “The
Command Line Interface (CLI)” on page 71.
6.4.1 Examples
The following example will set the global PMS AdminState to locked:
cmmset -l cmm -t PmsGlobal -d AdminState -v 2
The following example will get the recovery action assigned to a monitored process:
cmmget -l cmm -t PmsProc34 -d RecoveryAction
The following example will get the admin state of a PIE:
cmmget -l cmm -t PmsPie176 -d AdminState
SNMP commands are implemented in the CMM MIB for Process Monitoring. The list of new
commands can be found in the CMM's MIB file or in Section 17, “SNMP” on page 140.
6.6 Process Monitoring CMM Events
The “Process Monitoring Service” sensor types are used to assert and de-assert process status
information such as process presence not detected, process recovery failure, or recovery action
taken. See Section 11.4, “List of Possible Health Event Strings” on page 108 for event strings,
codes, and severities for Process Monitoring.
Event severities are configurable by the user and are unique to the process being monitored.
The processes that are monitored and their default severities are listed below. Severities are
configured (while PMS is not running) by changing the ProcessSeverity field in the configuration
file (pms.ini). Values for severity: 1 = minor, 2 = major, 3 = critical.
Note: The recovery action and escalation action should not be set to "no action" for the xinetd process.
This process is involved in data synchronization between the CMMs.
Note: When a user tries to change the recovery action for cmd_hand or BPM to values other than allowed
via the CLI API, the error string displayed is:
"Recovery action not allowed for this target."
6.7 Failure Scenarios and Eventing
This section describes the process fault scenarios that are detected and handled by the PMS. It also
describes the eventing that is associated with the detection and recovery mechanisms. Each
scenario contains a brief textual description and a table that further describes the scenario.
In the table, the Description column outlines the current action. The Event Type String defines the
text for the event that is written to the SEL; the text in this field describes the event-specific
portion of the event string (the remainder of the event text is standard for all events). However, for
PMS the target name (sensor name) will be PmsProc<#> (where # is the unique identifier of the
given process) instead of the name of the sensor.
The UID indicates the unique identifier for the process causing the event. An ID of 1 indicates the
monitoring service itself (global) and an ID of # indicates an application process.
The Assert column indicates if the event is asserted or de-asserted. For items that are just written to
the SEL for informational purposes, the assertion state is not applicable. However, it is required by
the interface and therefore it will be set to de-assert.
The Severity column will define the severity of the event. A severity of Configure indicates that the
severity is configurable. The configurable severities are available in the Configuration Database.
The remaining columns (SNMP traps, health events, LEDs, and telecommunication alarms) define
what indicator will be triggered by the event.
6.7.1 No Action Recovery
In this scenario PMS detects a process fault. The PMS is configured to take no action and therefore
disables monitoring of the process.
Description | Event String
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used. | "Process existence fault; attempting recovery", "Thread watchdog fault; attempting recovery", or "Process integrity fault; attempting recovery"
The recovery action specified is "no action". | "Take no action specified for recovery"
No attempt is made to recover the process. The PMS stops monitoring the process. | "Process existence fault; monitoring disabled", "Thread watchdog fault; monitoring disabled", or "Process integrity fault; monitoring disabled"
See Section 6.7.11, “Process Administrative Action” on page 53, for information about how to re-enable monitoring and de-assert the event.
6.7.2 Successful Restart Recovery
In this scenario PMS detects a process fault. The configured recovery action is: restart the process.
The PMS is able to successfully recover the process by restarting it.
Table 7. Successful Restart Recovery
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The
mechanism (existence, thread
watchdog, or integrity) used to detect
the fault will determine which of the
event type strings will be used.
The recovery action specified is
"process restart".
6.7.3 Successful Failover/Restart Recovery
In this scenario, PMS detects a process fault. The configured recovery action is to fail over to the
standby CMM and then restart the failed process. The PMS successfully recovers the process by
restarting it.
Table 8. Successful Failover/Restart Recovery
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used. | "Process existence fault; attempting recovery", "Thread watchdog fault; attempting recovery", or "Process integrity fault; attempting recovery"
The recovery action specified is "failover and restart". | "Attempting process failover & restart recovery action"
PMS executes a failover. (This step is skipped when running on the standby CMM.) | (The existing code generates the events for failover; they are separate from process monitoring events and are not described here.)
PMS successfully restarts the process. (PMS executes this step even if the failover is unsuccessful: standby not available, unhealthy, etc.)
6.7.4 Successful Failover/Reboot Recovery
In this scenario, PMS detects a process fault. The configured recovery action is to fail over to the
standby CMM and, upon successfully executing the failover, reboot the now-standby CMM. The
recovery actions are successful.
Table 9. Successful Failover/Reboot Recovery
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used. | "Process existence fault; attempting recovery", "Thread watchdog fault; attempting recovery", or "Process integrity fault; attempting recovery"
The recovery action specified is "failover & reboot". | "Attempting failover & reboot recovery action"
PMS executes a failover. (This step is skipped when running on the standby CMM.) | (The existing code generates the events for failover; they are separate from process monitoring events and are not described here.)
PMS is running on the standby CMM (the failover was successful, or it was already running on the standby); PMS recovers the CMM by rebooting.
Upon initialization of PMS after the reboot, the monitor de-asserts the event.
6.7.5 Failed Failover/Reboot Recovery, Non-Critical
In this scenario, PMS is running on the active CMM and detects a monitored process fault. The
severity of the process is configured to a value that is not critical. The configured recovery action
is to fail over to the standby CMM and, upon successfully executing the failover, reboot the
now-standby CMM. The failover recovery action is unsuccessful (standby not available, etc.). The
process being monitored is not of critical severity, and therefore the reboot of the CMM is not
performed.
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used. | "Process existence fault; attempting recovery", "Thread watchdog fault; attempting recovery", or "Process integrity fault; attempting recovery" | # | Assert | Configure
The recovery action specified is "failover & reboot". | "Attempting failover & reboot recovery action" | # | N/A | Configure
PMS executes a failover. | (The existing code generates the events for failover; they are separate from process monitoring events and are not described here.) | - | N/A | N/A
PMS detects that it is still running on the active CMM. The process is not critical, and therefore the reboot operation is not performed. | "Failover & reboot recovery failure" | # | N/A | Configure
No attempt is made to recover the process. The PMS stops monitoring the process. | "Process existence fault; monitoring disabled", "Thread watchdog fault; monitoring disabled", or "Process integrity fault; monitoring disabled" | # | Assert | Configure
See Section 6.7.11, “Process Administrative Action” on page 53, for information about how to re-enable monitoring and de-assert the event.
6.7.6 Failed Failover/Reboot Recovery, Critical
In this scenario, PMS is running on the active CMM and detects a monitored process fault. The
severity of the process is configured to be critical. The configured recovery action is to fail over to
the standby CMM and, upon successfully executing the failover, reboot the now-standby CMM.
The failover recovery action is unsuccessful (standby not available, etc.). The process being
monitored is of critical severity, and therefore the reboot of the CMM is performed.
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used. | "Process existence fault; attempting recovery", "Thread watchdog fault; attempting recovery", or "Process integrity fault; attempting recovery" | # | Assert | Configure
The recovery action specified is "failover & reboot". | "Attempting failover & reboot recovery action" | # | N/A | Configure
PMS executes a failover. | (The existing code generates the events for failover; they are separate from process monitoring events and are not described here.) | - | N/A | N/A
PMS detects that it is still running on the active CMM. The process is critical, and therefore the reboot operation is performed.
Upon initialization of PMS after the reboot, the monitor de-asserts the event. | "Monitoring initialized" | # | De-assert | OK
6.7.7 Excessive Restarts, Escalate No Action
In this scenario PMS detects a process fault. The configured recovery action is: restart the process.
However, the PMS also detects that the process has exceeded the threshold for excessive process
restarts. Therefore, the PMS will execute the escalation action. The escalation action is configured
for no action.
Table 12. Existence Fault, Excessive Restarts, Escalate No Action (Sheet 1 of 2)
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The
mechanism (existence, thread
watchdog, or integrity) used to detect
the fault will determine which of the
event type strings will be used.
The recovery action specified is
"process restart"
In this scenario PMS detects a process fault. The configured recovery action is: restart the process.
However, the PMS also detects that the process has exceeded the threshold for excessive process
restarts. Therefore, the PMS will execute the escalation action. The configured escalation recovery
action is: failover to the standby CMM and upon successfully executing the failover, reboot the
now standby CMM. The escalated recovery action is successful.
1. PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used.
2. The recovery action specified is "restart process".
3. PMS detects that the process has been restarted excessively.
4. The escalated recovery action specified is "failover and reboot".
5. PMS executes a failover. (This step is skipped when running on the standby CMM.)
6. PMS is running on the standby CMM (the failover was successful, or it was already running on the standby); PMS recovers the CMM by rebooting.
7. Upon initialization of PMS after the reboot, the monitor de-asserts the event.
In this scenario, PMS detects a process fault. The severity of the process is configured to a value
that is not critical. The configured recovery action is to restart the process. However, the PMS also
detects that the process has exceeded the threshold for excessive process restarts; therefore, the
PMS executes the escalation action. The configured escalation recovery action is to fail over to the
standby CMM and, upon successfully executing the failover, reboot the now-standby CMM. The
failover recovery action is unsuccessful (standby not available, etc.). The process being monitored
is not of critical severity, and therefore the reboot of the CMM is not performed.
1. PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used.
2. The recovery action specified is "restart process".
3. PMS detects that the process has been restarted excessively.
4. The escalated recovery action specified is "failover and reboot".
5. PMS executes a failover.
6. PMS detects that it is still running on the active CMM. The process is not critical, and therefore the reboot operation is not performed.
7. No attempt is made to recover the process. The PMS stops monitoring the process.
See Section 6.7.11, “Process Administrative Action” on page 53, for information about how to re-enable monitoring and de-assert the event.
In this scenario, PMS detects a process fault. The severity of the process is configured as critical.
The configured recovery action is: restart the process. However, the PMS also detects that the
process has exceeded the threshold for excessive process restarts. Therefore, the PMS will execute
the escalation recovery action. The configured escalation recovery action is: failover to the standby
CMM and upon successfully executing the failover, reboot the now standby CMM. The failover
recovery action is unsuccessful (standby is not available, etc.). The process being monitored is of
critical severity and therefore the reboot of the CMM will still be executed even though the CMM
is still active.
PMS detects a faulty process. The
mechanism (existence, thread
watchdog, or integrity) used to detect
the fault will determine which of the
event type strings will be used.
The recovery action specified is
"restart process"
PMS detects that the process has
been restarted excessively.
The escalated recovery action
specified is "failover and reboot"
PMS executes a failover.
PMS detects that it is still running on
the active CMM. The process is
critical and therefore the reboot
operation is performed.
Upon initialization of PMS after the
reboot, the monitor will de-assert the
event.
The existing code generates the
events for failover. They are
separate from process monitoring
events and are not described
here.
6.7.11 Process Administrative Action
In this scenario, PMS has detected a fault in a process, but has not been able to recover the process
(recovery is configured for no action, etc.). This causes PMS to operationally disable monitoring of
the process. To re-enable monitoring of the process, an operator must administratively lock the
process, take the necessary actions to fix the process, and administratively unlock the process.
Table 16. Administrative Action (columns: Description, Event String, UID, Assert, Severity)
• Operator administratively locks monitoring of the process
• Operator takes actions to fix the problem
• Operator administratively unlocks monitoring of the process, causing monitoring to restart
Prior to executing any failover/reboot the PMS will determine if the failover/reboot threshold has
been exceeded. If it has, the PMS will be operationally disabled. When PMS is disabled, all process
monitoring is halted. To re-enable the PMS, the operator must lock the global administrative state.
The operator can then fix the problem and administratively unlock the global administrative state.
The following events are generated against the PMS Monitor (unique ID 1). The events for the
process or processes that caused this condition to occur will also be present, but are not described
in this table. They are defined in the scenarios provided above.
• Excessive reboots/failovers; all process monitoring disabled - unique ID 1, Assert, severity Major.
• Operator unlocks the global administrative state, causing monitoring to be resumed - "Monitoring
initialized" event (a), unique ID 1, De-assert, severity OK.
a. The "Monitoring initialized" event will be generated for the monitor (unique ID 1) as well as the
individual processes that are administratively unlocked.
6.8 Process Integrity Executable (PIE)
The Process Integrity Executable (PIE) for the Chassis Management Module’s (CMM) Blade
Proxy Manager (BPM) and Wrapper Processes is responsible for determining the health of the
Wrapper Processes. Monitoring the integrity means not only monitoring the fact that the process is
running but that it is functioning properly.
The PIE will monitor the BPM, CMM Wrapper Process (Wrapper Process number 255) and
Chassis Wrapper Processes (23). It will also monitor the Wrapper Processes for intelligent (have a
management controller) blades, power supplies, and fans. Wrapper Processes for non-intelligent
devices will not be monitored.
PIE will monitor the BPM and Wrapper Processes. The Wrapper Processes have two categories for
integrity monitoring. The first category contains the static processes. Static processes are processes
that are always present while the CMM software is running. The CMM (255) and chassis (23)
Wrapper Processes are the static processes. The second category contains all the dynamic Wrapper
Processes. Dynamic processes are ones that come and go as the configuration of the chassis
changes (such as a blade insertion or removal). The fan, power supply, and blade Wrapper
Processes belong to the dynamic category.
The pms.ini file is the Process Monitoring Service (PMS) and Process Integrity Executable (PIE)
configuration file. It contains all of the non-volatile configuration data for the service. This file can
be found in the /etc/cmm directory on the CMM. It is an ASCII based text file that can be edited
with vi or any other text editor.
Note: Any changes made to the pms.ini file will be overwritten during a firmware update. Care should be
taken to preserve the file, or any changes to it, before a firmware update is done so that the file and
changes can be restored following the update.
The dynamic data fields (except the AdminStates) in this file will be replicated to the standby
CMM via the CMM Data Synchronization Service. If invalid data is provided for a particular field
(i.e., out of range), the default value, if one exists, will be used. If a default value is not possible,
that entire section will not be used. For example, PmsProcess012 will be ignored if no value is
given for its CommandLine.
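The fallback behavior described above can be sketched with Python's standard configparser. The field names and ranges below mirror the pms.ini fields in this chapter, but the parser itself is illustrative only and is not the CMM's actual implementation:

```python
# Illustrative sketch (not CMM source): read a pms.ini-style section and
# substitute the documented default for any missing or out-of-range value.
import configparser

DEFAULTS = {"AdminState": 1, "ProcessExistenceInterval": 2}
RANGES = {"AdminState": (1, 2), "ProcessExistenceInterval": (0, 65535)}

def read_field(section, key):
    """Return the field value, or its default when missing or out of range."""
    try:
        value = int(section[key])
    except (KeyError, ValueError):
        return DEFAULTS[key]
    low, high = RANGES[key]
    return value if low <= value <= high else DEFAULTS[key]

config = configparser.ConfigParser()
config.read_string("""
[PmsProcess012]
AdminState = 9
ProcessExistenceInterval = 5
""")
section = config["PmsProcess012"]
print(read_field(section, "AdminState"))                # 9 is out of range -> default 1
print(read_field(section, "ProcessExistenceInterval"))  # in range -> 5
```
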
Database changes are classified in two categories: dynamic and static. Dynamic changes are
initiated by an interface (RPC, CLI, or SNMP). The change will take effect in the PMS and the data
in this file will be updated. Dynamic changes can be made while the PMS is running.
Process Monitoring and Integrity
Static changes are made directly to this file and must be done while the PmsMonitor is not running.
6.9.1 Global Data
This data applies to the PMS as a whole (not specific to a process). There must be one and only one
set of this data.
6.9.1.1 PMS Administrative State
The PMS administrative state determines if monitoring of all processes will be allowed.
Values: 1 - unlocked (enabled), 2 - locked. Default: 1. (dynamic)
AdminState = 1
6.9.1.2 PMS Excessive Reboot/Failover Count
The maximum number of reboots or failover attempts allowed (over the interval specified in the
field below).
Values: 2 - 255. Default: 3.
ExcessiveRebootOrFailoverCount = 3
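Together with the interval field described next, this count defines a sliding window. A minimal sketch of that logic, assuming PMS permits up to the configured count of reboots/failovers within the interval and operationally disables itself beyond that (class and method names are illustrative):

```python
# Illustrative sliding-window check: more than `count` reboots/failovers
# inside `interval` seconds is treated as excessive.
from collections import deque

class RebootFailoverGuard:
    def __init__(self, count=3, interval=900):
        self.count = count          # ExcessiveRebootOrFailoverCount
        self.interval = interval    # interval in seconds (default 900)
        self.events = deque()       # timestamps of recent reboots/failovers

    def allow(self, now):
        """Return False (monitoring disabled) once the threshold is exceeded."""
        while self.events and now - self.events[0] > self.interval:
            self.events.popleft()   # drop events that fell out of the window
        if len(self.events) >= self.count:
            return False            # threshold exceeded: PMS would disable itself
        self.events.append(now)
        return True

guard = RebootFailoverGuard(count=3, interval=900)
print([guard.allow(t) for t in (0, 10, 20, 30)])  # [True, True, True, False]
```
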
6.9.1.3 PMS Excessive Reboot/Failover Interval
The interval, in seconds, over which the maximum number of reboots/failovers will be measured.
Values: 1 - 65535. Default: 900.
6.9.2 Process Specific Data
This data applies to a specific process running on the CMM. There will be one set of this data for
each process.
The following information describes each of the fields in the process specific section.
6.9.2.1Process Section Name
The section name MUST follow the pattern "PmsProcessXXX" where XXX is a number from 010
to 175 inclusive. PmsProcess section names must be unique but are NOT significant in any other
way. Specifically, they are NOT required to match the UniqueID field for the section.
[PmsProcess151]
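The naming rule above can be captured in a short validator; this check is illustrative only and is not part of the CMM software:

```python
# Validate a "PmsProcessXXX" section name: exactly three digits, 010-175 inclusive.
import re

def valid_pms_process_section(name):
    match = re.fullmatch(r"PmsProcess(\d{3})", name)
    return match is not None and 10 <= int(match.group(1)) <= 175

print(valid_pms_process_section("PmsProcess151"))  # True
print(valid_pms_process_section("PmsProcess009"))  # False: below 010
print(valid_pms_process_section("PmsProcess176"))  # False: PIE range begins at 176
```
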
6.9.2.2 Unique ID
This is a unique identifier for the program and its arguments. It is essentially the short version of
the "Process Name and Arguments" field above.
This is a list of chassis types for which this particular Uid is valid. The list is comma delimited.
Spaces are ignored. If this key is not present, then the Uid is valid on all chassis.
This string contains the program name including its path and its associated command line
arguments. This field will be used to monitor a program and therefore must be an exact match to
how the program is represented in the OS. The program name and command line arguments are
space separated with the program name being the first entry in the string. If an individual argument
contains spaces, the argument must be encapsulated in quotation marks. The program name and
arguments will uniquely identify the entry. This means if the same program is started multiple
times with different arguments, each of them will require a separate entry.
Values: N/A. Default: None.
CommandLine = MyProcess -x -y
6.9.2.5 Start Program Name and Arguments
This is the program name and arguments used to start the program. This differs from the
monitoring program name and arguments because some programs are started via scripts. For
example many Linux system programs are started via startup scripts located in the "init.d"
directory.
Values: N/A. Default: None.
StartCommandLine = MyProcess -x -y
6.9.2.6 Administrative State
The process administrative state determines if the process will be monitored.
Values: 1 - unlocked (enabled), 2 - locked. Default: 1. (dynamic)
AdminState = 1
6.9.2.7 Process Existence Interval
This is the interval in seconds in which to verify that a process exists. A value of 0 disables
Existence Monitoring.
Values: 0 - 65535. Default: 2.
ProcessExistenceInterval = 2
6.9.2.8 Thread Watchdog Retries
This is the number of retries (number of thread watchdog intervals) to wait for notification from a
thread. Recovery takes place on retries+1 missed thread watchdog intervals.
Values: 0-10, default: 3.
ThreadWatchdogRetries = 3
6.9.2.9 Process Ramp-up Time
The amount of time in seconds necessary for the process to initialize and be functional.
Values: 0-255. Default: 60.
6.9.2.10 Process Severity
An indicator for the importance of a given process. This severity will determine at what level SEL
entries are generated and when reboots should occur on an active CMM.
6.9.2.11 Recovery Action
This is the recovery action to take upon detection of a failed process.
Values: 1 = no Action, 2 = process restart, 3 = failover and process restart, 4 = failover and reboot.
Default: 1. (dynamic)
RecoveryAction = 1
6.9.2.12 Process Restart Escalation Action
This determines the action to take if the RecoveryAction includes "process restart" and it fails.
Values: 1= no action, 2 = failover and reboot. Default: 1. (dynamic)
ProcessRestartEscalationAction = 1
6.9.2.13 Process Restart Escalation Number
This is the number of process restarts that are allowed (within the interval specified below) before
escalation starts.
Values: 1 - 255. Default: 5.
ProcessRestartEscalationNumber = 5
6.9.2.14 Process Restart Escalation Interval
This is the interval in seconds at which the number of restarts will be limited (see above).
Values: 1 - 65535. Default: 900.
ProcessRestartEscalationInterval = 900
6.9.3 Process Definition Section of pms.ini
The following sections describe and give examples of each of the process types that are defined in
the pms.ini file.
6.9.3.1 Shadow Process
The shadow process must exist to monitor the "Monitor Process". Therefore, this process should
never have a recovery action of "no action".
This process must exist to execute interface commands (CLI, SNMP, etc.) for the CMM. Therefore
this process should never have a recovery action of "no action".
Note: PmsProcess053 represents a crucial process, cmd_hand (command handler), of the CMM software
stack. This process cannot be restarted properly if it terminates unexpectedly. Hence, none of the
recovery actions that attempt to restart a process, i.e., 2 (Restart) or 3 (Failover & Restart), are
allowed as valid recovery actions for cmd_hand. The default recovery action for the cmd_hand
process is 4 (failover and reboot), and it cannot be changed to anything else. A recovery action of
1 (No Action) is also not allowed because of the severity of the process.
In the event that the cmd_hand process terminates unexpectedly, and the default recovery action
kicks in, there is a 2-3 minute delay before the CMM actually reboots. This is normal and expected
because PMS makes multiple tries to failover, and times out because cmd_hand does not respond.
[PmsProcess053]
UniqueID = 53
CommandLine = ./cmd_hand
StartCommandLine = ./cmd_hand
AdminState = 1
ProcessExistenceInterval = 2
ProcessRampUpTime = 10
ProcessSeverity = 3
RecoveryAction = 4
ProcessRestartEscalationAction = 2
ProcessRestartEscalationNumber = 5
ProcessRestartEscalationInterval = 300
6.9.3.8 BPM
Note: PmsProcess054 represents a crucial process of the CMM software stack. This process cannot be
restarted properly if it terminates unexpectedly. Hence, none of the recovery actions that attempt to
restart a process, i.e., 2 (Restart) or 3 (Failover & Restart), are allowed as valid recovery actions for
the BPM. The default recovery action for the BPM process is 4 (failover and reboot), which can
only be changed to 1 (No Action).
6.10 Process Integrity Executable (PIE) Specific Data Config
This data applies to each Process Integrity Executable (PIE). One PIE may monitor multiple CMM
processes or only one CMM process. There will be one set of this data for each PIE.
The following information describes each of the fields in the PIE specific section. Lines with a '*'
prefix indicate the actual fields (the prefix is not part of the field name).
6.10.1 PIE Section Name
The section name MUST follow the pattern "PmsPieXXX" where XXX is a number from 176 to
200 inclusive. PmsPie section names must be unique but are NOT significant in any other way.
Specifically, they are NOT required to match the UniqueID field for the section.
6.10.2 Process Integrity Executable
The name, including its path and command line arguments, of the PIE to be executed periodically.
This is used to start the program and may, in the future, be used to monitor the program and
therefore must be an exact match to how the program is represented in the OS. The program name
and command line arguments will all be space separated with the program name being the first
entry in the string. If an individual argument contains spaces, the argument must be encapsulated in
quotation marks. The program name and arguments will uniquely identify the entry. This means if
the same program is started multiple times with different arguments, each of them will require a
separate entry. Each PIE will likely have PIE specific options that can be specified through the
command line. These options must be included in the arguments to the "ProcessIntegrityExecutable"
command.
ProcessIntegrityExecutable = ./PmsPieSnmp
6.10.3 Unique ID
This is a unique identifier for the executable and its arguments. It is essentially the short version of
the "Process Integrity Executable" field above. It is used for logging and CSL access.
6.10.4 Administrative State
The PIE administrative state determines if the PIE will be restarted at the next interval.
Values: 1 - unlocked (enabled), 2 - locked. Default: 1. (dynamic)
6.10.5 Process Integrity Interval
This is the interval in seconds between executions of the PIE.
Values: 0 - 65535, where 0 indicates that the PIE only gets executed once.
Default: 3600.
ProcessIntegrityInterval = 3600
6.10.6 Chassis Applicability
This is a list of chassis types on which this particular Pie should be run. The list is comma
delimited. Spaces are ignored. If this key is not present, then the Pie will run on all chassis.
The command line usage of PmsPieSnmp is:
PmsPieSnmp [-f SuccessiveFailureNumber]
where:
-f : This is the number of allowed successive integrity failures before the PMS performs recovery
on the faulting process. PMS performs recovery on "this number + 1".
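The "this number + 1" behaviour can be sketched as a consecutive-failure counter, assuming (as is typical for such checks) that a single successful integrity check resets the count; the class and method names are illustrative:

```python
# Illustrative counter: recovery triggers on the (N + 1)th consecutive
# integrity failure, where N is the -f SuccessiveFailureNumber value.
class IntegrityTracker:
    def __init__(self, allowed_failures=3):   # the -f value
        self.allowed = allowed_failures
        self.consecutive = 0

    def report(self, passed):
        """Return True when PMS should perform recovery on the process."""
        if passed:
            self.consecutive = 0              # a success resets the count
            return False
        self.consecutive += 1
        return self.consecutive > self.allowed   # recover on N + 1 failures

t = IntegrityTracker(allowed_failures=2)
print([t.report(ok) for ok in (False, False, True, False, False, False)])
# [False, False, False, False, False, True]
```
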
The command line usage of PmsPieWp is:
PmsPieWp [-s] [-d[NumberOfDynamicWrappersPerRun]] [-f SuccessiveFailureNumber]
where:
-s: check static wrappers (optional)
-d: check dynamic wrappers and BPM threads (optional). The optional
NumberOfDynamicWrappersPerRun value is the number of dynamic wrappers and BPM threads to
check on each run. Values: 0 - 100. Default: 0 = process all dynamic wrappers and BPM threads
on each execution.
-f: Successive Failure Number - This is the number of allowed successive integrity failures before
the PMS performs recovery on the faulting process. PMS performs recovery on "this number + 1".
Values: 1 - 100. Default: 3.
Example:
PmsPieWp -s -f2      - check static wrappers
PmsPieWp -d0 -f2     - check all dynamic wrappers and all BPM threads
PmsPieWp -s -d0 -f2  - check static and all dynamic wrappers and all BPM threads
PmsPieWp -s -d10 -f2 - check static and 10 dynamic wrappers and BPM threads
The CMM is responsible for the management of FRU hot-swap activities. The CMM listens to
FRU hot-swap SEL messages from IPMI devices and distributes power to each FRU after
negotiating with the respective IPMI device fronting the FRU. The CMM also manages the shelf-wide power budget and polls IPMI devices to get the status of each FRU fronted by the
IPMI device. The CMM uses shelf FRU information to guarantee power-up sequence delays
between boards.
Once the CMM receives the shelf FRU information on power budget and power sequence delays, it
is ready to service FRU hot-swap requests from respective IPMI devices.
7.1 Hot Swap States
The CMM defines the hot swap status of a FRU as being in one of eight states. CMM
documentation often refers to only the letter/number designation of that state (M0 - M7). Here is a
list of what each of those states means:
• "State M0 - Not Installed"
• "State M1 - Inactive"
• "State M2 - Activation Request"
• "State M3 - Activation In Progress"
• "State M4 - Active"
• "State M5 - Deactivation Request"
• "State M6 - Deactivation In Progress"
• "State M7 - Communication Lost"
7.2 FRU Insertion
When the CMM receives a request that a FRU is ready to activate, it will compute the FRU’s
power, get the power levels, and check the available power budget.
The Set_Power_Level command will be sent only when the necessary power budget, from each of
the redundant power feeds, is available to satisfy the FRU's desired power level. If a FRU can't be
activated at the time of the request, it should remain in the M3 state and shall be powered up when
the necessary power budget becomes available. If the FRU decides to operate at a lower power
level and notifies the Shelf Manager, and the new power level is within the current Shelf Power
envelope, the CMM shall send the Set_Power_Level (new desired level) command to the FRU.
7.3 Graceful FRU Extraction
When the CMM receives a FRU Hot swap request for extraction, the CMM will send the deactivate
state command, and the FRU will transition to M6 state and begin its shut-down procedures. Once
the FRU has shut down, it transitions to M1 state, and the CMM then reclaims the FRU’s power
and adjusts the power budget for the newly available power.
7.4 Surprise FRU Extraction
The CMM detects a surprise FRU extraction or a failure of the IPMI device fronting the FRU if a
device previously in one of the M2-7 states reports a transition to the M2 state. If this scenario is
detected, the CMM assumes one of three things has happened:
• Surprise extraction and reinsertion of the same (or another) FRU.
• IPMI Device fronting the FRU failed, FRU was extracted, then the same (or another) FRU is
reinserted.
• Watchdog Timer (WDT) on the IPMI device restarted the IPMI Device firmware.
Once this occurs, the CMM shall reclaim all the resources allocated to that FRU. The CMM will
log a SEL message describing the situation, i.e. IPMI device failure or surprise extraction. From
this point the CMM shall follow the sequence of actions described in Section 7.2, “FRU Insertion”.
7.5 Forced Power State Changes
An external authorized entity (e.g., a management interface like RMCP) can request FRU power
state changes like Power OFF, RESET, etc. The CMM is responsible for handling these requests.
7.6 Power Management on the Standby CMM
The standby CMM does not participate in any power management activities while in standby
mode; it remains in a hot-standby state. The standby CMM starts performing power management
activities as soon as it becomes the active CMM.
7.7 Power Feed Targets
The CLI allows certain get and set actions to be taken on power feeds for a location. They include
the following dataitems: maxexternalavailablecurrent, maxinternalcurrent, and
minexpectedoperatingvoltage. These dataitems are described in Section 8, “The Command Line
Interface (CLI)” on page 71.
To find the number of feed targets, use the command:
cmmget -l cmm -d feedcount
This returns an integer, indicating the number of power feeds.
As an example, the MPCHC0001 chassis with four power feeds coming from the PEMs will return
the number 4, meaning there are four feed targets (feed1, feed2, feed3, and feed4). They correlate
to the physical feeds on the MPCHC0001 as follows:
The following lists the values of time to delay and number of pings that the CMM uses to
determine the state of a FRU.
Table 18. Time to Delay and Number of Attempts
Variable - Description
• DelayBetweenPingLoops - The number of microseconds to delay between each ping loop. This is
essentially the amount of time from the ping of the last IPMI Controller in the list to the ping of the
first controller in the list.
• DelayBetweenIPMControllerPings - The number of microseconds of delay between the ping on
one controller that is in the list and the ping of the next one on the list. This delay does not apply
after the last controller in the list.
• NumberFailedAttemptsBeforeAlert - How many failed attempts to contact the IPMI Controller
must occur prior to raising an event that communication has been lost.
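The polling behaviour these variables describe can be modelled as follows; the controller names, the canned response schedule, and the function itself are illustrative (a real implementation would sleep for the configured delays rather than iterate over precomputed results):

```python
# Illustrative model of the ping loop: iterate over the controller list each
# loop and raise a communication-lost alert only after N consecutive failures.
def run_ping_loops(controllers, responses, loops, failed_attempts_before_alert=3):
    """responses[name] -> list of per-loop ping results (True = reachable)."""
    failures = {name: 0 for name in controllers}
    alerts = []
    for loop in range(loops):
        for name in controllers:
            # (DelayBetweenIPMControllerPings would be applied between pings here)
            if responses[name][loop]:
                failures[name] = 0            # success resets the failure count
            else:
                failures[name] += 1
                if failures[name] == failed_attempts_before_alert:
                    alerts.append((name, loop))   # communication-lost event
        # (DelayBetweenPingLoops would be applied here, before the next pass)
    return alerts

alerts = run_ping_loops(
    ["blade1", "blade2"],
    {"blade1": [True, True, True, True],
     "blade2": [False, False, False, True]},
    loops=4)
print(alerts)  # [('blade2', 2)]
```
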
The Command Line Interface (CLI) connects to and communicates with the intelligent
management devices of the chassis, boards, and the CMM itself. The CLI is an IPMI-based library
of commands that can be accessed directly or through a higher-level management application.
Administrators can access the CLI through a Telnet session, SSH, or through the CMM's front
panel serial port. The CLI functions are also available through SNMP get/set commands and an
RPC interface. Using the CLI, users can access information about the current state of the system
including current sensor values, threshold settings, recent events, and overall chassis health.
Note: The CLI uses the term “blade” when referring to boards.
8.2 Connecting to the CLI
The CMM provides three connections on its front panel.
• Two Ethernet connections via an RJ-45 connector
• An RS-232 serial port interface also via an RJ-45 connector
These same ports are also available on the rear transition module.
Any of these interfaces can be used to log into the CMM as well as the Ethernet interface provided
through the backplane of a chassis. Use Telnet to log into the CMM over an Ethernet connection, or
use a terminal application or serial console over the RS-232 interface. See the Intel® NetStructure™ MPCMM0001 Hardware Technical Product Specification for electrical pinouts of
the above interfaces.
If logging in for the first time to set up or obtain the CMM's IP addresses, use the serial port console
interface to perform configuration.
8.2.1 Connecting through a Serial Port Console
Connect an RS-232 serial cable with an RJ-45 connector to the serial console port on the front of
the CMM. Set your terminal application settings as follows:
Logging in for the first time must be done through the serial port console to properly configure the
Ethernet settings and IP addresses for the network.
The username for the CMM is root. The default password is cmmrootpass.
At the login prompt, enter the username: root
When prompted for the password, enter: cmmrootpass
The root password can be changed using the passwd command. For information on resetting the
CMM password back to default, refer to Section 9, “Resetting the Password” on page 99.
8.3.1 Setting IP Address Properties
Note: Changing any of the IP address settings and restarting the network could result in a failover
occurring based on the rules governing redundancy specified in Section 3, “Redundancy,
Synchronization, and Failover” on page 21.
By default, the CMM assigns IP addresses statically:
• eth0, labeled “Ethernet A” on the front panel, is configured with the static IP address
10.90.90.91
• eth1, labeled “Ethernet B” on the front panel, is configured with a static IP address of
192.168.100.92
• eth1:1, an alias of eth1 that always points to and is active on the active CMM, is configured
with a static IP address of 192.168.100.93
On initial power-up of a chassis with two CMMs, both CMMs will have the same IP addresses
assigned by default. When the chassis is powered up, the standby CMM automatically decrements
its IP address to one less than the active CMM's if it detects a conflict.
Example:
1. A dual CMM Chassis is powered up.
2. Active CMM assigns IP address of 192.168.100.92 to eth1 on the active CMM.
3. Standby CMM assigns IP address of 192.168.100.91 to eth1 on the standby CMM.
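The standby's conflict-avoidance step can be checked with Python's standard ipaddress module; the addresses are the defaults from the example above:

```python
# Decrementing the conflicting eth1 address by one, as the standby CMM does.
import ipaddress

active_eth1 = ipaddress.IPv4Address("192.168.100.92")
standby_eth1 = active_eth1 - 1   # address the standby falls back to
print(standby_eth1)  # 192.168.100.91
```
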
At this point, the static IP addresses must be changed to values appropriate for the network
configuration, ensuring that the two CMMs do not contain duplicate IP addresses on eth0 and
eth1, to avoid address conflicts on the network.
eth0 and eth1 can also be set using DHCP. eth1:1 will always remain static. When setting both eth0
and eth1 to DHCP, use the /etc/pump.conf file to determine which interface should own the default
gateway. The default is for eth0 to own the default gateway. To configure eth1, and thereby eth1:1,
to own the default gateway, uncomment the two lines under the eth0 section of /etc/pump.conf
and comment the two lines under the eth1 section of that file. Save the file and run the
/etc/rc.d/network reload script.
Note: It is recommended that both CMMs use static IP addresses for all interfaces. DHCP addresses may
be unexpectedly lost or changed in some network configurations.
Note: eth0 should always be set to a different subnet than eth1/eth1:1. Failure to set eth0 to a different
subnet than eth1 will cause network errors on the CMM and redundancy will be lost.
8.3.1.2 Setting Static IP Information for eth1
1. Open the /etc/ifcfg-eth1 file using the vi editor. By default, the file contains three variables.
2. Set the STATICIP1 variable to the IP address you want to assign to eth1.
3. Set the STATICIP2 variable to the IP address you want to assign to the active CMM on the
network. This value should ONLY be set on the active CMM, as it will be synchronized to and
overwritten on the standby CMM.
4. Set the SETIP variable to assign IP addresses to eth1 and eth1:1 based on the following table:
5. Add the NETMASK1 variable and set it to the appropriate netmask for STATICIP1 for your
network.
6. Add the NETMASK2 variable and set it to the appropriate netmask for STATICIP2 for your
network. The NETMASK2 variable needs to be correct to allow for true redundant operation.
7. Add the GATEWAY1 variable and set it to the appropriate value for the gateway for
STATICIP1.
8. Add the GATEWAY2 variable and set it to the appropriate value for the gateway for
STATICIP2.
9. To activate the changes, at the user prompt (from the root “/” directory), type:
/etc/rc.d/network reload
Note: The eth1:1 address should only be changed on the active CMM. The new address will be
synchronized to the standby CMM automatically when the /etc/rc.d/network reload command is
executed. Also, the eth1:1 address should be changed with the procedure above and NOT by using
the ifconfig command manually; using ifconfig will cause the eth1:1 information to not be
synchronized to the standby.
8.3.1.3 Setting eth0 to DHCP
1. Using the vi editor, change the BOOTPROTO variable in the /etc/ifcfg-eth0 file to dhcp.
Note: Linux is case sensitive, so ensure that the BOOTPROTO value is entered in lower case letters in
the step above.
2. To activate the changes, the user can reboot the CMM, or at the user prompt (from the root “/”
directory) on the active CMM, type:
/etc/rc.d/network reload
Note: A DHCP server must be present on the network for the CMM to get a valid IP address. The
network reload command will refresh the IP addresses on both network interfaces.
8.3.1.4 Setting eth1 to DHCP
1. Using the vi editor, change the BOOTPROTO variable in the /etc/ifcfg-eth1 file to dhcp.
2. eth1:1 will still use a static IP address in this configuration. Set the STATICIP2 variable to the
IP address you want to assign to the active CMM on the network. This value should ONLY be
set on the active CMM, as it will be synchronized to and overwritten on the standby CMM.
3. Add the NETMASK1 variable and set it to the appropriate netmask for STATICIP1 for your
network.
4. Add the NETMASK2 variable and set it to the appropriate netmask for STATICIP2 for your
network. The NETMASK2 variable needs to be correct to allow for true redundant operation.
5. Add the GATEWAY1 variable and set it to the appropriate value for the gateway for
STATICIP1.
6. Add the GATEWAY2 variable and set it to the appropriate value for the gateway for
STATICIP2.
7. Set the SETIP variable to assign IP addresses to eth1 and eth1:1 based on the following table:
Table 20. SETIP Interface Assignments when BOOTPROTO=”dhcp”
8. To activate the changes, at the user prompt (from the root “/” directory), type:
/etc/rc.d/network reload
8.3.2 Setting a Hostname
The hostname of the CMM is a logical name that is used to identify a particular CMM. This name
is shown at login time just to the left of the login prompt on the serial port interface when
configured (i.e., “MYHOST login:”) and advertised to any DNS servers on a network. If there is no
entry in /etc/HOSTNAME, the login prompt will not have anything next to it. By default, the
hostname is set to the product name (i.e. MPCMM0001).
The hostname should be configured on each CMM. To change the hostname:
1. Using the vi editor, change the HOSTNAME variable in /etc/HOSTNAME to the desired
name.
2. To activate the changes, at the user prompt (from the root “/” directory), type:
/etc/rc.d/network reload
Note: Executing network reload also causes the network interfaces to reload their IP addresses. If DHCP
is being used on a network interface, then it is possible that the IP address on that interface will
change.
8.3.3 Setting the Amount of Time for Auto-Logout
For security purposes, the CMM automatically logs the user out of the current console session after
15 minutes (900 seconds). This auto-logout time can be changed by editing /etc/profile and
changing the TMOUT value to the desired setting. The time-out (TMOUT) value is set in seconds
(900 seconds is the default). A setting of TMOUT=0 will disable the automatic logout. This can
also be set at the command line.
8.3.4 Setting the Date and Time
On the active CMM, use the date command in the CLI to view the current date and time for the
CMM. To set the date and time on the CMM use the setdate command. The setdate command
should use the following syntax:
setdate “mm/dd/yyyy [timezone] hh:mm:ss”
The date is stored on the CMM in Coordinated Universal Time (UTC). The local timezone can be
included in the setdate string, and the CMM will determine the offset and automatically change the
date to UTC. An example that will set the date and time to “Thu Mar 11 20:12:00 UTC 2004” is:
setdate “3/11/2004 PST 12:12:00”
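The PST-to-UTC conversion in this example can be verified with Python's datetime, assuming PST = UTC-8 (the CMM performs the equivalent offset calculation internally):

```python
# Convert the local setdate input (PST, assumed UTC-8) to the stored UTC time.
from datetime import datetime, timedelta, timezone

pst = timezone(timedelta(hours=-8))
local = datetime(2004, 3, 11, 12, 12, 0, tzinfo=pst)
utc = local.astimezone(timezone.utc)
print(utc.strftime("%a %b %d %H:%M:%S UTC %Y"))  # Thu Mar 11 20:12:00 UTC 2004
```
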
The date and time are synchronized to the standby CMM when changed and then every hour.
8.3.5 Telnet into the CMM
To telnet into the CMM, point your console or telnet application to the IP address of the eth0, eth1,
or eth1:1 interface on the CMM you wish to telnet to. If you wish to telnet to the active CMM, you
can point the telnet application to the eth1:1 IP address. The “pointing” is accomplished using the
Telnet open command. To get the IP address see Section 8.3.1, “Setting IP Address Properties” on
page 72.
8.3.6 Connect Through SSH (Secure Shell)
For a more secure connection, users can connect to the CMM using SSH, or Secure Shell. SSH is a
secure interface and protocol used for encrypted connections between a client and a server. Using
an SSH client, open the IP address of the eth0, eth1, or eth1:1 interface on the CMM you wish to
establish an SSH session with. SSH clients can be found freely available on the Internet.
8.3.7 FTP into the CMM
For security purposes, the CMM will prevent users from accessing the CMM through FTP by
default. Before FTPing into the CMM, ensure the "root" entry is removed from the /etc/ftpusers
file using a text editor such as vi. If this entry is not removed, you will be unable to log in via FTP.
Using an FTP client, FTP to the IP address of the CMM you wish to transfer files to or from and
use the CLI login and password.
8.3.8 Rebooting the CMM
To reboot the CMM, type the reboot command in the CLI on the CMM that is to be rebooted. If the
reboot command is issued on the active CMM in a redundant configuration, a failover to the
standby CMM will occur. If the reboot command is issued on a CMM in a single CMM
configuration, chassis management will be lost during the reboot process. Telnet and SSH sessions
will have to be reestablished with the CMM after it is rebooted.
Note: Do not use the “init 0” or “init 6” command to reboot the CMM as problems may result.
The command line interface on the CMM supports two commands: cmmget and cmmset.
cmmget is used to query information, whereas cmmset is used to write information.
Man pages are available on the CMM for both commands. To access the man page for
cmmget, use the command man cmmget. To access the man page for cmmset, use the command
man cmmset.
8.4.1 Cmmget and Cmmset Syntax
The syntax for calling the CLI from the command line is as follows:
cmmget [-h] [-l location] [-t target] -d dataitem
cmmset [-h] [-l location] [-t target] -d dataitem -v value
where cmmget and cmmset are the CLI executables. The parameters can appear in any order. The
CLI is case-insensitive, except for the executable names. Parameters shown in brackets are optional.
Any attribute value that contains a space must be enclosed in quotes. This commonly occurs when
specifying targets. For example, to get the current value of a sensor called Brd Temp on the CMM,
the command would be:
cmmget –l cmm –t “Brd Temp” –d current
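Because the shell, not the CLI, handles the quoting, a quick way to confirm that a space-containing target survives as a single argument is to build the argument list first and count its words. Nothing here invokes the real cmmget; the target name is the one from the example above:

```shell
# Sketch: stage the cmmget invocation as positional parameters; the
# quotes keep "Brd Temp" as one argument rather than two.
target="Brd Temp"
set -- cmmget -l cmm -t "$target" -d current
echo "$#"   # 7 words: the quoted target stayed intact
```

Without the quotes around `$target`, the count would be 8 and the CLI would see a target of just "Brd".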
8.4.2 Help Parameter: -h
If the Help parameter is given, the remaining parameters are ignored and the help text is displayed
to the user.
8.4.3 Location Parameter: -l
The Location parameter specifies the location in the system on which the cmmget or cmmset is
executed. If no location is given, the default location is the CMM.
Use the following cmmget command to list all valid locations in the chassis:
cmmget -d listlocations
The Location keywords are shown in the following table.
Table 21. Location (-l) Keywords

Keyword   Function
cmm       The Chassis Management Module.
bladeN    One of the CPU boards in the chassis. N refers to the chassis
          slot number into which the CPU board is inserted. Please refer
          to the chassis documentation for slot information.
system    The entire platform.
fantrayN  The system fantray, where N is the number of the fantray. For
          example, fantray1 refers to the single fantray in the MPCHC0001
          shelf.
          NOTE: fantray1 may also be referred to as blade15 in a 14-slot
          chassis or blade17 in a 16-slot chassis.
PEM1,     The system Power Entry Modules. PEM1 is in the left slot when
PEM2      looking from the front of the chassis, and PEM2 is in the right
          slot.
          NOTE: PEM1 may also be referred to as blade16 and PEM2 as
          blade17 in a 14-slot chassis; correspondingly, they may be
          referred to as blade18 and blade19 in a 16-slot chassis.
8.4.4 Target Parameter: -t
The Target parameter is the sensor or variable that the cmmget or cmmset acts on. If target is
not given, then dataitem is assumed to be an attribute of location. An example of this is
presence. To obtain a list of valid targets for a device, issue the following command:
cmmget [-l location] -d listtargets
where location is the device for which you want to obtain a list of targets.
The target parameter for plug-in boards and other chassis components is defined by the sensor
name in the Sensor Data Record (SDR) for that device. The various boards, fantrays, and PEMs
provide their own SDRs automatically.
The following table shows the values target can be for the CMM location.
Table 22. CMM Targets

Keyword             Description
Brd Temp            Board Temperature
CPU Temp            CPU Temperature
FilterTrayTemp[1,2] Filter Tray Temperature Sensors
CPU Core V          CPU Core Voltage
VBAT                Battery Voltage
VTT DDR             CMM Memory Voltage
+2.5V               +2.5V voltage sensor
+3.3V               +3.3V voltage sensor
+5V                 +5V voltage sensor
+12V                +12V voltage sensor
CDM [1,2]           Chassis Data Modules 1 and 2
Air Filter          Air Filter
Filter Tray         Filter Tray FRU
Filter Run Time     Filter Run Time
BIST                Built-In Self Test sensor
FRU Hot Swap        FRU Hot Swap sensor
Filter Tray HS      Filter Tray Hot Swap sensor
IPMB-0 Snsr [1-16]  IPMB-0 sensors
FRU                 FRU file for the CMM
all_leds            Target for configuring all user-definable LEDs on the
                    CMM front panel
hsled               Hot swap LED on the CMM front panel
userled[1-4]        Corresponds to userled A-D on the CMM front panel
feedN               Corresponds to a power feed (i.e., feed1, feed2). Use
                    the feedcount dataitem to determine the number of
                    power feeds for the component.
PmsGlobal           Target for PMS global data
PmsProcN            Target for each monitored process, where N is the
                    process number
Datasync Status     Datasync Status sensor
CMM Status          CMM Status sensor
PmsPieN             Process monitoring process integrity sensors
None                Same as not entering a target
The dataitem is the parameter, identified by target and/or location, that the user is getting or setting.
A dataitem must be given for every CLI command.
8.4.5.1 Location Dataitem Lists
Table 23 through Table 29 list the valid dataitems for each location when no target is specified.
Table 23. Dataitem Keywords for All Locations

listdataitems
    Description:      Used to find out what dataitems are available on a
                      target or location.
    Get/Set:          Get
    CLI Get Output:   Listing of all valid dataitems that can be issued
                      for the specified location or target
    Valid Set Values: N/A

health
    Description:      Retrieves the health information about a particular
                      location or target.
    Get/Set:          Get
    CLI Get Output:   "Location/Target has no/minor/major/critical
                      problems"
    Valid Set Values: N/A

Table 24. Dataitem Keywords for All Locations Except System
listgetdataitems
    Description:      Lists all available dataitems that can be retrieved
                      with cmmget.
    Get/Set:          Get
    CLI Get Output:   Listing of all valid get dataitems that can be
                      issued for the specified location or target
    Valid Set Values: N/A

listsetdataitems
    Description:      Lists all available dataitems that can be set with
                      cmmset.
    Get/Set:          Get
    CLI Get Output:   Listing of all valid set dataitems that can be
                      issued for the specified location or target
    Valid Set Values: N/A

healthevents
    Description:      Retrieves events that contribute to the health of
                      the location or target. This is a list of events
                      currently active on the location or target. Health
                      event strings are documented in Section 11, "Health
                      Events" on page 104.
    Get/Set:          Get
    CLI Get Output:   List of currently active events. E.g.
                      "Major Event : +12V_B Lower critical going low
                      asserted
                      Major Event : +12V_A Lower critical going low
                      asserted"
    Valid Set Values: N/A

listtargets
    Description:      Used to find what sensors or targets are available
                      on the location. This is the list of sensors defined
                      by the SDR for that particular location.
    Get/Set:          Get
    CLI Get Output:   Listing of all the targets that are available on
                      the location
    Valid Set Values: N/A
Table 25. Dataitem Keywords for All Locations Except Chassis and System (Sheet 1 of 4)

deviceid
    Description:      Retrieves the device's SDR support, hardware
                      revision, firmware/software revision, and sensor and
                      event interface command specification revision
                      information. Implements the Get Device ID command.
                      See IPMI 1.5 Specification Section 17.1.
    Get/Set:          Get
    CLI Get Output:   "... major.minor>
                      IPMI Version = <IPMI version>
                      Chassis Support = <Additional chassis device support>
                      Bridge Support = <Additional bridge support>
                      IPMB Event Generator Support = <Additional IPMB
                      Event Generator support>
                      IPMB Event Receiver Support = <Additional IPMB
                      Event Receiver support>
                      FRU Inventory Support = <Additional FRU inventory
                      device support>
                      SEL Support = <Additional SEL device support>
                      SDR Repository Support = <Additional SDR Repository
                      device support>
                      Sensor Support = <Additional sensor device support>
                      Manufacturer ID = <Manufacturer ID>
                      Product ID = <Product ID>
                      Aux Firmware Revision = <Auxiliary firmware revision
                      information>"
    Valid Set Values: N/A

fruactivation
    Description:      Set the activation state to either activate or
                      deactivate the FRU. Deactivate is the same as a
                      Graceful Shutdown.
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: 1 = activate FRU
                      0 = deactivate FRU

fruactivationpolicy
    Description:      Get or Set the FRU activation policy. A Get returns
                      whether the "Locked Bit" is set. For example, if the
                      blade 11 activation locked bit is set, and if in M1,
                      then blade 11 cannot transition to M2 until unlocked.
                      If the blade 11 activation locked bit is not set,
                      then blade 11 can transition from M1 to M2.
    Get/Set:          Both
    CLI Get Output:   "<location> activation locked bit is set. If in M1,
                      <location> cannot transition to M2 until unlocked"
                      OR
                      "<location> activation locked bit is not set.
                      <location> can transition from M1 to M2"
    Valid Set Values: 1 = set locked bit
                      0 = clear locked bit
Table 25. Dataitem Keywords for All Locations Except Chassis and System (Sheet 2 of 4)

frucontrol
    Description:      Set the FRU payload to do things like Cold Reset,
                      Warm Reset, etc. The CMM location only supports 2
                      (graceful reboot) and will only work on the standby
                      CMM. Using frucontrol on an active or single CMM
                      will attempt a failover before executing the
                      command. If failover is unsuccessful, frucontrol
                      will not execute and will return an error.
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: 2 = graceful reboot (the only value supported for
                      the CMM location)

hotswapstate
    Description:      Retrieves the FRU's current M state (0-7).
    Get/Set:          Get
    CLI Get Output:   "<location> Hot Swap state is M[x]"
    Valid Set Values: N/A

fruextractionnotify
    Description:      Used to notify the Shelf Manager that a FRU has been
                      extracted from the shelf. Example: "cmmset -l
                      <location> -d fruextractionnotify -v 1"
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: 1 = Extract FRU

ledproperties
    Description:      Find out the number and type of LEDs the FRU
                      supports and which LEDs it can control. Implements
                      the Get FRU LED Properties command. See PICMG 3.0
                      Section 3.2.5.6.
    Get/Set:          Get
    CLI Get Output:   Information pertaining to the number and control of
                      the LEDs:
                      "<location> has control of <main_leds>
                      <location> supports <number_user_leds> user leds"
                      where <location> is the -l parameter (can be a
                      sub-FRU), <main_leds> is a comma-separated list of
                      <led> items, <led> is hsled, led1, led2, led3, and
                      <number_user_leds> is the decimal number of user
                      LEDs supported by the FRU
    Valid Set Values: N/A

picmgproperties
    Description:      Query the maximum FRU Device ID supported by the
                      IPMI controller. Implements the Get PICMG Properties
                      command. See PICMG 3.0 Table 3-9.
    Get/Set:          Get
    CLI Get Output:   "PICMG Properties: <interpreted string without
                      label>
                      PICMG Properties ID = <PICMG ID>
                      PICMG Extension Version = <PICMG extension
                      version=major.minor>
                      Max FRU Device ID = <Max FRU device ID>
                      FRU Device ID = <FRU device ID for IPMI controller>"
    Valid Set Values: N/A

powerlevels
    Description:      Returns the power levels available for a FRU and
                      the number of watts drawn by each.
    Get/Set:          Get
    CLI Get Output:   "ATCA FRU Power Levels:
                      Power Level 1 = A watts
                      ...
                      Power Level n = B watts"
    Valid Set Values: N/A
Table 25. Dataitem Keywords for All Locations Except Chassis and System (Sheet 4 of 4)

busedekeys
    Description:      Get a list of Bused EKeys and who owns them.
    Get/Set:          Get
    CLI Get Output:   "Metallic Test Bus Pair #1:
                      Token Owned: Yes/No
                      Owner's IPMBAddress: IPMBAddress
                      Metallic Test Bus Pair #2:
                      Token Owned: Yes/No
                      Owner's IPMBAddress: IPMBAddress
                      Sync Clock Group #1:
                      Token Owned: Yes/No
                      Owner's IPMBAddress: IPMBAddress
                      Sync Clock Group #2:
                      Token Owned: Yes/No
                      Owner's IPMBAddress: IPMBAddress
                      Sync Clock Group #3:
                      Token Owned: Yes/No
                      Owner's IPMBAddress: IPMBAddress"
                      where "Owner's IPMBAddress" is displayed only when
                      "Token Owned" is "Yes"
    Valid Set Values: N/A

totalfrus
    Description:      Used to query the total number of FRUs in a
                      particular location. Once the number of FRUs for the
                      location is known, the FRU can be specified with the
                      format "-l location:fru#". Not specifying the
                      ":fru#" part directs the command to FRU ID 0.
    Get/Set:          Get
    CLI Get Output:   integer number
    Valid Set Values: N/A

frudeactivationpolicy
    Description:      Get/Set the deactivation policy of the FRU. In
                      PICMG 3.0 ECN 1 this refers to the deactivation
                      locked bit. The FRU can be specified with the format
                      "-l location:fru#". Not specifying the ":fru#"
                      directs the command to FRU 0.
    Get/Set:          Both
    CLI Get Output:   1 - Locked bit is set
                      0 - Locked bit is not set
    Valid Set Values: 1 = set locked bit
                      0 = clear locked bit

ipmicommand
    Get/Set:          Set
    CLI Get Output:   Command Response string on success, or error code
                      on failure.

rawsel
    Description:      Used to list the SEL in raw format.
    Get/Set:          Get
    CLI Get Output:   Listing of the raw-format SEL log of the location.
                      The listing is of the format: <Entry1>\n\n<Entry2>…
                      where <EntryM> is of the format: <Timestamp in Linux
                      date format>\n\t<SensorName>\t<EventDescription>
    Valid Set Values: N/A
Table 26. Dataitem Keywords for the Chassis Location

fanspeed
    Description:      Used to get or set the fan speed of all fans in the
                      chassis. The value is a percent of the maximum fan
                      speed. See Section 16, "Fan Control and Monitoring"
                      on page 132 for more information.
    Get/Set:          Both
    CLI Get Output:   The percentage of the max speed, "Emergency Shut
                      Down", or "Local Control". For example, 80 for 80%
                      of the max speed.
    Valid Set Values: A numerical value between 0 and 100 (i.e., "70"),
                      "localcontrol", or "emergencyshutdown".
                      (localcontrol is not supported on the MPCHC0001
                      chassis fan tray.)

location
    Description:      Used to get or set the Location field in the chassis
                      FRU, which is sent out as a part of SNMP and UDP
                      alerts. Only used with the chassis location.
    Get/Set:          Both
    CLI Get Output:   "Shelf Address: <address>"
                      where <address> is a space-separated list of
                      two-digit hex numbers if the address' type/len byte
                      is 0, or a decoded string otherwise
    Valid Set Values: Location string less than 16 characters in length.
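The fanspeed set values above can be sanity-checked client-side before issuing a cmmset. This POSIX-shell helper is hypothetical (not part of the CMM software) and accepts 0-100 or the two keywords from the table:

```shell
# Hypothetical helper: validate a value for "cmmset ... -d fanspeed -v".
valid_fanspeed() {
  case "$1" in
    localcontrol|emergencyshutdown) return 0 ;;   # keyword values
    ''|*[!0-9]*) return 1 ;;                      # empty or non-numeric
    *) [ "$1" -ge 0 ] && [ "$1" -le 100 ] ;;      # percentage range
  esac
}
valid_fanspeed 80 && echo accepted
valid_fanspeed 120 || echo rejected
```

Rejecting bad input locally avoids round-tripping an invalid value to the CLI.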
Table 27. Dataitem Keywords for Cmm Location (Sheet 2 of 7)

alarmcutoff
    Description:      Retrieve or set the state of the Telco Alarm cutoff.
                      When enabled, it silences the Telco alarm for active
                      events and blinks the event LEDs on the CMM. This
                      dataitem is only valid with the cmm location and is
                      used to set the alarm cutoff or get its value.
    Get/Set:          Both
    CLI Get Output:   "Telco Alarm Cutoff is <enabled/disabled>."

alarmtimeout
    Description:      Retrieve or set the timeout value in minutes for the
                      Telco Alarm cutoff. This is the amount of time
                      before the alarm cutoff automatically becomes unset
                      if the user does not unset it. This dataitem is only
                      valid with the cmm location and is used to set the
                      alarm timeout or get its value.
    Get/Set:          Both
    CLI Get Output:   "Timeout is <timeoutvalue> minutes."

criticalled, majorled, minorled
    Description:      Used only with the CMM location to turn the
                      critical, major, and minor LEDs on or off. When used
                      with cmmset, a -v value of 1 turns the LED on, while
                      a 0 turns it off.
    Get/Set:          Both
    CLI Get Output:   "1" if the LED is On
                      "0" if the LED is Off
    Valid Set Values: 1 = LED on
                      0 = LED off

Ethernet
    Description:      Included for backward compatibility only. The
                      mapping of the command to the existing dataitem is:
                      Ethernet = EthernetA
    Get/Set:          Both
    CLI Get Output:   "front" or "rear" or "backplane". For example,
                      bash-2.04# cmmget -d ethernet
                      cmm1ethernetA: front
                      cmm2ethernetA: front

EthernetA, EthernetB
    Description:      Used only with the CMM location to change the
                      eth0/eth1 direction to either the front panel, the
                      rear panel IO card, or the backplane. The mapping of
                      the commands to existing dataitems is:
                      EthernetA = cmm1EthernetA + cmm2EthernetA
                      EthernetB = cmm1EthernetB + cmm2EthernetB
    Get/Set:          Both
    CLI Get Output:   "Front" or "Rear" or "Backplane". For example,
                      bash-2.04# cmmget -d ethernetA
                      cmm1ethernetA: front
                      cmm2ethernetA: front
                      bash-2.04# cmmget -d ethernetB
                      cmm1ethernetB: front
                      cmm2ethernetB: front

Table 27. Dataitem Keywords for Cmm Location (Sheet 3 of 7)

cmm1ethernetA, cmm1ethernetB, cmm2ethernetA, cmm2ethernetB
    Description:      Used only with the CMM location to change the
                      eth0/eth1 direction to either the front panel, the
                      rear panel IO card, or the backplane on CMM1 and/or
                      CMM2.
    Get/Set:          Both
    CLI Get Output:   "Front" or "Rear" or "Backplane". For example,
                      bash-2.04# cmmget -d cmm1ethernetA
                      cmm1ethernetA: front
                      bash-2.04# cmmget -d cmm1ethernetB
                      cmm1ethernetB: front
                      bash-2.04# cmmget -d cmm2ethernetA
                      cmm2ethernetA: front
                      bash-2.04# cmmget -d cmm2ethernetB
                      cmm2ethernetB: front
    Valid Set Values: "Front", "Rear", or "Backplane"

version
    Description:      The version of the CMM software.
    Get/Set:          Get
    CLI Get Output:   "Version: [Generation].[SRA].[Patch].[Build]"
                      where Generation is the firmware generation, SRA is
                      the release in that generation, Patch is the patch
                      number, and Build is the build number. E.g.
                      Version:5.1.0.11
    Valid Set Values: N/A

…
    Description:      Used to update the CMM firmware on the CMM. In a
                      redundant system, updates should only be done on one
                      CMM at a time in order to maintain chassis
                      management. Refer to Section 23, "Updating CMM
                      Software" on page 204 for additional information.
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: "<cmm image location>
                      ftp:<hostname or IP address>:username:password"

…
    Description:      Returns the list of CMMs in the shelf and their
                      status.
    Get/Set:          Get
    CLI Get Output:   CMM 1: Present (active) *
                      CMM 2: Not Present (standby)
                      * = The CMM you are currently logged into.
    Valid Set Values: N/A

…
    Description:      Used with cmmset from the active CMM to force a
                      failover to the standby. This will only complete
                      successfully if the standby CMM is in a state where
                      it can handle a failover.
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: "1" = Failover to a standby CMM with an equal or
                      newer firmware version.
                      "any" = Failover to the standby CMM regardless of
                      firmware version.

…
    Description:      Enable/Disable the RMCP interface.
    Get/Set:          Both
    CLI Get Output:   "1" - RMCP Enabled
                      "0" - RMCP Disabled
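The Version string format above splits cleanly on its dots. A small POSIX-shell sketch, using the sample string from the table rather than live CLI output:

```shell
# Sketch: split "Version:5.1.0.11" into Generation, SRA, Patch, Build.
v="Version:5.1.0.11"            # sample output of the version dataitem
v=${v#Version:}                 # drop the "Version:" label
IFS=. ; set -- $v ; unset IFS   # split the remainder on the dots
echo "generation=$1 sra=$2 patch=$3 build=$4"
```

This is handy when scripting upgrade checks, since the Generation and SRA fields compare numerically.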
Table 27. Dataitem Keywords for Cmm Location (Sheet 4 of 7)

snmpenable
    Description:      Used to set or query the SNMP trap enabled status.
    Get/Set:          Both
    CLI Get Output:   "SNMP traps are <enabled/disabled>."
    Valid Set Values: 1 = Enable; 0 = Disable; "enable"; "disable"

snmptrapaddress[1-5]
    Description:      Get or Set the machine's IP address that will
                      receive SNMP traps from a location. Up to five
                      addresses can be set. The default is 0.0.0.0 for all
                      five. Example:
                      cmmset -l cmm -d snmptrapaddress3 -v 10.10.241.105
    Get/Set:          Both
    CLI Get Output:   "SNMP trap address: IpAddress"
                      where IpAddress is of the format A.B.C.D
    Valid Set Values: <IpAddress>, a valid IP address in the form A.B.C.D

snmptrapcommunity
    Description:      Get or Set the SNMP trap community name. Example:
                      cmmget -l cmm -d snmptrapcommunity
                      Returns: SNMP trap community: publiccmm
    Get/Set:          Both
    CLI Get Output:   "SNMP trap community: communityValue"
    Valid Set Values: <CommunityName>, any valid SNMP community name of
                      64 characters or less. Ex: publiccmm

snmptrapport
    Description:      Get or Set the TCP/IP port that the SNMP trap will
                      be sent to. The default is 162.
    Get/Set:          Both
    CLI Get Output:   "SNMP trap port: portNumber"
    Valid Set Values: Valid port number, 0-65535

snmptrapversion
    Description:      Retrieves or sets the SNMP trap version, either v1
                      or v3.
    Get/Set:          Both
    CLI Get Output:   "SNMP trap version: v1/v3"
    Valid Set Values: "v1" or "v3"

airfilterruntimelimit
    Description:      Returns the uppercritical limit. Note: it uses the
                      sensor to display the runtime value in days since
                      the last reset. To retrieve the uppernoncritical
                      limit, use the command:
                      cmmget -t "filter run time" -d uppernoncritical
                      (or -d thresholdsall)
    Get/Set:          Both
    CLI Get Output:   <uppercritical limit> Days
    Valid Set Values: 1. Disable eventing on air filter run time: -v 0
                      2. Enable eventing on the air filter and set the
                         uppercritical limit to (xxx) days; this also sets
                         the upper non-critical value to 90% of the
                         uppercritical: -v xxx
                      3. Enable eventing on the air filter and set the
                         uppercritical limit to (xxx) days and the upper
                         non-critical to (yyy) days

resetairfilterruntime
    Description:      Resets the air filter runtime to 0. The set is
                      supported to allow the user to set the run time to
                      zero when the filter is replaced.
    Get/Set:          Set
    CLI Get Output:   N/A
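Per option 2 above, setting only the uppercritical limit derives the upper non-critical threshold as 90% of it. The arithmetic, with an assumed example value of 180 days:

```shell
# Sketch: derived upper non-critical threshold when the
# airfilterruntimelimit is set with a single value (e.g. -v 180).
uppercritical=180                                # days, example value
uppernoncritical=$(( uppercritical * 90 / 100 )) # 90% of uppercritical
echo "$uppernoncritical days"
```

Integer shell arithmetic truncates, so the derived value rounds down for limits not divisible by 10.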
Table 27. Dataitem Keywords for Cmm Location (Sheet 5 of 7)

syncuserledstate
    Description:      Gets/Sets whether the LED state is synced between
                      the active and standby CMM.
    Get/Set:          Both
    CLI Get Output:   "Yes" or "No"
    Valid Set Values: "Yes" or "No"

powersequence
    Description:      Used to get/set the power sequence order, Power
                      Sequencing Delay, and ShelfManagerControlledActivation
                      in the CDM.
                      Note: The power sequencing delay is the time, in
                      tenths of a second, to delay before powering up any
                      other FRU after powering this FRU. The value of the
                      power sequencing delay is between 0 and 63. Shelf
                      Manager Controlled Activation determines whether the
                      Shelf Manager activates the FRU residing at this
                      location when it reaches M2.
    Get/Set:          Both
    CLI Get Output:   INI format, displayed on the console.

loginmessage
    Description:      Used to customize the login screen message by
                      allowing the user to add the OEM name.
    Get/Set:          Both

cmdlineprompt
    Description:      Used to customize the bash prompt by allowing the
                      user to add the OEM name.
    Get/Set:          Both

FaultLEDColor
    Description:      Get/Set the color of the fault/health LED on the
                      CMM-fronted FRUs (Filter Tray, CDM) to be used when
                      an error is reported. Does not affect the CMM Health
                      LED.
    Get/Set:          Both
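The power-sequencing delay field above is stored in tenths of a second with a maximum of 63 (6.3 seconds). A hypothetical helper that converts a delay given in whole seconds and rejects out-of-range values:

```shell
# Hypothetical helper: convert a delay in whole seconds to the CDM's
# tenths-of-a-second field, enforcing the 0-63 range from the table.
delay_tenths() {
  t=$(( $1 * 10 ))
  if [ "$t" -ge 0 ] && [ "$t" -le 63 ]; then
    echo "$t"
  else
    echo "delay out of range" >&2
    return 1
  fi
}
delay_tenths 5    # 5 seconds -> 50 tenths
```

Anything above 6 seconds exceeds the field's range and is refused rather than silently clamped.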
Table 27. Dataitem Keywords for Cmm Location (Sheet 6 of 7)

AdminState
    Description:      Used to set or query the administrative state of the
                      PMS as a whole or of an individual monitored
                      process. A target of "PmsGlobal" will get/set the
                      state of the PMS as a whole. A target of "PmsProc[#]"
                      will get/set the unique state of an individual
                      process, where # is the unique process number for
                      the process. A target of "PmsPie[#]" will get/set
                      the unique state of a PIE, where # is the unique PIE
                      number. AdminState is CMM-specific and is not synced
                      between CMMs. It allows individual control of each
                      CMM's adminstate and can be set on either the active
                      or standby CMM.
    Get/Set:          Both
    CLI Get Output:   "1:Unlocked" or "2:Locked"
    Valid Set Values: 1 = Unlocked
                      2 = Locked

RecoveryAction
    Description:      Used to set or query the recovery action of a PMS
                      monitored process. This is only valid for a target
                      of "PmsProc[#]", where # is the unique number for
                      the process.
    Get/Set:          Both
    CLI Get Output:   "1:No Action", "2:Process Restart", "3:Failover and
                      Restart", or "4:Failover and Reboot"
    Valid Set Values: 1 = No Action
                      2 = Process Restart
                      3 = Failover and Restart
                      4 = Failover and Reboot

EscalationAction
    Description:      Used to set or query the process restart escalation
                      action. This is only valid for a target of
                      "PmsProc[#]", where # is the unique number for the
                      process.
    Get/Set:          Both
    CLI Get Output:   "1:No Action", "2:Failover and Reboot"

ProcessName
    Description:      Used to query the process name and associated
                      command line arguments for a monitored process. A
                      target of "PmsProc[#]" will retrieve the name of an
                      individual process, where # is the unique number for
                      the process. "PmsPie[#]" will retrieve the path and
                      command line arguments of the PIE to be executed
                      periodically.
    Get/Set:          Get
    CLI Get Output:   "<Process_Name> <Command_Line_Arguments>"
    Valid Set Values: N/A

OpState
    Description:      Used to query the operational state of a monitored
                      process. An operational state of disabled indicates
                      that the process has failed and cannot be recovered.
                      This is valid for targets of "PmsProc[#]" and
                      "PmsGlobal", where # is the unique number for the
                      process and PmsGlobal refers to the OpState for all
                      of PMS. This is also valid for a target of
                      PmsPie[#].
    Get/Set:          Get
    CLI Get Output:   "1:Enabled", "2:Disabled"
    Valid Set Values: N/A
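The numeric prefixes in the PMS get outputs above ("1:Unlocked", "2:Locked") can be decoded mechanically when scripting against the CLI. A small sketch using a sample string, not live output:

```shell
# Sketch: map the numeric code in an AdminState reading to its label.
decode_adminstate() {
  case "${1%%:*}" in          # keep only the part before the colon
    1) echo Unlocked ;;
    2) echo Locked ;;
    *) echo unknown ;;
  esac
}
decode_adminstate "2:Locked"
```

Decoding on the numeric prefix keeps a script working even if the label text after the colon ever changes case.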
minorlevel
    Description:      Used to set or query the minorlevel for the fantray.
    Get/Set:          Both
    Valid Set Values: Any value between the normallevel and the majorlevel
                      of the fantray.

normallevel
    Description:      Used to set or query the normallevel for the
                      fantray.
    Get/Set:          Both
    Valid Set Values: Any value between the minimumsetting and the
                      minorlevel of the fantray.

…
    Description:      Used to set or query the control mode of the
                      fantray.
    Get/Set:          Both
    Valid Set Values: EmergencyShutdown, fantray, CMM

…
    Description:      Used to set or query the defaultcontrol mode of the
                      fantray.
    Get/Set:          Both
    Valid Set Values: fantray, CMM

…
    Description:      Used to restore the cooling table defaults of the
                      fan tray to the vendor defaults or code defaults.
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: true

minimumsetting
    Description:      Used to query the minimum setting of the fantray,
                      returned via the get fantray properties IPMI
                      command.
    Get/Set:          Get
    Valid Set Values: N/A

…
    Description:      Used to query the maximum setting of the fantray,
                      returned via the get fantray properties IPMI
                      command.
    Get/Set:          Get
    Valid Set Values: N/A

…
    Description:      Used to query the recommended setting of the
                      fantray, returned via the get fantray properties
                      IPMI command.
    Get/Set:          Get
    Valid Set Values: N/A

…
    Description:      Used to query the current cooling level of the
                      fantray.
    Get/Set:          Get
    Valid Set Values: N/A
8.4.5.2 Target Dataitem Lists
When a target is specified, there is usually a slightly different set of dataitems specifically for that
target. Refer to Section 8.4.4, “Target Parameter: -t” on page 78 for more information on the target
parameter. Table 30 lists the possible dataitems used with various targets.
Table 30. Dataitem Keywords Used with the Target Parameter (Sheet 1 of 4)

listdataitems
    Description:      Lists the available dataitems for that target.
    Get/Set:          Get
    CLI Get Output:   Listing of all valid dataitems that can be issued
                      for the specified location or target
    Valid Set Values: N/A

health
    Description:      Returns the health of the target and whether any
                      events exist. The returned value will be one of OK,
                      minor, major, or critical.
    Get/Set:          Get
    CLI Get Output:   "Location/Target has no/minor/major/critical
                      problems"
    Valid Set Values: N/A

healthevents
    Description:      Returns the specific health events that are
                      occurring on the target, if any exist.
    Get/Set:          Get
    CLI Get Output:   List of currently active events. E.g.
                      "Major Event : +12V_B Lower critical going low
                      asserted
                      Major Event : +12V_A Lower critical going low
                      asserted"
    Valid Set Values: N/A

current
    Description:      The current value of a sensor.
    Get/Set:          Get
    CLI Get Output:   "The current value is currentValue [Units]"
    Valid Set Values: N/A

thresholdsall
    Description:      All thresholds of a sensor. This includes lower
                      non-recoverable, lower critical, lower non-critical,
                      upper non-critical, upper critical, and upper
                      non-recoverable.
    Get/Set:          Get
    CLI Get Output:   "Upper Non-recoverable: ThresholdValue [Units]
                      Upper Critical: ThresholdValue [Units]
                      Upper Non-critical: ThresholdValue [Units]
                      Lower Non-critical: ThresholdValue [Units]
                      Lower Critical: ThresholdValue [Units]
                      Lower Non-recoverable: ThresholdValue [Units]"
                      If a certain threshold is not supported, the
                      ThresholdValue will display "Not Supported"
    Valid Set Values: N/A
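Given a saved thresholdsall listing in the line-per-threshold shape shown above, a single threshold can be pulled out with sed. The sample text here is fabricated for illustration, not live CLI output:

```shell
# Sketch: extract the Upper Critical threshold from a saved
# thresholdsall listing (sample text, not live CLI output).
out='Upper Non-critical: 45 [Degrees C]
Upper Critical: 55 [Degrees C]'
printf '%s\n' "$out" | sed -n 's/^Upper Critical: //p'
```

Anchoring on `^Upper Critical: ` keeps the match from also hitting the "Upper Non-critical" line.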
Table 30. Dataitem Keywords Used with the Target Parameter (Sheet 2 of 4)

minoraction
    Description:      Used to configure user-defined actions when events
                      occur. This dataitem is used with a target (-t)
                      parameter specifying a sensor and a value (-v)
                      parameter. When an event happens for that particular
                      sensor, the script defined in the -v parameter will
                      be executed. The script to be executed must be
                      located in the /home/scripts/ directory on the CMM,
                      and the /home/scripts path should be omitted when
                      specifying the script. Example:
                      cmmset -l blade9 -t +5V -d minoraction -v
                      "powerdownblade 9"
                      In this example, /home/scripts/powerdownblade will
                      be executed with a parameter of 9 when the +5V
                      sensor on blade9 generates a minor event.
    Get/Set:          Both
    CLI Get Output:   If set, the full path of the script, e.g.
                      /home/scripts/EventScript. If not set, the output is
                      "" (null).
    Valid Set Values: "<ScriptName> arg1 arg2 ... argN"
                      where ScriptName (not the full path) is the script
                      file name and arg1-argN are the parameters to the
                      script. Use "none" to remove an existing entry.

…
    Description:      Used to trigger a script based on the event code of
                      a health event. Refer to Section 18, "CMM Scripting"
                      on page 164.
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: "<event code>:<ScriptName> arg1 arg2 ... argN"
                      where event code is the event code associated with
                      the event to associate with the script, ScriptName
                      (not the full path) is the name of the script file,
                      and arg1-argN are any parameters required by the
                      script. Use "<eventcode>:none" to remove an existing
                      entry.

…
    Description:      Gets a FRU LED's valid color set. This command
                      returns a comma-separated list of supported colors,
                      the default local control color, and the default
                      override color. This command should be issued before
                      a ledstate set command. Implements the Get LED Color
                      Capabilities command. See PICMG 3.0 Table 3-24.
    Get/Set:          Get
    CLI Get Output:   Color properties of the LED:
                      "<ledtarget> supports <colors>
                      Default local control color is <colorList>
                      Default override color is <color>"
                      where <ledtarget> is one of the valid LEDs (hsled,
                      led1, led2, led3, userled1-userled251), <colorList>
                      is a comma-separated list of <color> items, and
                      <color> is one of blue, red, green, amber, orange,
                      white
    Valid Set Values: N/A
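The "<eventcode>:<ScriptName> args" set-value format above splits on the first colon. A POSIX sketch, where the event code and script name are made up for illustration:

```shell
# Sketch: split an event-script binding of the form
# "<eventcode>:<script> arg1 ..." into its two halves.
v='0x1001:powerdownblade 9'    # hypothetical event code and script
code=${v%%:*}                  # part before the first colon
script=${v#*:}                 # script name plus its arguments
echo "code=$code script=$script"
```

Using `%%:*` and `#*:` keeps any further colons inside the argument list attached to the script half.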
Table 30. Dataitem Keywords Used with the Target Parameter (Sheet 3 of 4)

ledstate
    Description:      Gets or Sets a FRU LED's state. The Get returns the
                      LED's mode, one of {localcontrol, override,
                      lamptest}, and a function message. Implements the
                      Get/Set FRU LED State commands. See PICMG 3.0
                      Tables 3-26 (Get) and 3-25 (Set).
                      Set syntax model:
                      cmmset -l <location> -t <LED> -d ledstate
                      -v <function>,<function options>
                      Example:
                      cmmset -l cmm -t "userled1" -d ledstate
                      -v blink,300,700,green
                      This sets the CMM's user1 LED to blinking green
                      with an off duration of 300 ms and an on duration
                      of 700 ms.
    Get/Set:          Both
    CLI Get Output:   "<ledtarget> is in <LEDmode> mode
                      <function message>"
                      where <LEDmode> is one of localcontrol/override/
                      lamptest, and <function message> is one of the
                      following, depending on the LED's current function:
                      If the LED is off:
                        function is off
                      If the LED is on:
                        function is on
                        color is <color>
                      If the LED is blinking:
                        function is blink
                        off time is <offtime> ms
                        on time is <ontime> ms
                        color is <color>
                      If the LED is under lamp test:
                        duration is <duration> ms
                      <color> is one of blue, red, green, amber, orange,
                      white; <offtime> is the time in milliseconds that
                      the LED is in the off cycle of a blink; <ontime> is
                      the time in milliseconds that the LED is in the on
                      cycle of a blink; <duration> is the duration of the
                      lamp test in milliseconds.
    Valid Set Values: Functions: off, on, blink, lamptest, localcontrol
                      Accepted values:
                      <offtime>,<ontime>,<color>,<duration>
                      Refer to Section 12.5, "Setting the State of the
                      User LEDs" on page 125 for more information.
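The Set value for a blink request is a comma-joined function-plus-options string, as in the example above. Composing it programmatically is straightforward:

```shell
# Sketch: compose the -v argument for "cmmset ... -d ledstate" to blink
# a user LED (values taken from the example in the table above).
offtime=300    # ms in the off cycle
ontime=700     # ms in the on cycle
color=green
value="blink,${offtime},${ontime},${color}"
echo "$value"
```

Building the string from named variables makes the off/on ordering explicit, which is easy to get backwards by hand.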
Table 30. Dataitem Keywords Used with the Target Parameter (Sheet 4 of 4)

…
    Get/Set:          Both
    CLI Get Output:   current in Amps with 1 decimal point
    Valid Set Values: current in Amps with 1 decimal point

maxinternalcurrent
    Description:      Get the field in the CDM regarding the max internal
                      available current. Only used with the feedN target.
                      E.g.:
                      cmmget -l cmm -t feed1 -d maxinternalcurrent
                      See Section 7.7, "Power Feed Targets" on page 69.
    Get/Set:          Get
    CLI Get Output:   current in Amps with 1 decimal point
    Valid Set Values: N/A

minexpectedoperatingvoltage
    Description:      Get/Set the field in the CDM regarding the max
                      expected operating voltage. Only used with the feedN
                      target. E.g.:
                      cmmget -l cmm -t feed1 -d minexpectedoperatingvoltage
                      See Section 7.7, "Power Feed Targets" on page 69.
    Get/Set:          Both
    CLI Get Output:   voltage value as a string between -36 and -72 V
    Valid Set Values: voltage value between -36 and -72 V
8.4.6 Value Parameter: -v
The value parameter specifies the new value for a dataitem. This parameter is required for all
cmmset commands and is only used with cmmset commands. Valid value parameters are shown
with their corresponding dataitems in the dataitem tables listed above.
8.4.7 Sample CLI Operations
Sample CLI operations can be found in Appendix A, “Example CLI Commands”.
8.5 Generating a System Status Report
The CLI includes an executable script (cmmdump) that is used to generate a system status report
for communicating system health and configuration information to technical support personnel.
This is useful in helping technical support troubleshoot any issues that may be affecting the
system. Cmmdump outputs system information to the screen by default, or to a file.
When sending the output to a file, the filename should refer to a file in a valid directory (i.e.,
/home/cmmdump.txt). The file can then be retrieved from the CMM using FTP (see
Section 8.3.7, “FTP into the CMM”).
It may become necessary at some point to reset the CMM password to its default of cmmrootpass.
The CMM has one on-board DIP switch, labeled S2-1, to perform this action. Refer to the Intel®
NetStructure™ MPCMM0001 Hardware Technical Product Specification for the location of the
switch. Setting the switch and powering up the CMM will cause the password to reset to its
default. The CMM then needs to be removed and the switch turned off again.
9.1 Resetting the Password in a Dual CMM System
In redundant systems containing dual CMMs (one active, one standby), the password should be
reset on the standby CMM. Once reset to its default, the default password will synchronize to the
active CMM. This avoids the need to perform the reset on both CMMs and a failover.
1. Open the ejector latch on the standby CMM and wait for the blue hot swap LED to illuminate,
indicating the CMM is safe to remove from the system.
2. Remove the standby CMM from the chassis.
3. Set dip switch S2-1 to “on”. The dip switch has a label indicating which way is on.
4. Re-insert the CMM into the system and allow the CMM to fully boot (blue light will go off
when fully booted).
5. An OK health event will occur indicating that the passwords on both CMMs have been reset
and were synched from the standby CMM. A SEL entry will be recorded, and a trap will be
sent out.
6. Once at the login prompt, the password should now be reset to its default of cmmrootpass.
7. Login to the active CMM to ensure the password was reset.
8. Open the ejector latch on the standby CMM and wait for the blue hot swap LED to illuminate,
indicating the CMM is safe to remove from the system.
9. Remove the standby CMM from the chassis.
10. Set dip switch S2-1 back to its original “off” position.
11. Re-insert the CMM into the system and allow the CMM to fully boot (blue light will go off
when fully booted).
12. Login to the CMM and operate as normal.
13. Use the passwd command on the active CMM to change the CLI password if desired. The new
password will sync to the standby.
For nonredundant systems that contain only a single CMM, resetting the password requires
removing the CMM. This will cause any boards that are power-controlled by the CMM to become
unmanaged. Care should be taken to safely shut down boards in the system prior to removing the
CMM.
1. Safely shut down and power off boards being power controlled by the CMM.
2. Remove the CMM from the system.
3. Set dip switch S2-1 to “on”. The dip switch has a label indicating which way is on.
4. Re-insert the CMM into the system and allow the CMM to fully boot (blue light will go off
when fully booted).
5. Once at the login prompt, the password should now be reset to its default of cmmrootpass.
6. Login to the CMM to ensure the password was reset.
7. Remove the CMM from the system.
8. Set dip switch S2-1 back to its original “off” position.
9. Re-insert the CMM into the system and allow the CMM to fully boot (blue light will go off
when fully booted).
10. Login to the CMM and operate as normal.
11. Use the passwd command on the active CMM to change the CLI password if desired.