INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY
ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN
INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS
ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES
RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER
INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for
future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
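The Intel® NetStructure™ MPCMM0001 Chassis Management Module may contain design defects or errors known as errata, which may cause the
product to deviate from published specifications. Current characterized errata are available on request.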
This Software Technical Product Specification as well as the software described in it is furnished under license and may only be used or copied in
accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to change without notice,
and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or
inaccuracies that may appear in this document or any software that may be provided in association with this document.
Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any
means without the express written consent of Intel Corporation.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel literature may be obtained by calling
1-800-548-4725 or by visiting Intel's website at http://www.intel.com.
AnyPoint, AppChoice, BoardWatch, BunnyPeople, CablePort, Celeron, Chips, CT Media, Dialogic, DM3, EtherExpress, ETOX, FlashFile, i386, i486,
i960, iCOMP, InstantIP, Intel, Intel Centrino, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Create & Share, Intel GigaBlade,
Intel InBusiness, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel Play, Intel Play logo, Intel SingleDriver, Intel
SpeedStep, Intel StrataFlash, Intel TeamStation, Intel Xeon, Intel XScale, IPLink, Itanium, MCS, MMX, MMX logo, Optimizer logo, OverDrive,
Paragon, PC Dads, PC Parents, PDCharm, Pentium, Pentium II Xeon, Pentium III Xeon, Performance at Your Command, RemoteExpress, SmartDie,
Solutions960, Sound Mark, StorageExpress, The Computer Inside., The Journey Inside, TokenExpress, VoiceBrick, VTune, and Xircom are
trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
The Intel® NetStructure™ MPCMM0001 Chassis Management Module is a 4U, single-slot CMM
intended for use with AdvancedTCA* PICMG* 3.0 platforms. This document details the software
features and specifications of the CMM. For information on hardware features for the CMM, refer
to the Intel® NetStructure™ MPCMM0001 Hardware Technical Product Specification. Links to
specifications and other material can be found in Appendix B, “Data Sheet Reference.”
The CMM plugs into a dedicated slot in compatible systems. It provides centralized management
and alarming for up to 16 node and/or fabric slots as well as for system power supplies, fans, and
power entry modules. The CMM may be paired with a backup for redundant use in high-availability
applications.
The CMM is a special purpose single board computer (SBC) with its own CPU, memory, PCI bus,
operating system, and peripherals. The CMM monitors and configures IPMI-based components in
the chassis. When thresholds (such as temperature and voltage) are crossed or a failure occurs, the
CMM captures these events, stores them in an event log, sends SNMP traps, and drives the Telco
alarm relays and alarm LEDs. The CMM can query FRU information (such as serial number,
model number, manufacture date, etc.), detect presence of components (such as fan tray, CPU
board, etc.), perform health monitoring of each component, control the power-up sequencing of
each device, and control power to each slot via IPMI.
Assumptions: This document assumes some basic Linux* knowledge and the ability to use Linux
text editors such as vi.
1.2 Terms Used in this Document
Table 1. Glossary
Acronym | Description
BIST | Built-In Self Test
CDM | Chassis Data Module
CLI | Command Line Interface
CMM | Chassis Management Module
DHCP | Dynamic Host Configuration Protocol
FFS | Flash File System
FIS | Flash Image System
FPGA | Field-Programmable Gate Array
FRU | Field Replaceable Unit
HS | Hot Swap
IPMI | Intelligent Platform Management Interface
IPMB | Intelligent Platform Management Bus
2.1 Red Hat* Embedded Debug and Bootstrap (Redboot)
Upon initial power on, the CMM enters into the Redboot firmware to bootstrap the embedded
environment. Upon execution, Redboot acts as a TFTP server and checks for a TFTP connection to
a client. If a TFTP connection exists, Redboot will accept a firmware update that is pushed down
from the client, check the firmware update for data integrity, and then write the update to the flash.
Note: Firmware updates using the Redboot TFTP method are supported for backwards compatibility.
However, updating from within the OS using the CLI is the preferred method of updating CMM
firmware. For information on the firmware update process, refer to Section 23, “Updating CMM
Software” on page 204.
Under normal circumstances, Redboot runs the standard diagnostics, sets up memory,
decompresses the OS kernel, and boots into that kernel.
2.2 Operating System
The CMM runs a customized version of embedded BlueCat* Linux* 4.0 on an Intel® 80321
processor with Intel® XScale® technology. Development support for BlueCat Linux is available on
the web at http://www.lynuxworks.com.
2.3 Command Line Interface (CLI)
The Command Line Interface (CLI) connects to and communicates with the intelligent
management devices of the chassis, boards, and the CMM itself. The CLI is an IPMI-based library
of commands that can be accessed directly or through a higher-level management application.
Administrators can access the CLI through Telnet, SSH, or the CMM’s serial port. Using the CLI,
users can access information about the current state of the system including current sensor values,
threshold settings, recent events, and overall chassis health, access and modify shelf and CMM
configurations, set fan speeds, perform actions on a FRU, etc. The CLI is covered in Section 8,
“The Command Line Interface (CLI)” on page 71.
2.4 SNMP/UDP
The CMM supports both queries and traps over SNMP (Simple Network
Management Protocol) v1 or v3. The SNMP version can be configured through the CLI.
The default is SNMP v1. A MIB for the entire platform is included with the CMM. The CMM
can send SNMP traps to up to five trap receivers.
Along with SNMP traps, the CMM sends UDP (User Datagram Protocol) alerts to port 10000. The
content of these UDP alerts is the same as the SNMP traps. SNMP is covered in Section 17,
In addition to the console command-line interface, the CMM can be administered by custom
remote applications via remote procedure calls (RPC). RPC is covered in Section 19, “Remote
Procedure Calls (RPC)” on page 174.
2.6 RMCP
RMCP (Remote Management Control Protocol) is a protocol that defines a method to send IPMI
packets over LAN. The RMCP server on the CMM can decode RMCP packets and forward the
IPMI messages to the appropriate channels, including SBC blades, PEMs, and FanTrays, or to a local
destination within the CMM. When a responding IPMI message from an SBC blade,
PEM, or FanTray is destined for an RMCP client, the RMCP server formats this IPMI message into
an RMCP message and sends it through the designated LAN interface back to the originator. RMCP is
covered in Section 20, “RMCP” on page 190.
2.7 Ethernet Interfaces
The CMM contains two Ethernet ports. The software can direct each of these ports to the
front panel, to the backplane, or to the rear transition module (RTM). Information on configuring
the Ethernet interfaces is covered in Section 8.3.1, “Setting IP Address Properties” on page 72.
Software Specifications
2.8 Sensor Event Logs (SEL)
The AdvancedTCA CMM implements system event logs according to Section 3.5 of the PICMG
3.0 Specification. The SEL contained on the CMM is fully IPMI compliant.
2.8.1 CMM SEL Architecture
The MPCMM0001 uses a single flat SEL file stored locally in the /etc/cmm directory. The SEL
maintains a list of all the sensor events in the shelf. Each of the managed devices may keep its own
SEL records in local SELs, but the master copy for the shelf is maintained by the CMM.
The SEL is limited to 65536 bytes. To keep the SEL from filling up, which can cause loss
of error logging, the CMM checks the SEL every 15 minutes. If the size of the cmm_sel file
is greater than 40000 bytes, the SEL is archived in gzip format and saved in /home/log/SEL. The
saved logs are named cmm_sel.0.gz, cmm_sel.1.gz, and so on, up to a maximum of 16 logs,
after which they are rolled over.
Note: Archived files should NEVER be decompressed on the CMM as the resulting prolonged flash file
writing could disrupt normal CMM operation and behavior. Using FTP, transfer the files to a
different system before decompressing the archive using utilities such as gzip.
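The archiving policy described above can be sketched as follows. This is a minimal illustration for a workstation, not the CMM's own implementation and not something to run on the CMM: the function names are hypothetical, and two behaviors are assumptions that the text does not spell out, namely that the archive index wraps back to 0 after the sixteenth log and that the live log is emptied once it has been archived.

```python
import gzip
import shutil
from pathlib import Path

SEL_ARCHIVE_THRESHOLD = 40000  # bytes, from the text above
MAX_ARCHIVES = 16              # maximum number of saved logs, from the text above


def needs_archive(sel_size_bytes: int) -> bool:
    """Return True when the live SEL is larger than the archive threshold."""
    return sel_size_bytes > SEL_ARCHIVE_THRESHOLD


def archive_name(sequence: int) -> str:
    """Return the archive file name for the Nth archive (0-based).

    Assumption: after cmm_sel.15.gz the index wraps back to 0.
    """
    return f"cmm_sel.{sequence % MAX_ARCHIVES}.gz"


def archive_sel(sel_path: Path, archive_dir: Path, sequence: int) -> Path:
    """Gzip the SEL file into archive_dir and return the archive path."""
    target = archive_dir / archive_name(sequence)
    with open(sel_path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Assumption: the live log is emptied after it has been archived.
    sel_path.write_bytes(b"")
    return target
```

A periodic job would call needs_archive() with the current file size and, when it returns True, call archive_sel() with an incrementing sequence number.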
2.8.2 Retrieving a SEL
To retrieve a SEL from the CMM, issue the following command:
Where location is one of {cmm, blade[1-14], fantray1, PEM[1-2]}. Even though the CMM uses a
single flat SEL for system events, the ‘cmmget’ command filters the SEL and returns only
events associated with the provided location. Also, some individual FRUs (e.g., blades) may keep their own
local SELs.
2.8.3 Clearing the SEL
The following command will clear the SEL on both the active and the standby CMM:
cmmset -d clearsel -v clear
Note: Since the CMM uses a single flat SEL for system events, this command clears the entire shelf SEL,
not just a filtered subset.
2.8.4 Retrieving the Raw SEL
To retrieve the SEL in its raw format from a location, issue the following command:
cmmget -l [location] -d rawsel
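When the raw SEL is collected by a management script, it can help to validate the location before building the command line. The helper below is a hypothetical sketch: it only assembles the argument list for the command shown above, using the location values listed in Section 2.8.2, and does not execute anything.

```python
import re
from typing import List

# Locations listed in Section 2.8.2: cmm, blade[1-14], fantray1, PEM[1-2].
_LOCATION_RE = re.compile(r"cmm|blade(?:[1-9]|1[0-4])|fantray1|PEM[12]")


def rawsel_command(location: str) -> List[str]:
    """Build the argument list for `cmmget -l <location> -d rawsel`."""
    if not _LOCATION_RE.fullmatch(location):
        raise ValueError(f"unsupported location: {location!r}")
    return ["cmmget", "-l", location, "-d", "rawsel"]
```

The returned list can be passed to whatever mechanism the script uses to run commands on the CMM (for example, an SSH session).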
2.9 Blade OverTemp Shutdown Script
The CMM software includes predefined script settings specifically for the MPCBL0001 board,
which will automatically shut down a board when the “baseboard temp” sensor on that board
crosses the upper critical threshold. This is done to prevent a runaway thermal event on the board
from occurring. If this functionality is needed when using boards other than the MPCBL0001, the
user will need to associate the name of the thermal sensor and the threshold with the board
shutdown script:
Please refer to Section 18, “CMM Scripting” on page 164 for more information on associating a
script with an event.
When using the CMM with boards other than the MPCBL0001, as long as there is no sensor name
titled "baseboard temp" associated with the particular board being used, then there is no issue
leaving these settings intact. If needed, to deactivate these settings for each physical slot, use the
command:
where bladeN is the blade, corresponding to the physical slot number, on which to remove the
automatic shutdown setting (blade[1-16]). Please refer to Section 18, “CMM Scripting” on
page 164 for more information on removing script actions.
The CMM supports redundant operation with automatic failover in a chassis using redundant
CMM slots. In systems where two CMMs are present, one acts as the active shelf manager and the
other as standby. Both CMMs monitor each other, and either can trigger a failover if necessary.
Data from the active CMM is synchronized to the standby CMM whenever any changes occur.
Data on the standby CMM is overwritten. A full synchronization between active and standby
CMMs occurs on initial power up, or any insertion of a new CMM.
The active CMM is responsible for shelf FRU information management when CMMs are in
redundant mode.
3.2 Synchronization
To ensure critical files on the standby CMM match the data on the active CMM, the active CMM
synchronizes its data with the standby CMM, overwriting any existing data on the standby CMM.
An exception to this is the password reset procedure, detailed in Section 9, “Resetting the
Password” on page 99. When the password reset switch is activated on the standby CMM, the
password will be synchronized to the active CMM.
The CMMs will initially fully synchronize data from the active to the standby CMM just after
booting. An insertion of a new CMM will also cause a full synchronization from the active to the
newly inserted standby. Date and time are synched every hour. Partial synchronization will also
occur any time files are modified or touched via the Linux* “touch” command with the exception
of all *.sif and *.bin files in the /etc/cmm directory.
The *.sif files (all SIF files) and *.bin files (SDR files) under /etc/cmm are synchronized only once,
when the CMMs establish communication. A 'touch' on those files at any later time does not
trigger a sync operation. Updates to these files always happen as part of a software
update and never in isolation.
Note: During synchronization, the health event LEDs on the standby CMM may blink on and off as the
health events that were logged in the SEL are synchronized.
Below is a list of items that are synchronized between CMMs. During a full synchronization, all of
these files and data are synchronized. A change to any of these files results in that file being
synched. The active CMM overwrites these files on the standby CMM.
There are two priority levels of synchronized files. For the chassis to be managed normally, the
priority 1 files must be synchronized after power up or installation of a brand new CMM into the
chassis. A standby CMM must have the priority 1 files synchronized before a
successful failover can occur. When a brand new CMM boots the first time as a standby, if a CMM
failover is forced before all priority 1 data items are synchronized to the standby CMM, the standby
CMM can still become the active CMM but may not be able to properly manage the FRUs in the
chassis.
Table 2. CMM Synchronization
File(s) or Data | Description | Path | Priority
date and time | Date and time | IPMB | 1
IP Address Settings | | |
/etc/cmm.cfg | CMM’s main configuration file | Ethernet | 1
/etc/cmm/cmm_sel | System SEL | Ethernet | 1
/etc/cmm/sensors.ini | Sensor Set Values | Ethernet | 1
Ekey Controller Structures | Ekey Controller Structures | Ethernet | 1
Bused EKey Token info | Bused EKey Token info | Ethernet | 1
IPMB User States | IPMB User States | Ethernet | 1
Fan States | Fan States | Ethernet | 1
Cooling State | Cooling State Information | Ethernet | 1
User LED States | User LED States | Ethernet | 1
SDR structures and SIPI Controller Info | SDR structures and SIPI Controller Info | Ethernet | 1
PHM FRU state, Power Usage and Power Info | | |
FIM FRU Cache (Local and Temp) | FIM FRU Cache (Local and Temp) | Ethernet | 1
SEL Time | SEL Time | IPMB | 1
SEL Events | Individual SEL Events | IPMB | 1
/etc/cmm/fantray.cfg | Fantray settings needed by cooling manager | Ethernet | 1
 | Recovery action and escalation action for all the monitored processes except monitor process | Ethernet | 2
 | Recovery action and escalation action for monitor process | Ethernet | 2
Note: The /.rhosts file is used for synchronization and should NEVER be modified.
3.3 Heterogeneous Synchronization
Beginning in version 5.2 firmware, the CMM can synchronize data between differing CMM
versions. The firmware decouples synchronization from firmware versioning, thus allowing
seamless synchronization between all CMM versions. A form of internal data versioning
maintained by the CMM helps achieve this.
Note: SDR/SIF and user scripts differ slightly in synchronization architecture, as described below.
3.3.1 SDR/SIF Synchronization
Sensor Data Records (SDRs) and Sensor Information Files (SIFs) will be synchronized only
between CMMs having the same version for this data item (even if the CMM firmware versions
differ).
3.3.2 User Scripts Synchronization and Configuration
By default, user scripts are synchronized only between CMMs with the same firmware version. The
user can control user script synchronization across differing CMM versions by modifying
the value of a configuration flag, "SyncUserScripts" (in the CMM configuration file, cmm.cfg,
under /etc). The configuration flag can be modified using the cmmget/cmmset commands. This
flag can be read or set through any of the CMM interfaces (i.e., CLI, SNMP, and RPC).
The value of this flag determines whether user scripts are synchronized only when the CMM
firmware versions differ. Between identical firmware versions, the user scripts directory
continues to be synchronized and this flag is ignored.
3.3.2.1 Setting User Scripts Sync Configuration Flag
To set the value of the Scripts Synchronization configuration flag, the following CMM command is
used:
upgrade: Synchronizes user scripts only when the other CMM has a newer firmware version.
downgrade: Synchronizes user scripts only when the other CMM has an older firmware version.
always: Synchronizes user scripts irrespective of version differences.
3.3.2.2 Retrieving User Scripts Sync Configuration Flag
To retrieve the value of the Scripts Synchronization configuration flag, the following CMM
command is used:
cmmget -l cmm -d syncuserscripts
The value returned will be one of: Equal, Upgrade, Downgrade, Always, or Error on failure.
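The effect of the flag values can be summarized in a small decision function. This is an illustrative sketch only: the function name is hypothetical, firmware versions are modeled as integer tuples, and treating "Equal" as the default behavior (synchronize only between identical versions) is an assumption inferred from the description in Section 3.3.2.

```python
from typing import Tuple

Version = Tuple[int, ...]  # hypothetical representation, e.g. (5, 2)


def should_sync_user_scripts(flag: str, local: Version, peer: Version) -> bool:
    """Decide whether user scripts are synchronized to the peer CMM."""
    if local == peer:
        # The flag is ignored when both CMMs run the same firmware version.
        return True
    mode = flag.lower()
    if mode == "always":
        return True
    if mode == "upgrade":
        return peer > local    # other CMM has newer firmware
    if mode == "downgrade":
        return peer < local    # other CMM has older firmware
    if mode == "equal":
        return False           # assumption: default, same versions only
    raise ValueError(f"unknown SyncUserScripts value: {flag!r}")
```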
3.3.3 Synchronization Requirements
For synchronization to occur:
• The CMMs must be able to communicate with each other over their dedicated IPMB. The
CMMs use a heartbeat via their dedicated IPMB to determine if they can communicate with
each other over IPMB.
• An Ethernet connection must exist between the two CMMs. The CMMs must be able to ping
each other via Ethernet for synchronization to be successful. The connection can run through
the Ethernet switches in the chassis (which requires both switches to be present in the chassis),
through an external Ethernet switch connected to the front ports of the
CMM pair, or through a crossover cable connecting the two front
ports of the CMM pair. If synchronization fails on eth1, then it will be attempted on eth0. If the
CMMs cannot successfully ping each other via eth0 or eth1, then synchronization between the
CMMs cannot occur.
A failure of any priority 1 synchronization will result in a health event being logged in the CMM
SEL and will inhibit a failover from occurring.
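The Ethernet fallback order described above (eth1 first, then eth0) can be expressed as a short selection routine. This is a sketch under stated assumptions: the function name and the can_ping callback are hypothetical stand-ins for whatever reachability check is used, and the CMM's actual logic may differ.

```python
from typing import Callable, Optional


def pick_sync_interface(can_ping: Callable[[str], bool]) -> Optional[str]:
    """Return the first interface that can reach the peer CMM.

    eth1 is tried first, then eth0. None means the CMMs cannot ping
    each other on either interface, so synchronization cannot occur.
    """
    for iface in ("eth1", "eth0"):
        if can_ping(iface):
            return iface
    return None
```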
3.4 Initial Data Synchronization
A standby CMM must have the priority 1 files synchronized before a successful
failover can occur. A standby CMM can still become active if priority 1 synchronization has
not been completed, but it may not be able to properly manage all the FRUs in the chassis.
The CMM implements the “Datasync Status” sensor to report the state of synchronization and
whether synchronization has completed successfully.
3.4.1 Initial Data Sync Failure
If the CMM encounters any failure during data synchronization, it marks the data synchronization
failure, logs a SEL event, and sends an SNMP trap. Duplicate failures are not reported multiple
times. As soon as the CMM is out of the failure condition, it resets the data synchronization failure state.
The CMM will continue trying to synchronize as long as there are two CMMs present in the
chassis and they are able to communicate via their cross-connected IPMB.
A sensor named “Datasync Status” exists in order to make the Datasync state information available
to the user. This sensor tracks the status of the Datasync module and will make its status available
through the various CMM interfaces. This sensor is used to query the data synchronization states,
and log SEL events for initial synchronization complete event. It is a discrete OEM sensor with
status bits representing the state of different parts of the Datasync module.
Note: The Datasync Status sensor can only be queried through the active CMM.
3.5.1 Sensor Bitmap
When the Datasync module first starts in a dual-CMM system, and whenever the CMM
changes between Active and Standby, the status bits are all cleared to 0x0000.
• Bit 0 (Running) is set when the Datasync module is active.
• Bit 1 (P1Done) is set when the priority 1 data syncs are done, and cleared when priority 1 data
needs to be synced.
• Bit 2 (P2Done) is set when the priority 2 data syncs are done, and cleared when priority 2
data needs to be synced.
• Bit 3 (InitSyncDone) is set when both priority 1 and priority 2 data syncs are done, and stays
set (latches) until the CMM changes between Active and Standby, or loses contact with the
partner CMM.
• Bit 4 (SyncError) is set if an error was detected, and cleared when no data items have errors.
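The bit assignments above can be decoded with a flag enumeration. The sketch below is illustrative: the class and function names are hypothetical, and it assumes that bit 0 is the least significant bit of the sensor value.

```python
from enum import IntFlag
from typing import List


class DatasyncStatus(IntFlag):
    """Status bits of the "Datasync Status" sensor (bit 0 = LSB, assumed)."""
    RUNNING = 1 << 0         # Bit 0 (Running)
    P1_DONE = 1 << 1         # Bit 1 (P1Done)
    P2_DONE = 1 << 2         # Bit 2 (P2Done)
    INIT_SYNC_DONE = 1 << 3  # Bit 3 (InitSyncDone)
    SYNC_ERROR = 1 << 4      # Bit 4 (SyncError)


def decode_datasync(raw: int) -> List[str]:
    """Return the names of the status bits that are set in raw."""
    status = DatasyncStatus(raw & 0x1F)  # only bits 0-4 are defined
    return [flag.name for flag in DatasyncStatus if flag in status]


def priority1_synced(raw: int) -> bool:
    """Return True when the P1Done bit is set."""
    return bool(raw & DatasyncStatus.P1_DONE)
```

For example, a value of 0x0F decodes to Running, P1Done, P2Done, and InitSyncDone, while 0x0000 decodes to an empty list (the state right after an Active/Standby change).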
Redundancy, Synchronization, and Failover
3.5.2 Event IDs
The “Datasync Status” sensor uses event IDs 0x420 to 0x42f. The following event ID is
used to log the event listed below. These event IDs can be used to associate
scripts with the respective events.
Event | Event ID
Initial Data Synchronization complete | 0x420 (1056)
3.5.3 Querying the Datasync Status
The status of the “Datasync Status” sensor can be queried using the following CLI command:
• When initial data synchronization is complete, the following SNMP trap is generated:
[Month] [Date] [Time] [hostname] snmptrapd[xxxxx]: [IP Address]:
Enterprise Specific Trap (25) Uptime: [Time], SNMPv2-SMI::enterprises.343.2.14.1.5 =
STRING: "Time : [Day] [Month] [Date] [Time] [Year], Location : [location] , Chassis Serial # : [xxxxxxxx],
Board : CMM[x] , Sensor : CMM[x]:Datasync Status , Event : Initial Data
Synchronization is complete. Asserted "
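A trap receiver script may want to split the quoted STRING payload of the trap into its labeled fields. The parser below is a rough sketch based only on the layout shown above: it assumes fields are separated by commas and that each label is separated from its value by " : " (with spaces). The function name is hypothetical, and event text that itself contains commas would need more careful handling.

```python
import re
from typing import Dict


def parse_trap_payload(payload: str) -> Dict[str, str]:
    """Split a 'Key : value, Key : value' trap payload into a dict."""
    fields: Dict[str, str] = {}
    for part in re.split(r"\s*,\s*", payload.strip()):
        # Split on the first " : " only, so values such as 12:00:00 or
        # "CMM1:Datasync Status" (colons without spaces) stay intact.
        key, sep, value = part.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields
```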
3.5.6 System Health
The “Datasync Status” sensor does not contribute to the system health. However, sync failures are
captured by the “File Sync Failure” sensor, which does contribute to the system health.
3.6 CMM Failover
Once information is synchronized between the redundant CMMs, the active CMM will constantly
monitor its own health as well as the health of the standby CMM. In the event of one of the
scenarios listed in the sections that follow, the active CMM will automatically failover to the
standby CMM so that no management functionality is lost at any time.
3.6.1 Scenarios That Prevent Failover
The following conditions prevent a failover from occurring:
• The active CMM cannot communicate with the standby CMM via their IPMB.
• Not all priority 1 data has been completely synchronized between the CMMs.
To determine the active CMM at any time, use the CLI command:
cmmget -l cmm -d redundancy
This command will output a list stating if both CMMs are present, which one is the active CMM,
and which CMM you are logged in to. CMM1 is the CMM on the left when looking from the front
of the chassis, and CMM2 is on the right.
3.6.2 Scenarios That Failover to a Healthier Standby CMM
The scenarios listed below can cause a failover only if the standby CMM is in a healthier state than
the active CMM. The health of a CMM is expressed as a CMM health score, which
is the sum of the weights of whichever of the three conditions listed below are currently active
on that CMM. A CMM health score is determined for each CMM whenever any of these conditions
occurs on the active CMM. Each condition has a default weight of 1, so all conditions are
equally important in causing a failover.
To determine if a failover is necessary when one of these conditions occurs, the active CMM
computes its CMM health score, and requests the health score of the standby CMM. If the score of
the standby CMM is LESS than the score of the active CMM, a failover will occur. If a failover
does not occur, the CMM SEL will contain an entry indicating the reason failover did not occur.
The active CMM will failover to the standby CMM if the active CMM cannot ping its first
SNMP trap address (SNMPTrapAddress1) over any of the available Ethernet ports, but the
standby CMM can. The trap address is set using the command:
cmmset -l cmm -d snmptrapaddress1 -v [ip address]
Only a ping failure of the first SNMP trap address (SNMPTrapAddress1) can cause a failover.
SNMPTrapAddress2 through SNMPTrapAddress5 are not subject to this ping test.
Note: The frequency of the ping to the first trap address can vary from one second to approximately 20
seconds.
2. Critical events on the active CMM:
The active CMM has critical events for any of the CMM sensors (not critical chassis or blade
events) and the standby CMM does not. If both CMMs have critical CMM events, then the
number of major and minor CMM events is examined to decide if a failover should occur. The
number of major events is compared, and if they are equal, the number of minor events is used.
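The health score comparison described in this section can be sketched as follows. The function names and the condition labels used in the usage note are hypothetical, and the sketch does not model the tie-break on major and minor event counts that applies when both CMMs have critical events.

```python
from typing import Dict, Iterable, Optional

DEFAULT_WEIGHT = 1  # default weight of each condition, per the text above


def health_score(active_conditions: Iterable[str],
                 weights: Optional[Dict[str, int]] = None) -> int:
    """Sum the weights of the conditions currently active on one CMM."""
    weights = weights or {}
    return sum(weights.get(name, DEFAULT_WEIGHT) for name in active_conditions)


def should_fail_over(active_score: int, standby_score: int) -> bool:
    """A failover occurs only if the standby's score is lower than the active's."""
    return standby_score < active_score
```

With default weights, an active CMM with the conditions ["trap_ping_failed", "critical_cmm_event"] scores 2; a standby with no active conditions scores 0, so a failover would occur. Equal scores do not cause a failover.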
3.6.3 Manual Failover
The following command can be issued to the active CMM to manually cause a failover to the
standby CMM:
cmmset -l cmm -d failover -v [1/any]
Where:
1: Will failover only to a CMM with the same or newer version of firmware.
any: Will failover to any version of firmware.
A manual failover can only be initiated on the active CMM. A failover will only occur if the
standby CMM is at least as healthy as the active CMM. Once the command executes, the former
standby CMM immediately becomes the active CMM.
If the failover could not occur, the CLI will indicate the reason why the failover could not occur,
and a SEL event will be recorded.
In addition, opening the ejector latch on the active CMM will initiate a failover, but only if the
standby is at least as healthy as the active.
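The conditions for a manual failover can be combined into a single check. This is a hedged sketch, not the CMM's implementation: the function name is hypothetical, firmware versions are modeled as integer tuples, and health is assumed to be measured with the CMM health score from Section 3.6.2, where a lower score means a healthier CMM.

```python
from typing import Tuple

Version = Tuple[int, ...]  # hypothetical representation, e.g. (5, 2)


def manual_failover_allowed(mode: str,
                            active_fw: Version, standby_fw: Version,
                            active_score: int, standby_score: int) -> bool:
    """Check the manual failover rules for `-v 1` and `-v any`."""
    if mode not in ("1", "any"):
        raise ValueError(f"unknown failover value: {mode!r}")
    if standby_score > active_score:
        return False  # standby is less healthy than the active CMM
    if mode == "1" and standby_fw < active_fw:
        return False  # "1" requires the same or newer firmware on the standby
    return True
```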
3.6.4 Scenarios That Force a Failover
The following scenarios cause a failover as long as the standby CMM is operational, even when it
is less healthy than the active:
• The active CMM is pulled out of the chassis.
• The active CMM’s healthy signal is de-asserted.
• A “reboot” command is issued to the active CMM.
• The front panel alarm quiet switch button on the active CMM is pushed for more than five
seconds. If the button continues to be pressed for more than 10 seconds, the CMM does not
reset.
The CMM Ready Event is a notification mechanism that informs the user when all CMM modules
are fully up and running. The CMM is ready to process any request after receiving this event.
The CMM uses the "CMM Status" sensor when generating the CMM Not Ready event. Please
refer to Table 46, “CMM Status Event Strings (CMM Status)” on page 118 for CMM status event
strings.
Table 3. CMM Status Event Strings (CMM Status)
Event String | Event Code | Event Severity
“CMM is not ready.” | 1024 | Minor
“CMM is ready.” | 1025 | OK
“CMM is Active” | 1026 | OK
“CMM is Standby” | 1027 | OK
“CMM ready timed out” | 1028 | Minor
A CMM Not Ready Assertion SEL event is generated on a CMM when it transitions from standby
mode to active mode during a failover or on the active CMM on power up. The event is only
generated on the newly active CMM. The “CMM is Ready” event is generated after all CMM
modules (board wrapper processes) are up and running and the SNMP daemon is active.
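A management application that receives these events may find it convenient to keep the CMM Status event codes in a lookup table. The mapping below copies the codes, strings, and severities from Table 3; the variable and function names are hypothetical.

```python
from typing import Dict, Tuple

# Event code -> (event string, severity), copied from Table 3.
CMM_STATUS_EVENTS: Dict[int, Tuple[str, str]] = {
    1024: ("CMM is not ready.", "Minor"),
    1025: ("CMM is ready.", "OK"),
    1026: ("CMM is Active", "OK"),
    1027: ("CMM is Standby", "OK"),
    1028: ("CMM ready timed out", "Minor"),
}


def describe_status_event(code: int) -> str:
    """Format a CMM Status event code for logging; raises KeyError if unknown."""
    text, severity = CMM_STATUS_EVENTS[code]
    return f"{code} [{severity}] {text}"
```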