Intel MPCMM0001 User Manual

Intel® NetStructure™ MPCMM0001 Chassis Management Module
Software Technical Product Specification
April 2005
Order Number: 273888-007
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
The Intel® NetStructure™ MPCMM0001 Chassis Management Module may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
This Software Technical Product Specification as well as the software described in it is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document.
Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the express written consent of Intel Corporation.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an ordering number and are referenced in this document, or other Intel literature may be obtained by calling 1-800-548-4725 or by visiting Intel's website at http://www.intel.com.
AnyPoint, AppChoice, BoardWatch, BunnyPeople, CablePort, Celeron, Chips, CT Media, Dialogic, DM3, EtherExpress, ETOX, FlashFile, i386, i486, i960, iCOMP, InstantIP, Intel, Intel Centrino, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Create & Share, Intel GigaBlade, Intel InBusiness, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel Play, Intel Play logo, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel TeamStation, Intel Xeon, Intel XScale, IPLink, Itanium, MCS, MMX, MMX logo, Optimizer logo, OverDrive, Paragon, PC Dads, PC Parents, PDCharm, Pentium, Pentium II Xeon, Pentium III Xeon, Performance at Your Command, RemoteExpress, SmartDie, Solutions960, Sound Mark, StorageExpress, The Computer Inside., The Journey Inside, TokenExpress, VoiceBrick, VTune, and Xircom are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
*Other names and brands may be claimed as the property of others. Copyright © 2005, Intel Corporation. All rights reserved.
Contents
1 Introduction....................................................................................................................................16
1.1 Overview.............................................................................................................................16
1.2 Terms Used in this Document ............................................................................................16
2 Software Specifications .................................................................................................................18
2.1 Red Hat* Embedded Debug and Bootstrap (Redboot).......................................................18
2.2 Operating System..............................................................................................................18
2.3 Command Line Interface (CLI) ...........................................................................................18
2.4 SNMP/UDP.........................................................................................................................18
2.5 Remote Procedural Call (RPC) Interface............................................................................19
2.6 RMCP .................................................................................................................................19
2.7 Ethernet Interfaces .............................................................................................................19
2.8 Sensor Event Logs (SEL) ...................................................................................................19
2.8.1 CMM SEL Architecture ..........................................................................................19
2.8.2 Retrieving a SEL....................................................................................................19
2.8.3 Clearing the SEL....................................................................................................20
2.8.4 Retrieving the Raw SEL.........................................................................................20
2.9 Blade OverTemp Shutdown Script .....................................................................................20
3 Redundancy, Synchronization, and Failover.................................................................................21
3.1 Overview.............................................................................................................................21
3.2 Synchronization ..................................................................................................................21
3.3 Heterogeneous Synchronization.........................................................................................23
3.3.1 SDR/SIF Synchronization......................................................................................23
3.3.2 User Scripts Synchronization and Configuration ......................................................23
3.3.3 Synchronization Requirements..............................................................................24
3.4 Initial Data Synchronization................................................................................................24
3.4.1 Initial Data Sync Failure.........................................................................................24
3.5 Datasync Status Sensor .....................................................................................................25
3.5.1 Sensor bitmap........................................................................................................25
3.5.2 Event IDs ...............................................................................................................25
3.5.3 Querying the Datasync Status...............................................................................25
3.5.4 SEL Event..............................................................................................................27
3.5.5 SNMP Trap............................................................................................................27
3.5.6 System Health .......................................................................................................28
3.6 CMM Failover .....................................................................................................................28
3.6.1 Scenarios That Prevent Failover ...........................................................................28
3.6.2 Scenarios That Failover to a Healthier Standby CMM...........................................28
3.6.3 Manual Failover .....................................................................................................29
3.6.4 Scenarios That Force a Failover............................................................................29
3.7 CMM Ready Event..............................................................................................................30
4 Built-In Self Test (BIST).................................................................................................................31
4.1 BIST Test Flow ...................................................................................................................31
4.2 Boot-BIST ...........................................................................................................................33
4.3 Early-BIST ..........................................................................................................................33
4.4 Mid-BIST.............................................................................................................................33
4.5 Late-BIST............................................................................................................................33
4.6 QuickBoot Feature..............................................................................................................34
4.6.1 Configuring QuickBoot...........................................................................................34
4.7 Event Log Area and Event Management............................................................................35
4.8 OS Flash Corruption Detection and Recovery Design .......................................................35
4.8.1 Monitoring the Static Images.................................................................................35
4.8.2 Monitoring the Dynamic Images............................................................................36
4.8.3 CMM Failover ........................................................................................................36
4.9 BIST Test Descriptions.......................................................................................................36
4.9.1 Flash Checksum Test............................................................................................36
4.9.2 Base Memory Test.................................................................................................36
4.9.3 Extended Memory Tests ........................................................................................36
4.9.4 FPGA Version Check.............................................................................................37
4.9.5 DS1307 RTC (Real-Time Clock) Test ...................................................................37
4.9.6 NIC Presence/Local PCI Bus Test.........................................................................37
4.9.7 OS Image Checksum Test.....................................................................................37
4.9.8 CRC32 Checksum.................................................................................................37
4.9.9 IPMB Bus Busy/Not Ready Test............................................................................38
5 Re-enumeration.............................................................................................................................39
5.1 Overview.............................................................................................................................39
5.2 Re-enumeration on Failover...............................................................................................39
5.3 Re-enumeration of M5 FRU................................................................................................40
5.4 Resolution of EKeys ............................................................................................................40
5.5 Events Regeneration..........................................................................................................40
6 Process Monitoring and Integrity...................................................................................................41
6.1 Overview.............................................................................................................................41
6.1.1 Process Existence Monitoring ...............................................................................41
6.1.2 Thread Watchdog Monitoring ................................................................................41
6.1.3 Process Integrity Monitoring..................................................................................42
6.2 Processes Monitored..........................................................................................................42
6.3 Process Monitoring Targets.................................................................................................42
6.4 Process Monitoring Dataitems.............................................................................................43
6.4.1 Examples...............................................................................................................43
6.5 SNMP MIB Commands.......................................................................................................44
6.6 Process Monitoring CMM Events .......................................................................................44
6.7 Failure Scenarios and Eventing..........................................................................................45
6.7.1 No Action Recovery...............................................................................................45
6.7.2 Successful Restart Recovery.................................................................................46
6.7.3 Successful Failover/Restart Recovery...................................................................47
6.7.4 Successful Failover/Reboot Recovery...................................................................48
6.7.5 Failed Failover/Reboot Recovery, Non-Critical......................................................48
6.7.6 Failed Failover/Reboot Recovery, Critical .............................................................49
6.7.7 Excessive Restarts, Escalate No Action................................................................50
6.7.8 Excessive Restarts, Successful Escalate Failover/Reboot....................................51
6.7.9 Excessive Restarts, Failed Escalate Failover/Reboot, Non-Critical ......................52
6.7.10 Excessive Restarts, Failed Escalate Failover/Reboot, Critical ..............................52
6.7.11 Process Administrative Action...............................................................................53
6.7.12 Excessive Failover/Reboots, Administrative Action................................................54
6.8 Process Integrity Executable (PIE).....................................................................................54
6.9 Configuring pms.ini.............................................................................................................55
6.9.1 Global Data............................................................................................................55
6.9.2 Process Specific Data............................................................................................56
6.9.3 Process Definition Section of pms.ini.....................................................................58
6.10 Process Integrity Executable (PIE) Specific Data Config ...................................................64
6.10.1 PIE Section Name .................................................................................................64
6.10.2 Process Integrity Executable .................................................................................65
6.10.3 Unique ID...............................................................................................................65
6.10.4 Administrative State...............................................................................................65
6.10.5 Process Integrity Interval ......................................................................................66
6.10.6 Chassis Applicability..............................................................................................66
6.10.7 PmsPieSnmp Command Line................................................................................66
6.10.8 SNMP PIE Section of pms.ini ................................................................................66
6.11 WP/BPM PIE .......................................................................................................................67
6.11.1 WP/BPM Section of pms.ini...................................................................................67
7 Power and Hot Swap Management...............................................................................................68
7.1 Hot Swap States.................................................................................................................68
7.2 FRU Insertion......................................................................................................................68
7.3 Graceful FRU Extraction......................................................................................................68
7.4 Surprise FRU Extraction/IPMI Failure.................................................................................69
7.5 Forced Power State Changes.............................................................................................69
7.6 Power Management on the Standby CMM..........................................................................69
7.7 Power Feed Targets ...........................................................................................................69
7.8 Pinging IPMI Controllers......................................................................................................70
8 The Command Line Interface (CLI)...............................................................................................71
8.1 CLI Overview ......................................................................................................................71
8.2 Connecting to the CLI.........................................................................................................71
8.2.1 Connecting through a Serial Port Console ............................................................71
8.3 Initial Setup - Logging in for the First Time.........................................................................72
8.3.1 Setting IP Address Properties...............................................................................72
8.3.2 Setting a Hostname ...............................................................................................75
8.3.3 Setting the Amount of Time for Auto-Logout .........................................................75
8.3.4 Setting the Date and Time.....................................................................................76
8.3.5 Telnet into the CMM ..............................................................................................76
8.3.6 Connect Through SSH (Secure Shell)...................................................................76
8.3.7 FTP into the CMM..................................................................................................76
8.3.8 Rebooting the CMM...............................................................................................76
8.4 CLI Command Line Syntax and Arguments .......................................................................77
8.4.1 Cmmget and Cmmset Syntax................................................................................77
8.4.2 Help Parameter: -h ................................................................................................77
8.4.3 Location Parameter: -l ...........................................................................................77
8.4.4 Target Parameter: -t ..............................................................................................78
8.4.5 Dataitem Parameter: -d .........................................................................................80
8.4.6 Value Parameter: -v...............................................................................................97
8.4.7 Sample CLI Operations .........................................................................................97
8.5 Generating a System Status Report...................................................................................97
9 Resetting the Password.................................................................................................................99
9.1 Resetting the Password in a Dual CMM System....................................................................99
9.2 Resetting the Password in a Single CMM System ...........................................................100
10 Sensor Types .................................................................................................................................101
10.1 CMM Sensor Types...........................................................................................................101
10.2 Threshold-Based Sensors................................................................................................101
10.2.1 Threshold-Based Sensor Events.........................................................................101
10.3 CMM Voltage/Temp Sensor Thresholds...........................................................................102
10.4 Discrete Sensors ...............................................................................................................102
10.4.1 Discrete Sensor Events.......................................................................................103
11 Health Events..............................................................................................................................104
11.1 Syntax of Health Event Strings.........................................................................................104
11.1.1 Healthevents Query Event Syntax.......................................................................104
11.1.2 SEL Event Syntax................................................................................................104
11.1.3 SEL Sensor Types...............................................................................................105
11.1.4 SNMP Trap Event Syntax....................................................................................105
11.2 Sensor Targets..................................................................................................................106
11.3 Healthevents Queries.......................................................................................................107
11.3.1 HealthEvents Queries for Individual Sensors.......................................................107
11.3.2 HealthEvents Queries for All Sensors on a Location...........................................108
11.3.3 No Active Events .................................................................................................108
11.3.4 Not Present or Non-IPMI Locations.....................................................................108
11.4 List of Possible Health Event Strings................................................................................108
11.4.1 All Locations ........................................................................................................109
11.4.2 CMM Location......................................................................................................115
11.4.3 Chassis Location .................................................................................................120
11.5 IPMI Error Completion Codes...........................................................................................120
11.5.1 Configuring IPMI Error Completion Codes ..........................................................121
11.5.2 IPMI/IMB Error Message Format.........................................................................121
12 Front Panel LEDs........................................................................................................................123
12.1 LED Types and States......................................................................................................123
12.1.1 Alarm LEDs.........................................................................................................123
12.1.2 Health LED ..........................................................................................................124
12.1.3 Hot Swap LED.....................................................................................................124
12.1.4 User Definable LEDs...........................................................................................124
12.2 Retrieving a Location’s LED properties ............................................................................124
12.3 Retrieving Color Properties of LEDs.................................................................................124
12.4 Retrieving the State of LEDs ............................................................................................125
12.5 Setting the State of the User LEDs ...................................................................................125
12.6 LED Boot Sequence.........................................................................................................126
13 Node Power Control....................................................................................................................127
13.1 Node Operational State Management..............................................................................127
13.2 Obtaining the Power State of a Board...............................................................................127
13.3 Controlling the Power State of a Board ............................................................................127
13.3.1 Powering Off a Board ..........................................................................................127
13.3.2 Powering On a Board ..........................................................................................127
13.3.3 Resetting a Board................................................................................................128
14 Electronic Keying Manager..........................................................................................................129
14.1 Point-to-Point EKeying......................................................................................................129
14.2 Bused EKeying .................................................................................................................129
14.3 EKeying CLI Commands ...................................................................................................129
15 CDMs and FRU Information .........................................................................................................130
15.1 Chassis Data Module.........................................................................................................130
15.2 FRU/CDM Election Process .............................................................................................130
15.3 FRU Information ...............................................................................................................130
15.4 FRU Query Syntax............................................................................................................131
16 Fan Control and Monitoring.........................................................................................................132
16.1 Automatic Fan Control......................................................................................................132
16.2 Querying Fan Tray Sensors - FantrayN location ..............................................................132
16.3 Fantray Cooling Levels.....................................................................................................132
16.4 CMM Cooling Manager Temperature Status....................................................................132
16.5 CMM Cooling Table..........................................................................................................133
16.5.1 Setting Values in the Cooling Table.....................................................................133
16.6 Control Modes for Fan Trays............................................................................................134
16.6.1 CMM Control Mode..............................................................................................134
16.6.2 Fantray Control Mode..........................................................................................134
16.6.3 Emergency Shutdown Control Mode...................................................................134
16.6.4 User Initiated Mode Change................................................................................135
16.6.5 Automatic Mode Change .....................................................................................135
16.7 Getting Temperature Statuses..........................................................................................135
16.8 Fantray Properties ............................................................................................................136
16.9 Retrieving the Current Cooling Level................................................................................136
16.10 Fantray Insertion...............................................................................................................136
16.11 Default Cooling Values .....................................................................................................137
16.11.1 Vendor Defaults ...................................................................................................137
16.11.2 Structure of /etc/cmm/fantray.cfg.........................................................................138
16.11.3 Code Defaults......................................................................................................138
16.11.4 Restoring Defaults ...............................................................................................138
16.12 Firmware Upgrade/Downgrade.........................................................................................138
16.13 Chassis vs. Fantray ...........................................................................................................139
16.14 Legacy Method of Querying/Setting Fan Speed...............................................................139
17 SNMP..........................................................................................................................................140
17.1 CMM MIB..........................................................................................................................141
17.2 MIB Design .......................................................................................................................141
17.2.1 MIB Tree..............................................................................................................141
17.2.2 CMM MIB Objects................................................................................................142
17.3 SNMP Agent.....................................................................................................................158
17.3.1 Configuring the SNMP Agent Port.......................................................................158
17.3.2 Configuring the Agent to Respond to SNMP v3 Requests...................................158
17.3.3 Configuring the Agent Back to SNMP v1.............................................................159
17.3.4 Setting up an SNMP v1 MIB Browser..................................................................159
17.3.5 Setting up an SNMP v3 MIB Browser..................................................................159
17.3.6 Changing the SNMP MD5 and DES Passwords..................................................159
17.4 SNMP Trap Utility.............................................................................................................160
17.4.1 Configuring the SNMP Trap Port.........................................................................160
17.4.2 Configuring the CMM to Send SNMP v3 Traps...................................................160
17.4.3 Configuring the CMM to Send SNMP v1 Traps...................................................160
17.5 Configuring and Enabling SNMP Trap Addresses.......................................................160
17.5.1 Configuring an SNMP Trap Address ...................................................................161
17.5.2 Enabling and Disabling SNMP Traps ..................................................................161
17.5.3 Alerts Using SNMP v3........................................................................................161
17.5.4 Alert Using UDP Alert..........................................................................................161
17.6 SNMP Security .................................................................................................................162
17.6.1 SNMP v1 Security................................................................................................162
17.6.2 SNMP v3 Security - Authentication Protocol and Privacy Protocol .....................162
17.7 SNMP Trap Descriptions..................................................................................................162
17.8 Snmpd.conf File...............................................................................................................163
18 CMM Scripting.............................................................................................................................164
18.1 CLI Scripting.....................................................................................................................164
18.1.1 Script Synchronization........................................................................................164
18.2 Event Scripting.................................................................................................................164
18.2.1 Listing Scripts Associated With Events................................................................165
18.2.2 Removing Scripts From an Associated Event......................................................165
18.3 Setting Scripts for Specific Individual Events....................................................................165
18.3.1 Event Codes........................................................................................................165
18.3.2 Setting Event Action Scripts ................................................................................166
18.4 Running CMM Event Scripts on CMM State Transitions (Active/Standby/Ready/Not Ready)...166
18.4.1 Sensor Data Bits..................................................................................................166
18.4.2 Retrieving the Value of the Data Sensor Bits ......................................................167
18.4.3 CMMReadyTimeout Value...................................................................................168
18.4.4 CMM State Transition Model...............................................................................168
18.5 FRU Control Script............................................................................................................169
18.5.1 Command line arguments....................................................................................170
18.5.2 Sample frucontrol file..........................................................................................170
19 Remote Procedure Calls (RPC) ..................................................................................................174
19.1 Setting Up the RPC Interface ...........................................................................................174
19.2 Using the RPC Interface...................................................................................................174
19.2.1 GetAuthCapability() .............................................................................................175
19.2.2 ChassisManagementApi() ...................................................................................175
19.2.3 ChassisManagementApi() Threshold Response Format.....................................181
19.2.4 ChassisManagementApi() String Response Format..............................................181
19.2.5 ChassisManagementApi() Integer Response Format..........................................185
19.2.6 FRU String Response Format .............................................................................186
19.3 RPC Sample Code ...........................................................................................................187
19.4 RPC Usage Examples......................................................................................................187
20 RMCP..........................................................................................................................................190
20.1 RMCP References............................................................................................................190
20.2 RMCP Modes ...................................................................................................................190
20.3 RMCP User Privilege Levels ............................................................................................191
20.4 RMCP Discovery ..............................................................................................................191
20.5 RMCP Session Activation...............................................................................................191
20.6 RMCP Port Numbers........................................................................................................192
20.7 IPMB Slave Addresses.....................................................................................................193
20.8 CMM RMCP Configuration ...............................................................................................193
20.9 IPMI Commands Supported by CMM RMCP......................................................................194
20.10 Configuring IPMI Command Privileges.............................................................................196
20.10.1 Sample cmdPrivillege.ini file................................................................................197
20.11 Completion Codes for the RMCP Messages....................................................................197
21 Command and Error Logging......................................................................................................199
21.1 Command Logging ...........................................................................................................199
21.2 Error Logging....................................................................................................................199
21.2.1 Error.log File ........................................................................................................199
21.2.2 Debug.log File......................................................................................................199
21.3 Cmmdump Utility...............................................................................................................200
22 Application Hosting......................................................................................................................201
22.1 System Details..................................................................................................................201
22.2 Startup and Shutdown Scripts ..........................................................................................201
22.3 System Resources Available to User Applications...........................................................201
22.3.1 File System Storage Constraints .........................................................................201
22.3.2 RAM Constraints..................................................................................................202
22.3.3 Interrupt Constraints ............................................................................................203
22.4 RAM Disk Directory Structure...........................................................................................203
23 Updating CMM Software...............................................................................................................204
23.1 Key Features of the Firmware Update Process................................................................204
23.2 Update Process Architecture............................................................................................204
23.3 Critical Software Update Files and Directories .................................................................205
23.4 Update Package...............................................................................................................205
23.4.1 Update Package File Validation...........................................................................206
23.4.2 Update Firmware Package Version.....................................................................207
23.4.3 Component Versioning ........................................................................................207
23.5 saveList and Data Preservation........................................................................................207
23.6 Update Mode....................................................................................................................208
23.7 Update_Metadata File ......................................................................................................209
23.8 Firmware Update Synchronization/Failover Support.........................................................209
23.9 Automatic/Manual Failover Configuration.........................................................................209
23.9.1 Setting Failover Configuration Flag .....................................................................210
23.9.2 Retrieving the Failover Configuration Flag...........................................................210
23.10 Single CMM System .........................................................................................................210
23.11 Redundant CMM Systems................................................................................................210
23.12 CLI Software Update Procedure.......................................................................................210
23.13 Hooks for User Scripts......................................................................................................211
23.13.1 Update Mode User Scripts...................................................................................211
23.13.2 Data Restore User Scripts...................................................................................212
23.13.3 Example Task—Replace /home/scripts/myScript................................................212
23.14 Update Process ................................................................................................................213
23.15 Update Process Status and Logging ................................................................................215
23.16 Update Process Sensor and SEL Events.........................................................................215
23.17 Redboot* Update Process................................................................................................215
23.17.1 Required Setup...................................................................................................215
23.17.2 Update Procedure................................................................................................215
24 Updating Shelf Components........................................................................................................217
25 IPMI Pass-Through......................................................................................................................218
25.1 Overview...........................................................................................................................218
25.2 Command Syntax and Interface.......................................................................................218
25.2.1 Command Request String Format.......................................................................218
25.2.2 Response String...................................................................................................219
25.2.3 Usage Examples..................................................................................................219
25.3 SNMP ...............................................................................................................................219
25.3.1 Usage Example ...................................................................................................219
26 FRU Update Utility.......................................................................................................................221
26.1 Overview...........................................................................................................................221
26.2 FRU Update Architecture..................................................................................................221
26.3 FRU Update Process........................................................................................................222
26.4 FRU Recovery Process....................................................................................................222
26.5 FRU Verification................................................................................................................223
26.6 FRU Display......................................................................................................................223
26.7 Setting the Library Path And Invoking the Utility...............................................................223
26.8 FRU Update Command Line Interface .............................................................................223
26.9 Using the Location Switch ................................................................................................224
26.10 Updating the FRU.............................................................................................................225
26.11 Getting the Inventory ........................................................................................................225
26.12 Viewing the Contents of the FRU .....................................................................................225
26.13 Getting the Contents of the FRU ......................................................................................225
26.14 Dumping the Contents of the FRU....................................................................................225
27 FRU Update Configuration File ...................................................................................................227
27.1 Configuration File Format.................................................................................................227
27.2 File Format........................................................................................................................227
27.3 String Constraints.............................................................................................................227
27.4 Numeric Constraints.........................................................................................................228
27.5 Tags..................................................................................................................................228
27.6 Control Commands............................................................................................................228
27.6.1 IFSET...................................................................................................................228
27.6.2 ELSE....................................................................................................................229
27.6.3 ENDIF..................................................................................................................229
27.6.4 SET......................................................................................................................229
27.6.5 CLEAR.................................................................................................................230
27.6.6 CFGNAME...........................................................................................................230
27.6.7 ERRORLEVEL.....................................................................................................230
27.7 Probing Commands..........................................................................................................230
27.7.1 PROBE................................................................................................................230
27.7.2 SYSTEM..............................................................................................................231
27.7.3 FRUVER..............................................................................................................231
27.7.4 BMCVER .............................................................................................................232
27.7.5 FOUND................................................................................................................232
27.8 Update Commands ..........................................................................................................233
27.8.1 FRUNAME...........................................................................................................233
27.8.2 FRUADDRESS....................................................................................................234
27.8.3 FRUAREA............................................................................................................234
27.8.4 MULTIREC ..........................................................................................................235
27.8.5 FRUFIELD ...........................................................................................................236
27.8.6 Input of Data ........................................................................................................240
27.9 Display Commands...........................................................................................................240
27.9.1 DISPLAY..............................................................................................................241
27.9.2 CONFIGURATION...............................................................................................241
27.9.3 Input Commands..................................................................................................241
27.9.4 MENU ..................................................................................................................241
27.9.5 MENUTITLE ........................................................................................................242
27.9.6 MENUPROMPT...................................................................................................242
27.9.7 PROMPT .............................................................................................................242
27.9.8 YES......................................................................................................................243
27.9.9 NO .......................................................................................................................243
27.10 Command Quick Reference .............................................................................................243
27.11 Example Configuration File...............................................................................................246
27.11.1 Chassis Update Version 0 ...................................................................................246
27.11.2 Chassis Update Version 1 ...................................................................................249
28 Unrecognized Sensor Types.......................................................................................................253
28.1 System Events Overview...................................................................................................253
28.2 System Events: SNMP Trap Support................................................................................254
28.2.1 SNMP Trap Header Format.................................................................................254
28.2.2 SNMP Trap ATCA Trap Text Translation Format................................................254
28.3 SNMP Trap Raw Format ..................................................................................................255
28.3.1 SNMP Trap Control .............................................................................................256
28.3.2 System Events: SEL Support...............................................................................256
28.3.3 Configuring SEL Format ......................................................................................257
29 Warranty Information...................................................................................................................259
29.1 Intel® NetStructure™ Compute Boards and Platform Products Limited Warranty...........259
29.2 Returning a Defective Product (RMA) ..............................................................................259
29.3 For the Americas ..............................................................................................................260
29.3.1 For Europe, Middle East, and Africa (EMEA).......................................................260
29.3.2 For Asia and Pacific (APAC)................................................................................260
30 Customer Support .......................................................................................................................262
30.1 Customer Support.............................................................................................................262
30.2 Technical Support and Return for Service Assistance .....................................................262
30.3 Sales Assistance...............................................................................................................262
31 Certifications................................................................................................................................263
32 Agency Information......................................................................................................................264
32.1 North America (FCC Class A)...........................................................................................264
32.2 Canada – Industry Canada (ICES-003 Class A) (English and French-translated below).264
32.3 Safety Instructions (English and French-translated below) ..............................................265
32.3.1 English.................................................................................................................265
32.3.2 French..................................................................................................................265
32.4 Taiwan Class A Warning Statement.................................................................................266
32.5 Japan VCCI Class A..........................................................................................................266
32.6 Korean Class A.................................................................................................................266
32.7 Australia, New Zealand.....................................................................................................266
33 Safety Warnings..........................................................................................................................267
33.1 Mesures de Sécurité.........................................................................................................268
33.2 Sicherheitshinweise..........................................................................................................270
33.3 Norme di Sicurezza...........................................................................................................272
33.4 Instrucciones de Seguridad..............................................................................................274
33.5 Chinese Safety Warning...................................................................................................276
Figures
1 BIST Flow Chart .........................................................................................................................32
2 Timing of BIST Stages................................................................................................................34
3 High Level SNMP/MIB Layout..................................................................................................140
4 CMM Custom MIB Tree............................................................................................................142
5 CMM Status State Diagram......................................................................................................169
6 SNMPTrapFormat = 1 ..............................................................................................................255
7 SNMPTrapFormat = 2 ..............................................................................................................255
8 SNMPTrapFormat = 3 ..............................................................................................................255
Tables
1 Glossary .....................................................................................................................................16
2 CMM Synchronization ................................................................................................................22
3 CMM Status Event Strings (CMM Status) ..................................................................................30
4 BIST Implementation..................................................................................................................32
5 Processes Monitored...................................................................................................................42
6 No Action Recovery....................................................................................................................46
7 Successful Restart Recovery.......................................................................................................46
8 Successful Failover/Restart Recovery.........................................................................................47
9 Successful Failover/Reboot Recovery..........................................................................................48
10 Failed Failover/Reboot Recovery, Non-Critical ..........................................................................49
11 Failed Failover/Reboot Recovery, Critical ..................................................................................50
12 Existence Fault, Excessive Restarts, Escalate No Action..........................................................50
13 Excessive Restarts, Successful Escalate Failover/Reboot ........................................................51
14 Excessive Restarts, Failed Escalate Failover/Reboot, Non-Critical...........................................52
15 Excessive Restarts, Failed Escalate Failover/Reboot, Critical....................................................53
16 Administrative Action..................................................................................................................53
17 Excessive Failover/Reboots, Administrative Action....................................................................54
18 Time to Delay and Number of Attempts .....................................................................................70
19 SETIP Interface Assignments when BOOTPROTO="static"......................................................74
20 SETIP Interface Assignments when BOOTPROTO="dhcp".......................................................75
21 Location (-l) Keywords................................................................................................................77
22 CMM Targets..............................................................................................................................79
23 Dataitem Keywords for All Locations..........................................................................................80
24 Dataitem Keywords for All Locations Except System.................................................................80
25 Dataitem Keywords for All Locations Except Chassis and System ............................................81
26 Dataitem Keywords for Chassis Location...................................................................................85
27 Dataitem Keywords for Cmm Location.......................................................................................86
28 Dataitem Keywords for System Location....................................................................................92
29 Dataitem Keywords for FantrayN Location..................................................................................93
30 Dataitem Keywords Used with the Target Parameter.................................................................94
31 CMM Voltage and Temp Sensor Thresholds.............................................................................102
32 CMM SEL Sensor Information..................................................................................................105
33 Sensor Targets.........................................................................................................................106
34 Threshold-Based Sensors: Voltage, Temp, Current, Fan.........................................................109
35 Hot Swap Sensor: Filter Tray HS, FRU Hot Swap....................................................................110
36 IPMB Link State Sensor: IPMB-0 Snsr [1-16]...........................................................................110
37 System Firmware Progress Event Strings (System Firmware Progress) .................................111
38 Watchdog 2 Sensor Event Strings............................................................................................113
39 CMM Redundancy....................................................................................................................115
40 CMM Trap Connectivity (CMM [1-2] Trap Conn)......................................................................115
41 CMM Failover ...........................................................................................................................115
42 CMM Synchronization...............................................................................................................116
43 BIST Event Strings...........................................................................................................117
44 Chassis Data Module (CDM [1,2])............................................................................................118
45 Datasync Status..............................................................................................................118
46 CMM Status Event Strings (CMM Status) ................................................................................118
47 Process Monitoring Service Fault Event Strings (PMS Fault) ..................................................119
48 Process Monitoring Service Info Event Strings (PMS Info) ......................................................120
49 Chassis Events.........................................................................................................................120
50 IPMI Error Completion Codes and Enumerations.....................................................................121
51 System Health LED States.......................................................................................................123
52 CMM Health LED States...........................................................................................................124
53 CMM Hot Swap LED States.....................................................................................................124
54 Ledstate Functions and Function Options................................................................................125
55 LED Event Sequence ...............................................................................................................126
56 Dataitems Used With FRU Target (-t) to Obtain FRU Information............................................131
57 CMM Cooling Table..................................................................................................................133
58 MIB II Objects - System Group..............................................................................................141
59 MIB II - Interface Group.....................................................................................................141
60 System Location (1.3.6.1.4.1.343.2.14.2.10.1).........................................................................143
61 Shelf Location (Equivalent to Chassis) (1.3.6.1.4.1.343.2.14.2.10.2).......................................144
62 ShelfTable/shelfEntry (1.3.6.1.4.1.343.2.14.2.10.2.50.1).........................................................144
63 Cmm Location (1.3.6.1.4.1.343.2.14.2.10.3)............................................................................146
64 CmmTable/cmmEntry (1.3.6.1.4.1.343.2.14.2.10.3.51.1).........................................................149
65 CmmFruTable/cmmFruEntry (1.3.6.1.4.1.343.2.14.2.10.3.52.1)..............................................151
66 CmmFruTargetTable (1.3.6.1.4.1.343.2.14.2.10.3.53.1)..........................................................151
67 CmmPmsTable/cmmPmsEntry (1.3.6.1.4.1.343.2.14.2.10.3.54.1)..........................................151
68 Blade# Location (1.3.6.1.4.1.343.2.14.2.10.4.[1-16]) ...............................................................152
69 Blade#TargetTable/blade#TargetEntry (1.3.6.1.4.1.343.2.14.2.10.4.[1-16].51.1)....................153
70 Blade#FruTable/blade#FruEntry (1.3.6.1.4.1.343.2.14.2.10.4.[1-16].52.1)..............................154
71 Blade#FruTargetTable/blade#FruTargetEntry (1.3.6.1.4.1.343.2.14.2.10.4.[1-16].53.1).........155
72 [FanTray/pem]Table/[fanTray/pem]Entry (1.3.6.1.4.1.343.2.14.2.10.[5/6].51.1) ......................155
73 [FanTray/pem]TargetTable/[fanTray/pem]TargetEntry (1.3.6.1.4.1.343.2.14.2.10.[5/6].52.1)..156
MPCMM0001 Chassis Management Module Software Technical Product Specification 13
74 [FanTray/pem]FruTable/[fanTray/pem]FruEntry (1.3.6.1.4.1.343.2.14.2.10.[5/6].53.1) ...........157
75 [FanTray/pem]FruTargetTable/[fanTray/pem]FruTargetEntry
(1.3.6.1.4.1.343.2.14.2.10.[5/6].54.1).......................................................................................158
76 SNMP v3 Security Fields For Traps .........................................................................................162
77 SNMP v3 Security Fields For Queries......................................................................................162
78 CMM State Transition Events and Event IDs ...........................................................................166
79 CMM Status Sensor Data Bits................................................................................................167
80 Error and Return Codes for the RPC Interface.........................................................................177
81 Threshold Response Formats..................................................................................................181
82 String Response Formats.........................................................................................................181
83 Integer Response Formats.......................................................................................................185
84 FRU Data Items String Response Format................................................................................186
85 RPC Usage Examples..........................................................................................................187
86 RMCP Modes..................................................................................................................190
87 RMCP Session Timers.........................................................................................................192
88 RMCP Slave Addresses........................................................................................................193
89 IPMI Commands Supported by CMM RMCP...........................................................................194
90 RMCP Message Completion Codes.........................................................................................198
91 Flash #1....................................................................................................................................202
92 Flash #2....................................................................................................................................202
93 Flash #3....................................................................................................................................202
94 Flash #4....................................................................................................................................202
95 List of Critical Software Update Files and Directories ..............................................................205
96 Contents of the Update Package..............................................................................................206
97 SaveList Items and Their Priorities..........................................................................................208
98 CMM Update Directions ...........................................................................................................209
99 Platform FRU Accessibility of the FRU Update Utility ..............................................................221
100 FruUpdate Utility Command Line Options................................................................................224
101 Probe Command Parameters...................................................................................................231
102 FRU Area String Specifications................................................................................................235
103 Multi-Record Selection Parameters..........................................................................................236
104 FRU Field First String Specifications........................................................................................237
105 FRU Field Maximum Allowed Lengths .....................................................................................237
106 FRU Field Second String Specification ....................................................................................238
107 Type Code Specification...........................................................................................................239
108 Command Quick Reference.....................................................................................................243
109 Probe Arguments Quick Reference..........................................................................................246
110 Results of Variable Settings................................................................................................256
111 Example CLI Commands..........................................................................................................277
Revision History
Date | Revision | Description
April 2005 | 007 | Firmware version 5.2
August 2004 | 006 | Firmware version 5.1.0.757
April 2004 | 005 | Version 5.1 TPS. Added Re-Enumeration section. Added Process Monitoring section.
January 2004 | 004.1 | Version 4.1 TPS
Introduction

1 Introduction

1.1 Overview

The Intel® NetStructure™ MPCMM0001 Chassis Management Module is a 4U, single-slot CMM intended for use with AdvancedTCA* PICMG* 3.0 platforms. This document details the software features and specifications of the CMM. For information on the hardware features of the CMM, refer to the Intel® NetStructure™ MPCMM0001 Hardware Technical Product Specification. Links to the Intel specifications and other material can be found in Appendix B, “Data Sheet Reference.”
The CMM plugs into a dedicated slot in compatible systems. It provides centralized management and alarming for up to 16 node and/or fabric slots, as well as for system power supplies, fans, and power entry modules. The CMM may be paired with a backup for redundant use in high-availability applications.
The CMM is a special purpose single board computer (SBC) with its own CPU, memory, PCI bus, operating system, and peripherals. The CMM monitors and configures IPMI-based components in the chassis. When thresholds (such as temperature and voltage) are crossed or a failure occurs, the CMM captures these events, stores them in an event log, sends SNMP traps, and drives the Telco alarm relays and alarm LEDs. The CMM can query FRU information (such as serial number, model number, and manufacture date), detect the presence of components (such as fan trays and CPU boards), monitor the health of each component, control the power-up sequencing of each device, and control power to each slot via IPMI.
Assumptions: This document assumes some basic Linux* knowledge and the ability to use Linux text editors such as vi.

1.2 Terms Used in this Document

Table 1. Glossary
Acronym | Description
BIST | Built-In Self Test
CDM | Chassis Data Module
CLI | Command Line Interface
CMM | Chassis Management Module
DHCP | Dynamic Host Configuration Protocol
FFS | Flash File System
FIS | Flash Image System
FPGA | Field-Programmable Gate Arrays
FRU | Field Replaceable Unit
HS | Hot Swap
IPMB | Intelligent Platform Management Bus
IPMI | Intelligent Platform Management Interface
LED | Light Emitting Diode
MIB | Management Information Base
MIB II | RFC1213 - A standard Management Information Base for Network Management
PEM | Power Entry Module
PICMG | PCI Industrial Computer Manufacturers’ Group
RMCP | Remote Management Control Protocol
RPC | Remote Procedural Calls
SBC | Single Board Computer
SDR | Sensor Data Record
SEL | System Event Log
ShMC | Shelf Management Controller
SNMP | Simple Network Management Protocol
SSH | Secure Socket Shell
TFTP | Trivial File Transfer Protocol
UDP | User Datagram Protocol
WDT | Watchdog Timer
Software Specifications

2 Software Specifications

2.1 Red Hat* Embedded Debug and Bootstrap (Redboot)

Upon initial power on, the CMM enters into the Redboot firmware to bootstrap the embedded environment. Upon execution, Redboot acts as a TFTP server and checks for a TFTP connection to a client. If a TFTP connection exists, Redboot will accept a firmware update that is pushed down from the client, check the firmware update for data integrity, and then write the update to the flash.
Note: Firmware updates using the Redboot TFTP method are supported for backward compatibility. However, updating from within the OS using the CLI is the preferred method of updating CMM firmware. For information on the firmware update process, refer to Section 23, “Updating CMM Software” on page 204.
Under normal circumstances, Redboot runs the standard diagnostics and memory setup, decompresses the OS kernel, and boots into that kernel.

2.2 Operating System

The CMM runs a customized version of embedded BlueCat* Linux* 4.0 on an Intel® 80321 processor with Intel® XScale® technology. Development support for BlueCat Linux is available on the web at http://www.lynuxworks.com.

2.3 Command Line Interface (CLI)

The Command Line Interface (CLI) connects to and communicates with the intelligent management devices of the chassis, the boards, and the CMM itself. The CLI is an IPMI-based library of commands that can be accessed directly or through a higher-level management application. Administrators can access the CLI through Telnet, SSH, or the CMM’s serial port. Using the CLI, users can view the current state of the system (including current sensor values, threshold settings, recent events, and overall chassis health), view and modify shelf and CMM configurations, set fan speeds, and perform actions on a FRU. The CLI is covered in Section 8, “The Command Line Interface (CLI)” on page 71.

2.4 SNMP/UDP

The CMM supports both queries and traps on SNMP (Simple Network Management Protocol) v1 or v3. The SNMP version can be configured through the CLI; the default is SNMP v1. A MIB for the entire platform is included with the CMM. The CMM can send SNMP traps to up to five trap receivers.
Along with SNMP traps, the CMM sends UDP (User Datagram Protocol) alerts to port 10000. The content of these UDP alerts is the same as that of the SNMP traps. SNMP is covered in Section 17, “SNMP” on page 140.
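A management station can receive these UDP alerts with an ordinary datagram socket. The sketch below is a minimal listener, not a CMM-supplied tool: the port number comes from this section, but the alert payload layout is not described here, so the sketch assumes one alert per datagram and decodes it leniently as text instead of parsing individual fields. The function names are hypothetical.

```python
import socket

ALERT_PORT = 10000  # the CMM sends UDP alerts to this port


def open_alert_listener(host: str = "0.0.0.0", port: int = ALERT_PORT) -> socket.socket:
    """Bind a UDP socket that receives CMM alert datagrams."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


def receive_alert(sock: socket.socket, timeout: float = 5.0) -> tuple[str, str]:
    """Wait for one alert and return (sender IP, alert text).

    Assumption: the payload is treated as text (undecodable bytes are
    replaced) because its field layout is not specified in this section.
    Raises socket.timeout if nothing arrives within the timeout.
    """
    sock.settimeout(timeout)
    data, (sender_ip, _sender_port) = sock.recvfrom(65535)
    return sender_ip, data.decode("utf-8", errors="replace")
```

Typical use is to call open_alert_listener() once and then call receive_alert() in a loop, logging each alert along with the IP address of the CMM that sent it; the receiving host's firewall must allow inbound UDP on port 10000.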

2.5 Remote Procedural Call (RPC) Interface

In addition to the console command-line interface, the CMM can be administered by custom remote applications via remote procedure calls (RPC). RPC is covered in Section 19, “Remote Procedure Calls (RPC)” on page 174.

2.6 RMCP

RMCP (Remote Management Control Protocol) is a protocol that defines a method to send IPMI packets over a LAN. The RMCP server on the CMM decodes RMCP packets and forwards the IPMI messages to the appropriate channels, including SBC blades, PEMs, and FanTrays, or to a local destination within the CMM. When an IPMI response from an SBC blade, PEM, or FanTray is destined for an RMCP client, the RMCP server formats that IPMI message into an RMCP message and sends it through the designated LAN interface back to the originator. RMCP is covered in Section 20, “RMCP” on page 190.
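To make the layering concrete, the sketch below wraps one IPMI request in the 4-byte RMCP header and an IPMI v1.5 session header with no authentication, following the generic IPMI LAN message format. It builds Get Channel Authentication Capabilities, which is commonly the first request a remote client sends. The responder address 0x20, requester software ID 0x81, and UDP port 623 are IPMI-specification defaults; whether the CMM uses exactly these values, and whether it accepts this particular request, are assumptions here (the RMCP section of this document lists the CMM's slave addresses and supported commands). Session activation, authentication, and response parsing are deliberately omitted, and the function names are hypothetical.

```python
import socket

# Assumption: the CMM's RMCP server listens on the standard RMCP port.
RMCP_PORT = 623


def ipmi_checksum(data: bytes) -> int:
    """Two's-complement checksum: (sum(data) + checksum) % 256 == 0."""
    return (-sum(data)) & 0xFF


def build_ipmi_lan_request(rs_sa: int, netfn: int, cmd: int, data: bytes,
                           rq_sa: int = 0x81, rq_seq: int = 0,
                           rs_lun: int = 0, rq_lun: int = 0) -> bytes:
    """Build an unauthenticated (auth type NONE) IPMI v1.5 LAN request."""
    # IPMI message: rsSA, netFn/rsLUN, checksum 1, rqSA, rqSeq/rqLUN, cmd, data, checksum 2
    head = bytes([rs_sa, (netfn << 2) | rs_lun])
    body = bytes([rq_sa, (rq_seq << 2) | rq_lun, cmd]) + data
    msg = head + bytes([ipmi_checksum(head)]) + body + bytes([ipmi_checksum(body)])

    # RMCP header: version 1.0 (0x06), reserved, sequence 0xFF (no RMCP ACK), class IPMI (0x07)
    rmcp = bytes([0x06, 0x00, 0xFF, 0x07])
    # IPMI v1.5 session header: auth type NONE, session sequence 0, session ID 0, message length
    session = (bytes([0x00]) + (0).to_bytes(4, "little")
               + (0).to_bytes(4, "little") + bytes([len(msg)]))
    return rmcp + session + msg


def get_channel_auth_caps_request() -> bytes:
    # NetFn App (0x06), command 0x38; data: current channel (0x0E), Administrator level (0x04).
    # Assumption: 0x20 is the responder (slave) address to target.
    return build_ipmi_lan_request(rs_sa=0x20, netfn=0x06, cmd=0x38, data=bytes([0x0E, 0x04]))


def send_request(host: str, packet: bytes, timeout: float = 2.0) -> bytes:
    """Send one datagram to the RMCP port and return the raw reply bytes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (host, RMCP_PORT))
        reply, _addr = sock.recvfrom(1024)
        return reply
```

For example, send_request("192.168.100.10", get_channel_auth_caps_request()) would return the raw response datagram; the address is a placeholder for the CMM's configured IP address. Production clients normally rely on an existing IPMI tool or library rather than hand-built packets.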

2.7 Ethernet Interfaces

The CMM contains two Ethernet ports. The software can configure each of these ports to connect to the front panel, the backplane, or the rear transition module (RTM). Information on configuring the Ethernet interfaces is covered in Section 8.3.1, “Setting IP Address Properties” on page 72.

2.8 System Event Logs (SEL)

The AdvancedTCA CMM implements system event logs according to Section 3.5 of the PICMG 3.0 Specification. The SEL contained on the CMM is fully IPMI compliant.
2.8.1 CMM SEL Architecture
The MPCMM0001 uses a single flat SEL file stored locally in the /etc/cmm directory. The SEL maintains a list of all the sensor events in the shelf. Each of the managed devices may keep its own SEL records in local SELs, but the master copy for the shelf is maintained by the CMM.
The SEL is limited to 65536 bytes. To keep the SEL from filling up, which can cause loss of error logging, the CMM checks the SEL every 15 minutes; if the size of cmm_sel is greater than 40000 bytes, the SEL is archived in gzip format and saved in /home/log/SEL. The saved logs are named cmm_sel.0.gz, cmm_sel.1.gz, and so on, up to a maximum of 16 logs, after which the numbering rolls over.
Note: Archived files should NEVER be decompressed on the CMM, as the resulting prolonged flash file writing could disrupt normal CMM operation and behavior. Using FTP, transfer the files to a different system before decompressing them with utilities such as gzip.
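The archiving policy described above (archive once cmm_sel exceeds 40000 bytes, keep at most 16 numbered gzip archives) can be sketched as follows. This illustrates the policy only and is not the CMM's implementation: the wrap-around rule (reusing cmm_sel.0.gz after cmm_sel.15.gz) is an assumption about what "rolls over" means, and because this section does not say whether the live SEL is cleared after archiving, the sketch leaves it untouched. The function names are hypothetical; the 15-minute check interval would be driven by a scheduler that is not shown.

```python
import gzip
import os
import shutil
from typing import Optional

SEL_ARCHIVE_THRESHOLD = 40000  # archive when cmm_sel grows past this many bytes
SEL_MAX_ARCHIVES = 16          # cmm_sel.0.gz ... cmm_sel.15.gz


def needs_archive(sel_size: int) -> bool:
    return sel_size > SEL_ARCHIVE_THRESHOLD


def archive_name(sequence: int) -> str:
    # Assumption: "rolls over" means the index wraps and the oldest slot is reused.
    return f"cmm_sel.{sequence % SEL_MAX_ARCHIVES}.gz"


def archive_sel(sel_path: str, archive_dir: str, sequence: int) -> Optional[str]:
    """Gzip a copy of the SEL into archive_dir if it exceeds the threshold.

    Returns the archive path, or None when no archive was needed.
    """
    if not needs_archive(os.path.getsize(sel_path)):
        return None
    os.makedirs(archive_dir, exist_ok=True)
    dest = os.path.join(archive_dir, archive_name(sequence))
    with open(sel_path, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dest
```

On the CMM the paths would correspond to the SEL file and /home/log/SEL; as the note above says, the resulting archives should be copied off the CMM before they are decompressed.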
2.8.2 Retrieving a SEL
To retrieve a SEL from the CMM, issue the following command:
cmmget [-l location] -d sel
Where location is one of {cmm, blade[1-14], fantray1, PEM[1-2]}. Even though the CMM uses a single flat SEL for system events, the cmmget command filters the SEL and returns only the events associated with the given location. Some individual FRUs (e.g., blades) may also keep their own local SELs.
2.8.3 Clearing the SEL
The following command will clear the SEL on both the active and the standby:
cmmset -d clearsel -v clear
Note: Since the CMM uses a single flat SEL for system events, this command clears the entire shelf SEL, not just a filtered subset.
2.8.4 Retrieving the Raw SEL
To retrieve the SEL in its raw format from a location, issue the following command:
cmmget -l [location] -d rawsel

2.9 Blade OverTemp Shutdown Script

The CMM software includes predefined script settings specifically for the MPCBL0001 board, which will automatically shut down a board when the “baseboard temp” sensor on that board crosses the upper critical threshold. This is done to prevent a runaway thermal event on the board from occurring. If this functionality is needed when using boards other than the MPCBL0001, the user will need to associate the name of the thermal sensor and the threshold with the board shutdown script:
cmmset -l bladeN -d majoraction -t [temp sensor name] -v overtempbladepoweroff [Blade Number]
Please refer to Section 18, “CMM Scripting” on page 164 for more information on associating a script with an event.
When using the CMM with boards other than the MPCBL0001, as long as there is no sensor name titled "baseboard temp" associated with the particular board being used, then there is no issue leaving these settings intact. If needed, to deactivate these settings for each physical slot, use the command:
cmmset -l bladeN -d majoraction -t “baseboard temp” -v none
where bladeN is the blade corresponding to the physical slot number (blade[1-16]) from which to remove the automatic shutdown setting. Please refer to Section 18, “CMM Scripting” on page 164 for more information on removing script actions.
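When the predefined setting has to be removed from every slot, the per-slot command above can be generated in a loop. The sketch below assumes only that it runs on the CMM with cmmset on the PATH; it builds the argument lists exactly as in the command shown above and, by default, prints them instead of running them. The function names and the dry-run default are illustrative choices, not part of the CMM software.

```python
import subprocess

# Sensor name used by the predefined MPCBL0001 shutdown setting.
SENSOR_NAME = "baseboard temp"


def overtemp_removal_commands(slots=range(1, 17)):
    """Return one cmmset argument list per slot (blade1 ... blade16)."""
    return [
        ["cmmset", "-l", f"blade{n}", "-d", "majoraction",
         "-t", SENSOR_NAME, "-v", "none"]
        for n in slots
    ]


def run_commands(commands, dry_run=True):
    """Print the commands (default) or execute them on the CMM."""
    for argv in commands:
        if dry_run:
            # Quote arguments containing spaces, as they would be typed at the CLI.
            print(" ".join(f'"{a}"' if " " in a else a for a in argv))
        else:
            # Assumption: cmmset returns a nonzero exit status on failure,
            # in which case check=True raises CalledProcessError.
            subprocess.run(argv, check=True)
```

run_commands(overtemp_removal_commands()) only prints the sixteen commands for review; run_commands(overtemp_removal_commands([3, 4]), dry_run=False) would apply the change to slots 3 and 4. Passing argument lists rather than a shell string keeps the space in the sensor name from being split.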
Redundancy, Synchronization, and Failover
3 Redundancy, Synchronization, and Failover

3.1 Overview

The CMM supports redundant operation with automatic failover in a chassis using redundant CMM slots. In systems where two CMMs are present, one acts as the active shelf manager and the other as standby. Both CMMs monitor each other, and either can trigger a failover if necessary.
Data from the active CMM is synchronized to the standby CMM whenever any changes occur. Data on the standby CMM is overwritten. A full synchronization between active and standby CMMs occurs on initial power up, or any insertion of a new CMM.
The active CMM is responsible for shelf FRU information management when CMMs are in redundant mode.

3.2 Synchronization

To ensure critical files on the standby CMM match the data on the active CMM, the active CMM synchronizes its data with the standby CMM, overwriting any existing data on the standby CMM.
An exception to this is the password reset procedure, detailed in Section 9, “Resetting the Password” on page 99. When the password reset switch is activated on the standby CMM, the password is synchronized to the active CMM.
The CMMs initially perform a full synchronization of data from the active to the standby CMM just after booting. Insertion of a new CMM also causes a full synchronization from the active CMM to the newly inserted standby. Date and time are synchronized every hour. Partial synchronization also occurs any time files are modified or touched via the Linux* “touch” command, with the exception of all *.sif and *.bin files in the /etc/cmm directory.
The *.sif (ALL SIF files), and *.bin (SDR Files) files under /etc/cmm are synchronized only once (when the CMMs establish communication). A 'touch' on those files at any later time will not perform a sync operation. Also, any updates to these files always happen as part of the software updates and not in isolation.
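The rule for which file changes start a partial synchronization can be expressed as a small predicate. This is a sketch only: it assumes the caller already knows the path is one of the synchronized items in Table 2, it treats "in the /etc/cmm directory" as direct children of /etc/cmm, and it encodes just the one exception stated above. The names are hypothetical.

```python
import fnmatch
import posixpath

ONE_TIME_SYNC_DIR = "/etc/cmm"
ONE_TIME_SYNC_PATTERNS = ("*.sif", "*.bin")  # SIF files and SDR files


def triggers_partial_sync(path: str) -> bool:
    """True if modifying or touching this synchronized file starts a partial sync."""
    directory, name = posixpath.split(posixpath.normpath(path))
    if directory == ONE_TIME_SYNC_DIR and any(
        fnmatch.fnmatchcase(name, pattern) for pattern in ONE_TIME_SYNC_PATTERNS
    ):
        # Synchronized only once, when the CMMs establish communication.
        return False
    return True
```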
Note: During synchronization, the health event LEDs on the standby CMM may blink on and off as the health events that were logged in the SEL are synchronized.
Table 2 lists the items that are synchronized between CMMs. During a full synchronization, all of these files and data are synchronized; a change to any one of these files results in that file being synchronized. The active CMM overwrites these files on the standby CMM.
There are two "levels" (priorities) of synchronized files. To manage the chassis normally, the priority 1 files must be synchronized after power up or after installation of a brand new CMM into the chassis. A standby CMM must have the priority 1 files synchronized before a successful failover can occur. When a brand new CMM boots for the first time as a standby, if a CMM failover is forced before all priority 1 data items are synchronized to the standby CMM, the standby CMM can still become the active CMM but may not be able to properly manage the FRUs in the chassis.
Table 2. CMM Synchronization (Sheet 1 of 2)
File(s) or Data | Description | Path | Priority
date and time | Date and time | IPMB | 1
IP Address Settings | CMM eth1, eth1:1, and eth0 IP address settings to allow CMMs to discover the other’s IP information | IPMB | 1
/etc/cmm.cfg | CMM’s main configuration file | Ethernet | 1
/etc/cmm/cmm_sel | System SEL | Ethernet | 1
/etc/cmm/sensors.ini | Sensor Set Values | Ethernet | 1
Ekey Controller Structures | Ekey Controller Structures | Ethernet | 1
Bused EKey Token info | Bused EKey Token info | Ethernet | 1
IPMB User States | IPMB User States | Ethernet | 1
Fan States | Fan States | Ethernet | 1
Cooling State | Cooling State Information | Ethernet | 1
User LED States | User LED States | Ethernet | 1
SDR structures and SIPI Controller Info | SDR structures and SIPI Controller Info | Ethernet | 1
PHM FRU state, Power Usage and Power Info | PHM FRU state, Power Usage and Power Info | Ethernet | 1
FIM FRU Cache (Local and Temp) | FIM FRU Cache (Local and Temp) | Ethernet | 1
SEL Time | SEL Time | IPMB | 1
SEL Events | Individual SEL Events | IPMB | 1
/etc/cmm/fantray.cfg | Fantray settings needed by cooling manager | Ethernet | 1
/etc/cmm.ini | Provides configuration values like the bus mapping | Ethernet | 2
/etc/passwd | Password file | Ethernet | 2
/etc/shadow | Password file | Ethernet | 2
/etc/cmdPrivillege.ini | Provides privilege related configuration values for RMCP | Ethernet | 2
/etc/cmm/*.bin | All SDR Files | Ethernet | 2
/etc/cmm/*.sif | All SIF Files | Ethernet | 2
/etc/var/snmpd.conf | SNMP configuration files | Ethernet | 2
/etc/snmpd.conf | SNMP configuration files | Ethernet | 2
/home/scripts | Entire user scripts area | Ethernet | 2
Prompt file | Prompt file | Ethernet | 2
/etc/actionscripts.cfg | Event action settings | Ethernet | 2
Table 2. CMM Synchronization (Sheet 2 of 2)
File(s) or Data | Description | Path | Priority
Issues files | Issues files | Ethernet | 2
/usr/local/cmm/temp/pmssync.ini | Recovery action and escalation action for all the monitored processes except the monitor process | Ethernet | 2
/usr/local/cmm/temp/pmsshadowsync.ini | Recovery action and escalation action for the monitor process | Ethernet | 2
Note: The /.rhosts file is used for synchronization and should NEVER be modified.

3.3 Heterogeneous Synchronization

Beginning with version 5.2 firmware, the CMM can synchronize data between CMMs running different firmware versions. The firmware decouples synchronization from firmware versioning, allowing synchronization between all CMM versions. A form of internal data versioning maintained by the CMM helps achieve this.
Note: SDR/SIF and user scripts differ slightly in synchronization architecture, as described below.
3.3.1 SDR/SIF Synchronization
Sensor Data Records (SDRs) and Sensor Information Files (SIFs) will be synchronized only between CMMs having the same version for this data item (even if the CMM firmware versions differ).
3.3.2 User Scripts Synchronization and Configuration
By default, user scripts are synchronized only between CMMs with the same firmware version. Users can control user scripts synchronization irrespective of CMM version differences by modifying the value of a configuration flag, "SyncUserScripts", in the CMM configuration file (/etc/cmm.cfg). The flag can be read and set with the cmmget and cmmset commands, and through any of the CMM interfaces (CLI, SNMP, and RPC).
The value of this flag determines whether user scripts are synchronized only when the CMM firmware versions differ. Between CMMs with the same firmware version, the user scripts directory is always synchronized and this flag is ignored.
3.3.2.1 Setting User Scripts Sync Configuration Flag
To set the value of the scripts synchronization configuration flag, use the following CMM command:
cmmset -l cmm -d syncuserscripts -v [equal/upgrade/downgrade/always]
Where:
equal: Synchronizes user scripts only when the CMM versions are the same. This is the default value.
upgrade: Synchronizes user scripts only when the other CMM has a newer firmware version.
downgrade: Synchronizes user scripts only when the other CMM has an older firmware version.
always: Synchronizes user scripts irrespective of version differences.
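The interaction between the flag and the firmware versions can be summarized as a decision function. This sketches the rules stated above and is not CMM code: it assumes firmware versions compare as dotted numeric tuples (with trailing zero components ignored, so 5.2 equals 5.2.0) and interprets "the other CMM" as the peer of the CMM evaluating the flag. The function names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """'5.1.0.757' -> (5, 1, 0, 757); assumes purely numeric dotted versions."""
    parts = [int(part) for part in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()  # treat 5.2 and 5.2.0 as the same version
    return tuple(parts)


def should_sync_user_scripts(flag: str, local_version: str, peer_version: str) -> bool:
    local, peer = parse_version(local_version), parse_version(peer_version)
    if local == peer:
        # Same firmware: user scripts are always synchronized and the flag is ignored.
        return True
    flag = flag.lower()
    if flag == "equal":
        return False
    if flag == "upgrade":
        return peer > local   # the other CMM runs newer firmware
    if flag == "downgrade":
        return peer < local   # the other CMM runs older firmware
    if flag == "always":
        return True
    raise ValueError(f"unknown syncuserscripts value: {flag!r}")
```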
3.3.2.2 Retrieving User Scripts Sync Configuration Flag
To retrieve the value of the Scripts Synchronization configuration flag, the following CMM command is used:
cmmget -l cmm -d syncuserscripts
The value returned will be one of: Equal, Upgrade, Downgrade, Always, or Error on failure.
3.3.3 Synchronization Requirements
For synchronization to occur:
The CMMs must be able to communicate with each other over their dedicated IPMB. The CMMs use a heartbeat via their dedicated IPMB to determine whether they can communicate with each other over IPMB.
An Ethernet connection must exist between the two CMMs. The CMMs must be able to ping each other via Ethernet for synchronization to be successful. The connection can be made through the Ethernet switches in the chassis (which requires both switches to be present), through an external Ethernet switch connected to the front ports of the CMM pair, or through a crossover cable connecting the two front ports of the CMM pair. If synchronization fails on eth1, it is attempted on eth0. If the CMMs cannot successfully ping each other via eth0 or eth1, synchronization between the CMMs cannot occur.
A failure of any priority 1 synchronization will result in a health event being logged in the CMM SEL and will inhibit a failover from occurring.
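The requirements above amount to a precondition check. The sketch below models them with boolean inputs that a real monitor would obtain from the IPMB heartbeat and from pinging the peer on each interface; it approximates "synchronization fails on eth1" as "the peer does not answer on eth1". The class, function names, and result structure are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncPath:
    can_sync: bool
    interface: Optional[str]  # "eth1" or "eth0", or None when sync is impossible
    reason: str


def choose_sync_path(ipmb_heartbeat_ok: bool, ping_ok: dict[str, bool]) -> SyncPath:
    """ping_ok maps an interface name to whether the peer CMM answered a ping on it."""
    if not ipmb_heartbeat_ok:
        return SyncPath(False, None, "no IPMB heartbeat with the peer CMM")
    for iface in ("eth1", "eth0"):  # eth1 first, then fall back to eth0
        if ping_ok.get(iface, False):
            return SyncPath(True, iface, f"peer reachable on {iface}")
    return SyncPath(False, None, "peer not reachable on eth1 or eth0")


def failover_allowed(priority1_failures: int) -> bool:
    # Any priority 1 synchronization failure inhibits failover.
    return priority1_failures == 0
```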

3.4 Initial Data Synchronization

A standby CMM must have the priority 1 files synchronized before a successful failover can occur. A standby CMM can still become active if not all priority 1 synchronization has completed, but it may not be able to properly manage all the FRUs in the chassis.
The CMM implements the “Datasync Status” sensor to report the state of synchronization and whether synchronization has completed successfully.
3.4.1 Initial Data Sync Failure
If the CMM encounters any failure during data synchronization, it marks the data synchronization as failed, logs a SEL event, and sends an SNMP trap. Duplicate failures are not reported multiple times. As soon as the CMM is out of the failure condition, it resets the data synchronization failure state.
The CMM will continue trying to synchronize as long as there are two CMMs present in the chassis and they are able to communicate via their cross-connected IPMB.

3.5 Datasync Status Sensor

A sensor named “Datasync Status” makes the Datasync state information available to the user. This sensor tracks the status of the Datasync module and makes it available through the various CMM interfaces. It is used to query the data synchronization states and to log a SEL event when initial synchronization completes. It is a discrete OEM sensor with status bits representing the state of different parts of the Datasync module.
Note: The Datasync Status sensor can only be queried through the active CMM.
3.5.1 Sensor bitmap
When Datasync starts for the first time in a dual CMM system, and whenever the CMM changes between Active and Standby, the status bits are all cleared to 0x0000.
Bit 0 (Running) is set when the datasync module is active.
Bit 1 (P1Done) is set when the priority 1 data syncs are done, and cleared when priority 1 data needs to be synced.
Bit 2 (P2Done) is set when the priority 2 data syncs are done, and cleared when priority 2 data needs to be synced.
Bit 3 (InitSyncDone) is set when both priority 1 and priority 2 data syncs are done, and stays set (latches) until the CMM changes between Active and Standby or loses contact with the partner CMM.
Bit 4 (SyncError) is set if an error was detected, and cleared when no data items have errors.
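These bit definitions are enough to interpret the value printed by the query in Section 3.5.3. The sketch below decodes a raw sensor value into the same four status lines the CLI prints; the message wording is copied from the example outputs in that section, while the constant and function names are illustrative. Bit 0 (Running) is defined for completeness but, matching the CLI output, is not reported as a separate line.

```python
RUNNING = 1 << 0         # Datasync module active
P1_DONE = 1 << 1         # priority 1 data synced
P2_DONE = 1 << 2         # priority 2 data synced
INIT_SYNC_DONE = 1 << 3  # latched once the initial P1 and P2 syncs complete
SYNC_ERROR = 1 << 4      # an error was detected


def decode_datasync_status(value: int) -> list[str]:
    """Translate a Datasync Status sensor value into human-readable lines."""
    return [
        "Initial Data Synchronization is complete." if value & INIT_SYNC_DONE
        else "Initial Data Synchronization is not complete.",
        "Priority 1 data is synced." if value & P1_DONE
        else "There is Priority 1 data to sync.",
        "Priority 2 data is synced." if value & P2_DONE
        else "There is Priority 2 data to sync.",
        "Data Synchronization has encountered a problem in synchronizing data."
        if value & SYNC_ERROR else "No Data Synchronization problems known.",
    ]
```

For example, the reported value "0x0013" can be decoded with decode_datasync_status(int("0x0013", 16)), which yields the same four lines as the "Initial Data Sync failure" output in Section 3.5.3.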
3.5.2 Event IDs
The “Datasync Status” sensor uses event IDs 0x420 to 0x42f. The following event ID is currently used to log Datasync events. These event IDs can be used to associate scripts with the respective events.
Event                                      Event ID
Initial Data Synchronization complete      0x420 (1056)
3.5.3 Querying the Datasync Status
The status of the Datasync Status sensor can be queried using the following CLI command:
cmmget -l cmm -t "Datasync Status" -d current
The command output for each state is as follows:
Initial State:
The current value is 0x0001
Initial Data Synchronization is not complete.
There is Priority 1 data to sync.
There is Priority 2 data to sync.
No Data Synchronization problems known.
Initial Data Sync Incomplete, Priority 1 Data Synced, Priority 2 Data Not Synced
The current value is 0x0003
Initial Data Synchronization is not complete.
Priority 1 data is synced.
There is Priority 2 data to sync.
No Data Synchronization problems known.
Initial Data Sync is complete, Priority 1 and Priority 2 are also synced
The current value is 0x000f
Initial Data Synchronization is complete.
Priority 1 data is synced.
Priority 2 data is synced.
No Data Synchronization problems known.
Initial Data Sync failure
The current value is 0x0013
Initial Data Synchronization is not complete.
Priority 1 data is synced.
There is Priority 2 data to sync.
Data Synchronization has encountered a problem in synchronizing data.
Initial Data Sync is complete and Priority 1 data is changed
The current value is 0x000d
Initial Data Synchronization is complete.
There is Priority 1 data to sync.
Priority 2 data is synced.
No Data Synchronization problems known.
Initial Data Sync is complete, Priority 1 data fails to sync, and there is a Data Sync problem
The current value is 0x001d
Initial Data Synchronization is complete.
There is Priority 1 data to sync.
Priority 2 data is synced.
Data Synchronization has encountered a problem in synchronizing data.
Data Sync becomes normal after Data Sync failure
The current value is 0x000f
Initial Data Synchronization is complete.
Priority 1 data is synced.
Priority 2 data is synced.
No Data Synchronization problems known.
Single CMM
The current value is 0x0000
Datasync disabled - there is no partner CMM present.
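The messages above follow directly from the bit definitions in Section 3.5.1, so a monitoring script can rebuild them from the raw value. The Python sketch below assumes the `cmmget` output contains a line of the form `The current value is 0x....`, as shown above. The function names `parse_current_value` and `datasync_messages` are hypothetical. Treating 0x0000 as the single-CMM case is inferred from the last example.

```python
import re
from typing import List

P1_DONE, P2_DONE, INIT_SYNC_DONE, SYNC_ERROR = 0x02, 0x04, 0x08, 0x10

_VALUE_RE = re.compile(r"The current value is (0x[0-9a-fA-F]+)")


def parse_current_value(output: str) -> int:
    """Extract the raw sensor value from cmmget output text."""
    match = _VALUE_RE.search(output)
    if match is None:
        raise ValueError("no 'The current value is 0x....' line found")
    return int(match.group(1), 16)


def datasync_messages(value: int) -> List[str]:
    """Rebuild the human-readable status lines for a Datasync Status value."""
    if value == 0x0000:
        # Single CMM case (assumption based on the last example above).
        return ["Datasync disabled - there is no partner CMM present."]
    return [
        "Initial Data Synchronization is complete." if value & INIT_SYNC_DONE
        else "Initial Data Synchronization is not complete.",
        "Priority 1 data is synced." if value & P1_DONE
        else "There is Priority 1 data to sync.",
        "Priority 2 data is synced." if value & P2_DONE
        else "There is Priority 2 data to sync.",
        "Data Synchronization has encountered a problem in synchronizing data."
        if value & SYNC_ERROR
        else "No Data Synchronization problems known.",
    ]


if __name__ == "__main__":
    sample = "The current value is 0x0013\nInitial Data Synchronization is not complete."
    for line in datasync_messages(parse_current_value(sample)):
        print(line)
```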
3.5.4 SEL Event
The Datasync Status sensor generates the following two SEL events:
When the active CMM is or becomes the only CMM, or the active CMM loses communication with the standby CMM, the following event will be logged:
[Day] [Month] [Date] [Time] [Year]
CMM[n]: CMM Datasync Status Initial Data Synchronization is complete. Deasserted
The following event will be logged in the SEL when initial data synchronization is complete:
[Day] [Month] [Date] [Time] [Year]
CMM[n]: CMM Datasync Status Initial Data Synchronization is complete. Asserted
Where n: The number of the CMM generating the event.
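A script that watches the SEL can tell the two events apart by their trailing Asserted or Deasserted keyword. The Python sketch below is a minimal illustration. It assumes the event line appears exactly as shown above, with `[n]` replaced by the CMM number (for example `CMM1:`). The function name `parse_datasync_sel_line` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Event line format as shown above; "CMM<n>:" prefix assumed to hold a plain integer.
_SEL_RE = re.compile(
    r"^CMM(?P<cmm>\d+): CMM Datasync Status .+?\s+"
    r"(?P<state>Asserted|Deasserted)\s*$"
)


def parse_datasync_sel_line(line: str) -> Optional[Tuple[int, bool]]:
    """Return (CMM number, asserted) for a Datasync Status SEL line, else None."""
    match = _SEL_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group("cmm")), match.group("state") == "Asserted"
```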
3.5.5 SNMP Trap
The Datasync Status sensor generates the following two SNMP traps:
When the active CMM is or becomes the only CMM, or the active CMM loses communication with the standby CMM, the following SNMP trap is generated:
[Month] [Date] [Time] [hostname] snmptrapd[xxxxx]: [IP Address]: Enterprise Specific Trap (25) Uptime: [Time], SNMPv2-SMI::enterprises.343.2.14.1.5 = STRING: "Time : [Day] [Month] [Date] [Time] [Year], Location : [location] , Chassis Serial # : [xxxxxxxx], Board : CMM[x] , Sensor : Datasync Status , Event : Initial Data Synchronization complete: Deasserted "
When initial data synchronization is complete, the following SNMP trap is generated:
[Month] [Date] [Time] [hostname] snmptrapd[xxxxx]: [IP Address]: Enterprise Specific Trap (25) Uptime: [Time], SNMPv2-SMI::enterprises.343.2.14.1.5 = STRING: "Time : [Day] [Month] [Date] [Time] [Year], Location : [location] , Chassis Serial # : [xxxxxxxx], Board : CMM[x] , Sensor : CMM[x]:Datasync Status , Event : Initial Data Synchronization is complete. Asserted "
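A trap handler can key on the Sensor and Event fields inside the quoted STRING payload. The Python sketch below parses one `snmptrapd` log line in the format shown above. It is illustrative only, and the function names are hypothetical. It assumes the field labels appear exactly as printed (Time, Location, Chassis Serial #, Board, Sensor, Event). The two sample traps word the event differently, so the sketch checks only the final Asserted or Deasserted keyword.

```python
import re
from typing import Dict, Optional

_PAYLOAD_RE = re.compile(r'STRING:\s*"(.*)"')
_NEXT_KEY = r"(?:Location|Chassis Serial #|Board|Sensor|Event)"
# Each field runs until ", <next label> :" or the end of the payload.
_FIELD_RE = re.compile(
    r"(Time|Location|Chassis Serial #|Board|Sensor|Event)\s*:\s*"
    r"(.*?)\s*(?=,\s*" + _NEXT_KEY + r"\s*:|$)"
)


def parse_cmm_trap(line: str) -> Optional[Dict[str, str]]:
    """Split the quoted trap payload into a {label: value} dictionary."""
    payload = _PAYLOAD_RE.search(line)
    if payload is None:
        return None
    return dict(_FIELD_RE.findall(payload.group(1)))


def datasync_complete_state(fields: Dict[str, str]) -> Optional[bool]:
    """True for Asserted, False for Deasserted, None if not a Datasync Status trap."""
    if "Datasync Status" not in fields.get("Sensor", ""):
        return None
    words = fields.get("Event", "").split()
    if not words or words[-1] not in ("Asserted", "Deasserted"):
        return None
    return words[-1] == "Asserted"
```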
3.5.6 System Health
The “Datasync Status” sensor does not contribute to system health. However, sync failures are captured by the “File Sync Failure” sensor, which does contribute to system health.

3.6 CMM Failover

Once information is synchronized between the redundant CMMs, the active CMM will constantly monitor its own health as well as the health of the standby CMM. In the event of one of the scenarios listed in the sections that follow, the active CMM will automatically failover to the standby CMM so that no management functionality is lost at any time.
3.6.1 Scenarios That Prevent Failover
The following are reasons a failover can NOT occur:
The active CMM can NOT communicate with the standby CMM via their IPMB bus.
Not all priority 1 data has been completely synchronized between the CMMs.
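The two blockers above can be written as a simple precondition check. In this illustrative Python sketch, `ipmb_link_ok` is a hypothetical input that represents IPMB communication with the standby CMM. The P1Done bit (Bit 1) of the Datasync Status sensor is assumed to be a reasonable proxy for "all priority 1 data has been synchronized".

```python
P1_DONE = 0x02  # Bit 1 (P1Done) of the Datasync Status sensor


def failover_possible(ipmb_link_ok: bool, datasync_status: int) -> bool:
    """Hypothetical check of the two documented reasons a failover cannot occur."""
    if not ipmb_link_ok:
        # The active CMM cannot communicate with the standby CMM over IPMB.
        return False
    # Priority 1 data must be fully synchronized (P1Done set, by assumption).
    return bool(datasync_status & P1_DONE)
```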
To determine which CMM is active at any time, use the CLI command:
cmmget -l cmm -d redundancy
This command will output a list stating if both CMMs are present, which one is the active CMM, and which CMM you are logged in to. CMM1 is the CMM on the left when looking from the front of the chassis, and CMM2 is on the right.
3.6.2 Scenarios That Failover to a Healthier Standby CMM
The scenarios listed below can cause a failover only if the standby CMM is in a healthier state than the active CMM. CMM health is measured by a CMM health score, which is the sum of the weights of the conditions listed below that are currently active on that CMM. A health score is computed for each CMM whenever any of these conditions occurs on the active CMM. Each condition has a default weight of 1, so all conditions have equal importance in causing a failover.
To determine if a failover is necessary when one of these conditions occurs, the active CMM computes its CMM health score, and requests the health score of the standby CMM. If the score of the standby CMM is LESS than the score of the active CMM, a failover will occur. If a failover does not occur, the CMM SEL will contain an entry indicating the reason failover did not occur.
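As a worked example of the scoring rule, suppose the active CMM cannot ping SNMPTrapAddress1 (score 1) while the standby CMM has no active conditions (score 0). Because 0 < 1, a failover occurs. If each CMM had one active condition, both scores would be 1 and no failover would occur. The Python sketch below encodes the same comparison. The condition names are hypothetical labels for the conditions listed below, and the weights are the documented default of 1.

```python
from typing import Iterable, Mapping

# Hypothetical labels for the conditions listed below, each with default weight 1.
DEFAULT_WEIGHTS: Mapping[str, int] = {
    "trap_address1_ping_failure": 1,
    "critical_cmm_events": 1,
}


def health_score(active_conditions: Iterable[str],
                 weights: Mapping[str, int] = DEFAULT_WEIGHTS) -> int:
    """Sum of the weights of the conditions currently active on one CMM."""
    return sum(weights[name] for name in set(active_conditions))


def should_fail_over(active_cmm: Iterable[str], standby_cmm: Iterable[str],
                     weights: Mapping[str, int] = DEFAULT_WEIGHTS) -> bool:
    """Fail over only if the standby score is strictly LESS than the active score."""
    return health_score(standby_cmm, weights) < health_score(active_cmm, weights)


if __name__ == "__main__":
    print(should_fail_over(["trap_address1_ping_failure"], []))  # True
    print(should_fail_over(["trap_address1_ping_failure"],
                           ["critical_cmm_events"]))             # False (tie)
```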
1. SNMPTrapAddress1 ping failure:
The active CMM will failover to the standby CMM if the active CMM cannot ping its first SNMP trap address (SNMPTrapAddress1) over any of the available Ethernet ports, but the standby CMM can. The trap address is set using the command:
cmmset -l cmm -d snmptrapaddress1 -v [ip address]
Only a ping failure of the first SNMP trap address (SNMPTrapAddress1) can cause a failover. SNMPTrapAddress2 through SNMPTrapAddress5 are not subject to this ping test.
Note: The interval between pings to the first trap address can vary from one second to approximately 20 seconds.
2. Critical events on the active CMM:
The active CMM has critical events for any of the CMM sensors (not critical chassis or blade events) and the standby CMM does not. If both CMMs have critical CMM events, then the number of major and minor CMM events is examined to decide if a failover should occur. The number of major events is compared, and if they are equal, the number of minor events is used.
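The comparison in condition 2 can be sketched as follows. This is an interpretation, not the firmware algorithm. It assumes the counts cover outstanding CMM sensor events only, with chassis and blade events excluded as stated above. It also assumes the standby is healthier only if it has strictly fewer major events, or equal major events and strictly fewer minor events. The `CmmEventCounts` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CmmEventCounts:
    """Hypothetical count of outstanding CMM sensor events on one CMM."""
    critical: int
    major: int
    minor: int


def standby_healthier_by_events(active: CmmEventCounts,
                                standby: CmmEventCounts) -> bool:
    if active.critical == 0:
        # Condition 2 applies only when the active CMM has critical CMM events.
        return False
    if standby.critical == 0:
        return True
    # Both CMMs have critical CMM events: compare major counts first ...
    if standby.major != active.major:
        return standby.major < active.major
    # ... and fall back to minor counts when the major counts are equal.
    return standby.minor < active.minor
```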
3.6.3 Manual Failover
The following command can be issued to the active CMM to manually cause a failover to the standby CMM:
cmmset -l cmm -d failover -v [1/any]
Where:
1: Fail over only to a CMM with the same or a newer firmware version.
any: Fail over to a CMM with any firmware version.
A manual failover can only be initiated on the active CMM, and it occurs only if the standby CMM is at least as healthy as the active CMM. Once the failover takes place, the former standby CMM immediately becomes the active CMM.
If the failover cannot occur, the CLI reports the reason and a SEL event is recorded.
In addition, opening the ejector latch on the active CMM will initiate a failover, but only if the standby is at least as healthy as the active.
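The checks behind `cmmset -l cmm -d failover -v [1/any]` can be sketched as a decision function. In this illustrative Python version, `manual_failover_allowed` and `parse_version` are hypothetical helpers. Firmware versions are assumed to be dot-separated integers. "At least as healthy" is read as a standby health score less than or equal to the active score (Section 3.6.2).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert a dotted version string (assumed format) into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def manual_failover_allowed(value: str, active_fw: str, standby_fw: str,
                            active_score: int, standby_score: int) -> Tuple[bool, str]:
    """Return (allowed, reason) for a manual failover request with -v 1 or -v any."""
    if value not in ("1", "any"):
        return False, "value must be 1 or any"
    if value == "1" and parse_version(standby_fw) < parse_version(active_fw):
        return False, "standby firmware is older than active firmware"
    if standby_score > active_score:
        # Higher score means more active fault conditions, i.e. less healthy.
        return False, "standby CMM is less healthy than active CMM"
    return True, "failover permitted"
```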
3.6.4 Scenarios That Force a Failover
The following scenarios cause a failover as long as the standby CMM is operational, even when it is less healthy than the active:
The active CMM is pulled out of the chassis.
The active CMM’s healthy signal is de-asserted.
A “reboot” command is issued to the active CMM.
The front panel alarm quiet switch button on the active CMM is pushed for more than five seconds. If the button continues to be pressed for more than 10 seconds, the CMM does not reset.
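The alarm quiet switch timing reads as a window: holding the button for more than five seconds triggers the forced failover, while holding it for more than 10 seconds cancels the reset. The Python sketch below encodes that interpretation. The function name is hypothetical. Treating exactly 10 seconds as still inside the reset window is an assumption, because the text does not define the boundary.

```python
def alarm_button_forces_failover(held_seconds: float) -> bool:
    """Hypothetical reading of the alarm quiet switch rule for the active CMM."""
    # More than 5 s: the active CMM resets, forcing a failover.
    # More than 10 s: the CMM does not reset, so no failover is forced.
    return 5.0 < held_seconds <= 10.0
```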

3.7 CMM Ready Event

The CMM Ready Event is a notification mechanism that informs the user when all CMM modules are fully up and running. Once this event is generated, the CMM is ready to process any request.
The CMM uses the "CMM Status" sensor when generating the CMM Not Ready event. Please refer to Table 46, “CMM Status Event Strings (CMM Status)” on page 118 for CMM status event strings.
Table 3. CMM Status Event Strings (CMM Status)
Event String               Event Code   Event Severity
“CMM is not ready.”        1024         Minor
“CMM is ready.”            1025         OK
“CMM is Active”            1026         OK
“CMM is Standby”           1027         OK
“CMM ready timed out”      1028         Minor
A CMM Not Ready Assertion SEL event is generated on a CMM when it transitions from standby mode to active mode during a failover or on the active CMM on power up. The event is only generated on the newly active CMM. The “CMM is Ready” event is generated after all CMM modules (board wrapper processes) are up and running and the SNMP daemon is active.
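A management script can watch the CMM Status events from Table 3 to decide when it is safe to send requests to a newly active CMM. The Python sketch below is illustrative. It assumes the script receives CMM Status event codes in the order they were generated, for example from the SEL or from SNMP traps. The names `CMM_STATUS_EVENTS`, `describe`, and `readiness` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Table 3: event code -> (event string, severity)
CMM_STATUS_EVENTS: Dict[int, Tuple[str, str]] = {
    1024: ("CMM is not ready.", "Minor"),
    1025: ("CMM is ready.", "OK"),
    1026: ("CMM is Active", "OK"),
    1027: ("CMM is Standby", "OK"),
    1028: ("CMM ready timed out", "Minor"),
}

NOT_READY, READY, READY_TIMED_OUT = 1024, 1025, 1028


def describe(code: int) -> str:
    """Render a CMM Status event code with its string and severity."""
    text, severity = CMM_STATUS_EVENTS.get(code, ("unknown CMM Status event", "unknown"))
    return f"{code}: {text} ({severity})"


def readiness(event_codes: Iterable[int]) -> Optional[str]:
    """Return 'ready', 'not ready', 'timed out', or None from codes in arrival order."""
    state: Optional[str] = None
    for code in event_codes:
        if code == NOT_READY:
            state = "not ready"
        elif code == READY:
            state = "ready"
        elif code == READY_TIMED_OUT:
            state = "timed out"
        # Active/Standby codes (1026, 1027) do not change readiness here.
    return state
```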