Intel MPCMM0001 User Manual

Intel® NetStructure™ MPCMM0001 Chassis Management Module
Software Technical Product Specification
April 2005
Order Number: 273888-007
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
The Intel® NetStructure™ MPCMM0001 Chassis Management Module may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
This Software Technical Product Specification as well as the software described in it is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document.
Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the express written consent of Intel Corporation.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an ordering number and are referenced in this document, or other Intel literature may be obtained by calling 1-800-548-4725 or by visiting Intel's website at http://www.intel.com.
AnyPoint, AppChoice, BoardWatch, BunnyPeople, CablePort, Celeron, Chips, CT Media, Dialogic, DM3, EtherExpress, ETOX, FlashFile, i386, i486, i960, iCOMP, InstantIP, Intel, Intel Centrino, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Create & Share, Intel GigaBlade, Intel InBusiness, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel Play, Intel Play logo, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel TeamStation, Intel Xeon, Intel XScale, IPLink, Itanium, MCS, MMX, MMX logo, Optimizer logo, OverDrive, Paragon, PC Dads, PC Parents, PDCharm, Pentium, Pentium II Xeon, Pentium III Xeon, Performance at Your Command, RemoteExpress, SmartDie, Solutions960, Sound Mark, StorageExpress, The Computer Inside., The Journey Inside, TokenExpress, VoiceBrick, VTune, and Xircom are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2005, Intel Corporation. All rights reserved.
Contents
1 Introduction
1.1 Overview
1.2 Terms Used in this Document
2 Software Specifications
2.1 Red Hat* Embedded Debug and Bootstrap (Redboot)
2.2 Operating System
2.3 Command Line Interface (CLI)
2.4 SNMP/UDP
2.5 Remote Procedural Call (RPC) Interface
2.6 RMCP
2.7 Ethernet Interfaces
2.8 Sensor Event Logs (SEL)
2.8.1 CMM SEL Architecture
2.8.2 Retrieving a SEL
2.8.3 Clearing the SEL
2.8.4 Retrieving the Raw SEL
2.9 Blade OverTemp Shutdown Script
3 Redundancy, Synchronization, and Failover
3.1 Overview
3.2 Synchronization
3.3 Heterogeneous Synchronization
3.3.1 SDR/SIF Synchronization
3.3.2 User Scripts Synchronization and Configuration
3.3.3 Synchronization Requirements
3.4 Initial Data Synchronization
3.4.1 Initial Data Sync Failure
3.5 Datasync Status Sensor
3.5.1 Sensor bitmap
3.5.2 Event IDs
3.5.3 Querying the Datasync Status
3.5.4 SEL Event
3.5.5 SNMP Trap
3.5.6 System Health
3.6 CMM Failover
3.6.1 Scenarios That Prevent Failover
3.6.2 Scenarios That Failover to a Healthier Standby CMM
3.6.3 Manual Failover
3.6.4 Scenarios That Force a Failover
3.7 CMM Ready Event
4 Built-In Self Test (BIST)
4.1 BIST Test Flow
4.2 Boot-BIST
4.3 Early-BIST
4.4 Mid-BIST
4.5 Late-BIST
4.6 QuickBoot Feature
4.6.1 Configuring QuickBoot
4.7 Event Log Area and Event Management
4.8 OS Flash Corruption Detection and Recovery Design
4.8.1 Monitoring the Static Images
4.8.2 Monitoring the Dynamic Images
4.8.3 CMM Failover
4.9 BIST Test Descriptions
4.9.1 Flash Checksum Test
4.9.2 Base Memory Test
4.9.3 Extended Memory Tests
4.9.4 FPGA Version Check
4.9.5 DS1307 RTC (Real-Time Clock) Test
4.9.6 NIC Presence/Local PCI Bus Test
4.9.7 OS Image Checksum Test
4.9.8 CRC32 Checksum
4.9.9 IPMB Bus Busy/Not Ready Test
5 Re-enumeration
5.1 Overview
5.2 Re-enumeration on Failover
5.3 Re-enumeration of M5 FRU
5.4 Resolution of EKeys
5.5 Events Regeneration
6 Process Monitoring and Integrity
6.1 Overview
6.1.1 Process Existence Monitoring
6.1.2 Thread Watchdog Monitoring
6.1.3 Process Integrity Monitoring
6.2 Processes Monitored
6.3 Process Monitoring Targets
6.4 Process Monitoring Dataitems
6.4.1 Examples
6.5 SNMP MIB Commands
6.6 Process Monitoring CMM Events
6.7 Failure Scenarios and Eventing
6.7.1 No Action Recovery
6.7.2 Successful Restart Recovery
6.7.3 Successful Failover/Restart Recovery
6.7.4 Successful Failover/Reboot Recovery
6.7.5 Failed Failover/Reboot Recovery, Non-Critical
6.7.6 Failed Failover/Reboot Recovery, Critical
6.7.7 Excessive Restarts, Escalate No Action
6.7.8 Excessive Restarts, Successful Escalate Failover/Reboot
6.7.9 Excessive Restarts, Failed Escalate Failover/Reboot, Non-Critical
6.7.10 Excessive Restarts, Failed Escalate Failover/Reboot, Critical
6.7.11 Process Administrative Action
6.7.12 Excessive Failover/Reboots, Administrative Action
6.8 Process Integrity Executable (PIE)
6.9 Configuring pms.ini
6.9.1 Global Data
6.9.2 Process Specific Data
6.9.3 Process Definition Section of pms.ini
6.10 Process Integrity Executable (PIE) Specific Data Config
6.10.1 PIE Section Name
6.10.2 Process Integrity Executable
6.10.3 Unique ID
6.10.4 Administrative State
6.10.5 Process Integrity Interval
6.10.6 Chassis Applicability
6.10.7 PmsPieSnmp Command Line
6.10.8 SNMP PIE Section of pms.ini
6.11 WP/BPM PIE
6.11.1 WP/BPM Section of pms.ini
7 Power and Hot Swap Management
7.1 Hot Swap States
7.2 FRU Insertion
7.3 Graceful FRU Extraction
7.4 Surprise FRU Extraction/IPMI Failure
7.5 Forced Power State Changes
7.6 Power Management on the Standby CMM
7.7 Power Feed Targets
7.8 Pinging IPMI Controllers
8 The Command Line Interface (CLI)
8.1 CLI Overview
8.2 Connecting to the CLI
8.2.1 Connecting through a Serial Port Console
8.3 Initial Setup— Logging in for the First Time
8.3.1 Setting IP Address Properties
8.3.2 Setting a Hostname
8.3.3 Setting the Amount of Time for Auto-Logout
8.3.4 Setting the Date and Time
8.3.5 Telnet into the CMM
8.3.6 Connect Through SSH (Secure Shell)
8.3.7 FTP into the CMM
8.3.8 Rebooting the CMM
8.4 CLI Command Line Syntax and Arguments
8.4.1 Cmmget and Cmmset Syntax
8.4.2 Help Parameter: -h
8.4.3 Location Parameter: -l
8.4.4 Target Parameter: -t
8.4.5 Dataitem Parameter: -d
8.4.6 Value Parameter: -v
8.4.7 Sample CLI Operations
8.5 Generating a System Status Report
9 Resetting the Password
9.1 Resetting the Password in a Dual CMM System
9.2 Resetting the Password in a Single CMM System
10 Sensor Types
10.1 CMM Sensor Types
10.2 Threshold-Based Sensors
10.2.1 Threshold-Based Sensor Events
10.3 CMM Voltage/Temp Sensor Thresholds
10.4 Discrete Sensors
10.4.1 Discrete Sensor Events
11 Health Events
11.1 Syntax of Health Event Strings
11.1.1 Healthevents Query Event Syntax
11.1.2 SEL Event Syntax
11.1.3 SEL Sensor Types
11.1.4 SNMP Trap Event Syntax
11.2 Sensor Targets
11.3 Healthevents Queries
11.3.1 HealthEvents Queries for Individual Sensors
11.3.2 HealthEvents Queries for All Sensors on a Location
11.3.3 No Active Events
11.3.4 Not Present or Non-IPMI Locations
11.4 List of Possible Health Event Strings
11.4.1 All Locations
11.4.2 CMM Location
11.4.3 Chassis Location
11.5 IPMI Error Completion Codes
11.5.1 Configuring IPMI Error Completion Codes
11.5.2 IPMI/IMB Error Message Format
12 Front Panel LEDs
12.1 LED Types and States
12.1.1 Alarm LEDs
12.1.2 Health LED
12.1.3 Hot Swap LED
12.1.4 User Definable LEDs
12.2 Retrieving a Location’s LED properties
12.3 Retrieving Color Properties of LEDs
12.4 Retrieving the State of LEDs
12.5 Setting the State of the User LEDs
12.6 LED Boot Sequence
13 Node Power Control
13.1 Node Operational State Management
13.2 Obtaining the Power State of a Board
13.3 Controlling the Power State of a Board
13.3.1 Powering Off a Board
13.3.2 Powering On a Board
13.3.3 Resetting a Board................................................................................................128
14 Electronic Keying Manager..........................................................................................................129
14.1 Point-to-Point EKeying......................................................................................................129
14.2 Bused EKeying .................................................................................................................129
14.3 EKeying CLI Commands .......... ... ... ... .... ... .......................................... ... ... ........................1 29
15 CDMs and FRU Information .......................................... ... .......................................... ... ... ... .... ....130
15.1 Chassis Data Module......... .... ...................................... .... ... ... ... ... .... ... ... ... .... ....................130
15.2 FRU/CDM Election Process .............................................................................................130
15.3 FRU Information ...............................................................................................................130
15.4 FRU Query Syntax............................................................................................................131
16 Fan Control and Monitoring.........................................................................................................132
16.1 Automatic Fan Control......................................................................................................132
16.2 Querying Fan Tray Sensors - FantrayN location ..............................................................132
16.3 Fantray Cooling Levels.....................................................................................................132
16.4 CMM Cooling Manager Temperature Status....................................................................132
16.5 CMM Cooling Table..........................................................................................................133
16.5.1 Setting Values in the Cooling Table.....................................................................133
16.6 Control Modes for Fan Trays............................................................................................134
16.6.1 CMM Control Mode..............................................................................................134
16.6.2 Fantray Control Mode..........................................................................................134
16.6.3 Emergency Shutdown Control Mode...................................................................134
16.6.4 User Initiated Mode Change................................................................................135
16.6.5 Automatic Mode Change .....................................................................................135
16.7 Getting Temperature Statuses..........................................................................................135
16.8 Fantray Properties ............................................................................................................136
16.9 Retrieving the Current Cooling Level................................................................................136
16.10 Fantray Insertion...............................................................................................................136
16.11 Default Cooling Values .....................................................................................................137
16.11.1 Vendor Defaults ...................................................................................................137
16.11.2 Structure of /etc/cmm/fantray.cfg.........................................................................138
16.11.3 Code Defaults......................................................................................................138
16.11.4 Restoring Defaults ...............................................................................................138
16.12 Firmware Upgrade/Downgrade.........................................................................................138
16.13 Chassis vs. Fantray .................................. ... ... .... ... ... ........................................................139
16.14 Legacy Method of Querying/Setting Fan Speed...............................................................139
17 SNMP..........................................................................................................................................140
17.1 CMM MIB..........................................................................................................................141
17.2 MIB Design .......................................................................................................................141
17.2.1 MIB Tree..............................................................................................................141
17.2.2 CMM MIB Objects................................................................................................142
17.3 SNMP Agent.....................................................................................................................158
17.3.1 Configuring the SNMP Agent Port.......................................................................158
17.3.2 Configuring the Agent to Respond to SNMP v3 Requests ..................................158
17.3.3 Configuring the Agent Back to SNMP v1.............................................................159
17.3.4 Setting up an SNMP v1 MIB Browser..................................................................159
17.3.5 Setting up an SNMP v3 MIB Browser..................................................................159
17.3.6 Changing the SNMP MD5 and DES Passwords..................................................159
17.4 SNMP Trap Utility.............................................................................................................160
17.4.1 Configuring the SNMP Trap Port.........................................................................160
17.4.2 Configuring the CMM to Send SNMP v3 Traps...................................................160
17.4.3 Configuring the CMM to Send SNMP v1 Traps...................................................160
17.5 Configuring and Enabling SNMP Trap Addresses..........................................................160
17.5.1 Configuring an SNMP Trap Address ...................................................................161
17.5.2 Enabling and Disabling SNMP Traps ..................................................................161
17.5.3 Alerts Using SNMP v3.........................................................................................161
17.5.4 Alert Using UDP Alert..........................................................................................161
17.6 SNMP Security .................................................................................................................162
17.6.1 SNMP v1 Security................................................................................................162
17.6.2 SNMP v3 Security - Authentication Protocol and Privacy Protocol .....................162
17.7 SNMP Trap Descriptions..................................................................................................162
17.8 Snmpd.conf File................................................................................................................163
18 CMM Scripting.............................................................................................................................164
18.1 CLI Scripting.....................................................................................................................164
18.1.1 Script Synchronization.........................................................................................164
18.2 Event Scripting..................................................................................................................164
18.2.1 Listing Scripts Associated With Events................................................................165
18.2.2 Removing Scripts From an Associated Event .....................................................165
18.3 Setting Scripts for Specific Individual Events....................................................................165
18.3.1 Event Codes.........................................................................................................165
18.3.2 Setting Event Action Scripts ................................................................................166
18.4 Running CMM Event Scripts on CMM State Transitions (Active/Standby/Ready/Not Ready)......166
18.4.1 Sensor Data Bits..................................................................................................166
18.4.2 Retrieving the Value of the Data Sensor Bits ......................................................167
18.4.3 CMMReadyTimeout Value...................................................................................168
18.4.4 CMM State Transition Model................................................................................168
18.5 FRU Control Script............................................................................................................169
18.5.1 Command line arguments....................................................................................170
18.5.2 Sample frucontrol file...........................................................................................170
19 Remote Procedure Calls (RPC) ..................................................................................................174
19.1 Setting Up the RPC Interface ...........................................................................................174
19.2 Using the RPC Interface...................................................................................................174
19.2.1 GetAuthCapability() .............................................................................................175
19.2.2 ChassisManagementApi() ...................................................................................175
19.2.3 ChassisManagementApi() Threshold Response Format.....................................181
19.2.4 ChassisManagementApi() String Response Format ...........................................181
19.2.5 ChassisManagementApi() Integer Response Format..........................................185
19.2.6 FRU String Response Format .............................................................................186
19.3 RPC Sample Code ...........................................................................................................187
19.4 RPC Usage Examples......................................................................................................187
20 RMCP..........................................................................................................................................190
20.1 RMCP References............................................................................................................190
20.2 RMCP Modes ...................................................................................................................190
20.3 RMCP User Privilege Levels ............................................................................................191
20.4 RMCP Discovery ..............................................................................................................191
20.5 RMCP Session Activation..................................................................................................191
20.6 RMCP Port Numbers........................................................................................................192
20.7 IPMB Slave Addresses.....................................................................................................193
20.8 CMM RMCP Configuration ...............................................................................................193
20.9 IPMI Commands Supported by CMM RMCP ....................................................................194
20.10 Configuring IPMI Command Privileges.............................................................................196
20.10.1 Sample cmdPrivillege.ini file................................................................................197
20.11 Completion Codes for the RMCP Messages....................................................................197
21 Command and Error Logging......................................................................................................199
21.1 Command Logging ...........................................................................................................199
21.2 Error Logging....................................................................................................................199
21.2.1 Error.log File ........................................................................................................199
21.2.2 Debug.log File......................................................................................................199
21.3 Cmmdump Utility ..............................................................................................................200
22 Application Hosting......................................................................................................................201
22.1 System Details..................................................................................................................201
22.2 Startup and Shutdown Scripts ..........................................................................................201
22.3 System Resources Available to User Applications...........................................................201
22.3.1 File System Storage Constraints .........................................................................201
22.3.2 RAM Constraints..................................................................................................202
22.3.3 Interrupt Constraints ............................................................................................203
22.4 RAM Disk Directory Structure...........................................................................................203
23 Updating CMM Software .............................................................................................................204
23.1 Key Features of the Firmware Update Process................................................................204
23.2 Update Process Architecture............................................................................................204
23.3 Critical Software Update Files and Directories .................................................................205
23.4 Update Package...............................................................................................................205
23.4.1 Update Package File Validation...........................................................................206
23.4.2 Update Firmware Package Version.....................................................................207
23.4.3 Component Versioning ........................................................................................207
23.5 saveList and Data Preservation........................................................................................207
23.6 Update Mode....................................................................................................................208
23.7 Update_Metadata File ......................................................................................................209
23.8 Firmware Update Synchronization/Failover Support ........................................................209
23.9 Automatic/Manual Failover Configuration.........................................................................209
23.9.1 Setting Failover Configuration Flag .....................................................................210
23.9.2 Retrieving the Failover Configuration Flag...........................................................210
23.10 Single CMM System .........................................................................................................210
23.11 Redundant CMM Systems................................................................................................210
23.12 CLI Software Update Procedure.......................................................................................210
23.13 Hooks for User Scripts......................................................................................................211
23.13.1 Update Mode User Scripts...................................................................................211
23.13.2 Data Restore User Scripts...................................................................................212
23.13.3 Example Task—Replace /home/scripts/myScript................................................212
23.14 Update Process ................................................................................................................213
23.15 Update Process Status and Logging ................................................................................215
23.16 Update Process Sensor and SEL Events.........................................................................215
23.17 Redboot* Update Process................................................................................................215
23.17.1 Required Set up...................................................................................................215
23.17.2 Update Procedure................................................................................................215
24 Updating Shelf Components........................................................................................................217
25 IPMI Pass-Through......................................................................................................................218
25.1 Overview...........................................................................................................................218
25.2 Command Syntax and Interface.......................................................................................218
25.2.1 Command Request String Format.......................................................................218
25.2.2 Response String...................................................................................................219
25.2.3 Usage Examples..................................................................................................219
25.3 SNMP ...............................................................................................................................219
25.3.1 Usage Example ...................................................................................................219
26 FRU Update Utility.......................................................................................................................221
26.1 Overview...........................................................................................................................221
26.2 FRU Update Architecture..................................................................................................221
26.3 FRU Update Process........................................................................................................222
26.4 FRU Recovery Process....................................................................................................222
26.5 FRU Verification................................................................................................................223
26.6 FRU Display......................................................................................................................223
26.7 Setting the Library Path And Invoking the Utility...............................................................223
26.8 FRU Update Command Line Interface .............................................................................223
26.9 Using the Location Switch ................................................................................................224
26.10 Updating the FRU.............................................................................................................225
26.11 Getting the Inventory ........................................................................................................225
26.12 Viewing the Contents of the FRU .....................................................................................225
26.13 Getting the Contents of the FRU ......................................................................................225
26.14 Dumping the Contents of the FRU....................................................................................225
27 FRU Update Configuration File ...................................................................................................227
27.1 Configuration File Format.................................................................................................227
27.2 File Format........................................................................................................................227
27.3 String Constraints.............................................................................................................227
27.4 Numeric Constraints.........................................................................................................228
27.5 Tags..................................................................................................................................228
27.6 Control Commands...........................................................................................................228
27.6.1 IFSET...................................................................................................................228
27.6.2 ELSE....................................................................................................................229
27.6.3 ENDIF..................................................................................................................229
27.6.4 SET......................................................................................................................229
27.6.5 CLEAR.................................................................................................................230
27.6.6 CFGNAME...........................................................................................................230
27.6.7 ERRORLEVEL.....................................................................................................230
27.7 Probing Commands..........................................................................................................230
27.7.1 PROBE................................................................................................................230
27.7.2 SYSTEM..............................................................................................................231
27.7.3 FRUVER..............................................................................................................231
27.7.4 BMCVER .............................................................................................................232
27.7.5 FOUND................................................................................................................232
27.8 Update Commands ..........................................................................................................233
27.8.1 FRUNAME...........................................................................................................233
27.8.2 FRUADDRESS....................................................................................................234
27.8.3 FRUAREA............................................................................................................234
27.8.4 MULTIREC ..........................................................................................................235
27.8.5 FRUFIELD ...........................................................................................................236
27.8.6 Input of Data ........................................................................................................240
27.9 Display Commands...........................................................................................................240
27.9.1 DISPLAY..............................................................................................................241
27.9.2 CONFIGURATION...............................................................................................241
27.9.3 Input Commands .................................................................................................241
27.9.4 MENU ..................................................................................................................241
27.9.5 MENUTITLE ........................................................................................................242
27.9.6 MENUPROMPT...................................................................................................242
27.9.7 PROMPT .............................................................................................................242
27.9.8 YES......................................................................................................................243
27.9.9 NO .......................................................................................................................243
27.10 Command Quick Reference .............................................................................................243
27.11 Example Configuration File...............................................................................................246
27.11.1 Chassis Update Version 0 ...................................................................................246
27.11.2 Chassis Update Version 1 ...................................................................................249
28 Unrecognized Sensor Types.......................................................................................................253
28.1 System Events Overview..................................................................................................253
28.2 System Events— SNMP Trap Support.............................................................................254
28.2.1 SNMP Trap Header Format.................................................................................254
28.2.2 SNMP Trap ATCA Trap Text Translation Format................................................254
28.3 SNMP Trap Raw Format ..................................................................................................255
28.3.1 SNMP Trap Control .............................................................................................256
28.3.2 System Events— SEL Support............................................................................256
28.3.3 Configuring SEL Format ......................................................................................257
29 Warranty Information...................................................................................................................259
29.1 Intel® NetStructure™ Compute Boards and Platform Products Limited Warranty ...........259
29.2 Returning a Defective Product (RMA) ..............................................................................259
29.3 For the Americas ..............................................................................................................260
29.3.1 For Europe, Middle East, and Africa (EMEA) .......................................................260
29.3.2 For Asia and Pacific (APAC)................................................................................260
30 Customer Support .......................................................................................................................262
30.1 Customer Support.............................................................................................................262
30.2 Technical Support and Return for Service Assistance .....................................................262
30.3 Sales Assistance ..............................................................................................................262
31 Certifications................................................................................................................................263
32 Agency Information......................................................................................................................264
32.1 North America (FCC Class A)...........................................................................................264
32.2 Canada – Industry Canada (ICES-003 Class A) (English and French-translated below).264
32.3 Safety Instructions (English and French-translated below) ..............................................265
32.3.1 English.................................................................................................................265
32.3.2 French..................................................................................................................265
32.4 Taiwan Class A Warning Statement.................................................................................266
32.5 Japan VCCI Class A.........................................................................................................266
32.6 Korean Class A.................................................................................................................266
32.7 Australia, New Zealand.....................................................................................................266
33 Safety Warnings..........................................................................................................................267
33.1 Mesures de Sécurité.........................................................................................................268
33.2 Sicherheitshinweise..........................................................................................................270
33.3 Norme di Sicurezza ..........................................................................................................272
33.4 Instrucciones de Seguridad..............................................................................................274
33.5 Chinese Safety Warning...................................................................................................276
Figures
1 BIST Flow Chart .........................................................................................................................32
2 Timing of BIST Stages................................................................................................................34
3 High Level SNMP/MIB Layout..................................................................................................140
4 CMM Custom MIB Tree............................................................................................................142
5 CMM Status State Diagram.......................................................................................................169
6 SNMPTrapFormat = 1 ..............................................................................................................255
7 SNMPTrapFormat = 2 ..............................................................................................................255
8 SNMPTrapFormat = 3 ..............................................................................................................255
Tables
1 Glossary .....................................................................................................................................16
2 CMM Synchronization ................................................................................................................22
3 CMM Status Event Strings (CMM Status) ..................................................................................30
4 BIST Implementation..................................................................................................................32
5 Processes Monitored...................................................................................................................42
6 No Action Recovery....................................................................................................................46
7 Successful Restart Recovery ......................................................................................................46
8 Successful Failover/Restart Recovery........................................................................................47
9 Successful Failover/Reboot Recovery........................................................................................48
10 Failed Failover/Reboot Recovery, Non-Critical ..........................................................................49
11 Failed Failover/Reboot Recovery, Critical ..................................................................................50
12 Existence Fault, Excessive Restarts, Escalate No Action..........................................................50
13 Excessive Restarts, Successful Escalate Failover/Reboot ........................................................51
14 Excessive Restarts, Failed Escalate Failover/Reboot, Non-Critical ...........................................52
15 Excessive Restarts, Failed Escalate Failover/Reboot, Critical...................................................53
16 Administrative Action..................................................................................................................53
17 Excessive Failover/Reboots, Administrative Action....................................................................54
18 Time to Delay and Number of Attempts .....................................................................................70
19 SETIP Interface Assignments when BOOTPROTO="static" ......................................................74
20 SETIP Interface Assignments when BOOTPROTO="dhcp".......................................................75
21 Location (-l) Keywords................................................................................................................77
22 CMM Targets..............................................................................................................................79
23 Dataitem Keywords for All Locations..........................................................................................80
24 Dataitem Keywords for All Locations Except System.................................................................80
25 Dataitem Keywords for All Locations Except Chassis and System ............................................81
26 Dataitem Keywords for Chassis Location...................................................................................85
27 Dataitem Keywords for Cmm Location.......................................................................................86
28 Dataitem Keywords for System Location....................................................................................92
29 Dataitem Keywords for FantrayN Location.................................................................................93
30 Dataitem Keywords Used with the Target Parameter.................................................................94
31 CMM Voltage and Temp Sensor Thresholds............................................................................102
32 CMM SEL Sensor Information..................................................................................................105
33 Sensor Targets.........................................................................................................................106
34 Threshold-Based Sensors: Voltage, Temp, Current, Fan.........................................................109
35 Hot Swap Sensor: Filter Tray HS, FRU Hot Swap....................................................................110
36 IPMB Link State Sensor: IPMB-0 Snsr [1-16]...........................................................................110
37 System Firmware Progress Event Strings (System Firmware Progress) .................................111
38 Watchdog 2 Sensor Event Strings............................................................................................113
39 CMM Redundancy....................................................................................................................115
40 CMM Trap Connectivity (CMM [1-2] Trap Conn)......................................................................115
41 CMM Failover ...........................................................................................................................115
42 CMM Synchronization...............................................................................................................116
43 BIST Event Strings ..............................................................................................................117
44 Chassis Data Module (CDM [1,2])............................................................................................118
45 Datasync Status...................................................................................................................118
46 CMM Status Event Strings (CMM Status) ................................................................................118
47 Process Monitoring Service Fault Event Strings (PMS Fault) ..................................................119
48 Process Monitoring Service Info Event Strings (PMS Info) ......................................................120
49 Chassis Events.........................................................................................................................120
50 IPMI Error Completion Codes and Enumerations.....................................................................121
51 System Health LED States.......................................................................................................123
52 CMM Health LED States...........................................................................................................124
53 CMM Hot Swap LED States ................................................................................................124
54 Ledstate Functions and Function Options................................................................................125
55 LED Event Sequence ...............................................................................................................126
56 Dataitems Used With FRU Target (-t) to Obtain FRU Information............................................131
57 CMM Cooling Table..................................................................................................................133
58 MIB II Objects - System Group............................................................................................141
59 MIB II - Interface Group.......................................................................................................141
60 System Location (1.3.6.1.4.1.343.2.14.2.10.1).........................................................................143
61 Shelf Location (Equivalent to Chassis) (1.3.6.1.4.1.343.2.14.2.10.2).......................................144
62 ShelfTable/shelfEntry (1.3.6.1.4.1.343.2.14.2.10.2.50.1).........................................................144
63 Cmm Location (1.3.6.1.4.1.343.2.14.2.10.3)............................................................................146
64 CmmTable/cmmEntry (1.3.6.1.4.1.343.2.14.2.10.3.51.1).........................................................149
65 CmmFruTable/cmmFruEntry (1.3.6.1.4.1.343.2.14.2.10.3.52.1)..............................................151
66 CmmFruTargetTable (1.3.6.1.4.1.343.2.14.2.10.3.53.1)..........................................................151
67 CmmPmsTable/cmmPmsEntry (1.3.6.1.4.1.343.2.14.2.10.3.54.1)..........................................151
68 Blade# Location (1.3.6.1.4.1.343.2.14.2.10.4.[1-16]) ...............................................................152
69 Blade#TargetTable/blade#TargetEntry (1.3.6.1.4.1.343.2.14.2.10.4.[1-16].51.1)....................153
70 Blade#FruTable/blade#FruEntry (1.3.6.1.4.1.343.2.14.2.10.4.[1-16].52.1)..............................154
71 Blade#FruTargetTable/blade#FruTargetEntry (1.3.6.1.4.1.343.2.14.2.10.4.[1-16].53.1).........155
72 [FanTray/pem]Table/[fanTray/pem]Entry (1.3.6.1.4.1.343.2.14.2.10.[5/6].51.1) ......................155
73 [FanTray/pem]TargetTable/[fanTray/pem]TargetEntry (1.3.6.1.4.1.343.2.14.2.10.[5/6].52.1)..156
74 [FanTray/pem]FruTable/[fanTray/pem]FruEntry (1.3.6.1.4.1.343.2.14.2.10.[5/6].53.1) ...........157
75 [FanTray/pem]FruTargetTable/[fanTray/pem]FruTargetEntry (1.3.6.1.4.1.343.2.14.2.10.[5/6].54.1)....158
76 SNMP v3 Security Fields For Traps .........................................................................................162
77 SNMP v3 Security Fields For Queries......................................................................................162
78 CMM State Transition Events and Event IDs ...........................................................................166
79 CMM Status Sensor Data Bits..............................................................................................167
80 Error and Return Codes for the RPC Interface.........................................................................177
81 Threshold Response Formats .............................................................................................181
82 String Response Formats.........................................................................................................181
83 Integer Response Formats.......................................................................................................185
84 FRU Data Items String Response Format................................................................................186
85 RPC Usage Examples.........................................................................................................187
86 RMCP Modes ......................................................................................................................190
87 RMCP Session Timers .......................................................................................................192
88 RMCP Slave Addresses......................................................................................................193
89 IPMI Commands Supported by CMM RMCP...........................................................................194
90 RMCP Message Completion Codes.........................................................................................198
91 Flash #1....................................................................................................................................202
92 Flash #2....................................................................................................................................202
93 Flash #3....................................................................................................................................202
94 Flash #4....................................................................................................................................202
95 List of Critical Software Update Files and Directories ..............................................................205
96 Contents of the Update Package..............................................................................................206
97 SaveList Items and Their Priorities.......................................................................................208
98 CMM Update Directions ...........................................................................................................209
99 Platform FRU Accessibility of the FRU Update Utility ..............................................................221
100 FruUpdate Utility Command Line Options................................................................................224
101 Probe Command Parameters.............................................................................................231
102 FRU Area String Specifications................................................................................................235
103 Multi-Record Selection Parameters..........................................................................................236
104 FRU Field First String Specifications........................................................................................237
105 FRU Field Maximum Allowed Lengths .....................................................................................237
106 FRU Field Second String Specification ....................................................................................238
107 Type Code Specification...........................................................................................................239
108 Command Quick Reference.....................................................................................................243
109 Probe Arguments Quick Reference..........................................................................................246
110 Results of Variable Settings ...............................................................................................256
111 Example CLI Commands..........................................................................................................277
Revision History
Date | Revision | Description
April 2005 | 007 | Firmware version 5.2
August 2004 | 006 | Firmware version 5.1.0.757
April 2004 | 005 | Version 5.1 TPS. Added Re-Enumeration Section. Added Process Monitoring Section.
January 2004 | 004.1 | Version 4.1 TPS
1 Introduction

1.1 Overview

The Intel® NetStructure™ MPCMM0001 Chassis Management Module is a 4U, single-slot CMM intended for use with AdvancedTCA* PICMG* 3.0 platforms. This document details the software features and specifications of the CMM. For information on hardware features of the CMM, refer to the Intel® NetStructure™ MPCMM0001 Hardware Technical Product Specification. Links to specifications and other material can be found in Appendix B, “Data Sheet Reference.”
The CMM plugs into a dedicated slot in compatible systems. It provides centralized management and alarming for up to 16 node and/or fabric slots as well as for system power supplies, fans, and power entry modules. The CMM may be paired with a backup for redundant use in high-availability applications.
The CMM is a special purpose single board computer (SBC) with its own CPU, memory, PCI bus, operating system, and peripherals. The CMM monitors and configures IPMI-based components in the chassis. When thresholds (such as temperature and voltage) are crossed or a failure occurs, the CMM captures these events, stores them in an event log, sends SNMP traps, and drives the Telco alarm relays and alarm LEDs. The CMM can query FRU information (such as serial number, model number, and manufacture date), detect the presence of components (such as a fan tray or CPU board), perform health monitoring of each component, control the power-up sequencing of each device, and control power to each slot via IPMI.
Assumptions: This document assumes some basic Linux* knowledge and the ability to use Linux text editors such as vi.

1.2 Terms Used in this Document

Table 1. Glossary
Acronym | Description
BIST | Built-In Self Test
CDM | Chassis Data Module
CLI | Command Line Interface
CMM | Chassis Management Module
DHCP | Dynamic Host Configuration Protocol
FFS | Flash File System
FIS | Flash Image System
FPGA | Field-Programmable Gate Array
FRU | Field Replaceable Unit
HS | Hot Swap
IPMB | Intelligent Platform Management Bus
IPMI | Intelligent Platform Management Interface
LED | Light Emitting Diode
MIB | Management Information Base
MIB II | RFC1213 - A standard Management Information Base for Network Management
PEM | Power Entry Module
PICMG | PCI Industrial Computer Manufacturers’ Group
RMCP | Remote Management Control Protocol
RPC | Remote Procedure Call
SBC | Single Board Computer
SDR | Sensor Data Record
SEL | System Event Log
ShMC | Shelf Management Controller
SNMP | Simple Network Management Protocol
SSH | Secure Shell
TFTP | Trivial File Transfer Protocol
UDP | User Datagram Protocol
WDT | Watchdog Timer
2 Software Specifications

2.1 Red Hat* Embedded Debug and Bootstrap (Redboot)

Upon initial power on, the CMM enters into the Redboot firmware to bootstrap the embedded environment. Upon execution, Redboot acts as a TFTP server and checks for a TFTP connection to a client. If a TFTP connection exists, Redboot will accept a firmware update that is pushed down from the client, check the firmware update for data integrity, and then write the update to the flash.
Note: Firmware updates using the Redboot TFTP method are supported for backwards compatibility. However, updating from within the OS using the CLI is the preferred method of updating CMM firmware. For information on the firmware update process, refer to Section 23, “Updating CMM Software” on page 204.
Under normal circumstances, Redboot runs the standard diagnostics, sets up memory, decompresses the OS kernel, and boots into that kernel.

2.2 Operating System

The CMM runs a customized version of embedded BlueCat* Linux* 4.0 on an Intel® 80321 processor with Intel® XScale® technology. Development support for BlueCat Linux is available on the web at http://www.lynuxworks.com.

2.3 Command Line Interface (CLI)

The Command Line Interface (CLI) connects to and communicates with the intelligent management devices of the chassis, boards, and the CMM itself. The CLI is an IPMI-based library of commands that can be accessed directly or through a higher-level management application. Administrators can access the CLI through Telnet, SSH, or the CMM’s serial port. Using the CLI, users can access information about the current state of the system (including current sensor values, threshold settings, recent events, and overall chassis health), access and modify shelf and CMM configurations, set fan speeds, and perform actions on a FRU. The CLI is covered in Section 8, “The Command Line Interface (CLI)” on page 71.

2.4 SNMP/UDP

The chassis management module supports both queries and traps over SNMP (Simple Network Management Protocol) v1 or v3. The SNMP version can be configured through the CLI. The default is SNMP v1. A MIB for the entire platform is included with the CMM. The CMM can send SNMP traps to up to five trap receivers.
Along with SNMP traps, the CMM sends UDP (User Datagram Protocol) alerts to port 10000. The content of these UDP alerts is the same as the SNMP traps. SNMP is covered in Section 17, “SNMP” on page 140.
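As a rough illustration of the receiving side of these UDP alerts, the sketch below opens a UDP socket on port 10000 and returns each datagram's sender and raw payload. It is written in Python on the assumption that a management station (not the CMM itself) runs it; the helper names `open_alert_socket` and `receive_alert` are hypothetical, and the payload is left undecoded because its layout is not described in this section.

```python
import socket

ALERT_PORT = 10000  # UDP port the CMM sends alerts to, per this section


def open_alert_socket(bind_addr="0.0.0.0", port=ALERT_PORT):
    """Create a UDP socket bound to the alert port (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_addr, port))
    return sock


def receive_alert(sock, bufsize=65535):
    """Block until one datagram arrives; return (sender_ip, raw_payload)."""
    payload, (sender_ip, _sender_port) = sock.recvfrom(bufsize)
    return sender_ip, payload
```

A receiver would typically call `receive_alert` in a loop and log each payload next to the corresponding SNMP trap.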

2.5 Remote Procedure Call (RPC) Interface

In addition to the console command-line interface, the CMM can be administered by custom remote applications via remote procedure calls (RPC). RPC is covered in Section 19, “Remote Procedure Calls (RPC)” on page 174.

2.6 RMCP

RMCP (Remote Management Control Protocol) is a protocol that defines a method to send IPMI packets over LAN. The RMCP server on the CMM can decode RMCP packets and forward the IPMI messages to the appropriate channels, including SBC blades, PEMs, and FanTrays, or to a local destination within the CMM. When a responding IPMI message arrives from SBC blades, PEMs, or FanTrays destined for an RMCP client, the RMCP server formats this IPMI message into an RMCP message and sends it through the designated LAN interface back to the originator. RMCP is covered in Section 20, “RMCP” on page 190.

2.7 Ethernet Interfaces

The CMM contains two Ethernet ports. The software can direct each of these ports to the front panel, the backplane, or the rear transition module (RTM). Information on configuring the Ethernet interfaces is covered in Section 8.3.1, “Setting IP Address Properties” on page 72.

2.8 Sensor Event Logs (SEL)

The AdvancedTCA CMM implements system event logs according to Section 3.5 of the PICMG 3.0 Specification. The SEL contained on the CMM is fully IPMI compliant.
2.8.1 CMM SEL Architecture
The MPCMM0001 uses a single flat SEL file stored locally in the /etc/cmm directory. The SEL maintains a list of all the sensor events in the shelf. Each of the managed devices may keep its own SEL records in local SELs, but the master copy for the shelf is maintained by the CMM.
The SEL is limited to 65536 bytes. To keep the SEL from filling up, which can cause loss of error logging, the CMM checks the SEL every 15 minutes. If the size of cmm_sel is greater than 40000 bytes, the SEL is archived in gzip format and saved in /home/log/SEL. The saved logs are named cmm_sel.0.gz, cmm_sel.1.gz, and so on, up to a maximum of 16 logs, after which they roll over.
Note: Archived files should NEVER be decompressed on the CMM as the resulting prolonged flash file writing could disrupt normal CMM operation and behavior. Using FTP, transfer the files to a different system before decompressing the archive using utilities such as gzip.
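The archiving rule described above can be sketched as a few constants and helper functions. This is an illustrative Python model only, not the CMM's implementation: the function names are hypothetical, and the rollover is assumed to wrap the archive index back to 0 after cmm_sel.15.gz, which the text does not state explicitly.

```python
import posixpath

SEL_MAX_BYTES = 65536         # hard limit on the SEL size
ARCHIVE_THRESHOLD = 40000     # archive once cmm_sel is larger than this
MAX_ARCHIVES = 16             # cmm_sel.0.gz through cmm_sel.15.gz
CHECK_INTERVAL_S = 15 * 60    # the CMM checks the SEL every 15 minutes
ARCHIVE_DIR = "/home/log/SEL"


def needs_archive(sel_size_bytes):
    """True when the SEL has grown past the archive threshold."""
    return sel_size_bytes > ARCHIVE_THRESHOLD


def archive_name(sequence):
    """Archive file name for the Nth archive (assumed wrap-around rollover)."""
    return "cmm_sel.{}.gz".format(sequence % MAX_ARCHIVES)


def archive_path(sequence):
    """Full path of the archive inside the archive directory."""
    return posixpath.join(ARCHIVE_DIR, archive_name(sequence))
```

The gap between the 40000-byte threshold and the 65536-byte limit leaves room for events logged between two 15-minute checks.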
2.8.2 Retrieving a SEL
To retrieve a SEL from the CMM, issue the following command:
cmmget [-l location] -d sel
Where location is one of {cmm, blade[1-14], fantray1, PEM[1-2]}. Even though the CMM uses a single flat SEL for system events, the ‘cmmget’ command will filter the SEL and only return events associated with the provided location. Also, some individual FRUs (e.g., blades) may keep their own local SELs.
2.8.3 Clearing the SEL
The following command will clear the SEL on both the active and the standby CMM:
cmmset -d clearsel -v clear
Note: Since the CMM uses a single flat SEL for system events, this command clears the entire shelf SEL, not just a filtered subset.
2.8.4 Retrieving the Raw SEL
To retrieve the SEL in its raw format from a location, issue the following command:
cmmget -l [location] -d rawsel
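For scripted retrieval, a wrapper might validate the location keyword before building the command line. The sketch below is a minimal Python example under that assumption; `sel_command` is a hypothetical helper, and the accepted locations are taken from the list given in Section 2.8.2. It uses only the `-l` and `-d` options and the `sel` and `rawsel` dataitems shown above.

```python
import re
from typing import List

# Locations listed in this section: cmm, blade[1-14], fantray1, PEM[1-2]
_LOCATION_RE = re.compile(r"^(cmm|blade([1-9]|1[0-4])|fantray1|PEM[12])$")


def sel_command(location: str, raw: bool = False) -> List[str]:
    """Build the argument list for retrieving the (raw) SEL of one location."""
    if not _LOCATION_RE.match(location):
        raise ValueError("unsupported SEL location: {!r}".format(location))
    dataitem = "rawsel" if raw else "sel"
    return ["cmmget", "-l", location, "-d", dataitem]
```

The resulting list can be passed to a process launcher on the CMM or joined with spaces for use over Telnet or SSH.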

2.9 Blade OverTemp Shutdown Script

The CMM software includes predefined script settings specifically for the MPCBL0001 board, which will automatically shut down a board when the “baseboard temp” sensor on that board crosses the upper critical threshold. This is done to prevent a runaway thermal event on the board from occurring. If this functionality is needed when using boards other than the MPCBL0001, the user will need to associate the name of the thermal sensor and the threshold with the board shutdown script:
cmmset -l bladeN -d majoraction -t [temp sensor name] -v overtempbladepoweroff [Blade Number]
Please refer to Section 18, “CMM Scripting” on page 164 for more information on associating a script with an event.
When using the CMM with boards other than the MPCBL0001, these settings can be left intact as long as no sensor named "baseboard temp" is associated with the board being used. If needed, to deactivate these settings for each physical slot, use the command:
cmmset -l bladeN -d majoraction -t "baseboard temp" -v none
where bladeN is the blade, corresponding to the physical slot number, on which to remove the automatic shutdown setting (blade[1-16]). Please refer to Section 18, “CMM Scripting” on page 164 for more information on removing script actions.
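Because the deactivation command must be issued once per physical slot, it can be convenient to generate the full set of command lines. The following Python sketch does only that; `overtemp_disable_commands` is a hypothetical helper, and it assumes the sensor name contains no double-quote characters.

```python
def overtemp_disable_commands(slots=range(1, 17), sensor="baseboard temp"):
    """Return one 'cmmset' command line per slot that removes the major action.

    The default slot range covers blade1 through blade16, as in the text.
    """
    template = 'cmmset -l blade{slot} -d majoraction -t "{sensor}" -v none'
    return [template.format(slot=n, sensor=sensor) for n in slots]
```

Each returned string matches the command shown above with bladeN replaced by a concrete slot, for example blade1.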
3 Redundancy, Synchronization, and Failover

3.1 Overview

The CMM supports redundant operation with automatic failover in a chassis using redundant CMM slots. In systems where two CMMs are present, one acts as the active shelf manager and the other as standby. Both CMMs monitor each other, and either can trigger a failover if necessary.
Data from the active CMM is synchronized to the standby CMM whenever any changes occur. Data on the standby CMM is overwritten. A full synchronization between active and standby CMMs occurs on initial power up, or any insertion of a new CMM.
The active CMM is responsible for shelf FRU information management when CMMs are in redundant mode.

3.2 Synchronization

To ensure critical files on the standby CMM match the data on the active CMM, the active CMM synchronizes its data with the standby CMM, overwriting any existing data on the standby CMM.
An exception to this is the password reset procedure, detailed in Section 9, “Resetting the Password” on page 99. When the password reset switch is activated on the standby CMM, the password will be synchronized to the active CMM.
The CMMs will initially fully synchronize data from the active to the standby CMM just after booting. An insertion of a new CMM will also cause a full synchronization from the active to the newly inserted standby. Date and time are synched every hour. Partial synchronization will also occur any time files are modified or touched via the Linux* “touch” command with the exception of all *.sif and *.bin files in the /etc/cmm directory.
The *.sif (ALL SIF files), and *.bin (SDR Files) files under /etc/cmm are synchronized only once (when the CMMs establish communication). A 'touch' on those files at any later time will not perform a sync operation. Also, any updates to these files always happen as part of the software updates and not in isolation.
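The touch exception can be expressed as a small predicate. This Python sketch is illustrative only: `touch_triggers_sync` is a hypothetical name, and it assumes the given path already belongs to the set of synchronized files, so it only decides whether a later modification or touch would start a partial synchronization.

```python
import fnmatch
import posixpath


def touch_triggers_sync(path):
    """False for *.sif and *.bin files directly under /etc/cmm, True otherwise.

    Assumes 'path' is one of the synchronized files; SIF and SDR files are
    synchronized only once, when the CMMs establish communication.
    """
    directory, name = posixpath.split(path)
    if directory == "/etc/cmm" and (
        fnmatch.fnmatchcase(name, "*.sif") or fnmatch.fnmatchcase(name, "*.bin")
    ):
        return False
    return True
```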
Note: During synchronization, the health event LEDs on the standby CMM may blink on and off as the health events that were logged in the SEL are synchronized.
Below is a list of items that are synchronized between CMMs. During a full synchronization, all of these files and data are synchronized. A change to any of these files results in that file being synched. The active CMM overwrites these files on the standby CMM.
There are two "levels" of files that get synchronized. In order to normally manage the chassis, the priority 1 files must be synchronized after power up or installation of a brand new CMM into the chassis. It is absolutely necessary that a standby CMM has the priority one files synched before a successful failover can occur. When a brand new CMM boots the first time as a standby, if a CMM failover is forced before all priority 1 data items are synchronized to the standby CMM, the standby CMM can still become the active CMM but may not be able to properly manage the FRUs in the chassis.
Table 2. CMM Synchronization
File(s) or Data | Description | Path | Priority
date and time | Date and time | IPMB | 1
IP Address Settings | CMM eth1, eth1:1, and eth0 IP address settings to allow CMMs to discover the other’s IP information. | IPMB | 1
/etc/cmm.cfg | CMM’s main configuration file | Ethernet | 1
/etc/cmm/cmm_sel | System SEL | Ethernet | 1
/etc/cmm/sensors.ini | Sensor Set Values | Ethernet | 1
Ekey Controller Structures | Ekey Controller Structures | Ethernet | 1
Bused EKey Token info | Bused EKey Token info | Ethernet | 1
IPMB User States | IPMB User States | Ethernet | 1
Fan States | Fan States | Ethernet | 1
Cooling State | Cooling State Information | Ethernet | 1
User LED States | User LED States | Ethernet | 1
SDR structures and SIPI Controller Info | SDR structures and SIPI Controller Info | Ethernet | 1
PHM FRU state, Power Usage and Power Info | PHM FRU state, Power Usage and Power Info | Ethernet | 1
FIM FRU Cache (Local and Temp) | FIM FRU Cache (Local and Temp) | Ethernet | 1
SEL Time | SEL Time | IPMB | 1
SEL Events | Individual SEL Events | IPMB | 1
/etc/cmm/fantray.cfg | Fantray settings needed by cooling manager | Ethernet | 1
/etc/cmm.ini | Provides configuration values like the bus mapping | Ethernet | 2
/etc/passwd | Password file | Ethernet | 2
/etc/shadow | Password file | Ethernet | 2
/etc/cmdPrivillege.ini | Provides privilege related configuration values for RMCP | Ethernet | 2
/etc/cmm/*.bin | All SDR Files | Ethernet | 2
/etc/cmm/*.sif | All SIF Files | Ethernet | 2
/etc/var/snmpd.conf | SNMP configuration files | Ethernet | 2
/etc/snmpd.conf | SNMP configuration files | Ethernet | 2
/home/scripts | Entire user scripts area | Ethernet | 2
Prompt file | Prompt file | Ethernet | 2
/etc/actionscripts.cfg | Event action settings | Ethernet | 2
Issues files | Issues files | Ethernet | 2
/usr/local/cmm/temp/pmssync.ini | Recovery Action and escalation action for all the monitored processes except monitor process | Ethernet | 2
/usr/local/cmm/temp/pmsshadowsync.ini | Recovery action and escalation action for monitor process | Ethernet | 2
Note: The /.rhosts file is used for synchronization and should NEVER be modified.

3.3 Heterogeneous Synchronization

Beginning in version 5.2 firmware, the CMM can synchronize data between differing CMM versions. The firmware delineates synchronization from firmware versioning, thus allowing seamless synchronization between all CMM versions. A form of internal data versioning maintained by the CMM helps achieve this.
Note: SDR/SIF and user scripts differ slightly in synchronization architecture as described below.
3.3.1 SDR/SIF Synchronization
Sensor Data Records (SDRs) and Sensor Information Files (SIFs) will be synchronized only between CMMs having the same version for this data item (even if the CMM firmware versions differ).
3.3.2 User Scripts Synchronization and Configuration
By default, user scripts are synchronized only between CMMs with the same firmware version. The user can control user scripts synchronization irrespective of CMM version differences by modifying the value of a configuration flag, "SyncUserScripts" (in the CMM configuration file, cmm.cfg, under /etc). The configuration flag can be modified using the cmmget/cmmset commands. This flag can be read/set through any of the CMM interfaces (i.e., CLI, SNMP, and RPC).
The value of this flag is consulted only when the CMM firmware versions differ. Between CMMs with the same firmware version, the user scripts directory continues to be synchronized and this flag is ignored.
3.3.2.1 Setting User Scripts Sync Configuration Flag
To set the value of the Scripts Synchronization configuration flag, the following CMM command is used:
cmmset -l cmm -d syncuserscripts -v [equal/upgrade/downgrade/always]
Where:
equal: Synchronizes user scripts only when the CMM versions are the same. This is the default value.
upgrade: Synchronizes user scripts only when the other CMM has a newer firmware version.
downgrade: Synchronizes user scripts only when the other CMM has an older firmware version.
always: Synchronizes user scripts irrespective of version differences.
3.3.2.2 Retrieving User Scripts Sync Configuration Flag
To retrieve the value of the Scripts Synchronization configuration flag, the following CMM command is used:
cmmget -l cmm -d syncuserscripts
The value returned will be one of: Equal, Upgrade, Downgrade, Always, or Error on failure.
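The effect of the four flag values can be summarized as a decision function. The Python sketch below is an interpretation of the descriptions in Section 3.3.2, not the CMM's code: `should_sync_user_scripts` is a hypothetical name, firmware versions are modeled as integer tuples (an assumption), and "other" refers to the partner CMM.

```python
from typing import Tuple

Version = Tuple[int, ...]  # e.g. (5, 2); the tuple form is an assumption


def should_sync_user_scripts(flag: str, local: Version, other: Version) -> bool:
    """Decide whether user scripts are synchronized to the other CMM."""
    if local == other:
        return True  # same firmware version: always synchronized, flag ignored
    flag = flag.lower()
    if flag == "always":
        return True
    if flag == "upgrade":
        return other > local   # the other CMM runs newer firmware
    if flag == "downgrade":
        return other < local   # the other CMM runs older firmware
    if flag == "equal":
        return False           # versions differ, so no synchronization
    raise ValueError("unknown syncuserscripts value: {!r}".format(flag))
```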
3.3.3 Synchronization Requirements
For synchronization to occur:
The CMMs must be able to communicate with each other over their dedicated IPMB. The CMMs use a heartbeat via their dedicated IPMB to determine if they can communicate with each other over IPMB.
An Ethernet connection must exist between the two CMMs. The CMMs must be able to ping each other via Ethernet for synchronization to be successful. This can be a connection through the Ethernet switches in the chassis, which requires both switches to be present in the chassis; a connection through an external Ethernet switch connected to the front ports of the CMM pair; or a crossover cable connecting the two front ports of the CMM pair. If synchronization fails on eth1, then it will be attempted on eth0. If the CMMs cannot successfully ping each other via eth0 or eth1, then synchronization between the CMMs cannot occur.
A failure of any priority 1 synchronization will result in a health event being logged in the CMM SEL and will inhibit a failover from occurring.
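The requirements above amount to a simple selection rule: the IPMB heartbeat must be healthy, and eth1 is tried before eth0. The Python sketch below models that rule with boolean inputs; `pick_sync_interface` is a hypothetical helper, and how the heartbeat and ping results are obtained is outside its scope.

```python
from typing import Optional


def pick_sync_interface(ipmb_heartbeat_ok: bool,
                        eth1_ping_ok: bool,
                        eth0_ping_ok: bool) -> Optional[str]:
    """Return the Ethernet interface to synchronize over, or None if impossible."""
    if not ipmb_heartbeat_ok:
        return None      # CMMs cannot communicate over their dedicated IPMB
    if eth1_ping_ok:
        return "eth1"    # eth1 is attempted first
    if eth0_ping_ok:
        return "eth0"    # fall back to eth0 when eth1 fails
    return None          # no Ethernet path: synchronization cannot occur
```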

3.4 Initial Data Synchronization

It is absolutely necessary that a standby CMM has the priority one files synched before a successful failover can occur. A standby CMM can still become active if all priority one synchronization has not been completed, but it may not be able to properly manage all the FRUs in the chassis.
The CMM implements the “Datasync Status” sensor to determine the state of synchronization and whether synchronization has completed successfully.
3.4.1 Initial Data Sync Failure
If the CMM encounters any failure during data synchronization, it marks the data synchronization as failed, logs a SEL event, and sends an SNMP trap. Duplicate failures are not reported multiple times. As soon as the CMM is out of the failure condition, it resets the data synchronization failure state.
The CMM will continue trying to synchronize as long as there are two CMMs present in the chassis and they are able to communicate via their cross-connected IPMB.

3.5 Datasync Status Sensor

A sensor named “Datasync Status” makes the Datasync state information available to the user. This sensor tracks the status of the Datasync module and makes that status available through the various CMM interfaces. It is used to query the data synchronization state and to log a SEL event when initial synchronization completes. It is a discrete OEM sensor with status bits representing the state of different parts of the Datasync module.
Note: The Datasync Status sensor can only be queried through the active CMM.
3.5.1 Sensor bitmap
When the Datasync module starts for the first time in a dual CMM system, and whenever the CMM changes between Active and Standby, the status bits are all cleared to 0x0000.
Bit 0 (Running) is set when the datasync module is active.
Bit 1 (P1Done) is set when the priority 1 data syncs are done, and cleared when priority 1 data needs to be synced.
Bit 2 (P2Done) is set when the priority 2 data syncs are done, and cleared when priority 2 data needs to be synced.
Bit 3 (InitSyncDone) is set when both priority 1 and priority 2 data syncs are done, and stays set (latches) until the CMM changes between Active and Standby, or loses contact with the partner CMM.
Bit 4 (SyncError) is set if an error was detected, and cleared when no data items have errors.
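The bit assignments above map directly onto a small decoder. The following Python sketch is an illustration, not CMM code: `decode_datasync_status` and the dictionary keys are hypothetical names, while the bit positions are those listed in this section.

```python
RUNNING = 1 << 0         # Bit 0: datasync module is active
P1_DONE = 1 << 1         # Bit 1: priority 1 data syncs are done
P2_DONE = 1 << 2         # Bit 2: priority 2 data syncs are done
INIT_SYNC_DONE = 1 << 3  # Bit 3: initial sync complete (latched)
SYNC_ERROR = 1 << 4      # Bit 4: an error was detected


def decode_datasync_status(value):
    """Split a Datasync Status sensor value into named boolean flags."""
    return {
        "running": bool(value & RUNNING),
        "p1_done": bool(value & P1_DONE),
        "p2_done": bool(value & P2_DONE),
        "init_sync_done": bool(value & INIT_SYNC_DONE),
        "sync_error": bool(value & SYNC_ERROR),
    }
```

For example, a value of 0x000d decodes to running, priority 2 done, and initial sync done, with priority 1 data still waiting to be synced.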
3.5.2 Event IDs
The “Datasync Status” sensor uses event IDs in the range 0x420 to 0x42f. The following event IDs are used to log events for this sensor. These event IDs can be used to associate scripts with the respective events.
Event Event ID
Initial Data Synchronization complete 0x420 (1056)
3.5.3 Querying the Datasync Status
The status of the Datasync Status sensor can be queried using the following CLI command:
cmmget -l cmm -t "Datasync Status" -d current
Output of the command is as follows:
Initial State:
The current value is 0x0001
Initial Data Synchronization is not complete.
There is Priority 1 data to sync.
There is Priority 2 data to sync.
No Data Synchronization problems known.
Initial Data Sync incomplete, Priority 1 data synced, Priority 2 data not synced:
The current value is 0x0003
Initial Data Synchronization is not complete.
Priority 1 data is synced.
There is Priority 2 data to sync.
No Data Synchronization problems known.
Initial Data Sync complete, Priority 1 and Priority 2 data synced:
The current value is 0x000f
Initial Data Synchronization is complete.
Priority 1 data is synced.
Priority 2 data is synced.
No Data Synchronization problems known.
Initial Data Sync failure:
The current value is 0x0013
Initial Data Synchronization is not complete.
Priority 1 data is synced.
There is Priority 2 data to sync.
Data Synchronization has encountered a problem in synchronizing data.
Initial Data Sync complete and Priority 1 data changed:
The current value is 0x000d
Initial Data Synchronization is complete.
There is Priority 1 data to sync.
Priority 2 data is synced.
No Data Synchronization problems known.
Data Sync failure of Priority 1 data after Initial Data Sync:
The current value is 0x001d
Initial Data Synchronization is complete.
There is Priority 1 data to sync.
Priority 2 data is synced.
Data Synchronization has encountered a problem in synchronizing data.
Data Sync returns to normal after a Data Sync failure:
The current value is 0x000f
Initial Data Synchronization is complete.
Priority 1 data is synced.
Priority 2 data is synced.
No Data Synchronization problems known.
Single CMM:
The current value is 0x0000
Datasync disabled - there is no partner CMM present.
3.5.4 SEL Event
The Datasync Status sensor generates the following two SEL events:
When the active CMM is or becomes the only CMM, or the active CMM loses communication
with the standby CMM, the following event will be logged:
[Day] [Month] [Date] [Time] [Year]
CMM[n]: CMM Datasync Status Initial Data Synchronization is complete. Deasserted
The following event will be logged in the SEL when initial data synchronization is complete:
[Day] [Month] [Date] [Time] [Year]
CMM[n]: CMM Datasync Status Initial Data Synchronization is complete. Asserted
Where n: The number of the CMM generating the event.
3.5.5 SNMP Trap
The Datasync Status sensor generates the following two SNMP traps:
When the active CMM is or becomes the only CMM, or the active CMM loses communication
with the standby CMM, the following SNMP trap will be generated.
[Month] [Date] [Time] [hostname] snmptrapd[xxxxx]: [IP Address]: Enterprise Specific Trap (25) Uptime: [Time], SNMPv2-SMI::enterprises.343.2.14.1.5 = STRING: "Time : [Day] [Month] [Date] [Time] [Year], Location : [location] , Chassis Serial # : [xxxxxxxx], Board : CMM[x] , Sensor : Datasync Status , Event : Initial Data Synchronization complete: Deasserted "
When initial data synchronization is complete, the following SNMP trap is generated:
[Month] [Date] [Time] [hostname] snmptrapd[xxxxx]: [IP Address]: Enterprise Specific Trap (25) Uptime: [Time], SNMPv2-SMI::enterprises.343.2.14.1.5 = STRING: "Time : [Day] [Month] [Date] [Time] [Year], Location : [location] , Chassis Serial # : [xxxxxxxx], Board : CMM[x] , Sensor : CMM[x]:Datasync Status , Event : Initial Data Synchronization is complete. Asserted "
3.5.6 System Health
The “Datasync Status” sensor does not contribute to the system health. However, sync failures are captured by the “File Sync Failure” sensor, which does contribute to the system health.

3.6 CMM Failover

Once information is synchronized between the redundant CMMs, the active CMM constantly monitors its own health as well as the health of the standby CMM. If one of the scenarios listed in the sections that follow occurs, the active CMM automatically fails over to the standby CMM so that no management functionality is lost at any time.
3.6.1 Scenarios That Prevent Failover
The following are reasons a failover can NOT occur:
The active CMM can NOT communicate with the standby CMM via their IPMB bus.
Not all priority 1 data has been completely synchronized between the CMMs.
To determine the active CMM at any time, use the CLI command:
cmmget -l cmm -d redundancy
This command will output a list stating if both CMMs are present, which one is the active CMM, and which CMM you are logged in to. CMM1 is the CMM on the left when looking from the front of the chassis, and CMM2 is on the right.
3.6.2 Scenarios That Failover to a Healthier Standby CMM
The scenarios listed below can only cause a failover if the standby CMM is in a healthier state than the active CMM. The health of a CMM is expressed as a CMM health score, which is the sum of the weights of the conditions listed below that are currently active on that CMM. A health score is computed for each CMM whenever any of these conditions occurs on the active CMM. Each condition has a default weight of 1, so by default all conditions are equally important in causing a failover.
To determine if a failover is necessary when one of these conditions occurs, the active CMM computes its own CMM health score and requests the health score of the standby CMM. A lower score indicates a healthier CMM: if the score of the standby CMM is less than the score of the active CMM, a failover occurs. If a failover does not occur, the CMM SEL will contain an entry indicating the reason.
1. SNMPTrapAddress1 ping failure:
The active CMM will fail over to the standby CMM if the active CMM cannot ping its first SNMP trap address (SNMPTrapAddress1) over any of the available Ethernet ports, but the standby CMM can. The trap address is set using the command:
cmmset -l cmm -d snmptrapaddress1 -v [ip address]
Only a ping failure of the first SNMP trap address (SNMPTrapAddress1) can cause a failover. SNMPTrapAddress2 through SNMPTrapAddress5 are not ping tested.
Note: The interval between pings to the first trap address can vary from one second to approximately 20 seconds.
2. Critical events on the active CMM:
The active CMM has critical events for any of the CMM sensors (not critical chassis or blade events) and the standby CMM does not. If both CMMs have critical CMM events, then the number of major and minor CMM events is examined to decide if a failover should occur. The number of major events is compared, and if they are equal, the number of minor events is used.
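The health score comparison described in this section can be sketched as follows. This is a minimal Python illustration, not the CMM implementation: the condition names and the function names are hypothetical, and only the default weight of 1 and the "lower standby score wins" rule come from the text.

```python
DEFAULT_WEIGHT = 1  # default weight of each condition, per this section


def health_score(active_conditions, weights=None):
    """Sum the weights of the conditions currently active on one CMM."""
    weights = weights or {}
    return sum(weights.get(name, DEFAULT_WEIGHT) for name in active_conditions)


def should_fail_over(active_cmm_score, standby_cmm_score):
    """A failover occurs only if the standby CMM has the lower score."""
    return standby_cmm_score < active_cmm_score


# Hypothetical condition names, used for illustration only.
active = health_score({"trap_address_ping_failure"})
standby = health_score(set())
print(should_fail_over(active, standby))
```

With equal scores on both CMMs the comparison is false, so no failover takes place.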
3.6.3 Manual Failover
The following command can be issued to the active CMM to manually cause a failover to the standby CMM:
cmmset -l cmm -d failover -v [1/any]
Where:
1: Will fail over only to a CMM with the same or newer version of firmware.
any: Will fail over to any version of firmware.
A manual failover can only be initiated on the active CMM. A failover will only occur if the standby CMM is at least as healthy as the active CMM. Once the command executes, the former standby CMM immediately becomes the active CMM.
If the failover cannot occur, the CLI indicates the reason, and a SEL event is recorded.
In addition, opening the ejector latch on the active CMM will initiate a failover, but only if the standby is at least as healthy as the active.
3.6.4 Scenarios That Force a Failover
The following scenarios cause a failover as long as the standby CMM is operational, even when it is less healthy than the active:
The active CMM is pulled out of the chassis.
The active CMM’s healthy signal is de-asserted.
A “reboot” command issued to the active CMM.
The front panel alarm quiet switch button on the active CMM is pushed for more than five
seconds. If the button continues to be pressed for more than 10 seconds, the CMM does not reset.

3.7 CMM Ready Event

The CMM Ready Event is a notification mechanism that informs the user when all CMM modules are fully up and running. The CMM is ready to process any request after receiving this event.
The CMM uses the "CMM Status" sensor when generating the CMM Not Ready event. Please refer to Table 46, “CMM Status Event Strings (CMM Status)” on page 118 for CMM status event strings.
Table 3. CMM Status Event Strings (CMM Status)
Event String              Event Code   Event Severity
“CMM is not ready.”       1024         Minor
“CMM is ready.”           1025         OK
“CMM is Active”           1026         OK
“CMM is Standby”          1027         OK
“CMM ready timed out”     1028         Minor
A CMM Not Ready Assertion SEL event is generated on a CMM when it transitions from standby mode to active mode during a failover or on the active CMM on power up. The event is only generated on the newly active CMM. The “CMM is Ready” event is generated after all CMM modules (board wrapper processes) are up and running and the SNMP daemon is active.

Built-In Self Test (BIST) 4

The CMM provides for a Built-In Self Test (BIST). The test is run automatically after power up. This test detects flash corruption as well as other critical hardware failures.
Results of the BIST are displayed on the console through the serial port during boot time. Results of BIST are also available through the CLI if the OS successfully boots. If the BIST detects a fatal error, the CMM is not allowed to function as an active CMM.

4.1 BIST Test Flow

The following state diagram shows the order of the tests RedBoot runs following a power-up or front-panel reset. In every state before the CMM becomes active, if there is an error, RedBoot logs the error event into the EEPROM, routes the error message to the serial port, and continues booting. If execution hangs before the OS loads due to the nature of the error, the CMM hangs. If the OS successfully boots, it alerts users to any errors that occurred during boot.
Figure 1. BIST Flow Chart
[Flow chart: Power Up/Reset; RedBoot (RB) image and backup RB image checksum (RB image pass: jump to run from RB; RB image fail: run from backup RB); FPGA image and backup FPGA image checksum (load backup FPGA image if the backup FPGA image passes and the FPGA image fails, otherwise load FPGA image); Memory Test; FPGA, DS1307, NIC; IPMB Bus Test; BlueCat Image Checksum; BlueCat loaded (active CMM).]
The BIST has been broken down into stages consisting of groups of tests that run at certain times throughout the boot process. The following table shows the different BIST stages and the tests associated with each stage:
Table 4. BIST Implementation
Boot-BIST Early-BIST Mid-BIST Late-BIST
RedBoot image checksum
FPGA image checksum FPGA version check IPMB bus test Base memory test DS1307 RTC test
Strobe WDT to extend timeout period
Extended memory test BlueCat image checksum
Local PCI bus/NIC presence test

4.2 Boot-BIST

The Boot-BIST code is executed at a very early stage of the RedBoot bootstrap, just before FPGA programming and memory module initialization. Boot-BIST verifies the checksums of the RedBoot image and the FPGA image. A checksum error is reported if the calculated checksum does not match the checksum stored in the FIS directory.
Boot-BIST also performs a Base Memory Test on the first 1 MByte of memory. Whenever there is an error, BIST informs the user by printing a warning message on the console terminal and logs the event to the event-log area.

4.3 Early-BIST

The early BIST stage extends the reset timeout period on the watchdog timer (MAX6374) by strobing GPIO7 on FPGA1. This prevents any possible hardware reset during the BIST process. The watchdog timer is enabled after the ADM1026 GPIO initialization and disabled once it reaches the RedBoot console. The OS enables the watchdog timer again and starts the strobing thread at the kernel level.

4.4 Mid-BIST

This stage of BIST performs the Extended Memory Test to scan for and diagnose possible bit errors in memory. It scans from 1 MByte to 128 MByte. It does not test the memory below 1 MByte because a portion of RedBoot is already loaded and resident there.
The memory test includes the walking ones test, 32-bit address test, and 32-bit inverse address test. Furthermore, voltage and temperature readings are verified to lie within the ranges the hardware tolerates. The FPGA firmware version is checked, and an alert is raised if an older version of an FPGA image is detected. Also, the system date and time are read from the real-time clock and displayed on the console terminal. NIC presence is also checked here, though the NIC self-test happens later when the driver is loaded.

4.5 Late-BIST

Late-BIST disables the watchdog timer once RedBoot is fully loaded. It then verifies the checksum of the OS image with a stored checksum at the top of flash memory, before proceeding with the boot script execution.
The following diagram shows when the various stages of BIST are performed during the boot cycle.
Figure 2. Timing of BIST Stages
[Figure: timing diagram showing when Boot-BIST, Early-BIST, Mid-BIST, and Late-BIST run relative to the boot steps: HAL initialization (processor, cache, serial port); FPGA programming; memory parameters initialization; module initialization (flash, zlib, ide); module initialization (ethernet interface); display copyright banner and execute boot script.]

4.6 QuickBoot Feature

When enabled, this feature skips all the diagnostic tests in Mid-BIST and Late-BIST. However, the Flash Test and Base Memory Test in Boot-BIST still execute, even with this feature enabled. The default setting is QuickBoot enabled.
When the QuickBoot feature is disabled, the user can individually enable or disable the Extended Memory Test (in Mid-BIST) and the OS Image Checksum Test (in Late-BIST).
4.6.1 Configuring QuickBoot
RedBoot> fconfig
...
Enable QuickBoot during BIST: false
Execute extended memory test: true
OS image checksum at boot: true
...
Update RedBoot non-volatile configuration - are you sure (y/n)? y
By default, 'Enable QuickBoot during BIST' is true. When 'Enable QuickBoot during BIST' is set to false, two additional options are displayed in the configuration menu: 'Execute extended memory test' and 'OS image checksum at boot'. The user can selectively enable one or both tests while QuickBoot is disabled. Neither option is shown in the configuration menu if QuickBoot is enabled. Changes take effect at the next boot.
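The interaction between the three settings can be summarized in a few lines. The Python sketch below is only an illustration of the rules stated in this section; the function name is hypothetical, and the default values chosen for the two optional tests are an assumption (the manual states only the QuickBoot default).

```python
def optional_bist_tests(quickboot=True, extended_memory_test=True,
                        os_image_checksum=True):
    """Return the optional BIST tests that would run for a given configuration.

    quickboot defaults to True, as stated in this section. The defaults for
    the two per-test options are assumptions made for this sketch.
    """
    if quickboot:
        # QuickBoot skips the Mid-BIST and Late-BIST diagnostics entirely,
        # so the per-test options have no effect.
        return []
    tests = []
    if extended_memory_test:
        tests.append("Extended Memory Test (Mid-BIST)")
    if os_image_checksum:
        tests.append("OS Image Checksum Test (Late-BIST)")
    return tests


print(optional_bist_tests(quickboot=False, os_image_checksum=False))
```

The Boot-BIST checks are not modeled here because they run regardless of the QuickBoot setting.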

4.7 Event Log Area and Event Management

Errors detected by the BIST are stored in an event log. The event-log area is designed to have up to 269 entries. Each entry is 14 bytes. The event-log area is located in EEPROM on the CMM. The BIST can place entries into the event log until it becomes full. Once full, any new entries will be lost. The BIST event log is cleared by the OS once the OS logs any BIST errors into the SEL.
At OS start-up, the CMM reads the contents of BIST results in the reserved event log area and stores the errors as entries in the CMM SEL. This allows the CMM application to take the appropriate action based upon the SEL events as a result of RedBoot BIST tests. If there is not enough space to log the events in the CMM SEL, no results are logged to the CMM SEL.
The BIST event log is erased only after the event log is stored into the CMM SEL. Event strings for BIST events are listed in Section 11, “Health Events” on page 104.
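The behavior of the event-log area (fixed-size entries, a fixed capacity, new entries dropped once full, and clearing only after the contents have been handed to the SEL) can be modeled as below. This is a hedged Python sketch: the class and method names are hypothetical, and the real log lives in EEPROM rather than in a Python list.

```python
ENTRY_SIZE = 14     # bytes per entry, per this section
MAX_ENTRIES = 269   # capacity of the event-log area, per this section


class BistEventLog:
    """Illustrative model of the BIST event-log area."""

    def __init__(self):
        self.entries = []

    def append(self, entry):
        """Store one entry; return False if the log is full and it was lost."""
        if len(entry) != ENTRY_SIZE:
            raise ValueError("each entry must be exactly 14 bytes")
        if len(self.entries) >= MAX_ENTRIES:
            return False
        self.entries.append(bytes(entry))
        return True

    def drain(self):
        """Hand all entries to the caller (e.g. for the SEL) and clear the log."""
        entries, self.entries = self.entries, []
        return entries


print(ENTRY_SIZE * MAX_ENTRIES)  # total bytes needed for a full log
```

At 269 entries of 14 bytes each, a full log occupies 3766 bytes.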

4.8 OS Flash Corruption Detection and Recovery Design

The OS is responsible for flash content integrity at runtime. Flash monitoring under the OS environment can be divided into two parts: monitoring static images and monitoring dynamic images.
Static images refer to the RedBoot image, FPGA image and BlueCat image in flash. These images should not change throughout the lifetime of the CMM unless they are purposely updated or corrupted. The checksum for these files is written into flash when the images are uploaded.
Dynamic image refers to the OS Flash File System (JFFS2). This image dynamically changes throughout the runtime of the OS.
4.8.1 Monitoring the Static Images
A static test is run every 24 hours during CMM operation. The static test reads each static image (RedBoot, FPGA, BlueCat), calculates the image checksum, and compares with the checksum in the RedBoot configuration area (FIS). If the checksum test fails, the error is logged to the CMM SEL.
4.8.2 Monitoring the Dynamic Images
For monitoring the dynamic images, the CMM leverages the corruption detection ability from the JFFS(2) flash file system. At OS start-up, the CMM executes an initialization script to mount the JFFS(2) flash partitions (/etc and /home). If a flash corruption is detected, an event is logged to the CMM SEL.
During normal OS operation, flash corruption during file access can also be detected by the JFFS(2) and/or the flash driver. If a flash corruption is detected, an event is logged to the CMM SEL.
4.8.3 CMM Failover
If during normal OS operation a critical error occurs on the active CMM, such as a flash corruption, the standby CMM is checked to see if it is in a healthier state. If the standby CMM is in a healthier state, then a failover will occur. See Section 3, “Redundancy, Synchronization, and Failover” on page 21.

4.9 BIST Test Descriptions

4.9.1 Flash Checksum Test
This test verifies that the RedBoot image and FPGA image are not corrupted. It calculates the CRC32 checksum of the RedBoot image, then compares it with the image checksum stored in the FIS directory. If the two do not match, BIST switches to the backup image. If a checksum mismatch is found for the FPGA image, BIST loads the backup image to program the FPGA device.
4.9.2 Base Memory Test
This test writes the data pattern of 55AA55AA into every 4 bytes of the memory below 1 MByte. Its objective is to verify the wire connectivity of address and data pins between the memory module and the processor. The test first writes the data pattern into the complete first 1 MByte, then verifies the written data pattern by reading them from the memory module. If the data pattern mismatches, the test logs the error event into the event-log area and routes the error message to the serial port.
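The write-then-verify pass described above can be sketched on a simulated memory. The Python code below is an illustration under stated assumptions: memory is modeled as a sequence of 32-bit words, and the function and class names (including the fault simulator) are hypothetical.

```python
PATTERN = 0x55AA55AA  # data pattern named in this section


def base_memory_test(mem):
    """Fill every 32-bit word with the pattern, then verify each word.

    Returns the word indexes whose read-back value does not match.
    """
    for i in range(len(mem)):
        mem[i] = PATTERN
    return [i for i in range(len(mem)) if mem[i] != PATTERN]


class StuckBitMemory(list):
    """Simulated fault: data bit 1 of one word always reads back as 0."""

    def __init__(self, size, bad_index):
        super().__init__([0] * size)
        self.bad_index = bad_index

    def __getitem__(self, i):
        value = super().__getitem__(i)
        return value & ~0x2 if i == self.bad_index else value


print(base_memory_test([0] * 16))
print(base_memory_test(StuckBitMemory(16, bad_index=5)))
```

The whole region is written before any word is read back, matching the order described in the text.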
4.9.3 Extended Memory Tests
Walking Ones Test
This test is targeted to verify the data bus wiring by testing the bus one bit at a time. The data bus passes the test if each data bit can be set to 0 and 1 independently of the other data bits.
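A walking ones pass over a 32-bit data bus can be sketched as follows. This Python code is illustrative only: the write/read callables stand in for access to a single memory location, and all names, including the fault simulator, are hypothetical.

```python
def walking_ones_test(write, read, width=32):
    """Write a single 1 bit in each position in turn and read it back.

    Returns the bit positions whose pattern did not read back intact.
    """
    failed_bits = []
    for bit in range(width):
        pattern = 1 << bit
        write(pattern)
        if read() != pattern:
            failed_bits.append(bit)
    return failed_bits


def make_cell(stuck_low_mask=0):
    """Simulate one memory word; bits in stuck_low_mask always store 0."""
    cell = {"value": 0}

    def write(value):
        cell["value"] = value & ~stuck_low_mask & 0xFFFFFFFF

    def read():
        return cell["value"]

    return write, read


print(walking_ones_test(*make_cell()))
print(walking_ones_test(*make_cell(stuck_low_mask=1 << 7)))
```

Because only one bit is set in each pattern, a failure identifies the affected data line directly.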
32-Bit Address Test
This test is targeted to verify the address bus wiring. The smallest set of addresses that will cover all possible combinations is the set of “power-of-two” addresses. These addresses are analogous to the set of data values used in the walking ones test. The corresponding memory locations are 0001h, 0002h, 0004h, 0008h, 0010h, 0020h, and so on. In addition, address 0000h must also be tested. To confirm that no two memory locations overlap, an initial data value is first written at each power-of-two offset within the device. Then a new value, an inverted copy of the initial value, is written to the first test offset. It is then verified that the initial data value is still stored at every other power-of-two offset. If a location other than the one just written is found to contain the new data value, there is a problem with the current address bit. If no overlap is found, the procedure is repeated for each of the remaining offsets.
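The power-of-two address procedure can be sketched on a simulated memory. The following Python code is an illustration, not the RedBoot implementation: the initial pattern value, the function name, and the fault simulator class are all assumptions made for the example.

```python
def address_bus_test(mem, pattern=0xAAAAAAAA):
    """Check power-of-two offsets (plus offset 0) for overlapping locations.

    Returns the first offset found to alias the offset under test, or None.
    """
    inverted = pattern ^ 0xFFFFFFFF
    size = len(mem)
    offsets = [0] + [1 << b for b in range(size.bit_length()) if (1 << b) < size]
    for offset in offsets:
        mem[offset] = pattern          # initial value at every test offset
    for test_offset in offsets:
        mem[test_offset] = inverted    # new value at the offset under test
        for other in offsets:
            if other != test_offset and mem[other] != pattern:
                return other           # another location changed: overlap
        mem[test_offset] = pattern     # restore before the next offset
    return None


class StuckAddressBit:
    """Simulated fault: one address line is stuck at 0."""

    def __init__(self, size, stuck_bit):
        self._cells = [0] * size
        self._mask = ~(1 << stuck_bit)

    def __len__(self):
        return len(self._cells)

    def __getitem__(self, addr):
        return self._cells[addr & self._mask]

    def __setitem__(self, addr, value):
        self._cells[addr & self._mask] = value


print(address_bus_test([0] * 64))
print(address_bus_test(StuckAddressBit(64, stuck_bit=2)))
```

With address bit 2 stuck low, offset 4 aliases offset 0, so the sketch reports offset 4 as soon as offset 0 is rewritten.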
32-Bit Inverse Address Test
This test behaves similarly to the memory test described above, except the addresses are tested in the inverse direction. This test helps to identify a broader scope of possible addressing errors inherent in the memory modules.
4.9.4 FPGA Version Check
This test is targeted to verify that the correct FPGA image is programmed into both FPGA chips. It displays the FPGA version of both FPGAs. Both versions should be the same. If the programmed version is older than expected, an event is logged to the SEL.
4.9.5 DS1307 RTC (Real-Time Clock) Test
This test is targeted to verify the functionality of the DS1307 RTC chip. This test displays the date/time settings from the RTC and validates the readings. If any reading is found not to be in BCD format, an event is logged to the SEL. This test also captures the current time, sleeps for a while, and compares the previously captured time with the new time. If they differ, the RTC is working. If not, an event is logged to the SEL.
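The two checks described above (BCD validity and "the clock is advancing") can be sketched as follows. This Python code is illustrative: the function names are hypothetical, the delay value is an assumption, and the callables stand in for real register access and a real sleep.

```python
def is_bcd(byte):
    """True if both 4-bit nibbles of the byte are decimal digits (0-9)."""
    return (byte >> 4) <= 9 and (byte & 0x0F) <= 9


def bcd_to_int(byte):
    """Convert one packed-BCD byte, e.g. 0x59, to its decimal value."""
    return (byte >> 4) * 10 + (byte & 0x0F)


def rtc_is_ticking(read_time, sleep, delay_s=2):
    """Capture the time, wait, and report whether the reading changed.

    read_time and sleep are supplied by the caller; delay_s is an
    assumed value, since the manual does not state how long the test sleeps.
    """
    before = read_time()
    sleep(delay_s)
    return read_time() != before


print(is_bcd(0x59), is_bcd(0x5A), bcd_to_int(0x59))
```

A byte such as 0x5A fails validation because its low nibble (0xA) is not a decimal digit.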
4.9.6 NIC Presence/Local PCI Bus Test
This test generates the PCI bus transaction by scanning the PCI buses available on the board. This test detects the two Ethernet devices and verifies each device has the valid Vendor ID and Device ID in the PCI configuration space. NIC internal self-test is not performed here, as the self-test is executed when loading the Ethernet driver.
4.9.7 OS Image Checksum Test
This test is targeted to verify that the OS image stored in flash is not corrupted. This test calculates the CRC32 checksum of the OS image, and then compares it with the image checksum stored in the FIS directory. If the two do not match, BIST logs an error event to the SEL.
4.9.8 CRC32 Checksum
CRC32 is the 32-bit version of the Cyclic Redundancy Check technique, which is designed to verify the validity and integrity of the bits within the data. It first generates the diffusion table, which consists of 256 double-word entries; each entry is known as a unique diffusion code. The checksum calculation starts by fetching the first byte in the data buffer and exclusive-ORing it with the temporary checksum value. The resulting value is AND-ed with 0xFF to restrict it to an index from 0 to 255 (decimal). That index is used to fetch a diffusion code from the table. Next, the fetched diffusion code is exclusive-ORed with the most significant 24 bits of the temporary checksum value (effectively the checksum value shifted right by 8 bits). The resulting value is the new temporary checksum value. The calculation is repeated for each byte up to the last byte in the data buffer. The final temporary checksum value becomes the final checksum value.
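A table-driven CRC32 of the kind described above can be sketched in a few lines of Python. Several details here are assumptions, because this section does not state them: the polynomial (the common reflected value 0xEDB88320), the initial value, and the final inversion are those of the widely used CRC-32 variant and may differ from what the CMM firmware uses.

```python
POLY = 0xEDB88320  # assumed: reflected form of the common CRC-32 polynomial


def make_table():
    """Build the 256-entry lookup ("diffusion") table."""
    table = []
    for index in range(256):
        code = index
        for _ in range(8):
            code = (code >> 1) ^ POLY if code & 1 else code >> 1
        table.append(code)
    return table


TABLE = make_table()


def crc32(data):
    """Compute a CRC32 over a bytes-like object using the lookup table."""
    crc = 0xFFFFFFFF                    # assumed initial value
    for byte in data:
        index = (crc ^ byte) & 0xFF     # restrict the index to 0..255
        crc = TABLE[index] ^ (crc >> 8)  # XOR with the upper 24 bits
    return crc ^ 0xFFFFFFFF             # assumed final inversion


print(hex(crc32(b"123456789")))
```

The loop body follows the per-byte steps in the text: XOR the byte into the running value, mask to an index, fetch the table entry, and XOR it with the upper 24 bits of the running value.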
4.9.9 IPMB Bus Busy/Not Ready Test
The objective of the test is to identify any potential FPGA lockup before loading the BlueCat.
When the FPGA is detected to be locked up, an event indicating which bus actually failed is logged into the Event log.

Re-enumeration 5

5.1 Overview

The Chassis Management Module can re-enumerate devices in the chassis if the chassis loses and then regains CMM management. On startup, if no active CMM already holding this information is present in the chassis to supply it through regular synchronization, the CMM itself queries information on all devices in the chassis. This is achieved without having to restart the individual blades already present in the chassis.
Re-enumeration provides a way to recover from situations such as double failures, where both CMMs have failed or been accidentally removed from the chassis. Before identifying the contents of the chassis, the CMM first determines whether it should perform this function. The Standby CMM does not re-enumerate; it relies on the information synchronized from the Active CMM in case a failover occurs. After startup, the Active CMM determines which Entities are present. For each of these Entities, the CMM then queries state and other information needed to properly manage the Entity as well as the entire chassis. The CMM stays in M2 state until re-enumeration is complete.
The CMM re-enumeration process obtains the following information for each FRU in the chassis:
- Presence
- M-State
- Power Usage
- Sensor Data Records
- Health Events
- Board EKey Usage
- Bused EKey Usage

5.2 Re-enumeration on Failover

In case of a forced failover, the newly Active CMM performs re-enumeration if the following conditions are satisfied:
Re-enumeration has not completed on the Active CMM.
Active CMM has not yet synchronized the re-enumerated data over to the Standby CMM.
If the newly Active CMM has to re-enumerate, it switches to M2 state before starting re-enumeration. The Blue LED uses long blinks to provide a visual indication of the state of the CMM. It is recommended that Entities in the chassis not be activated or deactivated while re-enumeration is in progress.

5.3 Re-enumeration of M5 FRU

If, during re-enumeration, the CMM discovers that a FRU is requesting deactivation (State M5) and no frucontrol script is present (refer to Section 18.5, “FRU Control Script” on page 169), it denies the request and tells the FRU to return to the Active (M4) state. Otherwise, the CMM executes the frucontrol script and lets it handle the deactivation of the FRU.

5.4 Resolution of EKeys

During re-enumeration, the CMM determines the status of the EKeys of the Boards present in the chassis. If there are interfaces that can be enabled with respect to the other end-point, the CMM completes the EKeying process as per Section 14.1. If EKeys are enabled to a slot but the CMM was unable to discover a Board in that slot, it assumes that the Board in that slot is in the M7 (Communication Lost) state.

5.5 Events Regeneration

The Re-enumeration agent sends out the "Set Event Receiver" command to all the Entities in the chassis. On receiving the command, the Entities re-arm event generation for all their internal sensors. This will cause them to transmit the event messages that they have based on the current event conditions. These events will be logged in the SEL.
Note: The regeneration of events may cause events to be logged into the SEL twice. This could result in configured eventaction scripts running twice.
During the process of identifying the chassis content, once the CMM determines that an Entity is a fantray, it automatically sets the fan speeds to the critical level. The speeds are not brought back to the normal level until the CMM has determined that there are no thermal events in the chassis.

Process Monitoring and Integrity 6

6.1 Overview

The Chassis Management Module monitors the general health of processes running on the CMM and can take recovery actions upon detection of failed processes. This is handled by the Process Monitoring Service (PMS).
Upon detecting unhealthy processes, the PMS will take a configurable recovery action. Examples of recovery actions include restarting the process, failing over to the standby CMM, etc.
The PMS itself is also monitored to ensure that it is operating correctly. The PMS is monitored in both a single CMM configuration and a redundant CMM configuration. When faults are detected in the PMS, corrective actions are taken.
The PMS also provides dynamic configuration and status information through the CLI, RPC, and SNMP interfaces. For example, users can administratively lock/disable monitoring of a process while the PMS is running to suit their particular needs. The PMS also provides static configuration to allow customers the ability to tune the static system parameters for the given platform. Examples of these parameters may include monitoring interval, retries, and ramp-up times.
6.1.1 Process Existence Monitoring
Process existence monitoring utilizes the operating system's process table to determine the existence of the process. When the CMM software is started, the PMS initializes and determines the set of processes to monitor for process existence. The PMS periodically queries the operating system for the existence of that set of processes. When a monitored process is found not to exist, the PMS will generate a SEL entry and take a recovery action.
Process existence monitoring can be utilized on all permanent processes (processes which exist for the life of the CMM software as a whole). It is particularly useful when monitoring processes that were not specifically developed for running on the CMM. Applications that are provided by the operating system vendor are examples of these types of processes. For the Linux* operating system, processes like syslogd and crond would be good examples.
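The existence check amounts to comparing a configured set of processes against what the operating system reports as running. The Python sketch below illustrates that comparison only: the function name is hypothetical, and the list of running command lines is passed in rather than read from a real process table. The target names and command lines in the example are taken from Table 5.

```python
def missing_processes(monitored, running_command_lines):
    """Return the targets whose command line is absent from the process table.

    monitored maps a target name to the command line expected to be running.
    """
    running = set(running_command_lines)
    return sorted(target for target, command in monitored.items()
                  if command not in running)


monitored = {
    "PmsProc100": "/bin/crond",
    "PmsProc102": "/sbin/syslogd",
}
print(missing_processes(monitored, ["/bin/crond"]))
```

In the service described here, each missing process would lead to a SEL entry and a recovery action; the sketch stops at detection.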
6.1.2 Thread Watchdog Monitoring
Thread watchdog monitoring requires that the process being monitored notify the PMS of its continued operation. These notifications allow the PMS to monitor the process both for existence and for conditions where the process locks up. Each thread that requires monitoring within a process using the thread watchdog registers with the PMS. The PMS loops through its list of registered threads and determines whether each is operating. When any thread is determined to be unresponsive (i.e., not notifying the PMS of its continued operation), the PMS generates a SEL entry and takes a recovery action.
Thread watchdog monitoring can be used on all processes that are instrumented with the PMS thread watchdog API. It provides more functionality than process existence monitoring and can be used in conjunction with process integrity monitoring to provide a comprehensive solution. Thread watchdog monitoring is relatively lightweight and can be done every second, although the process being monitored may dictate a (much) lower frequency depending on how often it is capable of feeding the watchdog.
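The register/feed/check cycle of a thread watchdog can be sketched as follows. This Python code is a generic illustration and does not reflect the actual PMS thread watchdog API, which this section does not describe; the class, method names, and timeout handling are all assumptions.

```python
import time


class ThreadWatchdog:
    """Illustrative watchdog: threads register, then feed it periodically."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the sketch can be tested
        self.last_fed = {}

    def register(self, thread_name):
        """Start monitoring a thread; registration counts as the first feed."""
        self.last_fed[thread_name] = self.clock()

    def feed(self, thread_name):
        """Record that a registered thread is still operating."""
        if thread_name not in self.last_fed:
            raise KeyError(thread_name)
        self.last_fed[thread_name] = self.clock()

    def unresponsive(self):
        """Return registered threads that have not fed within the timeout."""
        now = self.clock()
        return sorted(name for name, fed_at in self.last_fed.items()
                      if now - fed_at > self.timeout_s)


watchdog = ThreadWatchdog(timeout_s=5.0)
watchdog.register("worker")
print(watchdog.unresponsive())
```

A monitoring loop would call unresponsive() at its chosen interval and act on any names returned.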
6.1.3 Process Integrity Monitoring
A Process Integrity Executable (PIE) is responsible for determining the health of one or more processes. When a PIE finds an unhealthy process, it notifies the PMS of the errant process so that the PMS can take the appropriate action. An example of a PIE is one that monitors the Simple Network Management Protocol (SNMP) process. The PIE could use SNMP get operations to query the SNMP process. If the SNMP process cannot respond to the queries with the appropriate information, the process is considered unhealthy and the PIE notifies the PMS.
Process integrity monitoring may be used in conjunction with existence monitoring to provide a comprehensive solution.
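The probing idea behind a PIE can be sketched as a set of health probes, one per process. The function and the probe callables below are hypothetical; a real PIE for SNMP would issue SNMP get operations, which are not modeled here.

```python
from typing import Callable, Dict, List


def find_unhealthy(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each probe; a False result or any exception marks the process unhealthy."""
    unhealthy = []
    for name, probe in probes.items():
        try:
            healthy = probe()
        except Exception:
            # A probe that cannot complete is treated the same as a failed one.
            healthy = False
        if not healthy:
            unhealthy.append(name)
    return unhealthy


# Hypothetical probes standing in for real health queries.
print(find_unhealthy({"snmpd": lambda: True, "bpm": lambda: False}))
```

The names returned by such a check correspond to the errant processes that a PIE would report to the PMS.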

6.2 Processes Monitored

Below is a list of processes that are monitored for Process Existence on the CMM by the Process Monitoring Service.
Table 5. Processes Monitored
Process Monitored | Process Command Line / Process Name | Target Name | Monitoring Level
CMM Wrapper Process | ./WrapperProcess 23 | PmsProc23 | Existence and Integrity
CMM Wrapper Process | ./WrapperProcess 255 | PmsProc50 | Existence and Integrity
SNMP Daemon | /usr/sbin/snmpd -c /etc/snmpd.conf | PmsProc51 | Existence and Integrity
CLI Server | ./cli_svr | PmsProc52 | Existence
CMM Command Handler | ./cmd_hand | PmsProc53 | Existence
CMM Blade Process Manager | ./BPM | PmsProc54 | Existence and Integrity
CMM Wrapper Process [0-39] | ./WrapperProcess[#] (0-39) | PmsProc[#] (60-99) | Integrity
Cron Daemon | /bin/crond | PmsProc100 | Existence
Inet Daemon | xinetd -stayalive -reuse | PmsProc101 | Existence
Syslog Daemon | /sbin/syslogd | PmsProc102 | Existence
Pms Monitor | ./PmsMonitor | PmsProc3 | Existence and TWL
Pms Shadow | ./PmsMonitor shadow | PmsProc2 | Existence and TWL

6.3 Process Monitoring Targets

The following targets are provided for the Process Monitoring Service under the cmm location:
PmsGlobal
Target for PMS global data
PmsProc[#]
Target for each process monitored
PmsPie[#]
Target for each PMS PIE
Use the following CLI command to view the targets for the processes being monitored.
cmmget -l cmm -d listtargets
The particular processes being monitored will be listed (e.g., PmsProc23, PmsProc100). To view the name of the process being monitored use the following example command:
cmmget -l cmm -t PmsProc34 -d ProcessName
Table 5, “Processes Monitored” contains the list of processes monitored, their command lines, and their target names. The ProcessName dataitem will return the Process Command Line.

6.4 Process Monitoring Dataitems

The following dataitems are used to retrieve information on and configure the Process Monitoring Service (used with PmsGlobal or PmsProc[#] targets on the cmm location).
AdminState
RecoveryAction
EscalationAction
ProcessName
OpState
More information on the usage and descriptions of these dataitems can be found in Section 8, “The
Command Line Interface (CLI)” on page 71.
6.4.1 Examples
The following example will set the global PMS AdminState to locked:
cmmset -l cmm -t PmsGlobal -d AdminState -v 2
The following example will get the recovery action assigned to a monitored process:
cmmget -l cmm -t PmsProc34 -d RecoveryAction
The following example will get the admin state of a PIE:
cmmget -l cmm -t PmsPie176 -d AdminState
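When these commands are issued from a script, the argument list can be assembled programmatically. The sketch below only builds the argument vector from the options shown in the examples above (-l, -t, -d, -v); the helper name cmm_argv is hypothetical, and the sketch does not execute anything.

```python
from typing import List, Optional


def cmm_argv(action: str, target: str, dataitem: str,
             value: Optional[str] = None) -> List[str]:
    """Build the argument vector for cmmget or cmmset on the cmm location."""
    if action not in ("get", "set"):
        raise ValueError("action must be 'get' or 'set'")
    argv = ["cmm" + action, "-l", "cmm", "-t", target, "-d", dataitem]
    if action == "set":
        if value is None:
            raise ValueError("cmmset requires a value")
        argv += ["-v", str(value)]
    return argv


print(" ".join(cmm_argv("set", "PmsGlobal", "AdminState", "2")))
print(" ".join(cmm_argv("get", "PmsProc34", "RecoveryAction")))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the list is later handed to a process launcher.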

6.5 SNMP MIB Commands

SNMP commands are implemented in the CMM MIB for Process Monitoring. The list of new commands can be found in the CMM's MIB file or in Section 17, “SNMP” on page 140.

6.6 Process Monitoring CMM Events

The “Process Monitoring Service” sensor types are used to assert and de-assert process status information such as process presence not detected, process recovery failure, or recovery action taken. See Section 11.4, “List of Possible Health Event Strings” on page 108 for event strings, codes, and severities for Process Monitoring.
Event severities are configurable by the user and are unique to the process being monitored. The processes that are monitored and their default severities are listed below. Severities are
configured (while PMS is not running) by changing the ProcessSeverity field in the configuration file (pms.ini). Values for severity: 1 = minor, 2 = major, 3 = critical.
./WrapperProcess 23
ProcessSeverity = 2
./WrapperProcess 255
ProcessSeverity = 2
/usr/sbin/snmpd -c /etc/snmpd.conf
ProcessSeverity = 2
./cli_svr
ProcessSeverity = 2
/bin/crond
ProcessSeverity = 2
xinetd -stayalive -reuse
ProcessSeverity = 2
/sbin/syslogd
ProcessSeverity = 1
./PmsMonitor
ProcessSeverity=2
./PmsMonitor shadow
ProcessSeverity=2
./WrapperProcess0 through ./WrapperProcess39
ProcessSeverity=2
./cmd_hand
ProcessSeverity=3
./BPM
ProcessSeverity=3
Note: The recovery action and escalation action should not be set to "no action" for the xinetd process.
This process is involved in data synchronization between the CMMs.
Note: When a user tries to change the recovery action for cmd_hand or BPM to values other than allowed
via the CLI API, the error string displayed is:
"Recovery action not allowed for this target."

6.7 Failure Scenarios and Eventing

This section describes the process fault scenarios that are detected and handled by the PMS. It also describes the eventing that is associated with the detection and recovery mechanisms. Each scenario contains a brief textual description and a table that further describes the scenario.
In the table, the Description column outlines the current action. The Event String column defines the text for the event that is written to the SEL. The text in this field is the portion of the event containing the event-specific string (the remainder of the event text is standard for all events). However, for PMS the target name (sensor name) will be PmsProc<#> instead of the name of the sensor (where # is the unique identifier of the given process).
The UID indicates the unique identifier for the process causing the event. An ID of 1 indicates the monitoring service itself (global) and an ID of # indicates an application process.
The Assert column indicates if the event is asserted or de-asserted. For items that are just written to the SEL for informational purposes, the assertion state is not applicable. However, it is required by the interface and therefore it will be set to de-assert.
The Severity column will define the severity of the event. A severity of Configure indicates that the severity is configurable. The configurable severities are available in the Configuration Database. The remaining columns (SNMP traps, health events, LEDs, and telecommunication alarms) define what indicator will be triggered by the event.
6.7.1 No Action Recovery
In this scenario PMS detects a process fault. The PMS is configured to take no action and therefore disables monitoring of the process.
Table 6. No Action Recovery
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure
The recovery action specified is "no action". | Take no action specified for recovery | # | N/A | Configure
No attempt will be made to recover the process. The PMS will stop monitoring the process. See Section 6.7.11, “Process Administrative Action” on page 53, for information about how to re-enable monitoring and de-assert the event. | Process existence fault; monitoring disabled or Thread watchdog fault; monitoring disabled or Process integrity fault; monitoring disabled | # | Assert | Configure
6.7.2 Successful Restart Recovery
In this scenario PMS detects a process fault. The configured recovery action is: restart the process. The PMS is able to successfully recover the process by restarting it.
Table 7. Successful Restart Recovery
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure
The recovery action specified is "process restart". | Attempting process restart recovery action | # | N/A | Configure
PMS was successfully able to restart the process. | Recovery successful | # | De-assert | OK
6.7.3 Successful Failover/Restart Recovery
In this scenario PMS detects a process fault. The configured recovery action is: failover to the standby CMM and then restart the failed process. The PMS is able to successfully recover the process by restarting it.
Table 8. Successful Failover/Restart Recovery
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure
The recovery action specified is "failover and restart". | Attempting process failover & restart recovery action | # | N/A | Configure
PMS executes a failover. Note this step is skipped when running on the standby CMM. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A
PMS was successfully able to restart the process. Note PMS will execute this step even if the failover is unsuccessful (standby not available, unhealthy, etc.). | Recovery successful | # | De-assert | OK
6.7.4 Successful Failover/Reboot Recovery
In this scenario, PMS detects a process fault. The configured recovery action is: failover to the standby CMM and upon successfully executing the failover, reboot the now standby CMM. The recovery actions are successful.
Table 9. Successful Failover/Reboot Recovery
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure
The recovery action specified is "failover & reboot". | Attempting failover & reboot recovery action | # | N/A | Configure
PMS executes a failover. Note this step is skipped when running on the standby CMM. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A
PMS is running on the standby CMM (failover was successful or already running on the standby), PMS recovers the CMM by rebooting. Upon initialization of PMS after the reboot, the monitor will de-assert the event. | Monitoring initialized | # | De-assert | OK
6.7.5 Failed Failover/Reboot Recovery, Non-Critical
In this scenario, PMS is running on the active CMM and detects a monitored process fault. The severity of the process is configured to a value that is not critical. The configured recovery action is: failover to the standby CMM and upon successfully executing the failover, reboot the now standby CMM. The failover recovery action is unsuccessful (standby is not available, etc.). The process being monitored is not of a critical severity and therefore the reboot of the CMM will not be performed.
Table 10. Failed Failover/Reboot Recovery, Non-Critical
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure
The recovery action specified is "failover & reboot". | Attempting failover & reboot recovery action | # | N/A | Configure
PMS executes a failover. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A
PMS detects that it is still running on the active CMM. The process is not critical and therefore the reboot operation will not be performed. | Failover & reboot recovery failure | # | N/A | Configure
No attempt will be made to recover the process. The PMS will stop monitoring the process. See Section 6.7.11, “Process Administrative Action” on page 53, for information about how to re-enable monitoring and de-assert the event. | Process existence fault; monitoring disabled or Thread watchdog fault; monitoring disabled or Process integrity fault; monitoring disabled | # | Assert | Configure
6.7.6 Failed Failover/Reboot Recovery, Critical
In this scenario, PMS is running on the active CMM and detects a monitored process fault. The severity of the process is configured to be critical. The configured recovery action is: failover to the standby CMM and upon successfully executing the failover, reboot the now standby CMM. The failover recovery action is unsuccessful (standby is not available, etc.). The process being monitored is of a critical severity and therefore the reboot of the CMM will be performed.
Table 11. Failed Failover/Reboot Recovery, Critical
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure
The recovery action specified is "failover & reboot". | Attempting failover & reboot recovery action | # | N/A | Configure
PMS executes a failover. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A
PMS detects that it is still running on the active CMM. The process is critical and therefore the reboot operation is performed. Upon initialization of PMS after the reboot, the monitor will de-assert the event. | Monitoring initialized | # | De-assert | OK
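The decision that separates the failover/reboot scenarios in Sections 6.7.4 through 6.7.6 can be summarized in a short sketch. The function name, its parameters, and the returned strings are hypothetical labels chosen for illustration; only the severity value 3 (critical) comes from this chapter.

```python
CRITICAL = 3  # ProcessSeverity value for "critical"


def failover_reboot_outcome(on_standby_after_failover: bool, severity: int) -> str:
    """Return what happens after the failover step of a "failover and reboot" action."""
    if on_standby_after_failover:
        # Failover succeeded, or PMS was already running on the standby CMM.
        return "reboot"
    if severity == CRITICAL:
        # Failover failed, but a critical process still forces a reboot of the active CMM.
        return "reboot"
    # Failover failed and the process is not critical: no reboot, monitoring stops.
    return "monitoring disabled"


print(failover_reboot_outcome(False, 2))
print(failover_reboot_outcome(False, 3))
```

The first call corresponds to the non-critical scenario and the second to the critical one.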
6.7.7 Excessive Restarts, Escalate No Action
In this scenario PMS detects a process fault. The configured recovery action is: restart the process. However, the PMS also detects that the process has exceeded the threshold for excessive process restarts. Therefore, the PMS will execute the escalation action. The escalation action is configured for no action.
Table 12. Existence Fault, Excessive Restarts, Escalate No Action
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure
The recovery action specified is "process restart". | Attempting process restart recovery action | # | N/A | Configure
PMS detects that the process has been restarted excessively. | Recovery failure due to excessive restarts | # | N/A | Configure
PMS attempts to execute the escalated recovery action. Since the recovery action is "no action", PMS disables monitoring of the process. | Take no action specified for escalated recovery | # | N/A | Configure
No attempt will be made to recover the process. The PMS will stop monitoring the process. See Section 6.7.11, “Process Administrative Action” on page 53, for information about how to re-enable monitoring and de-assert the event. | Process existence fault; monitoring disabled or Thread watchdog fault; monitoring disabled or Process integrity fault; monitoring disabled | # | Assert | Configure
6.7.8 Excessive Restarts, Successful Escalate Failover/Reboot
In this scenario PMS detects a process fault. The configured recovery action is: restart the process. However, the PMS also detects that the process has exceeded the threshold for excessive process restarts. Therefore, the PMS will execute the escalation action. The configured escalation recovery action is: failover to the standby CMM and upon successfully executing the failover, reboot the now standby CMM. The escalated recovery action is successful.
Table 13. Excessive Restarts, Successful Escalate Failover/Reboot
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure
The recovery action specified is "restart process". | Attempting process restart recovery action | # | N/A | Configure
PMS detects that the process has been restarted excessively. | Recovery failure due to excessive restarts | # | N/A | Configure
The escalated recovery action specified is "failover and reboot". | Attempting failover & reboot escalated recovery action | # | N/A | Configure
PMS executes a failover. Note this step is skipped when running on the standby CMM. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A
PMS is running on the standby CMM (failover was successful or already running on the standby), PMS recovers the CMM by rebooting. Upon initialization of PMS after the reboot, the monitor will de-assert the event. | Monitoring initialized | # | De-assert | OK
6.7.9 Excessive Restarts, Failed Escalate Failover/Reboot, Non-Critical
In this scenario PMS detects a process fault. The severity of the process is configured to a value that is not critical. The configured recovery action is: restart the process. However, the PMS also detects that the process has exceeded the threshold for excessive process restarts. Therefore, the PMS will execute the escalation action. The configured escalation recovery action is: failover to the standby CMM and upon successfully executing the failover, reboot the now standby CMM. The failover recovery action is unsuccessful (standby is not available, etc.). The process being monitored is not of a critical severity and therefore the reboot of the CMM will not be performed.
Table 14. Excessive Restarts, Failed Escalate Failover/Reboot, Non-Critical
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure
The recovery action specified is "restart process". | Attempting process restart recovery action | # | N/A | Configure
PMS detects that the process has been restarted excessively. | Recovery failure due to excessive restarts | # | N/A | Configure
The escalated recovery action specified is "failover and reboot". | Attempting failover & reboot escalated recovery action | # | N/A | Configure
PMS executes a failover. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A
PMS detects that it is still running on the active CMM. The process is not critical and therefore the reboot operation will not be performed. | Failover & reboot escalated recovery failure | # | N/A | Configure
No attempt will be made to recover the process. The PMS will stop monitoring the process. See Section 6.7.11, “Process Administrative Action” on page 53, for information about how to re-enable monitoring and de-assert the event. | Process existence fault; monitoring disabled or Thread watchdog fault; monitoring disabled or Process integrity fault; monitoring disabled | # | Assert | Configure
6.7.10 Excessive Restarts, Failed Escalate Failover/Reboot, Critical
In this scenario, PMS detects a process fault. The severity of the process is configured as critical. The configured recovery action is: restart the process. However, the PMS also detects that the process has exceeded the threshold for excessive process restarts. Therefore, the PMS will execute the escalation recovery action. The configured escalation recovery action is: failover to the standby CMM and upon successfully executing the failover, reboot the now standby CMM. The failover recovery action is unsuccessful (standby is not available, etc.). The process being monitored is of critical severity and therefore the reboot of the CMM will still be executed even though the CMM is still active.
Table 15. Excessive Restarts, Failed Escalate Failover/Reboot, Critical
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault will determine which of the event type strings will be used. | Process existence fault; attempting recovery or Thread watchdog fault; attempting recovery or Process integrity fault; attempting recovery | # | Assert | Configure
The recovery action specified is "restart process". | Attempting process restart recovery action | # | N/A | Configure
PMS detects that the process has been restarted excessively. | Recovery failure due to excessive restarts | # | N/A | Configure
The escalated recovery action specified is "failover and reboot". | Attempting failover & reboot escalated recovery action | # | N/A | Configure
PMS executes a failover. | The existing code generates the events for failover. They are separate from process monitoring events and are not described here. | - | N/A | N/A
PMS detects that it is still running on the active CMM. The process is critical and therefore the reboot operation is performed. Upon initialization of PMS after the reboot, the monitor will de-assert the event. | Monitoring initialized | # | De-assert | OK
6.7.11 Process Administrative Action
In this scenario, PMS has detected a fault in a process, but has not been able to recover the process (recovery is configured for no action, etc.). This causes PMS to operationally disable monitoring of the process. To re-enable monitoring of the process, an operator must administratively lock the process, take the necessary actions to fix the process, and administratively unlock the process.
Table 16. Administrative Action
Description | Event String | UID | Assert | Severity
Operator administratively locks monitoring of the process | None | - | N/A | N/A
Operator takes actions to fix the problem | N/A | - | N/A | N/A
Operator administratively unlocks monitoring of the process causing monitoring to restart | Monitoring initialized | # | De-assert | OK
6.7.12 Excessive Failover/Reboots, Administrative Action
Prior to executing any failover/reboot the PMS will determine if the failover/reboot threshold has been exceeded. If it has, the PMS will be operationally disabled. When PMS is disabled, all process monitoring is halted. To re-enable the PMS, the operator must lock the global administrative state. The operator can then fix the problem and administratively unlock the global administrative state.
The following events are generated against the PMS Monitor (unique ID 1). The events for the process or processes that caused this condition to occur will also be present, but are not described in this table. They are defined in the scenarios provided above.
Table 17. Excessive Failover/Reboots, Administrative Action
Description | Event String | UID | Assert | Severity
PMS detects excessive failover/reboots | Excessive reboots/failovers; all process monitoring disabled | 1 | Assert | Major
Operator locks the global administrative state | None | - | N/A | N/A
Operator takes actions to fix the problem | N/A | - | N/A | N/A
Operator unlocks the global administrative state causing monitoring to be resumed | Monitoring initialized | 1, # (see note a) | De-assert | OK
a. The "Monitoring initialized" event will be generated for the monitor (unique ID 1) as well as the individual processes that are administratively unlocked.

6.8 Process Integrity Executable (PIE)

The Process Integrity Executable (PIE) for the Chassis Management Module’s (CMM) Blade Proxy Manager (BPM) and Wrapper Processes is responsible for determining the health of the Wrapper Processes. Monitoring the integrity means not only monitoring the fact that the process is running but that it is functioning properly.
The PIE will monitor the BPM, CMM Wrapper Process (Wrapper Process number 255) and Chassis Wrapper Processes (23). It will also monitor the Wrapper Processes for intelligent (have a management controller) blades, power supplies, and fans. Wrapper Processes for non-intelligent devices will not be monitored.
PIE will monitor the BPM and Wrapper Processes. The Wrapper Processes have two categories for integrity monitoring. The first category contains the static processes. Static processes are processes that are always present while the CMM software is running. The CMM (255) and chassis (23) Wrapper Processes are the static processes. The second category contains all the dynamic Wrapper Processes. Dynamic processes are ones that come and go as the configuration of the chassis changes (such as a blade insertion or removal). The fan, power supply, and blade Wrapper Processes belong to the dynamic category.

6.9 Configuring pms.ini

The pms.ini file is the Process Monitoring Service (PMS) and Process Integrity Executable (PIE) configuration file. It contains all of the non-volatile configuration data for the service. This file can be found in the /etc/cmm directory on the CMM. It is an ASCII-based text file that can be edited with vi or any other text editor.
Note: Any changes made to the pms.ini file will be overwritten during a firmware update. Care should be taken to preserve the file or any changes before a firmware update is done so that the file and changes can be restored following the update.
The dynamic data fields (except the AdminStates) in this file will be replicated to the standby CMM via the CMM Data Synchronization Service.
If invalid data is provided for a particular field (i.e., out of range), the default value, if one exists, will be used. If a default value is not possible, that entire section will not be used. For example, PmsProcess012 will be ignored if no value is given for its CommandLine.
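The fall-back-to-default rule can be illustrated with a small sketch. The ranges and defaults in the table below are taken from the field descriptions in Section 6.9.2; the FIELD_RULES table and the resolve function are hypothetical and do not reflect how the PMS itself is implemented.

```python
from typing import Optional

# (minimum, maximum, default) for a few process-specific fields
FIELD_RULES = {
    "ProcessExistenceInterval": (0, 65535, 2),
    "ThreadWatchdogRetries": (0, 10, 3),
    "ProcessRampUpTime": (0, 255, 60),
}


def resolve(field: str, raw: Optional[str]) -> int:
    """Return the configured value, or the field default if it is missing or invalid."""
    low, high, default = FIELD_RULES[field]
    try:
        value = int(raw)
    except (TypeError, ValueError):
        # Missing (None) or non-numeric text falls back to the default.
        return default
    return value if low <= value <= high else default


print(resolve("ThreadWatchdogRetries", "11"))  # out of range, so the default is used
print(resolve("ThreadWatchdogRetries", "7"))
```

Fields without a default, such as CommandLine, cannot be handled this way, which is why a section that omits them is ignored entirely.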
Database changes are classified in two categories: dynamic and static. Dynamic changes are initiated by an interface (RPC, CLI, or SNMP). The change will take effect in the PMS and the data in this file will be updated. Dynamic changes can be made while the PMS is running.
Static changes are made directly to this file and must be done while the PmsMonitor is not running.
6.9.1 Global Data
This data applies to the PMS as a whole (not specific to a process). There must be one and only one set of this data.
6.9.1.1 PMS Administrative State
The PMS administrative state determines if monitoring of all processes will be allowed. Values: 1 - unlocked (enabled), 2 - locked. Default: 1. (dynamic)
AdminState = 1
6.9.1.2 PMS Excessive Reboot/Failover Count
The maximum number of reboots or failover attempts allowed (over the interval specified in the field below).
Values: 2 - 255. Default: 3.
ExcessiveRebootOrFailoverCount = 3
6.9.1.3 PMS Excessive Reboot/Failover Interval
The interval, in seconds, over which the maximum number of reboots/failovers will be measured. Values: 1 - 65535. Default: 900.
ExcessiveRebootOrFailoverInterval = 21600
6.9.2 Process Specific Data
This data applies to a specific process running on the CMM. There will be one set of this data for each process.
The following information describes each of the fields in the process specific section.
6.9.2.1 Process Section Name
The section name MUST follow the pattern "PmsProcessXXX" where XXX is a number from 010 to 175 inclusive. PmsProcess section names must be unique but are NOT significant in any other way. Specifically, they are NOT required to match the UniqueID field for the section.
[PmsProcess151]
6.9.2.2 Unique ID
This is a unique identifier for the program and its arguments. It is essentially the short version of the "Process Name and Arguments" field described in Section 6.9.2.4.
Values:
0 = Reserved
1 = PMS Monitor (global)
2 = PMS Shadow
3 = PMS Monitor (for shadow monitoring)
4-9 = Reserved
10-150 = CMM processes
151-175 = User processes
176-200 = PIEs
201-255 = Reserved
Default: None.
UniqueID = 151
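The UniqueID ranges listed above can be expressed as a simple lookup. The function name and table layout below are hypothetical; only the ranges and their labels are taken from the list of values.

```python
# (first ID, last ID, category) as listed in the UniqueID values
UID_RANGES = [
    (0, 0, "Reserved"),
    (1, 1, "PMS Monitor (global)"),
    (2, 2, "PMS Shadow"),
    (3, 3, "PMS Monitor (for shadow monitoring)"),
    (4, 9, "Reserved"),
    (10, 150, "CMM processes"),
    (151, 175, "User processes"),
    (176, 200, "PIEs"),
    (201, 255, "Reserved"),
]


def classify_unique_id(uid: int) -> str:
    """Return the category that a UniqueID belongs to."""
    for first, last, category in UID_RANGES:
        if first <= uid <= last:
            return category
    raise ValueError("UniqueID must be between 0 and 255")


print(classify_unique_id(151))
```

A user-defined process should therefore be given an ID for which this lookup returns "User processes".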
6.9.2.3 Chassis Applicability
This is a list of chassis types for which this particular Uid is valid. The list is comma delimited. Spaces are ignored. If this key is not present, then the Uid is valid on all chassis.
Values: MPCHC0001, ZT5085, ZT5088, ZT5089, ZT5090, ZT5091.
ChassisApplicability = MPCHC0001, ZT5085, ZT5088, ZT5089, ZT5090, ZT5091
6.9.2.4 Process Name and Arguments
This string contains the program name including its path and its associated command line arguments. This field will be used to monitor a program and therefore must be an exact match to how the program is represented in the OS. The program name and command line arguments are space separated with the program name being the first entry in the string. If an individual argument contains spaces, the argument must be encapsulated in quotation marks. The program name and arguments will uniquely identify the entry. This means if the same program is started multiple times with different arguments, each of them will require a separate entry.
Values: N/A. Default: None.
CommandLine = MyProcess -x -y
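The effect of quoting an argument that contains spaces can be illustrated with the standard Python shlex module, which follows POSIX shell quoting rules. This is only an analogy: the exact parsing rules used by the PMS are not specified here and may differ, and the helper name is hypothetical.

```python
import shlex
from typing import List, Tuple


def split_command_line(command_line: str) -> Tuple[str, List[str]]:
    """Split a command line into the program name and its list of arguments."""
    parts = shlex.split(command_line)
    if not parts:
        raise ValueError("command line is empty")
    return parts[0], parts[1:]


print(split_command_line('MyProcess -x "two words"'))
```

In this example the quoted text is kept together as a single argument instead of being split at the space.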
6.9.2.5 Start Program Name and Arguments
This is the program name and arguments used to start the program. This differs from the monitoring program name and arguments because some programs are started via scripts. For example many Linux system programs are started via startup scripts located in the "init.d" directory.
Values: N/A. Default: None.
StartCommandLine = MyProcess -x -y
6.9.2.6 Administrative State
The process administrative state determines if the process will be monitored. Values: 1 - unlocked (enabled), 2 - locked. Default: 1. (dynamic)
AdminState = 1
6.9.2.7 Process Existence Interval
This is the interval in seconds in which to verify that a process exists. A value of 0 disables Existence Monitoring.
Values: 0 - 65535. Default: 2.
ProcessExistenceInterval = 2
6.9.2.8 Thread Watchdog Retries
This is the number of retries (number of thread watchdog intervals) to wait for notification from a thread. Recovery takes place on retries+1 missed thread watchdog intervals.
Values: 0 - 10. Default: 3.
ThreadWatchdogRetries = 3
6.9.2.9 Process Ramp-up Time
The amount of time in seconds necessary for the process to initialize and be functional. Values: 0-255. Default: 60.
ProcessRampUpTime = 3
6.9.2.10 Process Severity
An indicator for the importance of a given process. This severity will determine at what level SEL entries are generated and when reboots should occur on an active CMM.
Values: 1 = minor, 2 = major, 3 = critical. Default: 1.
ProcessSeverity = 1
6.9.2.11 Recovery Action
This is the recovery action to take upon detection of a failed process. Values: 1 = no Action, 2 = process restart, 3 = failover and process restart, 4 = failover and reboot.
Default: 1. (dynamic)
RecoveryAction = 1
6.9.2.12 Process Restart Escalation Action
This determines the action to take if the RecoveryAction includes "process restart" and the restart fails. Values: 1 = no action, 2 = failover and reboot. Default: 1. (dynamic)
ProcessRestartEscalationAction = 1
6.9.2.13 Process Restart Escalation Number
This is the number of process restarts that are allowed (within the interval specified below) before escalation starts.
Values: 1 - 255. Default: 5.
ProcessRestartEscalationNumber = 5
6.9.2.14 Process Restart Escalation Interval
This is the interval in seconds over which process restarts are counted against the Process Restart Escalation Number (see above). Values: 1 - 65535. Default: 900.
ProcessRestartEscalationInterval = 900
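The interaction between ProcessRestartEscalationNumber and ProcessRestartEscalationInterval can be pictured as a sliding-window counter. The following Python sketch is illustrative only: the class and method names are hypothetical, and the exact windowing rule (escalation once more than the allowed number of restarts fall inside the interval) is an assumption, not the PMS implementation.

```python
from collections import deque


class RestartEscalationTracker:
    """Counts process restarts inside a sliding time window (hypothetical helper)."""

    def __init__(self, escalation_number=5, escalation_interval=900):
        self.escalation_number = escalation_number      # ProcessRestartEscalationNumber
        self.escalation_interval = escalation_interval  # ProcessRestartEscalationInterval (seconds)
        self._restarts = deque()                        # timestamps of recent restarts

    def record_restart(self, now):
        """Record a restart at time `now` (seconds); return True if escalation is due."""
        self._restarts.append(now)
        # Forget restarts that are older than the escalation interval.
        while self._restarts and now - self._restarts[0] >= self.escalation_interval:
            self._restarts.popleft()
        # Assumption: escalation starts once the allowed number is exceeded.
        return len(self._restarts) > self.escalation_number
```

With the defaults shown above (5 restarts, 900 seconds), a sixth restart inside any 900-second window would trigger the configured ProcessRestartEscalationAction under this model.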
6.9.3 Process Definition Section of pms.ini
The following sections describe and give examples of each of the process types that are defined in the pms.ini file.
6.9.3.1 Shadow Process
The Shadow process must exist to monitor the Monitor process. Therefore, this process should never have a recovery action of "no action".
[PmsProcess002]
UniqueID = 2
CommandLine = ./PmsMonitor shadow
StartCommandLine = ./PmsMonitor shadow
AdminState = 1
ProcessExistenceInterval = 2
ThreadWatchdogRetries = 5
ProcessRampUpTime = 5
ProcessSeverity = 2
RecoveryAction = 2
ProcessRestartEscalationAction = 1
ProcessRestartEscalationNumber = 10
ProcessRestartEscalationInterval = 4800
6.9.3.2 Monitor Process
This process must exist to monitor all other processes. Therefore this process should never have a recovery action of "no action".
[PmsProcess003]
UniqueID = 3
CommandLine = ./PmsMonitor
StartCommandLine = ./PmsMonitor
AdminState = 1
ProcessExistenceInterval = 2
ThreadWatchdogRetries = 5
ProcessRampUpTime = 5
ProcessSeverity = 2
RecoveryAction = 2
ProcessRestartEscalationAction = 1
ProcessRestartEscalationNumber = 10
ProcessRestartEscalationInterval = 4800
6.9.3.3 Chassis Wrapper Process
[PmsProcess023]
UniqueID = 23
CommandLine = ./WrapperProcess 23
StartCommandLine = ./WrapperProcess 23
AdminState = 1
ProcessExistenceInterval = 2
ProcessRampUpTime = 10
ProcessSeverity = 2
RecoveryAction = 2
ProcessRestartEscalationAction = 2
ProcessRestartEscalationNumber = 4
ProcessRestartEscalationInterval = 5400
6.9.3.4 CMM Wrapper Process
This process must exist to execute interface commands (CLI, SNMP, etc.) for the CMM. Therefore this process should never have a recovery action of "no action".
[PmsProcess050]
UniqueID = 50
CommandLine = ./WrapperProcess 255
StartCommandLine = ./WrapperProcess 255
AdminState = 1
ProcessExistenceInterval = 2
ProcessRampUpTime = 10
ProcessSeverity = 2
RecoveryAction = 2
ProcessRestartEscalationAction = 2
ProcessRestartEscalationNumber = 15
ProcessRestartEscalationInterval = 60
6.9.3.5 SNMP
[PmsProcess051]
UniqueID = 51
CommandLine = /usr/sbin/snmpd -c /etc/snmpd.conf
StartCommandLine = /usr/sbin/snmpd -c /etc/snmpd.conf
AdminState = 1
ProcessExistenceInterval = 2
ProcessRampUpTime = 30
ProcessSeverity = 2
RecoveryAction = 2
ProcessRestartEscalationAction = 2
ProcessRestartEscalationNumber = 4
ProcessRestartEscalationInterval = 5400
6.9.3.6 CLI Server
[PmsProcess052]
UniqueID = 52
CommandLine = ./cli_svr
StartCommandLine = ./cli_svr
AdminState = 1
ProcessExistenceInterval = 2
ProcessRampUpTime = 10
ProcessSeverity = 2
RecoveryAction = 2
ProcessRestartEscalationAction = 2
ProcessRestartEscalationNumber = 5
ProcessRestartEscalationInterval = 300
6.9.3.7 Command Handler
Note: PmsProcess053 represents cmd_hand (the command handler), a crucial process of the CMM software stack. This process cannot be restarted properly if it terminates unexpectedly. Hence, none of the recovery actions that attempt to restart a process, i.e., 2 (Restart) and 3 (Failover & Restart), are allowed as valid recovery actions for cmd_hand. The default recovery action for the cmd_hand process is 4 (failover and reboot), and it cannot be changed to anything else. A recovery action of 1 (No Action) is also not allowed because of the severity of the process.
In the event that the cmd_hand process terminates unexpectedly and the default recovery action kicks in, there is a 2-3 minute delay before the CMM actually reboots. This is normal and expected because PMS makes multiple attempts to fail over and times out because cmd_hand does not respond.
[PmsProcess053]
UniqueID = 53
CommandLine = ./cmd_hand
StartCommandLine = ./cmd_hand
AdminState = 1
ProcessExistenceInterval = 2
ProcessRampUpTime = 10
ProcessSeverity = 3
RecoveryAction = 4
ProcessRestartEscalationAction = 2
ProcessRestartEscalationNumber = 5
ProcessRestartEscalationInterval = 300
6.9.3.8 BPM
Note: PmsProcess054 represents BPM, a crucial process of the CMM software stack. This process cannot be restarted properly if it terminates unexpectedly. Hence, none of the recovery actions that attempt to restart a process, i.e., 2 (Restart) and 3 (Failover & Restart), are allowed as valid recovery actions for BPM. The default recovery action for the BPM process is 4 (failover and reboot), which can only be changed to 1 (No Action).
[PmsProcess054]
UniqueID = 54
CommandLine = ./BPM
StartCommandLine = ./BPM
AdminState = 1
ProcessExistenceInterval = 2
ProcessRampUpTime = 10
ProcessSeverity = 3
RecoveryAction = 4
ProcessRestartEscalationAction = 2
ProcessRestartEscalationNumber = 5
ProcessRestartEscalationInterval = 300
6.9.3.9 Dynamic Wrapper Process 0-39
[PmsProcessXYZ]
UniqueID = XYZ
ChassisApplicability = MPCHC0001, ZT5091, MPCHC5091
CommandLine = /usr/local/cmm/bin/WrapperProcess N
StartCommandLine = /usr/local/cmm/bin/WrapperProcess N
AdminState = 1
ProcessExistenceInterval = 0
ProcessRampUpTime = 10
ProcessSeverity = 2
RecoveryAction = 2
ProcessRestartEscalationAction = 2
ProcessRestartEscalationNumber = 4
ProcessRestartEscalationInterval = 5400
Where:
XYZ: A unique number given to the process in the range defined in Section 6.9.2.2, “Unique ID” on page 56 above (10-150).
N: The number of the wrapper process you are defining. For example, if defining wrapper process 21, then N would be 21.
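As an illustration of how XYZ and N fit into the template above, the following Python sketch renders one dynamic wrapper section as text. The function name is hypothetical, the three-digit zero padding of the section name is inferred from the other examples in this chapter, and the 0-39 limit on N is taken from the section title; treat all three as assumptions.

```python
def dynamic_wrapper_section(unique_id, wrapper_number):
    """Render a [PmsProcessXYZ] section for dynamic wrapper process N."""
    if not 10 <= unique_id <= 150:
        raise ValueError("UniqueID for CMM processes must be in the range 10-150")
    if not 0 <= wrapper_number <= 39:
        raise ValueError("dynamic wrapper number must be in the range 0-39")
    command = f"/usr/local/cmm/bin/WrapperProcess {wrapper_number}"
    lines = [
        f"[PmsProcess{unique_id:03d}]",  # zero padding assumed from the examples
        f"UniqueID = {unique_id}",
        "ChassisApplicability = MPCHC0001, ZT5091, MPCHC5091",
        f"CommandLine = {command}",
        f"StartCommandLine = {command}",
        "AdminState = 1",
        "ProcessExistenceInterval = 0",
        "ProcessRampUpTime = 10",
        "ProcessSeverity = 2",
        "RecoveryAction = 2",
        "ProcessRestartEscalationAction = 2",
        "ProcessRestartEscalationNumber = 4",
        "ProcessRestartEscalationInterval = 5400",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: wrapper process 21 registered under a hypothetical UniqueID of 60.
    print(dynamic_wrapper_section(60, 21))
```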
6.9.3.10 Inet Daemon
[PmsProcess101]
UniqueID = 101
CommandLine = xinetd -stayalive -reuse
StartCommandLine = xinetd -stayalive -reuse
AdminState = 1
ProcessExistenceInterval = 2
ProcessRampUpTime = 5
ProcessSeverity = 2
RecoveryAction = 2
ProcessRestartEscalationAction = 2
ProcessRestartEscalationNumber = 5
ProcessRestartEscalationInterval = 300
6.9.3.11 Syslog Daemon
[PmsProcess102]
UniqueID = 102
CommandLine = /sbin/syslogd
StartCommandLine = /sbin/syslogd
AdminState = 1
ProcessExistenceInterval = 2
ProcessRampUpTime = 5
ProcessSeverity = 1
RecoveryAction = 2
ProcessRestartEscalationAction = 2
ProcessRestartEscalationNumber = 5
ProcessRestartEscalationInterval = 300

6.10 Process Integrity Executable (PIE) Specific Data Config

This data applies to each Process Integrity Executable (PIE). One PIE may monitor multiple CMM processes or only one CMM process. There will be one set of this data for each PIE.
The following information describes each of the fields in the PIE specific section. Lines with a '*' prefix indicate the actual fields (the prefix is not part of the field name).
6.10.1 PIE Section Name
The section name MUST follow the pattern "PmsPieXXX" where XXX is a number from 176 to 200 inclusive. PmsPie section names must be unique but are NOT significant in any other way. Specifically, they are NOT required to match the UniqueID field for the section.
[PmsPie176]
6.10.2 Process Integrity Executable
The name, including its path and command line arguments, of the PIE to be executed periodically. This is used to start the program and may, in the future, be used to monitor the program and therefore must be an exact match to how the program is represented in the OS. The program name and command line arguments will all be space separated with the program name being the first entry in the string. If an individual argument contains spaces, the argument must be enclosed in quotation marks. The program name and arguments will uniquely identify the entry. This means if the same program is started multiple times with different arguments, each of them will require a separate entry. Each PIE will likely have PIE-specific options that can be specified through the command line. These options must be included in the arguments to the "ProcessIntegrityExecutable" field.
ProcessIntegrityExecutable = ./PmsPieSnmp
6.10.3 Unique ID
This is a unique identifier for the executable and its arguments. It is essentially the short version of the "Process Integrity Executable" field above. It is used for logging and CSL access.
Values:
0 = Reserved
1 = PMS Monitor (global)
2 = PMS Shadow
3 = PMS Monitor (for shadow monitoring)
4-9 = Reserved
10-150 = CMM processes
151-175 = User processes
176-200 = PIEs
201-255 = Reserved
Default: None.
UniqueID = 176
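The UniqueID ranges listed above lend themselves to a simple lookup, for example when validating a pms.ini file before deploying it. The Python sketch below is a hypothetical helper, not part of the CMM software; only the ranges themselves come from this section.

```python
# (first, last, meaning) tuples copied from the UniqueID value list.
UNIQUE_ID_RANGES = [
    (0, 0, "Reserved"),
    (1, 1, "PMS Monitor (global)"),
    (2, 2, "PMS Shadow"),
    (3, 3, "PMS Monitor (for shadow monitoring)"),
    (4, 9, "Reserved"),
    (10, 150, "CMM processes"),
    (151, 175, "User processes"),
    (176, 200, "PIEs"),
    (201, 255, "Reserved"),
]


def classify_unique_id(unique_id):
    """Return the documented meaning of a UniqueID, or raise ValueError."""
    for first, last, meaning in UNIQUE_ID_RANGES:
        if first <= unique_id <= last:
            return meaning
    raise ValueError(f"UniqueID {unique_id} is outside the range 0-255")
```

For instance, classify_unique_id(176) falls in the PIE range, which is why the example PIE section uses UniqueID = 176.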
6.10.4 Administrative State
The PIE administrative state determines if the PIE will be restarted at the next interval. Values: 1 - unlocked (enabled), 2 - locked. Default: 1. (dynamic)
AdminState = 1
6.10.5 Process Integrity Interval
This is the interval in seconds between executions of the PIE. Values: 0 - 65535, where 0 indicates that the PIE only gets executed once. Default: 3600.
ProcessIntegrityInterval = 3600
6.10.6 Chassis Applicability
This is a list of chassis types on which this particular PIE should be run. The list is comma delimited. Spaces are ignored. If this key is not present, then the PIE will run on all chassis.
Values: MPCHC0001, ZT5085, MPCHC5085, ZT5088, MPCHC5088, ZT5089, MPCHC5089, ZT5090, MPCHC5090, ZT5091, MPCHC5091.
ChassisApplicability = MPCHC0001, ZT5085, MPCHC5085, ZT5088, MPCHC5088, ZT5089, MPCHC5089, ZT5090, MPCHC5090, ZT5091, MPCHC5091
6.10.7 PmsPieSnmp Command Line
The command line usage of PmsPieSnmp is:
PmsPieSnmp [-f SuccessiveFailureNumber]
where:
-f: This is the number of allowed successive integrity failures before the PMS performs recovery on the faulting process. PMS performs recovery on "this number + 1".
Values: 1 - 100. Default = 3
For example:
PmsPieSnmp
PmsPieSnmp -f2
PmsPieSnmp -f 2
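The "this number + 1" rule for the -f option can be sketched as a counter of successive failures. The Python below is a minimal model under stated assumptions: the class name is hypothetical, and resetting the count after a successful check is inferred from the word "successive" rather than stated in this section.

```python
class IntegrityFailureCounter:
    """Tracks successive integrity failures for one monitored process."""

    def __init__(self, allowed_failures=3):
        if not 1 <= allowed_failures <= 100:
            raise ValueError("SuccessiveFailureNumber must be in the range 1-100")
        self.allowed_failures = allowed_failures  # value given with -f
        self.successive_failures = 0

    def report(self, check_passed):
        """Record one integrity check; return True when recovery is due."""
        if check_passed:
            # Assumption: a passing check clears the run of failures.
            self.successive_failures = 0
            return False
        self.successive_failures += 1
        # Recovery happens on failure number "allowed + 1".
        return self.successive_failures > self.allowed_failures
```

With -f2, the first two failed checks are tolerated and the third successive failure triggers recovery.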
6.10.8 SNMP PIE Section of pms.ini
[PmsPie176]
ProcessIntegrityExecutable = ./PmsPieSnmp -f2
UniqueID = 176
AdminState = 1
ProcessIntegrityInterval = 300

6.11 WP/BPM PIE

The command line usage of PmsPieWp is:
PmsPieWp [-s] [-d[NumberOfDynamicWrappersPerRun]] [-f SuccessiveFailureNumber]
where:
-s: Check static wrappers (optional).
-d: Check dynamic wrappers and BPM threads (optional). The number of dynamic wrappers and BPM threads to check on each run can follow the option (optional). Values: 0 - 100. Default: 0 = process all dynamic wrappers and BPM threads on each execution.
-f: Successive Failure Number - This is the number of allowed successive integrity failures before the PMS performs recovery on the faulting process. PMS performs recovery on "this number + 1". Values: 1 - 100. Default = 3
Example:
PmsPieWp -s -f2 - Check static wrappers
PmsPieWp -d0 -f2 - Check all dynamic wrappers and all BPM threads
PmsPieWp -s -d0 -f2 - Check static and all dynamic wrappers and all BPM threads
PmsPieWp -s -d10 -f2 - Check static and 10 dynamic wrappers and BPM threads
PmsPieWp -s -d10 -f 2 - Check static and 10 dynamic wrappers and BPM threads
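To make the option syntax concrete, the following Python sketch parses the argument forms shown in the examples above (-s, -d with an optional attached count, and -f with an attached or separate value). It is a hypothetical helper for checking a ProcessIntegrityExecutable string before editing pms.ini; it does not reproduce how PmsPieWp itself parses its arguments.

```python
def parse_pmspiewp_args(argv):
    """Parse PmsPieWp-style options into a dictionary (illustrative only)."""
    options = {"static": False, "dynamic": False, "per_run": 0, "failures": 3}
    args = list(argv)
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "-s":
            options["static"] = True
        elif arg.startswith("-d"):
            options["dynamic"] = True
            if arg[2:]:
                # Count is attached directly, e.g. -d10.
                options["per_run"] = int(arg[2:])
        elif arg.startswith("-f"):
            value = arg[2:]
            if not value:
                # Value given as a separate argument, e.g. -f 2.
                i += 1
                if i >= len(args):
                    raise ValueError("-f requires a value")
                value = args[i]
            options["failures"] = int(value)
        else:
            raise ValueError(f"unknown option: {arg}")
        i += 1
    if not 0 <= options["per_run"] <= 100:
        raise ValueError("-d value must be in the range 0-100")
    if not 1 <= options["failures"] <= 100:
        raise ValueError("-f value must be in the range 1-100")
    return options
```

For example, parse_pmspiewp_args("-s -d10 -f 2".split()) reports static checking enabled, dynamic checking of 10 wrappers per run, and an allowed failure count of 2.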
6.11.1 WP/BPM Section of pms.ini
[PmsPie180]
ProcessIntegrityExecutable = ./PmsPieWp -s -d0 -f2
UniqueID = 180
AdminState = 1
ProcessIntegrityInterval = 300

7 Power and Hot Swap Management

The CMM is responsible for the management of FRU hot-swap activities. The CMM listens to FRU hot-swap SEL messages from IPMI devices and distributes power to each FRU after negotiating with the respective IPMI device fronting the FRU. The CMM also manages the shelf-wide power budget and polls IPMI devices to get the status of each FRU fronted by the IPMI device. The CMM uses shelf FRU information to guarantee power-up sequence delays between boards.
Once the CMM receives the shelf FRU information on power budget and power sequence delays, it is ready to service FRU hot-swap requests from respective IPMI devices.

7.1 Hot Swap States

The CMM defines the hot swap status of a FRU as being in one of eight states. CMM documentation often refers to only the letter/number designation of that state (M0 - M7). Here is a list of what each of those states means:
State M0 - Not Installed
State M1 - Inactive
State M2 - Activation Request
State M3 - Activation In Progress
State M4 - Active
State M5 - Deactivation Request
State M6 - Deactivation In Progress
State M7 - Communication Lost

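Management scripts that consume hot swap state designations often need to map M0 - M7 back to their meanings. The following Python sketch shows one way to do that; the enum and function names are hypothetical, and only the state numbers and descriptions come from the list above.

```python
from enum import IntEnum


class HotSwapState(IntEnum):
    """Hot swap states M0 - M7 as listed in this section."""
    M0 = 0
    M1 = 1
    M2 = 2
    M3 = 3
    M4 = 4
    M5 = 5
    M6 = 6
    M7 = 7


DESCRIPTIONS = {
    HotSwapState.M0: "Not Installed",
    HotSwapState.M1: "Inactive",
    HotSwapState.M2: "Activation Request",
    HotSwapState.M3: "Activation In Progress",
    HotSwapState.M4: "Active",
    HotSwapState.M5: "Deactivation Request",
    HotSwapState.M6: "Deactivation In Progress",
    HotSwapState.M7: "Communication Lost",
}


def describe(state):
    """Return the documented label for a hot swap state."""
    state = HotSwapState(state)
    return f"State {state.name} - {DESCRIPTIONS[state]}"
```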
7.2 FRU Insertion

When the CMM receives a request that a FRU is ready to activate, it will compute the FRU’s power, get the power levels, and check the available power budget.
The Set_Power_Level command will be sent only when the necessary power budget, from each of the redundant power feeds, is available to satisfy FRU's desired power level. If a FRU can't be activated at the time of the request, it should remain in the M3 state and shall be powered up when the necessary power budget becomes available. If the FRU decides to operate at a lower power level and notifies the Shelf Manager and the new power level is within the current Shelf Power envelope, the CMM shall send the Set_Power_Level (new desired level) command to the FRU.
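The budget check described above can be sketched as follows. This Python model is an assumption-laden illustration: the function names, the use of watts, and the per-feed dictionary are hypothetical, and the real CMM derives these figures from shelf FRU information.

```python
def can_activate(desired_watts, available_per_feed):
    """Return True if every redundant feed can supply the desired power level.

    available_per_feed maps a feed name to the power still unallocated on it.
    """
    if not available_per_feed:
        raise ValueError("at least one power feed is required")
    return all(available >= desired_watts for available in available_per_feed.values())


def activation_decision(desired_watts, available_per_feed):
    """Summarize the outcome for a FRU that has requested activation."""
    if can_activate(desired_watts, available_per_feed):
        return "send Set_Power_Level"
    # The FRU waits in M3 until enough budget becomes available.
    return "remain in M3"
```

Under this model a FRU that requests 150 W is activated only if each feed has at least 150 W of unallocated budget; otherwise it stays in M3 until budget is freed or it requests a lower level.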

7.3 Graceful FRU Extraction

When the CMM receives a FRU Hot swap request for extraction, the CMM will send the deactivate state command, and the FRU will transition to M6 state and begin its shut-down procedures. Once the FRU has shut down, it transitions to M1 state, and the CMM then reclaims the FRU’s power and adjusts the power budget for the newly available power.

7.4 Surprise FRU Extraction/IPMI Failure

The CMM detects a surprise FRU extraction or a failure of the IPMI device fronting the FRU if a device previously in one of the M2-M7 states reports a transition to the M2 state. If this scenario is detected, the CMM assumes one of three things has happened:
Surprise extraction and reinsertion of the same (or another) FRU.
The IPMI device fronting the FRU failed, the FRU was extracted, and then the same (or another) FRU was reinserted.
The Watchdog Timer (WDT) on the IPMI device restarted the IPMI device firmware.
Once this occurs, the CMM shall reclaim all the resources allocated to that FRU. The CMM will log a SEL message describing the situation, i.e. IPMI device failure or surprise extraction. From this point the CMM shall follow the sequence of actions described in Section 7.2, “FRU Insertion”.

7.5 Forced Power State Changes

An external authorized entity (e.g., a management interface like RMCP) can request FRU power state changes like Power OFF, RESET etc. The CMM is responsible for handling these requests.

7.6 Power Management on the Standby CMM

The standby CMM does not participate in any power management activities while in standby mode; it remains in a hot standby state. The standby CMM starts performing power management activities as soon as it becomes the active CMM.

7.7 Power Feed Targets

The CLI allows certain get and set actions to be taken on power feeds for a location. They include the following dataitems: maxexternalavailablecurrent, maxinternalcurrent, and minexpectedoperatingvoltage. These dataitems are described in Section 8, “The Command Line Interface (CLI)” on page 71.
To find the number of feed targets, use the command:
cmmget -l cmm -d feedcount
This returns an integer indicating the number of power feeds. As an example, the MPCHC0001 chassis with four power feeds coming from the PEMs will return the number 4, meaning there are four feed targets (feed1, feed2, feed3, and feed4). They correlate to the physical feeds on the MPCHC0001 as follows:
feed1 = FeedA1
feed2 = FeedB2
feed3 = FeedA2
feed4 = FeedB1
Refer to the chassis documentation for more information on power feeds.

7.8 Pinging IPMI Controllers

The following lists the values of time to delay and number of pings that the CMM uses to determine the state of a FRU.
Table 18. Time to Delay and Number of Attempts
Variable | Description | Value
DelayBetweenPingLoops | The number of microseconds to delay between each ping loop. This is essentially the amount of time from the ping of the last IPMI Controller in the list to the ping of the first controller in the list. | 10000000 (10 seconds)
DelayBetweenIPMControllerPings | The number of microseconds of delay between the ping on one controller that is in the list and the ping of the next one on the list. This delay does not apply after the last controller in the list. | 0
NumberFailedAttemptsBeforeAlert | How many failed attempts to contact the IPMI Controller must occur prior to raising an event that communication has been lost. | 3
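As a rough worked example, the values in Table 18 imply how long it takes before a lost controller raises an event. The Python below is only an estimate under stated assumptions: each controller is pinged once per loop, and the time spent sending a ping or waiting for it to time out is ignored, so the real delay will be somewhat longer. The function names are hypothetical.

```python
# Values from Table 18 (microseconds, except the attempt count).
DELAY_BETWEEN_PING_LOOPS_US = 10_000_000
DELAY_BETWEEN_IPM_CONTROLLER_PINGS_US = 0
NUMBER_FAILED_ATTEMPTS_BEFORE_ALERT = 3


def loop_period_us(controller_count):
    """Approximate time between two pings of the same controller."""
    if controller_count < 1:
        raise ValueError("controller_count must be at least 1")
    # The per-controller delay applies between controllers, not after the last one.
    between = (controller_count - 1) * DELAY_BETWEEN_IPM_CONTROLLER_PINGS_US
    return DELAY_BETWEEN_PING_LOOPS_US + between


def approx_alert_delay_seconds(controller_count):
    """Approximate seconds of failed pings before a communication-lost event."""
    return NUMBER_FAILED_ATTEMPTS_BEFORE_ALERT * loop_period_us(controller_count) / 1_000_000


if __name__ == "__main__":
    print(approx_alert_delay_seconds(16))  # 3 loops of 10 seconds each -> 30.0
```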

8 The Command Line Interface (CLI)

8.1 CLI Overview

The Command Line Interface (CLI) connects to and communicates with the intelligent management devices of the chassis, boards, and the CMM itself. The CLI is an IPMI-based library of commands that can be accessed directly or through a higher-level management application. Administrators can access the CLI through a Telnet session, SSH, or through the CMM's front panel serial port. The CLI functions are also available through SNMP get/set commands and an RPC interface. Using the CLI, users can access information about the current state of the system including current sensor values, threshold settings, recent events, and overall chassis health.
Note: The CLI uses the term “blade” when referring to boards.

8.2 Connecting to the CLI

The CMM provides three connections on its front panel.
Two Ethernet connections via an RJ-45 connector
An RS-232 serial port interface also via an RJ-45 connector
These same ports are also available on the rear transition module. Any of these interfaces can be used to log into the CMM, as can the Ethernet interface provided through the backplane of a chassis. Use Telnet to log into the CMM over an Ethernet connection, or use a terminal application or serial console over the RS-232 interface. See the Intel® NetStructure™ MPCMM0001 Hardware Technical Product Specification for electrical pinouts of the above interfaces.
If logging in for the first time to set up or obtain the CMM's IP addresses, use the serial port console interface to perform configuration.
8.2.1 Connecting through a Serial Port Console
Connect an RS-232 serial cable with an RJ-45 connector to the serial console port on the front of the CMM. Set your terminal application settings as follows:
Baud – 115200
Data Bits – 8
Parity – None
Stop Bits – 1
Flow Control – Xon/Xoff or none
Connect using your terminal application.
8.3 Initial Setup - Logging in for the First Time
Logging in for the first time must be done through the serial port console to properly configure the Ethernet settings and IP addresses for the network.
The username for the CMM is root. The default password is cmmrootpass.
At the login prompt, enter the username: root
When prompted for the password, enter: cmmrootpass
The root password can be changed using the passwd command. For information on resetting the CMM password back to default, refer to Section 9, “Resetting the Password” on page 99.
8.3.1 Setting IP Address Properties
Note: Changing any of the IP address settings and restarting the network could result in a failover occurring based on the rules governing redundancy specified in Section 3, “Redundancy, Synchronization, and Failover” on page 21.
By default, the CMM assigns IP addresses statically:
eth0, labeled “Ethernet A” on the front panel, is configured with the static IP address 10.90.90.91
eth1, labeled “Ethernet B” on the front panel, is configured with a static IP address of 192.168.100.92
eth1:1, an alias of eth1 that always points to and is active on the active CMM, is configured with a static IP address of 192.168.100.93
On initial power-up of a chassis with two CMMs, both CMMs will have the same IP addresses assigned by default. When the chassis is powered up, the standby CMM automatically sets its IP address to one less than that of the active CMM if it detects a conflict.
Example:
1. A dual CMM Chassis is powered up.
2. Active CMM assigns IP address of 192.168.100.92 to eth1 on the active CMM.
3. Standby CMM assigns IP address of 192.168.100.91 to eth1 on the standby CMM.
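The address the standby CMM ends up with in this example can be reproduced with a one-line calculation. The Python sketch below uses the standard ipaddress module; the function name is hypothetical, and it models only the "one less than the active CMM" rule, not the conflict detection itself.

```python
import ipaddress


def standby_address(active_address):
    """Return the address one below the active CMM's address."""
    return str(ipaddress.IPv4Address(active_address) - 1)


if __name__ == "__main__":
    print(standby_address("192.168.100.92"))  # 192.168.100.91
```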
At this point, change the static IP addresses to values appropriate for your network configuration, and ensure that the two CMMs do not have duplicate IP addresses on eth0 and eth1, to avoid address conflicts on the network.
eth0 and eth1 can also be set using DHCP. eth1:1 will always remain static. When setting both eth0 and eth1 to DHCP, use the /etc/pump.conf file to determine which interface should own the default gateway. The default is for eth0 to own the default gateway. To configure eth1 (and thereby eth1:1) to own the default gateway, uncomment the two lines under the eth0 section of /etc/pump.conf and comment the two lines under the eth1 section of that file. Save the file and run the /etc/rc.d/network reload script.
Note: It is recommended that both CMMs use static IP addresses for all interfaces. DHCP addresses may be unexpectedly lost or changed in some network configurations.
Note: eth0 should always be set to a different subnet than eth1/eth1:1. Failure to set eth0 to a different subnet than eth1 will cause network errors on the CMM and redundancy will be lost.
8.3.1.1 Setting Static IP Information for eth0
1. Open the /etc/ifcfg-eth0 file using the vi editor. By default, the file contains three variables.
Example:
BOOTPROTO="static"
DEVICE="eth0"
STATICIP="10.90.90.91"
2. Ensure the BOOTPROTO value is set to static.
Note: Linux is case sensitive, so ensure that the BOOTPROTO value is entered in lower case letters in the step above.
3. Set the STATICIP variable to the IP address you want to assign to that interface.
4. To set the netmask for eth0, add the NETMASK0 variable and set it to the appropriate netmask for your network.
5. To set the gateway for eth0, add the GATEWAY0 variable and set it to the appropriate value for the gateway on your network.
6. To activate the changes, at the user prompt (from the root “/” directory), type:
/etc/rc.d/network reload
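The result of the steps above is a file with five variables. The following Python sketch generates that content so it can be reviewed before it is copied to /etc/ifcfg-eth0. The function name is hypothetical, and quoting the NETMASK0 and GATEWAY0 values in the same way as the default variables is an assumption.

```python
import ipaddress


def render_ifcfg_eth0(static_ip, netmask, gateway):
    """Return the text of an ifcfg-eth0 file for a static configuration."""
    for value in (static_ip, netmask, gateway):
        # Raises ValueError if a value is not a well-formed dotted quad.
        ipaddress.IPv4Address(value)
    lines = [
        'BOOTPROTO="static"',
        'DEVICE="eth0"',
        f'STATICIP="{static_ip}"',
        f'NETMASK0="{netmask}"',  # quoting style assumed
        f'GATEWAY0="{gateway}"',  # quoting style assumed
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The netmask and gateway here are placeholder values for illustration.
    print(render_ifcfg_eth0("10.90.90.91", "255.255.255.0", "10.90.90.1"))
```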
8.3.1.2 Setting Static IP Information for eth1 and eth1:1
Note: eth1:1 is static IP address only. It does not support DHCP.
1. Open the /etc/ifcfg-eth1 file using the vi editor. By default the file contains five variables.
Example:
BOOTPROTO="static"
DEVICE="eth1"
SETIP="both"
STATICIP1="192.168.100.91"
STATICIP2="192.168.100.93"
2. Set the STATICIP1 variable to the IP address you want to assign to eth1.
3. Set the STATICIP2 variable to the IP address you want to assign to the active CMM on the network. This value should ONLY be set on the active CMM, as it will be synchronized to and overwritten on the standby CMM.
4. Set the SETIP variable to assign IP addresses eth1 and eth1:1 based on the following table:
Table 19. SETIP Interface Assignments when BOOTPROTO="static"
Interface | SETIP=1 | SETIP=2 | SETIP=Both | Other
eth1 | STATICIP1 | STATICIP2 | STATICIP1 | Previous Value
eth1:1 | disabled | disabled | STATICIP2 | disabled
5. Add the NETMASK1 variable and set it to the appropriate netmask for STATICIP1 for your network.
6. Add the NETMASK2 variable and set it to the appropriate netmask for STATICIP2 for your network. The NETMASK2 variable needs to be correct to allow for true redundant operation.
7. Add the GATEWAY1 variable and set it to the appropriate value for the gateway for STATICIP1.
8. Add the GATEWAY2 variable and set it to the appropriate value for the gateway for STATICIP2.
9. To activate the changes, at the user prompt (from the root “/” directory), type:
/etc/rc.d/network reload
Note: The eth1:1 address should only be changed on the active CMM. The new address will be synchronized to the standby CMM automatically when the /etc/rc.d/network reload command is executed. Also, the eth1:1 address should be changed with the procedure above and NOT by using the ifconfig command manually. Using ifconfig manually will cause the eth1:1 information not to be synchronized to the standby CMM.
8.3.1.3 Setting eth0 to DHCP
1. Using the vi editor, change the BOOTPROTO variable in the /etc/ifcfg-eth0 file to dhcp.
Note: Linux is case sensitive, so ensure that the BOOTPROTO value is entered in lower case letters in the step above.
2. To activate the changes, the user can reboot the CMM, or at the user prompt (from the root “/” directory) on the active CMM, type:
/etc/rc.d/network reload
Note: A DHCP server must be present on the network for the CMM to get a valid IP address. The network reload command will refresh the IP addresses on both network interfaces.
8.3.1.4 Setting eth1 to DHCP
1. Using the vi editor, change the BOOTPROTO variable in the /etc/ifcfg-eth1 file to dhcp.
2. eth1:1 will still use a static IP address in this configuration. Set the STATICIP2 variable to the IP address you want to assign to the active CMM on the network. This value should ONLY be set on the active CMM, as it will be synchronized to and overwritten on the standby CMM.
3. Add the NETMASK1 variable and set it to the appropriate netmask for STATICIP1 for your network.
4. Add the NETMASK2 variable and set it to the appropriate netmask for STATICIP2 for your network. The NETMASK2 variable needs to be correct to allow for true redundant operation.
5. Add the GATEWAY1 variable and set it to the appropriate value for the gateway for STATICIP1.
6. Add the GATEWAY2 variable and set it to the appropriate value for the gateway for STATICIP2.
7. Set the SETIP variable to assign IP addresses eth1 and eth1:1 based on the following table:
Table 20. SETIP Interface Assignments when BOOTPROTO="dhcp"
Interface | SETIP=1 | SETIP=2 | SETIP=Both | Other
eth1 | dynamic | STATICIP2 | dynamic | dynamic
eth1:1 | disabled | disabled | STATICIP2 | disabled
8. To activate the changes, at the user prompt (from the root “/” directory), type:
/etc/rc.d/network reload
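The two SETIP tables can be combined into a single lookup, which may help when checking an ifcfg-eth1 file by script. The Python below is a sketch under stated assumptions: the function name is hypothetical, matching SETIP values without regard to case is an assumption, and the eth1 result for any other SETIP value under static addressing is taken to be the previously configured value, which is an interpretation of the static table.

```python
def resolve_setip(bootproto, setip):
    """Return (eth1_source, eth1_1_source) for a BOOTPROTO/SETIP combination."""
    bootproto = bootproto.lower()
    setip = str(setip).lower()
    if bootproto not in ("static", "dhcp"):
        raise ValueError("BOOTPROTO must be static or dhcp")
    is_static = bootproto == "static"
    if setip == "1":
        return ("STATICIP1" if is_static else "dynamic", "disabled")
    if setip == "2":
        return ("STATICIP2", "disabled")
    if setip == "both":
        return ("STATICIP1" if is_static else "dynamic", "STATICIP2")
    # Any other SETIP value: eth1:1 is disabled in both tables.
    return ("previous value" if is_static else "dynamic", "disabled")
```

For the default file shown earlier (BOOTPROTO="static", SETIP="both"), the lookup gives STATICIP1 for eth1 and STATICIP2 for eth1:1.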
8.3.2 Setting a Hostname
The hostname of the CMM is a logical name that is used to identify a particular CMM. This name is shown at login time just to the left of the login prompt on the serial port interface when configured (i.e., “MYHOST login:”) and advertised to any DNS servers on a network. If there is no entry in /etc/HOSTNAME, the login prompt will not have anything next to it. By default, the hostname is set to the product name (i.e. MPCMM0001).
The hostname should be configured on each CMM. To change the hostname:
1. Using the vi editor, change the HOSTNAME variable in /etc/HOSTNAME to the desired name.
2. To activate the changes, at the user prompt (from the root “/” directory), type:
/etc/rc.d/network reload
Note: Executing network reload also causes the network interfaces to reload their IP addresses. If DHCP is being used on a network interface, then it is possible that the IP address on that interface will change.
8.3.3 Setting the Amount of Time for Auto-Logout
For security purposes, the CMM automatically logs the user out of the current console session after 15 minutes (900 seconds). This auto-logout time can be changed by editing /etc/profile and changing the TMOUT value to the desired setting. The time-out (TMOUT) value is set in seconds (900 seconds is the default). A setting of TMOUT=0 will disable the automatic logout. This can also be set at the command line.
8.3.4 Setting the Date and Time
On the active CMM, use the date command in the CLI to view the current date and time for the CMM. To set the date and time on the CMM use the setdate command. The setdate command should use the following syntax:
setdate "mm/dd/yyyy [timezone] hh:mm:ss"
The date is stored on the CMM in Coordinated Universal Time (UTC). The local timezone can be included in the setdate string, and the CMM will determine the offset and automatically change the date to UTC. An example that will set the date and time to “Thu Mar 11 20:12:00 UTC 2004” is:
setdate "3/11/2004 PST 12:12:00"
The date and time are synchronized to the standby CMM when changed and then every hour.
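The timezone conversion in the setdate example can be checked off the CMM with a few lines of Python. This sketch assumes PST means a fixed offset of UTC-8 and uses a hypothetical function name; it only reproduces the arithmetic and does not call setdate.

```python
from datetime import datetime, timedelta, timezone

# Assumption: PST is treated as a fixed UTC-8 offset.
PST = timezone(timedelta(hours=-8), "PST")


def to_utc_string(local_time):
    """Format a timezone-aware time as the CMM would store it, in UTC."""
    utc_time = local_time.astimezone(timezone.utc)
    return utc_time.strftime("%a %b %d %H:%M:%S UTC %Y")


if __name__ == "__main__":
    local = datetime(2004, 3, 11, 12, 12, 0, tzinfo=PST)
    print(to_utc_string(local))  # Thu Mar 11 20:12:00 UTC 2004
```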
8.3.5 Telnet into the CMM
To telnet into the CMM, point your console or telnet application to the IP address of the eth0, eth1, or eth1:1 interface on the CMM you wish to telnet to. If you wish to telnet to the active CMM, you can point the telnet application to the eth1:1 IP address. The “pointing” is accomplished using the Telnet open command. To get the IP address see Section 8.3.1, “Setting IP Address Properties” on page 72.
8.3.6 Connect Through SSH (Secure Shell)
For a more secure connection, users can connect to the CMM using SSH, or Secure Shell. SSH is a secure interface and protocol used for encrypted connections between a client and a server. Using an SSH client, open the IP address of the eth0, eth1, or eth1:1 interface on the CMM you wish to establish an SSH session with. SSH clients can be found freely available on the Internet.
8.3.7 FTP into the CMM
For security purposes, the CMM by default prevents the root user from accessing the CMM through FTP. Before using FTP to access the CMM, ensure the “root” entry is removed from the /etc/ftpusers file using a text editor like vi. If this entry is not removed, you will be unable to log in via FTP.
Using an FTP client, FTP to the IP address of the CMM you wish to transfer files to or from and use the CLI login and password.
8.3.8 Rebooting the CMM
To reboot the CMM, type the reboot command in the CLI on the CMM that is to be rebooted. If the reboot command is issued on the active CMM in a redundant configuration, a failover to the standby CMM will occur. If the reboot command is issued on a CMM in a single CMM configuration, chassis management will be lost during the reboot process. Telnet and SSH sessions will have to be reestablished with the CMM after it is rebooted.
Note: Do not use the “init 0” or “init 6” command to reboot the CMM as problems may result.
76 MPCMM0001 Chassis Management Module Software Technical Product Specification
The Command Line Interface (CLI)

8.4 CLI Command Line Syntax and Arguments

The command line interface on the CMM supports two types of commands: cmmget and cmmset. cmmget is used to query for information, whereas cmmset is used to write information.
There are man pages available on the CMM for these two commands. To access the man page for cmmget use the command man cmmget. To access the man page for cmmset, use the command man cmmset.
8.4.1 Cmmget and Cmmset Syntax
The syntax for calling the CLI from the command line is as follows:
cmmget [-h] [-l location] [-t target] -d dataitem
cmmset [-h] [-l location] [-t target] -d dataitem -v value
Where cmmget and cmmset are the CLI executables. The parameters can be in any order. The CLI is case insensitive, except for the executable name. Parameters shown in brackets are optional.
Any attribute value that contains a space must be enclosed in quotes. This happens often when specifying targets. For example, to get the current value of a sensor called Brd Temp on the CMM, the command would be:
cmmget -l cmm -t "Brd Temp" -d current
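The same quoting rule applies when the target name is held in a shell variable. The sketch below is illustrative only; the wrapper name sensor_current is hypothetical and not part of the CMM software.

```shell
#!/bin/sh
# Hypothetical wrapper: query the current value of a sensor whose name
# may contain spaces.
# Usage: sensor_current LOCATION TARGET
sensor_current() {
    location=$1
    target=$2
    # Quoting "$target" keeps a name such as "Brd Temp" together as a
    # single argument to -t.
    cmmget -l "$location" -t "$target" -d current
}

# Example, to be run on the CMM:
#   sensor_current cmm "Brd Temp"
```

Without the quotes around $target, the shell would pass Brd and Temp to cmmget as two separate arguments.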
8.4.2 Help Parameter: -h
If the Help parameter is given, the rest of the parameters are ignored, and the help text is output to the user.
8.4.3 Location Parameter: -l
The Location parameter is the location in the system on which the cmmget or cmmset command operates. If no location is given, the default location is the CMM.
Use the following cmmget command to list all valid locations in the chassis:
cmmget -d listlocations
The Location keywords are shown in the following table.
Table 21. Location (-l) Keywords
Keyword | Function
cmm | The Chassis Management Module.
bladeN | One of the CPU boards in the chassis. N refers to the chassis slot number into which the CPU board is inserted. Please refer to the Chassis documentation for slot information.
system | The entire platform.
chassis | Chassis specific information.
fantrayN | The system fantray, where N is the number of the fantray. For example, fantray1 refers to the single fantray in the MPCHC0001 shelf. NOTE: fantray1 may also be referred to as blade15 in a 14-slot chassis or blade17 in a 16-slot chassis.
PEM1, PEM2 | The system Power Entry Modules. PEM1 is in the left slot when looking from the front of chassis and PEM2 is in the right slot. NOTE: PEM1 may also be referred to as blade16 and PEM2 as blade17 in a 14-slot chassis; correspondingly they can be referred to as blade18 and blade19 in a 16-slot chassis.
8.4.4 Target Parameter: -t
The Target parameter is the sensor or variable that the cmmget or cmmset acts on. If target is not given then it is assumed that dataitem is an attribute of location. An example of this is presence. To obtain a list of valid targets for a device, issue the following command:
cmmget [-l location] -d listtargets
Where location is the device for which you want to obtain a list of targets. The target parameter for plug-in boards and different chassis components is defined by the sensor name in the Sensor Data Record (SDR) for that device. The various boards, fantrays, and PEMs provide their own SDRs automatically.
The following table shows the values target can be for the CMM location.
Table 22. CMM Targets
Keyword | Description
Brd Temp | Board Temperature
CPU Temp | CPU Temperature
FilterTrayTemp[1,2] | Filter Tray Temperature Sensors
CPU Core V | CPU Core Voltage
VBAT | Battery Voltage
VTT DDR | CMM Memory voltage
+2.5V | +2.5V voltage sensor
+3.3V | +3.3V voltage sensor
+5V | +5V voltage sensor
+12V | +12V voltage sensor
CDM [1,2] | Chassis Data Modules 1 and 2
Air Filter | Air Filter
Filter Tray | Filter Tray FRU
Filter Run Time | Filter Run Time
BIST | Built-In Self Test Sensor
FRU Hot Swap | FRU Hot Swap sensor
Filter Tray HS | Filter Tray Hot Swap sensor
IPMB-0 Snsr [1-16] | IPMB 0 sensors
FRU | FRU file for CMM
all_leds | Target for configuring all user-definable LEDs on the CMM front panel
hsled | Hot swap LED on the CMM front panel
userled[1-4] | Corresponds to userled A-D on the CMM front panel
feedN | Corresponds to power feed (i.e. feed1, feed2). Use the feedcount dataitem to determine number of power feeds for component.
PmsGlobal | Target for PMS global data
PmsProcN | Target for each process monitored where N is the process number
Datasync Status | Datasync Status Sensor
CMM Status | CMM Status Sensor
PmsPieN | Process monitoring process integrity sensors
None | Same as not entering a target
8.4.5 Dataitem Parameter: -d
The dataitem is the parameter, identified by target and/or location, that the user is getting or setting. The dataitem must be given for every CLI command.
8.4.5.1 Location Dataitem lists
Table 23 through Table 29 list the valid dataitems for each location when no target is specified.
Table 23. Dataitem Keywords for All Locations

listdataitems
    Description: Used to find out what data items are available on a target or location.
    Get/Set: Get
    CLI Get Output: Listing of all valid data items that can be issued for the specified location or target
    Valid Set Values: N/A

health
    Description: Retrieves the health information about a particular location or target.
    Get/Set: Get
    CLI Get Output: "Location/Target has no/minor/major/critical problems"
    Valid Set Values: N/A

Table 24. Dataitem Keywords for All Locations Except System

listgetdataitems
    Description: Lists all available dataitems that can be retrieved with cmmget.
    Get/Set: Get
    CLI Get Output: Listing of all valid get data items that can be issued for the specified location or target
    Valid Set Values: N/A

listsetdataitems
    Description: Lists all available dataitems that can be set with cmmset.
    Get/Set: Get
    CLI Get Output: Listing of all valid set data items that can be issued for the specified location or target
    Valid Set Values: N/A

healthevents
    Description: Retrieves events that contribute to the health of the location or target. This is a list of events currently active on the location or target. Health event strings are documented in Section 11, “Health Events” on page 104.
    Get/Set: Get
    CLI Get Output: List of currently active events. E.g.
        "Major Event : +12V_B Lower critical going low asserted
        Major Event : +12V_A Lower critical going low asserted"
    Valid Set Values: N/A

listtargets
    Description: Used to find what sensors or targets are available on the location. This is the list of sensors defined by the SDR for that particular location.
    Get/Set: Get
    CLI Get Output: Listing of all the targets that are available on the location
    Valid Set Values: N/A
Table 25. Dataitem Keywords for All Locations Except Chassis and System (Sheet 1 of 4)

deviceid
    Description: Retrieves the device’s SDR support, hardware revision, firmware / software revision, and sensor and event interface command specification revision information. Implements Get Device ID command. See IPMI 1.5 Specification Section 17.1.
    Get/Set: Get
    CLI Get Output:
        "GetDeviceID: <interpreted string without label>
        Device ID = <Device ID>
        SDR Support = <device provides Device SDRs>
        Device Revision = <Device revision>
        Device Available = <Device available: 0=normal operation, 1=device firmware>
        Firmware Revision = <Firmware revision: major.minor>
        IPMI Version = <IPMI version>
        Chassis Support = <Additional chassis device support>
        Bridge Support = <Additional bridge support>
        IPMB Event Generator Support = <Additional IPMB Event Generator support>
        IPMB Event Receiver Support = <Additional IPMB Event Receiver support>
        FRU Inventory Support = <Additional FRU inventory device support>
        SEL Support = <Additional SEL device support>
        SDR Repository Support = <Additional SDR Repository device support>
        Sensor Support = <Additional sensor device support>
        Manufacturer ID = <Manufacturer ID>
        Product ID = <Product ID>
        Aux Firmware Revision = <Auxiliary firmware revision information>"
    Valid Set Values: N/A

fruactivation
    Description: Set the activation state to either activate or deactivate the FRU. The Deactivate is the same as a Graceful Shutdown.
    Get/Set: Set
    CLI Get Output: N/A
    Valid Set Values: 1=activate FRU, 0=deactivate FRU

fruactivationpolicy
    Description: Get or Set the FRU activation policy. A Get returns whether the “Locked Bit” is set. For example, if blade 11 activation locked bit is set, and if in M1, then blade 11 cannot transition to M2 until unlocked. If blade 11 activation locked bit is not set then blade 11 can transition from M1 to M2.
    Get/Set: Both
    CLI Get Output: "<location> activation locked bit is set. If in M1, <location> cannot transition to M2 until unlocked" OR "<location> activation locked bit is not set. <location> can transition from M1 to M2"
    Valid Set Values: 1=set locked bit, 0=clear locked bit
Table 25. Dataitem Keywords for All Locations Except Chassis and System (Sheet 2 of 4)

frucontrol
    Description: Set the FRU payload to do things like Cold Reset, Warm Reset, etc. The CMM location only supports 2 (graceful reboot) and will only work on the standby CMM. Using frucontrol on an active or single CMM will attempt a failover before executing the command. If the failover is unsuccessful, frucontrol will not execute and will return an error.
    Get/Set: Set
    CLI Get Output: N/A
    Valid Set Values: 0=cold reset, 1=warm reset, 2=graceful reboot, 3=issue diagnostic interrupt

hotswapstate
    Description: Retrieves the FRU’s current M state (0-7).
    Get/Set: Get
    CLI Get Output: "<location> Hot Swap state is M[x]"
    Valid Set Values: N/A

fruextractionnotify
    Description: Used to notify the Shelf Manager that a FRU has been extracted from the shelf. Example is "cmmset -l <location> -d fruextractionnotify -v 1"
    Get/Set: Set
    CLI Get Output: N/A
    Valid Set Values: 1=Extract FRU

ledproperties
    Description: Find out the number and type of LEDs the FRU supports and which LED it can control. Implements the Get FRU LED Properties command. See PICMG 3.0 Section 3.2.5.6.
    Get/Set: Get
    CLI Get Output: Information pertaining to number and control of the LEDs:
        "<location> has control of <main_leds>
        <location> supports <number_user_leds> user leds"
        Where
        <location> is the -l parameter (can be a sub FRU)
        <main_leds> is a comma-separated list of <led> items
        <led> is hsled, led1, led2, led3
        <number_user_leds> is the decimal number of user LEDs supported by FRU
    Valid Set Values: N/A

picmgproperties
    Description: Query the maximum FRU Device ID supported by the IPMI controller. Implements Get PICMG Properties command. See PICMG 3.0 Table 3-9.
    Get/Set: Get
    CLI Get Output:
        "PICMG Properties: <interpreted string without label>
        PICMG Properties ID = <PICMG ID>
        PICMG Extension Version = <PICMG extension version=major.minor>
        Max FRU Device ID = <Max FRU device ID>
        FRU Device ID = <FRU device ID for IPMI controller>"
    Valid Set Values: N/A

powerlevels
    Description: Returns the power levels available for a FRU and the number of watts drawn by each.
    Get/Set: Get
    CLI Get Output:
        "ATCA FRU Power Levels:
        Power Level 1 = A watts
        ...
        Power Level n = B watts"
    Valid Set Values: N/A
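A script that waits for a FRU to reach a given hot swap state needs only the M state number from the hotswapstate reply. The sketch below assumes the reply ends in a literal "M" followed by one digit (for example "blade5 Hot Swap state is M4"); the exact rendering of M[x] on a real CMM should be checked first. The function name hotswap_m_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extract the M state digit from a hotswapstate reply.
# Prints the digit, or prints nothing and returns 1 when the reply does not
# match the assumed "<location> Hot Swap state is M<digit>" pattern.
hotswap_m_state() {
    reply=$1
    case $reply in
        *"Hot Swap state is M"[0-7])
            # Strip everything up to and including the last "M".
            printf '%s\n' "${reply##*M}"
            ;;
        *)
            return 1
            ;;
    esac
}

# Example, to be run on the CMM:
#   state=$(hotswap_m_state "$(cmmget -l blade5 -d hotswapstate)")
```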
Table 25. Dataitem Keywords for All Locations Except Chassis and System (Sheet 3 of 4)

powerstate
    Description: Retrieves and sets power state of a blade.
        Get: This is used to find out the powered on/off or offline state of a blade.
        Set: To reboot, shut down, and turn on a blade.
        If “reset” is used on CMM location, the software will check for redundancy and a reset will only occur if a redundant partner is identified.
        “PowerOff” is not supported on the CMM location.
    Get/Set: Both
    CLI Get Output: "<location>: <currentState> (Mx)" where Mx is the ATCA-defined M state of the location
    Valid Set Values: "Reset", "PowerOff", "PowerOn"

presence
    Description: Used to find out if a particular location is occupied or present in the chassis. This can be blades or intelligent entities like fan trays or PEMs.
    Get/Set: Get
    CLI Get Output: "<location> is <presenceState>." Where <presenceState> is
        "present" - if the location is present
        "not present" - if the location is not present or the CMM cannot communicate with it.
    Valid Set Values: N/A

presentpowerlevel
    Description: Get/Set the current power level of a FRU.
    Get/Set: Get
    CLI Get Output: "The FRU Power Level is <PwrLevel> Consuming <WattageValue> Watts" where
        <PwrLevel> is the current power level of the FRU in the range 1-20
        <WattageValue> is the current power draw in watts
    Valid Set Values: N/A

sel
    Description: Returns the System Event Log of the specified location.
    Get/Set: Get
    CLI Get Output: Listing of the interpreted SEL log of the location. The listing is of the format: “<Entry1>\n\n<Entry2>…” where <EntryM> is of the format:
        <Timestamp in Linux date format>\n\t<SensorName>\t<EventDescription>
    Valid Set Values: N/A

grantedboardekeys
    Description: Get the EKeys that have been granted to the Board.
    Get/Set: Get
    CLI Get Output:
        "Base Interface EKeys: <EkeyList>
        Fabric Interface EKeys: <EkeyList>
        Update Channel Interface EKeys: <EkeyList>"
        where <EkeyList> is a list of ekey settings for the interface such as
        EKey1 : enabled
        EKey2 : disabled
        EKey3 : not set
    Valid Set Values: N/A
Table 25. Dataitem Keywords for All Locations Except Chassis and System (Sheet 4 of 4)

busedekeys
    Description: Get a list of Bused EKeys and who owns them.
    Get/Set: Get
    CLI Get Output:
        Metallic Test Bus Pair #1: Token Owned: Yes/No Owner's IPMBAddress: IPMBAddress
        Metallic Test Bus Pair #2: Token Owned: Yes/No Owner's IPMBAddress: IPMBAddress
        Sync Clock Group #1: Token Owned: Yes/No Owner's IPMBAddress: IPMBAddress
        Sync Clock Group #2: Token Owned: Yes/No Owner's IPMBAddress: IPMBAddress
        Sync Clock Group #3: Token Owned: Yes/No Owner's IPMBAddress: IPMBAddress
        where "Owner's IPMBAddress" is displayed when "Token Owned" is set to "Yes".
    Valid Set Values: N/A

totalfrus
    Description: Used to query the total number of FRUs in a particular location. Once the number of FRUs for the location is known, the FRU can be specified by the format "-l location:fru#". Not specifying the ":fru#" part will direct the command to FRU ID 0.
    Get/Set: Get
    CLI Get Output: integer number
    Valid Set Values: N/A

frudeactivationpolicy
    Description: Get/Set the deactivation policy of the FRU. In PICMG 3.0 ECN 1 this refers to the deactivation locked bit. The FRU can be specified by the format "-l [location:fru#]". Not specifying the ":fru#" will direct the command to FRU 0.
    Get/Set: Both
    CLI Get Output: 1 - Locked bit is set, 0 - Locked bit is not set
    Valid Set Values: 1 - Set the locked bit, 0 - Clear the locked bit

ipmicommand
    Get/Set: Set
    CLI Get Output: Command Response string on success or error code on failure.
    Valid Set Values: Command request string

rawsel
    Description: Used to list SEL in raw format.
    Get/Set: Get
    CLI Get Output: "Listing of raw format SEL log of the location. The listing is of the format: <Entry1>\n\n<Entry2>… Where: <EntryM> is of the format: <Timestamp in Linux date format>\n\t<SensorName>\t<EventDescription>"
    Valid Set Values: N/A
Table 26. Dataitem Keywords for Chassis Location

fanspeed
    Description: Used to get or set the fan speed of all fans in the chassis. Value is percent of the maximum fan speed. See Section 16, “Fan Control and Monitoring” on page 132 for more information.
    Get/Set: Both
    CLI Get Output: The percentage of the max speed, "Emergency Shut Down", or "Local Control". For example, 80 for 80% of the max speed.
    Valid Set Values: A numerical value between 0 and 100 (e.g., “70”), “localcontrol”, or “emergencyshutdown” (localcontrol is not supported on the MPCHC0001 chassis fan tray.)

location
    Description: This is used to get or set the Location field in the chassis FRU and is sent out as a part of SNMP and UDP alerts. This is only used with the chassis location.
    Get/Set: Both
    CLI Get Output: "Shelf Address: <address>" Where: <address> is a space-separated list of two-digit hex numbers if the address’ type/len byte is 0, decoded string otherwise
    Valid Set Values: Location string less than 16 characters in length.
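Because fanspeed accepts both a percentage and two keywords, a script may want to check a value before passing it to cmmset. The following is a minimal sketch of such a check based only on the Valid Set Values listed above; the function name valid_fanspeed is hypothetical, and only the lowercase keyword spellings shown in the table are accepted.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the argument is a documented fanspeed
# set value (0-100, localcontrol, or emergencyshutdown), 1 otherwise.
valid_fanspeed() {
    case $1 in
        localcontrol|emergencyshutdown)
            return 0
            ;;
        ''|*[!0-9]*|????*)
            # Empty, contains a non-digit, or has more than three characters.
            return 1
            ;;
    esac
    [ "$1" -ge 0 ] && [ "$1" -le 100 ]
}

# Example, to be run on the CMM:
#   v=70
#   valid_fanspeed "$v" && cmmset -l chassis -d fanspeed -v "$v"
```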
Table 27. Dataitem Keywords for Cmm Location (Sheet 1 of 7)

listlocations
    Description: Used to find out all the locations that can be queried. This list can contain both present and non-present locations.
    Get/Set: Get
    CLI Get Output: All possible locations in the shelf, e.g. “cmm blade1 blade2 blade3 blade4 blade5 blade6 blade7 blade8 blade9 blade10 blade11 blade12 blade13 blade14 FanTray PEM1 PEM2 chassis system”
    Valid Set Values: N/A

listpresent
    Description: Used to query the locations that are currently present in the shelf that the CMM can communicate with.
    Get/Set: Get
    CLI Get Output: All the present locations in the shelf. It is a subset of listlocations, e.g.: “cmm blade1 blade2 FanTray PEM1 PEM2 chassis system”
    Valid Set Values: N/A

clearsel
    Description: Clears the event log of the entire shelf.
        cmmset -d clearsel -v clear
    Get/Set: Set
    CLI Get Output: N/A
    Valid Set Values: clear

powerbudget
    Description: Get information about the overall power budget: how much is used and how much is available.
    Get/Set: Get
    CLI Get Output:
        "Shelf Power Budget Distribution:
        Feed #1 = A Watts
        Feed #2 = B Watts
        Feed #3 = C Watts
        ...
        Feed #N = D Watts"
    Valid Set Values: N/A
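The listpresent and health dataitems can be combined to produce a quick health summary of everything in the shelf. The sketch below assumes that listpresent prints a bare, space-separated list of location keywords, as in the example output above; the function name report_present_health is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the health of every location that the CMM
# reports as present.
report_present_health() {
    # The command substitution is left unquoted on purpose so that the
    # space-separated reply is split into one word per location.
    for loc in $(cmmget -d listpresent); do
        printf '%s: ' "$loc"
        cmmget -l "$loc" -d health
    done
}

# Example, to be run on the CMM:
#   report_present_health
```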
Table 27. Dataitem Keywords for Cmm Location (Sheet 2 of 7)

alarmcutoff
    Description: Retrieve or set the state of the Telco Alarm cutoff. When enabled, it silences the Telco alarm for active events and blinks the event LEDs on the CMM. This dataitem is only valid when used with the cmm as the location and is used to set the alarm cutoff or get its value.
    Get/Set: Both
    CLI Get Output: "Telco Alarm Cutoff is <enabled/disabled>."
    Valid Set Values: 1 = Set cut off, 0 = Unset cut off

alarmtimeout
    Description: Retrieve or set the timeout value in minutes for the Telco Alarm cutoff. This is the amount of time before the alarm cutoff will automatically become unset if the user doesn’t unset it themselves. This dataitem is only valid when used with the cmm as the location and is used to set the alarm timeout or get its value.
    Get/Set: Both
    CLI Get Output: "Timeout is <timeoutvalue> minutes."
    Valid Set Values: Number of minutes: 0-1000. Value of 0 disables the time-out.

criticalled, minorled, majorled
    Description: Used only with the CMM location to turn on or off the critical, major and minor LEDs. When used with cmmset, a -v value of 1 turns the LED on while a 0 turns it off.
    Get/Set: Both
    CLI Get Output: "1" if the LED is On, "0" if the LED is Off
    Valid Set Values: 1 - Turn On LED, 0 - Turn Off LED

Ethernet
    Description: Included for backward compatibility only. The mapping of the command for the existing dataitem is:
        Ethernet = EthernetA
    Get/Set: Both
    CLI Get Output: "front" or "rear" or "backplane". For example,
        bash-2.04# cmmget -d ethernet
        cmm1ethernetA: front
        cmm2ethernetA: front
    Valid Set Values:
        "Front" - Set direction to the front panel.
        "Back" - Set direction to the rear IO panel card.

EthernetA, EthernetB
    Description: Used only with the CMM location to change the eth0/eth1 direction to either the front panel, the rear panel IO card, or backplane. The mapping of the commands for the existing dataitems is:
        EthernetA = cmm1EthernetA + cmm2EthernetA
        EthernetB = cmm1EthernetB + cmm2EthernetB
    Get/Set: Both
    CLI Get Output: "Front" or "Rear" or "Backplane". For example,
        bash-2.04# cmmget -d ethernetA
        cmm1ethernetA: front
        cmm2ethernetA: front
        bash-2.04# cmmget -d ethernetB
        cmm1ethernetB: front
        cmm2ethernetB: front
    Valid Set Values: "Front", "Rear", or "Backplane"
Table 27. Dataitem Keywords for Cmm Location (Sheet 3 of 7)

cmm1EthernetA, cmm1EthernetB, cmm2EthernetA, cmm2EthernetB
    Description: Used only with the CMM location to change the eth0/eth1 direction to either the front panel, the rear panel IO card, or backplane on CMM1 and/or CMM2.
    Get/Set: Both
    CLI Get Output: "Front" or "Rear" or "Backplane". For example,
        bash-2.04# cmmget -d cmm1ethernetA
        cmm1ethernetA: front
        bash-2.04# cmmget -d cmm1ethernetB
        cmm1ethernetB: front
        bash-2.04# cmmget -d cmm2ethernetA
        cmm2ethernetA: front
        bash-2.04# cmmget -d cmm2ethernetB
        cmm2ethernetB: front
    Valid Set Values: "Front", "Rear", or "Backplane"

version
    Description: The version of the CMM software.
    Get/Set: Get
    CLI Get Output: "Version: [Generation].[SRA].[Patch].[Build]" where
        Generation: Firmware Generation.
        SRA: Release in that Generation.
        Patch: Patch number.
        Build: Build number.
        E.g. Version:5.1.0.11
    Valid Set Values: N/A

update
    Description: Used with the cmmset command to update the CMM firmware on the CMM. In a redundant system, updates should only be done on one CMM at a time in order to maintain chassis management. Refer to Section 23, “Updating CMM Software” on page 204 for additional information.
    Get/Set: Set
    CLI Get Output: N/A
    Valid Set Values: "<cmm image location> ftp:<hostname or IP address>:username:password"

redundancy
    Description: Returns the list of CMMs in the shelf and their status.
    Get/Set: Get
    CLI Get Output:
        CMM 1: Present (active) *
        CMM 2: Not Present (standby)
        * = The CMM you are currently logged into.
    Valid Set Values: N/A

failover
    Description: Used with cmmset from the active CMM to force a failover to the standby. This will only complete successfully if the standby CMM is in a state where it can handle a failover.
    Get/Set: Set
    CLI Get Output: N/A
    Valid Set Values:
        “1” = Failover to standby CMM with equal or newer firmware version.
        “any” = Failover to standby CMM regardless of firmware version.

rmcpenable
    Description: Enable/Disable RMCP interface.
    Get/Set: Both
    CLI Get Output: "1" - RMCP Enabled, "0" - RMCP disabled
    Valid Set Values: "1" - RMCP Enable, "0" - RMCP disable
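The four fields of the version dataitem reply can be separated in a script, for example to compare firmware generations before a failover. The sketch below accepts the reply with or without a space after the colon, since both forms appear in the version description; the function name split_version is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: split a reply such as "Version: 5.1.0.11" into its
# Generation, SRA, Patch, and Build fields. Returns 1 unless exactly four
# dot-separated fields are found.
split_version() {
    reply=$1
    numbers=${reply#Version:}   # drop the label
    numbers=${numbers# }        # drop one optional space after the colon
    old_ifs=$IFS
    IFS=.
    # Unquoted on purpose: split the string on dots into $1..$4.
    set -- $numbers
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    printf 'generation=%s sra=%s patch=%s build=%s\n' "$1" "$2" "$3" "$4"
}

# Example, to be run on the CMM:
#   split_version "$(cmmget -l cmm -d version)"
```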
Table 27. Dataitem Keywords for Cmm Location (Sheet 4 of 7)

snmpenable
    Description: Used to set or query SNMP trap enabled status.
    Get/Set: Both
    CLI Get Output: SNMP traps are <enabled/disabled>.
    Valid Set Values: 1=Enable, 0=Disable, enable, disable

snmptrapaddress[1-5]
    Description: Get or Set the machine’s IP address that will receive SNMP traps from a location. Up to five addresses can be set. Default is 0.0.0.0 for all 5. Example:
        cmmset -l cmm -d snmptrapaddress3 -v 10.10.241.105
    Get/Set: Both
    CLI Get Output: "SNMP trap address: IpAddress" where IpAddress is of the format A.B.C.D
    Valid Set Values: <IpAddress> Where: <IpAddress> is a valid IP address in the form: A.B.C.D

snmptrapcommunity
    Description: Get or Set the SNMP trap community name. Example:
        cmmget -l cmm -d snmptrapcommunity
        Returns: SNMP trap community: publiccmm
    Get/Set: Both
    CLI Get Output: "SNMP trap community: communityValue"
    Valid Set Values: <CommunityName> Where <CommunityName> is any valid SNMP community name 64 characters or less. Ex: publiccmm

snmptrapport
    Description: Get or Set the TCP/IP port that the SNMP trap will be sent to. The default is 162.
    Get/Set: Both
    CLI Get Output: "SNMP trap port: portNumber"
    Valid Set Values: Valid port number 0-65535

snmptrapversion
    Description: Retrieves or sets SNMP trap version. This is either v1 or v3.
    Get/Set: Both
    CLI Get Output: "SNMP trap version: v1/v3"
    Valid Set Values: “v1” or “v3”

airfilterruntimelimit
    Description: Returns the uppercritical limit. Note: It uses the sensor to display the runtime value in days since the last reset. To retrieve the uppernoncritical limit use the command:
        cmmget -t "filter run time" -d uppernoncritical
        (or -d thresholdsall)
    Get/Set: Both
    CLI Get Output: <uppercritical limit> Days
    Valid Set Values:
        1. Disable eventing on air filter run time:
            -v 0
        2. Enable eventing on air filter and set the uppercritical limit to (xxx) days; this also sets the upper non-critical value to 90% of the uppercritical:
            -v xxx
        3. Enable eventing on air filter and set the uppercritical limit to (xxx) days and the upper non-critical to (yyy) days:
            -v xxx,yyy
        xxx and yyy must be integers between 1 and 255.

resetairfilterruntime
    Description: Resets the air filter runtime to 0. The set is supported to allow the user to set the run time to zero when the filter is replaced.
    Get/Set: Set
    CLI Get Output: N/A
    Valid Set Values: 1 - Only accepts a value of 1
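The three forms of the airfilterruntimelimit set value (0, xxx, and xxx,yyy) can be built and range-checked before calling cmmset. This is a sketch under the limits stated above (integers from 1 to 255, with 0 allowed only as the single disable value); the function name airfilter_limit_value is hypothetical, and the sketch does not check whether yyy is lower than xxx because the manual states no such rule.

```shell
#!/bin/sh
# Hypothetical helper: build the -v argument for airfilterruntimelimit.
# Usage: airfilter_limit_value UPPERCRITICAL [UPPERNONCRITICAL]
airfilter_limit_value() {
    # Reject anything that is not a short string of digits.
    for n in "$@"; do
        case $n in
            ''|*[!0-9]*|????*) return 1 ;;
        esac
    done
    case $# in
        1)  # 0 disables eventing; otherwise the limit is 1-255 days.
            [ "$1" -le 255 ] || return 1
            printf '%s\n' "$1"
            ;;
        2)  # Both limits given explicitly, each 1-255 days.
            [ "$1" -ge 1 ] && [ "$1" -le 255 ] &&
            [ "$2" -ge 1 ] && [ "$2" -le 255 ] || return 1
            printf '%s,%s\n' "$1" "$2"
            ;;
        *)  return 1
            ;;
    esac
}

# Example, to be run on the CMM:
#   cmmset -l cmm -d airfilterruntimelimit -v "$(airfilter_limit_value 180 150)"
```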
Table 27. Dataitem Keywords for Cmm Location (Sheet 5 of 7)

syncuserledstate
    Description: Gets/Sets whether the LED state is synced between the active and standby CMM.
    Get/Set: Both
    CLI Get Output: "Yes" or "No"
    Valid Set Values: “Yes” or “No”

powersequence
    Description: Used to get/set the power sequence order, Power Sequencing Delay, and ShelfManagerControlledActivation in the CDM.
        Note: The power sequencing delay is the time, in tenths of a second, to wait before powering up any other FRU after powering this FRU. The value of the power sequencing delay is between 0 and 63. Shelf Manager Controlled Activation determines if the Shelf Manager activates the FRU residing at this location when it reaches M2.
    Get/Set: Both
    CLI Get Output: It will be in INI format displayed on the console as follows:
        [Settings]
        EntryCount=xxx
        [Power Sequence 1]
        Location=cmm1
        FRUDeviceID=0
        ShelfManagerControlledActivation=Yes
        DelayBeforeNextPowerOn=0
        [Power Sequence 2]
        …
        …
        [Power Sequence xxx]
        Location=blade12
        FRUDeviceID=0
        ShelfManagerControlledActivation=No
        DelayBeforeNextPowerOn=0
    Valid Set Values: INI file with its path, such as "-v /home/PowerSeq.INI"

loginmessage
    Description: Used to customize the login screen message by allowing user to add the OEM name.
    Get/Set: Both
    CLI Get Output: The OEM string that is displayed at the login screen
    Valid Set Values: "OEMWelcomeMessage" where the OEMWelcomeMessage is the message that will appear at the login screen. Max length = 63 characters

cmdlineprompt
    Description: Used to customize the bash prompt by allowing user to add the OEM name.
    Get/Set: Both
    CLI Get Output: The OEM Name string to be prepended to the $PS1 variable
    Valid Set Values: "OEMName" where the OEMName is the string that will appear at the beginning of the bash prompt. Max length = 63 characters

FaultLEDColor
    Description: Get/Set the color of the fault/health LED on the CMM fronted FRUs (Filter Tray, CDM) to be used when an error is reported. Does not affect CMM Health LED.
    Get/Set: Both
    CLI Get Output: "amber" or "red"
    Valid Set Values: “amber” or “red”
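A power sequence INI file for the powersequence dataitem can be generated rather than typed by hand. The sketch below writes one [Power Sequence N] section using the key names from the get output shown above. It assumes that DelayBeforeNextPowerOn is the power sequencing delay (0 to 63) and that the CMM accepts a file laid out exactly like its own get output; neither assumption is confirmed by this manual. The function name power_sequence_entry is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print one [Power Sequence N] section.
# Usage: power_sequence_entry INDEX LOCATION FRU_ID Yes|No DELAY
power_sequence_entry() {
    # ShelfManagerControlledActivation must be Yes or No.
    case $4 in
        Yes|No) ;;
        *) return 1 ;;
    esac
    # The delay must be one or two digits and no larger than 63.
    case $5 in
        ''|*[!0-9]*|???*) return 1 ;;
    esac
    [ "$5" -le 63 ] || return 1
    printf '[Power Sequence %s]\nLocation=%s\nFRUDeviceID=%s\nShelfManagerControlledActivation=%s\nDelayBeforeNextPowerOn=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Example, to be run on the CMM:
#   {
#       printf '[Settings]\nEntryCount=2\n'
#       power_sequence_entry 1 cmm1 0 Yes 0
#       power_sequence_entry 2 blade12 0 No 5
#   } > /home/PowerSeq.INI
#   cmmset -l cmm -d powersequence -v /home/PowerSeq.INI
```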
Table 27. Dataitem Keywords for Cmm Location (Sheet 6 of 7)

AdminState
    Description: Used to set or query the administrative state of the PMS as a whole or an individual monitored process. A target of “PmsGlobal” will get/set the state of the PMS as a whole. A target of “PmsProc[#]” will get/set the unique state of an individual process, where # is the unique process number for the process. A target of "PmsPie[#]" will get/set the unique state of a PIE, where # is the unique PIE number.
        AdminState is CMM-specific and is not synced between CMMs. It allows individual control of each CMM's AdminState and can be set on either the active or standby CMM.
    Get/Set: Both
    CLI Get Output: "1:Unlocked" or "2:Locked"
    Valid Set Values: 1 = Unlocked, 2 = Locked

RecoveryAction
    Description: Used to set or query the recovery action of a PMS monitored process. This is only valid for a target of "PmsProc[#]", where "#" is the unique number for the process.
    Get/Set: Both
    CLI Get Output: "1:No Action", "2:Process Restart", "3: Failover and Restart", or "4:Failover and Reboot"
    Valid Set Values: 1 = No Action, 2 = Process Restart, 3 = Failover and Restart, 4 = Failover and Reboot

EscalationAction
    Description: Used to set or query the process restart escalation action. This is only valid for a target of "PmsProc[#]", where "#" is the unique number for the process.
    Get/Set: Both
    CLI Get Output: "1:No Action", "2:Failover and Reboot"
    Valid Set Values: 1 = No Action, 2 = Failover and Reboot

ProcessName
    Description: Used to query the process name and associated command line arguments for a monitored process. A target of "PmsProc[#]” will retrieve the name of an individual process, where "#" is the unique number for the process. "PmsPie[#]" will retrieve the path and command line arguments of the PIE to be executed periodically.
    Get/Set: Get
    CLI Get Output: "<Process_Name> <Command_Line_Arguments>"
    Valid Set Values: N/A

OpState
    Description: Used to query the operational state of a monitored process. An operational state of disabled indicates that the process has failed and cannot be recovered. This is valid for a target of "PmsProc[#]” and “PmsGlobal”, where "#" is the unique number for the process, and PmsGlobal refers to the OpState for all of PMS. This is also valid for a target of PmsPie[#].
    Get/Set: Get
    CLI Get Output: "1:Enabled", "2:Disabled"
    Valid Set Values: N/A
Table 27. Dataitem Keywords for Cmm Location (Sheet 7 of 7)

standbycmmreboot
    Description: Used to request a reboot of the standby CMM from the active CMM.
    Get/Set: Set
    CLI Get Output: N/A
    Valid Set Values: 1 - Request to reboot the standby CMM

feedcount
    Description: Get the power feed count for that location. Determines number of feed targets (e.g., feed1) for that location. See Section 7.7, “Power Feed Targets” on page 69.
    Get/Set: Get
    CLI Get Output: Integer number
    Valid Set Values: N/A

failoveronredundancy
    Description: Used to set the failover configuration flag.
    Get/Set: Both
    CLI Get Output: automatic, manual
    Valid Set Values: automatic, manual

syncuserscripts
    Description: Used to set the direction of synchronization for home scripts when the CMM versions differ.
    Get/Set: Both
    CLI Get Output: “upgrade”, “downgrade”, “always”, “equal”
    Valid Set Values: upgrade, downgrade, always, equal

snmptrapformat
    Description: Used to get/set the SNMP trap format.
    Get/Set: Both
    CLI Get Output: “1” - text, “2” - raw, “3” - text+raw
    Valid Set Values: “1” - text, “2” - raw, “3” - text+raw

snmpsendunrecognizedevents
    Description: Used to get/set the option of whether to send unrecognized events. Used only when snmptrapformat is set to 1.
    Get/Set: Both
    CLI Get Output: 0 - don't send, 1 - send
    Valid Set Values: “0” - don't send, “1” - send

selformat
    Description: Used to get/set the option of whether to send unrecognized events. Used only when snmptrapformat is set to 1.
    Get/Set: Both
    CLI Get Output: “0” - don't send, “1” - send
    Valid Set Values: “0” - don't send, “1” - send

seldisplayunrecognizedevents
    Description: Used to get/set the option of whether to display unrecognized events in SEL. Used only when selformat is set to 1.
    Get/Set: Both
    CLI Get Output: “0” - don't display, “1” - display
    Valid Set Values: “0” - don't display, “1” - display

temperaturelevel
    Description: Used to query the current temperature level of the fantray.
    Get/Set: Get
    CLI Get Output: “Normal”, “Minor”, “Major”, “Critical”
    Valid Set Values: N/A
Table 28. Dataitem Keywords for System Location

unhealthylocations
    Description: Used to query which locations have active health events.
    Get/Set: Get
    CLI Get Output:
        "Critical : CritList
        Major : MajList
        Minor : MinList"
        where each list is a list of locations having that level of health events (space separated)
    Valid Set Values: N/A

clearmajor
    Description: Clear major alarm LED on the active CMM.
    Get/Set: Set
    CLI Get Output: N/A
    Valid Set Values: 1 - Only accepts a value of 1

clearminor
    Description: Clear minor alarm LED on the active CMM.
    Get/Set: Set
    CLI Get Output: N/A
    Valid Set Values: 1 - Only accepts a value of 1
Table 29. Dataitem Keywords for FantrayN Location

minorlevel
    Description: Used to set or query the minorlevel for the fantray.
    Get/Set: Both
    Valid Set Values: Any value between the normallevel and the majorlevel of the fantray.

normallevel
    Description: Used to set or query the normallevel for the fantray.
    Get/Set: Both
    Valid Set Values: Any value between the minimumsetting and the minorlevel of the fantray.

control
    Description: Used to set or query the control mode of the fantray.
    Get/Set: Both
    Valid Set Values: EmergencyShutdown, fantray, CMM, defaultcontrol

defaultcontrol
    Description: Used to set or query the defaultcontrol mode of the fantray.
    Get/Set: Both
    Valid Set Values: fantray, CMM

restoredefaults
    Description: Used to restore the cooling table defaults of the fan tray to the vendor defaults or code defaults.
    Get/Set: Set
    CLI Get Output: N/A
    Valid Set Values: true

minimumsetting
    Description: Used to query the minimum setting of the fantray returned via the getfantray properties IPMI command.
    Get/Set: Get
    Valid Set Values: N/A

maximumsetting
    Description: Used to query the maximum setting of the fantray returned via the getfantray properties IPMI command.
    Get/Set: Get
    Valid Set Values: N/A

recommendedsetting
    Description: Used to query the recommended setting of the fantray returned via the getfantray properties IPMI command.
    Get/Set: Get
    Valid Set Values: N/A

currentfanlevel
    Description: Used to query the current cooling level of the fantray.
    Get/Set: Get
    CLI Get Output: 0
    Valid Set Values: N/A
8.4.5.2 Target Dataitem Lists
When a target is specified, there is usually a slightly different set of dataitems specifically for that target. Refer to Section 8.4.4, “Target Parameter: -t” on page 78 for more information on the target parameter. Table 30 lists the possible dataitems used with various targets.
Table 30. Dataitem Keywords Used with the Target Parameter (Sheet 1 of 4)

dataitem: listdataitems
Description: Lists the available dataitems for that target.
Get/Set: Get
CLI Get Output: Listing of all valid data items that can be issued for the specified location or target
Valid Set Values: N/A

dataitem: health
Description: Returns the health of the target and whether any events exist. The returned value is one of OK, minor, major, or critical.
Get/Set: Get
CLI Get Output: "Location/Target has no/minor/major/critical problems"
Valid Set Values: N/A

dataitem: healthevents
Description: Returns the specific health events that are occurring on the target, if any exist.
Get/Set: Get
CLI Get Output: List of currently active events. E.g.
"Major Event : +12V_B Lower critical going low asserted
Major Event : +12V_A Lower critical going low asserted"
Valid Set Values: N/A

dataitem: current
Description: The current value of a sensor.
Get/Set: Get
CLI Get Output: "The current value is currentValue [Units]"
Valid Set Values: N/A

dataitem: thresholdsall
Description: All thresholds of a sensor. This includes lower non-recoverable, lower critical, lower non-critical, upper non-critical, upper critical, and upper non-recoverable.
Get/Set: Get
CLI Get Output:
"Upper Non-recoverable: ThresholdValue [Units]
Upper Critical: ThresholdValue [Units]
Upper Non-critical: ThresholdValue [Units]
Lower Non-critical: ThresholdValue [Units]
Lower Critical: ThresholdValue [Units]
Lower Non-recoverable: ThresholdValue [Units]"
If a certain threshold is not supported, the ThresholdValue will display "Not Supported"
Valid Set Values: N/A

dataitem: uppernonrecoverable, uppercritical, uppernoncritical, lowernoncritical, lowercritical, lowernonrecoverable
Description: Used to query individual thresholds for a value-based sensor, such as temperature or voltage.
Get/Set: Get
CLI Get Output: One of the following, matching the dataitem queried:
"Upper Non-recoverable: ThresholdValue [Units]"
"Upper Critical: ThresholdValue [Units]"
"Upper Non-critical: ThresholdValue [Units]"
"Lower Non-critical: ThresholdValue [Units]"
"Lower Critical: ThresholdValue [Units]"
"Lower Non-recoverable: ThresholdValue [Units]"
Valid Set Values: N/A
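
As a rough illustration of how a script might consume the thresholdsall output, the Python sketch below parses text shaped like the CLI Get Output listed for thresholdsall. It assumes one "name: value units" pair per line, which the table suggests but does not state. The function name parse_thresholds and the sample readings are hypothetical and are not part of the CMM software.

```python
def parse_thresholds(text):
    """Map each threshold name to (value, units), or None if not supported."""
    thresholds = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "name: value" line
        name, rest = name.strip(), rest.strip()
        if rest.lower() == "not supported":
            thresholds[name] = None
            continue
        value, _, units = rest.partition(" ")
        try:
            thresholds[name] = (float(value), units.strip())
        except ValueError:
            continue  # skip values that are not numeric
    return thresholds


# Made-up sample text in the documented shape (not captured from a real CMM).
sample = """Upper Non-recoverable: Not Supported
Upper Critical: 5.5 Volts
Upper Non-critical: 5.25 Volts
Lower Non-critical: 4.75 Volts
Lower Critical: 4.5 Volts
Lower Non-recoverable: Not Supported"""

result = parse_thresholds(sample)
```

Returning None for unsupported thresholds keeps them distinguishable from a threshold whose value is simply missing from the output.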
Table 30. Dataitem Keywords Used with the Target Parameter (Sheet 2 of 4)

dataitem: criticalaction, majoraction, minoraction, normalaction
Description: Used to configure user-defined actions when events occur. This dataitem is used with a sensor specified by the target (-t) parameter and a value (-v) parameter. When an event of the corresponding severity occurs on that sensor, the script defined in the -v parameter is executed. The script must be located in the /home/scripts/ directory on the CMM, and the /home/scripts path should be omitted when specifying the script.
Example: cmmset -l blade9 -t +5V -d minoraction -v "powerdownblade 9"
In this example, /home/scripts/powerdownblade will be executed with a parameter of 9 when the +5V sensor on blade9 generates a minor event.
Get/Set: Both
CLI Get Output: If set, the full path of the script, e.g. /home/scripts/EventScript. If not set, output is "" (null).
Valid Set Values: "<ScriptName> arg1 arg2 …argN"
Where ScriptName (not the full path) is the script file name and arg1-argN are the parameters to the script.
Use "none" to remove an existing entry.

dataitem: eventaction
Description: Used to trigger a script based on the event code of a health event. Refer to Section 18, “CMM Scripting” on page 164.
Get/Set: Set
CLI Get Output: N/A
Valid Set Values: "<event code>:<ScriptName> arg1 arg2...argN"
Where event code is the code of the event to associate with the script, ScriptName (not the full path) is the name of the script file, and arg1..argN are any parameters required by the script.
Use "<eventcode>:none" to remove an existing entry.

dataitem: ledcolorprops
Description: Gets a FRU LED’s valid color set. This command returns a comma-separated list of supported colors, the default local control color, and the default override color. This command should be issued before a ledstate set command. Implements the Get LED Color Capabilities command. See PICMG 3.0 table 3-24.
Get/Set: Get
CLI Get Output: Color properties of the LED:
"<ledtarget> supports <colors>
Default local control color is <colorList>
Default override color is <color>"
Where:
<ledtarget> is one of the valid LEDs (hsled, led1, led2, led3, userled1-userled251)
<colorList> is a comma-separated list of <color> items
<color> is one of blue, red, green, amber, orange, white
Valid Set Values: N/A
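
The value strings for the action dataitems follow a simple pattern, so a small helper can assemble them. The Python sketch below is only an illustration under stated assumptions: action_value and eventaction_value are hypothetical names, the event code 1234 is a placeholder rather than a real CMM event code, and the only rules enforced are the ones given in the table (script name without the /home/scripts path, arguments separated by spaces, "none" to clear an entry).

```python
SCRIPT_DIR = "/home/scripts/"  # directory named in the table above


def action_value(script, *args):
    """Build the -v string for criticalaction, majoraction, minoraction, normalaction."""
    if script.startswith(SCRIPT_DIR):
        script = script[len(SCRIPT_DIR):]  # the path must be omitted
    if not script or "/" in script:
        raise ValueError("script must be a file name located in " + SCRIPT_DIR)
    return " ".join([script, *map(str, args)])


def eventaction_value(event_code, script, *args):
    """Build the -v string for eventaction: "<event code>:<ScriptName> args"."""
    return f"{event_code}:{action_value(script, *args)}"


minor = action_value("/home/scripts/powerdownblade", 9)  # "powerdownblade 9"
clear = eventaction_value(1234, "none")                  # placeholder event code
```

The first result matches the value used in the minoraction example above; the second shows the "<eventcode>:none" form that removes an existing entry.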
Table 30. Dataitem Keywords Used with the Target Parameter (Sheet 3 of 4)

dataitem: ledstate
Description: Gets or sets a FRU LED’s state. The Get returns the LED’s mode, one of {localcontrol, override, or lamptest}, and a function message. Implements the Get/Set FRU LED State commands. See PICMG 3.0 tables 3-26 for Get and 3-25 for Set.
Set syntax model: cmmset -l <location> -t <LED> -d ledstate -v <function>,<function options>
Example: cmmset -l cmm -t "userled1" -d ledstate -v blink,300,700,green
This sets the CMM’s userled1 LED to blinking green with an off duration of 300 ms and an on duration of 700 ms.
Get/Set: Both
CLI Get Output:
"<ledtarget> is in <LEDmode> mode
<function message>"
where
<LEDmode> is one of localcontrol/override/lamptest
<function message> is one of the following, depending on the LED’s current function:
If LED is off:
    function is off
If LED is on:
    function is on
    color is <color>
If LED is blinking:
    function is blink
    off time is <offtime> ms
    on time is <ontime> ms
    color is <color>
If LED is under lamp test:
    duration is <duration> ms
<color> is one of blue, red, green, amber, orange, white
<offtime> is the time in milliseconds that the LED is in the off cycle of a blink
<ontime> is the time in milliseconds that the LED is in the on cycle of a blink
<duration> is the duration of the lamp test in milliseconds
Valid Set Values:
Functions: off, on, blink, lamptest, localcontrol
Accepted values: <offtime>, <ontime>, <color>, <duration>
Refer to Section 12.5, “Setting the State of the User LEDs” on page 125 for more information.
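
To make the ledstate formats concrete, the Python sketch below builds the blink value string used in the example above and parses a Get response of the documented shape. It is a sketch under assumptions: blink_value and parse_ledstate are hypothetical helper names, the sample response is hand-written from the table rather than captured from a CMM, and only the blink option order (off time, on time, color) is taken from the manual's example.

```python
import re

VALID_COLORS = {"blue", "red", "green", "amber", "orange", "white"}


def blink_value(off_ms, on_ms, color):
    """Build "blink,<offtime>,<ontime>,<color>" for cmmset -d ledstate -v."""
    if color not in VALID_COLORS:
        raise ValueError(f"unsupported color: {color}")
    if off_ms < 0 or on_ms < 0:
        raise ValueError("times must be non-negative milliseconds")
    return f"blink,{int(off_ms)},{int(on_ms)},{color}"


def parse_ledstate(text):
    """Pull the mode, function, timings, and color out of a ledstate Get response."""
    state = {}
    mode = re.search(r"(\S+) is in (localcontrol|override|lamptest) mode", text)
    if mode:
        state["led"], state["mode"] = mode.group(1), mode.group(2)
    function = re.search(r"\bfunction is (off|on|blink)\b", text)
    if function:
        state["function"] = function.group(1)
    for key, label in (("off_ms", "off time"), ("on_ms", "on time"),
                       ("duration_ms", "duration")):
        found = re.search(rf"\b{label} is (\d+) ms", text)
        if found:
            state[key] = int(found.group(1))
    color = re.search(r"\bcolor is (\w+)", text)
    if color:
        state["color"] = color.group(1)
    return state


value = blink_value(300, 700, "green")
# Hand-written sample in the documented shape (line breaks are an assumption).
sample = ("userled1 is in override mode\nfunction is blink\n"
          "off time is 300 ms\non time is 700 ms\ncolor is green")
parsed = parse_ledstate(sample)
```

The regular expressions search the whole response instead of relying on line positions, so the parser tolerates differences in how the CLI wraps its output.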
Table 30. Dataitem Keywords Used with the Target Parameter (Sheet 4 of 4)

dataitem: maxexternalavailablecurrent
Description: Get/Set the field in the CDM regarding the max external available current. Only used with the feedN target.
e.g. cmmget -l cmm -t feed1 -d maxexternalavailablecurrent
See Section 7.7, “Power Feed Targets” on page 69.
Get/Set: Both
CLI Get Output: current in Amps with 1 decimal point
Valid Set Values: current in Amps with 1 decimal point

dataitem: maxinternalcurrent
Description: Get the field in the CDM regarding the max internal available current. Only used with the feedN target.
e.g. cmmget -l cmm -t feed1 -d maxinternalcurrent
See Section 7.7, “Power Feed Targets” on page 69.
Get/Set: Get
CLI Get Output: current in Amps with 1 decimal point
Valid Set Values: N/A

dataitem: minexpectedoperatingvoltage
Description: Get/Set the field in the CDM regarding the minimum expected operating voltage. Only used with the feedN target.
e.g. cmmget -l cmm -t feed1 -d minexpectedoperatingvoltage
See Section 7.7, “Power Feed Targets” on page 69.
Get/Set: Both
CLI Get Output: voltage value as a string, between -36 and -72 V
Valid Set Values: voltage value as a string, between -36 and -72 V
8.4.6 Value Parameter: -v
The value parameter specifies the new value for a dataitem. This parameter is required for all cmmset commands and is used only with cmmset commands. Valid values are shown with their corresponding dataitems in the dataitem tables listed above.
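
The rule that -v is mandatory for cmmset and unused for cmmget can be captured in a small argument builder. The Python sketch below is illustrative only: build_cli is a hypothetical helper, not part of the CMM software, and it uses only the -l, -t, -d, and -v parameters that appear in this chapter's examples.

```python
import shlex


def build_cli(command, location, dataitem, target=None, value=None):
    """Return an argument list for a cmmget or cmmset call."""
    if command not in ("cmmget", "cmmset"):
        raise ValueError("command must be cmmget or cmmset")
    if command == "cmmset" and value is None:
        raise ValueError("cmmset requires a value (-v)")
    if command == "cmmget" and value is not None:
        raise ValueError("-v is only used with cmmset")
    argv = [command, "-l", location]
    if target is not None:
        argv += ["-t", target]  # the target is optional
    argv += ["-d", dataitem]
    if value is not None:
        argv += ["-v", str(value)]
    return argv


example = build_cli("cmmset", "blade9", "minoraction",
                    target="+5V", value="powerdownblade 9")
print(shlex.join(example))  # shell-quoted form of the command line
```

Keeping the command as a list and quoting it only for display avoids mistakes with values that contain spaces, such as a script name followed by its arguments.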
8.4.7 Sample CLI Operations
Sample CLI operations can be found in Appendix A, “Example CLI Commands”.

8.5 Generating a System Status Report

The CLI includes an executable script (cmmdump) that generates a system status report for communicating system health and configuration information to technical support personnel. The report helps technical support troubleshoot issues that may be affecting the system. By default, cmmdump writes the system information to the screen. To send the output to a file instead, use the following command:
cmmdump > [filename]
The filename should refer to a file in a valid directory (e.g., /home/cmmdump.txt). The file can then be retrieved from the CMM using FTP (see Section 8.3.7, “FTP into the CMM” on page 76).
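
For scripted collection, the redirect shown above can be reproduced programmatically. The Python sketch below is a minimal illustration that assumes a Python interpreter is available wherever cmmdump runs, which this manual does not state. save_report is a hypothetical helper, and the command is a parameter so the same function can be exercised without a CMM.

```python
import subprocess


def save_report(path, command=("cmmdump",)):
    """Run the report command and write its standard output to path.

    Equivalent in effect to: cmmdump > path
    """
    with open(path, "w") as out:
        # check=True raises CalledProcessError if the command fails
        subprocess.run(list(command), stdout=out, check=True)
    return path


# Usage on a system where cmmdump is on the PATH (path from the text above):
# save_report("/home/cmmdump.txt")
```

Raising on a non-zero exit status keeps a failed run from leaving behind a partial file that looks like a valid report.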

9 Resetting the Password

It may become necessary at some point to reset the CMM password to its default of cmmrootpass. The CMM has one onboard dip switch, labeled S2-1, for this purpose. Refer to the Intel® NetStructure™ MPCMM0001 Hardware Technical Product Specification for the location of the switch. Setting the switch to on and powering up the CMM resets the password to its default. The CMM must then be removed again so that the switch can be set back to off.

9.1 Resetting the Password in a Dual CMM System

In redundant systems containing dual CMMs (one active, one standby), the password should be reset on the standby CMM. Once reset, the default password synchronizes to the active CMM. This avoids having to perform the reset on both CMMs and avoids a failover.
1. Open the ejector latch on the standby CMM and wait for the blue hot swap LED to illuminate, indicating the CMM is safe to remove from the system.
2. Remove the standby CMM from the chassis.
3. Set dip switch S2-1 to “on”. The dip switch has a label indicating which way is on.
4. Re-insert the CMM into the system and allow the CMM to fully boot (blue light will go off when fully booted).
5. An OK health event will occur indicating that the passwords on both CMMs have been reset and were synched from the standby CMM. A SEL entry will be recorded, and a trap will be sent out.
6. Once at the login prompt, the password should now be reset to its default of cmmrootpass.
7. Log in to the active CMM to ensure the password was reset.
8. Open the ejector latch on the standby CMM and wait for the blue hot swap LED to illuminate, indicating the CMM is safe to remove from the system.
9. Remove the standby CMM from the chassis.
10. Set dip switch S2-1 back to original “off” position.
11. Re-insert the CMM into the system and allow the CMM to fully boot (blue light will go off when fully booted).
12. Log in to the CMM and operate as normal.
13. Use the passwd command on the active CMM to change the CLI password if desired. The new password will sync to the standby.

9.2 Resetting the Password in a Single CMM System

For nonredundant systems that contain only a single CMM, resetting the password requires removing the CMM. This causes any boards that are power controlled by the CMM to become unmanaged. Take care to safely shut down the boards in the system before removing the CMM.
1. Safely shut down and power off boards being power controlled by the CMM.
2. Remove the CMM from the system.
3. Set dip switch S2-1 to “on”. The dip switch has a label indicating which way is on.
4. Re-insert the CMM into the system and allow the CMM to fully boot (blue light will go off when fully booted).
5. Once at the login prompt, the password should now be reset to its default of cmmrootpass.
6. Log in to the CMM to ensure the password was reset.
7. Remove the CMM from the system.
8. Set dip switch S2-1 back to its original “off” position.
9. Re-insert the CMM into the system and allow the CMM to fully boot (blue light will go off when fully booted).
10. Log in to the CMM and operate as normal.
11. Use the passwd command on the active CMM to change the CLI password if desired.