INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY
ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN
INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS
ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES
RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER
INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for
future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
The Intel® NetStructure™ MPCMM0001 Chassis Management Module may contain design defects or errors known as errata which may cause the
product to deviate from published specifications. Current characterized errata are available on request.
This Software Technical Product Specification as well as the software described in it is furnished under license and may only be used or copied in
accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to change without notice,
and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or
inaccuracies that may appear in this document or any software that may be provided in association with this document.
Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any
means without the express written consent of Intel Corporation.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel literature may be obtained by calling
1-800-548-4725 or by visiting Intel's website at http://www.intel.com.
AnyPoint, AppChoice, BoardWatch, BunnyPeople, CablePort, Celeron, Chips, CT Media, Dialogic, DM3, EtherExpress, ETOX, FlashFile, i386, i486,
i960, iCOMP, InstantIP, Intel, Intel Centrino, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Create & Share, Intel GigaBlade,
Intel InBusiness, Intel Inside, Intel Inside logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel Play, Intel Play logo, Intel SingleDriver, Intel
SpeedStep, Intel StrataFlash, Intel TeamStation, Intel Xeon, Intel XScale, IPLink, Itanium, MCS, MMX, MMX logo, Optimizer logo, OverDrive,
Paragon, PC Dads, PC Parents, PDCharm, Pentium, Pentium II Xeon, Pentium III Xeon, Performance at Your Command, RemoteExpress, SmartDie,
Solutions960, Sound Mark, StorageExpress, The Computer Inside., The Journey Inside, TokenExpress, VoiceBrick, VTune, and Xircom are
trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
The Intel® NetStructure™ MPCMM0001 Chassis Management Module is a 4U, single-slot CMM
intended for use with AdvancedTCA* PICMG* 3.0 platforms. This document details the software
features and specifications of the CMM. For information on hardware features of the CMM, refer
to the Intel® NetStructure™ MPCMM0001 Hardware Technical Product Specification. Links to
specifications and other material can be found in Appendix B, “Data Sheet Reference.”
The CMM plugs into a dedicated slot in compatible systems. It provides centralized management
and alarming for up to 16 node and/or fabric slots as well as for system power supplies, fans and
power entry modules. The CMM may be paired with a backup CMM for redundant use in high-availability applications.
The CMM is a special purpose single board computer (SBC) with its own CPU, memory, PCI bus,
operating system, and peripherals. The CMM monitors and configures IPMI-based components in
the chassis. When thresholds (such as temperature and voltage) are crossed or a failure occurs, the
CMM captures these events, stores them in an event log, sends SNMP traps, and drives the Telco
alarm relays and alarm LEDs. The CMM can query FRU information (such as serial number,
model number, manufacture date, etc.), detect presence of components (such as fan tray, CPU
board, etc.), perform health monitoring of each component, control the power-up sequencing of
each device, and control power to each slot via IPMI.
Assumptions: This document assumes some basic Linux* knowledge and the ability to use Linux
text editors such as vi.
1.2 Terms Used in this Document
Table 1. Glossary (Sheet 1 of 2)

Acronym   Description
BIST      Built-In Self Test
CDM       Chassis Data Module
CLI       Command Line Interface
CMM       Chassis Management Module
DHCP      Dynamic Host Configuration Protocol
FFS       Flash File System
FIS       Flash Image System
FPGA      Field-Programmable Gate Array
FRU       Field Replaceable Unit
HS        Hot Swap
IPMI      Intelligent Platform Management Interface
IPMB      Intelligent Platform Management Bus
2.1 Red Hat* Embedded Debug and Bootstrap (RedBoot)
Upon initial power-on, the CMM enters the RedBoot firmware to bootstrap the embedded
environment. Upon execution, RedBoot acts as a TFTP server and checks for a TFTP connection from
a client. If a TFTP connection exists, RedBoot accepts a firmware update that is pushed down
from the client, checks the firmware update for data integrity, and then writes the update to flash.
Note: Firmware updates using the RedBoot TFTP method are supported for backwards compatibility.
However, updating from within the OS using the CLI is the preferred method of updating CMM
firmware. For information on the firmware update process, refer to Section 23, “Updating CMM
Software” on page 204.
Under normal circumstances, RedBoot runs through its standard diagnostics and memory setup,
decompresses the OS kernel, and boots into that kernel.
2.2 Operating System
The CMM runs a customized version of embedded BlueCat* Linux* 4.0 on an Intel® 80321
processor with Intel XScale® technology. Development support for BlueCat Linux is available on
the web at http://www.lynuxworks.com.
2.3 Command Line Interface (CLI)
The Command Line Interface (CLI) connects to and communicates with the intelligent
management devices of the chassis, boards, and the CMM itself. The CLI is an IPMI-based library
of commands that can be accessed directly or through a higher-level management application.
Administrators can access the CLI through Telnet, SSH, or the CMM’s serial port. Using the CLI,
users can view the current state of the system, including current sensor values, threshold settings,
recent events, and overall chassis health; access and modify shelf and CMM configurations; set fan
speeds; and perform actions on a FRU. The CLI is covered in Section 8,
“The Command Line Interface (CLI)” on page 71.
2.4 SNMP/UDP
The chassis management module supports both queries and traps on SNMP (Simple Network
Management Protocol) v1 or v3. The SNMP version can be configured through the CLI; the
default is SNMP v1. A MIB for the entire platform is included with the CMM. The CMM
can send out SNMP traps to up to five trap receivers.
Along with SNMP traps, the CMM sends UDP (User Datagram Protocol) alerts to port 10000. The
content of these UDP alerts is the same as the SNMP traps. SNMP is covered in Section 17.
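As a sketch of how a management station might capture these alerts, the following Python snippet binds a UDP socket to port 10000 (the alert port named above) and collects incoming datagrams. The function name and structure are illustrative assumptions, not part of the CMM software:

```python
import socket

def receive_cmm_alerts(port=10000, max_alerts=1, timeout=None):
    """Collect UDP alert datagrams sent by the CMM (illustrative sketch).

    Each alert's text mirrors the corresponding SNMP trap content.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))            # accept alerts on all interfaces
    if timeout is not None:
        sock.settimeout(timeout)
    alerts = []
    try:
        for _ in range(max_alerts):
            data, sender = sock.recvfrom(4096)
            alerts.append((sender[0], data.decode(errors="replace")))
    finally:
        sock.close()
    return alerts
```

In practice a receiver like this would run alongside, not instead of, the configured SNMP trap receivers.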
2.5 Remote Procedure Calls (RPC)
In addition to the console command-line interface, the CMM can be administered by custom
remote applications via remote procedure calls (RPC). RPC is covered in Section 19, “Remote
Procedure Calls (RPC)” on page 174.
2.6 RMCP
RMCP (Remote Management Control Protocol) is a protocol that defines a method for sending IPMI
packets over the LAN. The RMCP server on the CMM can decode RMCP packets and forward the
IPMI messages to the appropriate channels, including SBC blades, PEMs, and fan trays, or to a local
destination within the CMM. When a response IPMI message from an SBC blade, PEM, or fan tray
is destined for an RMCP client, the RMCP server formats the IPMI message into an RMCP message
and sends it back to the originator through the designated LAN interface. RMCP is
covered in Section 20, “RMCP” on page 190.
2.7 Ethernet Interfaces
The CMM contains two Ethernet ports. The software can configure each of these ports to connect to
the front panel, the backplane, or the rear transition module (RTM). Information on configuring
the Ethernet interfaces is covered in Section 8.3.1, “Setting IP Address Properties” on page 72.
Software Specifications
2.8 Sensor Event Logs (SEL)
The AdvancedTCA CMM implements system event logs according to Section 3.5 of the PICMG
3.0 Specification. The SEL contained on the CMM is fully IPMI compliant.
2.8.1 CMM SEL Architecture
The MPCMM0001 uses a single flat SEL file stored locally in the /etc/cmm directory. The SEL
maintains a list of all the sensor events in the shelf. Each of the managed devices may keep its own
SEL records in local SELs, but the master copy for the shelf is maintained by the CMM.
The SEL is limited to 65536 bytes. To keep the SEL from filling up, which could cause loss
of error logging, the CMM checks the SEL every 15 minutes; if the size of the cmm_sel file
is greater than 40000 bytes, the SEL is archived in gzip format and saved in /home/log/SEL. The
saved logs are named cmm_sel.0.gz, cmm_sel.1.gz, and so on, up to a maximum of 16 logs,
after which they are rolled over.
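The archiving behavior described above can be sketched in Python as follows. The paths, the 40000-byte threshold, and the 16-archive limit come from the text; the exact roll-over policy and the truncation of the live SEL after archiving are assumptions for illustration:

```python
import gzip
import os
import shutil

SEL_PATH = "/etc/cmm/cmm_sel"    # live SEL path, per the text above
ARCHIVE_DIR = "/home/log/SEL"    # archive directory, per the text above
SIZE_LIMIT = 40000               # archive when the SEL exceeds this size
MAX_ARCHIVES = 16                # cmm_sel.0.gz .. cmm_sel.15.gz, then roll over

def archive_sel(sel_path=SEL_PATH, archive_dir=ARCHIVE_DIR):
    """Archive the SEL if it exceeds SIZE_LIMIT; return the archive path."""
    if os.path.getsize(sel_path) <= SIZE_LIMIT:
        return None
    # Pick the next slot; wrap to 0 once all 16 slots are in use (assumed policy).
    used = [n for n in range(MAX_ARCHIVES)
            if os.path.exists(os.path.join(archive_dir, "cmm_sel.%d.gz" % n))]
    slot = len(used) % MAX_ARCHIVES
    target = os.path.join(archive_dir, "cmm_sel.%d.gz" % slot)
    with open(sel_path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    open(sel_path, "wb").close()   # truncate the live SEL after archiving (assumed)
    return target
```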
Note: Archived files should NEVER be decompressed on the CMM as the resulting prolonged flash file
writing could disrupt normal CMM operation and behavior. Using FTP, transfer the files to a
different system before decompressing the archive using utilities such as gzip.
2.8.2 Retrieving a SEL
To retrieve a SEL from the CMM, issue the following command:
Where location is one of {cmm, blade[1-14], fantray1, PEM[1-2]}. Even though the CMM uses a
single flat SEL for system events, the ‘cmmget’ command will filter the SEL and return only
events associated with the provided location. Also, some individual FRUs (e.g., blades) may keep
their own local SELs.
2.8.3 Clearing the SEL
The following command clears the SEL on both the active and the standby CMM:
cmmset -d clearsel -v clear
Note: Since the CMM uses a single flat SEL for system events, this command clears the entire shelf SEL,
not just a filtered subset.
2.8.4 Retrieving the Raw SEL
To retrieve the SEL in its raw format from a location, issue the following command:
cmmget -l [location] -d rawsel
2.9 Blade OverTemp Shutdown Script
The CMM software includes predefined script settings specifically for the MPCBL0001 board,
which will automatically shut down a board when the “baseboard temp” sensor on that board
crosses the upper critical threshold. This is done to prevent a runaway thermal event on the board
from occurring. If this functionality is needed when using boards other than the MPCBL0001, the
user will need to associate the name of the thermal sensor and the threshold with the board
shutdown script:
Please refer to Section 18, “CMM Scripting” on page 164 for more information on associating a
script with an event.
When using the CMM with boards other than the MPCBL0001, as long as no sensor named
"baseboard temp" is associated with the particular board being used, these settings can safely be
left intact. If needed, deactivate these settings for each physical slot with the
command:
where bladeN is the blade, corresponding to the physical slot number, on which to remove the
automatic shutdown setting (blade[1-16]). Please refer to Section 18, “CMM Scripting” on
page 164 for more information on removing script actions.
The CMM supports redundant operation with automatic failover in a chassis using redundant
CMM slots. In systems where two CMMs are present, one acts as the active shelf manager and the
other as standby. Both CMMs monitor each other, and either can trigger a failover if necessary.
Data from the active CMM is synchronized to the standby CMM whenever any changes occur.
Data on the standby CMM is overwritten. A full synchronization between active and standby
CMMs occurs on initial power up, or any insertion of a new CMM.
The active CMM is responsible for shelf FRU information management when CMMs are in
redundant mode.
3.2 Synchronization
To ensure critical files on the standby CMM match the data on the active CMM, the active CMM
synchronizes its data with the standby CMM, overwriting any existing data on the standby CMM.
An exception to this is the password reset procedure, detailed in Section 9, “Resetting the
Password” on page 99. When the password reset switch is activated on the standby CMM, the
password will be synchronized to the active CMM.
The CMMs initially perform a full synchronization of data from the active to the standby CMM just
after booting. Insertion of a new CMM also causes a full synchronization from the active to the
newly inserted standby. Date and time are synched every hour. Partial synchronization also
occurs any time files are modified or touched via the Linux* “touch” command, with the exception
of all *.sif and *.bin files in the /etc/cmm directory.
The *.sif (all SIF files) and *.bin (SDR files) files under /etc/cmm are synchronized only once,
when the CMMs establish communication. A ‘touch’ on those files at any later time will not
trigger a sync operation. Also, any updates to these files always happen as part of software
updates and not in isolation.
Note: During synchronization, the health event LEDs on the standby CMM may blink on and off as the
health events that were logged in the SEL are synchronized.
Below is a list of items that are synchronized between CMMs. During a full synchronization, all of
these files and data are synchronized. A change to any of these files results in that file being
synched. The active CMM overwrites these files on the standby CMM.
There are two "levels" of files that get synchronized. To manage the chassis normally, the
priority 1 files must be synchronized after power-up or installation of a brand-new CMM into the
chassis. It is absolutely necessary that a standby CMM has the priority 1 files synched before a
successful failover can occur. When a brand-new CMM boots the first time as a standby, if a CMM
failover is forced before all priority 1 data items are synchronized to the standby CMM, the standby
CMM can still become the active CMM but may not be able to properly manage the FRUs in the
chassis.
Table 2. CMM Synchronization (Sheet 1 of 2)

File(s) or Data                              Description                                   Path      Priority
date and time                                Date and time                                 IPMB      1
IP Address Settings                          IP Address Settings                           Ethernet  1
/etc/cmm.cfg                                 CMM’s main configuration file                 Ethernet  1
/etc/cmm/cmm_sel                             System SEL                                    Ethernet  1
/etc/cmm/sensors.ini                         Sensor Set Values                             Ethernet  1
Ekey Controller Structures                   Ekey Controller Structures                    Ethernet  1
Bused EKey Token info                        Bused EKey Token info                         Ethernet  1
IPMB User States                             IPMB User States                              Ethernet  1
Fan States                                   Fan States                                    Ethernet  1
Cooling State                                Cooling State Information                     Ethernet  1
User LED States                              User LED States                               Ethernet  1
SDR structures and SIPI Controller Info      SDR structures and SIPI Controller Info       Ethernet  1
PHM FRU state, Power Usage and Power Info    PHM FRU state, Power Usage and Power Info     Ethernet  1
FIM FRU Cache (Local and Temp)               FIM FRU Cache (Local and Temp)                Ethernet  1
SEL Time                                     SEL Time                                      IPMB      1
SEL Events                                   Individual SEL Events                         IPMB      1
/etc/cmm/fantray.cfg                         Fantray settings needed by cooling manager    Ethernet  1
Recovery action and escalation action for all monitored processes except monitor process   Ethernet  2
Recovery action and escalation action for monitor process                                  Ethernet  2
Note: The /.rhosts file is used for synchronization and should NEVER be modified.
3.3 Heterogeneous Synchronization
Beginning with version 5.2 firmware, the CMM can synchronize data between differing CMM
versions. The firmware decouples synchronization from firmware versioning, allowing
seamless synchronization between all CMM versions. A form of internal data versioning
maintained by the CMM makes this possible.
Note: SDR/SIF and user scripts differ slightly in synchronization architecture, as described below.
3.3.1 SDR/SIF Synchronization
Sensor Data Records (SDRs) and Sensor Information Files (SIFs) will be synchronized only
between CMMs having the same version for this data item (even if the CMM firmware versions
differ).
3.3.2 User Scripts Synchronization and Configuration
By default, user scripts are synchronized only between CMMs with the same firmware version. Users
can control user script synchronization irrespective of CMM version differences by modifying
the value of the "SyncUserScripts" configuration flag in the CMM configuration file, /etc/cmm.cfg.
The configuration flag can be modified using the cmmget/cmmset commands. This
flag can be read or set through any of the CMM interfaces (CLI, SNMP, and RPC).
Only when CMM firmware versions differ does the value of this flag determine whether user scripts
are synchronized. Between identical firmware versions, the user scripts directory continues to be
synchronized and this flag is ignored.
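The version-dependent decision described above, together with the flag values defined in the next section (upgrade, downgrade, always), can be sketched as follows. The function and parameter names are illustrative, and representing firmware versions as comparable tuples is an assumption:

```python
def should_sync_user_scripts(local_version, peer_version, flag):
    """Decide whether to sync user scripts from the peer CMM (sketch).

    Versions are comparable tuples, e.g. (5, 2). The flag is consulted
    only when firmware versions differ; equal versions always sync.
    """
    if local_version == peer_version:
        return True                       # same firmware: flag is ignored
    if flag == "always":
        return True                       # sync irrespective of versions
    if flag == "upgrade":
        return peer_version > local_version   # only from a newer peer
    if flag == "downgrade":
        return peer_version < local_version   # only from an older peer
    return False
```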
3.3.2.1 Setting User Scripts Sync Configuration Flag
To set the value of the Scripts Synchronization configuration flag, the following CMM command is
used:
upgrade: Synchronizes user scripts only when the other CMM has a newer firmware version.
downgrade: Synchronizes user scripts only when the other CMM has an older firmware version.
always: Synchronizes user scripts irrespective of version differences.
3.3.2.2 Retrieving User Scripts Sync Configuration Flag
To retrieve the value of the Scripts Synchronization configuration flag, the following CMM
command is used:
cmmget -l cmm -d syncuserscripts
The value returned will be one of: Equal, Upgrade, Downgrade, Always, or Error on failure.
3.3.3 Synchronization Requirements
For synchronization to occur:
• The CMMs must be able to communicate with each other over their dedicated IPMB. The
CMMs use a heartbeat via their dedicated IPMB to determine if they can communicate with
each other over IPMB.
• An Ethernet connection must exist between the two CMMs. The CMMs must be able to ping
each other via Ethernet for synchronization to be successful. This can be a connection through
the Ethernet switches in the chassis (which requires both switches to be present), through an
external Ethernet switch connected to the front ports of the CMM pair, or through a crossover
cable connecting the two front ports of the CMM pair. If synchronization fails on eth1, it is
attempted on eth0. If the CMMs cannot successfully ping each other via eth0 or eth1,
synchronization between the CMMs cannot occur.
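The eth1-then-eth0 fallback can be sketched as below. The function name is illustrative, and the default helper assumes the Linux ping utility's -c (count) and -I (interface) options; a test hook is provided so the logic can be exercised without real interfaces:

```python
def reachable_interface(peer_ip, ping=None):
    """Return the first interface over which the peer answers, trying
    eth1 then eth0 as described above; None if neither works (sketch)."""
    if ping is None:
        import subprocess
        def ping(iface, ip):
            # Single ping bound to the given interface (Linux ping syntax).
            return subprocess.call(
                ["ping", "-c", "1", "-I", iface, ip],
                stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0
    for iface in ("eth1", "eth0"):
        if ping(iface, peer_ip):
            return iface
    return None
```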
A failure of any priority 1 synchronization will result in a health event being logged in the CMM
SEL and will inhibit a failover from occurring.
3.4 Initial Data Synchronization
It is absolutely necessary that a standby CMM has the priority 1 files synched before a successful
failover can occur. A standby CMM can still become active if all priority 1 synchronization has
not been completed, but it may not be able to properly manage all the FRUs in the chassis.
The CMM implements the “Datasync Status” sensor to determine the state of synchronization and
whether synchronization has completed successfully.
3.4.1 Initial Data Sync Failure
If the CMM encounters any failure during data synchronization, it marks the data synchronization
failure, logs a SEL event, and sends an SNMP trap. Duplicate failures are not reported multiple
times. As soon as the CMM is out of the failure condition, it resets the data synchronization failure state.
The CMM will continue trying to synchronize as long as there are two CMMs present in the
chassis and they are able to communicate via their cross-connected IPMB.
3.5 Datasync Status Sensor
A sensor named “Datasync Status” exists to make the Datasync state information available
to the user. This sensor tracks the status of the Datasync module and makes its status available
through the various CMM interfaces. The sensor is used to query the data synchronization states
and to log a SEL event when initial synchronization completes. It is a discrete OEM sensor with
status bits representing the state of different parts of the Datasync module.
Note: The Datasync Status sensor can only be queried through the active CMM.
3.5.1 Sensor Bitmap
When the Datasync starts the first time through in a dual CMM system and whenever the CMM
changes between Active and Standby, the status bits are all cleared to 0x0000.
• Bit 0 (Running) is set when the datasync module is active.
• Bit 1 (P1Done) is set when the priority 1 data syncs are done, and cleared when priority 1 data
needs to be synced.
• Bit 2 (P2Done) is set when the priority 2 data syncs are done, and cleared when a priority 2
data needs to be synced.
• Bit 3 (InitSyncDone) is set when both priority 1 and priority 2 data syncs are done, and stays
set (latches) until the CMM changes between Active and Standby, or loses contact with the
partner CMM.
• Bit 4 (SyncError) is set if an error was detected, and cleared when no data items have errors.
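The bitmap above can be decoded as in the following sketch. The flag names mirror the bit labels in the list; the function itself is illustrative:

```python
# Bit positions and names of the Datasync Status discrete sensor, per the list above.
DATASYNC_BITS = {
    0: "Running",        # datasync module is active
    1: "P1Done",         # priority 1 data syncs are done
    2: "P2Done",         # priority 2 data syncs are done
    3: "InitSyncDone",   # latched: both priorities complete
    4: "SyncError",      # an error was detected
}

def decode_datasync_status(raw):
    """Return the set of flag names asserted in a raw sensor reading."""
    return {name for bit, name in DATASYNC_BITS.items() if raw & (1 << bit)}
```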
Redundancy, Synchronization, and Failover
3.5.2 Event IDs
The “Datasync Status” sensor uses event IDs 0x420 to 0x42f. The following new event IDs are
used to log various events for these requirements. These event IDs can be used to associate
scripts with the respective events.

Event                                     Event ID
Initial Data Synchronization complete     0x420 (1056)
3.5.3 Querying the Datasync Status
The status of the Datasync Status sensor can be queried using the following CLI command:
• When initial data synchronization is complete, the following SNMP trap is generated:
[Month] [Date] [Time] [hostname] snmptrapd[xxxxx]: [IP Address]:
Enterprise Specific Trap (25) Uptime: [Time], SNMPv2-SMI::enterprises.343.2.14.1.5 = STRING: "Time : [Day] [Month] [Date] [Time] [Year], Location : [location] , Chassis Serial # : [xxxxxxxx],
Board : CMM[x] , Sensor : CMM[x]:Datasync Status , Event : Initial Data
Synchronization is complete. Asserted "
3.5.6 System Health
The “Datasync Status” sensor does not contribute to the system health. However, sync failures are
captured by the “File Sync Failure” sensor, which does contribute to the system health.
3.6 CMM Failover
Once information is synchronized between the redundant CMMs, the active CMM constantly
monitors its own health as well as the health of the standby CMM. In the event of one of the
scenarios listed in the sections that follow, the active CMM automatically fails over to the
standby CMM so that no management functionality is lost at any time.
3.6.1 Scenarios That Prevent Failover
The following are reasons a failover can NOT occur:
• The active CMM can NOT communicate with the standby CMM via their IPMB bus.
• Not all priority 1 data has been completely synchronized between the CMMs.
To determine the active CMM at any time, use the CLI command:
cmmget -l cmm –d redundancy
This command will output a list stating if both CMMs are present, which one is the active CMM,
and which CMM you are logged in to. CMM1 is the CMM on the left when looking from the front
of the chassis, and CMM2 is on the right.
3.6.2 Scenarios That Failover to a Healthier Standby CMM
The scenarios listed below can only cause a failover if the standby CMM is in a healthier state than
the active CMM. The health of a CMM is determined by computing a CMM health score, equal to
the sum of the weights of the three conditions listed below that are currently active. A health score
is computed for each CMM whenever any of these conditions occurs on the active CMM. Each
condition has a default weight of 1, giving all conditions equal importance in causing a failover.
To determine if a failover is necessary when one of these conditions occurs, the active CMM
computes its CMM health score, and requests the health score of the standby CMM. If the score of
the standby CMM is LESS than the score of the active CMM, a failover will occur. If a failover
does not occur, the CMM SEL will contain an entry indicating the reason failover did not occur.
1. The active CMM will fail over to the standby CMM if the active CMM cannot ping its first
SNMP trap address (SNMPTrapAddress1) over any of the available Ethernet ports, but the
standby CMM can. The trap address is set using the command:
cmmset –l cmm –d snmptrapaddress1 –v [ip address]
Only a ping failure of the first SNMP trap address (SNMPTrapAddress1) can cause a failover.
SNMPTrapAddress2 through SNMPTrapAddress5 are not subject to this ping test.
Note: The frequency of the ping to the first trap address can vary from one second to approximately 20
seconds.
2. Critical events on the active CMM:
The active CMM has critical events for any of the CMM sensors (not critical chassis or blade
events) and the standby CMM does not. If both CMMs have critical CMM events, the
number of major and minor CMM events is examined to decide whether a failover should occur:
the number of major events is compared first, and if they are equal, the number of minor events is used.
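The health comparison described in this section can be sketched as follows. The dictionary layout and function name are illustrative assumptions; the score and event counts stand in for the values the CMMs exchange:

```python
def failover_needed(active, standby):
    """Sketch of the failover decision described above (names illustrative).

    Each CMM is summarized as a dict with a 'score' (sum of the weights of
    its active conditions) and counts of 'critical', 'major', and 'minor'
    CMM events.
    """
    # Failover when the standby's health score is lower (i.e., healthier).
    if standby["score"] < active["score"]:
        return True
    # Critical CMM events on the active CMM only.
    if active["critical"] and not standby["critical"]:
        return True
    # Both CMMs critical: compare major event counts, then minor.
    if active["critical"] and standby["critical"]:
        if active["major"] != standby["major"]:
            return active["major"] > standby["major"]
        return active["minor"] > standby["minor"]
    return False
```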
3.6.3 Manual Failover
The following command can be issued to the active CMM to manually cause a failover to the
standby CMM:
cmmset -l cmm -d failover -v [1/any]
Where:
1: Fails over only to a CMM with the same or newer version of firmware.
any: Fails over to a CMM with any version of firmware.
A manual failover can only be initiated on the active CMM. A failover will only occur if the
standby CMM is at least as healthy as the active CMM. Once the command executes, the former
standby CMM immediately becomes the active CMM.
If the failover could not occur, the CLI will indicate the reason why the failover could not occur,
and a SEL event will be recorded.
In addition, opening the ejector latch on the active CMM will initiate a failover, but only if the
standby is at least as healthy as the active.
3.6.4 Scenarios That Force a Failover
The following scenarios cause a failover as long as the standby CMM is operational, even when it
is less healthy than the active:
• The active CMM is pulled out of the chassis.
• The active CMM’s healthy signal is de-asserted.
• A “reboot” command is issued to the active CMM.
• The front panel alarm quiet switch button on the active CMM is pushed for more than five
seconds. If the button continues to be pressed for more than 10 seconds, the CMM does not
reset.
3.7 CMM Ready Event
The CMM Ready Event is a notification mechanism that informs the user when all CMM modules
are fully up and running. The CMM is ready to process any request after receiving this event.
The CMM uses the "CMM Status" sensor when generating the CMM Not Ready event. Please
refer to Table 46, “CMM Status Event Strings (CMM Status)” on page 118 for CMM status event
strings.
Table 3. CMM Status Event Strings (CMM Status)

Event String             Event Code   Event Severity
“CMM is not ready.”      1024         Minor
“CMM is ready.”          1025         OK
“CMM is Active”          1026         OK
“CMM is Standby”         1027         OK
“CMM ready timed out”    1028         Minor
A CMM Not Ready Assertion SEL event is generated on a CMM when it transitions from standby
mode to active mode during a failover or on the active CMM on power up. The event is only
generated on the newly active CMM. The “CMM is Ready” event is generated after all CMM
modules (board wrapper processes) are up and running and the SNMP daemon is active.
The CMM provides a Built-In Self Test (BIST), which runs automatically after power-up.
This test detects flash corruption as well as other critical hardware failures.
Results of the BIST are displayed on the console through the serial port during boot time. Results
of BIST are also available through the CLI if the OS successfully boots. If the BIST detects a fatal
error, the CMM is not allowed to function as an active CMM.
4.1 BIST Test Flow
The following state diagram shows the order of the tests RedBoot runs following a power-up or
front-panel reset. In every state before reaching active CMM, if there is an error, RedBoot logs
the error event into the EEPROM, routes the error message to the serial port, and continues booting.
If execution hangs before the OS loads due to the nature of the error, the CMM hangs. If the OS
successfully boots, it alerts users to any errors that occurred during boot.
The BIST has been broken down into stages consisting of groups of tests that run at certain times
throughout the boot process. The following table shows the different BIST stages and the tests
associated with each stage:
Table 4. BIST Implementation

Boot-BIST                 Early-BIST    Mid-BIST             Late-BIST
RedBoot image checksum                  FPGA version check   IPMB bus test
FPGA image checksum                     DS1307 RTC test
Base memory test
4.2 Boot-BIST
The code in Boot-BIST executes at the very early stage of the RedBoot bootstrap, just before
FPGA programming and memory module initialization. Boot-BIST performs checksum checking over
the RedBoot image and the FPGA image. A checksum error is detected if there is a mismatch between
the calculated checksum and the stored checksum in the FIS directory.
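The checksum comparison can be sketched as below. The actual algorithm RedBoot uses is not specified here; a simple 32-bit additive checksum stands in for it, and the function name is illustrative:

```python
def verify_image(image_bytes, stored_checksum):
    """Illustrative checksum verification in the spirit of Boot-BIST.

    A 32-bit additive checksum stands in for RedBoot's actual algorithm,
    which this document does not specify.
    """
    calculated = sum(image_bytes) & 0xFFFFFFFF   # fold into 32 bits
    return calculated == stored_checksum          # mismatch => checksum error
```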
Boot-BIST also performs a Base Memory Test on the first 1 MByte of memory. Whenever there is
an error, BIST informs the user with a warning message through the console terminal
and logs the event to the event-log area.
4.3 Early-BIST
The Early-BIST stage extends the reset timeout period on the watchdog timer (MAX6374) by
strobing GPIO7 on FPGA1. This prevents any possible hardware reset during the BIST process.
The watchdog timer is enabled after the ADM1026 GPIO initialization and disabled once execution
reaches the RedBoot console. The OS enables the watchdog timer again and starts the strobing thread
at the kernel level.
Built-In Self Test (BIST)
4.4 Mid-BIST
This stage of BIST performs the Extended Memory Test to scan for and diagnose possible bit
errors in memory. It scans from 1 MByte to 128 MBytes. It does not test the memory below
1 MByte because a portion of RedBoot is already loaded and resident there.
The memory test includes the walking-ones test, a 32-bit address test, and a 32-bit inverse address test.
Furthermore, voltage and temperature readings are verified to lie within the hardware-tolerable
ranges. The FPGA firmware version is checked, and an alert is raised if an older version of an FPGA
image is detected. Also, the system date and time are read from the real-time clock and displayed
on the console terminal. NIC presence is also checked here, though the NIC self-test happens
later when the driver is loaded.
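The walking-ones portion of the memory test can be sketched as follows. Here `write` and `read` stand in for the raw memory accessors the firmware would use; the function itself is an illustration, not the CMM's implementation:

```python
def walking_ones_test(write, read, width=32):
    """Walking-ones pattern test over one memory word (sketch).

    A single 1 bit walks across the word; a stuck or shorted bit shows
    up as a read-back mismatch.
    """
    for bit in range(width):
        pattern = 1 << bit
        write(pattern)
        if read() != pattern:
            return False          # fault detected at this bit position
    return True
```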
4.5 Late-BIST
Late-BIST disables the watchdog timer once RedBoot is fully loaded. It then verifies the checksum
of the OS image against a stored checksum at the top of flash memory before proceeding with
boot script execution.
The following diagram shows when, during the boot cycle, the various stages of BIST are
performed.
4.6 QuickBoot
Once enabled, this feature skips all the diagnostic tests in Mid-BIST and Late-BIST. However,
the Flash Test and Base Memory Test in Boot-BIST still execute even with this feature enabled.
The default setting is QuickBoot enabled.
When the QuickBoot feature is disabled, the user can optionally enable or disable the
Extended Memory Test (in Mid-BIST) and the OS Image Checksum Test (in Late-BIST)
individually.
4.6.1 Configuring QuickBoot
RedBoot> fconfig
...
Enable QuickBoot during BIST: false
Execute extended memory test: true
OS image checksum at boot: true
...
Update RedBoot non-volatile configuration - are you sure (y/n)? y
The default for 'Enable QuickBoot during BIST' is true. When 'Enable QuickBoot during BIST' is
set to false, two additional options are displayed in the configuration menu: 'Execute extended
memory test' and 'OS image checksum at boot'. The user can selectively enable one or both tests
while QuickBoot is disabled. Neither option is shown in the configuration menu when QuickBoot
is enabled. These options take effect on the next boot.
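The decision logic implied by these configuration flags can be sketched as follows. The structure and function names here are illustrative only, not the actual RedBoot source:

```c
#include <stdbool.h>

/* Hypothetical in-memory view of the RedBoot fconfig flags shown above. */
struct bist_config {
    bool quickboot_enabled;  /* "Enable QuickBoot during BIST" */
    bool ext_memory_test;    /* "Execute extended memory test" */
    bool os_checksum_test;   /* "OS image checksum at boot" */
};

/* Decide which optional BIST stages run on the next boot.  Boot-BIST
 * (Flash Test, Base Memory Test) always runs, so it is not modeled. */
void select_bist_stages(const struct bist_config *cfg,
                        bool *run_mid_bist, bool *run_late_bist)
{
    if (cfg->quickboot_enabled) {
        /* QuickBoot skips every mid-BIST and late-BIST diagnostic. */
        *run_mid_bist  = false;
        *run_late_bist = false;
    } else {
        /* With QuickBoot disabled, the two tests are individually
         * selectable via the remaining fconfig options. */
        *run_mid_bist  = cfg->ext_memory_test;
        *run_late_bist = cfg->os_checksum_test;
    }
}
```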
4.7 Event Log Area and Event Management
Errors detected by the BIST are stored in an event log. The event-log area holds up to 269 entries
of 14 bytes each and is located in EEPROM on the CMM. The BIST places entries into the event
log until it becomes full; once full, any new entries are lost. The BIST event log is cleared by the
OS once the OS has logged the BIST errors into the SEL.
At OS start-up, the CMM reads the BIST results from the reserved event-log area and stores the
errors as entries in the CMM SEL. This allows the CMM application to take the appropriate action
based upon the SEL events resulting from the RedBoot BIST tests. If there is not enough space to
log the events in the CMM SEL, no results are logged to the CMM SEL.
The BIST event log is erased only after the event log is stored into the CMM SEL. Event strings for
BIST events are listed in Section 11, “Health Events” on page 104.
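The capacity behavior described above (fixed size, new entries lost when full, cleared after transfer to the SEL) can be modeled with a small sketch. The EEPROM area is represented here as a RAM buffer and all names are hypothetical:

```c
#include <string.h>

enum { LOG_ENTRIES = 269, ENTRY_SIZE = 14 };

/* Hypothetical RAM model of the EEPROM event-log area on the CMM. */
struct event_log {
    unsigned char entry[LOG_ENTRIES][ENTRY_SIZE];
    int count;                      /* number of valid entries */
};

/* Append one 14-byte BIST entry.  Returns 0 on success, or -1 when
 * the log is already full and the new entry is lost. */
int event_log_append(struct event_log *log, const unsigned char *e)
{
    if (log->count >= LOG_ENTRIES)
        return -1;                  /* log full: entry is dropped */
    memcpy(log->entry[log->count++], e, ENTRY_SIZE);
    return 0;
}

/* Called by the OS after the entries have been copied into the SEL. */
void event_log_clear(struct event_log *log)
{
    log->count = 0;
}
```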
4.8 OS Flash Corruption Detection and Recovery Design
The OS is responsible for flash content integrity at runtime. Flash monitoring under the OS
environment can be divided into two parts: monitoring static images and monitoring dynamic
images.
Static images refer to the RedBoot image, FPGA image and BlueCat image in flash. These images
should not change throughout the lifetime of the CMM unless they are purposely updated or
corrupted. The checksum for these files is written into flash when the images are uploaded.
Dynamic image refers to the OS Flash File System (JFFS2). This image dynamically changes
throughout the runtime of the OS.
4.8.1 Monitoring the Static Images
A static test is run every 24 hours during CMM operation. The static test reads each static image
(RedBoot, FPGA, BlueCat), calculates the image checksum, and compares with the checksum in
the RedBoot configuration area (FIS). If the checksum test fails, the error is logged to the CMM
SEL.
4.8.2 Monitoring the Dynamic Images
For monitoring the dynamic images, the CMM leverages the corruption detection ability of the
JFFS(2) flash file system. At OS start-up, the CMM executes an initialization script to mount the
JFFS(2) flash partitions (/etc and /home). If a flash corruption is detected, an event is logged to the
CMM SEL.
During normal OS operation, flash corruption during file access can also be detected by the
JFFS(2) and/or the flash driver. If a flash corruption is detected, an event is logged to the CMM
SEL.
4.8.3 CMM Failover
If during normal OS operation a critical error occurs on the active CMM, such as a flash
corruption, the standby CMM is checked to see if it is in a healthier state. If the standby CMM is in
a healthier state, then a failover will occur. See Section 3, “Redundancy, Synchronization, and
Failover” on page 21.
4.9 BIST Test Descriptions
4.9.1 Flash Checksum Test
This test verifies that the RedBoot image and the FPGA image are not corrupted. It calculates the
CRC32 checksum of the RedBoot image and compares it with the image checksum stored in the
FIS directory. If the checksums do not match, BIST switches to the backup image. If a checksum
mismatch is found in the FPGA image, BIST loads the backup image to program the FPGA
device.
4.9.2 Base Memory Test
This test writes the data pattern 55AA55AAh into every 4 bytes of the memory below 1 MByte.
Its objective is to verify the wire connectivity of the address and data pins between the memory
module and the processor. The test first writes the data pattern into the complete first 1 MByte,
then verifies the written pattern by reading it back from the memory module. If any read-back
value mismatches, the test logs the error event into the event-log area and routes the error message
to the serial port.
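The fill-then-verify procedure can be sketched as follows. This is an illustrative model run over a caller-supplied buffer, not the actual BIST code (which targets the first 1 MByte of physical RAM):

```c
#include <stdint.h>
#include <stddef.h>

#define BASE_TEST_PATTERN 0x55AA55AAu

/* Write the pattern into every 32-bit word of the region, then read it
 * back.  Returns the byte offset of the first mismatch, or -1 if the
 * whole region passes. */
long base_memory_test(volatile uint32_t *base, size_t nwords)
{
    size_t i;

    for (i = 0; i < nwords; i++)        /* fill pass */
        base[i] = BASE_TEST_PATTERN;
    for (i = 0; i < nwords; i++)        /* verify pass */
        if (base[i] != BASE_TEST_PATTERN)
            return (long)(i * sizeof(uint32_t));
    return -1;
}
```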
4.9.3 Extended Memory Tests
Walking Ones Test
This test verifies the data bus wiring by testing the bus one bit at a time. The data bus passes the
test if each data bit can be set to 0 and 1 independently of the other data bits.
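A common implementation of this data-bus check walks a single 1 bit across one memory location. The following is a generic sketch of the technique, not the CMM's actual code:

```c
#include <stdint.h>

/* Walk a single 1 bit across the 32-bit data bus at one location.
 * Returns 0 if every bit can be set independently, or the first
 * failing pattern (which identifies the suspect data line). */
uint32_t walking_ones_test(volatile uint32_t *addr)
{
    uint32_t pattern;

    for (pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;   /* a data line is stuck or shorted */
    }
    return 0;
}
```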
32-Bit Address Test
This test verifies the address bus wiring. The smallest set of addresses that covers all possible
combinations is the set of power-of-two addresses. These addresses are analogous to the set of
data values used in the walking-ones test; the corresponding memory locations are 0001h, 0002h,
0004h, 0008h, 0010h, 0020h, and so on. In addition, address 0000h must also be tested. To
confirm that no two memory locations overlap, an initial data value is first written at each
power-of-two offset within the device. Then a new value, an inverted copy of the initial value, is
written to the first test offset, and it is verified that the initial data value is still stored at every
other power-of-two offset. If a location other than the one just written is found to contain the new
data value, there is a problem with the current address bit. If no overlap is found, the procedure is
repeated for each of the remaining offsets.
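The procedure above can be sketched as follows. This is a generic illustration of the power-of-two technique (assuming the region size is a power of two), not the actual BIST implementation:

```c
#include <stdint.h>
#include <stddef.h>

/* Power-of-two address test over nwords 32-bit words (nwords must be
 * a power of two).  Seed offset 0 and every power-of-two offset with
 * an initial value; then, for each test offset, write the inverted
 * value and confirm that no other seeded location changed.  Returns
 * the overlapping offset on failure, or -1 on pass. */
long address_test(volatile uint32_t *base, size_t nwords)
{
    const uint32_t initial  = 0xAAAAAAAAu;
    const uint32_t inverted = ~initial;     /* 0x55555555 */
    size_t offset, check;

    base[0] = initial;
    for (offset = 1; offset < nwords; offset <<= 1)
        base[offset] = initial;

    for (offset = 0; offset < nwords;
         offset = (offset == 0) ? 1 : offset << 1) {
        base[offset] = inverted;
        /* Offset 0 is checked separately since it is not a power of two. */
        if (base[0] != ((offset == 0) ? inverted : initial))
            return 0;
        for (check = 1; check < nwords; check <<= 1)
            if (check != offset && base[check] != initial)
                return (long)check;  /* two locations overlap */
        base[offset] = initial;      /* restore before the next round */
    }
    return -1;
}
```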
32-Bit Inverse Address Test
This test behaves similarly to the address test described above, except that the addresses are tested
in the inverse direction. This helps identify a broader scope of possible addressing errors inherent
in the memory modules.
4.9.4 FPGA Version Check
This test verifies that the correct FPGA image is programmed into both FPGA chips. It displays
the FPGA version of both FPGAs; the two versions should be the same. If the programmed
version is older than expected, an event is logged to the SEL.
4.9.5 DS1307 RTC (Real-Time Clock) Test
This test verifies the functionality of the DS1307 RTC chip. It displays the date/time settings from
the RTC and validates the readings; if any reading is not in BCD format, an event is logged to the
SEL. The test also captures the current time, sleeps briefly, and compares the previously captured
time with the new time. If they differ, the RTC is working; if not, an event is logged to the SEL.
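The BCD validation step can be sketched as follows. The DS1307's time and date registers hold packed BCD, so any nibble above 9 indicates a bad read or an uninitialized clock; the helper names here are illustrative:

```c
#include <stdint.h>

/* Return 1 if both nibbles of a packed-BCD byte are valid decimal
 * digits.  Note: some DS1307 registers carry control bits in the high
 * nibble (e.g., the 12/24-hour flag), which a real test would mask
 * off before this check. */
int is_valid_bcd(uint8_t v)
{
    return ((v & 0x0F) <= 9) && (((v >> 4) & 0x0F) <= 9);
}

/* Convert a packed-BCD byte to binary (valid input assumed). */
unsigned bcd_to_bin(uint8_t v)
{
    return (unsigned)((v >> 4) & 0x0F) * 10u + (v & 0x0F);
}
```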
4.9.6 NIC Presence/Local PCI Bus Test
This test generates PCI bus transactions by scanning the PCI buses available on the board. It
detects the two Ethernet devices and verifies that each device has a valid Vendor ID and Device
ID in its PCI configuration space. The NIC internal self-test is not performed here; it is executed
when the Ethernet driver is loaded.
4.9.7 OS Image Checksum Test
This test verifies that the OS image stored in flash is not corrupted. It calculates the CRC32
checksum of the OS image and compares it with the image checksum stored in the FIS directory.
If the checksums do not match, BIST logs an error event to the SEL.
4.9.8 CRC32 Checksum
CRC32 is the 32-bit version of the Cyclic Redundancy Check technique, which is designed to
verify the validity and integrity of the bits within the data. It first generates the diffusion table,
which consists of 256 double-word entries; each entry is known as a unique diffusion code. The
checksum calculation starts by fetching the first byte in the data buffer and exclusive-ORing it
with the temporary checksum value. The result is ANDed with 0xFF to restrict the index to the
range 0 to 255 (decimal). That index is used to fetch a diffusion code from the table. Next, the
newly fetched diffusion code is exclusive-ORed with the most significant 24 bits of the temporary
checksum value (that is, the checksum value shifted right by 8 bits). The result becomes the new
temporary checksum value. The process repeats through the last byte in the data buffer, and the
final temporary checksum value becomes the final checksum.
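The algorithm described above corresponds to the standard table-driven (reflected) CRC32. The sketch below assumes the standard 0xEDB88320 polynomial and the conventional 0xFFFFFFFF initial value and final inversion, which the text does not spell out:

```c
#include <stdint.h>
#include <stddef.h>

static uint32_t crc_table[256];     /* the "diffusion table" */

/* Build the 256-entry table from the standard reflected polynomial. */
void crc32_init(void)
{
    uint32_t c;
    int i, bit;

    for (i = 0; i < 256; i++) {
        c = (uint32_t)i;
        for (bit = 0; bit < 8; bit++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[i] = c;
    }
}

uint32_t crc32(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;     /* temporary checksum value */

    while (len--) {
        /* XOR the next byte in, mask to an index 0..255, then combine
         * the table entry with the upper 24 bits of the running value. */
        crc = crc_table[(crc ^ *p++) & 0xFFu] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFFu;       /* final inversion */
}
```

With these parameters, the checksum of the ASCII string "123456789" is the well-known check value 0xCBF43926.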
The Chassis Management Module can re-enumerate the devices in the chassis in the event that the
chassis loses and then regains CMM management. This allows the CMM, at startup, to query
information on all devices in the chassis when no active CMM in that chassis already holds that
information from which it could be received via a regular synchronization. This is achieved
without having to restart the individual blades already present in the chassis.
Re-enumeration provides a way to recover from situations such as double failures, where both
CMMs have failed or been accidentally removed from the chassis. To identify the contents of the
chassis, the CMM first determines whether it should perform this function. The Standby CMM
does not re-enumerate; it relies on the information synchronized from the Active CMM in case a
failover occurs. After startup, the Active CMM determines which Entities are present. Then, for
each of these Entities, the CMM queries it for the state and other information needed to properly
manage the Entity as well as the entire chassis. The CMM stays in the M2 state until
re-enumeration is complete.
The CMM re-enumeration process obtains the following information for each FRU in the chassis:
— Presence
— M-State
— Power Usage
— Sensor Data Records
— Health Events
— Board EKey Usage
— Bused EKey Usage
5.2 Re-enumeration on Failover
In the case of a forced failover, the newly Active CMM performs re-enumeration if the following
conditions are satisfied:
• Re-enumeration has not completed on the Active CMM.
• Active CMM has not yet synchronized the re-enumerated data over to the Standby CMM.
If the newly Active CMM has to re-enumerate, it switches to the M2 state before starting
re-enumeration. The Blue LED uses long blinks to provide a visual indication of the state of the
CMM. It is recommended that the Entities in the chassis not be activated or deactivated while
re-enumeration is in progress.
If, during re-enumeration, the CMM discovers that a FRU is requesting deactivation (State M5), it
denies the request and informs the FRU to return to the Active (M4) state if no frucontrol script is
present (refer to Section 18.5, “FRU Control Script” on page 169). Otherwise, the CMM executes
the frucontrol script and lets it handle the deactivation of the FRU.
5.4 Resolution of EKeys
During re-enumeration, the CMM determines the status of the EKeys of the Boards present in the
chassis. If there are interfaces that can be enabled with respect to the other end-point, the CMM
completes the EKeying process as per Section 14.1. If EKeys are enabled to a slot but the CMM
was unable to discover a Board in that slot, it assumes that the Board in that slot is in the M7
(Communication Lost) state.
5.5 Events Regeneration
The re-enumeration agent sends the "Set Event Receiver" command to all the Entities in the
chassis. On receiving the command, the Entities re-arm event generation for all their internal
sensors. This causes them to transmit event messages based on the current event conditions, and
these events are logged in the SEL.
Note: The regeneration of events may cause events to be logged into the SEL twice. This could result in
configured eventaction scripts running twice.
During the process of identifying the chassis contents, once the CMM determines that an Entity is
a fan tray, it automatically sets the fan speeds to the critical level. The speeds are not brought back
to the normal level until the CMM has determined that there are no thermal events in the chassis.
The Chassis Management Module monitors the general health of processes running on the CMM
and can take recovery actions upon detection of failed processes. This is handled by the Process
Monitoring Service (PMS).
Upon detecting unhealthy processes, the PMS will take a configurable recovery action. Examples
of recovery actions include restarting the process, failing over to the standby CMM, etc.
The PMS itself is also monitored to ensure that it is operating correctly. The PMS is monitored in
both a single CMM configuration and a redundant CMM configuration. When faults are detected in
the PMS, corrective actions are taken.
The PMS also provides dynamic configuration and status information through the CLI, RPC, and
SNMP interfaces. For example, users can administratively lock/disable monitoring of a process
while the PMS is running to suit their particular needs. The PMS also provides static configuration
to allow customers the ability to tune the static system parameters for the given platform. Examples
of these parameters may include monitoring interval, retries, and ramp-up times.
6.1.1 Process Existence Monitoring
Process existence monitoring utilizes the operating system's process table to determine the
existence of the process. When the CMM software is started, the PMS initializes and determines
the set of processes to monitor for process existence. The PMS periodically queries the operating
system for the existence of that set of processes. When a monitored process is found not to exist,
the PMS will generate a SEL entry and take a recovery action.
Process existence monitoring can be utilized on all permanent processes (processes which exist for
the life of the CMM software as a whole). It is particularly useful when monitoring processes that
were not specifically developed for running on the CMM. Applications that are provided by the
operating system vendor are examples of these types of processes. For the Linux* operating
system, processes like syslogd and crond would be good examples.
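One simple way to query the operating system for process existence on Linux is signal 0, which performs existence and permission checks without delivering anything. Whether the PMS uses this mechanism or walks the process table directly is not specified, so treat this as an illustrative sketch:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if the given PID currently exists, 0 otherwise. */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;             /* process exists and is signalable */
    /* EPERM means the process exists but belongs to another user;
     * ESRCH means no such process. */
    return errno == EPERM;
}
```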
6.1.2 Thread Watchdog Monitoring
Thread watchdog monitoring requires that the monitored process notify the PMS of its continued
operation. This notification allows the PMS to monitor the process both for existence and for
conditions where the process locks up. Each thread requiring monitoring within a process using
the thread watchdog registers with the PMS. The PMS loops through its list of registered threads
and determines whether each registered thread is operating. When any thread is determined to be
unresponsive (i.e., not notifying the PMS of its continued operation), the PMS generates a SEL
entry and takes a recovery action.
Thread watchdog monitoring can be used on all processes that are instrumented with the PMS
thread watchdog API. It provides more functionality than process existence monitoring and can be
used in conjunction with process integrity monitoring to provide a comprehensive solution.
Thread watchdog monitoring is relatively lightweight and can be done every second, although the
process being monitored may dictate a (much) lower frequency depending on how often it is
capable of feeding the watchdog.
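The register/feed/sweep cycle described above can be sketched as follows. The data structure and function names are hypothetical, not the actual PMS thread watchdog API:

```c
#include <time.h>

#define MAX_WD_THREADS 16

/* Hypothetical registry: each monitored thread periodically "feeds"
 * its slot, and the PMS sweep flags any slot that has gone quiet. */
struct wd_slot {
    int in_use;
    time_t last_feed;
    time_t timeout;            /* seconds of silence tolerated */
};

static struct wd_slot wd[MAX_WD_THREADS];

/* Register a thread; returns a slot id, or -1 when the table is full. */
int wd_register(time_t timeout)
{
    for (int i = 0; i < MAX_WD_THREADS; i++) {
        if (!wd[i].in_use) {
            wd[i].in_use = 1;
            wd[i].timeout = timeout;
            wd[i].last_feed = time(NULL);
            return i;
        }
    }
    return -1;
}

/* Called by the monitored thread to signal continued operation. */
void wd_feed(int id)
{
    wd[id].last_feed = time(NULL);
}

/* PMS sweep: returns the id of the first unresponsive thread, or -1
 * when every registered thread has fed its watchdog in time. */
int wd_check(void)
{
    time_t now = time(NULL);

    for (int i = 0; i < MAX_WD_THREADS; i++)
        if (wd[i].in_use && now - wd[i].last_feed > wd[i].timeout)
            return i;
    return -1;
}
```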
6.1.3 Process Integrity Monitoring
A Process Integrity Executable (PIE) is responsible for determining the health of a process or
processes. When a PIE finds an unhealthy process, it notifies the PMS of the errant process so that
the PMS can take the appropriate action. An example of a PIE would be one that monitors the
Simple Network Management Protocol (SNMP) process: the PIE could use SNMP get operations
to query the SNMP process, and if the SNMP process cannot respond to the queries with the
appropriate information, the process is considered unhealthy and the PIE notifies the PMS.
Process integrity monitoring may be used in conjunction with existence monitoring to provide a
comprehensive solution.
6.2 Processes Monitored
Below is a list of processes that are monitored for Process Existence on the CMM by the Process
Monitoring Service.
Table 5. Processes Monitored
Process Monitored | Process Command Line / Process Name | Target Name | Monitoring Level
Pms Monitor | ./PmsMonitor | PmsProc3 | Existence and TWL
Pms Shadow | ./PmsMonitor shadow | PmsProc2 | Existence and TWL
CMM Wrapper Process | ./WrapperProcess 23 | PmsProc23 | Existence and Integrity
CMM Wrapper Process | ./WrapperProcess 255 | PmsProc50 | Existence and Integrity
SNMP Daemon | /usr/sbin/snmpd -c /etc/snmpd.conf | PmsProc51 | Existence and Integrity
CLI Server | ./cli_svr | PmsProc52 | Existence
CMM Command Handler | ./cmd_hand | PmsProc53 | Existence
CMM Blade Process Manager | ./BPM | PmsProc54 | Existence and Integrity
CMM Wrapper Process [0-39] | ./WrapperProcess[#] (0-39) | PmsProc[#] (60-99) | Integrity
Cron Daemon | /bin/crond | PmsProc100 | Existence
Inet Daemon | xinetd -stayalive -reuse | PmsProc101 | Existence
Syslog Daemon | /sbin/syslogd | PmsProc102 | Existence
6.3 Process Monitoring Targets
The following targets are provided for the Process Monitoring Service under the cmm location:
Use the following CLI command to view the targets for the processes being monitored.
cmmget -l cmm -d listtargets
The particular processes being monitored will be listed (e.g., PmsProc23, PmsProc100). To view
the name of the process being monitored, use the following example command:
cmmget -l cmm -t PmsProc34 -d ProcessName
Table 5, “Processes Monitored” contains the list of monitored processes along with their
command lines and target names. The ProcessName dataitem returns the Process Command Line.
6.4 Process Monitoring Dataitems
Process Monitoring and Integrity
The following dataitems are used to retrieve information on and configure the Process Monitoring
Service (used with PmsGlobal or PmsProc[#] targets on the cmm location).
• AdminState
• RecoveryAction
• EscalationAction
• ProcessName
• OpState
More information on the usage and descriptions of these dataitems can be found in Section 8, “The
Command Line Interface (CLI)” on page 71.
6.4.1 Examples
The following example will set the global PMS AdminState to locked:
cmmset -l cmm -t PmsGlobal -d AdminState -v 2
The following example will get the recovery action assigned to a monitored process:
cmmget -l cmm -t PmsProc34 -d RecoveryAction
The following example will get the admin state of a PIE:
cmmget -l cmm -t PmsPie176 -d AdminState
SNMP commands are implemented in the CMM MIB for Process Monitoring. The list of new
commands can be found in the CMM's MIB file or in Section 17, “SNMP” on page 140.
6.6 Process Monitoring CMM Events
The “Process Monitoring Service” sensor types are used to assert and de-assert process status
information such as process presence not detected, process recovery failure, or recovery action
taken. See Section 11.4, “List of Possible Health Event Strings” on page 108 for event strings,
codes, and severities for Process Monitoring.
Event severities are configurable by the user and are unique to the process being monitored.
The processes that are monitored and their default severities are listed below. Severities are
configured (while PMS is not running) by changing the ProcessSeverity field in the configuration
file (pms.ini). Values for severity: 1 = minor, 2 = major, 3 = critical.
Note: The recovery action and escalation action should not be set to "no action" for the xinetd process.
This process is involved in data synchronization between the CMMs.
Note: When a user tries to change the recovery action for cmd_hand or BPM to values other than allowed
via the CLI API, the error string displayed is:
"Recovery action not allowed for this target."
6.7 Failure Scenarios and Eventing
This section describes the process fault scenarios that are detected and handled by the PMS. It also
describes the eventing that is associated with the detection and recovery mechanisms. Each
scenario contains a brief textual description and a table that further describes the scenario.
In the table, the Description column outlines the current action. The Event Type String defines the
text for the event that is written to the SEL; the text in this field describes the event-specific
portion of the event string (the remainder of the event text is standard for all events). However, for
PMS the target name (sensor name) will be PmsProc<#> (where # is the unique identifier of the
given process) instead of the name of the sensor.
The UID indicates the unique identifier for the process causing the event. An ID of 1 indicates the
monitoring service itself (global) and an ID of # indicates an application process.
The Assert column indicates if the event is asserted or de-asserted. For items that are just written to
the SEL for informational purposes, the assertion state is not applicable. However, it is required by
the interface and therefore it will be set to de-assert.
The Severity column will define the severity of the event. A severity of Configure indicates that the
severity is configurable. The configurable severities are available in the Configuration Database.
The remaining columns (SNMP traps, health events, LEDs, and telecommunication alarms) define
what indicator will be triggered by the event.
6.7.1 No Action Recovery
In this scenario PMS detects a process fault. The PMS is configured to take no action and therefore
disables monitoring of the process.
Description | Event String
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used. | "Process existence fault; attempting recovery", "Thread watchdog fault; attempting recovery", or "Process integrity fault; attempting recovery"
The recovery action specified is "no action". | "Take no action specified for recovery"
No attempt is made to recover the process. The PMS stops monitoring the process. | "Process existence fault; monitoring disabled", "Thread watchdog fault; monitoring disabled", or "Process integrity fault; monitoring disabled"
See Section 6.7.11, “Process Administrative Action” on page 53, for information about how to re-enable monitoring and de-assert the event.
6.7.2 Successful Restart Recovery
In this scenario PMS detects a process fault. The configured recovery action is: restart the process.
The PMS is able to successfully recover the process by restarting it.
Table 7. Successful Restart Recovery
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The
mechanism (existence, thread
watchdog, or integrity) used to detect
the fault will determine which of the
event type strings will be used.
The recovery action specified is
"process restart".
6.7.3 Successful Failover/Restart Recovery
In this scenario, PMS detects a process fault. The configured recovery action is to fail over to the
standby CMM and then restart the failed process. The PMS successfully recovers the process by
restarting it.
Table 8. Successful Failover/Restart Recovery
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used. | "Process existence fault; attempting recovery", "Thread watchdog fault; attempting recovery", or "Process integrity fault; attempting recovery"
The recovery action specified is "failover and restart". | "Attempting process failover & restart recovery action"
PMS executes a failover. (This step is skipped when running on the standby CMM.) | (The existing code generates the events for failover; they are separate from process monitoring events and are not described here.)
PMS successfully restarts the process. (PMS executes this step even if the failover is unsuccessful: standby not available, unhealthy, etc.)
6.7.4 Successful Failover/Reboot Recovery
In this scenario, PMS detects a process fault. The configured recovery action is to fail over to the
standby CMM and, upon successfully executing the failover, reboot the now-standby CMM. The
recovery actions are successful.
Table 9. Successful Failover/Reboot Recovery
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used. | "Process existence fault; attempting recovery", "Thread watchdog fault; attempting recovery", or "Process integrity fault; attempting recovery"
The recovery action specified is "failover & reboot". | "Attempting failover & reboot recovery action"
PMS executes a failover. (This step is skipped when running on the standby CMM.) | (The existing code generates the events for failover; they are separate from process monitoring events and are not described here.)
PMS is running on the standby CMM (the failover was successful, or it was already running on the standby); PMS recovers the CMM by rebooting.
Upon initialization of PMS after the reboot, the monitor de-asserts the event.
6.7.5 Failed Failover/Reboot Recovery, Non-Critical
In this scenario, PMS is running on the active CMM and detects a monitored process fault. The
severity of the process is configured to a value that is not critical. The configured recovery action
is to fail over to the standby CMM and, upon successfully executing the failover, reboot the
now-standby CMM. The failover recovery action is unsuccessful (standby not available, etc.). The
process being monitored is not of critical severity, and therefore the reboot of the CMM is not
performed.
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used. | "Process existence fault; attempting recovery", "Thread watchdog fault; attempting recovery", or "Process integrity fault; attempting recovery" | # | Assert | Configure
The recovery action specified is "failover & reboot". | "Attempting failover & reboot recovery action" | # | N/A | Configure
PMS executes a failover. | (The existing code generates the events for failover; they are separate from process monitoring events and are not described here.) | - | N/A | N/A
PMS detects that it is still running on the active CMM. The process is not critical, and therefore the reboot operation is not performed. | "Failover & reboot recovery failure" | # | N/A | Configure
No attempt is made to recover the process. The PMS stops monitoring the process. | "Process existence fault; monitoring disabled", "Thread watchdog fault; monitoring disabled", or "Process integrity fault; monitoring disabled" | # | Assert | Configure
See Section 6.7.11, “Process Administrative Action” on page 53, for information about how to re-enable monitoring and de-assert the event.
6.7.6 Failed Failover/Reboot Recovery, Critical
In this scenario, PMS is running on the active CMM and detects a monitored process fault. The
severity of the process is configured to be critical. The configured recovery action is to fail over to
the standby CMM and, upon successfully executing the failover, reboot the now-standby CMM.
The failover recovery action is unsuccessful (standby not available, etc.). The process being
monitored is of critical severity, and therefore the reboot of the CMM is performed.
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used. | "Process existence fault; attempting recovery", "Thread watchdog fault; attempting recovery", or "Process integrity fault; attempting recovery" | # | Assert | Configure
The recovery action specified is "failover & reboot". | "Attempting failover & reboot recovery action" | # | N/A | Configure
PMS executes a failover. | (The existing code generates the events for failover; they are separate from process monitoring events and are not described here.) | - | N/A | N/A
PMS detects that it is still running on the active CMM. The process is critical, and therefore the reboot operation is performed.
Upon initialization of PMS after the reboot, the monitor de-asserts the event. | "Monitoring initialized" | # | De-assert | OK
6.7.7 Excessive Restarts, Escalate No Action
In this scenario PMS detects a process fault. The configured recovery action is: restart the process.
However, the PMS also detects that the process has exceeded the threshold for excessive process
restarts. Therefore, the PMS will execute the escalation action. The escalation action is configured
for no action.
Table 12. Existence Fault, Excessive Restarts, Escalate No Action (Sheet 1 of 2)
Description | Event String | UID | Assert | Severity
PMS detects a faulty process. The
mechanism (existence, thread
watchdog, or integrity) used to detect
the fault will determine which of the
event type strings will be used.
The recovery action specified is
"process restart"
In this scenario PMS detects a process fault. The configured recovery action is: restart the process.
However, the PMS also detects that the process has exceeded the threshold for excessive process
restarts. Therefore, the PMS will execute the escalation action. The configured escalation recovery
action is: failover to the standby CMM and upon successfully executing the failover, reboot the
now standby CMM. The escalated recovery action is successful.
1. PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used.
2. The recovery action specified is "restart process".
3. PMS detects that the process has been restarted excessively.
4. The escalated recovery action specified is "failover and reboot".
5. PMS executes a failover. (This step is skipped when running on the standby CMM.)
6. PMS is running on the standby CMM (the failover was successful, or it was already running on the standby); PMS recovers the CMM by rebooting.
7. Upon initialization of PMS after the reboot, the monitor de-asserts the event.
In this scenario, PMS detects a process fault. The severity of the process is configured to a value
that is not critical. The configured recovery action is to restart the process. However, the PMS also
detects that the process has exceeded the threshold for excessive process restarts; therefore, the
PMS executes the escalation action. The configured escalation recovery action is to fail over to the
standby CMM and, upon successfully executing the failover, reboot the now-standby CMM. The
failover recovery action is unsuccessful (standby not available, etc.). The process being monitored
is not of critical severity, and therefore the reboot of the CMM is not performed.
1. PMS detects a faulty process. The mechanism (existence, thread watchdog, or integrity) used to detect the fault determines which of the event type strings is used.
2. The recovery action specified is "restart process".
3. PMS detects that the process has been restarted excessively.
4. The escalated recovery action specified is "failover and reboot".
5. PMS executes a failover.
6. PMS detects that it is still running on the active CMM. The process is not critical, and therefore the reboot operation is not performed.
7. No attempt is made to recover the process. The PMS stops monitoring the process.
See Section 6.7.11, “Process Administrative Action” on page 53, for information about how to re-enable monitoring and de-assert the event.
In this scenario, PMS detects a process fault. The severity of the process is configured as critical.
The configured recovery action is: restart the process. However, the PMS also detects that the
process has exceeded the threshold for excessive process restarts. Therefore, the PMS will execute
the escalation recovery action. The configured escalation recovery action is: failover to the standby
CMM and upon successfully executing the failover, reboot the now standby CMM. The failover
recovery action is unsuccessful (standby is not available, etc.). The process being monitored is of
critical severity and therefore the reboot of the CMM will still be executed even though the CMM
is still active.
PMS detects a faulty process. The
mechanism (existence, thread
watchdog, or integrity) used to detect
the fault will determine which of the
event type strings will be used.
The recovery action specified is
"restart process"
PMS detects that the process has
been restarted excessively.
The escalated recovery action
specified is "failover and reboot"
PMS executes a failover.
PMS detects that it is still running on
the active CMM. The process is
critical and therefore the reboot
operation is performed.
Upon initialization of PMS after the
reboot, the monitor will de-assert the
event.
The existing code generates the
events for failover. They are
separate from process monitoring
events and are not described
here.
6.7.11 Process Administrative Action
In this scenario, PMS has detected a fault in a process, but has not been able to recover the process
(recovery is configured for no action, etc.). This causes PMS to operationally disable monitoring of
the process. To re-enable monitoring of the process, an operator must administratively lock the
process, take the necessary actions to fix the process, and administratively unlock the process.
Table 16. Administrative Action (columns: Description, Event String, UID, Assert, Severity)
• Operator administratively locks monitoring of the process
• Operator takes actions to fix the problem
• Operator administratively unlocks monitoring of the process, causing monitoring to restart
Prior to executing any failover/reboot the PMS will determine if the failover/reboot threshold has
been exceeded. If it has, the PMS will be operationally disabled. When PMS is disabled, all process
monitoring is halted. To re-enable the PMS, the operator must lock the global administrative state.
The operator can then fix the problem and administratively unlock the global administrative state.
The following events are generated against the PMS Monitor (unique ID 1). The events for the
process or processes that caused this condition to occur will also be present, but are not described
in this table. They are defined in the scenarios provided above.
• Excessive reboots/failovers; all process monitoring disabled - unique ID 1, Assert, severity Major.
• Operator unlocks the global administrative state, causing monitoring to be resumed - "Monitoring
initialized" event (a), unique ID 1, De-assert, severity OK.
a. The "Monitoring initialized" event will be generated for the monitor (unique ID 1) as well as the
individual processes that are administratively unlocked.
6.8 Process Integrity Executable (PIE)
The Process Integrity Executable (PIE) for the Chassis Management Module’s (CMM) Blade
Proxy Manager (BPM) and Wrapper Processes is responsible for determining the health of the
Wrapper Processes. Monitoring the integrity means not only monitoring the fact that the process is
running but that it is functioning properly.
The PIE will monitor the BPM, CMM Wrapper Process (Wrapper Process number 255) and
Chassis Wrapper Processes (23). It will also monitor the Wrapper Processes for intelligent (have a
management controller) blades, power supplies, and fans. Wrapper Processes for non-intelligent
devices will not be monitored.
PIE will monitor the BPM and Wrapper Processes. The Wrapper Processes have two categories for
integrity monitoring. The first category contains the static processes. Static processes are processes
that are always present while the CMM software is running. The CMM (255) and chassis (23)
Wrapper Processes are the static processes. The second category contains all the dynamic Wrapper
Processes. Dynamic processes are ones that come and go as the configuration of the chassis
changes (such as a blade insertion or removal). The fan, power supply, and blade Wrapper
Processes belong to the dynamic category.
The pms.ini file is the Process Monitoring Service (PMS) and Process Integrity Executable (PIE)
configuration file. It contains all of the non-volatile configuration data for the service. This file can
be found in the /etc/cmm directory on the CMM. It is an ASCII based text file that can be edited
with vi or any other text editor.
Note: Any changes made to the pms.ini file will be overwritten during a firmware update. Care should be
taken to preserve the file, or any changes to it, before a firmware update is done so that the file and
changes can be restored following the update.
The dynamic data fields (except the AdminStates) in this file will be replicated to the standby
CMM via the CMM Data Synchronization Service. If invalid data is provided for a particular field
(i.e., out of range), the default value, if one exists, will be used. If a default value is not possible,
that entire section will not be used. For example, PmsProcess012 will be ignored if no value is
given for its CommandLine.
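The fallback behavior described above can be sketched with Python's standard configparser. The field names and ranges below mirror the pms.ini fields in this chapter, but the parser itself is illustrative only and is not the CMM's actual implementation:

```python
# Illustrative sketch (not CMM source): read a pms.ini-style section and
# substitute the documented default for any missing or out-of-range value.
import configparser

DEFAULTS = {"AdminState": 1, "ProcessExistenceInterval": 2}
RANGES = {"AdminState": (1, 2), "ProcessExistenceInterval": (0, 65535)}

def read_field(section, key):
    """Return the field value, or its default when missing or out of range."""
    try:
        value = int(section[key])
    except (KeyError, ValueError):
        return DEFAULTS[key]
    low, high = RANGES[key]
    return value if low <= value <= high else DEFAULTS[key]

config = configparser.ConfigParser()
config.read_string("""
[PmsProcess012]
AdminState = 9
ProcessExistenceInterval = 5
""")
section = config["PmsProcess012"]
print(read_field(section, "AdminState"))                # 9 is out of range -> default 1
print(read_field(section, "ProcessExistenceInterval"))  # in range -> 5
```
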
Database changes are classified in two categories: dynamic and static. Dynamic changes are
initiated by an interface (RPC, CLI, or SNMP). The change will take effect in the PMS and the data
in this file will be updated. Dynamic changes can be made while the PMS is running.
Process Monitoring and Integrity
Static changes are made directly to this file and must be done while the PmsMonitor is not running.
6.9.1 Global Data
This data applies to the PMS as a whole (not specific to a process). There must be one and only one
set of this data.
6.9.1.1 PMS Administrative State
The PMS administrative state determines if monitoring of all processes will be allowed.
Values: 1 - unlocked (enabled), 2 - locked. Default: 1. (dynamic)
AdminState = 1
6.9.1.2 PMS Excessive Reboot/Failover Count
The maximum number of reboots or failover attempts allowed (over the interval specified in the
field below).
Values: 2 - 255. Default: 3.
ExcessiveRebootOrFailoverCount = 3
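Together with the interval field described next, this count defines a sliding window. A minimal sketch of that logic, assuming PMS permits up to the configured count of reboots/failovers within the interval and operationally disables itself beyond that (class and method names are illustrative):

```python
# Illustrative sliding-window check: more than `count` reboots/failovers
# inside `interval` seconds is treated as excessive.
from collections import deque

class RebootFailoverGuard:
    def __init__(self, count=3, interval=900):
        self.count = count          # ExcessiveRebootOrFailoverCount
        self.interval = interval    # interval in seconds (default 900)
        self.events = deque()       # timestamps of recent reboots/failovers

    def allow(self, now):
        """Return False (monitoring disabled) once the threshold is exceeded."""
        while self.events and now - self.events[0] > self.interval:
            self.events.popleft()   # drop events that fell out of the window
        if len(self.events) >= self.count:
            return False            # threshold exceeded: PMS would disable itself
        self.events.append(now)
        return True

guard = RebootFailoverGuard(count=3, interval=900)
print([guard.allow(t) for t in (0, 10, 20, 30)])  # [True, True, True, False]
```
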
6.9.1.3 PMS Excessive Reboot/Failover Interval
The interval, in seconds, over which the maximum number of reboots/failovers will be measured.
Values: 1 - 65535. Default: 900.
6.9.2 Process Specific Data
This data applies to a specific process running on the CMM. There will be one set of this data for
each process.
The following information describes each of the fields in the process specific section.
6.9.2.1Process Section Name
The section name MUST follow the pattern "PmsProcessXXX" where XXX is a number from 010
to 175 inclusive. PmsProcess section names must be unique but are NOT significant in any other
way. Specifically, they are NOT required to match the UniqueID field for the section.
[PmsProcess151]
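The naming rule above can be captured in a short validator; this check is illustrative only and is not part of the CMM software:

```python
# Validate a "PmsProcessXXX" section name: exactly three digits, 010-175 inclusive.
import re

def valid_pms_process_section(name):
    match = re.fullmatch(r"PmsProcess(\d{3})", name)
    return match is not None and 10 <= int(match.group(1)) <= 175

print(valid_pms_process_section("PmsProcess151"))  # True
print(valid_pms_process_section("PmsProcess009"))  # False: below 010
print(valid_pms_process_section("PmsProcess176"))  # False: PIE range begins at 176
```
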
6.9.2.2 Unique ID
This is a unique identifier for the program and its arguments. It is essentially the short version of
the "Process Name and Arguments" field above.
This is a list of chassis types for which this particular Uid is valid. The list is comma delimited.
Spaces are ignored. If this key is not present, then the Uid is valid on all chassis.
This string contains the program name including its path and its associated command line
arguments. This field will be used to monitor a program and therefore must be an exact match to
how the program is represented in the OS. The program name and command line arguments are
space separated with the program name being the first entry in the string. If an individual argument
contains spaces, the argument must be encapsulated in quotation marks. The program name and
arguments will uniquely identify the entry. This means if the same program is started multiple
times with different arguments, each of them will require a separate entry.
Values: N/A. Default: None.
CommandLine = MyProcess -x -y
6.9.2.5 Start Program Name and Arguments
This is the program name and arguments used to start the program. This differs from the
monitoring program name and arguments because some programs are started via scripts. For
example many Linux system programs are started via startup scripts located in the "init.d"
directory.
Values: N/A. Default: None.
StartCommandLine = MyProcess -x -y
6.9.2.6 Administrative State
The process administrative state determines if the process will be monitored.
Values: 1 - unlocked (enabled), 2 - locked. Default: 1. (dynamic)
AdminState = 1
6.9.2.7 Process Existence Interval
This is the interval in seconds in which to verify that a process exists. A value of 0 disables
Existence Monitoring.
Values: 0 - 65535. Default: 2.
ProcessExistenceInterval = 2
6.9.2.8 Thread Watchdog Retries
This is the number of retries (number of thread watchdog intervals) to wait for notification from a
thread. Recovery takes place on retries+1 missed thread watchdog intervals.
Values: 0-10, default: 3.
ThreadWatchdogRetries = 3
6.9.2.9 Process Ramp-up Time
The amount of time in seconds necessary for the process to initialize and be functional.
Values: 0-255. Default: 60.
6.9.2.10 Process Severity
An indicator for the importance of a given process. This severity will determine at what level SEL
entries are generated and when reboots should occur on an active CMM.
6.9.2.11 Recovery Action
This is the recovery action to take upon detection of a failed process.
Values: 1 = no Action, 2 = process restart, 3 = failover and process restart, 4 = failover and reboot.
Default: 1. (dynamic)
RecoveryAction = 1
6.9.2.12 Process Restart Escalation Action
This determines the action to take if the RecoveryAction includes "process restart" and it fails.
Values: 1= no action, 2 = failover and reboot. Default: 1. (dynamic)
ProcessRestartEscalationAction = 1
6.9.2.13 Process Restart Escalation Number
This is the number of process restarts that are allowed (within the interval specified below) before
escalation starts.
Values: 1 - 255. Default: 5.
ProcessRestartEscalationNumber = 5
6.9.2.14 Process Restart Escalation Interval
This is the interval in seconds at which the number of restarts will be limited (see above).
Values: 1 - 65535. Default: 900.
ProcessRestartEscalationInterval = 900
6.9.3 Process Definition Section of pms.ini
The following sections describe and give examples of each of the process types that are defined in
the pms.ini file.
6.9.3.1 Shadow Process
The shadow process must exist to monitor the "Monitor Process". Therefore, this process should
never have a recovery action of "no action".
This process must exist to execute interface commands (CLI, SNMP, etc.) for the CMM. Therefore
this process should never have a recovery action of "no action".
Note: PmsProcess053 represents a crucial process, cmd_hand (command handler), of the CMM software
stack. This process cannot be restarted properly if it terminates unexpectedly. Hence, none of the
recovery actions that attempt to restart a process, i.e., 2 (Restart) or 3 (Failover & Restart), are
allowed as valid recovery actions for cmd_hand. The default recovery action for the cmd_hand
process is 4 (failover and reboot), and it cannot be changed to anything else. A recovery action of
1 (No Action) is also not allowed because of the severity of the process.
In the event that the cmd_hand process terminates unexpectedly, and the default recovery action
kicks in, there is a 2-3 minute delay before the CMM actually reboots. This is normal and expected
because PMS makes multiple tries to failover, and times out because cmd_hand does not respond.
[PmsProcess053]
UniqueID = 53
CommandLine = ./cmd_hand
StartCommandLine = ./cmd_hand
AdminState = 1
ProcessExistenceInterval = 2
ProcessRampUpTime = 10
ProcessSeverity = 3
RecoveryAction = 4
ProcessRestartEscalationAction = 2
ProcessRestartEscalationNumber = 5
ProcessRestartEscalationInterval = 300
6.9.3.8 BPM
Note: PmsProcess054 represents a crucial process of the CMM software stack. This process cannot be
restarted properly if it terminates unexpectedly. Hence, none of the recovery actions that attempt to
restart a process, i.e., 2 (Restart) or 3 (Failover & Restart), are allowed as valid recovery actions for
the BPM. The default recovery action for the BPM process is 4 (failover and reboot), which can
only be changed to 1 (No Action).
6.10 Process Integrity Executable (PIE) Specific Data Config
This data applies to each Process Integrity Executable (PIE). One PIE may monitor multiple CMM
processes or only one CMM process. There will be one set of this data for each PIE.
The following information describes each of the fields in the PIE specific section. Lines with a '*'
prefix indicate the actual fields (the prefix is not part of the field name).
6.10.1 PIE Section Name
The section name MUST follow the pattern "PmsPieXXX" where XXX is a number from 176 to
200 inclusive. PmsPie section names must be unique but are NOT significant in any other way.
Specifically, they are NOT required to match the UniqueID field for the section.
6.10.2 Process Integrity Executable
The name, including its path and command line arguments, of the PIE to be executed periodically.
This is used to start the program and may, in the future, be used to monitor the program and
therefore must be an exact match to how the program is represented in the OS. The program name
and command line arguments will all be space separated with the program name being the first
entry in the string. If an individual argument contains spaces, the argument must be encapsulated in
quotation marks. The program name and arguments will uniquely identify the entry. This means if
the same program is started multiple times with different arguments, each of them will require a
separate entry. Each PIE will likely have PIE specific options that can be specified through the
command line. These options must be included in the arguments to the "ProcessIntegrityExecutable"
command.
ProcessIntegrityExecutable = ./PmsPieSnmp
6.10.3 Unique ID
This is a unique identifier for the executable and its arguments. It is essentially the short version of
the "Process Integrity Executable" field above. It is used for logging and CSL access.
6.10.4 Administrative State
The PIE administrative state determines if the PIE will be restarted at the next interval.
Values: 1 - unlocked (enabled), 2 - locked. Default: 1. (dynamic)
6.10.5 Process Integrity Interval
This is the interval in seconds between executions of the PIE.
Values: 0 - 65535, where 0 indicates that the PIE only gets executed once.
Default: 3600.
ProcessIntegrityInterval = 3600
6.10.6 Chassis Applicability
This is a list of chassis types on which this particular Pie should be run. The list is comma
delimited. Spaces are ignored. If this key is not present, then the Pie will run on all chassis.
The command line usage of PmsPieSnmp is:
PmsPieSnmp [-f SuccessiveFailureNumber]
where:
-f : This is the number of allowed successive integrity failures before the PMS performs recovery
on the faulting process. PMS performs recovery on "this number + 1".
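The "this number + 1" behaviour can be sketched as a consecutive-failure counter, assuming (as is typical for such checks) that a single successful integrity check resets the count; the class and method names are illustrative:

```python
# Illustrative counter: recovery triggers on the (N + 1)th consecutive
# integrity failure, where N is the -f SuccessiveFailureNumber value.
class IntegrityTracker:
    def __init__(self, allowed_failures=3):   # the -f value
        self.allowed = allowed_failures
        self.consecutive = 0

    def report(self, passed):
        """Return True when PMS should perform recovery on the process."""
        if passed:
            self.consecutive = 0              # a success resets the count
            return False
        self.consecutive += 1
        return self.consecutive > self.allowed   # recover on N + 1 failures

t = IntegrityTracker(allowed_failures=2)
print([t.report(ok) for ok in (False, False, True, False, False, False)])
# [False, False, False, False, False, True]
```
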
The command line usage of PmsPieWp is:
PmsPieWp [-s] [-d[NumberOfDynamicWrappersPerRun]] [-f SuccessiveFailureNumber]
where:
-s: check static wrappers (optional)
-d: check dynamic wrappers and BPM threads (optional). The optional
NumberOfDynamicWrappersPerRun value is the number of dynamic wrappers and BPM threads to
check on each run. Values: 0 - 100. Default: 0 = process all dynamic wrappers and BPM threads
on each execution.
-f: Successive Failure Number - This is the number of allowed successive integrity failures before
the PMS performs recovery on the faulting process. PMS performs recovery on "this number + 1".
Values: 1 - 100. Default: 3.
Example:
PmsPieWp -s -f2      - check static wrappers
PmsPieWp -d0 -f2     - check all dynamic wrappers and all BPM threads
PmsPieWp -s -d0 -f2  - check static and all dynamic wrappers and all BPM threads
PmsPieWp -s -d10 -f2 - check static and 10 dynamic wrappers and BPM threads
The CMM is responsible for the management of FRU hot-swap activities. The CMM listens to
FRU hot-swap SEL messages from IPMI devices and distributes power to each FRU after
negotiating with the respective IPMI device fronting the FRU. The CMM also manages the shelf-wide power budget and polls IPMI devices to get the status of each FRU fronted by the
IPMI device. The CMM uses shelf FRU information to guarantee power-up sequence delays
between boards.
Once the CMM receives the shelf FRU information on power budget and power sequence delays, it
is ready to service FRU hot-swap requests from respective IPMI devices.
7.1 Hot Swap States
The CMM defines the hot swap status of a FRU as being in one of eight states. CMM
documentation often refers to only the letter/number designation of that state (M0 - M7). Here is a
list of what each of those states means:
• "State M0 - Not Installed"
• "State M1 - Inactive"
• "State M2 - Activation Request"
• "State M3 - Activation In Progress"
• "State M4 - Active"
• "State M5 - Deactivation Request"
• "State M6 - Deactivation In Progress"
• "State M7 - Communication Lost"
7.2 FRU Insertion
When the CMM receives a request that a FRU is ready to activate, it will compute the FRU’s
power, get the power levels, and check the available power budget.
The Set_Power_Level command will be sent only when the necessary power budget, from each of
the redundant power feeds, is available to satisfy the FRU's desired power level. If a FRU can't be
activated at the time of the request, it should remain in the M3 state and shall be powered up when
the necessary power budget becomes available. If the FRU decides to operate at a lower power
level and notifies the Shelf Manager, and the new power level is within the current Shelf Power
envelope, the CMM shall send the Set_Power_Level (new desired level) command to the FRU.
7.3 Graceful FRU Extraction
When the CMM receives a FRU Hot swap request for extraction, the CMM will send the deactivate
state command, and the FRU will transition to M6 state and begin its shut-down procedures. Once
the FRU has shut down, it transitions to M1 state, and the CMM then reclaims the FRU’s power
and adjusts the power budget for the newly available power.
7.4 Surprise FRU Extraction
The CMM detects a surprise FRU extraction or a failure of the IPMI device fronting the FRU if a
device previously in one of the M2-7 states reports a transition to the M2 state. If this scenario is
detected, the CMM assumes one of three things has happened:
• Surprise extraction and reinsertion of the same (or another) FRU.
• IPMI Device fronting the FRU failed, FRU was extracted, then the same (or another) FRU is
reinserted.
• Watchdog Timer (WDT) on the IPMI device restarted the IPMI Device firmware.
Once this occurs, the CMM shall reclaim all the resources allocated to that FRU. The CMM will
log a SEL message describing the situation, i.e. IPMI device failure or surprise extraction. From
this point the CMM shall follow the sequence of actions described in Section 7.2, “FRU Insertion”.
7.5 Forced Power State Changes
An external authorized entity (e.g., a management interface like RMCP) can request FRU power
state changes like Power OFF, RESET, etc. The CMM is responsible for handling these requests.
7.6 Power Management on the Standby CMM
The standby CMM does not participate in any power management activities while in standby
mode; it remains in a hot-standby state. The standby CMM starts performing power management
activities as soon as it becomes the active CMM.
7.7 Power Feed Targets
The CLI allows certain get and set actions to be taken on power feeds for a location. They include
the following dataitems: maxexternalavailablecurrent, maxinternalcurrent, and
minexpectedoperatingvoltage. These dataitems are described in Section 8, “The Command Line
Interface (CLI)” on page 71.
To find the number of feed targets, use the command:
cmmget -l cmm -d feedcount
This returns an integer, indicating the number of power feeds.
As an example, the MPCHC0001 chassis with four power feeds coming from the PEMs will return
the number 4, meaning there are four feed targets (feed1, feed2, feed3, and feed4). They correlate
to the physical feeds on the MPCHC0001 as follows:
The following lists the values of time to delay and number of pings that the CMM uses to
determine the state of a FRU.
Table 18. Time to Delay and Number of Attempts
Variable - Description
• DelayBetweenPingLoops - The number of microseconds to delay between each ping loop. This is
essentially the amount of time from the ping of the last IPMI Controller in the list to the ping of the
first controller in the list.
• DelayBetweenIPMControllerPings - The number of microseconds of delay between the ping on
one controller that is in the list and the ping of the next one on the list. This delay does not apply
after the last controller in the list.
• NumberFailedAttemptsBeforeAlert - How many failed attempts to contact the IPMI Controller
must occur prior to raising an event that communication has been lost.
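The polling behaviour these variables describe can be modelled as follows; the controller names, the canned response schedule, and the function itself are illustrative (a real implementation would sleep for the configured delays rather than iterate over precomputed results):

```python
# Illustrative model of the ping loop: iterate over the controller list each
# loop and raise a communication-lost alert only after N consecutive failures.
def run_ping_loops(controllers, responses, loops, failed_attempts_before_alert=3):
    """responses[name] -> list of per-loop ping results (True = reachable)."""
    failures = {name: 0 for name in controllers}
    alerts = []
    for loop in range(loops):
        for name in controllers:
            # (DelayBetweenIPMControllerPings would be applied between pings here)
            if responses[name][loop]:
                failures[name] = 0            # success resets the failure count
            else:
                failures[name] += 1
                if failures[name] == failed_attempts_before_alert:
                    alerts.append((name, loop))   # communication-lost event
        # (DelayBetweenPingLoops would be applied here, before the next pass)
    return alerts

alerts = run_ping_loops(
    ["blade1", "blade2"],
    {"blade1": [True, True, True, True],
     "blade2": [False, False, False, True]},
    loops=4)
print(alerts)  # [('blade2', 2)]
```
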
The Command Line Interface (CLI) connects to and communicates with the intelligent
management devices of the chassis, boards, and the CMM itself. The CLI is an IPMI-based library
of commands that can be accessed directly or through a higher-level management application.
Administrators can access the CLI through a Telnet session, SSH, or through the CMM's front
panel serial port. The CLI functions are also available through SNMP get/set commands and an
RPC interface. Using the CLI, users can access information about the current state of the system
including current sensor values, threshold settings, recent events, and overall chassis health.
Note: The CLI uses the term “blade” when referring to boards.
8.2 Connecting to the CLI
The CMM provides three connections on its front panel.
• Two Ethernet connections via an RJ-45 connector
• An RS-232 serial port interface also via an RJ-45 connector
These same ports are also available on the rear transition module.
Any of these interfaces can be used to log into the CMM as well as the Ethernet interface provided
through the backplane of a chassis. Use Telnet to log into the CMM over an Ethernet connection, or
use a terminal application or serial console over the RS-232 interface. See the Intel® NetStructure™ MPCMM0001 Hardware Technical Product Specification for electrical pinouts of
the above interfaces.
If logging in for the first time to set up or obtain the CMM's IP addresses, use the serial port console
interface to perform configuration.
8.2.1 Connecting through a Serial Port Console
Connect an RS-232 serial cable with an RJ-45 connector to the serial console port on the front of
the CMM. Set your terminal application settings as follows:
Logging in for the first time must be done through the serial port console to properly configure the
Ethernet settings and IP addresses for the network.
The username for the CMM is root. The default password is cmmrootpass.
At the login prompt, enter the username: root
When prompted for the password, enter: cmmrootpass
The root password can be changed using the passwd command. For information on resetting the
CMM password back to default, refer to Section 9, “Resetting the Password” on page 99.
8.3.1 Setting IP Address Properties
Note: Changing any of the IP address settings and restarting the network could result in a failover
occurring based on the rules governing redundancy specified in Section 3, “Redundancy,
Synchronization, and Failover” on page 21.
By default, the CMM assigns IP addresses statically:
• eth0, labeled “Ethernet A” on the front panel, is configured with the static IP address
10.90.90.91
• eth1, labeled “Ethernet B” on the front panel, is configured with a static IP address of
192.168.100.92
• eth1:1, an alias of eth1 that always points to and is active on the active CMM, is configured
with a static IP address of 192.168.100.93
On initial power-up of a chassis with two CMMs, both CMMs will have the same IP addresses
assigned by default. When the chassis is powered up, the standby CMM automatically decrements
its IP address to one less than the active CMM's if it detects a conflict.
Example:
1. A dual CMM Chassis is powered up.
2. Active CMM assigns IP address of 192.168.100.92 to eth1 on the active CMM.
3. Standby CMM assigns IP address of 192.168.100.91 to eth1 on the standby CMM.
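The standby's conflict-avoidance step can be checked with Python's standard ipaddress module; the addresses are the defaults from the example above:

```python
# Decrementing the conflicting eth1 address by one, as the standby CMM does.
import ipaddress

active_eth1 = ipaddress.IPv4Address("192.168.100.92")
standby_eth1 = active_eth1 - 1   # address the standby falls back to
print(standby_eth1)  # 192.168.100.91
```
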
At this point, the static IP addresses must be changed to values appropriate for the network
configuration, ensuring that the two CMMs do not contain duplicate IP addresses on eth0 and
eth1, to avoid address conflicts on the network.
eth0 and eth1 can also be set using DHCP. eth1:1 will always remain static. When setting both eth0
and eth1 to DHCP, use the /etc/pump.conf file to determine which interface should own the default
gateway. The default is for eth0 to own the default gateway. To configure eth1, and thereby eth1:1,
to own the default gateway, uncomment the two lines under the eth0 section of /etc/pump.conf
and comment the two lines under the eth1 section of that file. Save the file and run the
/etc/rc.d/network reload script.
Note: It is recommended that both CMMs use static IP addresses for all interfaces. DHCP addresses may
be unexpectedly lost or changed in some network configurations.
Note: eth0 should always be set to a different subnet than eth1/eth1:1. Failure to set eth0 to a different
subnet than eth1 will cause network errors on the CMM and redundancy will be lost.
8.3.1.2 Setting Static IP Information for eth1
1. Open the /etc/ifcfg-eth1 file using the vi editor. By default, the file contains three variables.
2. Set the STATICIP1 variable to the IP address you want to assign to eth1.
3. Set the STATICIP2 variable to the IP address you want to assign to the active CMM on the
network. This value should ONLY be set on the active CMM, as it will be synchronized to and
overwritten on the standby CMM.
4. Set the SETIP variable to assign IP addresses to eth1 and eth1:1 based on the following table:
5. Add the NETMASK1 variable and set it to the appropriate netmask for STATICIP1 for your
network.
6. Add the NETMASK2 variable and set it to the appropriate netmask for STATICIP2 for your
network. The NETMASK2 variable needs to be correct to allow for true redundant operation.
7. Add the GATEWAY1 variable and set it to the appropriate value for the gateway for
STATICIP1.
8. Add the GATEWAY2 variable and set it to the appropriate value for the gateway for
STATICIP2.
9. To activate the changes, at the user prompt (from the root “/” directory), type:
/etc/rc.d/network reload
Note: The eth1:1 address should only be changed on the active CMM. The new address will be
synchronized to the standby CMM automatically when the /etc/rc.d/network reload command is
executed. Also, the eth1:1 address should be changed with the procedure above and NOT by using
the ifconfig command manually; using ifconfig will cause the eth1:1 information to not be
synchronized to the standby.
8.3.1.3 Setting eth0 to DHCP
1. Using the vi editor, change the BOOTPROTO variable in the /etc/ifcfg-eth0 file to dhcp.
Note: Linux is case sensitive, so ensure that the BOOTPROTO value is entered in lower case letters in
the step above.
2. To activate the changes, the user can reboot the CMM, or at the user prompt (from the root “/”
directory) on the active CMM, type:
/etc/rc.d/network reload
Note: A DHCP server must be present on the network for the CMM to get a valid IP address. The
network reload command will refresh the IP addresses on both network interfaces.
8.3.1.4 Setting eth1 to DHCP
1. Using the vi editor, change the BOOTPROTO variable in the /etc/ifcfg-eth1 file to dhcp.
2. eth1:1 will still use a static IP address in this configuration. Set the STATICIP2 variable to the
IP address you want to assign to the active CMM on the network. This value should ONLY be
set on the active CMM, as it will be synchronized to and overwritten on the standby CMM.
3. Add the NETMASK1 variable and set it to the appropriate netmask for STATICIP1 for your
network.
4. Add the NETMASK2 variable and set it to the appropriate netmask for STATICIP2 for your
network. The NETMASK2 variable needs to be correct to allow for true redundant operation.
5. Add the GATEWAY1 variable and set it to the appropriate value for the gateway for
STATICIP1.
6. Add the GATEWAY2 variable and set it to the appropriate value for the gateway for
STATICIP2.
7. Set the SETIP variable to assign IP addresses to eth1 and eth1:1 based on the following table:
Table 20. SETIP Interface Assignments when BOOTPROTO=”dhcp”
8. To activate the changes, at the user prompt (from the root “/” directory), type:
/etc/rc.d/network reload
8.3.2 Setting a Hostname
The hostname of the CMM is a logical name that is used to identify a particular CMM. This name
is shown at login time just to the left of the login prompt on the serial port interface when
configured (i.e., “MYHOST login:”) and advertised to any DNS servers on a network. If there is no
entry in /etc/HOSTNAME, the login prompt will not have anything next to it. By default, the
hostname is set to the product name (i.e. MPCMM0001).
The hostname should be configured on each CMM. To change the hostname:
1. Using the vi editor, change the HOSTNAME variable in /etc/HOSTNAME to the desired
name.
2. To activate the changes, at the user prompt (from the root “/” directory), type:
/etc/rc.d/network reload
Note: Executing network reload also causes the network interfaces to reload their IP addresses. If DHCP
is being used on a network interface, then it is possible that the IP address on that interface will
change.
8.3.3 Setting the Amount of Time for Auto-Logout
For security purposes, the CMM automatically logs the user out of the current console session after
15 minutes (900 seconds). This auto-logout time can be changed by editing /etc/profile and
changing the TMOUT value to the desired setting. The time-out (TMOUT) value is set in seconds
(900 seconds is the default). A setting of TMOUT=0 will disable the automatic logout. This can
also be set at the command line.
8.3.4 Setting the Date and Time
On the active CMM, use the date command in the CLI to view the current date and time for the
CMM. To set the date and time on the CMM use the setdate command. The setdate command
should use the following syntax:
setdate “mm/dd/yyyy [timezone] hh:mm:ss”
The date is stored on the CMM in Coordinated Universal Time (UTC). The local timezone can be
included in the setdate string, and the CMM will determine the offset and automatically change the
date to UTC. An example that will set the date and time to “Thu Mar 11 20:12:00 UTC 2004” is:
setdate “3/11/2004 PST 12:12:00”
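The PST-to-UTC conversion in this example can be verified with Python's datetime, assuming PST = UTC-8 (the CMM performs the equivalent offset calculation internally):

```python
# Convert the local setdate input (PST, assumed UTC-8) to the stored UTC time.
from datetime import datetime, timedelta, timezone

pst = timezone(timedelta(hours=-8))
local = datetime(2004, 3, 11, 12, 12, 0, tzinfo=pst)
utc = local.astimezone(timezone.utc)
print(utc.strftime("%a %b %d %H:%M:%S UTC %Y"))  # Thu Mar 11 20:12:00 UTC 2004
```
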
The date and time are synchronized to the standby CMM when changed and then every hour.
8.3.5 Telnet into the CMM
To telnet into the CMM, point your console or telnet application to the IP address of the eth0, eth1,
or eth1:1 interface on the CMM you wish to telnet to. If you wish to telnet to the active CMM, you
can point the telnet application to the eth1:1 IP address. The “pointing” is accomplished using the
Telnet open command. To get the IP address see Section 8.3.1, “Setting IP Address Properties” on
page 72.
8.3.6 Connect Through SSH (Secure Shell)
For a more secure connection, users can connect to the CMM using SSH, or Secure Shell. SSH is a
secure interface and protocol used for encrypted connections between a client and a server. Using
an SSH client, open the IP address of the eth0, eth1, or eth1:1 interface on the CMM you wish to
establish an SSH session with. SSH clients can be found freely available on the Internet.
8.3.7 FTP into the CMM
For security purposes, the CMM will prevent users from accessing the CMM through FTP by
default. Before FTPing into the CMM, ensure the "root" entry is removed from the /etc/ftpusers
file using a text editor such as vi. If this entry is not removed, you will be unable to log in via FTP.
Using an FTP client, FTP to the IP address of the CMM you wish to transfer files to or from and
use the CLI login and password.
8.3.8 Rebooting the CMM
To reboot the CMM, type the reboot command in the CLI on the CMM that is to be rebooted. If the
reboot command is issued on the active CMM in a redundant configuration, a failover to the
standby CMM will occur. If the reboot command is issued on a CMM in a single CMM
configuration, chassis management will be lost during the reboot process. Telnet and SSH sessions
will have to be reestablished with the CMM after it is rebooted.
Note: Do not use the “init 0” or “init 6” command to reboot the CMM as problems may result.
The command line interface on the CMM supports two commands: cmmget and cmmset.
cmmget is used to query information, whereas cmmset is used to write information.
Man pages are available on the CMM for both commands. To access the man page for
cmmget, use the command man cmmget. To access the man page for cmmset, use the command
man cmmset.
8.4.1 Cmmget and Cmmset Syntax
The syntax for calling the CLI from the command line is as follows:
cmmget [-h] [-l location] [-t target] -d dataitem
cmmset [-h] [-l location] [-t target] -d dataitem -v value
where cmmget and cmmset are the CLI executables. The parameters can appear in any order. The
CLI is case-insensitive, except for the executable names. Parameters shown in brackets are optional.
Any attribute value that contains a space must be enclosed in quotes. This commonly occurs when
specifying targets. For example, to get the current value of a sensor called Brd Temp on the CMM,
the command would be:
cmmget –l cmm –t “Brd Temp” –d current
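Because the shell, not the CLI, handles the quoting, a quick way to confirm that a space-containing target survives as a single argument is to build the argument list first and count its words. Nothing here invokes the real cmmget; the target name is the one from the example above:

```shell
# Sketch: stage the cmmget invocation as positional parameters; the
# quotes keep "Brd Temp" as one argument rather than two.
target="Brd Temp"
set -- cmmget -l cmm -t "$target" -d current
echo "$#"   # 7 words: the quoted target stayed intact
```

Without the quotes around `$target`, the count would be 8 and the CLI would see a target of just "Brd".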
8.4.2 Help Parameter: -h
If the Help parameter is given, the remaining parameters are ignored and the help text is displayed
to the user.
8.4.3 Location Parameter: -l
The Location parameter specifies the location in the system on which the cmmget or cmmset is
executed. If no location is given, the default location is the CMM.
Use the following cmmget command to list all valid locations in the chassis:
cmmget -d listlocations
The Location keywords are shown in the following table.
Table 21. Location (-l) Keywords

Keyword   Function
cmm       The Chassis Management Module.
bladeN    One of the CPU boards in the chassis. N refers to the chassis
          slot number into which the CPU board is inserted. Please refer
          to the chassis documentation for slot information.
system    The entire platform.
fantrayN  The system fantray, where N is the number of the fantray. For
          example, fantray1 refers to the single fantray in the MPCHC0001
          shelf.
          NOTE: fantray1 may also be referred to as blade15 in a 14-slot
          chassis or blade17 in a 16-slot chassis.
PEM1,     The system Power Entry Modules. PEM1 is in the left slot when
PEM2      looking from the front of the chassis, and PEM2 is in the right
          slot.
          NOTE: PEM1 may also be referred to as blade16 and PEM2 as
          blade17 in a 14-slot chassis; correspondingly, they may be
          referred to as blade18 and blade19 in a 16-slot chassis.
8.4.4 Target Parameter: -t
The Target parameter is the sensor or variable that the cmmget or cmmset acts on. If target is
not given, then dataitem is assumed to be an attribute of location. An example of this is
presence. To obtain a list of valid targets for a device, issue the following command:
cmmget [-l location] -d listtargets
where location is the device for which you want to obtain a list of targets.
The target parameter for plug-in boards and other chassis components is defined by the sensor
name in the Sensor Data Record (SDR) for that device. The various boards, fantrays, and PEMs
provide their own SDRs automatically.
The following table shows the values target can be for the CMM location.
Table 22. CMM Targets

Keyword             Description
Brd Temp            Board Temperature
CPU Temp            CPU Temperature
FilterTrayTemp[1,2] Filter Tray Temperature Sensors
CPU Core V          CPU Core Voltage
VBAT                Battery Voltage
VTT DDR             CMM Memory Voltage
+2.5V               +2.5V voltage sensor
+3.3V               +3.3V voltage sensor
+5V                 +5V voltage sensor
+12V                +12V voltage sensor
CDM [1,2]           Chassis Data Modules 1 and 2
Air Filter          Air Filter
Filter Tray         Filter Tray FRU
Filter Run Time     Filter Run Time
BIST                Built-In Self Test sensor
FRU Hot Swap        FRU Hot Swap sensor
Filter Tray HS      Filter Tray Hot Swap sensor
IPMB-0 Snsr [1-16]  IPMB-0 sensors
FRU                 FRU file for the CMM
all_leds            Target for configuring all user-definable LEDs on the
                    CMM front panel
hsled               Hot swap LED on the CMM front panel
userled[1-4]        Corresponds to userled A-D on the CMM front panel
feedN               Corresponds to a power feed (i.e., feed1, feed2). Use
                    the feedcount dataitem to determine the number of
                    power feeds for the component.
PmsGlobal           Target for PMS global data
PmsProcN            Target for each monitored process, where N is the
                    process number
Datasync Status     Datasync Status sensor
CMM Status          CMM Status sensor
PmsPieN             Process monitoring process integrity sensors
None                Same as not entering a target
The dataitem is the parameter, identified by target and/or location, that the user is getting or setting.
A dataitem must be given for every CLI command.
8.4.5.1 Location Dataitem Lists
Table 23 through Table 29 list the valid dataitems for each location when no target is specified.
Table 23. Dataitem Keywords for All Locations

listdataitems
    Description:      Used to find out what dataitems are available on a
                      target or location.
    Get/Set:          Get
    CLI Get Output:   Listing of all valid dataitems that can be issued
                      for the specified location or target
    Valid Set Values: N/A

health
    Description:      Retrieves the health information about a particular
                      location or target.
    Get/Set:          Get
    CLI Get Output:   "Location/Target has no/minor/major/critical
                      problems"
    Valid Set Values: N/A

Table 24. Dataitem Keywords for All Locations Except System
listgetdataitems
    Description:      Lists all available dataitems that can be retrieved
                      with cmmget.
    Get/Set:          Get
    CLI Get Output:   Listing of all valid get dataitems that can be
                      issued for the specified location or target
    Valid Set Values: N/A

listsetdataitems
    Description:      Lists all available dataitems that can be set with
                      cmmset.
    Get/Set:          Get
    CLI Get Output:   Listing of all valid set dataitems that can be
                      issued for the specified location or target
    Valid Set Values: N/A

healthevents
    Description:      Retrieves events that contribute to the health of
                      the location or target. This is a list of events
                      currently active on the location or target. Health
                      event strings are documented in Section 11, "Health
                      Events" on page 104.
    Get/Set:          Get
    CLI Get Output:   List of currently active events. E.g.
                      "Major Event : +12V_B Lower critical going low
                      asserted
                      Major Event : +12V_A Lower critical going low
                      asserted"
    Valid Set Values: N/A

listtargets
    Description:      Used to find what sensors or targets are available
                      on the location. This is the list of sensors defined
                      by the SDR for that particular location.
    Get/Set:          Get
    CLI Get Output:   Listing of all the targets that are available on
                      the location
    Valid Set Values: N/A
Table 25. Dataitem Keywords for All Locations Except Chassis and System (Sheet 1 of 4)

deviceid
    Description:      Retrieves the device's SDR support, hardware
                      revision, firmware/software revision, and sensor and
                      event interface command specification revision
                      information. Implements the Get Device ID command.
                      See IPMI 1.5 Specification Section 17.1.
    Get/Set:          Get
    CLI Get Output:   "... major.minor>
                      IPMI Version = <IPMI version>
                      Chassis Support = <Additional chassis device support>
                      Bridge Support = <Additional bridge support>
                      IPMB Event Generator Support = <Additional IPMB
                      Event Generator support>
                      IPMB Event Receiver Support = <Additional IPMB
                      Event Receiver support>
                      FRU Inventory Support = <Additional FRU inventory
                      device support>
                      SEL Support = <Additional SEL device support>
                      SDR Repository Support = <Additional SDR Repository
                      device support>
                      Sensor Support = <Additional sensor device support>
                      Manufacturer ID = <Manufacturer ID>
                      Product ID = <Product ID>
                      Aux Firmware Revision = <Auxiliary firmware revision
                      information>"
    Valid Set Values: N/A

fruactivation
    Description:      Set the activation state to either activate or
                      deactivate the FRU. Deactivate is the same as a
                      Graceful Shutdown.
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: 1 = activate FRU
                      0 = deactivate FRU

fruactivationpolicy
    Description:      Get or Set the FRU activation policy. A Get returns
                      whether the "Locked Bit" is set. For example, if the
                      blade 11 activation locked bit is set, and if in M1,
                      then blade 11 cannot transition to M2 until unlocked.
                      If the blade 11 activation locked bit is not set,
                      then blade 11 can transition from M1 to M2.
    Get/Set:          Both
    CLI Get Output:   "<location> activation locked bit is set. If in M1,
                      <location> cannot transition to M2 until unlocked"
                      OR
                      "<location> activation locked bit is not set.
                      <location> can transition from M1 to M2"
    Valid Set Values: 1 = set locked bit
                      0 = clear locked bit
Table 25. Dataitem Keywords for All Locations Except Chassis and System (Sheet 2 of 4)

frucontrol
    Description:      Set the FRU payload to do things like Cold Reset,
                      Warm Reset, etc. The CMM location only supports 2
                      (graceful reboot) and will only work on the standby
                      CMM. Using frucontrol on an active or single CMM
                      will attempt a failover before executing the
                      command. If failover is unsuccessful, frucontrol
                      will not execute and will return an error.
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: 2 = graceful reboot (the only value supported for
                      the CMM location)

hotswapstate
    Description:      Retrieves the FRU's current M state (0-7).
    Get/Set:          Get
    CLI Get Output:   "<location> Hot Swap state is M[x]"
    Valid Set Values: N/A

fruextractionnotify
    Description:      Used to notify the Shelf Manager that a FRU has been
                      extracted from the shelf. Example: "cmmset -l
                      <location> -d fruextractionnotify -v 1"
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: 1 = Extract FRU

ledproperties
    Description:      Find out the number and type of LEDs the FRU
                      supports and which LEDs it can control. Implements
                      the Get FRU LED Properties command. See PICMG 3.0
                      Section 3.2.5.6.
    Get/Set:          Get
    CLI Get Output:   Information pertaining to the number and control of
                      the LEDs:
                      "<location> has control of <main_leds>
                      <location> supports <number_user_leds> user leds"
                      where <location> is the -l parameter (can be a
                      sub-FRU), <main_leds> is a comma-separated list of
                      <led> items, <led> is hsled, led1, led2, led3, and
                      <number_user_leds> is the decimal number of user
                      LEDs supported by the FRU
    Valid Set Values: N/A

picmgproperties
    Description:      Query the maximum FRU Device ID supported by the
                      IPMI controller. Implements the Get PICMG Properties
                      command. See PICMG 3.0 Table 3-9.
    Get/Set:          Get
    CLI Get Output:   "PICMG Properties: <interpreted string without
                      label>
                      PICMG Properties ID = <PICMG ID>
                      PICMG Extension Version = <PICMG extension
                      version=major.minor>
                      Max FRU Device ID = <Max FRU device ID>
                      FRU Device ID = <FRU device ID for IPMI controller>"
    Valid Set Values: N/A

powerlevels
    Description:      Returns the power levels available for a FRU and
                      the number of watts drawn by each.
    Get/Set:          Get
    CLI Get Output:   "ATCA FRU Power Levels:
                      Power Level 1 = A watts
                      ...
                      Power Level n = B watts"
    Valid Set Values: N/A
Table 25. Dataitem Keywords for All Locations Except Chassis and System (Sheet 4 of 4)

busedekeys
    Description:      Get a list of Bused EKeys and who owns them.
    Get/Set:          Get
    CLI Get Output:   "Metallic Test Bus Pair #1:
                      Token Owned: Yes/No
                      Owner's IPMBAddress: IPMBAddress
                      Metallic Test Bus Pair #2:
                      Token Owned: Yes/No
                      Owner's IPMBAddress: IPMBAddress
                      Sync Clock Group #1:
                      Token Owned: Yes/No
                      Owner's IPMBAddress: IPMBAddress
                      Sync Clock Group #2:
                      Token Owned: Yes/No
                      Owner's IPMBAddress: IPMBAddress
                      Sync Clock Group #3:
                      Token Owned: Yes/No
                      Owner's IPMBAddress: IPMBAddress"
                      where "Owner's IPMBAddress" is displayed only when
                      "Token Owned" is "Yes"
    Valid Set Values: N/A

totalfrus
    Description:      Used to query the total number of FRUs in a
                      particular location. Once the number of FRUs for the
                      location is known, the FRU can be specified with the
                      format "-l location:fru#". Not specifying the
                      ":fru#" part directs the command to FRU ID 0.
    Get/Set:          Get
    CLI Get Output:   integer number
    Valid Set Values: N/A

frudeactivationpolicy
    Description:      Get/Set the deactivation policy of the FRU. In
                      PICMG 3.0 ECN 1 this refers to the deactivation
                      locked bit. The FRU can be specified with the format
                      "-l location:fru#". Not specifying the ":fru#"
                      directs the command to FRU 0.
    Get/Set:          Both
    CLI Get Output:   1 - Locked bit is set
                      0 - Locked bit is not set
    Valid Set Values: 1 = set locked bit
                      0 = clear locked bit

ipmicommand
    Get/Set:          Set
    CLI Get Output:   Command Response string on success, or error code
                      on failure.

rawsel
    Description:      Used to list the SEL in raw format.
    Get/Set:          Get
    CLI Get Output:   Listing of the raw-format SEL log of the location.
                      The listing is of the format: <Entry1>\n\n<Entry2>…
                      where <EntryM> is of the format: <Timestamp in Linux
                      date format>\n\t<SensorName>\t<EventDescription>
    Valid Set Values: N/A
Table 26. Dataitem Keywords for the Chassis Location

fanspeed
    Description:      Used to get or set the fan speed of all fans in the
                      chassis. The value is a percent of the maximum fan
                      speed. See Section 16, "Fan Control and Monitoring"
                      on page 132 for more information.
    Get/Set:          Both
    CLI Get Output:   The percentage of the max speed, "Emergency Shut
                      Down", or "Local Control". For example, 80 for 80%
                      of the max speed.
    Valid Set Values: A numerical value between 0 and 100 (i.e., "70"),
                      "localcontrol", or "emergencyshutdown".
                      (localcontrol is not supported on the MPCHC0001
                      chassis fan tray.)

location
    Description:      Used to get or set the Location field in the chassis
                      FRU, which is sent out as a part of SNMP and UDP
                      alerts. Only used with the chassis location.
    Get/Set:          Both
    CLI Get Output:   "Shelf Address: <address>"
                      where <address> is a space-separated list of
                      two-digit hex numbers if the address' type/len byte
                      is 0, or a decoded string otherwise
    Valid Set Values: Location string less than 16 characters in length.
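The fanspeed set values above can be sanity-checked client-side before issuing a cmmset. This POSIX-shell helper is hypothetical (not part of the CMM software) and accepts 0-100 or the two keywords from the table:

```shell
# Hypothetical helper: validate a value for "cmmset ... -d fanspeed -v".
valid_fanspeed() {
  case "$1" in
    localcontrol|emergencyshutdown) return 0 ;;   # keyword values
    ''|*[!0-9]*) return 1 ;;                      # empty or non-numeric
    *) [ "$1" -ge 0 ] && [ "$1" -le 100 ] ;;      # percentage range
  esac
}
valid_fanspeed 80 && echo accepted
valid_fanspeed 120 || echo rejected
```

Rejecting bad input locally avoids round-tripping an invalid value to the CLI.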
Table 27. Dataitem Keywords for Cmm Location (Sheet 2 of 7)

alarmcutoff
    Description:      Retrieve or set the state of the Telco Alarm cutoff.
                      When enabled, it silences the Telco alarm for active
                      events and blinks the event LEDs on the CMM. This
                      dataitem is only valid with the cmm location and is
                      used to set the alarm cutoff or get its value.
    Get/Set:          Both
    CLI Get Output:   "Telco Alarm Cutoff is <enabled/disabled>."

alarmtimeout
    Description:      Retrieve or set the timeout value in minutes for the
                      Telco Alarm cutoff. This is the amount of time
                      before the alarm cutoff automatically becomes unset
                      if the user does not unset it. This dataitem is only
                      valid with the cmm location and is used to set the
                      alarm timeout or get its value.
    Get/Set:          Both
    CLI Get Output:   "Timeout is <timeoutvalue> minutes."

criticalled, majorled, minorled
    Description:      Used only with the CMM location to turn the
                      critical, major, and minor LEDs on or off. When used
                      with cmmset, a -v value of 1 turns the LED on, while
                      a 0 turns it off.
    Get/Set:          Both
    CLI Get Output:   "1" if the LED is On
                      "0" if the LED is Off
    Valid Set Values: 1 = LED on
                      0 = LED off

Ethernet
    Description:      Included for backward compatibility only. The
                      mapping of the command to the existing dataitem is:
                      Ethernet = EthernetA
    Get/Set:          Both
    CLI Get Output:   "front" or "rear" or "backplane". For example,
                      bash-2.04# cmmget -d ethernet
                      cmm1ethernetA: front
                      cmm2ethernetA: front

EthernetA, EthernetB
    Description:      Used only with the CMM location to change the
                      eth0/eth1 direction to either the front panel, the
                      rear panel IO card, or the backplane. The mapping of
                      the commands to existing dataitems is:
                      EthernetA = cmm1EthernetA + cmm2EthernetA
                      EthernetB = cmm1EthernetB + cmm2EthernetB
    Get/Set:          Both
    CLI Get Output:   "Front" or "Rear" or "Backplane". For example,
                      bash-2.04# cmmget -d ethernetA
                      cmm1ethernetA: front
                      cmm2ethernetA: front
                      bash-2.04# cmmget -d ethernetB
                      cmm1ethernetB: front
                      cmm2ethernetB: front

Table 27. Dataitem Keywords for Cmm Location (Sheet 3 of 7)

cmm1ethernetA, cmm1ethernetB, cmm2ethernetA, cmm2ethernetB
    Description:      Used only with the CMM location to change the
                      eth0/eth1 direction to either the front panel, the
                      rear panel IO card, or the backplane on CMM1 and/or
                      CMM2.
    Get/Set:          Both
    CLI Get Output:   "Front" or "Rear" or "Backplane". For example,
                      bash-2.04# cmmget -d cmm1ethernetA
                      cmm1ethernetA: front
                      bash-2.04# cmmget -d cmm1ethernetB
                      cmm1ethernetB: front
                      bash-2.04# cmmget -d cmm2ethernetA
                      cmm2ethernetA: front
                      bash-2.04# cmmget -d cmm2ethernetB
                      cmm2ethernetB: front
    Valid Set Values: "Front", "Rear", or "Backplane"

version
    Description:      The version of the CMM software.
    Get/Set:          Get
    CLI Get Output:   "Version: [Generation].[SRA].[Patch].[Build]"
                      where Generation is the firmware generation, SRA is
                      the release in that generation, Patch is the patch
                      number, and Build is the build number. E.g.
                      Version:5.1.0.11
    Valid Set Values: N/A

…
    Description:      Used to update the CMM firmware on the CMM. In a
                      redundant system, updates should only be done on one
                      CMM at a time in order to maintain chassis
                      management. Refer to Section 23, "Updating CMM
                      Software" on page 204 for additional information.
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: "<cmm image location>
                      ftp:<hostname or IP address>:username:password"

…
    Description:      Returns the list of CMMs in the shelf and their
                      status.
    Get/Set:          Get
    CLI Get Output:   CMM 1: Present (active) *
                      CMM 2: Not Present (standby)
                      * = The CMM you are currently logged into.
    Valid Set Values: N/A

…
    Description:      Used with cmmset from the active CMM to force a
                      failover to the standby. This will only complete
                      successfully if the standby CMM is in a state where
                      it can handle a failover.
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: "1" = Failover to a standby CMM with an equal or
                      newer firmware version.
                      "any" = Failover to the standby CMM regardless of
                      firmware version.

…
    Description:      Enable/Disable the RMCP interface.
    Get/Set:          Both
    CLI Get Output:   "1" - RMCP Enabled
                      "0" - RMCP Disabled
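The Version string format above splits cleanly on its dots. A small POSIX-shell sketch, using the sample string from the table rather than live CLI output:

```shell
# Sketch: split "Version:5.1.0.11" into Generation, SRA, Patch, Build.
v="Version:5.1.0.11"            # sample output of the version dataitem
v=${v#Version:}                 # drop the "Version:" label
IFS=. ; set -- $v ; unset IFS   # split the remainder on the dots
echo "generation=$1 sra=$2 patch=$3 build=$4"
```

This is handy when scripting upgrade checks, since the Generation and SRA fields compare numerically.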
Table 27. Dataitem Keywords for Cmm Location (Sheet 4 of 7)

snmpenable
    Description:      Used to set or query the SNMP trap enabled status.
    Get/Set:          Both
    CLI Get Output:   "SNMP traps are <enabled/disabled>."
    Valid Set Values: 1 = Enable; 0 = Disable; "enable"; "disable"

snmptrapaddress[1-5]
    Description:      Get or Set the machine's IP address that will
                      receive SNMP traps from a location. Up to five
                      addresses can be set. The default is 0.0.0.0 for all
                      five. Example:
                      cmmset -l cmm -d snmptrapaddress3 -v 10.10.241.105
    Get/Set:          Both
    CLI Get Output:   "SNMP trap address: IpAddress"
                      where IpAddress is of the format A.B.C.D
    Valid Set Values: <IpAddress>, a valid IP address in the form A.B.C.D

snmptrapcommunity
    Description:      Get or Set the SNMP trap community name. Example:
                      cmmget -l cmm -d snmptrapcommunity
                      Returns: SNMP trap community: publiccmm
    Get/Set:          Both
    CLI Get Output:   "SNMP trap community: communityValue"
    Valid Set Values: <CommunityName>, any valid SNMP community name of
                      64 characters or less. Ex: publiccmm

snmptrapport
    Description:      Get or Set the TCP/IP port that the SNMP trap will
                      be sent to. The default is 162.
    Get/Set:          Both
    CLI Get Output:   "SNMP trap port: portNumber"
    Valid Set Values: Valid port number, 0-65535

snmptrapversion
    Description:      Retrieves or sets the SNMP trap version, either v1
                      or v3.
    Get/Set:          Both
    CLI Get Output:   "SNMP trap version: v1/v3"
    Valid Set Values: "v1" or "v3"

airfilterruntimelimit
    Description:      Returns the uppercritical limit. Note: it uses the
                      sensor to display the runtime value in days since
                      the last reset. To retrieve the uppernoncritical
                      limit, use the command:
                      cmmget -t "filter run time" -d uppernoncritical
                      (or -d thresholdsall)
    Get/Set:          Both
    CLI Get Output:   <uppercritical limit> Days
    Valid Set Values: 1. Disable eventing on air filter run time: -v 0
                      2. Enable eventing on the air filter and set the
                         uppercritical limit to (xxx) days; this also sets
                         the upper non-critical value to 90% of the
                         uppercritical: -v xxx
                      3. Enable eventing on the air filter and set the
                         uppercritical limit to (xxx) days and the upper
                         non-critical to (yyy) days

resetairfilterruntime
    Description:      Resets the air filter runtime to 0. The set is
                      supported to allow the user to set the run time to
                      zero when the filter is replaced.
    Get/Set:          Set
    CLI Get Output:   N/A
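Per option 2 above, setting only the uppercritical limit derives the upper non-critical threshold as 90% of it. The arithmetic, with an assumed example value of 180 days:

```shell
# Sketch: derived upper non-critical threshold when the
# airfilterruntimelimit is set with a single value (e.g. -v 180).
uppercritical=180                                # days, example value
uppernoncritical=$(( uppercritical * 90 / 100 )) # 90% of uppercritical
echo "$uppernoncritical days"
```

Integer shell arithmetic truncates, so the derived value rounds down for limits not divisible by 10.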
Table 27. Dataitem Keywords for Cmm Location (Sheet 5 of 7)

syncuserledstate
    Description:      Gets/Sets whether the LED state is synced between
                      the active and standby CMM.
    Get/Set:          Both
    CLI Get Output:   "Yes" or "No"
    Valid Set Values: "Yes" or "No"

powersequence
    Description:      Used to get/set the power sequence order, Power
                      Sequencing Delay, and ShelfManagerControlledActivation
                      in the CDM.
                      Note: The power sequencing delay is the time, in
                      tenths of a second, to delay before powering up any
                      other FRU after powering this FRU. The value of the
                      power sequencing delay is between 0 and 63. Shelf
                      Manager Controlled Activation determines whether the
                      Shelf Manager activates the FRU residing at this
                      location when it reaches M2.
    Get/Set:          Both
    CLI Get Output:   INI format, displayed on the console.

loginmessage
    Description:      Used to customize the login screen message by
                      allowing the user to add the OEM name.
    Get/Set:          Both

cmdlineprompt
    Description:      Used to customize the bash prompt by allowing the
                      user to add the OEM name.
    Get/Set:          Both

FaultLEDColor
    Description:      Get/Set the color of the fault/health LED on the
                      CMM-fronted FRUs (Filter Tray, CDM) to be used when
                      an error is reported. Does not affect the CMM Health
                      LED.
    Get/Set:          Both
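The power-sequencing delay field above is stored in tenths of a second with a maximum of 63 (6.3 seconds). A hypothetical helper that converts a delay given in whole seconds and rejects out-of-range values:

```shell
# Hypothetical helper: convert a delay in whole seconds to the CDM's
# tenths-of-a-second field, enforcing the 0-63 range from the table.
delay_tenths() {
  t=$(( $1 * 10 ))
  if [ "$t" -ge 0 ] && [ "$t" -le 63 ]; then
    echo "$t"
  else
    echo "delay out of range" >&2
    return 1
  fi
}
delay_tenths 5    # 5 seconds -> 50 tenths
```

Anything above 6 seconds exceeds the field's range and is refused rather than silently clamped.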
Table 27. Dataitem Keywords for Cmm Location (Sheet 6 of 7)

AdminState
    Description:      Used to set or query the administrative state of the
                      PMS as a whole or of an individual monitored
                      process. A target of "PmsGlobal" will get/set the
                      state of the PMS as a whole. A target of "PmsProc[#]"
                      will get/set the unique state of an individual
                      process, where # is the unique process number for
                      the process. A target of "PmsPie[#]" will get/set
                      the unique state of a PIE, where # is the unique PIE
                      number. AdminState is CMM-specific and is not synced
                      between CMMs. It allows individual control of each
                      CMM's adminstate and can be set on either the active
                      or standby CMM.
    Get/Set:          Both
    CLI Get Output:   "1:Unlocked" or "2:Locked"
    Valid Set Values: 1 = Unlocked
                      2 = Locked

RecoveryAction
    Description:      Used to set or query the recovery action of a PMS
                      monitored process. This is only valid for a target
                      of "PmsProc[#]", where # is the unique number for
                      the process.
    Get/Set:          Both
    CLI Get Output:   "1:No Action", "2:Process Restart", "3:Failover and
                      Restart", or "4:Failover and Reboot"
    Valid Set Values: 1 = No Action
                      2 = Process Restart
                      3 = Failover and Restart
                      4 = Failover and Reboot

EscalationAction
    Description:      Used to set or query the process restart escalation
                      action. This is only valid for a target of
                      "PmsProc[#]", where # is the unique number for the
                      process.
    Get/Set:          Both
    CLI Get Output:   "1:No Action", "2:Failover and Reboot"

ProcessName
    Description:      Used to query the process name and associated
                      command line arguments for a monitored process. A
                      target of "PmsProc[#]" will retrieve the name of an
                      individual process, where # is the unique number for
                      the process. "PmsPie[#]" will retrieve the path and
                      command line arguments of the PIE to be executed
                      periodically.
    Get/Set:          Get
    CLI Get Output:   "<Process_Name> <Command_Line_Arguments>"
    Valid Set Values: N/A

OpState
    Description:      Used to query the operational state of a monitored
                      process. An operational state of disabled indicates
                      that the process has failed and cannot be recovered.
                      This is valid for targets of "PmsProc[#]" and
                      "PmsGlobal", where # is the unique number for the
                      process and PmsGlobal refers to the OpState for all
                      of PMS. This is also valid for a target of
                      PmsPie[#].
    Get/Set:          Get
    CLI Get Output:   "1:Enabled", "2:Disabled"
    Valid Set Values: N/A
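The numeric prefixes in the PMS get outputs above ("1:Unlocked", "2:Locked") can be decoded mechanically when scripting against the CLI. A small sketch using a sample string, not live output:

```shell
# Sketch: map the numeric code in an AdminState reading to its label.
decode_adminstate() {
  case "${1%%:*}" in          # keep only the part before the colon
    1) echo Unlocked ;;
    2) echo Locked ;;
    *) echo unknown ;;
  esac
}
decode_adminstate "2:Locked"
```

Decoding on the numeric prefix keeps a script working even if the label text after the colon ever changes case.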
minorlevel
    Description:      Used to set or query the minorlevel for the fantray.
    Get/Set:          Both
    Valid Set Values: Any value between the normallevel and the majorlevel
                      of the fantray.

normallevel
    Description:      Used to set or query the normallevel for the
                      fantray.
    Get/Set:          Both
    Valid Set Values: Any value between the minimumsetting and the
                      minorlevel of the fantray.

…
    Description:      Used to set or query the control mode of the
                      fantray.
    Get/Set:          Both
    Valid Set Values: EmergencyShutdown, fantray, CMM

…
    Description:      Used to set or query the defaultcontrol mode of the
                      fantray.
    Get/Set:          Both
    Valid Set Values: fantray, CMM

…
    Description:      Used to restore the cooling table defaults of the
                      fan tray to the vendor defaults or code defaults.
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: true

minimumsetting
    Description:      Used to query the minimum setting of the fantray,
                      returned via the get fantray properties IPMI
                      command.
    Get/Set:          Get
    Valid Set Values: N/A

…
    Description:      Used to query the maximum setting of the fantray,
                      returned via the get fantray properties IPMI
                      command.
    Get/Set:          Get
    Valid Set Values: N/A

…
    Description:      Used to query the recommended setting of the
                      fantray, returned via the get fantray properties
                      IPMI command.
    Get/Set:          Get
    Valid Set Values: N/A

…
    Description:      Used to query the current cooling level of the
                      fantray.
    Get/Set:          Get
    Valid Set Values: N/A
8.4.5.2 Target Dataitem Lists
When a target is specified, there is usually a slightly different set of dataitems specifically for that
target. Refer to Section 8.4.4, “Target Parameter: -t” on page 78 for more information on the target
parameter. Table 30 lists the possible dataitems used with various targets.
Table 30. Dataitem Keywords Used with the Target Parameter (Sheet 1 of 4)

listdataitems
    Description:      Lists the available dataitems for that target.
    Get/Set:          Get
    CLI Get Output:   Listing of all valid dataitems that can be issued
                      for the specified location or target
    Valid Set Values: N/A

health
    Description:      Returns the health of the target and whether any
                      events exist. The returned value will be one of OK,
                      minor, major, or critical.
    Get/Set:          Get
    CLI Get Output:   "Location/Target has no/minor/major/critical
                      problems"
    Valid Set Values: N/A

healthevents
    Description:      Returns the specific health events that are
                      occurring on the target, if any exist.
    Get/Set:          Get
    CLI Get Output:   List of currently active events. E.g.
                      "Major Event : +12V_B Lower critical going low
                      asserted
                      Major Event : +12V_A Lower critical going low
                      asserted"
    Valid Set Values: N/A

current
    Description:      The current value of a sensor.
    Get/Set:          Get
    CLI Get Output:   "The current value is currentValue [Units]"
    Valid Set Values: N/A

thresholdsall
    Description:      All thresholds of a sensor. This includes lower
                      non-recoverable, lower critical, lower non-critical,
                      upper non-critical, upper critical, and upper
                      non-recoverable.
    Get/Set:          Get
    CLI Get Output:   "Upper Non-recoverable: ThresholdValue [Units]
                      Upper Critical: ThresholdValue [Units]
                      Upper Non-critical: ThresholdValue [Units]
                      Lower Non-critical: ThresholdValue [Units]
                      Lower Critical: ThresholdValue [Units]
                      Lower Non-recoverable: ThresholdValue [Units]"
                      If a certain threshold is not supported, the
                      ThresholdValue will display "Not Supported"
    Valid Set Values: N/A
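Given a saved thresholdsall listing in the line-per-threshold shape shown above, a single threshold can be pulled out with sed. The sample text here is fabricated for illustration, not live CLI output:

```shell
# Sketch: extract the Upper Critical threshold from a saved
# thresholdsall listing (sample text, not live CLI output).
out='Upper Non-critical: 45 [Degrees C]
Upper Critical: 55 [Degrees C]'
printf '%s\n' "$out" | sed -n 's/^Upper Critical: //p'
```

Anchoring on `^Upper Critical: ` keeps the match from also hitting the "Upper Non-critical" line.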
Table 30. Dataitem Keywords Used with the Target Parameter (Sheet 2 of 4)

minoraction
    Description:      Used to configure user-defined actions when events
                      occur. This dataitem is used with a target (-t)
                      parameter specifying a sensor and a value (-v)
                      parameter. When an event happens for that particular
                      sensor, the script defined in the -v parameter will
                      be executed. The script to be executed must be
                      located in the /home/scripts/ directory on the CMM,
                      and the /home/scripts path should be omitted when
                      specifying the script. Example:
                      cmmset -l blade9 -t +5V -d minoraction -v
                      "powerdownblade 9"
                      In this example, /home/scripts/powerdownblade will
                      be executed with a parameter of 9 when the +5V
                      sensor on blade9 generates a minor event.
    Get/Set:          Both
    CLI Get Output:   If set, the full path of the script, e.g.
                      /home/scripts/EventScript. If not set, the output is
                      "" (null).
    Valid Set Values: "<ScriptName> arg1 arg2 ... argN"
                      where ScriptName (not the full path) is the script
                      file name and arg1-argN are the parameters to the
                      script. Use "none" to remove an existing entry.

…
    Description:      Used to trigger a script based on the event code of
                      a health event. Refer to Section 18, "CMM Scripting"
                      on page 164.
    Get/Set:          Set
    CLI Get Output:   N/A
    Valid Set Values: "<event code>:<ScriptName> arg1 arg2 ... argN"
                      where event code is the event code associated with
                      the event to associate with the script, ScriptName
                      (not the full path) is the name of the script file,
                      and arg1-argN are any parameters required by the
                      script. Use "<eventcode>:none" to remove an existing
                      entry.

…
    Description:      Gets a FRU LED's valid color set. This command
                      returns a comma-separated list of supported colors,
                      the default local control color, and the default
                      override color. This command should be issued before
                      a ledstate set command. Implements the Get LED Color
                      Capabilities command. See PICMG 3.0 Table 3-24.
    Get/Set:          Get
    CLI Get Output:   Color properties of the LED:
                      "<ledtarget> supports <colors>
                      Default local control color is <colorList>
                      Default override color is <color>"
                      where <ledtarget> is one of the valid LEDs (hsled,
                      led1, led2, led3, userled1-userled251), <colorList>
                      is a comma-separated list of <color> items, and
                      <color> is one of blue, red, green, amber, orange,
                      white
    Valid Set Values: N/A
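The "<eventcode>:<ScriptName> args" set-value format above splits on the first colon. A POSIX sketch, where the event code and script name are made up for illustration:

```shell
# Sketch: split an event-script binding of the form
# "<eventcode>:<script> arg1 ..." into its two halves.
v='0x1001:powerdownblade 9'    # hypothetical event code and script
code=${v%%:*}                  # part before the first colon
script=${v#*:}                 # script name plus its arguments
echo "code=$code script=$script"
```

Using `%%:*` and `#*:` keeps any further colons inside the argument list attached to the script half.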
Table 30. Dataitem Keywords Used with the Target Parameter (Sheet 3 of 4)

ledstate
    Description:      Gets or Sets a FRU LED's state. The Get returns the
                      LED's mode, one of {localcontrol, override,
                      lamptest}, and a function message. Implements the
                      Get/Set FRU LED State commands. See PICMG 3.0
                      Tables 3-26 (Get) and 3-25 (Set).
                      Set syntax model:
                      cmmset -l <location> -t <LED> -d ledstate
                      -v <function>,<function options>
                      Example:
                      cmmset -l cmm -t "userled1" -d ledstate
                      -v blink,300,700,green
                      This sets the CMM's user1 LED to blinking green
                      with an off duration of 300 ms and an on duration
                      of 700 ms.
    Get/Set:          Both
    CLI Get Output:   "<ledtarget> is in <LEDmode> mode
                      <function message>"
                      where <LEDmode> is one of localcontrol/override/
                      lamptest, and <function message> is one of the
                      following, depending on the LED's current function:
                      If the LED is off:
                        function is off
                      If the LED is on:
                        function is on
                        color is <color>
                      If the LED is blinking:
                        function is blink
                        off time is <offtime> ms
                        on time is <ontime> ms
                        color is <color>
                      If the LED is under lamp test:
                        duration is <duration> ms
                      <color> is one of blue, red, green, amber, orange,
                      white; <offtime> is the time in milliseconds that
                      the LED is in the off cycle of a blink; <ontime> is
                      the time in milliseconds that the LED is in the on
                      cycle of a blink; <duration> is the duration of the
                      lamp test in milliseconds.
    Valid Set Values: Functions: off, on, blink, lamptest, localcontrol
                      Accepted values:
                      <offtime>,<ontime>,<color>,<duration>
                      Refer to Section 12.5, "Setting the State of the
                      User LEDs" on page 125 for more information.
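The Set value for a blink request is a comma-joined function-plus-options string, as in the example above. Composing it programmatically is straightforward:

```shell
# Sketch: compose the -v argument for "cmmset ... -d ledstate" to blink
# a user LED (values taken from the example in the table above).
offtime=300    # ms in the off cycle
ontime=700     # ms in the on cycle
color=green
value="blink,${offtime},${ontime},${color}"
echo "$value"
```

Building the string from named variables makes the off/on ordering explicit, which is easy to get backwards by hand.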
Table 30. Dataitem Keywords Used with the Target Parameter (Sheet 4 of 4)

…
    Get/Set:          Both
    CLI Get Output:   current in Amps with 1 decimal point
    Valid Set Values: current in Amps with 1 decimal point

maxinternalcurrent
    Description:      Get the field in the CDM regarding the max internal
                      available current. Only used with the feedN target.
                      E.g.:
                      cmmget -l cmm -t feed1 -d maxinternalcurrent
                      See Section 7.7, "Power Feed Targets" on page 69.
    Get/Set:          Get
    CLI Get Output:   current in Amps with 1 decimal point
    Valid Set Values: N/A

minexpectedoperatingvoltage
    Description:      Get/Set the field in the CDM regarding the max
                      expected operating voltage. Only used with the feedN
                      target. E.g.:
                      cmmget -l cmm -t feed1 -d minexpectedoperatingvoltage
                      See Section 7.7, "Power Feed Targets" on page 69.
    Get/Set:          Both
    CLI Get Output:   voltage value as a string between -36 and -72 V
    Valid Set Values: voltage value between -36 and -72 V
8.4.6 Value Parameter: -v
The value parameter specifies the new value for a dataitem. This parameter is required for all
cmmset commands and is only used with cmmset commands. Valid value parameters are shown
with their corresponding dataitems in the dataitem tables listed above.
8.4.7 Sample CLI Operations
Sample CLI operations can be found in Appendix A, “Example CLI Commands”.
8.5 Generating a System Status Report
The CLI includes an executable script (cmmdump) that is used to generate a system status report
for communicating system health and configuration information to technical support personnel.
This is useful in helping technical support troubleshoot any issues that may be affecting the
system. Cmmdump outputs system information to the screen by default, or to a file.
When sending the output to a file, the filename should refer to a file in a valid directory (i.e.,
/home/cmmdump.txt). The file can then be retrieved from the CMM using FTP (see
Section 8.3.7, “FTP into the CMM”).
It may become necessary at some point to reset the CMM password to its default of cmmrootpass.
The CMM has one on-board DIP switch, labeled S2-1, to perform this action. Refer to the Intel®
NetStructure™ MPCMM0001 Hardware Technical Product Specification for the location of the
switch. Setting the switch and powering up the CMM will cause the password to reset to its
default. The CMM then needs to be removed and the switch turned off again.
9.1 Resetting the Password in a Dual CMM System
In redundant systems containing dual CMMs (one active, one standby), the password should be
reset on the standby CMM. Once reset to its default, the default password will synchronize to the
active CMM. This avoids the need to perform the reset on both CMMs and a failover.
1. Open the ejector latch on the standby CMM and wait for the blue hot swap LED to illuminate,
indicating the CMM is safe to remove from the system.
2. Remove the standby CMM from the chassis.
3. Set dip switch S2-1 to “on”. The dip switch has a label indicating which way is on.
4. Re-insert the CMM into the system and allow the CMM to fully boot (blue light will go off
when fully booted).
5. An OK health event will occur indicating that the passwords on both CMMs have been reset
and were synched from the standby CMM. A SEL entry will be recorded, and a trap will be
sent out.
6. Once at the login prompt, the password should now be reset to its default of cmmrootpass.
7. Login to the active CMM to ensure the password was reset.
8. Open the ejector latch on the standby CMM and wait for the blue hot swap LED to illuminate,
indicating the CMM is safe to remove from the system.
9. Remove the standby CMM from the chassis.
10. Set dip switch S2-1 back to its original “off” position.
11. Re-insert the CMM into the system and allow the CMM to fully boot (blue light will go off
when fully booted).
12. Login to the CMM and operate as normal.
13. Use the passwd command on the active CMM to change the CLI password if desired. The new
password will sync to the standby.
For nonredundant systems that contain only a single CMM, resetting the password requires
removing the CMM. This will cause any boards that are power-controlled by the CMM to become
unmanaged. Care should be taken to safely shut down boards in the system prior to removing the
CMM.
1. Safely shut down and power off boards being power controlled by the CMM.
2. Remove the CMM from the system.
3. Set dip switch S2-1 to “on”. The dip switch has a label indicating which way is on.
4. Re-insert the CMM into the system and allow the CMM to fully boot (blue light will go off
when fully booted).
5. Once at the login prompt, the password should now be reset to its default of cmmrootpass.
6. Login to the CMM to ensure the password was reset.
7. Remove the CMM from the system.
8. Set dip switch S2-1 back to its original “off” position.
9. Re-insert the CMM into the system and allow the CMM to fully boot (blue light will go off
when fully booted).
10. Login to the CMM and operate as normal.
11. Use the passwd command on the active CMM to change the CLI password if desired.