Maintenance Alarms for
Avaya Communication Manager 3.0,
Media Gateways and Servers
03-300430
June 2005
Issue 1
Copyright 2005, Avaya Inc.
All Rights Reserved
Notice
Every effort was made to ensure that the information in this document
was complete and accurate at the time of printing. However, information
is subject to change.
Warranty
Avaya Inc. provides a limited warranty on this product. Refer to your
sales agreement to establish the terms of the limited warranty. In
addition, Avaya’s standard warranty language as well as information
regarding support for this product, while under warranty, is available
through the following Web site: http://www.avaya.com/support.
Preventing Toll Fraud
"Toll fraud" is the unauthorized use of your telecommunications system
by an unauthorized party (for example, a person who is not a corporate
employee, agent, subcontractor, or is not working on your company's
behalf). Be aware that there may be a risk of toll fraud associated with
your system and that, if toll fraud occurs, it can result in substantial
additional charges for your telecommunications services.
Avaya Fraud Intervention
If you suspect that you are being victimized by toll fraud and you need
technical assistance or support, in the United States and Canada, call the
Technical Service Center's Toll Fraud Intervention Hotline at
1-800-643-2353.
Disclaimer
Avaya is not responsible for any modifications, additions or deletions to
the original published version of this documentation unless such
modifications, additions or deletions were performed by Avaya. Customer
and/or End User agree to indemnify and hold harmless Avaya, Avaya's
agents, servants and employees against all claims, lawsuits, demands
and judgments arising out of, or in connection with, subsequent
modifications, additions or deletions to this documentation to the extent
made by the Customer or End User.
How to Get Help
For additional support telephone numbers, go to the Avaya support Web
site: http://www.avaya.com/support. If you are:
•Within the United States, click the Escalation Management
link. Then click the appropriate link for the type of support you
need.
•Outside the United States, click the Escalation Management
link. Then click the International Services link that includes
telephone numbers for the international Centers of
Excellence.
Providing Telecommunications Security
Telecommunications security (of voice, data, and/or video
communications) is the prevention of any type of intrusion to (that is,
either unauthorized or malicious access to or use of) your company's
telecommunications equipment by some party.
Your company's "telecommunications equipment" includes both this
Avaya product and any other voice/data/video equipment that could be
accessed via this Avaya product (that is, "networked equipment").
An "outside party" is anyone who is not a corporate employee, agent,
subcontractor, or is not working on your company's behalf. Whereas, a
"malicious party" is anyone (including someone who may be otherwise
authorized) who accesses your telecommunications equipment with
either malicious or mischievous intent.
Such intrusions may be either to/through synchronous (time-multiplexed
and/or circuit-based), or asynchronous (character-, message-, or
packet-based) equipment, or interfaces for reasons of:
•Utilization (of capabilities special to the accessed equipment)
•Theft (such as, of intellectual property, financial assets, or toll
facility access)
•Eavesdropping (privacy invasions to humans)
•Mischief (troubling, but apparently innocuous, tampering)
•Harm (such as harmful tampering, data loss or alteration,
regardless of motive or intent)
Be aware that there may be a risk of unauthorized intrusions associated
with your system and/or its networked equipment. Also realize that, if
such an intrusion should occur, it could result in a variety of losses to your
company (including but not limited to, human/data privacy, intellectual
property, material assets, financial resources, labor costs, and/or legal
costs).
Responsibility for Your Company’s Telecommunications Security
The final responsibility for securing both this system and its networked
equipment rests with you - Avaya’s customer system administrator, your
telecommunications peers, and your managers. Base the fulfillment of
your responsibility on acquired knowledge and resources from a variety
of sources including but not limited to:
•Installation documents
•System administration documents
•Security documents
•Hardware-/software-based security tools
•Shared information between you and your peers
•Telecommunications security experts
To prevent intrusions to your telecommunications equipment, you and
your peers should carefully program and configure:
•Your Avaya-provided telecommunications systems and their
interfaces
•Your Avaya-provided software applications, as well as their
underlying hardware/software platforms and interfaces
•Any other equipment networked to your Avaya products
TCP/IP Facilities
Customers may experience differences in product performance, reliability
and security depending upon network configurations/design and
topologies, even when the product performs as warranted.
Standards Compliance
Avaya Inc. is not responsible for any radio or television interference
caused by unauthorized modifications of this equipment or the
substitution or attachment of connecting cables and equipment other
than those specified by Avaya Inc. The correction of interference caused
by such unauthorized modifications, substitution or attachment will be the
responsibility of the user. Pursuant to Part 15 of the Federal
Communications Commission (FCC) Rules, the user is cautioned that
changes or modifications not expressly approved by Avaya Inc. could
void the user’s authority to operate this equipment.
Product Safety Standards
This product complies with and conforms to the following international
Product Safety standards as applicable:
Safety of Information Technology Equipment, IEC 60950, 3rd Edition, or
IEC 60950-1, 1st Edition, including all relevant national deviations as
listed in Compliance with IEC for Electrical Equipment (IECEE) CB-96A.
Safety of Information Technology Equipment, CAN/CSA-C22.2
No. 60950-00 / UL 60950, 3rd Edition, or CAN/CSA-C22.2 No.
60950-1-03 / UL 60950-1.
Safety Requirements for Customer Equipment, ACA Technical Standard
(TS) 001 - 1997.
One or more of the following Mexican national standards, as applicable:
NOM 001 SCFI 1993, NOM SCFI 016 1993, NOM 019 SCFI 1998.
The equipment described in this document may contain Class 1 LASER
Device(s). These devices comply with the following standards:
•EN 60825-1, Edition 1.1, 1998-01
•21 CFR 1040.10 and CFR 1040.11.
The LASER devices used in Avaya equipment typically operate within the
following parameters:
Luokan 1 Laserlaite
Klass 1 Laser Apparat
Use of controls or adjustments or performance of procedures other than
those specified herein may result in hazardous radiation exposures.
Contact your Avaya representative for more laser product information.
Electromagnetic Compatibility (EMC) Standards
This product complies with and conforms to the following international
EMC standards and all relevant national deviations:
Limits and Methods of Measurement of Radio Interference of Information
Technology Equipment, CISPR 22:1997 and EN55022:1998.
Information Technology Equipment - Immunity Characteristics - Limits
and Methods of Measurement, CISPR 24:1997 and EN55024:1998,
including:
•Electrostatic Discharge (ESD) IEC 61000-4-2
•Radiated Immunity IEC 61000-4-3
•Electrical Fast Transient IEC 61000-4-4
•Lightning Effects IEC 61000-4-5
•Conducted Immunity IEC 61000-4-6
•Mains Frequency Magnetic Field IEC 61000-4-8
•Voltage Dips and Variations IEC 61000-4-11
Power Line Emissions, IEC 61000-3-2: Electromagnetic compatibility
(EMC) - Part 3-2: Limits - Limits for harmonic current emissions.
Power Line Emissions, IEC 61000-3-3: Electromagnetic compatibility
(EMC) - Part 3-3: Limits - Limitation of voltage changes, voltage
fluctuations and flicker in public low-voltage supply systems.
Federal Communications Commission Statement
Part 15:
Note: This equipment has been tested and found to comply with
the limits for a Class A digital device, pursuant to Part 15 of the
FCC Rules. These limits are designed to provide reasonable
protection against harmful interference when the equipment is
operated in a commercial environment. This equipment
generates, uses, and can radiate radio frequency energy and, if
not installed and used in accordance with the instruction
manual, may cause harmful interference to radio
communications. Operation of this equipment in a residential
area is likely to cause harmful interference in which case the
user will be required to correct the interference at his own
expense.
Means of Connection
Connection of this equipment to the telephone network is shown in the
following tables.
For MCC1, SCC1, CMC1, G600, and G650 Media Gateways:
Manufacturer’s Port Identifier   FIC Code    SOC/REN/A.S. Code   Network Jacks
Off premises station             OL13C       9.0F                RJ2GX, RJ21X, RJ11C
DID trunk                        02RV2-T     0.0B                RJ2GX, RJ21X
CO trunk                         02GS2       0.3A                RJ21X
                                 02LS2       0.3A                RJ21X
Tie trunk                        TL31M       9.0F                RJ2GX
Basic Rate Interface             02IS5       6.0F, 6.0Y          RJ49C
1.544 digital interface          04DU9-BN    6.0F                RJ48C, RJ48M
                                 04DU9-IKN   6.0F                RJ48C, RJ48M
                                 04DU9-ISN   6.0F                RJ48C, RJ48M
120A4 channel service unit       04DU9-DN    6.0Y                RJ48C
Part 68: Answer-Supervision Signaling
Allowing this equipment to be operated in a manner that does not provide
proper answer-supervision signaling is in violation of Part 68 rules. This
equipment returns answer-supervision signals to the public switched
network when:
•answered by the called station,
•answered by the attendant, or
•routed to a recorded announcement that can be administered
by the customer premises equipment (CPE) user.
This equipment returns answer-supervision signals on all direct inward
dialed (DID) calls forwarded back to the public switched telephone
network. Permissible exceptions are:
•A call is unanswered.
•A busy tone is received.
•A reorder tone is received.
Avaya attests that this registered equipment is capable of providing users
access to interstate providers of operator services through the use of
access codes. Modification of this equipment by call aggregators to block
access dialing codes is a violation of the Telephone Operator Consumers
Act of 1990.
REN Number
For MCC1, SCC1, CMC1, G600, and G650 Media Gateways:
This equipment complies with Part 68 of the FCC rules. On either the
rear or inside the front cover of this equipment is a label that contains,
among other information, the FCC registration number, and ringer
equivalence number (REN) for this equipment. If requested, this
information must be provided to the telephone company.
For G350 and G700 Media Gateways:
This equipment complies with Part 68 of the FCC rules and the
requirements adopted by the ACTA. On the rear of this equipment is a
label that contains, among other information, a product identifier in the
format US:AAAEQ##TXXXX. The digits represented by ## are the ringer
equivalence number (REN) without a decimal point (for example, 03 is a
REN of 0.3). If requested, this number must be provided to the telephone
company.
For all media gateways:
The REN is used to determine the quantity of devices that may be
connected to the telephone line. Excessive RENs on the telephone line
may result in devices not ringing in response to an incoming call. In most,
but not all areas, the sum of RENs should not exceed 5.0. To be certain
of the number of devices that may be connected to a line, as determined
by the total RENs, contact the local telephone company.
REN is not required for some types of analog or digital facilities.
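The REN arithmetic described above can be sketched in a few lines of Python. This is only an illustration, not an Avaya tool: the function names are hypothetical, the identifiers in the usage note are placeholders, and the code assumes the ## digits always occupy the two positions immediately after the "US:AAAEQ" portion of the product identifier, as the format shown above suggests. The 5.0 limit is the typical figure quoted in the text and does not apply in all areas.

```python
from decimal import Decimal

# Typical limit quoted in the text; it does not apply in every area.
TYPICAL_REN_LIMIT = Decimal("5.0")


def ren_from_product_id(product_id: str) -> Decimal:
    """Return the REN encoded in an identifier laid out as US:AAAEQ##TXXXX.

    Assumption: the two ## digits sit at character positions 8 and 9,
    directly after the "US:" prefix and the five-character AAAEQ field.
    """
    if not product_id.startswith("US:") or len(product_id) < 10:
        raise ValueError(f"unexpected identifier layout: {product_id!r}")
    digits = product_id[8:10]
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"no numeric REN field in: {product_id!r}")
    # The digits are the REN without its decimal point, so 03 means 0.3.
    return Decimal(digits) / 10


def total_ren(product_ids) -> Decimal:
    """Sum the RENs of all devices sharing one telephone line."""
    return sum((ren_from_product_id(p) for p in product_ids), Decimal("0"))


def within_typical_limit(product_ids) -> bool:
    """True when the summed RENs do not exceed the typical 5.0 limit."""
    return total_ren(product_ids) <= TYPICAL_REN_LIMIT
```

For example, `ren_from_product_id("US:AAAEQ03TXXXX")` yields a REN of 0.3, matching the example in the text. Because the limit varies by area, a result from `within_typical_limit` is only a first check; the local telephone company remains the authority on how many devices a line supports.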
For G350 and G700 Media Gateways:
Manufacturer’s Port Identifier   FIC Code    SOC/REN/A.S. Code   Network Jacks
Ground Start CO trunk            02GS2       1.0A                RJ11C
DID trunk                        02RV2-T     AS.0                RJ11C
Loop Start CO trunk              02LS2       0.5A                RJ11C
1.544 digital interface          04DU9-BN    6.0Y                RJ48C
                                 04DU9-DN    6.0Y                RJ48C
                                 04DU9-IKN   6.0Y                RJ48C
                                 04DU9-ISN   6.0Y                RJ48C
Basic Rate Interface             02IS5       6.0F                RJ49C
For all media gateways:
If the terminal equipment (for example, the media server or media
gateway) causes harm to the telephone network, the telephone company
will notify you in advance that temporary discontinuance of service may
be required. But if advance notice is not practical, the telephone
company will notify the customer as soon as possible. Also, you will be
advised of your right to file a complaint with the FCC if you believe it is
necessary.
The telephone company may make changes in its facilities, equipment,
operations or procedures that could affect the operation of the
equipment. If this happens, the telephone company will provide advance
notice in order for you to make necessary modifications to maintain
uninterrupted service.
If trouble is experienced with this equipment, for repair or warranty
information, please contact the Technical Service Center at
1-800-242-2121 or contact your local Avaya representative. If the
equipment is causing harm to the telephone network, the telephone
company may request that you disconnect the equipment until the
problem is resolved.
A plug and jack used to connect this equipment to the premises wiring
and telephone network must comply with the applicable FCC Part 68
rules and requirements adopted by the ACTA. A compliant telephone
cord and modular plug is provided with this product. It is designed to be
connected to a compatible modular jack that is also compliant. It is
recommended that repairs be performed by Avaya certified technicians.
The equipment cannot be used on public coin phone service provided by
the telephone company. Connection to party line service is subject to
state tariffs. Contact the state public utility commission, public service
commission or corporation commission for information.
This equipment, if it uses a telephone receiver, is hearing aid compatible.
Canadian Department of Communications (DOC) Interference
Information
This Class A digital apparatus complies with Canadian ICES-003.
Cet appareil numérique de la classe A est conforme à la norme
NMB-003 du Canada.
This equipment meets the applicable Industry Canada Terminal
Equipment Technical Specifications. This is confirmed by the registration
number. The abbreviation, IC, before the registration number signifies
that registration was performed based on a Declaration of Conformity
indicating that Industry Canada technical specifications were met. It does
not imply that Industry Canada approved the equipment.
Installation and Repairs
Before installing this equipment, users should ensure that it is
permissible to be connected to the facilities of the local
telecommunications company. The equipment must also be installed
using an acceptable method of connection. The customer should be
aware that compliance with the above conditions may not prevent
degradation of service in some situations.
Repairs to certified equipment should be coordinated by a representative
designated by the supplier. Any repairs or alterations made by the user to
this equipment, or equipment malfunctions, may give the
telecommunications company cause to request the user to disconnect
the equipment.
Declarations of Conformity
United States FCC Part 68 Supplier’s Declaration of Conformity (SDoC)
Avaya Inc. in the United States of America hereby certifies that the
equipment described in this document and bearing a TIA TSB-168 label
identification number complies with the FCC’s Rules and Regulations 47
CFR Part 68, and the Administrative Council on Terminal Attachments
(ACTA) adopted technical criteria.
Avaya further asserts that Avaya handset-equipped terminal equipment
described in this document complies with Paragraph 68.316 of the FCC
Rules and Regulations defining Hearing Aid Compatibility and is deemed
compatible with hearing aids.
Copies of SDoCs signed by the Responsible Party in the U. S. can be
obtained by contacting your local sales representative and are available
on the following Web site: http://www.avaya.com/support
All Avaya media servers and media gateways are compliant with FCC
Part 68, but many have been registered with the FCC before the SDoC
process was available. A list of all Avaya registered products may be
found at: http://www.part68.org by conducting a search using "Avaya" as
manufacturer.
European Union Declarations of Conformity
Avaya Inc. declares that the equipment specified in this document
bearing the "CE" (Conformité Européenne) mark conforms to the
European Union Radio and Telecommunications Terminal Equipment
Directive (1999/5/EC), including the Electromagnetic Compatibility
Directive (89/336/EEC) and Low Voltage Directive (73/23/EEC).
Copies of these Declarations of Conformity (DoCs) can be obtained by
contacting your local sales representative and are available on the
following Web site: http://www.avaya.com/support
Japan
This is a Class A product based on the standard of the Voluntary Control
Council for Interference by Information Technology Equipment (VCCI). If
this equipment is used in a domestic environment, radio disturbance may
occur, in which case, the user may be required to take corrective actions.
To order copies of this and other documents:
Call: Avaya Publications Center
Voice 1.800.457.1235 or 1.207.866.6701
FAX 1.800.457.1764 or 1.207.626.7269
Write: Globalware Solutions
200 Ward Hill Avenue
Haverhill, MA 01835 USA
Attention: Avaya Account Management
E-mail: totalware@gwsmail.com
For the most current versions of documentation, go to the Avaya support
Web site: http://www.avaya.com/support.
12 Maintenance Procedures for Avaya Communication Manager 3.0, Media Gateways and Servers
About this book
Overview
This document provides procedures to monitor, test, and maintain an Avaya Media Server or
Gateway system. It covers many of the faults and troubles that can occur and provides simple
procedures to correct them. Simple, traditional troubleshooting methods are sometimes
sufficient to locate and clear faults. The traditional methods include substitution, visual
inspections, continuity checks, and clarification of operating procedures with end users.
Using this documentation, the Avaya technicians and the technicians of their business partners
and customers should be able to follow detailed procedures for:
● Monitoring, testing, and maintaining an Avaya Media Server, Media Gateway, and many
other system components.
● Using troubleshooting methods to clear faults.
● Required replacements, visual inspections, continuity checks, and clarifying operating
procedures with end users.
Document set
Although this maintenance book is published separately, it is part of a set:
This book contains information about the following equipment/platforms:
● Avaya S8700/S8710 Media Servers
● Avaya S8500 Media Servers
● Avaya S8300 Media Servers
● Avaya G700/G650/G600/MCC/SCC Media Gateways
It does not contain information about:
● DEFINITY G3R (see 555-233-117: Maintenance for DEFINITY R Servers or 555-233-142:
Maintenance for Avaya S8700 Media Servers with G600 Media Gateway)
● DEFINITY SI (see 555-233-119: Maintenance for DEFINITY SI Servers or 555-233-143:
Avaya S8700 Media Servers with MCC1/SCC1)
● Avaya S8100 Media Server (see 555-233-123: Maintenance for DEFINITY CSI Servers)
● IBM eServer BladeCenter HS20 Type 8832
● G150/G250/G350 Media Gateways
Structure of book
The following document contains combined Maintenance Alarms information for:
● S8300, S8500, and S8700 media servers
● MCC1, SCC1, and CMC1 media gateways
● G600, G650, and G700 media gateways
The document includes new information developed for Communication Manager Release 2.0,
and preexisting or modified information brought together from Release 1.3 maintenance
documentation.
This document is the first of three reference documents:
● Maintenance Alarms Reference (555-245-102)
● Maintenance Commands Reference (03-300191)
● Maintenance Procedures (03-300192)
The basis for these reference documents was the Release 1.3 S8700 media server with the
MCC1 and SCC1 media gateways maintenance document. To this document were added
Release 1.3 maintenance information for the S8300 media server, the G700, G600 and CMC1
media gateways, as well as new material developed for the S8500 media server and G650
media gateway.
In order to present maintenance information from all these sources side-by-side, when it was
not clear from a chapter or section title, marking conventions were adopted to delineate material
specific to a particular source. The markers act on three levels:
● Chapters or Maintenance Objects (MOs)
● Major and minor sections
● Paragraphs or in-line comments
Chapters or Maintenance Objects (MOs)
At the Chapter or MO level, bold names of the server(s) or gateway(s) that are represented
within the sections to follow are inserted immediately after the Chapter title or MO title. For
example, the heading for the SER-BUS (Serial communication bus) MO looks like:
SER-BUS (Serial communication bus)
G650
The G650 after the title indicates that the material in this MO relates to the G650 media gateway.
Major and minor sections
At the Major and minor sections level, a similar bold name along with a ruled line delineates the
beginning of a section of material specific to the media server or gateway identified. At the
conclusion of the section, another ruled line marks the end of the specific material and a return
to common text. For example, a section of material specific to the S8700 or S8500 media server
looks like:
S8700 | 8710 / S8500
1. If only 1 analog circuit pack in the system has this problem, replace the circuit pack.
2. If only analog circuit packs on a particular carrier have this error, the ringing generator
may not be connected to this carrier.
3. If analog circuit packs on many carriers have this error, it is probably a problem with the
ringing generator.
Such sections can occasionally extend for several pages.
Paragraphs or in-line comments
At the paragraph level and for comments in-line, the specific media server or gateway is
indicated by its bold name, and the parenthetical information follows immediately afterward. For
example, a paragraph insert for the S8700 and S8500 media servers might look like:
1. If the Tone-Clock circuit is a slave clock, then the EI to which it is listening is providing a bad
timing source. Follow the diagnostic procedures specified for TDM-CLK Error Code 2305.
2. S8700 | 8710 / S8500: If no problem can be found with the incoming synchronization signal,
replace the IPSI or Tone-Clock circuit pack. See Replacing the IPSI or Tone-Clock Circuit
Pack on page 2337.
In such cases, it is not necessary to delineate the beginning and end of the material.
An example of an in-line comment might look like:
3. Error Type 1: There is a serial number mismatch between the hardware serial number and
installed license file (S8700 | 8710 / S8500: there is a serial-number mismatch of the reference
IPSI and a subsequent License Error failure. S8300: there is a serial-number mismatch of the
G700 motherboard on which the serial number resides and a subsequent License Error
failure). This error is caused by the:
● S8700 | 8710 / S8500: Reference IPSI not responding
S8300: G700 motherboard not responding
● Expiration of the 10-day timer
The system enters No-License mode.
It is hoped that, by these techniques, material specific to several different sources can be
combined and viewed side-by-side without confusion.
Audience
The information in this book is intended for use by:
Avaya technicians, provisioning specialists, business partners, and customers, specifically:
● Trained Avaya technicians
● A maintenance technician dispatched to a customer site in response to a trouble alarm or
a user trouble report
● A maintenance technician located at a remote maintenance facility
● The customer’s assigned maintenance technician
The technician is expected to have a knowledge of telecommunications fundamentals and
of the particular Avaya Media Server and/or Media Gateway to the extent that the
procedures in this book can be performed, in most cases, without assistance.
This book is not intended to solve all levels of troubles. It is limited to troubles that can be solved
using:
● The Alarm Log
● The Error Log
● Trouble-clearing procedures
● Maintenance tests
● Traditional troubleshooting methods
If the trouble still has not been resolved, it is the maintenance technician’s responsibility to
escalate the problem to a higher level of technical support. Escalation should conform to the
procedures in the Technical and Administration Escalation Plan.
Downloading this book and updates from the Web
You can download the latest version of this book from the Avaya Web site. You must have
access to the Internet, and a copy of Acrobat Reader must be installed on your personal
computer.
Avaya makes every effort to ensure that the information in this book is complete and accurate.
However, information can change after we publish this book. Therefore, the Avaya Web site
might also contain new product information and updates to the information in this book. You can
also download these updates from the Avaya Web site.
Downloading this book
To download the latest version of this book:
1. Access the Avaya web site at http://support.avaya.com.
2. At the top center of the page, click Product Documentation.
The system displays the Welcome to Product Documentation page.
3. In the upper-left corner type the 9-digit book number in the Search Support field, and then
click Go.
The system displays the Product Documentation Search Results page.
4. Scroll down to find the latest issue number, and then click the book title that is to the right of
the latest issue number.
5. On the next page, scroll down and click one of the following options:
● PDF Format to download the book in regular PDF format
● ZIP Format to download the book in zipped PDF format
Safety labels and security alert labels
Observe all caution, warning, and danger statements to help prevent loss of service, equipment
damage, personal injury, and security problems. This book uses the following safety labels and
security alert labels:
CAUTION: A caution statement calls attention to a situation that can result in harm to
software, loss of data, or an interruption in service.
WARNING: A warning statement calls attention to a situation that can result in harm to
hardware or equipment.
DANGER: A danger statement calls attention to a situation that can result in harm to
personnel.
SECURITY ALERT: A security alert calls attention to a situation that can increase the potential for
unauthorized use of a telecommunications system.
Safety precautions
When performing maintenance or translation procedures on the system, users must observe
certain precautions. Observe all caution, warning, and danger admonishments to prevent loss
of service, possible equipment damage, and possible personal injury. In addition, the following
precautions regarding electromagnetic interference (EMI) and static electricity must be
observed:
Electromagnetic interference
This equipment generates, uses, and can radiate radio frequency energy. Electromagnetic
fields radiating from the switch may cause noise in the customer’s equipment. If the equipment
is not installed and used in accordance with the instruction book, radio interference may result.
WARNING: To maintain the EMI integrity of the system, maintenance personnel must ensure
that all cabinet panels, covers, and so forth, are firmly secured before leaving the
customer’s premises.
Related resources
Table 1: Additional document resources on page 19 lists additional documentation that is
available for you, and which has been referenced within this document.
Table 1: Additional document resources
Document                                                                   Number
Avaya Enterprise Survivable Servers (ESS) User Guide                       03-300428
Hardware Guide for Avaya Communication Manager                             555-245-207
Overview for Avaya Communication Manager                                   03-300468
Administrator Guide for Avaya Communication Manager                        03-300509
Installation and Upgrades for the Avaya G700 Media Gateway Controlled by   555-234-100
an Avaya S8300 Media Server or an Avaya S8700 Media Server
Maintenance for Avaya DEFINITY® Server R                                   555-233-117
Avaya P333T User’s Guide                                                   N/A
Avaya S8300 and Avaya S8700 Media Server Library                           555-233-825
EMBEDDED AUDIX System Maintenance                                          585-300-110
DEFINITY AUDIX System Release 3.2.4 Maintenance                            585-300-110
AT&T Network and Data Connectivity                                         555-025-201
Digital PBX Standards                                                      RS4648
User Manual Z3A Asynchronous Data Unit                                     555-401-701
DEFINITY® Communications System Generic 1, Generic 2 and Generic 3         555-230-193
V1 and 2 – Integrated Channel Service Unit (CSU) Module Installation and
Operation
DEFINITY® Communications System Generic 2.2 and Generic 3 V2 DS1/          555-025-107
CEPT1/ISDN-PRI Reference
DEFINITY® Communications System Generic 1 and Generic 3i Wiring            555-204-111
Job Aid: Repacking the S8500 Dual Network Interface                        555-245-760
Job Aid: Replacing the G700 Media Gateway                                  555-245-752
Technical assistance
Avaya provides the following resources for technical assistance.
Within the United States
For help with:
● Feature Administration and system applications, call Avaya Technical Consulting Support
at 1-800-225-7585
● Maintenance and repair, call the Avaya National Customer Care Support Line at
1-800-242-2121
● Toll fraud, call Avaya Toll Fraud Intervention at 1-800-643-2353
International
For all international resources, contact your local Avaya authorized dealer for additional help.
Trademarks
All trademarks identified by the ® or ™ are registered trademarks or trademarks, respectively,
of Avaya Inc. All other trademarks are the property of their respective owners.
Sending us comments
Avaya welcomes your comments about this book. To reach us by:
● Mail, send your comments to:
Avaya Inc.
Product Documentation Group
Room B3-H13
1300 W. 120th Avenue
Westminster, CO 80234 USA
● E-mail, send your comments to:
document@avaya.com
● Fax, send your comments to:
1-303-538-1741
Ensure that you mention the name and number of this book.
How to use this Document
Most maintenance sessions involve analyzing the Alarm and Error Logs to diagnose a trouble
source and replacing a component such as a circuit pack or media module. The information in
Chapter 5: Communication Manager Maintenance-Object Repair Procedures of this reference
will generally suffice to address these needs. Certain complex elements of the system require a
more comprehensive approach. Special procedures for these elements appear in Chapter
4: General troubleshooting of Maintenance Procedures (03-300192).
Note: This document is designed to be read online and in paper format. Because of the
large volume of information, additional cross-references have been added to
make it easier to locate information when using the manual online.
Organization
Chapter 1: Server Alarms, contains information on alarms generated on various platforms,
including the S8300, S8500, and S8700 media server. These alarms cover such categories as
process watchdog, environmental, login, translation monitoring, and power supply alarms.
Alarm identifications, levels, and resolutions are given.
Chapter 2: Denial Events, contains information about denial events that are generated by Avaya
Communication Manager. Denial events are displayed via the Events Report (display events screen) of Avaya Communication Manager.
Chapter 3: LEDs, contains information on the definition and interpretation of LED indicators to
be found on various system components, including servers, gateways, circuit packs, and media
modules.
Chapter 4: G700 Media Gateway Traps, contains information on traps that can occur on media
gateways. Trap identifications, alarm levels, trap descriptions, and recommended resolutions
are given.
Chapter 5: Communication Manager Maintenance-Object Repair Procedures, contains specific
troubleshooting and repair instructions for every component in the system. The maintenance
objects are listed alphabetically by name as they appear in the Alarm and Error Logs. Under
each maintenance object appears a description of the object’s function, tables for interpreting
alarm and error logs, and instructions on how to use tests, commands, and replacements to
resolve associated problems.
Conventions used in this document
Table 2: Typography used in this book on page 24 lists the typographic conventions in this
document.
Table 2: Typography used in this book
To represent: SAT commands
This typeface and syntax are shown as:
● Bold for literals
● Bold italic for variables
● Square brackets [ ] around optional parameters
● “|” between exclusive choices
For example: refresh ip-route [all | location]
To represent: SAT screen input and output
This typeface and syntax are shown as:
● Bold for input
● Constant width for output (screen displays and messages)
For example: Set the Save Translation field to daily. The message Command
successfully completed should appear.
To represent: Linux commands
This typeface and syntax are shown as:
● Constant-width bold for literals
● Constant-width bold italics for variables
● Square brackets [] around optional arguments
● “Or” sign | between exclusive choices
For example: testmodem [-s] | [-t arg]
To represent: Linux output
This typeface and syntax are shown as: Constant width
For example: Linux returns the message almdisplay 4: Unable to
connect to MultiVantage.
To represent: Web interface
This typeface and syntax are shown as:
● Bold for menu selections, tabs, buttons, and field names
● Right arrow > to separate a sequence of menu selections
For example: Select Alarms and Notification, the appropriate
alarm, and then click Clear. Select Diagnostics > View
System Logs, then click Watchdog Logs.
To represent: Keys
This typeface and syntax are shown as: Special font for keyboard
keys and SAT screen clickable buttons
For example: Press Tab. Click Next Page.
Other conventions used in this book:
● Physical dimensions are in English [Foot Pound Second (FPS)] units, followed by metric
[Centimeter Gram Second (CGS)] units in parentheses.
● Wire-gauge measurements are in AWG, followed by the diameter in millimeters in
parentheses.
● Circuit-pack codes (such as TN790B or TN2182B) are shown with the minimum
acceptable alphabetic suffix (like the “B” in the code TN2182B).
Generally, an alphabetic suffix higher than that shown is also acceptable. However, not
every vintage of either the minimum suffix or a higher suffix code is necessarily acceptable.
Useful terms
Table 3: Terminology summary on page 26 summarizes some of the terms used in this book
and relates them to former terminology.
Table 3: Terminology summary
Present Terminology | Former Terminology
Communication Manager | MultiVantage, Avaya Call Processing
S8300 Media Server | ICC, Internal Call Controller
S8700 Media Server (or non-co-resident S8300) | ECC, External Call Controller
MGP, Media Gateway Processor | 860T Processor
Layer 2 Switching Processor | P330 Stack Processor, Cajun Stack Processor, i960 Processor
Chapter 1: Server Alarms
S8700 | 8710 / S8500 / S8300
This chapter provides background information on server alarming. For detailed information on
G700 Media Gateway Traps, refer to Chapter 4: G700 Media Gateway Traps.
Introduction
During normal operations, software or firmware may detect error conditions pertaining to
specific Maintenance Objects (MOs) or other subsystems. The system automatically attempts
either to fix or circumvent these problems. Errors are detected in two ways:
● Firmware on the component during ongoing operations
● A “periodic test” or a “scheduled test” started by software
The technician can run tests on demand that are generally more comprehensive (and
potentially disruptive) than are the "scheduled tests".
When an error is detected, the maintenance software puts the error in the Error Log and
increments the error counter for that error. When an error counter is “active” (greater than 0),
there is a maintenance record for the MO. If a hardware component incurs too many errors, an
alarm is raised.
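The relationship between logged errors, error counters, and alarms can be pictured with a small model. The following Python sketch is illustrative only: the class name, the method names, and the threshold value are assumptions, since this document states only that "too many errors" raise an alarm.

```python
from collections import Counter

# Hypothetical threshold: the text says only "too many errors".
ALARM_THRESHOLD = 3


class ErrorLog:
    """Toy model of an error log with one counter per (MO, error type)."""

    def __init__(self, threshold=ALARM_THRESHOLD):
        self.threshold = threshold
        self.counters = Counter()
        self.alarms = set()

    def record(self, mo, error_type):
        """Log one error and raise an alarm on the MO once the count exceeds the threshold."""
        self.counters[(mo, error_type)] += 1
        if self.counters[(mo, error_type)] > self.threshold:
            self.alarms.add(mo)

    def active_records(self):
        """Return the MOs that have at least one counter greater than 0."""
        return {mo for (mo, _), count in self.counters.items() if count > 0}
```

In this model an MO has a maintenance record as soon as one error is counted, but it is alarmed only after the counter passes the threshold.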
Alarms on the Linux media servers can occur in several areas:
● Media Modules, Media Servers, the Media Gateway Processor, and the Layer 2 Switching
Processor are all capable of detecting internal failures and generating traps and alarms.
● Media gateways, such as the G700, detect faults and alert the Media Server; the Media
Server then raises an alarm, and sends the alarm to an appropriate network management
site.
● Communication Manager alarms reflect health status of network elements such as media
gateways, circuit packs, media modules, and their associated links, ports, and trunks.
● Messaging alarms provide health status of embedded or external messaging systems.
Alarms may be viewed using the following:
● Maintenance Web Interface
Provides alarms information related to Communication Manager, the media server, and
messaging.
Note:
For non-Communication Manager alarms, use the Web Page header "Alarms
and Notification" and "Diagnostics: View System Log". Choose the appropriate
heading and, if necessary, call Avaya support.
● Media Server bash shell
Provides alarms information related to Communication Manager, the media server, and
messaging.
● Media Server SAT CLI
Provides alarms information related to Communication Manager.
● MGP CLI (on the G700 Media Gateway)
Provides alarms and traps information related to the G700 platform and its subsystems.
● Layer 2 Switching Processor CLI (on the G700 Media Gateway)
Provides information related to the media gateway stack.
Information related to Communication Manager, the media server, and messaging alarms can
be displayed using either the Maintenance Web Interface or the media server bash shell;
however, this document (Maintenance Alarms Reference (555-245-102)) provides maintenance
information only for Communication Manager (Chapter 5: Communication Manager
Maintenance-Object Repair Procedures) and media server alarms (Chapter 1: Server Alarms).
For messaging alarms and repair procedures, refer to the appropriate documentation for your
messaging system.
Alarm Classifications
Alarms are classified depending on their effect on system operation:
● MAJOR alarms identify failures that cause a critical degradation of service. These alarms
require immediate attention.
● MINOR alarms identify failures that cause some service degradation but that do not render
a crucial portion of the system inoperable. Minor alarms require attention. However,
typically a minor alarm affects only a few trunks, stations, or a single feature.
● WARNING alarms identify failures that cause no significant degradation of service or
equipment failures external to the switch. These failures are not reported to INADS or to
the attendant console.
● ON-BOARD problems originate in the circuitry on the alarmed Media Module.
● OFF-BOARD problems originate in a process or component that is external to the Media
Module.
Background Terms
Table 4: Alarming Background Terms on page 29 gives a useful explanation of terms.
Table 4: Alarming Background Terms
Term | Explanation
TRAP | A trap is an event notification that is sent to the SNMP trap manager and
received from the Media Gateway Processor, Layer 2 Switching Processor,
or RTCP Monitor (Avaya VisAbility).
ALARM | Some traps are determined to be an alarm. If determined to be an alarm, they
are sent to an appropriate alarm management site, such as INADS.
INADS | Initialization and Administration System, a software tool used by Avaya
services personnel to initialize, administer, and troubleshoot customer
communications systems remotely.
SNMP | Simple Network Management Protocol, the industry standard protocol
governing network management and the monitoring of network devices and
their functions.
RTCP | RTP Control Protocol (Real-Time Transport Control Protocol), defined in IETF RFC 1889.
ISM | Intelligent Site Manager, a VPN gateway on the customer’s LAN that
provides a means for services personnel to access the customer’s LAN in a
secure manner via the Internet.
VPN | Virtual Private Network, a private data network that makes use of the public
telecommunication infrastructure, maintaining privacy through the use of a
tunneling protocol and security procedures.
Alarm-Related LEDs
Table 5: Alarm-Related LEDs on page 30 shows alarm-related LEDs on the faceplate of the
G700 or on an attendant console, and shows how certain LEDs reflect specific alarm situations.
Table 5: Alarm-Related LEDs
LED | Location | Alarm-Related Cause
ALARM LED | Attendant Console | The system alarm causes the attendant console ALARM
LED to light.
ACK LED | Attendant Console | The ACK LED on the attendant console reflects the state
of acknowledgement of the alarm report from INADS.
However, this is only possible for S8700-based Media
Servers.
RED ALM or ALARM LED | LED Panel of G700 Media Gateway | The RED ALM or ALARM LED indicates the "health" of
the G700 by lighting when there are impaired functions
of the Media Gateway Processor, Layer 2 Switching
Processor, or VOIP engine. It lights, for example, when
the power supply voltage is out of bounds, if the G700
cannot locate a Media Server, or when the unit is
overheating. It also indicates when the system is in
Power-up mode, or when a Media Module is resetting.
Alarm Content
Alarms logged by Communication Manager are stored in an alarm log. All alarms include a date
and time stamp that reflects the date and time of the sending device. The alarm contains:
● Device type
● Component type
● Device name
● Current IP address
● Additional information necessary for identification of alarm origination
● Severity level to indicate the priority of the alarm
Alarms originating in a specific media server, such as an S8300, have a prefix that denotes
that server type.
QOS Alarms
An RTCP monitor using the local SNMP agent generates traps to a pre-administered trap
collector. The following QoS alarms are generated:
● The voip-callqos alarm is generated if a single session exceeds configured QOS levels. It
can generate a warning or an SNMP trap. Warnings are used for less severe problems.
They can be accumulated internally within Avaya VoIP Monitoring Manager for use by the
alarms defined below.
● The voip-systemqos alarm is generated if the number of voip-callqos warnings from all
terminals exceeds a configured count over a given period (e.g. 100 alarms over 24 hours).
The alarm causes a SNMP trap to be sent.
● The voip-terminalqos alarm is like the voip-systemqos alarm except it applies to a single
terminal. If any one terminal generates a number of voip-callqos warnings that exceed a
threshold then the alarm is generated.
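The way voip-callqos warnings accumulate into the two higher-level alarms can be sketched as follows. This Python sketch only illustrates the counting logic described above and is not the Avaya VoIP Monitoring Manager implementation; the class name, the parameter names, and the sliding-window treatment of the "given period" are assumptions.

```python
from collections import deque


class QosMonitor:
    """Accumulates voip-callqos warnings and reports threshold alarms."""

    def __init__(self, system_limit, terminal_limit, window_seconds):
        self.system_limit = system_limit      # e.g. 100 warnings
        self.terminal_limit = terminal_limit  # per-terminal threshold
        self.window_seconds = window_seconds  # e.g. 24 * 3600
        self.warnings = deque()               # (timestamp, terminal) pairs

    def add_warning(self, timestamp, terminal):
        """Record one warning and return the alarms it triggers.

        Timestamps are assumed to arrive in non-decreasing order.
        """
        self.warnings.append((timestamp, terminal))
        # Discard warnings that have fallen out of the counting period.
        while self.warnings[0][0] <= timestamp - self.window_seconds:
            self.warnings.popleft()
        alarms = []
        if len(self.warnings) > self.system_limit:
            alarms.append("voip-systemqos")
        per_terminal = sum(1 for _, t in self.warnings if t == terminal)
        if per_terminal > self.terminal_limit:
            alarms.append("voip-terminalqos")
        return alarms
```

For example, with a system limit of 100 over 24 hours, the 101st warning inside the window would return voip-systemqos, which in the real system causes an SNMP trap to be sent.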
Alarm Management
This section describes methods to determine the source of alarms that are generated when an
error occurs. The alarm log is viewable and follows that defined in Maintenance for Avaya DEFINITY® Server R, 555-233-117. Technicians can view alarms via the Web Interface, CLI,
and SAT command-line interface.
SNMP management is a function of the Avaya MultiService Network Manager application. For
additional information, including information on event logs and trap logs, please refer to the
Avaya P333T User’s Guide.
Alarm management follows the S8700 Media Server Alarming Architecture Design; see
Maintenance for the Avaya S8700 Media Server with an Avaya SCC1 Media Gateway or an
Avaya MCC1 Media Gateway, 555-233-143.
Connection Strategies to a Services Organization
A services organization, such as INADS, receives alarms from the Media Server and connects
to the media server for troubleshooting. There are currently two product-connect strategies:
dialup modem access and Virtual Private Network (VPN) access over the Internet.
For dialup modem access:
1. Connect a USB modem, connected to a telephone line, to the USB port on the faceplate of
the media server.
2. Enable the modem from the media server Web Interface. In addition, use the Setup Modem
Interface under the Configure Server pages.
3. With this modem, a client PC uses the Point-to-Point Protocol (PPP) to access the media
server and connect via telnet to a Linux shell.
4. Once logged into the media server, you can telnet out to media gateways, such as the
G700, and other devices on the network.
Note:
Additionally, this modem can be used to allow the media server to call out to the
INADS or other alarm receiving system to report alarms. When performing
remote diagnostic tests, Services personnel should disable alarm call-outs to
INADS to avoid generating unnecessary alarms. Alarm suppression is released
after 30 minutes. If you are remotely logged in through the modem you prevent
alarms from being sent because you are using the modem, but you do not
prevent an alarm noting the absence of alarm reporting service being logged at
the alarm receiving site.
The VPN alternative is achieved by the use of the Intelligent Site Manager (ISM) application.
The ISM is a VPN gateway that resides on the customer’s LAN and provides a means for
services personnel to gain access to the customer’s LAN in a secure manner over the Internet.
Telnet is then used to access the media server and/or media gateways and other IP network
equipment.
Alarms in Linux Media Servers
S8700 | 8710 / S8500 / S8300
A Linux-based media server can be configured so that it serves as the trap collector and
provides external alarm notification.
A process called the Global Maintenance Manager (GMM) runs on the media server and
collects events that are logged to the Linux syslog_d process. These events consist primarily of
failure notification events logged by Communication Manager and INTUITY maintenance
subsystems, or of traps sent by media gateways (G700). For events that require external
notification, one option is to call the Avaya technical service center’s INADS (Initialization and
Administration System). However, other possible options include sending an e-mail to specified
destinations, or sending an SNMP trap to a specified network management address.
The media server has an SNMP trap manager that collects traps from:
● Uploads and downloads to media modules
● VoIP Media Modules
● VoIP engines on G700 motherboards
● G700-associated UPS systems
Media server alarms perform a similar role to Communication Manager alarms in a traditional
telephony context. Media Server alarms:
● Comprise related sets of alarms, known as MOs
● Create an internal record of actual or potential problems
● Notify maintenance personnel of a problem
● Help isolate the problem’s source
● Point to and facilitate local and remote resolution of a problem
Note:
If a user is logged into a server by an analog modem that is also the server’s only
alarm-reporting interface, enter almsuppress to suppress alarm reporting.
Otherwise, the other server logs an occurrence of SME Event ID #1 (see SME
Alarm in Media Server on page 78).
Clearing Media Server Alarms
A media server is an open standards-based CPU in the data-communications context. Unlike a
Communication Manager alarm, which cannot be cleared unless it is also resolved, a server
alarm:
● Can be manually cleared from its log, with the almclear Linux command
● Should not be considered resolved until it is actually repaired
Displaying Media Server Alarms
In following sections, each server alarm is described, and its resolution procedure is provided.
Like traditional Communication Manager MOs, the 3-column table for each server MO shows an
alarm’s:
1. Event ID
2. Severity
3. Definition, probable cause, and troubleshooting procedure
To help isolate a server problem, the 3rd column of these tables begins with quoted text for each
event (unlike traditional Communication Manager MOs). The text consists of the verbose (-v)
output of the almdisplay -v Linux command. For example, “interchange hand off failed” is
the quoted text for Arbiter’s Event ID #3.
If the almdisplay command returns a failure message, such as:
almdisplay: 4: Unable to connect to MultiVantage
enter the man almdisplay Linux command for command-related information.
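When almdisplay output is captured by a script, a failure line of the kind quoted above can be told apart from alarm text. The following Python sketch is a minimal illustration: the function name is hypothetical, and the assumption that every failure message follows the "almdisplay: <code>: <message>" shape is extrapolated from the single example in this section.

```python
import re

# Shape of the one failure line quoted in this section:
#   almdisplay: 4: Unable to connect to MultiVantage
FAILURE_RE = re.compile(r"^almdisplay:\s*(\d+):\s*(.+)$")


def parse_almdisplay_failure(output):
    """Return (code, message) for the first failure line, or None if there is none."""
    for line in output.splitlines():
        match = FAILURE_RE.match(line.strip())
        if match:
            return int(match.group(1)), match.group(2).strip()
    return None
```

A caller could treat a non-None result as a cue to consult man almdisplay before interpreting the rest of the output.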
S8300
Alarming on the S8300 Functioning as a Local Survivable Processor
The S8300 functioning as a Local Survivable Processor (LSP) logs an alarm when it becomes
active. It also logs an alarm for every G700 Media Gateway that registers with it. It does NOT
log alarms when IP phones register with it; rather, it logs a warning.
Communication Manager Hardware Traps
Table 6: Communication Manager Hardware Traps on page 34 illustrates hardware traps that
apply to Communication Manager.
Table 6: Communication Manager Hardware Traps
Trap | Description
Media Server HW trap | Maintenance software analyzes hardware faults and correlates
fault conditions to determine the appropriate action. If the
appropriate action requires attention, a trap of critical severity is
sent.
Media Server HW clear trap | Hardware faults that have created traps send a clear trap upon
clearing.
Media Server with administered MG that’s not registered | If a Media Server has an administered G700 that has not
registered after an appropriate amount of time, an alarm of
major severity is sent to indicate this.
Note:
The Avaya S8300 Media Server on a G700 Media Gateway platform has several
watchdog timers. If any one of them is not verified regularly, a trap of major
severity is sent. The timer associated with the S8300 is the S8300 Software
watchdog, which resets the S8300 processor if its connection is not verified
regularly.
Backup and Restore Traps
The S8300 uses the LAN to back up a copy of its translation data. Table 7: Backup and Restore
Traps on page 35 illustrates the backup and restore traps.
Table 7: Backup and Restore Traps
Trap | Description
Successfully stored backup | A trap of informational severity is sent when backup is successful
(REPLY_ACK). The trap reads “Successful backup of S8300
translation data,” and names the backup location stored in the
string “BACKUP_LOCATION.”
This information also goes to the local maintenance screen, since
it is very possible that a backup is being requested as a result of an
on-site attempt to replace the S8300.
No backup data stored | A trap of major severity is sent as soon as a REPLY_ERROR
message is returned. The trap states “Translation Data backup not
available,” and names the backup location stored in the string
“BACKUP_LOCATION.”
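The mapping from backup reply to trap can be summarized in a few lines. This Python sketch is illustrative only: the function name and the dictionary layout are assumptions, while the reply names, severities, and trap texts are taken from Table 7.

```python
def backup_trap(reply, backup_location):
    """Return the trap (as a dict) that corresponds to a backup reply."""
    if reply == "REPLY_ACK":
        severity = "informational"
        text = "Successful backup of S8300 translation data"
    elif reply == "REPLY_ERROR":
        severity = "major"
        text = "Translation Data backup not available"
    else:
        # Only the two replies named in Table 7 are modeled here.
        raise ValueError("unknown backup reply: " + reply)
    return {
        "severity": severity,
        "text": text,
        "BACKUP_LOCATION": backup_location,
    }
```

The backup location is passed through unchanged, mirroring the way both traps name the location stored in BACKUP_LOCATION.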
Linux Media Server MOs and Alarms
Hardware MOs
The server’s hardware MOs are described in the following sections:
● DAJ1/DAL1 (Duplication Memory Board) on page 49
● RALM-SVC (Remote Alarm Service) on page 76
● USB1 (Modem Testing) Alarms on page 95
● UPS (Uninterruptible Power Supply) on page 89
● Remote Maintenance Board (RMB) Alarms on page 116
Server-related alarms
Server-related alarms and their troubleshooting procedures are described in the following
tables:
● ARB (Arbiter) on page 37
● DAJ1/DAL1 Alarms in S8700 Media Server on page 54
● DUP (Duplication Manager) on page 56
● ENV (Environment) on page 60
● FSY (File Synchronization) on page 67
● HDD (Hard Disk Drive) on page 69
● KRN (Kernel) on page 73
● Login Alarms on page 74
● NIC (Network Interface Card) on page 75
● RALM-SVC (Remote Alarm Service) on page 76
● SME Alarm in S8700 Media Server on page 78
● SVC_MON (Service Monitor) on page 79
● _TM (Translation Manager) on page 88
● UPS Alarms to the Media Server on page 92
● USB1 (Modem Testing) Alarms on page 95
● _WD (Watchdog) Alarms on page 99
● Login Alarms - S8300 on page 115
● Virtual Alarms on page 116
● Remote Maintenance Board (RMB) Alarms on page 116
● S8500B Augmentix Server Availability Management Processor™ (A+SAMP) Alarms on
page 120
● S8710 environmental alarms on page 121
● S8710 server BIOS error messages on page 123
ARB (Arbiter)
S8700 | 8710 only
The Arbiter process runs on S8700 Media Servers to:
● Decide which server is healthier and more able to be active
● Coordinate data shadowing between servers, under the Duplication Manager’s control
At the physical and data-link layers, an Ethernet-based duplication link provides an inter-arbiter
UDP communication path to:
● Enable this arbitration between the active and standby servers
● Provide the necessary status signaling for memory refreshes
Table 8: ARB Alarms in Media Server on page 38 describes the Arbiter’s alarms and their
troubleshooting procedures. See DUP (Duplication Manager) on page 56 for more information.
Table 8: ARB Alarms in Media Server
Event ID | Alarm Level | Alarm Text, Cause/Description, Recommendation
3 | MIN | “Interchange hand off failed”: Standby server could not process active
server’s interchange request.
The interchange does not occur, and the active side remains active.
1. See if the standby side is RESET, either from the:
- Web interface’s Server section, by selecting View Summary
Status
- Linux command line, by entering server
2. Manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
3. If the problem persists, troubleshoot the standby server.
a. See if the standby side is RESET, either from the:
- Web interface’s Server section, by selecting View Summary
Status
- Linux command line, by entering server
b. Check for application problems, either from the:
- Web interface, by selecting View Process Status
- Linux command line, by entering statapp,
and restore any applications with problems.
c. Check for problems with an Ethernet interface, either from the:
- Web interface, by selecting the Execute Pingall diagnostic
- Linux command line, by entering pingall -a
Check both sides of each failed link, and make any necessary
repairs.
4. If the applications and interfaces are okay but the problem persists,
escalate the problem.
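The Linux command-line checks in the Event ID 3 procedure can be gathered into one pass. The following Python sketch is a convenience illustration, not an Avaya tool: the function and variable names are hypothetical, and only the commands server, statapp, and pingall -a come from the procedure above.

```python
import subprocess

# Commands named in the Event ID 3 procedure.
EVENT_3_CHECKS = [
    ("server state (look for RESET on the standby)", ["server"]),
    ("application status", ["statapp"]),
    ("Ethernet interfaces", ["pingall", "-a"]),
]


def run_checks(checks=EVENT_3_CHECKS, runner=None):
    """Run each check and return {description: combined output text}."""
    if runner is None:
        def runner(argv):
            result = subprocess.run(argv, capture_output=True, text=True)
            return result.stdout + result.stderr
    return {description: runner(argv) for description, argv in checks}
```

The runner argument exists so that the sequence can be exercised away from a media server; on a server, the default runner executes the commands as listed. Interpreting the output remains a manual step.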
7 | MAJ | “Arbiter in invalid/unknown state”: Memory corruption or bad code/build
1. Verify that the server’s state is “Corrupt!”, either from the:
- Web interface’s Server section, by selecting View Summary
Status
- Linux command line, by entering server
2. Compare the suspected arbiter with the one in /root2 — using
the Linux commands:
/opt/ecs/sbin/acpfindvers /opt/ws/arbiter
(This command shows the arbiter’s version string.)
/sbin/cksum /opt/ws/arbiter
[This command runs a cyclical redundancy check (CRC) against
the arbiter, and then shows both the CRC’s output value and the
number of bytes in the arbiter file.]
3. If the two arbiter files differ:
a. Get a fresh copy of arbiter from the CD.
b. Manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
4. If the arbiter file is OK or the problem persists, escalate the
problem.
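Comparing two copies of a file by checksum and byte count, as cksum reports them, can also be sketched in Python. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and zlib.crc32 uses a different CRC variant from the POSIX cksum utility, so its values are useful only for comparing two files with each other, not for matching cksum output.

```python
import zlib


def file_fingerprint(path, chunk_size=65536):
    """Return (crc32, byte_count) for a file, reading it in chunks."""
    crc = 0
    size = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
            size += len(chunk)
    return crc & 0xFFFFFFFF, size


def files_differ(path_a, path_b):
    """True if the two files differ in checksum or in length."""
    return file_fingerprint(path_a) != file_fingerprint(path_b)
```

The two paths would be the suspected arbiter file and the reference copy, supplied by the technician.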
8 | MIN | “Both servers thought they were active”
1. To verify this condition, either from the:
- Web interface’s Server section, select View Summary Status
- Linux command line, enter server
2. To distinguish the cause, examine the trace logs for Interarbiter
messages with timestamps shortly before to shortly after the loss of
heartbeat, either from the:
- Web interface, by:
a. Selecting the View System Logs diagnostic and Logmanager
Debug trace
b. Specifying the Event Range for the appropriate time frame
c. Matching the Interarb pattern
- Linux command line, by entering logv -t ts
Depending on the cause, continue with either Step 3
or Step 4.
3. A high-priority process caused the active Arbiter to hang for at least
4.5 seconds (causing an interchange). Then, the hang lifted, and
each Arbiter realized that the other had assumed the active role.
An automatic resolution process should leave the newly active server
active, while the other server backs down to the standby role.
a. If so, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
b. If the problem recurs, escalate the problem.
4. Every Interarbiter link is down or mis-configured.
a. Check for problems with an Ethernet interface, either from the:
- Web interface, by selecting the Execute Pingall diagnostic
- Linux command line, by entering pingall -a
Check both sides of each failed link, and make any necessary
repairs.
b. If the links are OK but the problem persists, escalate the problem.
9 | WRN | <SOH (state-of-health) string>: Before an interchange, the standby
server is significantly healthier than the active server requesting the
interchange. (The active server is probably unable to sustain call
processing.)
Understanding ARB Event #9’s String Pairs
ARB Event #9 generates pairs of SOH strings, where in each string pair, the:
● 1st string represents the active
● 2nd string represents the standby
server’s SOH just before an interchange. Since Event 9 triggers a server interchange (unless
prevented by external circumstances), the 1st string normally represents the less healthy
server, which became the standby. So, the 1st string’s data is usually more pertinent.
The following is a sample string pair generated by ARB Event #9. Within
this sample, four pairs of digits in each string have special meaning, and
are labeled “aa” through “dd.”
      aa      bb cc              dd
      ↓↓      ↓↓ ↓↓              ↓↓
gmm 0700, pcd 00/00, dup 270, wd 81, actv 004
gmm 0700, pcd 06/06, dup 370, wd 01, actv 014
● For “aa,” any value other than “00” indicates a hardware problem.
(For example, the value “20” is common for a power failure.)
In the previous sample, neither server had hardware trouble.
● For “bb” and “cc,” different values within the same string indicate a
problem connecting to one or more IPSI-connected PNs.
A PN reset can cause both servers’ strings to reflect equally degraded
health, but that event (in itself) should not trigger a server interchange.
In the previous sample, both servers’ connectivity to IPSI-connected
PNs is OK. (The 1st and 2nd strings have like “00” and “06” values,
respectively.)
● For “dd,” any value other than “01” indicates a failed software
process. More precisely, a certain value indicates a problem with a
discrete portion of the platform’s process set, including:
- “21” for a Linux daemon (for example, “atd”, “httpd”, “inetd”, or
“xntpd”)
- “41” for a platform service (for example, “dbgserv”, “prune”, or
“syslog”)
- “81” for reloaded Communication Manager software, as in the
previous sample
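The reading of an SOH string can be sketched as a small parser. This Python sketch is illustrative only. It assumes that “aa” is the last two digits of the gmm field, that “bb” and “cc” are the two pcd values, and that “dd” is the wd value; that mapping is inferred from the sample and the descriptions above rather than stated outright. The function and dictionary names are hypothetical.

```python
import re

# Assumed field mapping: aa = last two gmm digits, bb/cc = pcd pair, dd = wd.
SOH_RE = re.compile(
    r"gmm (?P<gmm_hi>\d{2})(?P<aa>\d{2}), pcd (?P<bb>\d{2})/(?P<cc>\d{2}), "
    r"dup (?P<dup>\d+), wd (?P<dd>\d{2}), actv (?P<actv>\d+)"
)

# Only the "dd" values listed in this section are named.
WD_MEANINGS = {
    "01": "no failed software process",
    "21": "Linux daemon problem",
    "41": "platform service problem",
    "81": "reloaded Communication Manager software",
}


def interpret_soh(soh):
    """Interpret the aa, bb, cc, and dd digit pairs of one SOH string."""
    match = SOH_RE.search(soh)
    if match is None:
        raise ValueError("not a recognizable SOH string: " + soh)
    aa, bb, cc, dd = (match.group(name) for name in ("aa", "bb", "cc", "dd"))
    return {
        "hardware_ok": aa == "00",
        "pn_connectivity_ok": bb == cc,
        "software": WD_MEANINGS.get(dd, "unlisted value: " + dd),
    }
```

Applied to the first sample string, this reports no hardware problem, intact PN connectivity, and reloaded Communication Manager software, which matches the explanation of the sample.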
Troubleshooting ARB Event #9
1. Compare the health of both servers, either from the:
- Web interface’s Server section, by selecting View Summary
Status
- Linux command line, by entering server
2. Using the output from Step 1, check the health of each server’s
individual processes.
3. Check the health of the active server’s individual processes, either
from the:
- Web interface, by selecting View Process Status
- Linux command line, by entering statapp
and restore any applications with problems.
4. See if the standby side is RESET, either from the:
- Web interface’s Server section, by selecting View Summary
Status
- Linux command line, by entering server
5. Check the health of the standby server’s individual processes, either
from the:
- Web interface, by selecting View Process Status
- Linux command line, by entering statapp,
and restore any applications with problems.
6. Check for problems with an Ethernet interface, either from the:
- Web interface, by selecting the Execute Pingall diagnostic
- Linux command line, by entering pingall -a
Check both sides of each failed link, and make any necessary repairs.
7. If the standby’s applications and interfaces are OK but the problem
persists, escalate the problem.
After the interchange, the newly active server’s health should be
significantly better (lower SOH value) than the standby server’s. If so,
troubleshoot the standby server:
If not:
1. Manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
Since the Arbiter continuously attempts to create or bind the socket, the
problem may resolve itself. Once resolved, the Arbiter can send and
receive across every Interarbiter link (no subsequent error messages in
the trace log).
1. Examine the alarm log to distinguish between a:
● Bind or create problem
● Send or receive socket problem
by accessing either the:
- Web interface, by:
a. Selecting Alarms and Notification and the appropriate alarm
b. Selecting the View System Logs diagnostic
c. Selecting the Logmanager Debug trace
d. Specifying the Event Range for the appropriate time frame
e. Matching the “cannot create” pattern
- Linux command line, by entering almdisplay -v
11 (cont’d) | WRN | 2. Check for both the completeness and consistency of the servers’
hosts and servers.conf files (containing IP addresses of the
system’s configured components), either from the:
- Web interface, by selecting Configure Server
- Linux command line, by entering:
more /etc/hosts
more /etc/opt/ecs/servers.conf
3. If the files are OK, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
4. If this problem affects call processing or if the problem persists,
continue with Step 5 now.
If not, continue only at the customer’s convenience.
5. Escalate this problem for explicit guidance with Steps 5a
through 6.
a. Enter server to verify that the suspected server is the standby.
b. If not, enter server -if to force a server interchange.
c. Busy out the standby server from the Linux command line, by
entering server -b.
d. Reboot the server (as the standby), either from the:
- Web interface, by selecting Shutdown This Server
- Linux command line, by entering /sbin/shutdown -r now
6. If rebooting the standby does not help or if the problem recurs,
escalate the problem to the next higher tier.
12   MIN   “Interchange without doing prep”: Since the Arbiter could not create a
thread to request a file synchronization, some files did not get shadowed.
1. Examine the trace logs for the entry, Can't create interchange-prep thread, either from the:
- Web interface by:
a. Selecting the View System Logs diagnostic and Logmanager
Debug trace
b. Specifying the Event Range for the appropriate time frame
c. Matching the “interchange-prep” pattern
- Linux command line, by entering logv -t ts
2. Resubmit any translation changes entered during the last 15-minute
file-synchronization interval.
3. Manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
13   MIN   “Heartbeat timeout from ACTIVE”: When the timeout occurs, this alarm
is only logged on the standby side. After logging the alarm, the servers
should have interchanged, so that the:
● Alarm normally resides on the newly active (healthier) server
● Previously active server has backed down to the standby role
As potential causes, either the:
Alternate side is in normal shutdown (irregular, but possibly innocuous).
1. On the standby server, look for occurrences of the stop command,
either from the:
- Web interface, by:
a. Selecting View System Logs
b. Selecting Platform command history log
c. Specifying the Event Range for the appropriate time frame
d. Matching the “Stop” pattern
- Linux command line, by entering listhistory
Note: From the system’s perspective, this is normal behavior.
However, in terms of potential service outage due to human error, this
is quite irregular. Shutting down a server effectively downgrades a
duplex-, high-, or critical-reliability system to an unsupported
standard-reliability system.
2. From the Linux command line, enter start -a to restart the standby
server.
3. Prevent any future misuse of the stop command.
4. Manually clear the alarm on the active server, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
Alternatively, either:
● The alternate side is hung (ARB Event 8 is not being generated)
● Two or more Interarbiter links are down (ARB Event 8 is also being
generated)
Therefore, if the servers interchanged (the previously active server
backed down to standby), use the following procedure:
5. Check for problems with an Ethernet interface, either from the:
- Web interface, by selecting the Execute Pingall diagnostic
- Linux command line, by entering pingall -a
Check both sides of each failed link, and make any necessary repairs.
a. If the Ethernet interfaces are OK, see if the standby server is
busied-out, either from the:
- Web interface, by selecting View Summary Status
- Linux command line, by entering server
b. If so, release the standby server, either from the:
- Web interface, by selecting Release Server
- Linux command line, by entering server -r
If not, check for related alarms to troubleshoot the standby.
6. Manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
7. If the problem persists, escalate the problem.
If the servers did not interchange, use the following procedure:
8. See if the standby server is busied-out, either from the:
- Web interface, by selecting View Summary Status
- Linux command line, by entering server
9. If so, release the standby server, either from the:
- Web interface, by selecting Release Server
- Linux command line, by entering server -r
If not, escalate this problem for explicit guidance with general
troubleshooting of both servers.
DAJ1/DAL1 (Duplication Memory Board)
S8700 | 8710 only
This MO supports each S8700 media server’s Duplication Memory board (DAJ1) and the S8710
media server’s Duplication Memory board (DAL1). Each Duplication Memory board is a NIC
(network interface card) serving as the physical and data-link interface for an Ethernet-based
duplication link between the servers. This link provides a call-status data path for sending:
● TCP-based communication between each server’s Process Manager
● UDP-based communication between each server’s Arbiter to:
- Enable arbitration between the active and standby servers
- Provide status signaling for memory refreshes
Note: The Duplication Memory cards are not interchangeable between media servers.
The DAJ1 will only work in S8700 media servers and the DAL1 will only work in
S8710 media servers.
Note: This call-status data is separate from the translations and Linux files shadowed
between servers. (See FSY (File Synchronization) on page 67.)
(Table 13: DAJ1/DAL1 Alarms in Media Server on page 54 describes the Duplication Memory
board’s alarms and their troubleshooting procedures.) [See also ARB (Arbiter) on page 37,
DUP (Duplication Manager) on page 56, and NIC (Network Interface Card) on page 75.]
Both periodic and on-demand testing is provided for this MO. The periodic test runs the Read
Error Register test at 15-minute intervals. On-demand testing includes the Read Error Register
and Local Loop tests.
MO’s Name (in Alarm Log)   Alarm Level   Initial Linux Command to Run (1)   Full Name of MO
DAJ1 or DAL1               MAJ           testdupboard                       Server Dup Mem board
DAJ1 or DAL1               MIN           testdupboard                       Server Dup Mem board
DAJ1 or DAL1               WRN           testdupboard                       Server Dup Mem board
1. See Table 9
Table 9: Testdupboard command syntax and arguments
e. Event ID #6 Inability to open a communications link with the Duplication Memory board.
The board is out of service if this failure occurs once, and a Major alarm is logged:
#1,ACT,[DAJ1|DAL1],A,6,MAJ,Failed to open [DAJ1|DAL1] card
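As an illustration only, the following Python sketch splits a logged alarm record such as the one above into its comma-separated fields. The field names seq, state, and flag are assumptions made for this example. The source confirms only that the record carries the MO name, the Event ID (6), the alarm level (MAJ), and the alarm text.

```python
from typing import NamedTuple


class AlarmRecord(NamedTuple):
    seq: str        # e.g. "#1" (field name is an assumption)
    state: str      # e.g. "ACT" (field name is an assumption)
    mo_name: str    # "DAJ1" or "DAL1"
    flag: str       # e.g. "A" (meaning not stated in the source)
    event_id: int   # Event ID, 6 in the example record
    level: str      # alarm level: MAJ, MIN, or WRN
    text: str       # free-form alarm text


def parse_alarm_record(line: str) -> AlarmRecord:
    """Split one alarm record into its seven comma-separated fields."""
    # Limit the split so that commas inside the alarm text are preserved.
    parts = line.strip().split(",", 6)
    if len(parts) != 7:
        raise ValueError(f"expected 7 comma-separated fields, got {len(parts)}")
    seq, state, mo_name, flag, event_id, level, text = parts
    return AlarmRecord(seq, state, mo_name, flag, int(event_id), level, text)


record = parse_alarm_record("#1,ACT,DAJ1,A,6,MAJ,Failed to open DAJ1 card")
print(record.mo_name, record.event_id, record.level)
```

A record parsed this way can be filtered by event_id or level before deciding whether to run testdupboard.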
System Technician-Demanded Tests: Descriptions and Error Codes
Investigate tests in the order shown in the following table. By clearing error codes associated
with the first test, you may clear errors generated from other tests in the sequence.
Order of Investigation              Short Test Sequence   Long Test Sequence   D/ND (1)
Read Error Register test            X                     X                    D
[DAJ1|DAL1] Board Local Loop test   X                     X                    D
1. D = Destructive; ND = Nondestructive
Read Error Register Test
The Read Error Register test queries three registers:
● Optical line receiver’s CRC error register
CRC errors indicate problems with the optical interface between the active and standby
servers.
● SDRAM’s single-bit error register
Although the Duplication Memory board can “self heal” single-bit errors in the SDRAM’s
error register, chronic problems can indicate a more serious problem.
● SDRAM’s multiple-bit error register
An SDRAM multiple-bit error condition indicates a problem in the Duplication Memory
board’s memory, and cannot be recovered.
The following errors can be detected:
Table 11: TEST Read Error Register Test
Error Code   Test Result   Description / Recommendation
Open failed to MDD   ABRT   The Memory Duplication Driver (MDD) is the system driver that
communicates with the [DAJ1|DAL1] board. If this driver cannot be
opened, then the [DAJ1|DAL1] board’s registers cannot be read.
1. This is a system type error, try again.
System Error – MDD failed to return data   ABRT   The test ran, but, for some reason the MDD could not return data.
1. This is a system type error, try again.
indicates which of the [DAJ1|DAL1] board’s error counters had
positive data. The failure code can be 1–7 and is determined by the
bit vector, “0xxx”. (Every “x” bit could be set, indicating that every
error register had errors.) Specifically, if bit:
1 is set (0xx1) – Single-bit errors occurred.
2 is set (0x1x) – CRC errors occurred.
3 is set (01xx) – Multibit errors occurred.
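The bit-vector encoding described above can be sketched in Python as follows. This is an illustration only: the function name and the returned strings are hypothetical, and only the bit-to-error mapping comes from the table.

```python
from typing import List

# Bit positions as listed in the table: bit 1 is the least significant bit.
ERROR_BITS = [
    (0b001, "single-bit errors"),
    (0b010, "CRC errors"),
    (0b100, "multibit errors"),
]


def decode_failure_code(code: int) -> List[str]:
    """Return the error types flagged by a failure code in the range 1-7."""
    if not 1 <= code <= 7:
        raise ValueError("failure code must be in the range 1-7")
    return [name for mask, name in ERROR_BITS if code & mask]


print(decode_failure_code(5))  # ['single-bit errors', 'multibit errors']
```

A failure code of 7 means that all three error registers reported errors.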
[DAJ1|DAL1] Local Loop Test
Note: This is an on-demand test that only runs on the standby media server when it is
busied out.
This test runs a local loop-around test on the standby media server’s [DAJ1|DAL1] board. A
32-bit data number is written to an address and verified for correct transmission. The test
reads the contents of the:
● Last data received registers
● Last address received register
and then compares the data. If the data matches, the test passes. If not, the test fails.
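The pass/fail logic of a loop-around test can be sketched with a simulated board. This Python example is not the MDD interface: the SimulatedBoard class, its attributes, and the stuck_bits fault model are all hypothetical and exist only to show the write, read back, and compare sequence.

```python
MASK_32 = 0xFFFFFFFF


class SimulatedBoard:
    """Hypothetical stand-in for a board with last-address/last-data registers."""

    def __init__(self, stuck_bits: int = 0) -> None:
        self.stuck_bits = stuck_bits  # bits forced to 1 to simulate a fault
        self.last_address = 0
        self.last_data = 0

    def write(self, address: int, data: int) -> None:
        # A healthy board latches exactly what was written.
        self.last_address = address & MASK_32
        self.last_data = (data | self.stuck_bits) & MASK_32


def local_loop_test(board: SimulatedBoard, address: int, data: int) -> str:
    """Write a 32-bit value, read both registers back, and compare."""
    board.write(address, data)
    matches = board.last_address == address and board.last_data == data
    return "PASS" if matches else "FAIL"


print(local_loop_test(SimulatedBoard(), 0x1000, 0xDEADBEEF))            # PASS
print(local_loop_test(SimulatedBoard(stuck_bits=1), 0x1000, 0xDEADBEE0))  # FAIL
```

The test fails when either the address or the data read back differs from what was written.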
The following errors can be detected:
Table 12: TEST DAJ1/DAL1 Local Loop Test
Error Code   Test Result   Description / Recommendation
Open failed to MDD   ABRT   The MDD is the system driver that communicates with the
Duplication Memory board. If this driver cannot be opened, the
board’s registers cannot be read.
This is a system type error, try again.
System Error MDD failed to return data   ABRT   The test ran, but for some reason the MDD could not return
data.
This is a system type error, try again.
Loop-around test failed   FAIL   The last address received does not match the address that was
written, or the last data received does not match the data that
was written.
DAJ1/DAL1 Alarms in S8700 Media Server
S8700 | 8710 only
Table 13: DAJ1/DAL1 Alarms in Media Server on page 54 describes the Duplication Memory
card MO’s alarms and their troubleshooting procedures.
Table 13: DAJ1/DAL1 Alarms in Media Server
Event ID   Alarm Level   Alarm Text, Cause/Description, Recommendation
2   WRN   “Single-bit EDC test (bad SB err cnt)”: Single-bit SDRAM error occurred
20 times.
Software automatically clears the single-bit error register. This is a
log-only indication of the error’s occurrence.
1. Manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
3   MAJ   “Multibit EDC Test (bad err register)”: Catastrophic multibit SDRAM
error occurred. (Usually due to a hardware problem.)
1. Enter testdupboard on the Linux command line.
2. If the test fails, escalate this problem for explicit guidance with Steps
3 through 5.
3. Power-cycle the server.
4. Enter testdupboard again.
5. If the test still fails, replace the server.
4   MIN   “Local-loop failure”: On-demand local-loop test failed 3 times. (Cannot
read from or write to DAJ1|DAL1 buffers.)
The Local Loop test only runs on a busied-out standby server.
1. If the on-demand test is failing (but a running duplicated system has
no problems), do nothing.
If the running duplicated system has problems, continue with Step 2.
2. Enter testdupboard on the Linux command line.
3. If the test fails, escalate this problem for explicit guidance with Steps
4 through 6.
4. Power-cycle the server.
5. Enter testdupboard again.
6. If the test still fails, replace the server.
5   MIN   “Optical link received CRC errors”: Received multiple CRC errors
across the fiber link.
1. Run testdupboard on both servers.
2. If CRC errors are occurring on both servers, the cause is either:
● A problem with the fiber link (likely)
● A problem with both Duplication Memory cards (far less likely)
If CRC errors are occurring on only one server, the other server’s
Duplication Memory card may be faulty.
3. If the running system has duplication-related problems, escalate this
problem to replace the server.
If not, ignore the alarm.
DUP (Duplication Manager)
S8700 | 8710 only
The Duplication Manager process, via coordination of the Arbiter process, runs on each S8700
Media Server to control data shadowing between them.
At the physical and data-link layers, an Ethernet-based duplication link provides a TCP
communication path between each server’s Duplication Manager to enable their control of data
shadowing.
Table 14: DUP Alarms in Media Server on page 57 describes the Duplication Manager’s
alarms and their troubleshooting procedures.
See ARB (Arbiter) on page 37 and DAJ1/DAL1 (Duplication Memory Board) on page 49 for
more information.
Table 14: DUP Alarms in Media Server
Event ID   Alarm Level   Alarm Text, Cause/Description, Recommendation
1   MAJ   “Duplication card error”: Duplication Manager determined that the
duplication card is not functioning, but it cannot distinguish between a bad
card, an unplugged card, or a bad fiber link.
1. Check the physical fiber connectivity at each server.
2. Verify the alarm, by accessing the trace log, either from the:
● Web interface, by:
a. Selecting the View System Logs diagnostic and Logmanager
Debug trace
b. Specifying the Event Range for the appropriate time frame
c. Matching the “dup” pattern
● Linux command line, by entering logv -t ts
3. Examine the trace-log query’s output for one of these messages:
“glbi: couldn't open Dup Card, errno=<#>. ndm exiting”
“glbi: mmap failed, errno=<#>. ndm exiting”
“Haven't heard from active dupmgr. Dup fiber link down.”
“san_check_rsp() FAILED: Dup Fiber link down.”
4. See if the dup link is both “up” and “refreshed”, either from the:
● Web interface’s Server section, by selecting View Summary
Status
● Linux command line, by entering the server command
5. If so, manually clear the alarm, either from the:
● Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
● Linux command line, by entering almclear -n #id
If not and at the customer’s convenience:
Since the following commands cause a brief service outage, they should
only be executed at the customer’s convenience.
a. Force a server interchange to make the suspected server standby,
from the Linux command line, by entering server -if.
b. Busy out the standby server from the Linux command line, by
entering server -b.
c. Release the standby server, either from the:
- Web interface, by selecting Release Server
- Linux command line, by entering server -r
6. If the problem persists, you can try:
a. Replacing the fiber between the two servers
b. Rebooting the standby server
7. If the problem continues to persist, escalate for a probable server
replacement.
2MAJ“Duplication link down” — One server’s Duplication Manager cannot
communicate with the other server’s Duplication Manager.
1. Access the trace log, either from the:
- Web interface, by:
a. Selecting the View System Logs diagnostic and Logmanager
Debug trace
b. Specifying the Event Range for the appropriate time frame
c. Matching the “dup” pattern
- Linux command line, by entering logv -t ts
2. Examine the trace-log query’s output for one of these messages:
“mainlp: get_addrs returned ***. Could not get IP address for other
server.
Verify name and address in servers.conf. ndm exiting.”
“san_check_msg() sync_msg failed: DUPLINK DOWN.”
3. See if the dup link is “up”, either from the:
- Web interface’s Server section, by selecting View Summary
Status
- Linux command line, by entering the server command
4. If so, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
If not, check the duplication interface’s Ethernet connectivity, either
from the:
- Web interface, by selecting the Execute Pingall diagnostic
- Linux command line, by entering pingall -d
5. If pingall passes, check the other server’s applications, either from
the:
- Web interface, by selecting View Process Status
- Linux command line, by entering statapp
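When scanning a long trace-log query, the messages quoted for DUP Events 1 and 2 can be matched mechanically. The Python sketch below is an illustration under stated assumptions: the function name is hypothetical, the matching is a plain case-insensitive substring search on fragments of the quoted messages, and the errno value in the usage line is an invented sample, not real log output.

```python
from typing import Optional

# Fragments taken from the messages quoted for each DUP Event ID.
DUP_EVENT_MESSAGES = {
    1: ("couldn't open dup card", "mmap failed", "dup fiber link down"),
    2: ("could not get ip address for other", "duplink down"),
}


def classify_trace_line(line: str) -> Optional[int]:
    """Return the DUP Event ID whose quoted message appears in the line."""
    lowered = line.lower()
    for event_id, fragments in DUP_EVENT_MESSAGES.items():
        if any(fragment in lowered for fragment in fragments):
            return event_id
    return None  # the line matches none of the quoted messages


print(classify_trace_line("glbi: mmap failed, errno=12. ndm exiting"))  # 1
```

Lines that return None are unrelated to these two alarms and can be skipped during triage.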
ENV (Environment)
S8700 | 8710 / S8300 / S8500B
The ENV MO monitors environmental variables within the server, including temperature,
voltages, and fans. Table 15: ENV Alarms in Media Server on page 60 describes the ENV
MO’s alarms and their troubleshooting procedures.
Table 15: ENV Alarms in Media Server
Event ID   Alarm Level   Alarm Text, Cause/Description, Recommendation
1   MAJ   “Temperature reached critical low”: Motherboard's temperature reached
a critically low level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
2   MIN   “Temperature reached warning low”: Motherboard's temperature
reached a warning low.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
3   MIN   “Temperature reached warning high”: Motherboard's temperature
reached a warning high.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
4 (S8300)   MAJ   “Temperature reached critical high”: Motherboard's temperature
reached a critically high level.
1. Look for any obstructions blocking the server’s fans.
2. Check for any fan alarms, and clear those alarms.
3. Shut down and restart the system.
4. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
5. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear.
- Linux command line, by entering almclear -n #id
6   MAJ   “+3.3 voltage reached critical low”: Motherboard's nominal +3.3 voltage
reached a critically low level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
8   MAJ   “+3.3 voltage reached critical high”: Motherboard's nominal +3.3 voltage
reached a critically high level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
10   MAJ   “+5 voltage reached critical low”: Motherboard's nominal +5 voltage
reached a critically low level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
12   MAJ   “+5 voltage reached critical high”: Motherboard's nominal +5 voltage
reached a critically high level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
14   MAJ   “+12 voltage reached critical low”: Motherboard's nominal +12 voltage
reached a critically low level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
16   MAJ   “+12 voltage reached critical high”: Motherboard's nominal +12 voltage
reached a critically high level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
18   MAJ   “-12 voltage reached critical low”: Motherboard's nominal -12 voltage
reached a critically low level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
20   MAJ   “-12 voltage reached critical high”: Motherboard's nominal -12 voltage
reached a critically high level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
22   MAJ   “CPU core voltage reached critical low”: Motherboard's CPU core
voltage reached a critically low level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
24   MAJ   “CPU core voltage reached critical high”: Motherboard's CPU core
voltage reached a critically high level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
26   MAJ   “CPU I/O voltage reached critical low”: Motherboard's CPU I/O voltage
reached a critically low level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
28   MAJ   “CPU I/O voltage reached critical high”: Motherboard's CPU I/O voltage
reached a critically high level.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
29   MAJ   “All fan failure”: Every fan is running at a critically low speed.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
31 (S8500B)   MAJ   +1.5V under voltage. S8500B media server environment.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
33 (S8500B)   MAJ   +1.5V over voltage. S8500B media server environment.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
35 (S8500B)   MAJ   +2.5V under voltage. S8500B media server environment.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
37 (S8500B)   MAJ   +2.5V over voltage. S8500B media server environment.
1. See if the alarmed condition is still present, either from the:
- Web interface, by selecting the View Temperature/Voltage
diagnostic
- Linux command line, by entering environment
2. If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
FSY (File Synchronization)
S8700 | 8710 / S8500
The File Synchronization (FSY) process uses TCP-based communication over 100Base-T
Ethernet links to provide synchronized duplication of critical data-shadowed files, including
translations and important Linux files.
Note: This set of files is separate from the data shadowed between each server’s
DAJ1/DAL1 (Duplication Memory Board) on page 49.
Table 16: FSY Alarm in Media Server on page 68 describes the FSY MO’s alarms and their
troubleshooting procedures.
1. See if the filesyncd (file sync daemon) process is up, either from the:
- Web interface, by selecting View Process Status
- Linux command line, by entering statapp
2. Check the trace log for more granular information. (The file sync
daemon can report failures of synchronizing one or more files.)
Access the trace log, either from the:
- Web interface, by:
a. Selecting the View System Logs diagnostic and Logmanager
Debug trace
b. Specifying the Event Range for the appropriate time frame
c. Matching the “file sync failed” pattern
- Linux command line, by entering logv -t ts
3. (Except S8500) Verify that the dup link is both “up” and “refreshed”,
either from the:
● Web interface’s Server section, by selecting View Summary Status
● Linux command line, by entering the server command
(Neither side should be “off-line” or “down”.)
4. (Except S8500) Make sure that the Ethernet duplication link is up, either
from the:
- Web interface, by selecting the Execute Pingall diagnostic
- Linux command line, by entering pingall -a
If not, check each side of this failed link, and make any necessary
repairs.
5. (Except S8500) Check the physical fiber connectivity at each server to
verify that this alarm is not a consequence of other duplication-related
problems.
6. If the problem persists, escalate the problem.
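The pattern match in step 2 of the FSY procedure can also be applied to trace-log output that has been saved as text. The following is a minimal sketch in Python, not an Avaya tool: the function name and the sample lines are hypothetical, and only the “file sync failed” pattern comes from the procedure.

```python
from typing import Iterable, List

PATTERN = "file sync failed"  # pattern named in step 2 of the FSY procedure


def find_sync_failures(lines: Iterable[str], pattern: str = PATTERN) -> List[str]:
    """Return the lines that contain the pattern, compared case-insensitively."""
    needle = pattern.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


# Hypothetical sample lines; real trace-log entries will look different.
sample = [
    "example entry A: file sync ok",
    "example entry B: file sync failed",
]
matches = find_sync_failures(sample)
print(len(matches))
```

Matching is case-insensitive here as a convenience; drop the lower() calls if an exact match is wanted.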
Page 69
HDD (Hard Disk Drive)
The HDD MO monitors the hard drive via the Self Monitoring, Analysis, and Reporting
Technology (S.M.A.R.T) capability that is built into the hard disk drive unit. The integrated
controller for the hard disk drive works with sensors incorporated in the hard drive to monitor the
drive’s performance. The SMART technology makes status information concerning the disk
drive available to monitoring software.
The basic concept with SMART is that some hard disk drive problems do not occur suddenly.
They are the result of a gradual degradation of disk components. For example, if the value for
Reallocated Event Count (count of remap operations, both successful and non-successful) for
Event ID 21 is increasing, it may indicate an impending disk failure. At the very least, it should
be monitored closely.
Table 17: HDD Alarm in Media Server on page 69 describes the HDD Event IDs and their
troubleshooting procedures.
This message indicates that the attribute value has exceeded its threshold
value.
1. If posted, the user very likely has a drive problem and should definitely
consider replacing the drive.
smartd: Device: device_name, Read Smartd Threshold Failed
This message indicates the SMART utility was unable to read the current
SMART values or thresholds for the drive. This may result in SMART not
executing and the values that are reported may be stale (or outdated). See
also Event ID 22.
smartd: Device: /dev/hda, S.M.A.R.T. Attribute: attr_number Changed chng_value.
The value for the specified attribute number (attr_number) has changed by
the specified value (chng_value). Posting of this alarm may or may not
indicate possible drive problems.
Definitions for the attributes are:
Page 70
Num | Name | Description
1 | Raw Read Error | Indicates the rate of hardware read errors that occur when reading data from the disk surface. This error is critical. An increasing error rate may indicate a failing disk drive. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
2 | Throughput Performance | Overall throughput performance of the hard disk.
3 | Spin Up Time | Raw value average of time to spin up drive spindle. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
4 | Start Stop Count | Count of hard disk spindle start/stop cycles. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
5 | Reallocated Sector Count | Amount of remapped sectors.
6 | Read Channel Margin | No explanation of attribute available
7 | Seek Error Rate | Average rate of seek errors: if this value continues to increase it indicates there may be a problem with the disk surface or a mechanical problem. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
8 | Seek Time Performance | Disk seek system performance. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
9 | Power_On_Hours | Number of hours of the power-on state of the drive. This value indicates aging. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
10 | Spin_Retry_Count | Count of retries of drive spindle spin start-up attempts. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
Page 71
11 | Recalibration Retries | Number of times recalibration was requested after initial request. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
12 | Device Power Cycle Count | Count of full hard disk power on/off cycles. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
13 | Soft Read Error Rate | Rate of program read errors when reading data from disk. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
193 | Load/Unload Cycle | Count of load/unload cycles into landing zone position. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
194 | Temperature | Hard disk drive temperature. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital - select models)
196 | Reallocated Event Count | Count of remap operations (transferring of data from bad sector to reserved disk area) successful and non-successful. This error is critical. An increasing count for this error may indicate a failing disk drive. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital - select models)
197 | Current Pending Sector Count | Current count of unstable sectors (waiting for remap). This error is critical. An increasing count for this error may indicate a failing disk drive. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
198 | Uncorrectable Sector Count | Count of uncorrectable errors when reading/writing a sector. This error is critical. An increasing count for this error may indicate a failing disk drive. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
Page 72
199 | UltraDMA CRC Error Count | Count of Cyclic Redundancy Check (CRC) errors during UltraDMA mode (Samsung, Seagate, IBM (Hitachi), Fujitsu - select models, Maxtor, Western Digital - select models)
200 | Write Error Rate (Multi Zone Error Rate) | Total number of errors found when writing a sector. (Samsung, Seagate, IBM (Hitachi), Fujitsu, Maxtor, Western Digital)
220 | Disk Shift | Indicates how much the disk has shifted (unit of measure unknown). This error is critical. An increasing value for this error may indicate a failing disk drive. (Seagate)
221 | G-Sense Error Rate | Rate of errors occurring as a result of impact loads such as dropping the drive, wrong installation, etc. (Seagate, Hitachi)
222 | Loaded Hours | Loading on magnetic heads actuator caused by the general operating time.
223 | Load/Unload Retry Count | Loading on magnetic heads actuator caused by numerous recurrences of operations like: reading, recording, positioning, etc.
224 | Load Friction | Loading of magnetic heads actuator caused by friction in mechanical part of the store.
226 | Load-in Time | Total time of loading on the magnetic heads actuator.
227 | Torque Amplification Count | Count of efforts of the rotating moment of a drive
228 | Power-Off Retract Count | Count of the number of times the drive was powered off.
230 | GMR Head Amplitude | Amplitude of the heads trembling in running mode.
Event ID | Alarm Level | Alarm Text, Cause/Description, Recommendation
22 | WRN | Failed to read smart values/thresholds
This indicates that the smart utility was not able to read the smart values/
thresholds from the drive. The smart utility is unable to function due to drive
access problems.
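When many attribute-change messages are posted for Event ID 21, it can help to pick out the ones that concern attributes the table marks as critical. The following is a minimal sketch in Python, assuming the message follows the "smartd: Device: ..., S.M.A.R.T. Attribute: attr_number Changed chng_value" layout shown for Event ID 21; the function names are hypothetical, and the critical set is taken only from the rows that state "This error is critical".

```python
import re
from typing import Optional, Tuple

# Attributes that Table 17 marks "This error is critical".
CRITICAL = {
    1: "Raw Read Error",
    196: "Reallocated Event Count",
    197: "Current Pending Sector Count",
    198: "Uncorrectable Sector Count",
    220: "Disk Shift",
}

# Assumed layout, modeled on the Event ID 21 message text in Table 17.
MSG = re.compile(
    r"smartd: Device: (?P<dev>[^,\s]+), S\.M\.A\.R\.T\. Attribute: "
    r"(?P<attr>\d+) Changed (?P<chg>-?\d+)"
)


def parse_change(message: str) -> Optional[Tuple[str, int, int]]:
    """Return (device, attribute number, change), or None if the text does not match."""
    m = MSG.search(message)
    if m is None:
        return None
    return m.group("dev"), int(m.group("attr")), int(m.group("chg"))


def is_critical(attr: int) -> bool:
    """True if Table 17 marks this attribute number as critical."""
    return attr in CRITICAL


parsed = parse_change("smartd: Device: device_name, S.M.A.R.T. Attribute: 196 Changed 2")
if parsed is not None and is_critical(parsed[1]):
    print("critical attribute changed:", CRITICAL[parsed[1]])
```

A change in a critical attribute does not by itself prove a failing drive; as the table states, it is an increasing count that may indicate one.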
Page 73
KRN (Kernel)
S8700 | 8710
The KRN MO monitors the operating system Kernel. Table 18: KRN Alarm in Media Server on
page 73 describes the MO’s alarms and their troubleshooting procedures.
Table 18: KRN Alarm in Media Server
Event ID | Alarm Level | Alarm Text, Cause/Description, Recommendation
10 | MAJ | Failure in a Synchronous Dynamic Random Access Memory (SDRAM) Dual Inline Memory Module (DIMM). Error Correcting Code (ECC) memory detected a multi-bit error (MBE) but cannot correct it.
1. The SDRAM DIMM within the media server is not field replaceable.
Replace the media server.
Page 74
Login Alarms
The Login MO monitors access to the server and alarms suspicious activity. Table 19: Login
Alarms in Media Server on page 74 describes the Login MO’s alarms and their troubleshooting
procedures.
Table 19: Login Alarms in Media Server
Event ID | Alarm Level | Alarm Text, Cause/Description, Recommendation
2 | WRN | “sat_Auth:Login for [inads] invalid password” — An SAT login to Communication Manager failed.
1. Verify the alarm, either from the:
- Web interface, by selecting View Current Alarms
- Linux command line, by entering almdisplay -v
2. Since mis-typing a login sequence usually causes this alarm, enter
almclear -n #id to clear the alarm.
3. If this alarm is perceived as a security threat (often due to its
persistence or frequent recurrence), notify the customer.
4 | WRN | “Login for [inads] – failed – password check” — A login to a server’s Linux command line failed.
1. Verify the alarm, either from the:
- Web interface, by selecting View Current Alarms
- Linux command line, by entering almdisplay -v
2. Since mis-typing a login sequence usually causes this alarm, enter
almclear -n #id to clear the alarm.
3. If this alarm is perceived as a security threat (often due to its
persistence or frequent recurrence), notify the customer.
1. If this alarm is perceived as a security threat, notify the customer.
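The procedures above treat a login alarm as a possible security threat when it persists or recurs frequently, but they do not define "frequent". The following is a minimal sketch in Python of one way to make that judgment repeatable. The window length and the limit are hypothetical values chosen for illustration, not Avaya thresholds, and the timestamps are assumed to have been collected by hand from the alarm display.

```python
from datetime import datetime, timedelta
from typing import List


def is_recurring(timestamps: List[datetime], window: timedelta, limit: int) -> bool:
    """True if at least `limit` alarms fall within some span no longer than `window`."""
    ordered = sorted(timestamps)
    start = 0
    for end, current in enumerate(ordered):
        # Slide the start of the window forward until it is within range.
        while current - ordered[start] > window:
            start += 1
        if end - start + 1 >= limit:
            return True
    return False


# Hypothetical alarm times: three failed logins within ten minutes.
base = datetime(2005, 6, 1, 9, 0)
burst = [base, base + timedelta(minutes=5), base + timedelta(minutes=10)]
print(is_recurring(burst, window=timedelta(minutes=15), limit=3))
```

Whether a given burst warrants notifying the customer remains a judgment call; the sketch only makes the counting consistent.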
Page 75
NIC (Network Interface Card)
S8700 | 8710 / S8500
This MO supports the NICs in each S8700 media server providing the physical and data-link
interfaces for Ethernet-based links.
Table 20: NIC Alarms in Media Server on page 75 describes NIC’s alarms and their
troubleshooting procedures. See DAJ1/DAL1 (Duplication Memory Board) on page 49 for more
information.
Table 20: NIC Alarms in Media Server
Event ID | Alarm Level | Alarm Text, Cause/Description, Recommendation
1 | MIN | “eth0 NIC Link is Down” — Ethernet link on native NIC 0 is down.
1. Verify Ethernet connectivity, either from the:
- Web interface, by selecting the Execute Pingall diagnostic
- Linux command line, by entering pingall -a
Check both sides of each failed link, and make any necessary repairs.
2. If the ping test fails, check the physical connections of NIC 0’s Ethernet
cable.
If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
Page 76
2 | MIN | “eth1 NIC Link is Down” — Ethernet link on native NIC 1 is down.
1. Verify Ethernet connectivity, either from the:
- Web interface, by selecting the Execute Pingall diagnostic
- Linux command line, by entering pingall -a
Check both sides of each failed link, and make any necessary repairs.
2. If the ping test fails, check the physical connections of NIC 1’s Ethernet
cable.
If not, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
RALM-SVC (Remote Alarm Service)
S8700 | 8710 only
MO’s Name | Alarm Level | Initial Command to Run | Full Name of MO
RALM-SVC | MAJOR | None | Remote Alarm Service
For the RALM-SVC MO, maintenance software performs special periodic testing every 60
seconds.
Page 77
Alarm Log Entries
These tests verify that the standby media server’s components are operating correctly.
Table 21: RALM-SVC Alarm Log Entries
Event ID# | Aux Data | Associated Test | Alarm Level
1 (1) | | SME_ARB test | MAJOR | ON | testdupboard
2 (2) | | Standby_PN Check test | MAJOR | ON | testdupboard
Notes:
1. Event ID #1 SME_ARB test failed — This test determines the other server’s alarm-generation
capability by querying the local server’s Arbiter to report on the other server’s Remote
Alarm Service functionality. Both media servers must have their own alarm-generation
utilities. If the test fails 3 consecutive times, the following Major alarm is logged:
#1,ACT,SME,A,1,MAJ,Far-end alarm service is down
2. Event ID #2 Standby_PN_Check test failed — This periodic test:
1. Queries the standby media server about its PNs’ state of health
2. Compares the acquired information to an administered value
A failure indicates that a mismatch occurred and that the standby media server is out of
sync. If this test fails 2 consecutive times, the following Major alarm is logged:
System Technician-Demanded Tests: Descriptions and Error Codes
This MO provides no on-demand tests for system technicians.
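The RALM-SVC notes state that a Major alarm is logged only after a periodic test fails a set number of consecutive times (3 for the SME_ARB test, 2 for the Standby_PN_Check test). The following is a minimal sketch in Python of that counting rule, not the maintenance software itself. The class name is hypothetical, and the sketch assumes that a passing test resets the count, which the word "consecutive" implies but the text does not spell out.

```python
class ConsecutiveFailureAlarm:
    """Signal an alarm when a periodic test fails `threshold` times in a row."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.failures = 0

    def record(self, passed: bool) -> bool:
        """Record one test result; return True on the run that reaches the threshold."""
        if passed:
            self.failures = 0  # assumption: a pass clears the failure streak
            return False
        self.failures += 1
        return self.failures == self.threshold


# SME_ARB test: alarm after 3 consecutive failures.
sme_arb = ConsecutiveFailureAlarm(threshold=3)
results = [sme_arb.record(passed) for passed in (False, False, True, False, False, False)]
print(results)
```

In the sample sequence the single pass interrupts the first streak, so the alarm is signalled only on the final run.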
Page 78
SME Alarm in S8700 Media Server
Table 22: SME Alarm in Media Server on page 78 describes the SME alarm (for RALM-SVC
(Remote Alarm Service)) and its troubleshooting procedures.
Table 22: SME Alarm in Media Server
Event ID | Alarm Level | Alarm Text, Cause/Description, Recommendation
1 | MAJ | “Far-end alarm service is down” — No remote alarm service is available
since the other server is unable to report alarms — due to a failure of either
the GMM or administered reporting mechanisms (SNMP and/or modem).
1. Look for any GMM failures on the other server, either using the:
- Web interface, by selecting Diagnostics > View System Logs and
Watchdog Logs
- Linux command line, by entering logv -w or, directly, by examining
/var/log/ecs/wdlog.
2. If a GMM failure was found:
a. See if the GMM application is up, either from the:
- Web interface, by selecting View Process Status
- Linux command line, by entering statapp
b. If so, continue with Step 3.
If not, try to restart this application by entering start -s GMM on the
Linux command line.
c. If the GMM application successfully restarts, continue with Step 4.
If not, escalate the problem to the next higher tier.
3. If a GMM failure was not found, see if alarm reporting failed by looking in
the trace log for a string that includes “snd2Inads”, either from the:
- Web interface, by:
a. Selecting the View System Logs diagnostic and Logmanager
Debug trace
b. Specifying the “Event Range” for the appropriate time frame
c. Matching the “snd2Inads” pattern
- Linux command line, by entering logv -t ts
4. Test the administered reporting mechanisms, by entering testinads
on the Linux command line.
5. Once the alarm is resolved, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the appropriate
alarm, and Clear
- Linux command line, by entering almclear -n #id.
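The alarm string logged for the SME alarm (#1,ACT,SME,A,1,MAJ,Far-end alarm service is down) is a comma-separated record. The following is a minimal sketch in Python for splitting such a string. The names source, event_id, level and text are assumptions inferred from the examples in this chapter (the third field matches the MO abbreviation, the fifth the Event ID, the sixth the alarm level); the meaning of the other fields is not stated here, so they are kept under neutral names.

```python
from typing import NamedTuple


class AlarmRecord(NamedTuple):
    field1: str    # meaning not stated in this chapter
    field2: str    # meaning not stated in this chapter
    source: str    # assumed: MO abbreviation, e.g. SME
    field4: str    # meaning not stated in this chapter
    event_id: int  # assumed: Event ID
    level: str     # assumed: alarm level, e.g. MAJ
    text: str      # remaining text, which may itself contain commas


def parse_alarm_string(raw: str) -> AlarmRecord:
    """Split an alarm string into seven fields, keeping commas inside the text."""
    parts = [part.strip() for part in raw.split(",", 6)]
    if len(parts) != 7:
        raise ValueError("expected 7 comma-separated fields, got %d" % len(parts))
    return AlarmRecord(parts[0], parts[1], parts[2], parts[3],
                       int(parts[4]), parts[5], parts[6])


record = parse_alarm_string("#1,ACT,SME,A,1,MAJ,Far-end alarm service is down")
print(record.source, record.event_id, record.level)
```

Limiting the split to six separators keeps any commas in the descriptive text from being treated as field boundaries.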
Page 79
SVC_MON (Service Monitor)
S8700 | 8710 / S8500
SVC_MON is a media server process, started by Watchdog, to monitor Linux services and
daemons. It also starts up threads to communicate with a hardware-sanity device.
Table 23: SVC_MON Alarms in Media Server on page 80 describes the SVC_MON MO’s
alarms and their troubleshooting procedures.
For information about Watchdog, see _WD (Watchdog) Alarms on page 99.
Page 80
Table 23: SVC_MON Alarms in Media Server
Event ID | Alarm Level | Alarm Text, Cause/Description, Recommendation
1 | MIN | “service atd could not be restarted” — The Linux at daemon is down.
Scheduled services such as session cleanup or daily filesync will not work.
1. From the /sbin directory type service atd restart to restart the at
daemon.
2. If the daemon restarts, manually clear the alarm, either from the:
● Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
● Linux command line, by entering almclear -n #id
If not, escalate this problem for explicit guidance with steps 2a through 3.
a. Enter grep svc_mon /var/log/messages to investigate why the
daemon failed.
CAUTION:
Since the following commands cause a brief service outage,
they should only be executed at the customer’s convenience.
b. If the grep command’s output does not help:
● S8700 | 8710: enter server to verify that the suspected server is the
standby. If necessary and at the customer’s convenience, enter
server -if to force a server interchange.
● S8500: Proceed to Step d.
c. S8700 | 8710: Reboot the standby server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
d. S8500: Reboot the server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
3. If rebooting the standby does not help or if the problem recurs, escalate
the problem to the next higher tier.
Page 81
2 | MIN | “service crond could not be restarted” — The Linux cron daemon is down.
Periodic services such as session cleanup or daily filesync will not work.
1. Enter /sbin/service crond restart to restart the cron daemon.
2. If the daemon restarts, manually clear the alarm, either from the:
● Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
● Linux command line, by entering almclear -n #id
If not, escalate this problem for explicit guidance with steps 2a through 3.
a. Enter grep svc_mon /var/log/messages to investigate why the
daemon failed.
Since the following commands cause a brief service outage, they should only
be executed at the customer’s convenience.
b. If the grep command’s output does not help:
● S8700 | 8710: enter server to verify that the suspected server is the
standby. If necessary and at the customer’s convenience, enter
server -if to force a server interchange.
● S8500: Proceed to Step d.
c. S8700 | 8710: Reboot the standby server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
d. S8500: Reboot the server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
3. If rebooting the standby does not help or if the problem recurs, escalate
the problem to the next higher tier.
Page 82
3 | MIN | “service inet could not be restarted” — The Linux internet server daemon is
down. Networking services will not work.
1. Enter /sbin/service inet restart to restart the inet daemon.
2. If the daemon restarts, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the appropriate
alarm, and Clear
- Linux command line, by entering almclear -n #id
If not, escalate this problem for explicit guidance with steps 2a through 3.
a. Enter grep svc_mon /var/log/messages to investigate why the
daemon failed.
b. If this problem affects call processing, continue with the following
steps now.
If not, continue only at the customer’s convenience – since the
following commands cause a brief service outage.
The following commands cause a brief service outage.
c. If the grep command’s output does not help:
● S8700 | 8710: enter server to verify that the suspected server is the
standby. If necessary and at the customer’s convenience, enter
server -if to force a server interchange.
● S8500: Proceed to Step e.
d. S8700 | 8710: Reboot the standby server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
e. S8500: Reboot the server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
3. If rebooting the standby does not help or if the problem recurs, escalate
the problem to the next higher tier.
Page 83
4 | MIN | “service syslog could not be restarted” — Linux “syslog” service is down.
Event logging to syslog and alarm generation will fail.
1. Enter /sbin/service syslog restart to restart the syslog service.
2. If the service restarts, manually clear the alarm, either from the:
● Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
● Linux command line, by entering almclear -n #id
If not, escalate this problem for explicit guidance with steps 2a through 3.
a. Enter grep svc_mon /var/log/messages to investigate why the
service failed.
Since the following commands cause a brief service outage, they should only
be executed at the customer’s convenience.
b. If the grep command’s output does not help:
● S8700 | 8710: enter server to verify that the suspected server is the
standby. If necessary and at the customer’s convenience, enter
server -if to force a server interchange.
● S8500: Proceed to Step d.
c. S8700 | 8710: Reboot the standby server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
d. S8500: Reboot the server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
3. If rebooting the standby does not help or if the problem recurs, escalate
the problem to the next higher tier.
Page 84
5 | MIN | “service xntpd could not be restarted” — The Linux network time protocol
daemon is down. The server’s clock and recently logged time stamps may
be inaccurate.
1. Enter /sbin/service xntpd restart to restart the xntpd daemon.
2. If the daemon restarts, manually clear the alarm, either from the:
● Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
● Linux command line, by entering almclear -n #id
If not, escalate this problem for explicit guidance with steps 2a through 3.
a. Enter grep svc_mon /var/log/messages to investigate why the
daemon failed.
b. If this problem affects call processing, continue with the following
steps now.
If not, continue only at the customer’s convenience, since the
following commands cause a brief service outage.
The following commands cause a brief service outage.
c. If the grep command’s output does not help:
● S8700 | 8710: enter server to verify that the suspected server is the
standby. If necessary and at the customer’s convenience, enter
server -if to force a server interchange.
● S8500: Proceed to Step e.
d. S8700 | 8710: Reboot the standby server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
e. S8500: Reboot the server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
3. If rebooting the standby does not help or if the problem recurs, escalate
the problem to the next higher tier.
Page 85
6 | MIN | “service dbgserv could not be restarted” — Debug server is down, and
Gemini debugger may not work. Although losing this service does not affect
operations, the debugging of a running system is prevented.
1. Enter /sbin/service dbgserv restart to restart the dbgserv
service.
2. If the service restarts, manually clear the alarm, either from the:
● Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
● Linux command line, by entering almclear -n #id
If not, escalate this problem for explicit guidance with steps 2a through 3.
a. Enter grep svc_mon /var/log/messages to investigate why the
service failed.
Since the following commands cause a brief service outage, they should only
be executed at the customer’s convenience.
b. If the grep command’s output does not help:
● S8700 | 8710: enter server to verify that the suspected server is the
standby. If necessary and at the customer’s convenience, enter
server -if to force a server interchange.
● S8500: Proceed to Step d.
c. S8700 | 8710: Reboot the standby server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
d. S8500: Reboot the server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
3. If rebooting the standby does not help or if the problem recurs, escalate
the problem to the next higher tier.
Page 86
7 | MIN | “service prune could not be restarted” — The prune service is not running.
The hard disk’s partition usage is not being monitored or cleaned.
1. Enter /sbin/service prune restart to restart the prune service.
2. If the service restarts, manually clear the alarm, either from the:
● Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
● Linux command line, by entering almclear -n #id
If not, escalate this problem for explicit guidance with steps 2a through 3.
a. Enter grep svc_mon /var/log/messages to investigate why the
daemon failed.
b. If this problem affects call processing, continue with the following
steps now.
If not, continue only at the customer’s convenience – since the
following commands cause a brief service outage.
The following commands cause a brief service outage.
c. If the grep command’s output does not help:
● S8700 | 8710: enter server to verify that the suspected server is the
standby. If necessary and at the customer’s convenience, enter
server -if to force a server interchange.
● S8500: Proceed to Step e.
d. S8700 | 8710: Reboot the standby server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
e. S8500: Reboot the server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
3. If rebooting the standby does not help or if the problem recurs, escalate
the problem to the next higher tier.
Page 87
8 | MIN | “service httpd could not be restarted” — The hypertext transfer protocol
daemon is down. The Web interface will not work.
1. Enter /sbin/service httpd restart to restart the http daemon.
2. If the daemon restarts, manually clear the alarm, either from the:
● Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
● Linux command line, by entering almclear -n #id
If not, escalate this problem for explicit guidance with steps 2a through 3.
a. Enter grep svc_mon /var/log/messages to investigate why the
daemon failed.
Since the following commands cause a brief service outage, they should only
be executed at the customer’s convenience.
b. If the grep command’s output does not help:
● S8700 | 8710: enter server to verify that the suspected server is the
standby. If necessary and at the customer’s convenience, enter
server -if to force a server interchange.
● S8500: Proceed to Step d.
c. S8700 | 8710: Reboot the standby server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
d. S8500: Reboot the server, either from the:
● Web interface, by selecting Shutdown This Server
● Linux command line, by entering /sbin/shutdown -r now
3. If rebooting the standby does not help or if the problem recurs, escalate
the problem to the next higher tier.
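Every SVC_MON alarm in Table 23 follows the same pattern: the alarm text names a service, and step 1 restarts it with the Linux service command. The following is a minimal sketch in Python that extracts the service name from the alarm text and builds the matching command string. The function names are hypothetical, and the sketch assumes that the name embedded in the alarm text is the name the service command expects, so confirm the result against step 1 of the relevant table row before running it.

```python
import re
from typing import Optional

# Alarm text pattern shared by the Table 23 entries.
ALARM_TEXT = re.compile(r"service (\S+) could not be restarted")


def service_from_alarm(text: str) -> Optional[str]:
    """Return the service name embedded in an SVC_MON alarm text, if any."""
    match = ALARM_TEXT.search(text)
    return match.group(1) if match else None


def restart_command(service: str) -> str:
    """Build the restart command string in the form used by the step 1 entries."""
    return "/sbin/service %s restart" % service


name = service_from_alarm("service httpd could not be restarted")
if name is not None:
    print(restart_command(name))
```

The sketch only builds the command text; it deliberately does not execute anything on the server.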
Page 88
_TM (Translation Manager)
S8700 | 8710 / S8500 / S8300
The _TM MO monitors the media server’s ability to read Communication Manager
translations. Table 24: TM Alarm in Media Server on page 88 describes the _TM MO’s
alarm and its troubleshooting procedures.
Table 24: TM Alarm in Media Server
Event ID | Alarm Level | Alarm Text, Cause/Description, Recommendation
1 | MAJ | “Cannot read translations” — Server could not read translations. This usually
indicates a failure loading translations, but can also infrequently occur on a
running system.
S8700 | 8710: The servers spontaneously interchange.
S8500: The server reboots.
1. Check the integrity of the translation files xln1 and xln2 in /etc/opt/defty,
and verify that they are of the same non-zero length.
2. From the /etc/opt/defty directory, enter the Linux command cksum xln1 xln2
to verify that the checksums of the files are identical.
3. S8700 | 8710: Copy the translation files from the backup or the other
server.
4. S8500: Copy the translation files from the backup.
5. If Steps 1 to 3 do not help, load the system with null translations.
6. If the system comes up, this is probably a translation problem.
If not, escalate the problem.
7. Once resolved, manually clear the alarm, either from the:
- Web interface, by selecting Alarms and Notification, the
appropriate alarm, and Clear
- Linux command line, by entering almclear -n #id
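Steps 1 and 2 of the _TM procedure check that the two translation files have the same non-zero length and the same checksum. The following is a minimal sketch in Python of the same comparison, demonstrated on temporary files rather than the real xln1 and xln2. The function names are hypothetical. The sketch uses zlib.crc32, which is a different algorithm from the Linux cksum command, so its values will not match cksum output; only the equality test is equivalent.

```python
import os
import tempfile
import zlib


def crc32_of(path: str) -> int:
    """Compute a CRC-32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def translations_consistent(path_a: str, path_b: str) -> bool:
    """True if both files have the same non-zero length and the same CRC-32."""
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)
    if size_a == 0 or size_a != size_b:
        return False
    return crc32_of(path_a) == crc32_of(path_b)


# Demonstration on temporary stand-in files (not the real translation files).
with tempfile.TemporaryDirectory() as tmp:
    copy_a = os.path.join(tmp, "copy_a")
    copy_b = os.path.join(tmp, "copy_b")
    for path in (copy_a, copy_b):
        with open(path, "wb") as handle:
            handle.write(b"example translation data")
    same = translations_consistent(copy_a, copy_b)
    with open(copy_b, "wb") as handle:
        handle.write(b"example translation dat4")  # same length, different content
    different = translations_consistent(copy_a, copy_b)
print(same, different)
```

On a server, the two paths would be the xln1 and xln2 files in /etc/opt/defty named in step 1.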
Page 89
UPS (Uninterruptible Power Supply)
S8700 | 8710 / S8500 / S8300
The UPS MO supports the UPS device for each media server. This MO’s maintenance software
reacts to UPS-generated in-line errors via SNMP traps.
Note:
Note:This MO provides no periodic or on-demand tests.
UPS Traps to Media Server
Table 25: Enterprise-Specific UPS Traps to Server on page 89 contains the various
UPS-generated SNMP traps to the media server.
Table 25: Enterprise-Specific UPS Traps to Server
SNMP Trap from UPS | Event ID | Definition of Trap
Trap (1) | #6 (1–8) | Alarm string = #1, ACT, UPS, A, 6, MAJ, power failure: Shutting down in 6 minutes.
If the Event ID is 6, the UPS battery’s power is in a
critically low condition, with an estimated 6 minutes of
remaining holdover.
● A warning is written to every logged-in user of the
server.
● When 6 minutes elapse, the server begins shutting
down.
(For troubleshooting procedures, see Events #1–8 on page 92)
Trap (1) | #7 (1–8) | Alarm string = #1, ACT, UPS, A, 7, MAJ, power failure: Shutting down in 7 minutes.
The UPS battery’s power is in a critically low condition,
with an estimated 7 minutes of remaining holdover.
● A warning is written to every logged-in user of the
server.
● When 7 minutes elapse, the server begins shutting
down.
(For troubleshooting procedures, see Events #1–8 on page 92)
Page 90
Server Alarms
Table 25: Enterprise-Specific UPS Traps to Server (continued)
SNMP Trap from UPSEventIDDefinition of Trap
Trap (1)#8
(1–8
Trap (3)
#12
upsAlarmShutdownPend
ing
Trap (3)
1
#13
upsAlarmShutdownImmi
nent
Trap (3)
1
#14
upsAlarmDepletedBatter
y
Alarm string = #1, ACT, UPS, A, 8, MAJ, power failure:
Shutting down in 8 minutes.
)
The UPS battery’s power is in a critically low condition,
with an estimated 8 minutes of remaining holdover.
● A warning is written to every logged-in user of the
server.
● When 8 minutes elapse, the server begins shutting
down.
(For troubleshooting procedures, see Events #1–8
page 92)
Alarm string = #1, ACT, UPS, A, 12, WRN, Miscellaneous
trap, e.g., bad battery.
(For troubleshooting procedures, see Event #12
page 92)
Alarm string = #1,ACT, UPS, A, 13, MAJ, Miscellaneous
trap, e.g., bad battery.
(For troubleshooting procedures, see Event #13
page 92)
Alarm string = #1,ACT,UPS,A,14,MAJ,Miscellaneous
trap, e.g., bad battery.
(For troubleshooting procedures, see Event #14
page 93)
on
on
on
on
Trap (3)
1
upsAlarmBatteryBad
#15
Alarm string = #1,ACT,UPS,A,15,MIN,Miscellaneous
trap, e.g., bad battery
(For troubleshooting procedures, see Event #15
on
page 93)
Trap (3)
upsAlarmInputBad
#16
Alarm string = #1,ACT,UPS,A,16,MIN,Miscellaneous
trap, e.g., bad battery
(For troubleshooting procedures, see Event #16
on
page 93)
Trap (3)
upsAlarmTempBad
#17
Alarm string = #1,ACT,UPS,A,17,WRN,Miscellaneous
trap, e.g., bad battery
(For troubleshooting procedures, see Event #17
on
page 93)
90 Maintenance Procedures for Avaya Communication Manager 3.0, Media Gateways and Servers
2 of 4
Page 91
Table 25: Enterprise-Specific UPS Traps to Server (continued)
SNMP Trap from UPSEventIDDefinition of Trap
Alarms in Linux Media Servers
Trap (3)
upsAlarmCommunicatio
nsLost
Trap (3)
upsAlarmBypassBad
Trap (3)
upsAlarmLowBattery
Trap (3)
upsAlarmUpsOutputOff
Trap (3)
upsAlarmOutputBad
Trap (3)
upsAlarmOutputOverloa
d
#18
#19
#20
#21
#22
#23
Alarm string = #1,ACT,UPS,A,18,WRN,Miscellaneous
trap, e.g., bad battery
(For troubleshooting procedures, see Event #18
on
page 93)
Alarm string = #1,ACT,UPS,A,19,WRN,Miscellaneous
trap, e.g., bad battery
(For troubleshooting procedures, see Event #19
on
page 94)
Alarm string = #1,ACT,UPS,A,20,WRN,Miscellaneous
trap, e.g., bad battery
(For troubleshooting procedures, see Event #20
on
page 94)
Alarm string = #1,ACT,UPS,A,21,WRN,Miscellaneous
trap, e.g., bad battery
(For troubleshooting procedures, see Event #21
on
page 94)
Alarm string = #1,ACT,UPS,A,22,WRN,Miscellaneous
trap, e.g., bad battery
(For troubleshooting procedures, see Event #22
on
page 94)
Alarm string = #1,ACT,UPS,A,23,WRN,Miscellaneous
trap, e.g., bad battery
(For troubleshooting procedures, see Event #23
on
page 94)
Trap (3)
upsAlarmChargerFailed
Trap (3) –
upsAlarmFanFailure
Trap (3) –
upsAlarmFuseFailure
#24
#25
#26
Alarm string = #1,ACT,UPS,A,24,WRN,Miscellaneous
trap, e.g., bad battery
(For troubleshooting procedures, see Event #24
on
page 94)
Alarm string = #1,ACT,UPS,A,25,WRN,Miscellaneous
trap, e.g., bad battery
(For troubleshooting procedures, see Event #25
on
page 94)
Alarm string = #1,ACT,UPS,A,26,WRN,Miscellaneous
trap, e.g., bad battery
(For troubleshooting procedures, see Event #26
on
page 95)
3 of 4
Issue 1 June 200591
Page 92
Server Alarms
Table 25: Enterprise-Specific UPS Traps to Server (continued)
SNMP Trap from UPSEventIDDefinition of Trap
Trap (3) –
upsAlarmGeneralFault
1. These events will degrade the server’s state of health.
#27
#1,ACT,UPS,A,27,WRN,Miscellaneous trap, e.g., bad
battery
(For troubleshooting procedures, see Event #27
page 95)
System Technician-Demanded Tests:
Descriptions and Error Codes
This MO provides no on-demand tests for system technicians.
UPS Alarms to the Media Server
S8700 | 8710 / S8500 / S8300
Table 26: UPS Alarms to the Media Server on page 92 describes the server’s UPS-related
alarms and their troubleshooting procedures.
Table 26: UPS Alarms to the Media Server

Event ID | Alarm Level | Alarm Text, Cause/Description, Recommendation

1–8 | MAJ | “upsEstimatedMinutesRemaining”: UPS does not have an AC-power source.
1. Restore AC power to the UPS.

12 | MAJ | “upsAlarmShutdownPending”: A shutdown-after-delay countdown is underway (i.e., the UPS has been commanded off).
1. Stop countdown timer. (Can be done via SNMP messages.)

13 | MAJ | “upsAlarmShutdownImminent”: The UPS will turn off power to the load in < 5 seconds.
1. Restore AC power to the UPS.

14 | MAJ | “upsAlarmDepletedBattery”: If primary power is lost, the UPS could not sustain the current load.
1. Charge or replace the batteries in the UPS, according to procedures in its Site Preparation, Installation, and Operator’s Manual, 167-405-035.

15 | MAJ | “upsAlarmBatteryBad”: One or more batteries need to be replaced.
1. Replace any defective batteries in the UPS, according to procedures in its Site Preparation, Installation, and Operator’s Manual, 167-405-035.

16 | MIN | “upsAlarmInputBad”: An input condition is out of tolerance.
1. Provide appropriate AC power to the UPS.

17 | MIN | “upsAlarmTempBad”: The internal temperature of a UPS is out of tolerance. (On the UPS, the “over temperature” alarm indicator flashes, and the UPS changes to Bypass mode for cooling.)
1. Look for and remove any obstructions to the UPS’s fans.
2. Wait at least 5 minutes, and restart the UPS.
3. Check for and resolve any fan alarms (Event ID 25) against the UPS.
4. Either:
● Change (increase or decrease) the environment’s temperature.
● Change the alarming thresholds.

18 | MIN | “upsAlarmCommunicationsLost”: The SNMP agent and the UPS are having communications problems. (A UPS diagnosis may be required.)
1. Behind the UPS in its upper left-hand corner, verify that an SNMP card (with an RJ45 connector) resides in the UPS, instead of a serial card with DB9 and DB25 connectors.
2. Verify that the server is physically connected to the UPS via the RJ45 connector.
3. Verify that the SNMP card is properly administered according to the procedures in its user’s guide, provided by the vendor.
4. If necessary, replace the SNMP card in the UPS.
5. If the problem persists, replace the UPS, and diagnose it later.

19 | WRN | “upsAlarmBypassBad”: The “source” power to the UPS, which (during a UPS overload or failure) also serves as “bypass” power to the load, is out of tolerance: incorrect voltage by > ±12% or frequency by > ±3%.
(This on-line UPS normally regenerates its source power into clean AC power for the load. However, the source power’s quality is currently unacceptable as bypass power to the load.)
1. Verify that the UPS expects the correct “nominal input voltage” from its power source.
2. If so, restore acceptable AC power to the UPS.
If not, reconfigure the UPS to expect the correct voltage, according to procedures in its Site Preparation, Installation, and Operator’s Manual, 167-405-035.

20 | WRN | “upsAlarmLowBattery”: The battery’s remaining run time is ≤ the specified threshold.
1. Restore AC power to the UPS.

21 | WRN | “upsAlarmUpsOutputOff”: As requested, the UPS has shut down output power. The UPS is in Standby mode.
1. Turn on output power. (Can be done via SNMP messages.)

22 | WRN | “upsAlarmOutputBad”: A receptacle’s output is out of tolerance. (A UPS diagnosis is required.)
1. Replace the UPS, and diagnose it later.

23 | WRN | “upsAlarmOutputOverload”: The load on the UPS exceeds its output capacity. The UPS enters Bypass mode.
1. Reduce the load on the UPS.
2. Verify that the UPS returns to Normal mode.

24 | WRN | “upsAlarmChargerFailed”: The UPS battery charger has failed. (A UPS diagnosis is required.)
1. Replace the UPS, and diagnose it later.

25 | WRN | “upsAlarmFanFailure”: One or more UPS fans have failed. Unless lightly loaded, the UPS enters Bypass mode.
1. Replace the UPS, and diagnose it later.

26 | WRN | “upsAlarmFuseFailure”: One or more UPS fuses have failed.
1. Replace the UPS, and diagnose it later.

27 | WRN | “upsAlarmGeneralFault”: A general fault occurred in the UPS. (A UPS diagnosis is required.)
1. Replace the UPS, and diagnose it later.
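The tolerance limits quoted for Event ID 19 (voltage off by more than ±12%, frequency off by more than ±3%) amount to a simple percentage-deviation check. The Python sketch below works through that arithmetic. It is an illustration only, not the UPS firmware's logic; the function names and the nominal values used in the example (120 V, 60 Hz) are assumptions.

```python
def out_of_tolerance(measured, nominal, limit_percent):
    """True when measured deviates from nominal by more than limit_percent."""
    deviation_percent = abs(measured - nominal) / nominal * 100.0
    return deviation_percent > limit_percent


def bypass_source_bad(volts, nominal_volts, hertz, nominal_hertz):
    """Apply the Event 19 limits: voltage beyond 12% or frequency beyond 3%."""
    return (out_of_tolerance(volts, nominal_volts, 12.0)
            or out_of_tolerance(hertz, nominal_hertz, 3.0))
```

With an assumed nominal input of 120 V at 60 Hz, a source measuring 100 V deviates by about 16.7% and would be flagged, while 110 V (about 8.3%) would not. A frequency of 62 Hz deviates by about 3.3% and would also be flagged.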
USB1 (Modem Testing) Alarms
S8700 | 8710 / S8500
This MO provides periodic and on-demand testing. The periodic test runs the “handshake” test
every 15 minutes. The on-demand tests include the “handshake”, “off-hook”, and “reset_usb”
tests.

MO’s Name (in Alarm Log) | Alarm Level | Initial Linux Command to Run | Full Name of MO
USB1 | MAJ | testmodem | USB Port Modem Testing
USB1 | MIN | testmodem | USB Port Modem Testing
USB1 | WRN | testmodem | USB Port Modem Testing

Usage: testmodem [-s] | [-l] | [-t arg] | [-?]
no argument   Performs “short” test.
-s            Short test; performs handshake and off-hook tests. This is also the default option.
-l            Performs “long” tests.
-t arg        Specific test to perform. Possible values for arg are: reset_usb | handshake | off-hook |
-?            Usage (this).
Alarm Log Entries
Table 27: USB1 Alarm Log Entries

Event ID# | Aux Data | Associated Test | Alarm Level | On/Off Board | Linux Test to Clear Value
1 (1) | | Handshake test | MIN | ON | testdupboard
2 (2) | | Modem Off-Hook test | MAJ | ON | testdupboard

Notes:
1. Event ID #1 Handshake Test failed: With 3 consecutive failures of either the periodic or
on-demand test, the following Minor alarm is logged:
#1,ACT,USB1,A,1,MIN,USB Modem handshake test failed
2. Event ID #2 Modem Off-Hook test failed: With 3 consecutive failures of this test, the
following Major alarm is logged:
#1,ACT,USB1,A,2,MAJ,USB Modem Off-Hook test failed

System Technician-Demanded Tests:
Descriptions and Error Codes
Always investigate tests in the order presented in the following table. For example, by clearing
error codes associated with the Handshake test, you may also clear errors generated by other
tests in the testing sequence.

Order of Investigation | Short Test Sequence | Long Test Sequence | D/ND (1)
Handshake test | X | X | D
Off-Hook test | X | X | D
Reset USB test | X | X | D
1. D = Destructive; ND = Nondestructive
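The notes to Table 27 state that an alarm is logged after 3 consecutive failures of a test. A minimal Python sketch of that counting rule follows. It rests on two assumptions that the manual does not spell out: that a passing result resets the count (implied by "consecutive"), and that the alarm is reported once, at the moment the third failure occurs. The class name is hypothetical.

```python
class ConsecutiveFailureAlarm:
    """Report an alarm after `threshold` failures in a row (3 in the notes)."""

    def __init__(self, alarm_string, threshold=3):
        self.alarm_string = alarm_string
        self.threshold = threshold
        self.failures = 0

    def record(self, passed):
        """Record one test result.

        Returns the alarm string at the moment the threshold is reached,
        otherwise None. A pass resets the failure count (an assumption).
        """
        if passed:
            self.failures = 0
            return None
        self.failures += 1
        if self.failures == self.threshold:
            return self.alarm_string
        return None
```

Feeding the results fail, fail, pass, fail, fail, fail into record() yields the alarm string only on the final call, because the pass in the middle breaks the run of failures.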
Handshake Test
This test is destructive.
This test verifies that the modem’s hardware is attached and that the modem can “handshake” with
the USB port. The test:
1. Tries to open the device
2. Sends a handshake string (ATZ) to the modem
The modem should return an “OK” string.
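The exchange described above (send ATZ, expect “OK”) can be sketched in Python as follows. This is not the testmodem implementation. The port interface (write and read methods), the FakeModem stand-in, and the "PASS" result label are all assumptions made so that the sketch can run without hardware. The 15-second default matches the interval quoted for the FAIL result in Table 28.

```python
import time


def handshake(port, timeout_seconds=15.0, clock=time.monotonic):
    """Send ATZ and return "PASS" if "OK" arrives before the timeout, else "FAIL".

    `port` is any object with write(bytes) and read() -> bytes, where read()
    returns b"" when no data is waiting (a hypothetical interface).
    """
    port.write(b"ATZ\r")
    deadline = clock() + timeout_seconds
    received = b""
    while clock() < deadline:
        received += port.read()
        if b"OK" in received:
            return "PASS"
        time.sleep(0.01)  # brief pause before polling again
    return "FAIL"


class FakeModem:
    """Stand-in for a modem, used only to exercise the sketch."""

    def __init__(self, responsive=True):
        self.responsive = responsive
        self.pending = b""

    def write(self, data):
        # A responsive modem queues an OK reply to the ATZ reset command.
        if self.responsive and data.startswith(b"ATZ"):
            self.pending = b"\r\nOK\r\n"

    def read(self):
        data, self.pending = self.pending, b""
        return data
```

Calling handshake(FakeModem()) returns "PASS", while handshake(FakeModem(responsive=False), timeout_seconds=0.05) returns "FAIL" once the shortened timeout expires.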
The following errors can be detected:
Table 28: TEST Handshake Test

Error Code | Test Result | Description / Recommendation

Modem in use, try again later | ABRT | Another application is currently using the modem. Try again later.

Could not open USB port | ABRT | System error: An attempt to open the USB device failed.
1. Retry the test in about 5 minutes.
2. If the test still fails, escalate the problem.

Read error, could not run test | ABRT | After the ATZ is sent, a read of the USB device is done. If the read fails, a probable system error aborted the test.
1. Retry the test in about 5 minutes.
2. If the test still fails, escalate the problem.

Modem Handshake test failed | FAIL | The modem did not return an “OK” string within a 15-second interval.

Off-Hook Test
This test is destructive.
This test runs an off-hook test for the modem attached to the USB port on the server. The test
sends an ATH1 string to check that a line is connected to the modem. The following errors can
be detected:
Table 29: TEST Off-Hook Test

Error Code | Test Result | Description / Recommendation

Modem in use, try again later | ABRT | Some other application is currently using the modem. Try again later.

Could not open USB port | ABRT | System error: An attempt to open the USB device failed. Try again later.

Read error, could not run test | ABRT | After the ATZ is sent, a read of the USB device is done. If the read fails, a probable system error aborted the test. Try again.

Modem Off-Hook test failed | FAIL | Modem Off-Hook test failed: The modem did not return an “OK” string within a 15-second interval. This indicates that dial tone was not detected in the allotted time.
Reset USB Test
This test is destructive.
This test causes the modem to be reset, even if the modem is in use. Although no failures are
produced by this test, the following error can be detected:
Table 30: TEST Reset USB

Error Code | Test Result | Description / Recommendation

Could not open USB port | ABRT | System error: An attempt to open the USB device failed. Try again later.
_WD (Watchdog) Alarms
S8700 | 8710 / S8500 / S8300
The Watchdog MO is a media server process that:
● Creates other Communication Manager processes
● Monitors their sanity
● Can recover their failures
These applications come up and start heartbeats to the Watchdog. For more information about
Watchdog, see Table 31: _WD Alarms in Media Server on page 100 that describes the _WD
MO’s alarms and their troubleshooting procedures.
Watchdog also starts up a script to monitor Linux services and daemons, and threads to
communicate with a hardware-sanity device. For alarm-related information about these
services, daemons, and threads, see SVC_MON (Service Monitor) on page 79.
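The heartbeat arrangement described above, in which applications report in and the Watchdog checks their sanity, can be illustrated with a small Python sketch. The manual does not describe Watchdog's internal mechanism, so everything here is an assumption: the class name, the timeout value, and the idea of flagging an application whose last heartbeat is older than that timeout.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat of each application and flag the silent ones."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock          # injectable so the sketch can be tested
        self.last_seen = {}

    def heartbeat(self, app_name):
        """Record that app_name reported in just now."""
        self.last_seen[app_name] = self.clock()

    def silent_apps(self):
        """Return, sorted, the applications whose last heartbeat is too old."""
        now = self.clock()
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout_seconds)
```

A supervising loop would call silent_apps() periodically and attempt recovery of whatever it returns. How many restart attempts are allowed before an application is declared failed is a separate policy that this sketch does not model.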
Table 31: _WD Alarms in Media Server

Event ID | Alarm Level | Alarm Text, Cause/Description, Recommendation

4 S8300 | MAJ | “Application <name> (pid) TOTALLY FAILED”: An application is present, but not launching. The application could not start the maximum allowed number of times. (This alarm usually occurs with Event ID #20.)
1. To verify the alarm, look for the application’s name or process ID (PID), using either the:
- Web interface, by selecting Diagnostics > View System Logs and Watchdog Logs
- Linux command line, by entering logv -w or, directly, by examining /var/log/ecs/wdlog.
2. If the application is down, enter start -s application to start the application.
3. If the application comes up, continue with Step 7.
If not, check the trace log to further investigate why the application fails, either from the:
- Web interface by:
a. Selecting the View System Logs diagnostic and Logmanager Debug trace
b. Specifying the Event Range for the appropriate time frame
c. Matching the application’s PID as the pattern
- Linux command line, by entering logv -t ts
Look for a related core-dump file in /var/crash, and escalate for an analysis of this file.
4. Verify that the executable file named in the log exists and is executable.
To locate the application’s executable file, enter the Linux command:
ls -l /opt/ecs/sbin/appl
If the executable is present, Linux returns a symbolic link to its location.
1 of 15
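The check in Step 4 (does the file exist, is it executable, and where does the symbolic link point) can also be expressed with Python's standard library. This is only a sketch of the same inspection that ls -l supports by eye; the function name describe_executable is hypothetical, and appl in the example path is the manual's placeholder for the application name.

```python
import os


def describe_executable(path):
    """Report whether path exists, is executable, and where a symlink resolves."""
    return {
        "exists": os.path.exists(path),          # False for a broken symlink
        "executable": os.access(path, os.X_OK),  # checked for the current user
        "is_symlink": os.path.islink(path),
        "resolved": os.path.realpath(path),      # final target after links
    }
```

For example, describe_executable("/opt/ecs/sbin/appl") returns a dictionary whose "resolved" entry shows the location that the symbolic link points to.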