Reproduction of these materials in any manner whatsoever without the written permission of Dell Inc.
is strictly forbidden.
Trademarks used in this text: Dell™, the DELL™ logo, PowerEdge™, and PowerVault™ are
trademarks of Dell Inc.; EMC® and PowerPath® are trademarks of EMC Corporation; Intel®,
Pentium®, and Celeron® are registered trademarks of Intel Corporation in the U.S. and other countries;
Oracle® is a registered trademark of Oracle Inc. in the US and other countries; Red Hat® and Red Hat
Enterprise Linux® are registered trademarks of Red Hat, Inc. in the U.S. and other countries.
Other trademarks and trade names may be used in this publication to refer to either the entities claiming
the marks and names or their products. Dell Inc. disclaims any proprietary interest in trademarks and
trade names other than its own.
July 2010 Rev. A02
The Oracle Database on Linux Advanced Server Troubleshooting Guide
applies to Oracle Database 10g R2 running on Red Hat Enterprise Linux
or Oracle Enterprise Linux 5.5 AS x86_64.
Required Documentation for Deploying the Dell|Oracle Database
The Dell|Oracle Database Installation Documentation set is organized into a
series of modules. These modules cover the following topics:
• Dell PowerEdge Systems Oracle Database on Enterprise Linux x86_64 Operating System and
Hardware Installation Guide: Describes the required minimum hardware and software versions,
how to install and configure the operating system, how to verify the hardware and software
configurations, and how to obtain open source files.
• Dell PowerEdge Systems Oracle Database on Enterprise Linux x86_64 Storage and Network
Guide: Describes how to install and configure the network and storage solutions.
• Dell PowerEdge Systems Oracle Database on Enterprise Linux x86_64 Database Setup and
Installation Guide: Describes how to install and configure the Oracle database.
• Dell PowerEdge Systems Oracle Database on Enterprise Linux x86_64 Troubleshooting
Guide: Describes how to troubleshoot and resolve errors encountered during the installation
procedures described in the previous modules.
All modules provide information on how to receive technical assistance
from Dell.
Terminology Used in this Document
This document uses the terms logical unit number (LUN) and virtual disk.
These terms are synonymous and can be used interchangeably. The term
LUN is commonly used in a Dell/EMC Fibre Channel storage system
environment, and virtual disk is commonly used in a Dell PowerVault SAS
and iSCSI (Dell MD3000 and Dell MD3000i with MD1000 expansion)
storage environment.
In this document, the term Enterprise Linux applies to both
Red Hat Enterprise Linux and Oracle Enterprise Linux unless
stated otherwise.
Getting Help
This section provides information on contacting Dell or Oracle for
whitepapers, supported configurations, training, technical support,
and general information.
Dell Support
• For detailed information about using your system, see the documentation
that came with your system components.
• For whitepapers, Dell-supported configurations, and general information,
see the Dell|Oracle Tested and Validated Configurations website at
dell.com/oracle.
Troubleshooting
This section provides recommended actions for problems that you
may encounter while deploying and using your Enterprise Linux and
Oracle software.
Performance and Stability
Enterprise Linux exhibits poor performance and instability;
excessive use of swap space
When the Oracle System Global Area (SGA) exceeds the recommended size,
Enterprise Linux exhibits poor performance. Always ensure that the SGA size
does not exceed 65% of the total system RAM. To decrease the SGA size:
Enter free at a command prompt to determine the total RAM, and reduce
the values of the db_cache_size and shared_pool_size parameters in the Oracle
parameter file accordingly.
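The 65% ceiling can be worked out with a few lines of shell. The following is a minimal sketch, not part of the Dell procedure: the function name sga_limit_kb is hypothetical, and the commented example reads MemTotal from /proc/meminfo as an alternative to reading the output of free by eye.

```shell
# Hypothetical helper: print 65% of a RAM total given in kB.
# $1 = total system RAM in kB.
sga_limit_kb() {
    total_kb=$1
    echo $(( total_kb * 65 / 100 ))
}

# Example use on Linux, where MemTotal in /proc/meminfo is reported in kB:
#   sga_limit_kb "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

The printed value is an upper bound for the whole SGA, so the combined db_cache_size and shared_pool_size settings should stay below it.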
Unknown interface-type warning appears in the Oracle alert file;
poor system performance
The problem occurs when the public interface is configured as the cluster
communications (private) interface.
Perform the following steps on one node to force cluster communications to
the private interface:
a
Log in as oracle.
b
Enter sqlplus "/ as sysdba" at the command prompt.
The SQL> prompt appears.
c
Enter the following lines at the SQL> prompt:
alter system set cluster_interconnects='<private IP address node1>' scope=spfile sid='<SID1>';
alter system set cluster_interconnects='<private IP address node2>' scope=spfile sid='<SID2>';
Create these entries for each node in the cluster.
d
Restart the database on all nodes.
e
Open the /opt/oracle/admin/<dbname>/bdump/alert_<SID>.log
file, and verify that the private IP addresses are being used for all
instances.
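Because one statement is needed per node, it can help to generate the statements rather than type them. The following is a rough sketch under that assumption: the function name interconnect_sql and the sample addresses and SIDs are placeholders, not values from this guide.

```shell
# Hypothetical helper: print one ALTER SYSTEM statement for a node.
# $1 = private IP address of the node, $2 = SID of the instance on that node.
interconnect_sql() {
    printf "alter system set cluster_interconnects='%s' scope=spfile sid='%s';\n" "$1" "$2"
}

# Example with placeholder values (not from a real cluster):
#   interconnect_sql 10.0.0.1 orcl1
#   interconnect_sql 10.0.0.2 orcl2
```

The output is plain text that can be pasted at the SQL> prompt. The helper does not connect to the database itself.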
Enterprise Manager
The Enterprise Manager agent fails
The Enterprise Manager agent fails when the Enterprise Manager repository is
not populated.
Enter the following to re-create the configuration file and repository for the
Database Console:
emca -config dbcontrol db -repos recreate
For detailed instructions, see Oracle Metalink Note #330976.1.
Oracle Cluster File System 2 (OCFS2)
System hangs while mounting or unmounting OCFS partitions
The problem occurs when OCFS partitions are unmounted on two nodes at
exactly the same time.
CAUTION: Do not reboot more than one system at the same time.
NOTE: It is not recommended to restart the network on a live node. When trying to
restart the network service from any live node in the cluster, the node hangs
indefinitely. This behavior is expected for OCFS2.
Network Configuration Assistant (NETCA)
NETCA fails, resulting in database creation errors
NETCA fails because the public network, hostname, or virtual IP is not listed
in the /etc/hosts.equiv file.
Before launching NETCA, ensure that a hostname is assigned to the public
network and that the public and virtual IP addresses are listed in the
/etc/hosts.equiv file.
NETCA cannot configure remote nodes or a RAW device validation error
occurs while running DBCA
This issue occurs when the /etc/hosts.equiv file either does not exist or does
not include the assigned public or virtual IP addresses.
Verify that the /etc/hosts.equiv file on each node contains the correct public
and virtual IP addresses. As the user oracle, try to rsh to the other public
names and VIP addresses.
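A quick way to spot a missing entry before launching NETCA is to compare a list of expected names against the file. This is a minimal sketch: the function name missing_entries and the sample node names are hypothetical, and the check only looks at the first field of each line.

```shell
# Hypothetical helper: print each name or address that does not appear
# as the first field of any line in the given file.
# $1 = file to check (for example /etc/hosts.equiv); remaining args = entries.
missing_entries() {
    file=$1
    shift
    for entry in "$@"; do
        # awk exits 0 only when some line has the entry as its first field.
        if ! awk -v e="$entry" '$1 == e { found = 1 } END { exit !found }' "$file" 2>/dev/null; then
            echo "$entry"
        fi
    done
}

# Example with placeholder names:
#   missing_entries /etc/hosts.equiv node1-pub node1-vip node2-pub node2-vip
```

No output means every listed entry was found. If the file does not exist, every entry is reported as missing.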
Cluster Ready Services (CRS)
CRS fails prematurely when trying to start
See Oracle Bug 4698419 on the My Oracle Support website at
support.oracle.com.
Apply Patch 4698419, available on the My Oracle Support website at
support.oracle.com.
The Oracle Clusterware installation procedure fails
The Oracle Clusterware installation fails because the EMC PowerPath device
names are not uniform across the nodes.
Before you install Oracle Clusterware, restart PowerPath and ensure that the
PowerPath device names are uniform across the nodes.
CRS fails to start when you reboot the nodes, or after entering
/etc/init.d/init.crs start
CRS fails to start when the Cluster Ready Services CSS daemon is unable to
write to the quorum disk.
Attempt to start the service again by rebooting the node or by running
root.sh from /crs/oracle/product/11.1.0/crs/.
Verify that each node has access to the quorum disk and that the user logged in as
root can write to the disk.
Check the last line in the file $ORA_CRS_HOME/css/log/ocssd.log.
If you see:
clssnmvWriteBlocks: Failed to flush writes to (votingdisk)
then verify the following:
• The /etc/hosts file on each node contains the correct IP addresses for all
node hostnames, including the virtual IP addresses.
• You can ping the public and private hostnames.
• The Oracle Cluster Registry (OCR) file and Voting disk are writable.
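The check of the last log line can be scripted. The sketch below is an illustration only: the function name voting_disk_write_failed is hypothetical, and it matches just the fixed part of the message quoted above.

```shell
# Hypothetical helper: succeed (exit 0) if the last line of the given log
# contains the voting disk write failure message.
# $1 = path to the log, for example $ORA_CRS_HOME/css/log/ocssd.log.
voting_disk_write_failed() {
    tail -n 1 "$1" | grep -q 'clssnmvWriteBlocks: Failed to flush writes'
}

# Example:
#   if voting_disk_write_failed "$ORA_CRS_HOME/css/log/ocssd.log"; then
#       echo "Check /etc/hosts, ping the hostnames, and test OCR and Voting disk writes"
#   fi
```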
When you run root.sh, CRS fails to start
NOTE: Ensure that you have all the public and private node names defined and you
can ping the node names.
Attempt to start the service again by rebooting the node or by running
root.sh from /crs/oracle/product/11.1.0/crs/ after correcting the
networking issues.
The following issues can result in CRS failure:
• The OCR file and Voting disk are inaccessible.
Correct the I/O problem and attempt to start the service again by
rebooting the node or by running root.sh from
/crs/oracle/product/11.1.0/crs/.
• The OCR file and Voting disk have not been cleared and contain
old information.
Clear the OCR and Voting disks to erase the old information.
You can do this on RHEL4 by entering the following lines:
dd if=/dev/zero of=/dev/raw/ocr.dbf
dd if=/dev/zero of=/dev/raw/votingdisk
Attempt to start the service again by rebooting the node or by running
root.sh from /crs/oracle/product/11.1.0/crs/.
• The oracle user does not have permissions on /var/tmp
(specifically /var/tmp/.oracle).
a
Make the oracle user the owner of /var/tmp/.oracle by entering the
following command:
chown oracle.oinstall /var/tmp/.oracle
b
Attempt to start the service again by rebooting the node or by running
root.sh from /crs/oracle/product/11.1.0/crs/.
If all the other CRS troubleshooting steps fail, then perform the following:
a
Enable debugging by adding the following line to root.sh:
set -x
b
Attempt to start the service again by running root.sh from
/crs/oracle/product/11.1.0/crs/.
c
Check log files in the following directories to diagnose the issue:
$ORA_CRS_HOME/crs/log
$ORA_CRS_HOME/crs/init
$ORA_CRS_HOME/css/log
$ORA_CRS_HOME/css/init
$ORA_CRS_HOME/evm/log
$ORA_CRS_HOME/evm/init
$ORA_CRS_HOME/srvm/log
d
Check /var/log/messages for any error messages regarding CRS init scripts.
e
Capture all log files for support diagnosis.
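Capturing the log directories for support can be done in one archive. The following is a sketch under stated assumptions: the function name collect_crs_logs and the output file name are hypothetical, only directories that exist are archived, and /var/log/messages is not included and would need to be copied separately.

```shell
# Hypothetical helper: archive the CRS log directories that exist.
# $1 = CRS home (the value of $ORA_CRS_HOME), $2 = output .tar.gz path.
collect_crs_logs() {
    crs_home=$1
    out=$2
    set --                      # reuse the positional list for found directories
    for d in crs/log crs/init css/log css/init evm/log evm/init srvm/log; do
        if [ -d "$crs_home/$d" ]; then
            set -- "$@" "$d"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no log directories found under $crs_home" >&2
        return 1
    fi
    # -C keeps the paths inside the archive relative to the CRS home.
    tar -czf "$out" -C "$crs_home" "$@"
}

# Example:
#   collect_crs_logs "$ORA_CRS_HOME" /tmp/crs_logs.tar.gz
```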
Node continuously reboots
Node reboots continuously when the node does not have access to the
quorum disk on shared storage.
Perform the following steps:
a
Start Linux in single-user mode and enter the following command:
/etc/init.d/init.crs disable
b
Verify that the quorum disk is available and the private interconnect
is alive.
c
Reboot and type:
/etc/init.d/init.crs enable
If the private interconnect is down:
a
Start Linux in single-user mode.
b
Enter the following command:
/etc/init.d/init.crs disable
c
Verify that the node can ping over the private interconnect to the
remaining nodes in the cluster.
d
Enter the following command:
/etc/init.d/init.crs enable
Reboot the system. In some cases, the network has a latency of up to
30 seconds before it can ping the remaining nodes in the cluster after
reboot. If this situation occurs, add the following line to the beginning of
your /etc/init.d/init.crs file and reboot your system:
/bin/sleep 30
CRS fails to start when you reboot the nodes, or after entering
/etc/init.d/init.crs start
1
Change
node.session.timeo.replacement_timeout = 144
to
node.session.timeo.replacement_timeout = 30
2
Log out from the existing iSCSI sessions, rediscover, and log in again to apply the
changed timeout.
You can check the above settings under
/var/lib/iscsi/nodes/iqn*/<any target_port_ip>/default.
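To confirm that the new timeout took effect, the value can be read back from a settings file. This is a minimal sketch: the function name replacement_timeout is hypothetical, and it assumes the setting appears in the key = value form shown in step 1.

```shell
# Hypothetical helper: print the replacement_timeout value set in a file.
# $1 = a settings file that uses the "key = value" form shown in step 1.
replacement_timeout() {
    sed -n 's/^[[:space:]]*node\.session\.timeo\.replacement_timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1"
}

# Example, checking every node record (the glob is expanded by the shell):
#   for f in /var/lib/iscsi/nodes/iqn*/*/default; do
#       echo "$f: $(replacement_timeout "$f")"
#   done
```

A value other than 30 after logging in again suggests the session is still using the old record.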
Database Configuration Assistant (DBCA)
There is no response when you click OK in the DBCA Summary window
This is a Java Runtime Environment timing issue.
Click OK again. If there is still no response, restart the DBCA
software installation.
Miscellaneous
You receive dd failure error messages while installing the software
using Dell Deployment CD 1
This issue occurs when a copy of the Enterprise Linux CD is used. Always use
the original CD.
When burning the CD images (ISOs), use the proper options such as -dao if
using the cdrecord command.
When connecting to the database as a user other than oracle,
you receive the error messages ORA-01034: ORACLE not available and
Linux Error 13: Permission denied
This issue occurs when the required permissions are not set on the
remote node.
On all remote nodes, as user root, type: chmod 6751 $ORACLE_HOME
Installation
Oracle software fails to install on the nodes
This issue occurs when the nodes’ system clocks are not identical.
Perform one of the following procedures:
• Ensure that the system clock on the Oracle software installation node is set
to a later time than the remaining nodes.
• Configure one of your nodes as an NTP server to synchronize the
remaining nodes in the cluster.
When you run root.sh, the utility fails to format the OCR disk
Download and apply Oracle Patch 4679769 available on the
My Oracle Support website at support.oracle.com.
Networking
The cluster verification check fails
This issue occurs when the public network IP address is not routable; for
example: 192.168.xxx.xxx
Assign a valid, routable public IP address.
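Addresses such as 192.168.xxx.xxx belong to the private ranges reserved by RFC 1918. A rough pre-check for those ranges is sketched below. The function name is_private_ipv4 is hypothetical, and the helper does not validate that its argument is a well-formed address.

```shell
# Hypothetical helper: succeed (exit 0) if the IPv4 address falls in one of
# the RFC 1918 private ranges (10/8, 172.16/12, 192.168/16).
# $1 = dotted-decimal IPv4 address.
is_private_ipv4() {
    case $1 in
        10.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with a placeholder address:
#   is_private_ipv4 192.168.1.10 && echo "private range: assign a routable public IP"
```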
Fibre Channel Storage System
You receive I/O errors and warnings when you load the Fibre Channel
HBA driver module
The HBA driver, BIOS, or firmware must be updated.
Check the Solution Deliverable List (SDL) on the Dell|Oracle Tested and
Validated Configurations website at dell.com/oracle for the supported
versions. Update the driver, BIOS, and firmware for the
Fibre Channel HBAs as required.
Operating System
When you add a new peripheral device to your Dell PowerEdge system,
the operating system does not recognize the device
The problem occurs when Kudzu is disabled.
Run Kudzu manually after you add the new peripheral to your system.
Using Dell DKMS Drivers After Upgrading the Kernel
If the kernel is upgraded in a system where the DKMS driver is installed, then
after the kernel upgrade perform the following procedure to ensure that the
updated DKMS driver is installed for the latest kernel.
• If the module version in the updated kernel is higher than the DKMS driver
version, then continue using the native driver.
• If the module version in the updated kernel is lower than the DKMS driver
version, then use the DKMS driver. Create a file in /etc/depmod.d with the
filename dkms_module_name.conf that contains an override entry.
For example, for the bnx2 driver, create a file bnx2.conf in /etc/depmod.d/
with the following contents:
override bnx2 2.6.18-x.el5 weak-updates
Then run the depmod -a command.
For more information on DKMS, see the DKMS man page on your system.
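The override entry has three fields after the keyword: module name, kernel version, and the module directory to prefer. The sketch below prints such a line. The function name depmod_override_line is hypothetical, and weak-updates is used as the default directory only because it is the value shown in the bnx2 example.

```shell
# Hypothetical helper: print a depmod override line.
# $1 = module name, $2 = kernel version, $3 = directory (default: weak-updates).
depmod_override_line() {
    echo "override $1 $2 ${3:-weak-updates}"
}

# Example, writing the bnx2 entry as root and then rebuilding the module map:
#   depmod_override_line bnx2 2.6.18-x.el5 > /etc/depmod.d/bnx2.conf
#   depmod -a
```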
Oracle Security Patches and Recommended Patches
This section provides information about the recommended Oracle security
patch updates and recommended patches.
Critical Patch Updates
Oracle releases quarterly Critical Patch Updates (CPUs) that fix potential
security vulnerabilities in Oracle products. These CPU patches must be
applied to production systems.
Currently, the latest CPU patch for the Linux x86_64 platform is the
Oracle 11g R1 11.1.0.7 Clusterware CPU patch 9369783.
Recommended Patches
Dell recommends that you apply the Oracle-recommended database
patchsets for the Linux x86_64 platform. For the latest Oracle-recommended
patches, see Metalink Note #756671.1 on the My Oracle Support website
at support.oracle.com.
The current Oracle recommended patches for Oracle 11g R1 11.1.0.7 Clusterware on Linux x86_64 are: