VMware ESXi - 6.5.1 Troubleshooting Guide

vSphere Troubleshooting
Update 1 Modified on 04 OCT 2017 VMware vSphere 6.5 VMware ESXi 6.5 vCenter Server 6.5
https://docs.vmware.com/
If you have comments about this documentation, submit your feedback to
docfeedback@vmware.com
VMware, Inc.
3401 Hillview Ave. Palo Alto, CA 94304 www.vmware.com
Copyright © 2010–2017 VMware, Inc. All rights reserved. Copyright and trademark information.

Contents

About vSphere Troubleshooting
Updated Information
1 Troubleshooting Overview
    Guidelines for Troubleshooting
    Troubleshooting with Logs
2 Troubleshooting Virtual Machines
    Troubleshooting Fault Tolerant Virtual Machines
    Troubleshooting USB Passthrough Devices
    Recover Orphaned Virtual Machines
    Virtual Machine Does Not Power On After Cloning or Deploying from Template
3 Troubleshooting Hosts
    Troubleshooting vSphere HA Host States
    Troubleshooting vSphere Auto Deploy
    Authentication Token Manipulation Error
    Active Directory Rule Set Error Causes Host Profile Compliance Failure
    Unable to Download VIBs When Using vCenter Server Reverse Proxy
4 Troubleshooting vCenter Server and the vSphere Web Client
    Troubleshooting vCenter Server
    Troubleshooting the vSphere Web Client
    Troubleshooting vCenter Server and ESXi Host Certificates
5 Troubleshooting Availability
    Troubleshooting vSphere HA Admission Control
    Troubleshooting Heartbeat Datastores
    Troubleshooting vSphere HA Failure Response
    Troubleshooting vSphere Fault Tolerance in Network Partitions
    Troubleshooting VM Component Protection
6 Troubleshooting Resource Management
    Troubleshooting Storage DRS
    Troubleshooting Storage I/O Control
7 Troubleshooting Storage
    Resolving SAN Storage Display Problems
    Resolving SAN Performance Problems
    Virtual Machines with RDMs Need to Ignore SCSI INQUIRY Cache
    Software iSCSI Adapter Is Enabled When Not Needed
    Failure to Mount NFS Datastores
    Troubleshooting Storage Adapters
    Checking Metadata Consistency with VOMA
    No Failover for Storage Path When TUR Command Is Unsuccessful
    Troubleshooting Flash Devices
    Troubleshooting Virtual Volumes
    Troubleshooting VAIO Filters
8 Troubleshooting Networking
    Troubleshooting MAC Address Allocation
    The Conversion to the Enhanced LACP Support Fails
    Unable to Remove a Host from a vSphere Distributed Switch
    Hosts on a vSphere Distributed Switch 5.1 and Later Lose Connectivity to vCenter Server
    Hosts on vSphere Distributed Switch 5.0 and Earlier Lose Connectivity to vCenter Server
    Alarm for Loss of Network Redundancy on a Host
    Virtual Machines Lose Connectivity After Changing the Uplink Failover Order of a Distributed Port Group
    Unable to Add a Physical Adapter to a vSphere Distributed Switch That Has Network I/O Control Enabled
    Troubleshooting SR-IOV Enabled Workloads
    A Virtual Machine that Runs a VPN Client Causes Denial of Service for Virtual Machines on the Host or Across a vSphere HA Cluster
    Low Throughput for UDP Workloads on Windows Virtual Machines
    Virtual Machines on the Same Distributed Port Group and on Different Hosts Cannot Communicate with Each Other
    Attempt to Power On a Migrated vApp Fails Because the Associated Protocol Profile Is Missing
    Networking Configuration Operation Is Rolled Back and a Host Is Disconnected from vCenter Server
9 Troubleshooting Licensing
    Troubleshooting Host Licensing
    Unable to Power On a Virtual Machine
    Unable to Configure or Use a Feature

About vSphere Troubleshooting

vSphere Troubleshooting describes troubleshooting issues and procedures for VMware vCenter Server®
implementations and related components.
Intended Audience
This information is for anyone who wants to troubleshoot virtual machines, ESXi hosts, clusters, and
related storage solutions. The information in this book is for experienced Windows or Linux system
administrators who are familiar with virtual machine technology and data center operations.
Note Not all functionality in the vSphere Web Client has been implemented for the vSphere Client in the
vSphere 6.5 release. For an up-to-date list of unsupported functionality, see Functionality Updates for the
vSphere Client Guide at http://www.vmware.com/info?id=1413.

Updated Information

This vSphere Troubleshooting is updated with each release of the product or when necessary.
This table provides the update history of the vSphere Troubleshooting.
Revision        Description
04 OCT 2017     • Updated log information in vSphere Auto Deploy TFTP Timeout Error at Boot Time.
                • Updated log directories in Troubleshooting with Logs.
EN-002608-00    Initial release.

Troubleshooting Overview 1

vSphere Troubleshooting contains common troubleshooting scenarios and provides solutions for each of
these problems. You can also find guidance here for resolving problems that have similar origins. For
unique problems, consider developing and adopting a troubleshooting methodology.
The following approach for effective troubleshooting elaborates on how to gather troubleshooting
information, such as identifying symptoms and defining the problem space. Troubleshooting with log files
is also discussed.
This chapter includes the following topics:
• Guidelines for Troubleshooting
• Troubleshooting with Logs

Guidelines for Troubleshooting
To troubleshoot your implementation of vSphere, identify the symptoms of the problem, determine which
of the components are affected, and test possible solutions.
Identifying Symptoms
    A number of potential causes might lead to the under-performance or
    nonperformance of your implementation. The first step in efficient
    troubleshooting is to identify exactly what is going wrong.
Defining the Problem Space
    After you have isolated the symptoms of the problem, you must define the
    problem space. Identify the software or hardware components that are
    affected and might be causing the problem and those components that are
    not involved.
Testing Possible Solutions
    When you know what the symptoms of the problem are and which
    components are involved, test the solutions systematically until the problem
    is resolved.
Troubleshooting Basics (http://link.brightcove.com/services/player/bcpid2296383276001?bctid=ref:video_vsphere_troubleshooting)

Identifying Symptoms

Before you attempt to resolve a problem in your implementation, you must identify precisely how it is
failing.
The first step in the troubleshooting process is to gather information that defines the specific symptoms of
what is happening. You might ask these questions when gathering this information:
• What is the task or expected behavior that is not occurring?
• Can the affected task be divided into subtasks that you can evaluate separately?
• Is the task ending in an error? Is an error message associated with it?
• Is the task completing but in an unacceptably long time?
• Is the failure consistent or sporadic?
• What has changed recently in the software or hardware that might be related to the failure?
Defining the Problem Space
After you identify the symptoms of the problem, determine which components in your setup are affected,
which components might be causing the problem, and which components are not involved.
To define the problem space in an implementation of vSphere, be aware of the components present. In
addition to VMware software, consider third-party software in use and which hardware is being used with
the VMware virtual hardware.
Recognizing the characteristics of the software and hardware elements and how they can impact the
problem, you can explore general problems that might be causing the symptoms.
• Misconfiguration of software settings
• Failure of physical hardware
• Incompatibility of components
Break down the process and consider each piece and the likelihood of its involvement separately. For
example, a case that is related to a virtual disk on local storage is probably unrelated to third-party router
configuration. However, a local disk controller setting might be contributing to the problem. If a component
is unrelated to the specific symptoms, you can probably eliminate it as a candidate for solution testing.
Think about what changed in the configuration recently before the problems started. Look for what is
common in the problem. If several problems started at the same time, you can probably trace all the
problems to the same cause.

Testing Possible Solutions

After you know the problem's symptoms and which software or hardware components are most likely
involved, you can systematically test solutions until you resolve the problem.
With the information that you have gained about the symptoms and affected components, you can design
tests for pinpointing and resolving the problem. These tips might make this process more effective.
• Generate ideas for as many potential solutions as you can.
• Verify that each solution determines unequivocally whether the problem is fixed. Test each potential
  solution but move on promptly if the fix does not resolve the problem.
• Develop and pursue a hierarchy of potential solutions based on likelihood. Systematically eliminate
  each potential problem from the most likely to the least likely until the symptoms disappear.
• When testing potential solutions, change only one thing at a time. If your setup works after many
  things are changed at once, you might not be able to discern which of those things made a difference.
• If the changes that you made for a solution do not help resolve the problem, return the
  implementation to its previous status. If you do not return the implementation to its previous status,
  new errors might be introduced.
• Find a similar implementation that is working and test it in parallel with the implementation that is not
  working properly. Make changes on both systems at the same time until few differences or only one
  difference remains between them.
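
The tips above can be sketched as a small loop. This is an illustrative sketch only, not a VMware tool: the Candidate type, the try_solutions function, and the likelihood estimates are hypothetical names introduced here. It tries potential fixes from most to least likely, makes one change at a time, and reverts each change that does not resolve the problem.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """One potential fix (hypothetical structure for illustration)."""
    name: str
    likelihood: float             # rough estimate between 0.0 and 1.0
    apply: Callable[[], None]     # makes exactly one change
    revert: Callable[[], None]    # returns the setup to its previous status


def try_solutions(candidates: List[Candidate],
                  problem_fixed: Callable[[], bool]) -> Optional[str]:
    """Try fixes from most to least likely, changing one thing at a time."""
    ordered = sorted(candidates, key=lambda c: c.likelihood, reverse=True)
    for candidate in ordered:
        candidate.apply()
        if problem_fixed():
            return candidate.name   # the single change that made the difference
        candidate.revert()          # undo before testing the next idea
    return None                     # nothing worked; setup is back where it started
```

Because every unsuccessful change is reverted before the next one is applied, a successful result can be attributed to exactly one change.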

Troubleshooting with Logs

You can often obtain valuable troubleshooting information by looking at the logs provided by the various
services and agents that your implementation is using.
Most logs are located in C:\ProgramData\VMware\vCenterServer\logs for Windows deployments
or /var/log/ for Linux deployments. Common logs are available in all implementations. Other logs are
unique to certain deployment options (Management Node or Platform Services Controller).
Common Logs
The following logs are common to all deployments on Windows or Linux.
Table 1-1. Common Log Directories
Log Directory    Description
applmgmt         VMware Appliance Management Service
cloudvm          Logs for allotment and distribution of resources between services
cm               VMware Component Manager
firstboot        Location where first boot logs are stored
rhttpproxy       Reverse Web Proxy
sca              VMware Service Control Agent
statsmonitor     VMware Appliance Monitoring Service (Linux only)
vapi             VMware vAPI Endpoint
vmaffd           VMware Authentication Framework daemon
vmdird           VMware Directory Service daemon
vmon             VMware Service Lifecycle Manager
Management Node Logs
The following logs are available if a management node deployment is chosen.
Table 1-2. Management Node Log Directories
Log Directory        Description
autodeploy           VMware vSphere Auto Deploy Waiter
content-library      VMware Content Library Service
eam                  VMware ESX Agent Manager
invsvc               VMware Inventory Service
mbcs                 VMware Message Bus Config Service
netdump              VMware vSphere ESXi Dump Collector
perfcharts           VMware Performance Charts
vmcam                VMware vSphere Authentication Proxy
vmdird               VMware Directory Service daemon
vmsyslogcollector    vSphere Syslog Collector (Windows only)
vmware-sps           VMware vSphere Profile-Driven Storage Service
vmware-vpx           VMware VirtualCenter Server
vpostgres            vFabric Postgres database service
vsphere-client       VMware vSphere Web Client
vcha                 VMware High Availability Service (Linux only)
Platform Services Controller Logs
You can examine the following logs if a Platform Services Controller node deployment is chosen.
Table 1-3. Platform Services Controller Node Log Directories
Log Directory    Description
cis-license      VMware Licensing Service
sso              VMware Secure Token Service
vmcad            VMware Certificate Authority daemon
vmdird           VMware Directory Service
For Platform Services Controller node deployments, additional runtime logs are located at
C:\ProgramData\VMware\CIS\runtime\VMwareSTSService\logs.
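
As a rough illustration of how the base locations and the directory names in the tables combine, the following sketch builds a log path for a given platform. The log_dir function is a hypothetical helper, and it assumes that each directory name sits directly under the base path given in this section; the exact layout on a particular deployment might differ.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Base locations as stated in this section.
WINDOWS_LOG_ROOT = PureWindowsPath(r"C:\ProgramData\VMware\vCenterServer\logs")
LINUX_LOG_ROOT = PurePosixPath("/var/log")


def log_dir(directory: str, platform: str):
    """Join a log directory name from the tables onto the platform's base path.

    Assumption: the directory is an immediate child of the base path.
    """
    if platform == "windows":
        return WINDOWS_LOG_ROOT / directory
    if platform == "linux":
        return LINUX_LOG_ROOT / directory
    raise ValueError(f"unknown platform: {platform!r}")
```

For example, log_dir("vmon", "windows") points at the VMware Service Lifecycle Manager logs on a Windows deployment.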
Troubleshooting Virtual Machines 2
The virtual machine troubleshooting topics provide solutions to potential problems that you might
encounter when using your virtual machines.
This chapter includes the following topics:
• Troubleshooting Fault Tolerant Virtual Machines
• Troubleshooting USB Passthrough Devices
• Recover Orphaned Virtual Machines
• Virtual Machine Does Not Power On After Cloning or Deploying from Template

Troubleshooting Fault Tolerant Virtual Machines
To maintain a high level of performance and stability for your fault tolerant virtual machines and also to
minimize failover rates, you should be aware of certain troubleshooting issues.
The troubleshooting topics discussed focus on problems that you might encounter when using the
vSphere Fault Tolerance feature on your virtual machines. The topics also describe how to resolve
problems.
You can also see the VMware knowledge base article at http://kb.vmware.com/kb/1033634 to help you
troubleshoot Fault Tolerance. This article contains a list of error messages that you might encounter when
you attempt to use the feature and, where applicable, advice on how to resolve each error.

Hardware Virtualization Not Enabled

You must enable Hardware Virtualization (HV) before you use vSphere Fault Tolerance.
Problem
When you attempt to power on a virtual machine with Fault Tolerance enabled, an error message might
appear if you did not enable HV.
Cause
This error is often the result of HV not being available on the ESXi server on which you are attempting to
power on the virtual machine. HV might not be available either because it is not supported by the ESXi
server hardware or because HV is not enabled in the BIOS.
Solution
If the ESXi server hardware supports HV, but HV is not currently enabled, enable HV in the BIOS on that
server. The process for enabling HV varies among BIOSes. See the documentation for your hosts'
BIOSes for details on how to enable HV.
If the ESXi server hardware does not support HV, switch to hardware that uses processors that support
Fault Tolerance.

Compatible Hosts Not Available for Secondary VM

If you power on a virtual machine with Fault Tolerance enabled and no compatible hosts are available for
its Secondary VM, you might receive an error message.
Problem
You might encounter the following error message:
Secondary VM could not be powered on as there are no compatible hosts that can accommodate it.
Cause
This error can occur for several reasons: there are no other hosts in the cluster, no other hosts have
HV enabled, the host CPUs do not support Hardware MMU Virtualization, datastores are inaccessible,
there is no available capacity, or hosts are in maintenance mode.
Solution
If there are insufficient hosts, add more hosts to the cluster. If there are hosts in the cluster, ensure they
support HV and that HV is enabled. The process for enabling HV varies among BIOSes. See the
documentation for your hosts' BIOSes for details on how to enable HV. Check that hosts have sufficient
capacity and that they are not in maintenance mode.

Secondary VM on Overcommitted Host Degrades Performance of Primary VM

If a Primary VM appears to be executing slowly, even though its host is lightly loaded and retains idle
CPU time, check the host where the Secondary VM is running to see if it is heavily loaded.
Problem
When a Secondary VM resides on a host that is heavily loaded, the Secondary VM can affect the
performance of the Primary VM.
Cause
A Secondary VM running on a host that is overcommitted (for example, with its CPU resources) might not
get the same amount of resources as the Primary VM. When this occurs, the Primary VM must slow down
to allow the Secondary VM to keep up, effectively reducing its execution speed to the slower speed of the
Secondary VM.
Solution
If the Secondary VM is on an overcommitted host, you can move the VM to another location without
resource contention problems. Or more specifically, do the following:
• For FT networking contention, use vMotion technology to move the Secondary VM to a host with
  fewer FT VMs contending on the FT network. Verify that the quality of the storage access to the VM is
  not asymmetric.
• For storage contention problems, turn FT off and on again. When you recreate the Secondary VM,
  change its datastore to a location with less resource contention and better performance potential.
• To resolve a CPU resources problem, set an explicit CPU reservation for the Primary VM at an MHz
  value sufficient to run its workload at the desired performance level. This reservation is applied to
  both the Primary and Secondary VMs, ensuring that both VMs can execute at a specified rate. For
  guidance in setting this reservation, view the performance graphs of the virtual machine (before Fault
  Tolerance was enabled) to see how many CPU resources it used under normal conditions.
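
One way to turn historical performance data into a reservation value is sketched below. The guide does not prescribe a method; the function name and the choice of a nearest-rank 95th percentile are assumptions made for illustration, and the input is a list of CPU usage samples in MHz that you have exported from the performance graphs yourself.

```python
import math
from typing import Sequence


def suggest_cpu_reservation_mhz(usage_mhz: Sequence[float],
                                percentile: float = 95.0) -> int:
    """Return the usage at the given percentile, rounded up to whole MHz.

    Uses the nearest-rank method. The percentile is an assumed heuristic,
    not a value taken from VMware documentation.
    """
    if not usage_mhz:
        raise ValueError("at least one usage sample is required")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in the range (0, 100]")
    ordered = sorted(usage_mhz)
    rank = math.ceil(percentile * len(ordered) / 100)   # 1-based nearest rank
    return math.ceil(ordered[rank - 1])
```

A lower percentile ignores short spikes and reserves less; a value of 100 reserves for the highest sample observed.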

Increased Network Latency Observed in FT Virtual Machines

If your FT network is not optimally configured, you might experience latency problems with the FT VMs.
Problem
FT VMs might see a variable increase in packet latency (on the order of milliseconds). Applications that
demand very low network packet latency or jitter (for example, certain real-time applications) might see a
degradation in performance.
Cause
Some increase in network latency is expected overhead for Fault Tolerance, but certain factors can add to
this latency. For example, if the FT network is on a particularly high latency link, this latency is passed on
to the applications. Also, if the FT network has insufficient bandwidth (less than 10 Gbps), greater
latency might occur.
Solution
Verify that the FT network has sufficient bandwidth (10 Gbps or more) and uses a low latency link
between the Primary VM and Secondary VM. These precautions do not eliminate network latency, but
minimize its potential impact.

Some Hosts Are Overloaded with FT Virtual Machines

You might encounter performance problems if your cluster's hosts have an imbalanced distribution of FT
VMs.
Problem
Some hosts in the cluster might become overloaded with FT VMs, while other hosts might have unused
resources.
Cause
vSphere DRS does not load balance FT VMs (unless they are using legacy FT). This limitation might
result in a cluster where FT VMs are unevenly distributed across hosts.
Solution
Manually rebalance the FT VMs across the cluster by using vSphere vMotion. Generally, the fewer FT
VMs that are on a host, the better they perform, due to reduced contention for FT network bandwidth and
CPU resources.
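
To plan a manual rebalance, it can help to count FT VMs per host first. The sketch below is a simplified planning aid under stated assumptions: the placement dictionary (VM name to host name) is data you collect yourself, suggest_moves is a hypothetical helper, and the greedy strategy ignores real constraints such as keeping a Primary VM and its Secondary VM on different hosts, host capacity, and datastore access.

```python
from typing import Dict, List, Tuple


def suggest_moves(placement: Dict[str, str],
                  hosts: List[str]) -> List[Tuple[str, str, str]]:
    """Suggest (vm, source_host, target_host) moves that even out FT VM counts.

    Greedy sketch: repeatedly move one VM from the host with the most FT VMs
    to the host with the fewest, until the counts differ by at most one.
    Every host named in placement must also appear in hosts.
    """
    current = dict(placement)
    moves = []
    while True:
        counts = {host: 0 for host in hosts}
        for host in current.values():
            counts[host] += 1
        busiest = max(hosts, key=lambda h: counts[h])
        idlest = min(hosts, key=lambda h: counts[h])
        if counts[busiest] - counts[idlest] <= 1:
            return moves
        vm = sorted(v for v, h in current.items() if h == busiest)[0]
        current[vm] = idlest
        moves.append((vm, busiest, idlest))
```

Each suggested move would still be carried out manually with vSphere vMotion, after checking it against the constraints that the sketch leaves out.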

Losing Access to FT Metadata Datastore

Access to the Fault Tolerance metadata datastore is essential for the proper functioning of an FT VM.
Loss of this access can cause a variety of problems.
Problem
These problems include the following:
• FT can terminate unexpectedly.
• If both the Primary VM and Secondary VM cannot access the metadata datastore, the VMs might fail
  unexpectedly. Typically, an unrelated failure that terminates FT must also occur when access to the
  FT metadata datastore is lost by both VMs. vSphere HA then tries to restart the Primary VM on a host
  with access to the metadata datastore.
• The VM might stop being recognized as an FT VM by vCenter Server. This failed recognition can
  allow unsupported operations such as taking snapshots to be performed on the VM and cause
  problematic behavior.
Cause
Lack of access to the Fault Tolerance metadata datastore can lead to the undesirable outcomes in the
previous list.
Solution
When planning your FT deployment, place the metadata datastore on highly available storage. While FT
is running, if you see that the access to the metadata datastore is lost on either the Primary VM or the
Secondary VM, promptly address the storage problem before loss of access causes one of the previous
problems. If a VM stops being recognized as an FT VM by vCenter Server, do not perform unsupported
operations on the VM. Restore access to the metadata datastore. After access is restored for the FT VMs
and the refresh period has ended, the VMs are recognizable.

Turning On vSphere FT for Powered-On VM Fails

If you try to turn on vSphere Fault Tolerance for a powered-on VM, this operation can fail.
Problem
When you select Turn On Fault Tolerance for a powered-on VM, the operation fails and you see an
Unknown error message.
Cause
This operation can fail if the host that the VM is running on has insufficient memory resources to provide
fault tolerant protection. vSphere Fault Tolerance automatically tries to allocate a full memory reservation
on the host for the VM. Overhead memory is required for fault tolerant VMs and can sometimes expand to
1 to 2 GB. If the powered-on VM is running on a host that has insufficient memory resources to
accommodate the full reservation plus the overhead memory, trying to turn on Fault Tolerance fails.
Subsequently, the Unknown error message is returned.
Solution
Choose from these solutions:
• Free up memory resources on the host to accommodate the VM's memory reservation and the added
  overhead.
• Move the VM to a host with ample free memory resources and try again.
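
The memory requirement described in the Cause can be checked with simple arithmetic before you turn on Fault Tolerance. This is a minimal sketch with hypothetical function names; the default overhead of 2048 MB assumes the upper end of the 1 to 2 GB range mentioned above, and the free memory figure is a value you read from the host yourself.

```python
def ft_memory_required_mb(vm_memory_mb: int, overhead_mb: int = 2048) -> int:
    """Full memory reservation for the VM plus FT overhead memory.

    The 2048 MB default is an assumed worst case taken from the
    1 to 2 GB range; actual overhead varies.
    """
    return vm_memory_mb + overhead_mb


def ft_memory_shortfall_mb(host_free_mb: int, vm_memory_mb: int,
                           overhead_mb: int = 2048) -> int:
    """Return how many MB are missing on the host (0 if there is enough)."""
    required = ft_memory_required_mb(vm_memory_mb, overhead_mb)
    return max(0, required - host_free_mb)
```

A nonzero result indicates how much memory to free on the host, or that a host with more free memory is a better target.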

FT Virtual Machines Not Placed or Evacuated by vSphere DRS

FT virtual machines in a cluster that is enabled with vSphere DRS do not function correctly if
Enhanced vMotion Compatibility (EVC) is currently disabled.
Problem
Because EVC is a prerequisite for using DRS with FT VMs, DRS does not place or evacuate them if EVC
has been disabled (even if it is later reenabled).
Cause
When EVC is disabled on a DRS cluster, a VM override that disables DRS on an FT VM might be added.
Even if EVC is later reenabled, this override is not canceled.
Solution
If DRS does not place or evacuate FT VMs in the cluster, check the VMs for a VM override that is
disabling DRS. If you find one, remove the override that is disabling DRS.
Note For more information on how to edit or delete VM overrides, see vSphere Resource Management.

Fault Tolerant Virtual Machine Failovers

A Primary or Secondary VM can fail over even though its ESXi host has not crashed. In such cases,
virtual machine execution is not interrupted, but redundancy is temporarily lost. To avoid this type of
failover, be aware of some of the situations when it can occur and take steps to avoid them.
Partial Hardware Failure Related to Storage
This problem can arise when access to storage is slow or down for one of the hosts. When this occurs
there are many storage errors listed in the VMkernel log. To resolve this problem you must address your
storage-related problems.
Partial Hardware Failure Related to Network
If the logging NIC is not functioning or connections to other hosts through that NIC are down, this can
trigger a fault tolerant virtual machine to be failed over so that redundancy can be reestablished. To avoid
this problem, dedicate a separate NIC each for vMotion and FT logging traffic and perform vMotion
migrations only when the virtual machines are less active.
Insucient Bandwidth on the Logging NIC Network
This can happen because of too many fault tolerant virtual machines being on a host. To resolve this
problem, more broadly distribute pairs of fault tolerant virtual machines across different hosts.
Use a 10-Gbit logging network for FT and verify that the network is low latency.
vMotion Failures Due to Virtual Machine Activity Level
If the vMotion migration of a fault tolerant virtual machine fails, the virtual machine might need to be failed
over. Usually, this occurs when the virtual machine is too active for the migration to be completed with
only minimal disruption to the activity. To avoid this problem, perform vMotion migrations only when the
virtual machines are less active.
Too Much Activity on VMFS Volume Can Lead to Virtual Machine Failovers
When a number of file system locking operations, virtual machine power ons, power offs, or vMotion
migrations occur on a single VMFS volume, this can trigger fault tolerant virtual machines to be failed
over. A symptom that this might be occurring is receiving many warnings about SCSI reservations in the
VMkernel log. To resolve this problem, reduce the number of file system operations or ensure that the
fault tolerant virtual machine is on a VMFS volume that does not have an abundance of other virtual
machines that are regularly being powered on, powered off, or migrated using vMotion.
Lack of File System Space Prevents Secondary VM Startup
Check whether your / (root) or /vmfs/datasource file systems have available space. These file
systems can become full for many reasons, and a lack of space might prevent you from being able to
start a new Secondary VM.
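
A free-space check like the one described can be scripted with the Python standard library. The sketch below is an assumption-laden illustration: paths_low_on_space is a hypothetical helper, the 10 percent threshold is an arbitrary example value, and the paths you pass in must exist on the system where the script runs.

```python
import shutil
from typing import Iterable, List


def paths_low_on_space(paths: Iterable[str],
                       min_free_fraction: float = 0.10) -> List[str]:
    """Return the paths whose file system has less free space than the threshold.

    min_free_fraction is an assumed example threshold, not a VMware value.
    """
    low = []
    for path in paths:
        usage = shutil.disk_usage(path)   # named tuple: total, used, free (bytes)
        free_fraction = usage.free / usage.total if usage.total else 0.0
        if free_fraction < min_free_fraction:
            low.append(path)
    return low
```

For the file systems named above, the call would be paths_low_on_space(["/", "/vmfs/datasource"]), assuming both paths exist on the host.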

Troubleshooting USB Passthrough Devices

Information about feature behavior can help you troubleshoot or avoid potential problems when USB
devices are connected to a virtual machine.

Error Message When You Try to Migrate Virtual Machine with USB Devices Attached

Migration with vMotion cannot proceed and issues a confusing error message when you connect multiple
USB devices from an ESXi host to a virtual machine and one or more devices are not enabled for
vMotion.
Problem
The Migrate Virtual Machine wizard runs a compatibility check before a migration operation begins. If
unsupported USB devices are detected, the compatibility check fails and an error message similar to the
following appears: Currently connected device 'USB 1' uses backing 'path:1/7/1', which
is not accessible.
Cause
To successfully pass vMotion compatibility checks, you must enable all USB devices that are connected
to the virtual machine from a host for vMotion. If one or more devices are not enabled for vMotion,
migration will fail.
Solution
1 Make sure that the devices are not in the process of transferring data before removing them.
2 Re-add and enable vMotion for each affected USB device.

Cannot Copy Data From an ESXi Host to a USB Device That Is Connected to the Host

You can connect a USB device to an ESXi host and copy data to the device from the host. For example,
you might want to gather the vm-support bundle from the host after the host loses network connectivity.
To perform this task, you must stop the USB arbitrator.
Problem
If the USB arbitrator is being used for USB passthrough from an ESXi host to a virtual machine, the USB
device appears under lsusb but does not mount correctly.
Cause
This problem occurs because the nonbootable USB device is reserved for the virtual machine by default.
It does not appear on the host's file system, even though lsusb can see the device.
Solution
1 Stop the usbarbitrator service: /etc/init.d/usbarbitrator stop
2 Physically disconnect and reconnect the USB device.
By default, the device location is /vmfs/devices/disks/mpx.vmhbaXX:C0:T0:L0.
3 After you reconnect the device, restart the usbarbitrator service: /etc/init.d/usbarbitrator start
4 Restart hostd and any running virtual machines to restore access to the passthrough devices in the
virtual machine.
What to do next
Reconnect the USB devices to the virtual machine.

Recover Orphaned Virtual Machines

Virtual machines appear with (orphaned) appended to their names.
Problem
Virtual machines that reside on an ESXi host that vCenter Server manages might become orphaned in
rare cases. Such virtual machines exist in the vCenter Server database, but the ESXi host no longer
recognizes them.
Cause
Virtual machines can become orphaned if a host failover is unsuccessful, or when the virtual machine is
unregistered directly on the host. If this situation occurs, move the orphaned virtual machine to another
host in the data center on which the virtual machine files are stored.
Solution
1 Determine the datastore where the virtual machine configuration (.vmx) file is located.
a Select the virtual machine in the vSphere Web Client inventory, and click the Datastores tab.
The datastore or datastores where the virtual machine files are stored are displayed.
b If more than one datastore is displayed, select each datastore and click the file browser icon to
browse for the .vmx file.
c Verify the location of the .vmx file.
2 Return to the virtual machine in the vSphere Web Client, right-click it, and select All Virtual
Infrastructure Actions > Remove from Inventory.
3 Click Yes to confirm the removal of the virtual machine.
4 Reregister the virtual machine with vCenter Server.
a Right-click the datastore where the virtual machine file is located and select Register VM.
b Browse to the .vmx file and click OK.
c Select the location for the virtual machine and click Next.
d Select the host on which to run the virtual machine and click Next.
e Click Finish.

Virtual Machine Does Not Power On After Cloning or Deploying from Template

Virtual machines do not power on after you complete the clone or deploy from template workflow in the
vSphere Web Client.
Problem
When you clone a virtual machine or deploy a virtual machine from a template, you might not be able to
power on the virtual machine after creation.
Cause
The swap file size is not reserved when the virtual machine disks are created.
Solution
• Reduce the size of the swap file that is required for the virtual machine. You can do this by increasing
  the virtual machine memory reservation.
a Right-click the virtual machine and select Edit Settings.
b Select Virtual Hardware and click Memory.
c Use the Reservation drop-down menu to increase the amount of memory allocated to the virtual
machine.
d Click OK.
• Alternatively, you can increase the amount of space available for the swap file by moving other virtual
  machine disks off the datastore that is being used for the swap file.
a Browse to the datastore in the vSphere Web Client object navigator.
b Select the VMs tab.
c For each virtual machine to move, right-click the virtual machine and select Migrate.
d Select Change storage only.
e Proceed through the Migrate Virtual Machine wizard.
• You can also increase the amount of space available for the swap file by changing the swap file
  location to a datastore with adequate space.
a Browse to the host in the vSphere Web Client object navigator.
b Select the Configure tab.
c Under Virtual Machines, select Swap file location.
d Click Edit.
Note If the host is part of a cluster that specifies that the virtual machine swap files are stored in
the same directory as the virtual machine, you cannot click Edit. You must use the Cluster
Settings dialog box to change the swap file location policy for the cluster.
e Select Use a specific datastore and select a datastore from the list.
f Click OK.
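
The first solution works because a larger memory reservation leaves less memory to be backed by the swap file. The following sketch shows the arithmetic, assuming the usual relationship that the swap file size equals the configured memory minus the memory reservation. The function names are hypothetical, and the datastore free space is a value you look up yourself.

```python
def swap_file_size_mb(configured_memory_mb: int, reservation_mb: int) -> int:
    """Swap file size, assuming size = configured memory - memory reservation."""
    if not 0 <= reservation_mb <= configured_memory_mb:
        raise ValueError("reservation must be between 0 and the configured memory")
    return configured_memory_mb - reservation_mb


def reservation_needed_mb(configured_memory_mb: int, datastore_free_mb: int) -> int:
    """Smallest reservation for which the swap file fits in the free space."""
    return max(0, configured_memory_mb - datastore_free_mb)
```

For a virtual machine with 8192 MB of memory on a datastore with 4096 MB free, a reservation of at least 4096 MB would be needed under this assumption. Reserved memory is no longer available to other virtual machines on the host, so the other two solutions might be preferable when host memory is tight.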

Troubleshooting Hosts 3

The host troubleshooting topics provide solutions to potential problems that you might encounter when
using your vCenter Servers and ESXi hosts.
This chapter includes the following topics:
• Troubleshooting vSphere HA Host States
• Troubleshooting vSphere Auto Deploy
• Authentication Token Manipulation Error
• Active Directory Rule Set Error Causes Host Profile Compliance Failure
• Unable to Download VIBs When Using vCenter Server Reverse Proxy

Troubleshooting vSphere HA Host States
vCenter Server reports vSphere HA host states that indicate an error condition on the host. Such errors
can prevent vSphere HA from fully protecting the virtual machines on the host and can impede vSphere
HA's ability to restart virtual machines after a failure. Errors can occur when vSphere HA is being
configured or unconfigured on a host or, more rarely, during normal operation. When this happens, you
should determine how to resolve the error, so that vSphere HA is fully operational.

vSphere HA Agent Is in the Agent Unreachable State

The vSphere HA agent on a host is in the Agent Unreachable state for a minute or more. User
intervention might be required to resolve this situation.
Problem
vSphere HA reports that an agent is in the Agent Unreachable state when the agent for the host cannot
be contacted by the master host or by vCenter Server. Consequently, vSphere HA is not able to monitor
the virtual machines on the host and might not restart them after a failure.
Cause
A vSphere HA agent can be in the Agent Unreachable state for several reasons. This condition most
often indicates that a networking problem is preventing vCenter Server or the master host from contacting
the agent on the host, or that all hosts in the cluster have failed. This condition can also indicate the
unlikely situation that vSphere HA was disabled and then re-enabled on the cluster while vCenter Server
could not communicate with the vSphere HA agent on the host, or that the ESXi host agent on the host
has failed, and the watchdog process was unable to restart it. In any of these cases, a failover event is
not triggered when a host goes into the Unreachable state.
Solution
Determine whether vCenter Server is reporting the host as not responding. If it is, there is a networking
problem, an ESXi host agent failure, or a total cluster failure. After the condition is resolved, vSphere HA
should work correctly. If it does not, reconfigure vSphere HA on the host. Similarly, if vCenter Server
reports that the hosts are responding but a host's state is Agent Unreachable, reconfigure vSphere HA on
that host.

vSphere HA Agent is in the Uninitialized State

The vSphere HA agent on a host is in the Uninitialized state for a minute or more. User intervention might
be required to resolve this situation.
Problem
vSphere HA reports that an agent is in the Uninitialized state when the agent for the host is unable to
enter the run state and become the master host or to connect to the master host. Consequently, vSphere
HA is not able to monitor the virtual machines on the host and might not restart them after a failure.
Cause
A vSphere HA agent can be in the Uninitialized state for one or more reasons. This condition most often
indicates that the host does not have access to any datastores. Less frequently, this condition indicates
that the host does not have access to its local datastore on which vSphere HA caches state information,
the agent on the host is inaccessible, or the vSphere HA agent is unable to open required firewall ports. It
is also possible that the ESXi host agent has stopped.
Solution
Search the list of the host's events for recent occurrences of the event "vSphere HA Agent for the
host has an error". This event indicates the reason for the host being in the Uninitialized state. If the
condition exists because of a datastore problem, resolve whatever is preventing the host from accessing
the affected datastores. If the ESXi host agent has stopped, you must restart it. After the problem has
been resolved, if the agent does not return to an operational state, reconfigure vSphere HA on the host.
Note If the condition exists because of a firewall problem, check if there is another service on the host
that is using port 8182. If so, shut down that service, and reconfigure vSphere HA.

vSphere HA Agent is in the Initialization Error State

The vSphere HA agent on a host is in the Initialization Error state for a minute or more. User intervention
is required to resolve this situation.
Problem
vSphere HA reports that an agent is in the Initialization Error state when the last attempt to configure
vSphere HA for the host failed. vSphere HA does not monitor the virtual machines on such a host and
might not restart them after a failure.
Cause
This condition most often indicates that vCenter Server was unable to connect to the host while the
vSphere HA agent was being installed or configured on the host. This condition might also indicate that
the installation and configuration completed, but the agent did not become a master host or a slave host
within a timeout period. Less frequently, the condition is an indication that there is insufficient disk space
on the host's local datastore to install the agent, or that there are insufficient unreserved memory
resources on the host for the agent resource pool. Finally, for ESXi 5.x hosts, the configuration fails if a
previous installation of another component required a host reboot, but the reboot has not yet occurred.
Solution
When a Configure HA task fails, a reason for the failure is reported. Take the action that corresponds to
the reported reason.

Reason for Failure: Host communication errors
Action: Resolve any communication problems with the host and retry the configuration operation.

Reason for Failure: Timeout errors
Action: Possible causes include that the host crashed during the configuration task, the agent failed to
start after being installed, or the agent was unable to initialize itself after starting up. Verify that
vCenter Server is able to communicate with the host. If so, see vSphere HA Agent Is in the Agent
Unreachable State or vSphere HA Agent is in the Uninitialized State for possible solutions.

Reason for Failure: Lack of resources
Action: Free up approximately 75MB of disk space. If the failure is due to insufficient unreserved memory,
free up memory on the host by either relocating virtual machines to another host or reducing their
reservations. In either case, retry the vSphere HA configuration task after resolving the problem.

Reason for Failure: Reboot pending
Action: If an installation for a 5.0 or later host fails because a reboot is pending, reboot the host and
retry the vSphere HA configuration task.

vSphere HA Agent is in the Uninitialization Error State

The vSphere HA agent on a host is in the Uninitialization Error state. User intervention is required to
resolve this situation.
Problem
vSphere HA reports that an agent is in the Uninitialization Error state when vCenter Server is unable to
unconfigure the agent on the host during the Unconfigure HA task. An agent left in this state can interfere
with the operation of the cluster. For example, the agent on the host might elect itself as master host and
lock a datastore. Locking a datastore prevents the valid cluster master host from managing the virtual
machines with configuration files on that datastore.
Cause
This condition usually indicates that vCenter Server lost the connection to the host while the agent was
being unconfigured.
Solution
Add the host back to vCenter Server (version 5.0 or later). The host can be added as a stand-alone host
or added to any cluster.

vSphere HA Agent is in the Host Failed State

The vSphere HA agent on a host is in the Host Failed state. User intervention is required to resolve the
situation.
Problem
vSphere HA reports that a host is in the Host Failed state. Usually, such a report indicates that the host
has actually failed, but failure reports can sometimes be incorrect. A failed host reduces the available
capacity in the cluster and, in the case of an incorrect report, prevents vSphere HA from protecting the
virtual machines running on the host.
Cause
This host state is reported when the vSphere HA master host to which vCenter Server is connected is
unable to communicate with the host and with the heartbeat datastores that are in use for the host. Any
storage failure that makes the datastores inaccessible to hosts can cause this condition if accompanied
by a network failure.
Solution
Check for the noted failure conditions and resolve any that are found.

vSphere HA Agent is in the Network Partitioned State

The vSphere HA agent on a host is in the Network Partitioned state. User intervention might be required
to resolve this situation.
Problem
While the virtual machines running on the host continue to be monitored by the master hosts that are
responsible for them, vSphere HA's ability to restart the virtual machines after a failure is affected. First,
each master host has access to a subset of the hosts, so less failover capacity is available to each host.
Second, vSphere HA might be unable to restart an FT Secondary VM after a failure (see Primary VM
Remains in the Need Secondary State).
Cause
A host is reported as partitioned if both of the following conditions are met:
• The vSphere HA master host to which vCenter Server is connected is unable to communicate with
the host by using the management (or VMware vSAN™) network, but is able to communicate with that
host by using the heartbeat datastores that have been selected for it.
• The host is not isolated.
A network partition can occur for a number of reasons, including incorrect VLAN tagging, the failure of a
physical NIC or switch, configuring a cluster with some hosts that use only IPv4 and others that use only
IPv6, or moving the management networks for some hosts to a different virtual switch without first
putting the hosts into maintenance mode.
Solution
Resolve the networking problem that prevents the hosts from communicating by using the management
networks.

vSphere HA Agent is in the Network Isolated State

The vSphere HA agent on a host is in the Network Isolated state. User intervention is required to resolve
this situation.
Problem
When a host is in the Network Isolated state, there are two things to consider: the isolated host and the
vSphere HA agent that holds the master role.
• On the isolated host, the vSphere HA agent applies the configured isolation response to the running
VMs, determining if they should be shut down or powered off. It does this after checking whether a
master agent is able to take responsibility for each VM (by locking the VM's home datastore). If not,
the agent defers applying the isolation response for the VM and rechecks the datastore state after a
short delay.
• If the vSphere HA master agent can access one or more of the datastores, it monitors the VMs that
were running on the host when it became isolated and attempts to restart any that were powered off
or shut down.
Cause
A host is network isolated if both of the following conditions are met:
• Isolation addresses have been configured and the host is unable to ping them.
• The vSphere HA agent on the host is unable to access any of the agents running on the other cluster
hosts.
Note If your vSphere HA cluster has vSAN enabled, a host is determined to be isolated if it cannot
communicate with the other vSphere HA agents in the cluster and cannot reach the configured isolation
addresses. Although the vSphere HA agents use the vSAN network for inter-agent communication, the
default isolation address is still the gateway of the host. Hence, in the default configuration, both networks
must fail for a host to be declared isolated.
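The Cause descriptions for the Host Failed, Network Partitioned, and Network Isolated states can be read together as a decision procedure. The following Python sketch encodes those conditions, assuming that the four observations are already available as booleans. The function, parameter, and REACHABLE state names are hypothetical, and the sketch does not describe how the vSphere HA agent is actually implemented.

```python
from enum import Enum


class HostState(Enum):
    REACHABLE = "reachable"  # hypothetical label: no error state follows from these checks
    HOST_FAILED = "Host Failed"
    NETWORK_PARTITIONED = "Network Partitioned"
    NETWORK_ISOLATED = "Network Isolated"


def is_isolated(pings_isolation_address: bool, reaches_other_agents: bool) -> bool:
    """A host is isolated only if both checks fail."""
    return not pings_isolation_address and not reaches_other_agents


def classify(master_reaches_host_over_network: bool,
             master_sees_host_heartbeat_datastores: bool,
             pings_isolation_address: bool,
             reaches_other_agents: bool) -> HostState:
    """Map the documented conditions to a reported host state."""
    if master_reaches_host_over_network:
        return HostState.REACHABLE
    # No network contact and no datastore heartbeat: reported as failed.
    if not master_sees_host_heartbeat_datastores:
        return HostState.HOST_FAILED
    # Heartbeat datastores still work, so the host is alive but cut off.
    if is_isolated(pings_isolation_address, reaches_other_agents):
        return HostState.NETWORK_ISOLATED
    return HostState.NETWORK_PARTITIONED


if __name__ == "__main__":
    # The host can still ping its isolation address, so it is partitioned, not isolated.
    print(classify(False, True, True, False).value)
```

The ordering matters in this sketch: the heartbeat datastore check separates a failed host from one that is alive but unreachable, and only then does the isolation test distinguish Network Isolated from Network Partitioned.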
Solution
Resolve the networking problem that is preventing the host from pinging its isolation addresses and
communicating with other hosts.
Configuration of vSphere HA on Hosts Times Out
The configuration of a vSphere HA cluster might time out on some of the hosts added to it.
Problem
When you enable vSphere HA on an existing cluster with a large number of hosts and virtual machines,
the setup of vSphere HA on some of the hosts might fail.
Cause
This failure is the result of a timeout occurring before the installation of vSphere HA on the hosts
completes.
Solution
Set the vCenter Server advanced option config.vpxd.das.electionWaitTimeSec to 240. After this
change is made, the timeouts no longer occur.

Troubleshooting vSphere Auto Deploy

The vSphere Auto Deploy troubleshooting topics offer solutions for situations when provisioning hosts
with vSphere Auto Deploy does not work as expected.

vSphere Auto Deploy TFTP Timeout Error at Boot Time

Problem
A TFTP Timeout error message appears when a host provisioned with vSphere Auto Deploy boots. The
text of the message depends on the BIOS.
Cause
The TFTP server is down or unreachable.
Solution
• Ensure that your TFTP service is running and reachable by the host that you are trying to boot.
• To view the diagnostic logs for details on the present error, see your TFTP service documentation.
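One way to check reachability from another machine on the boot network is to send a TFTP read request and see whether anything answers. The following Python sketch uses only the standard library and the read request layout from RFC 1350. The server address and the boot file name in the usage note are assumptions, so replace them with the values for your environment.

```python
import socket
import struct

TFTP_PORT = 69  # well-known TFTP port
OP_RRQ = 1      # read request opcode (RFC 1350)


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def tftp_responds(host: str, filename: str,
                  port: int = TFTP_PORT, timeout: float = 3.0) -> bool:
    """Return True if any TFTP packet comes back before the timeout.

    A DATA, OACK, or ERROR reply each shows that the service is reachable.
    No reply suggests that the service is down or blocked by a firewall.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_rrq(filename), (host, port))
            reply, _ = sock.recvfrom(2048)
        except OSError:  # covers timeouts and unreachable-port errors
            return False
    return len(reply) >= 2  # every TFTP packet starts with a 2-byte opcode
```

For example, tftp_responds("192.0.2.10", "undionly.kpxe.vmw-hardwired") probes a hypothetical server for a hypothetical boot file. A False result does not say whether the service is stopped or a firewall is dropping the traffic, so check both.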
vSphere Auto Deploy Host Boots with Wrong Configuration
A host is booting with a different ESXi image, host profile, or folder location than the one specified in the
rules.
Problem
A host is booting with a different ESXi image profile or configuration than the image profile or
configuration that the rules specify. For example, you change the rules to assign a different image profile,
but the host still uses the old image profile.
Cause
After the host has been added to a vCenter Server system, the boot configuration is determined by the
vCenter Server system. The vCenter Server system associates an image profile, host profile, or folder
location with the host.
Solution
• Use the Test-DeployRuleSetCompliance and Repair-DeployRuleSetCompliance vSphere
PowerCLI cmdlets to reevaluate the rules and to associate the correct image profile, host profile, or
folder location with the host.

Host Is Not Redirected to vSphere Auto Deploy Server

Problem
During boot, a host that you want to provision with vSphere Auto Deploy loads iPXE. The host is not
redirected to the vSphere Auto Deploy server.
Cause
The tramp file that is included in the TFTP ZIP file has the wrong IP address for the vSphere Auto Deploy
server.
Solution
• Correct the IP address of the vSphere Auto Deploy server in the tramp file, as explained in the
vSphere Installation and Setup documentation.
Package Warning Message When You Assign an Image Profile to a vSphere Auto Deploy Host
When you run a vSphere PowerCLI cmdlet that assigns an image profile that is not vSphere Auto Deploy
ready, a warning message appears.
Problem
When you write or modify rules to assign an image profile to one or more hosts, the following warning
appears:
Warning: Image Profile <name-here> contains one or more software packages that are
not stateless-ready. You may experience problems when using this profile with Auto
Deploy.
Cause
Each VIB in an image profile has a stateless-ready flag that indicates that the VIB is meant for use
with vSphere Auto Deploy. You get the warning if you attempt to write a vSphere Auto Deploy rule that
uses an image profile in which one or more VIBs have that flag set to FALSE.
Note You can use hosts provisioned with vSphere Auto Deploy that include VIBs that are not
stateless-ready without problems. However, booting with an image profile that includes VIBs that are not
stateless-ready is treated like a fresh install. Each time you boot the host, you lose any configuration data
that would otherwise be available across reboots for hosts provisioned with vSphere Auto Deploy.
Solution
1 Use vSphere ESXi Image Builder cmdlets in a vSphere PowerCLI session to view the VIBs in the
image profile.
2 Remove any VIBs that are not stateless-ready.
3 Rerun the vSphere Auto Deploy cmdlet.

vSphere Auto Deploy Host with a Built-In USB Flash Drive Does Not Send Coredumps to Local Disk

If your vSphere Auto Deploy host has a built-in USB flash drive, and an error results in a coredump, the
coredump is lost. Set up your system to use ESXi Dump Collector to store coredumps on a networked
host.
Problem
If your vSphere Auto Deploy host has a built-in USB flash drive, and if it encounters an error that results
in a coredump, the coredump is not sent to the local disk.
Solution
1 Install ESXi Dump Collector on a system of your choice.
ESXi Dump Collector is included with the vCenter Server installer.
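2 Use ESXCLI to configure the host to use ESXi Dump Collector. In the following commands,
conn_options stands for the ESXCLI connection options for the target host, vmk-nic for the VMkernel
network adapter to use, and IP-addr and port for the address and port of the ESXi Dump Collector
system.
esxcli conn_options system coredump network set --interface-name vmk-nic --server-ipv4 IP-addr --server-port port
esxcli conn_options system coredump network set -e true
3 Use ESXCLI to disable local coredump partitions.
esxcli conn_options system coredump partition set -e false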

vSphere Auto Deploy Host Reboots After Five Minutes

A vSphere Auto Deploy host boots and displays iPXE information, but reboots after five minutes.
Problem
A host to be provisioned with vSphere Auto Deploy boots from iPXE and displays iPXE information on the
console. However, after five minutes, the host displays the following message to the console and reboots.
This host is attempting to network-boot using VMware
AutoDeploy. However, there is no ESXi image associated with this host.
Details: No rules containing an Image Profile match this
host. You can create a rule with the New-DeployRule PowerCLI cmdlet
and add it to the rule set with Add-DeployRule or Set-DeployRuleSet.
The rule should have a pattern that matches one or more of the attributes
listed below.
The host might also display the following details:
Details: This host has been added to VC, but no Image Profile
is associated with it. You can use Apply-ESXImageProfile in the
PowerCLI to associate an Image Profile with this host.
Alternatively, you can reevaluate the rules for this host with the
Test-DeployRuleSetCompliance and Repair-DeployRuleSetCompliance cmdlets.
The console then displays the host's machine attributes including vendor, serial number, IP address, and
so on.
Cause
No image profile is currently associated with this host.
Solution
You can assign an image profile to the host by running the Apply-EsxImageProfile cmdlet, or by
creating the following rule:
1 Run the New-DeployRule cmdlet to create a rule that includes a pattern that matches the host with
an image profile.
2 Run the Add-DeployRule cmdlet to add the rule to a ruleset.
3 Run the Test-DeployRuleSetCompliance cmdlet and use the output of that cmdlet as the input to
the Repair-DeployRuleSetCompliance cmdlet.

vSphere Auto Deploy Host Cannot Contact TFTP Server

The host that you provision with vSphere Auto Deploy cannot contact the TFTP server.
Problem
When you attempt to boot a host provisioned with vSphere Auto Deploy, the host performs a network boot
and is assigned a DHCP address by the DHCP server, but the host cannot contact the TFTP server.
Cause
The TFTP server might have stopped running, or a firewall might block the TFTP port.
Solution
• If you installed the WinAgents TFTP server, open the WinAgents TFTP management console and
verify that the service is running. If the service is running, check the Windows firewall's inbound rules
to make sure the TFTP port is not blocked. Turn off the firewall temporarily to see whether the firewall
is the problem.
• For all other TFTP servers, see the server documentation for debugging procedures.

vSphere Auto Deploy Host Cannot Retrieve ESXi Image from vSphere Auto Deploy Server

The host that you provision with vSphere Auto Deploy stops at the iPXE boot screen.
Problem
When you attempt to boot a host provisioned with vSphere Auto Deploy, the boot process stops at the
iPXE boot screen and the status message indicates that the host is attempting to get the ESXi image
from the vSphere Auto Deploy server.
Cause
The vSphere Auto Deploy service might be stopped or the vSphere Auto Deploy server might be
inaccessible.