Disk consolidation: Unable to access file

Last week a customer called me about strange behavior on some of his VMs. He had noticed that his backup software (HP Data Protector) left snapshots on some VMs. After he removed them manually, he got the message “Virtual machine disk consolidation is needed”. Starting the consolidation process always ended with the pop-up “Unable to access file since it is locked.”

If you take a look at the VMware KB, you will find two articles that describe the following scenarios: the first one assumes that you are running a virtual backup appliance which still has the VMDK attached. The second one assumes your VM is powered off and you can't power it on.
In my specific case all VMs were running fine and HP Data Protector was installed on a physical server (no appliance!).

A closer look at the log files (vmware.log) of the affected VMs showed a locking error on the base VMDK file.


The command “vmkfstools -D /path/to/vmdk_file” showed me the current locks for the specific file. I found two locks, each with the MAC address of the lock holder (the last segment of the line), but only one of these addresses belonged to the ESXi host that was running the VM.
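If you have to check many files, pulling the MAC addresses out of the output by hand gets tedious. Here is a minimal Python sketch that does it, assuming the lock owner appears as a UUID-like string whose last 12 hex digits are the MAC address of the holding host. The function name and the sample line are made up for illustration, and the exact output layout of vmkfstools differs between ESXi versions, so treat the pattern as a starting point rather than a guarantee.

```python
import re

# Assumption: the lock owner is printed as 8-8-4-12 hex digits and the
# last 12 digits are the MAC address of the host holding the lock.
OWNER_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f]{12})\b",
    re.IGNORECASE,
)


def extract_lock_macs(output):
    """Return the MAC addresses of all lock owners found in the text.

    Owners consisting only of zeros (no lock held) are skipped and
    duplicates are reported once, in order of appearance.
    """
    macs = []
    for match in OWNER_RE.finditer(output):
        raw = match.group(1).lower()
        if set(raw) == {"0"}:
            continue  # all-zero owner: nobody holds this lock
        mac = ":".join(raw[i:i + 2] for i in range(0, 12, 2))
        if mac not in macs:
            macs.append(mac)
    return macs


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the customer's system.
    sample = "gen 1, mode 1, owner 4f3c2a10-9b8d7e6f-1a2b-001a64c33642 mtime 100"
    print(extract_lock_macs(sample))
```

You would feed it the text that “vmkfstools -D” prints for the locked VMDK and compare the result with the MAC addresses of your hosts.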

With just four nodes the ESXi cluster was quite manageable, so I used the GUI to find the ESXi host with the MAC address that shouldn't have had any relationship with the VM (at least at that time).
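In a bigger cluster, clicking through the GUI host by host would not be much fun. The lookup itself is simple once you have a list of each host's NIC MAC addresses, as this small Python sketch shows. The host names and addresses in the inventory are invented for illustration; how you collect the real ones (GUI, scripts, documentation) is up to you.

```python
def normalize_mac(mac):
    """Lower-case a MAC address and use ':' as the separator."""
    return mac.strip().lower().replace("-", ":")


def find_hosts_by_mac(mac, host_nics):
    """Return the names of all hosts that own the given MAC address.

    host_nics maps a host name to the list of its NIC MAC addresses.
    """
    wanted = normalize_mac(mac)
    return [
        host
        for host, macs in host_nics.items()
        if wanted in {normalize_mac(m) for m in macs}
    ]


if __name__ == "__main__":
    # Hypothetical inventory of a four-node cluster.
    inventory = {
        "esx01": ["00:1A:64:C3:36:42", "00:1A:64:C3:36:43"],
        "esx02": ["00:1A:64:C3:40:10"],
        "esx03": ["00:1A:64:C3:51:7E"],
        "esx04": ["00:1A:64:C3:62:0C"],
    }
    print(find_hosts_by_mac("00:1a:64:c3:40:10", inventory))
```

An empty result would mean that the lock holder is not part of the inventory you checked, for example a host outside the cluster.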

A reboot of this ESXi host released the lock, and the disk consolidation task then ran successfully!



How could this problem occur?

Well, I admit the cause of this problem is certainly a corner case.
As far as I could reconstruct the events (based on log files and customer statements):


-HP Data Protector ran some parallel backup jobs (which created some snapshots)
-Planned maintenance work on the storage began, which included a shutdown of all VMs
-The backup jobs failed with timeouts (datastores no longer available)
-Data Protector was of course not able to delete the snapshots
-After the maintenance work all VMs were powered on again
-DRS was enabled on the cluster, so not all VMs stayed on the same ESXi host
-All VMs with Data Protector snapshots that ended up on a new host experienced the above scenario

After I had solved the problem by rebooting two of the four cluster nodes, it occurred to me that I should have tested a vMotion of the affected VMs back to their original ESXi hosts. Maybe this would have solved the issue as well…
