Suspended virtual machine does not boot again

Good morning!

I have OpenNebula 4.6.2 with VMware.
When I suspend a virtual machine and then try to boot it from the SUSPENDED state, the boot always fails and the VM does not change state.
In VMware the VM appears powered off. If I boot it from there, it boots fine, but it is not synchronized with Sunstone, where it still appears as suspended.

The problem is:
Tue Mar 31 12:17:49 2015 [VMM][I]: Command execution fail: /var/lib/one/remotes/vmm/vmware/restore '/vmfs/volumes/0/128/checkpoint' 'X.X.X.X' 'one-128' 128 X.X.X.X
Tue Mar 31 12:17:49 2015 [VMM][I]: /var/lib/one/remotes/vmm/vmware/vmware_driver.rb:212: warning: Object#id will be deprecated; use Object#object_id
Tue Mar 31 12:17:49 2015 [VMM][E]: restore: Error executing: virsh -c 'esx://X.X.X.X/?no_verify=1&auto_answer=1' snapshot-revert one-128 checkpoint err: ExitCode: 1
Tue Mar 31 12:17:49 2015 [VMM][I]: out:
Tue Mar 31 12:17:49 2015 [VMM][I]: error: internal error Could not revert to snapshot 'checkpoint': FileLocked - Unable to access file since it is locked
Tue Mar 31 12:17:49 2015 [VMM][I]:
Tue Mar 31 12:17:49 2015 [VMM][I]: ExitCode: 1
Tue Mar 31 12:17:49 2015 [VMM][I]: Failed to execute virtualization driver operation: restore.
Tue Mar 31 12:17:49 2015 [VMM][E]: Error restoring VM
Tue Mar 31 12:17:49 2015 [LCM][I]: Fail to boot VM. New VM state is SUSPENDED

How can I solve it?

Thank you!

The error seems to come from the snapshot file being locked. To better understand why, could you please send the vmware.log file? It should be located in the /vmfs/volumes/<system_ds_id>/<vid>/disk.0/ directory on the ESX host, where vid is the ID of the VM and system_ds_id is the ID of the system datastore.
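Before sending the whole file, it may help to check whether vmware.log mentions a lock at all. The following is a minimal Python 3 sketch, assuming the log has been copied from the ESX host to a machine with Python; the script and function names are hypothetical, and matching the substring "lock" is only a rough filter (it will also match words such as "clock").

```python
from __future__ import annotations

import re
import sys

# Rough filter (an assumption): any line mentioning "lock" in any case,
# e.g. "locked", "FileLocked", "Lock".
LOCK_RE = re.compile(r"lock", re.IGNORECASE)


def lock_lines(log_text: str) -> list[str]:
    """Return the lines of a vmware.log that mention a lock."""
    return [line for line in log_text.splitlines() if LOCK_RE.search(line)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 lock_lines.py vmware.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for line in lock_lines(fh.read()):
            print(line)
```

If nothing is printed, the lock is probably not recorded in that particular log file.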

Hello,

It is indeed the snapshot that causes the problem:
Wed Apr 1 13:36:51 2015 [VMM][I]: Command execution fail: /var/lib/one/remotes/vmm/vmware/restore '/vmfs/volumes/0/125/checkpoint' 'X.X.X.X' 'one-125' 125 X.X.X.X
Wed Apr 1 13:36:51 2015 [VMM][E]: restore: Error executing: virsh -c 'esx://X.X.X.X/?no_verify=1&auto_answer=1' snapshot-revert one-125 checkpoint err: ExitCode: 1
Wed Apr 1 13:36:51 2015 [VMM][I]: out:
Wed Apr 1 13:36:51 2015 [VMM][I]: error: internal error Could not revert to snapshot 'checkpoint': FileLocked - Unable to access file since it is locked

How can I attach the vmware.log file for you?

Regards and thank you!

Hi,

You can post a link to a GitHub gist.

THANK YOU!

Hello again,

I still have not solved the problem with the snapshot.
Mon Apr 6 10:19:34 2015 [VMM][I]: Command execution fail: /var/lib/one/remotes/vmm/vmware/restore '/vmfs/volumes/0/126/checkpoint' 'X.X.X.X' 'one-126' 126 X.X.X.X
Mon Apr 6 10:19:34 2015 [VMM][E]: restore: Error executing: virsh -c 'esx://X.X.X.X/?no_verify=1&auto_answer=1' snapshot-revert one-126 checkpoint err: ExitCode: 1
Mon Apr 6 10:19:34 2015 [VMM][I]: out:
Mon Apr 6 10:19:34 2015 [VMM][I]: error: internal error Could not revert to snapshot 'checkpoint': FileLocked - Unable to access file since it is locked
Mon Apr 6 10:19:34 2015 [VMM][I]:
Mon Apr 6 10:19:34 2015 [VMM][I]: ExitCode: 1
Mon Apr 6 10:19:34 2015 [VMM][I]: Failed to execute virtualization driver operation: restore.
Mon Apr 6 10:19:34 2015 [VMM][E]: Error restoring VM
Mon Apr 6 10:19:34 2015 [LCM][I]: Fail to boot VM. New VM state is SUSPENDED

I attached the logs last week.
Any idea?

Thank you!
Regards.

Hi,

After analysing the log, we cannot find any reference to the locked file. We suggest following the steps VMware proposes for this error. If that does not work, we suggest copying the relevant disks from the VM, registering them again, and launching a new VM based on these new disks to avoid losing any data.

Hello,
I have already tried the VMware steps, but they do not solve the problem.
The thing is that this happens with every machine: once suspended, it never starts again.
Could it be a problem with the template or the image?
Thank you. Regards.

We cannot reproduce the problem, but it is unlikely to be a template or image error. Does this happen only with VMs created from one particular template and image?

If you access the ESX directly via the vSphere client, are you able to restore the “checkpoint” snapshot? Are you able to create more snapshots, and restore them?
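The same checks can also be run from the OpenNebula front-end through virsh, using the connection URI that appears in the logs above. This Python 3.8+ sketch only builds and runs the virsh commands; it assumes that the libvirt ESX driver installed on the front-end supports these snapshot commands against ESXi, and the helper names are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess


def esx_uri(host: str) -> str:
    """Libvirt ESX connection URI, with the parameters seen in the OpenNebula log."""
    return f"esx://{host}/?no_verify=1&auto_answer=1"


def virsh_snapshot_cmd(host: str, domain: str, action: str,
                       snapshot: str | None = None) -> list[str]:
    """Build the argv for listing, creating or reverting a snapshot with virsh."""
    cmd = ["virsh", "-c", esx_uri(host)]
    if action == "list":
        return cmd + ["snapshot-list", domain]
    if snapshot is None:
        raise ValueError(f"action {action!r} needs a snapshot name")
    if action == "create":
        return cmd + ["snapshot-create-as", domain, "--name", snapshot]
    if action == "revert":
        return cmd + ["snapshot-revert", domain, snapshot]
    raise ValueError(f"unknown action: {action!r}")


def run(cmd: list[str]) -> int:
    """Print the shell-quoted command, run it, and return virsh's exit code."""
    print(shlex.join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout, result.stderr, sep="")
    return result.returncode
```

For example, `run(virsh_snapshot_cmd("X.X.X.X", "one-128", "list"))` should list the snapshots of one-128, and the "revert" action repeats the call that fails in the OpenNebula log. Passing an argv list to subprocess avoids any shell quoting issues with the `&` in the URI.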

Hello,

I have just created a new virtual machine, and the problem is the same when I suspend it.
From the vSphere client I can create new snapshots and revert to the checkpoint snapshot. It is very strange.
Could it be a permissions problem in the datastores inside vSphere? I created a new user called oneadmin, but when I create a machine, for example, the owner inside the ESXi host is root.

Thank you!

Permissions could explain the problem. Are you using NFS? If you change the checkpoint snapshot file ownership to oneadmin, does the VM go from suspended to running?
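To see which files in the VM directory are not owned by a given user (for instance before and after a resume attempt), a small Python 3 sketch like the following could be used. It assumes Python 3 is available on a machine where the datastore directory can be read and where the user exists; the script and function names are hypothetical.

```python
from __future__ import annotations

import os
import pwd
import sys


def files_not_owned_by(directory: str, user: str) -> list[str]:
    """Return the paths under `directory` whose owner is not `user` (sorted)."""
    uid = pwd.getpwnam(user).pw_uid  # raises KeyError if the user does not exist
    mismatched = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            # lstat: look at the entry itself, do not follow symlinks
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return sorted(mismatched)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 check_owner.py /vmfs/volumes/0/one-125 oneadmin
    for path in files_not_owned_by(sys.argv[1], sys.argv[2]):
        print(path)
```

Running it right after a failed resume shows which files the ESX host recreated under a different owner.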

Hello,

I’m using iSCSI.
First I ran chown -R oneadmin.oneadmin /vmfs/volumes/0/one-125/ to change the owner.
Then, when I try to resume the virtual machine again, some files change back to root.root:
-rw------- 1 oneadmin oneadmin 16801792 Apr 1 08:32 disk-000001-delta.vmdk
-rw------- 1 oneadmin oneadmin 311 Apr 1 08:31 disk-000001.vmdk
-rw------- 1 oneadmin oneadmin 24576 Apr 1 08:39 disk-000002-delta.vmdk
-rw------- 1 oneadmin oneadmin 311 Apr 1 08:39 disk-000002.vmdk
-rw------- 1 oneadmin oneadmin 24576 Apr 1 10:44 disk-000003-delta.vmdk
-rw------- 1 oneadmin oneadmin 311 Apr 1 10:44 disk-000003.vmdk
-rw------- 1 oneadmin oneadmin 24576 Apr 1 10:46 disk-000004-delta.vmdk
-rw------- 1 oneadmin oneadmin 311 Apr 1 10:46 disk-000004.vmdk
-rw------- 1 oneadmin oneadmin 24576 Apr 1 11:12 disk-000005-delta.vmdk
-rw------- 1 oneadmin oneadmin 311 Apr 1 11:12 disk-000005.vmdk
-rw------- 1 oneadmin oneadmin 24576 Apr 1 11:17 disk-000006-delta.vmdk
-rw------- 1 oneadmin oneadmin 311 Apr 1 11:17 disk-000006.vmdk
-rw------- 1 oneadmin oneadmin 24576 Apr 1 11:33 disk-000007-delta.vmdk
-rw------- 1 oneadmin oneadmin 311 Apr 1 11:33 disk-000007.vmdk
-rw------- 1 oneadmin oneadmin 24576 Apr 1 12:05 disk-000008-delta.vmdk
-rw------- 1 oneadmin oneadmin 311 Apr 1 12:05 disk-000008.vmdk
-rw------- 1 oneadmin oneadmin 24576 Apr 6 16:27 disk-000009-delta.vmdk
-rw------- 1 oneadmin oneadmin 311 Apr 6 16:27 disk-000009.vmdk
-rw------- 1 root root 24576 Apr 6 16:41 disk-000010-delta.vmdk
-rw------- 1 root root 311 Apr 6 16:41 disk-000010.vmdk
-rw------- 1 oneadmin oneadmin 10485760000 Apr 1 08:31 disk-flat.vmdk
-rw------- 1 root root 541 Mar 31 12:14 disk.vmdk
-rw------- 1 root root 8684 Apr 6 16:41 nvram
-rw------- 1 root root 538103426 Apr 1 08:32 one-125-Snapshot1.vmsn
-rw------- 1 root root 538103372 Apr 1 08:32 one-125-f12dc81c.vmss
-rw-r--r-- 1 root root 390 Apr 6 16:29 one-125.vmsd
-rwx------ 1 root root 1838 Apr 6 16:41 one-125.vmx
-rw-r--r-- 1 root root 262 Mar 9 12:31 one-125.vmxf
-rw-r--r-- 1 root root 159737 Mar 30 08:24 vmware-1.log
-rw-r--r-- 1 root root 224509 Mar 31 11:58 vmware-2.log
-rw-r--r-- 1 root root 172562 Apr 1 08:32 vmware.log

The error persists.
How can I make this change permanent?

Thank you!

We can confirm whether it is a permissions issue by setting the ESX root credentials in /etc/one/vmwarerc and trying the process again. Please let us know the outcome of the experiment.

Hello,

This is my /etc/one/vmwarerc file. Is it right?

Libvirt configuration

:libvirt_uri: "'esx://@HOST@/?no_verify=1&auto_answer=1'"

Username and password of the VMware hypervisor

:username: "root"
:password: "…"

VMotion configuration attributes

:datacenter: "ha-datacenter"
:vcenter: "IP"

Can the datacenter and vcenter attributes be deleted?
Which attributes need double quotes, single quotes, or both?

Thank you!

The error continues.

I have tried to resume the machine with root credentials, but the error is the same:
Wed Apr 8 10:49:30 2015 [VMM][I]: Command execution fail: /var/lib/one/remotes/vmm/vmware/restore '/vmfs/volumes/0/126/checkpoint' 'X.X.X.X' 'one-126' 126 X.X.X.X
Wed Apr 8 10:49:30 2015 [VMM][E]: restore: Error executing: virsh -c 'esx://X.X.X.X/?no_verify=1&auto_answer=1' snapshot-revert one-126 checkpoint err: ExitCode: 1
Wed Apr 8 10:49:30 2015 [VMM][I]: out:
Wed Apr 8 10:49:30 2015 [VMM][I]: error: internal error Could not revert to snapshot 'checkpoint': FileLocked - Unable to access file since it is locked
Wed Apr 8 10:49:30 2015 [VMM][I]:
Wed Apr 8 10:49:30 2015 [VMM][I]: ExitCode: 1
Wed Apr 8 10:49:30 2015 [VMM][I]: Failed to execute virtualization driver operation: restore.
Wed Apr 8 10:49:30 2015 [VMM][E]: Error restoring VM
Wed Apr 8 10:49:30 2015 [LCM][I]: Fail to boot VM. New VM state is SUSPENDED

Perhaps I have an error in the vmwarerc file.
It is in my previous message.

Thank you!
Regards.

The vmwarerc looks fine. You can try to set the password without quotes (although it should not make a difference if at least one operation is working). The libvirt URI is correct.

The datacenter and vCenter variables can be commented out if you do not need the VMotion capabilities.
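On the question of quotes in the libvirt URI: the OpenNebula log shows the URI wrapped in single quotes inside the virsh command line. The following Python sketch reproduces that, assuming (as the log suggests, but not confirmed from the driver source) that the driver replaces @HOST@ and splices the value into a shell command string; the function names are hypothetical.

```python
from __future__ import annotations

import shlex

# Value of :libvirt_uri: from the vmwarerc in this thread. After YAML parsing,
# the outer double quotes are gone and the inner single quotes stay in the string.
LIBVIRT_URI = "'esx://@HOST@/?no_verify=1&auto_answer=1'"


def virsh_command_line(host: str, domain: str, snapshot: str) -> str:
    """Assumed driver behaviour: fill in @HOST@ and build a shell command string."""
    uri = LIBVIRT_URI.replace("@HOST@", host)
    return f"virsh -c {uri} snapshot-revert {domain} {snapshot}"


def shell_words(command_line: str) -> list[str]:
    """Split into words, honouring shell-style quotes (shlex does not treat & as an operator)."""
    return shlex.split(command_line)
```

`virsh_command_line("X.X.X.X", "one-126", "checkpoint")` gives the same string as the failing log line, and splitting it keeps the whole URI as a single argument. Without the inner single quotes, a POSIX shell would treat the `&` as a control operator and cut the command there, which is presumably why the URI value carries both kinds of quotes; username and password are plain strings, so their quoting should not matter.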

We are failing to reproduce this problem. When using root, do all the files in /vmfs/volumes/0/<vid> belong to root?

Also, what version of VMware ESX are you using?

I have removed the quotes around the password, but the error is the same.
The VMotion attributes are commented out now.
Yes, all the files in /vmfs/volumes/0/126 belong to root.
My hypervisor is ESXi 5.5.
I have a federated platform with another hypervisor, KVM, and the suspend action works fine there.
The problem is with the VMware driver…

Thank you!

If I run these commands manually:
/var/lib/one/remotes/vmm/vmware/restore '/vmfs/volumes/0/126/checkpoint' 'X.X.X.X' 'one-126' 126 X.X.X.X
virsh -c 'esx://X.X.X.X/?no_verify=1&auto_answer=1' snapshot-revert one-126 checkpoint

Everything works fine!
The problem only appears when I suspend the VM from Sunstone or with onevm suspend 126. It is strange.

How could I change the owner user on the ESXi host from root to oneadmin?

Thank you!

It is indeed strange that the following command works, since it is exactly what OpenNebula issues:

/var/lib/one/remotes/vmm/vmware/restore '/vmfs/volumes/0/126/checkpoint' 'X.X.X.X' 'one-126' 126 X.X.X.X

Are you executing it using the oneadmin user?
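One way to make sure the manual test runs under the same account as OpenNebula is to prefix it with `sudo -u oneadmin` whenever the current user is someone else. This is a Python 3.8+ sketch, assuming sudo is configured on the front-end; the argument list is copied from the log in this thread, and the helper names are hypothetical.

```python
from __future__ import annotations

import os
import pwd
import shlex
import sys

RESTORE = "/var/lib/one/remotes/vmm/vmware/restore"


def current_user() -> str:
    """Name of the effective user running this script."""
    return pwd.getpwuid(os.geteuid()).pw_name


def restore_cmd(args: list[str], as_user: str = "oneadmin") -> list[str]:
    """Build the restore invocation, switching to `as_user` with sudo if needed."""
    cmd = [RESTORE, *args]
    if current_user() != as_user:
        cmd = ["sudo", "-u", as_user, *cmd]
    return cmd


if __name__ == "__main__":
    # Arguments as they appear in the OpenNebula log for VM 126.
    example = ["/vmfs/volumes/0/126/checkpoint", "X.X.X.X", "one-126", "126", "X.X.X.X"]
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(restore_cmd(sys.argv[1:] or example)))
```

Printing rather than executing keeps the sketch harmless; the printed line can then be pasted into a shell on the front-end.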

I am not sure I understand the question (How could I change the owner user in the ESXi from root to oneadmin?). If you mean the owner of the files generated by the ESX host, there is no way to change it. In any case, if you put the root username and password in the vmwarerc, all the files should belong to root.

Apologies but since we cannot reproduce the error we are running out of ideas.