How to recover BOOT_UNDEPLOY_FAILURE?

Trunyx · January 31, 2022, 12:50pm

Hi,

A user undeployed his VM, but redeploying it failed because of storage problems on the VM host (at least that’s what I think). I deleted a VM on that host, but retrying does not help. The VM stays stuck in BOOT_UNDEPLOY_FAILURE, and running --recover --interactive gives an error saying that BOOT_UNDEPLOY_FAILURE does not support these options.
On the target host I can see the disk file, but it is only a few KB in size. On the previous host, where it was stored while undeployed, I can see the 18 GB disk file. Can somebody help me recover the VM into a running state, or is there a way to safely undeploy it again (undoing the failed deploy)?

The log says:
Mon Jan 31 10:48:13 2022 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/101/237/deployment.4' 'host1' 237 host1
Mon Jan 31 10:48:13 2022 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/101/237/deployment.4
Mon Jan 31 10:48:13 2022 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/101/237/disk.0' (as uid:9869, gid:9869): No such file or directory
Mon Jan 31 10:48:13 2022 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/101/237/deployment.4

Thanks
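The log says libvirt could not find disk.0, which deployment.4 references. A minimal sketch for checking this directly on a host, assuming the deployment file is libvirt domain XML whose disks appear as file='...' (or file="...") attributes; that layout is an assumption, not something stated in the thread. The function name check_deployment_disks and the 1 MiB "suspiciously small" threshold are hypothetical choices for illustration.

```sh
#!/bin/sh
# Sketch: report every disk file referenced by a libvirt deployment
# file as OK, SMALL or MISSING. Assumptions (hypothetical):
#   - disks appear as file='...' or file="..." attributes in the XML;
#   - anything under MIN_BYTES (default 1 MiB) is only "suspicious",
#     since a qcow2 overlay can legitimately be that small.
check_deployment_disks() {
  local deployment="$1"
  local min_bytes="${2:-1048576}"
  local paths path size status=0

  [ -r "$deployment" ] || { echo "cannot read $deployment" >&2; return 2; }

  # Extract the value of every file= attribute: grep keeps
  # file='/path (without the closing quote), cut drops "file='".
  paths=$(grep -oE "file=['\"][^'\"]+" "$deployment" | cut -c7-)

  while IFS= read -r path; do
    [ -n "$path" ] || continue
    # -e follows symlinks, so a dangling symlink also counts as missing.
    if [ ! -e "$path" ]; then
      printf 'MISSING %s\n' "$path"
      status=1
      continue
    fi
    # $(( )) strips the padding some wc implementations add.
    size=$(( $(wc -c < "$path") ))
    if [ "$size" -lt "$min_bytes" ]; then
      printf 'SMALL %s (%s bytes)\n' "$path" "$size"
      status=1
    else
      printf 'OK %s (%s bytes)\n' "$path" "$size"
    fi
  done <<EOF
$paths
EOF

  return "$status"
}
```

On host1 this would be run as `check_deployment_disks /var/lib/one//datastores/101/237/deployment.4`; a MISSING line for disk.0 matches the error in the log. A SMALL result on its own is only a hint, not proof of a broken copy.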

ruben · January 31, 2022, 6:12pm

Unfortunately, you may have hit this one: "VM may lose qcow2 disk after undeploy/resume" (issue #5702 in OpenNebula/one on GitHub).

There is a tentative patch linked to the issue that you can apply.

jorel · January 31, 2022, 6:47pm

Can you post the VM directory content from the host?

ls -la /var/lib/one/datastores/101/237/

Trunyx · January 31, 2022, 7:16pm

Hi,

In the meantime I resolved the problem myself. I looked at other running VMs and saw that each has a disk.0 image of variable size and a rather small disk.1 image. For the stuck VM, only disk.1 had been copied to the new host, not disk.0. I manually copied disk.0 from the previous host into the matching datastore folder on the host the VM was being deployed to. After changing the permissions and ownership of the copied disk to the OpenNebula user on that machine, I pressed the recover button and the VM started up.

So I think the problem was that OpenNebula assumed the files were already on the other system when they weren't, probably because the file system had been full.

Is my fix okay, or might it break things in the future (e.g., when the VM is undeployed again)?
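The manual fix described in this reply can be sketched as a short list of commands. This is a sketch under stated assumptions, not an official procedure: it only prints the commands so they can be reviewed before anything is run. OLD_HOST is a placeholder for the host that still holds the 18 GB disk; the directory layout follows the log; copying ownership and mode from the correctly deployed disk.1 uses GNU coreutils' --reference option; and `onevm recover <id> --retry` is assumed to be the CLI equivalent of the Sunstone recover button used here. The function name print_restore_plan is hypothetical.

```sh
#!/bin/sh
# Sketch: print (do not run) the steps of the manual disk.0 fix.
# Hypothetical assumptions:
#   - old_host still holds the full disk.0, new_host is the host the
#     VM was being deployed to;
#   - VM files live in /var/lib/one/datastores/<ds_id>/<vm_id>;
#   - new_host has GNU coreutils (chown/chmod --reference) and sudo;
#   - 'onevm recover <id> --retry' matches the Sunstone recover button.
print_restore_plan() {
  local old_host="$1" new_host="$2" vm_id="$3" ds_id="$4"
  local dir="/var/lib/one/datastores/$ds_id/$vm_id"
  local disk="$dir/disk.0"

  # 1. Copy the full disk between the two hosts, routed through the
  #    local machine (scp -3).
  printf 'scp -3 %s:%s %s:%s\n' "$old_host" "$disk" "$new_host" "$disk"
  # 2. Give the copy the same owner and mode as the disk.1 that
  #    OpenNebula itself deployed.
  printf 'ssh %s sudo chown --reference=%s/disk.1 %s\n' "$new_host" "$dir" "$disk"
  printf 'ssh %s sudo chmod --reference=%s/disk.1 %s\n' "$new_host" "$dir" "$disk"
  # 3. Compare checksums on both hosts before booting (slow for 18 GB).
  printf 'ssh %s sha256sum %s\n' "$old_host" "$disk"
  printf 'ssh %s sha256sum %s\n' "$new_host" "$disk"
  # 4. On the front-end, retry the failed boot.
  printf 'onevm recover %s --retry\n' "$vm_id"
}
```

For this thread that would be `print_restore_plan OLD_HOST host1 237 101`, with host1, VM 237 and datastore 101 taken from the log. Printing instead of executing keeps the sketch harmless and leaves the choice of how to reach root on the host to the operator.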

Related topics

Topic	Category	Replies	Views	Activity
No bootable device after failed resize/undeploy	Product Support	0	354	December 10, 2021
Error when restarting VM	Product Support	3	663	October 25, 2022
Error during undeploy	Product Support	21	2270	March 4, 2019
How to restore VM	Product Support	9	4329	May 16, 2017
How to recover from FAILED state?	Product Support	3	5860	October 24, 2015