No bootable device after failed resize/undeploy

Hi,
I inherited an OpenNebula setup with three servers. One is both the frontend and a VM host; the other two are VM hosts only. We are also short on disk space, which we plan to fix by adding more disks later, but that is not possible right now.

A user tried to change the CPU count of a VM and undeployed it to do so. In his words: “it apparently was stuck in undeploy so I pressed recover”. Afterwards he was able to start the VM again, but it would no longer boot; the VNC console showed “no bootable device”. He then undeployed the machine again, and this time it got stuck in EPILOG_UNDEPLOY_FAILURE.

I read some guides and the log. The log says that the transfer from vmhost2 to the frontend could not complete, and that datastore 0 apparently ran out of space. I therefore deleted an unused VM to free some space. After that, onevm recover ID --retry restarted the undeployment, and it completed. However, when I booted the machine, I still got the “no bootable device” error.
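
A minimal sketch (Python, not part of OpenNebula) of how the free space and the disk files of the VM directory could be inspected on each host. The function name datastore_report is hypothetical; the directory /var/lib/one/datastores/0/256 is taken from the log below. It assumes a Linux host, where st_blocks is counted in 512-byte units, so a sparse disk image shows fewer allocated bytes than its apparent size.

```python
import os
import shutil


def datastore_report(vm_dir):
    """Return free space of the filesystem holding vm_dir plus per-file sizes.

    Hypothetical helper for illustration only; it reads metadata and
    changes nothing on disk.
    """
    usage = shutil.disk_usage(vm_dir)
    files = []
    for name in sorted(os.listdir(vm_dir)):
        # lstat so that a dangling symlink does not raise an error
        st = os.lstat(os.path.join(vm_dir, name))
        files.append({
            "name": name,
            "apparent_bytes": st.st_size,
            # On Linux, st_blocks is in 512-byte units
            "allocated_bytes": st.st_blocks * 512,
        })
    return {
        "free_bytes": usage.free,
        "total_bytes": usage.total,
        "files": files,
    }
```

Calling datastore_report("/var/lib/one/datastores/0/256") on the frontend and on vmhost2 would show whether disk.0 exists on each side and how large it is. A missing or unexpectedly small disk.0 would be consistent with an interrupted copy, though this alone does not prove it.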

Here is my log (with some domain names stripped):

Fri Dec 10 15:36:45 2021 [Z0][TM][W]: Ignored: LOG I 256 Command execution failed (exit code: 1): /var/lib/one/remotes/tm/ssh/mv vmhost2:/var/lib/one//datastores/0/256 frontend+vm-host:/var/lib/one//datastores/0/256 256 0
Fri Dec 10 15:36:45 2021 [Z0][TM][W]: Ignored: LOG I 256 mv: Moving vmhost2:/var/lib/one/datastores/0/256 to frontend+vm-host:/var/lib/one/datastores/0/256
Fri Dec 10 15:36:45 2021 [Z0][TM][W]: Ignored: LOG E 256 mv: Command "set -e -o pipefail
Fri Dec 10 15:36:45 2021 [Z0][TM][W]: Ignored: LOG I 256
Fri Dec 10 15:36:45 2021 [Z0][TM][W]: Ignored: LOG I 256 tar -C /var/lib/one/datastores/0 --sparse -cf - 256 | ssh frontend+vm-host 'tar -C /var/lib/one/datastores/0 --sparse -xf -'
Fri Dec 10 15:36:45 2021 [Z0][TM][W]: Ignored: LOG I 256 rm -rf /var/lib/one/datastores/0/256" failed: tar: 256/disk.0: file changed as we read it
Fri Dec 10 15:36:45 2021 [Z0][TM][W]: Ignored: LOG I 256 tar: 256/disk.1: File removed before we read it
Fri Dec 10 15:36:45 2021 [Z0][TM][W]: Ignored: LOG I 256 tar: 256: file changed as we read it
Fri Dec 10 15:36:45 2021 [Z0][TM][W]: Ignored: LOG E 256 Error copying disk directory to target host
Fri Dec 10 15:36:45 2021 [Z0][TM][W]: Ignored: TRANSFER FAILURE 256 Error copying disk directory to target host

Fri Dec 10 15:37:35 2021 [Z0][VM][I]: New state is ACTIVE
Fri Dec 10 15:37:35 2021 [Z0][VM][I]: New LCM state is PROLOG_UNDEPLOY
Fri Dec 10 15:42:20 2021 [Z0][VM][I]: New LCM state is BOOT_UNDEPLOY
Fri Dec 10 15:42:20 2021 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/256/deployment.2
Fri Dec 10 15:42:21 2021 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri Dec 10 15:42:21 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 15:42:22 2021 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri Dec 10 15:42:23 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 15:42:23 2021 [Z0][VMM][I]: Successfully execute virtualization driver operation: deploy.
Fri Dec 10 15:42:23 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 15:42:23 2021 [Z0][VMM][I]: Successfully execute network driver operation: post.
Fri Dec 10 15:42:23 2021 [Z0][VM][I]: New LCM state is RUNNING
Fri Dec 10 15:43:13 2021 [Z0][VMM][I]: VM successfully rebooted.
Fri Dec 10 15:43:18 2021 [Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
Fri Dec 10 15:43:19 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 15:43:19 2021 [Z0][VMM][I]: Successfully execute virtualization driver operation: cancel.
Fri Dec 10 15:43:20 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 15:43:20 2021 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Fri Dec 10 15:43:20 2021 [Z0][VM][I]: New state is POWEROFF
Fri Dec 10 15:43:20 2021 [Z0][VM][I]: New LCM state is LCM_INIT
Fri Dec 10 15:44:09 2021 [Z0][VM][I]: New state is ACTIVE
Fri Dec 10 15:44:09 2021 [Z0][VM][I]: New LCM state is BOOT_POWEROFF
Fri Dec 10 15:44:10 2021 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/256/deployment.3
Fri Dec 10 15:44:10 2021 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri Dec 10 15:44:11 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 15:44:11 2021 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri Dec 10 15:44:12 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 15:44:12 2021 [Z0][VMM][I]: Successfully execute virtualization driver operation: deploy.
Fri Dec 10 15:44:12 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 15:44:12 2021 [Z0][VMM][I]: Successfully execute network driver operation: post.
Fri Dec 10 15:44:12 2021 [Z0][VM][I]: New LCM state is RUNNING
Fri Dec 10 15:44:53 2021 [Z0][VM][I]: New LCM state is SHUTDOWN_UNDEPLOY
Fri Dec 10 15:44:54 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 15:44:54 2021 [Z0][VMM][I]: Successfully execute virtualization driver operation: cancel.
Fri Dec 10 15:44:54 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 15:44:54 2021 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Fri Dec 10 15:44:54 2021 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY
Fri Dec 10 15:46:20 2021 [Z0][TM][I]: Command execution failed (exit code: 1): /var/lib/one/remotes/tm/ssh/mv vmhost2:/var/lib/one//datastores/0/256 frontend+vm-host:/var/lib/one//datastores/0/256 256 0
Fri Dec 10 15:46:20 2021 [Z0][TM][I]: mv: Moving vmhost2:/var/lib/one/datastores/0/256 to frontend+vm-host:/var/lib/one/datastores/0/256
Fri Dec 10 15:46:20 2021 [Z0][TM][E]: mv: Command "set -e -o pipefail
Fri Dec 10 15:46:20 2021 [Z0][TM][I]:
Fri Dec 10 15:46:20 2021 [Z0][TM][I]: tar -C /var/lib/one/datastores/0 --sparse -cf - 256 | ssh frontend+vm-host 'tar -C /var/lib/one/datastores/0 --sparse -xf -'
Fri Dec 10 15:46:20 2021 [Z0][TM][I]: rm -rf /var/lib/one/datastores/0/256" failed: tar: 256/disk.0: file changed as we read it
Fri Dec 10 15:46:20 2021 [Z0][TM][E]: Error copying disk directory to target host
Fri Dec 10 15:46:20 2021 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host
Fri Dec 10 15:46:20 2021 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY_FAILURE
Fri Dec 10 15:57:08 2021 [Z0][TM][I]: Command execution failed (exit code: 2): /var/lib/one/remotes/tm/ssh/mv frontend+vm-host:/var/lib/one//datastores/0/256 vmhost2:/var/lib/one//datastores/0/256 256 0
Fri Dec 10 15:57:08 2021 [Z0][TM][I]: mv: Moving frontend+vm-host:/var/lib/one/datastores/0/256 to vmhost2:/var/lib/one/datastores/0/256
Fri Dec 10 15:57:08 2021 [Z0][TM][E]: mv: Command "set -e -o pipefail
Fri Dec 10 15:57:08 2021 [Z0][TM][I]:
Fri Dec 10 15:57:08 2021 [Z0][TM][I]: tar -C /var/lib/one/datastores/0 --sparse -cf - 256 | ssh vmhost2 'tar -C /var/lib/one/datastores/0 --sparse -xf -'
Fri Dec 10 15:57:08 2021 [Z0][TM][I]: rm -rf /var/lib/one/datastores/0/256" failed: tar: 256/disk.0: Cannot write: No space left on device
Fri Dec 10 15:57:08 2021 [Z0][TM][I]: tar: 256/disk.0: file changed as we read it
Fri Dec 10 15:57:08 2021 [Z0][TM][I]: tar: 256/disk.1: File removed before we read it
Fri Dec 10 15:57:08 2021 [Z0][TM][I]: tar: 256: file changed as we read it
Fri Dec 10 15:57:08 2021 [Z0][TM][I]: tar: Exiting with failure status due to previous errors
Fri Dec 10 15:57:08 2021 [Z0][TM][E]: Error copying disk directory to target host
Fri Dec 10 15:57:09 2021 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host
Fri Dec 10 15:57:09 2021 [Z0][TM][E]: Wrong state in TM answer for VM 256
Fri Dec 10 16:01:31 2021 [Z0][TM][I]: Command execution failed (exit code: 2): /var/lib/one/remotes/tm/ssh/mv frontend+vm-host:/var/lib/one//datastores/0/256 vmhost2:/var/lib/one//datastores/0/256 256 0
Fri Dec 10 16:01:31 2021 [Z0][TM][I]: mv: Moving frontend+vm-host:/var/lib/one/datastores/0/256 to vmhost2:/var/lib/one/datastores/0/256
Fri Dec 10 16:01:31 2021 [Z0][TM][E]: mv: Command "set -e -o pipefail
Fri Dec 10 16:01:31 2021 [Z0][TM][I]:
Fri Dec 10 16:01:31 2021 [Z0][TM][I]: tar -C /var/lib/one/datastores/0 --sparse -cf - 256 | ssh vmhost2 'tar -C /var/lib/one/datastores/0 --sparse -xf -'
Fri Dec 10 16:01:31 2021 [Z0][TM][I]: rm -rf /var/lib/one/datastores/0/256" failed: tar: 256/disk.0: Cannot write: No space left on device
Fri Dec 10 16:01:31 2021 [Z0][TM][I]: tar: 256/disk.0: file changed as we read it
Fri Dec 10 16:01:31 2021 [Z0][TM][I]: tar: 256/disk.1: File removed before we read it
Fri Dec 10 16:01:31 2021 [Z0][TM][I]: tar: 256: file changed as we read it
Fri Dec 10 16:01:31 2021 [Z0][TM][I]: tar: Exiting with failure status due to previous errors
Fri Dec 10 16:01:31 2021 [Z0][TM][E]: Error copying disk directory to target host
Fri Dec 10 16:01:32 2021 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host
Fri Dec 10 16:01:32 2021 [Z0][TM][E]: Wrong state in TM answer for VM 256
Fri Dec 10 16:55:57 2021 [Z0][VM][I]: New LCM state is EPILOG_UNDEPLOY
Fri Dec 10 16:59:59 2021 [Z0][VM][I]: New state is UNDEPLOYED
Fri Dec 10 16:59:59 2021 [Z0][VM][I]: New LCM state is LCM_INIT
Fri Dec 10 17:00:55 2021 [Z0][VM][I]: New state is PENDING
Fri Dec 10 17:01:24 2021 [Z0][VM][I]: New state is ACTIVE
Fri Dec 10 17:01:24 2021 [Z0][VM][I]: New LCM state is PROLOG_UNDEPLOY
Fri Dec 10 17:05:26 2021 [Z0][VM][I]: New LCM state is BOOT_UNDEPLOY
Fri Dec 10 17:05:26 2021 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/256/deployment.4
Fri Dec 10 17:05:27 2021 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Fri Dec 10 17:05:27 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 17:05:27 2021 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Fri Dec 10 17:05:28 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 17:05:28 2021 [Z0][VMM][I]: Successfully execute virtualization driver operation: deploy.
Fri Dec 10 17:05:28 2021 [Z0][VMM][I]: ExitCode: 0
Fri Dec 10 17:05:28 2021 [Z0][VMM][I]: Successfully execute network driver operation: post.
Fri Dec 10 17:05:28 2021 [Z0][VM][I]: New LCM state is RUNNING
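
To make the failures in the log above easier to spot, here is a rough Python sketch that pulls out only the per-file tar errors. The function name tar_failures and both regular expressions are assumptions written against the line format shown in this log; they are not an OpenNebula tool and may need adjusting for other versions.

```python
import re

# Matches lines such as:
#   Fri Dec 10 15:57:08 2021 [Z0][TM][I]: tar: 256/disk.1: File removed before we read it
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ [\d:]{8} \d{4}) "
    r"\[Z\d+\]\[(?P<mod>\w+)\]\[(?P<lvl>\w)\]: ?(?P<msg>.*)$"
)
# A tar complaint about one path: "tar: <path>: <reason>"
TAR_RE = re.compile(r"tar: (?P<path>[^:]+): (?P<reason>.+)$")


def tar_failures(lines):
    """Return unique (timestamp, path, reason) tuples in log order."""
    seen = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        t = TAR_RE.search(m.group("msg"))
        if not t:
            # e.g. "tar: Exiting with failure status ..." names no path
            continue
        entry = (m.group("ts"), t.group("path"), t.group("reason"))
        if entry not in seen:
            seen.append(entry)
    return seen
```

Fed the lines of this log, it would report the "Cannot write: No space left on device" entries for 256/disk.0 at 15:57:08 and 16:01:31, the attempts where the copy towards vmhost2 ran out of space, alongside the "file changed as we read it" and "File removed before we read it" entries.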

We are running OpenNebula 5.6.1.
Do you have any ideas how to fix this? Is this even fixable?

Thanks in advance,
Trunyx