Error when restarting VM

Hello @dclavijo @ahuertas @pczerny

I have got the following problem, but it is better first you look at this screenshot, taken from “Placement” tab:

Then, and I don’t understand why, in line #1 appears another “undeploy”, but this time VM appears in host “localhost”… but nobody has replaced this VM (anyway…)

In line #3, a user with admin group privileges, executes a “poweroff-hard”. Then, it seems VM falls in “some” error state, reason why I need to execute two “resume” operations and one “migrate” operation.
However, VMs tries to start but always fails. Then, I check logs and I find these lines:

Driver Error
Wed Oct 19 10:53:05 2022: DEPLOY: error: Failed to create domain from /var/lib/one//datastores/0/11128/deployment.8 error: Cannot access storage file '/var/lib/one//datastores/0/11128/disk.0' (as uid:9869, gid:9869): No such file or directory Could not create domain from /var/lib/one//datastores/0/11128/deployment.8 ExitCode: 255
Dismiss
Wed Oct 19 10:52:34 2022 [Z0][VM][I]: New state is PENDING
Wed Oct 19 10:53:02 2022 [Z0][VM][I]: New state is ACTIVE
Wed Oct 19 10:53:02 2022 [Z0][VM][I]: New LCM state is PROLOG_UNDEPLOY
Wed Oct 19 10:53:03 2022 [Z0][VM][I]: New LCM state is BOOT_UNDEPLOY
Wed Oct 19 10:53:03 2022 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/11128/deployment.8
Wed Oct 19 10:53:03 2022 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: ExitCode: 0
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: ExitCode: 0
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/mkdir -p.
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: ExitCode: 0
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/0/11128/vm.xml.
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: ExitCode: 0
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/0/11128/ds.xml.
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: Command execution fail (exit code: 255): cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/0/11128/deployment.8' 'localhost' 11128 localhost
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/11128/deployment.8
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/0/11128/disk.0' (as uid:9869, gid:9869): No such file or directory
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: Could not create domain from /var/lib/one//datastores/0/11128/deployment.8
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: ExitCode: 255
Wed Oct 19 10:53:05 2022 [Z0][VMM][I]: ExitCode: 0
Wed Oct 19 10:53:05 2022 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Wed Oct 19 10:53:05 2022 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Wed Oct 19 10:53:05 2022 [Z0][VMM][E]: DEPLOY: error: Failed to create domain from /var/lib/one//datastores/0/11128/deployment.8 error: Cannot access storage file '/var/lib/one//datastores/0/11128/disk.0' (as uid:9869, gid:9869): No such file or directory Could not create domain from /var/lib/one//datastores/0/11128/deployment.8 ExitCode: 255
Wed Oct 19 10:53:05 2022 [Z0][VM][I]: New LCM state is BOOT_UNDEPLOY_FAILURE
Wed Oct 19 12:00:05 2022 [Z0][VM][I]: New LCM state is RUNNING
Wed Oct 19 12:01:55 2022 [Z0][LCM][I]: VM running but monitor state is POWEROFF
Wed Oct 19 12:01:55 2022 [Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
Wed Oct 19 12:01:55 2022 [Z0][VM][I]: New state is POWEROFF
Wed Oct 19 12:01:55 2022 [Z0][VM][I]: New LCM state is LCM_INIT

After comparing folder “11128” in /var/lib/one/datastores/0" in both servers, I find out that in nebulacaos2 (where VM was created in the first instantiation time) exists a “disk.0.snap” subfolder and a disk.0 to disk.0.snap soft-link, but in server “localhost”, folder “11128” not contains this subfolder and this soft-lnk, so VM can’t start.

User “oneadmin” has same UID and GID in both servers and, also, both belongs to the same groups.

Why has happened this?

Thanks.

Hi again,

Now, the same problem has occured with another VM. This VM was created in “nebulacaos2” and, I supose that, when user has restarted it again, scheduler policy “-CPU_USAGE” has determinated that VM needed to be reallocated to “localhost” server (my server that acts as head server and KVM too). Then, and I don’t know why, transfer between servers has been executed with “some problem” (???) and subfolder “disk.0.snap”, all its content and soft-link disk.0 to disk.0.snap folder hasn’t been created.

Here, I attach “Placement” of this VM:

and, also, I attach a screenshot where you will see differences between folder “11126” in host “localhost” and host “nebulacaos2”:

Also, I attach some lines from /var/log/one/11126.log file:

one/11126.log:Mon Oct 17 15:50:03 2022 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/11126/deployment.0
one/11126.log:Mon Oct 17 15:50:05 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/0/11126/vm.xml.
one/11126.log:Mon Oct 17 15:50:05 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/0/11126/ds.xml.
one/11126.log:Wed Oct 19 13:43:26 2022 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/11126/deployment.1
one/11126.log:Wed Oct 19 13:43:27 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/0/11126/vm.xml.
one/11126.log:Wed Oct 19 13:43:27 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/0/11126/ds.xml.
one/11126.log:Wed Oct 19 13:43:28 2022 [Z0][VMM][I]: Command execution fail (exit code: 255): cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/0/11126/deployment.1' 'localhost' 11126 localhost
one/11126.log:Wed Oct 19 13:43:28 2022 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/11126/deployment.1
one/11126.log:Wed Oct 19 13:43:28 2022 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/0/11126/disk.0' (as uid:9869, gid:9869): No such file or directory
one/11126.log:Wed Oct 19 13:43:28 2022 [Z0][VMM][I]: Could not create domain from /var/lib/one//datastores/0/11126/deployment.1
one/11126.log:Wed Oct 19 13:43:28 2022 [Z0][VMM][E]: DEPLOY: error: Failed to create domain from /var/lib/one//datastores/0/11126/deployment.1 error: Cannot access storage file '/var/lib/one//datastores/0/11126/disk.0' (as uid:9869, gid:
9869): No such file or directory Could not create domain from /var/lib/one//datastores/0/11126/deployment.1 ExitCode: 255

Thank you so very much.

Hi,

this problem is happening every day when a machine needs to be migrated from one server to the other server…

Help, please.

Hi, @pczerny @ahuertas @cgonzalez

error log is this:

Tue Oct 25 14:42:59 2022 [Z0][VM][I]: New LCM state is RUNNING
Tue Oct 25 14:44:01 2022 [Z0][VM][I]: New LCM state is SAVE_MIGRATE
Tue Oct 25 14:44:06 2022 [Z0][VMM][I]: ExitCode: 0
Tue Oct 25 14:44:06 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: save.
Tue Oct 25 14:44:08 2022 [Z0][VMM][I]: ExitCode: 0
Tue Oct 25 14:44:08 2022 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Tue Oct 25 14:44:08 2022 [Z0][VM][I]: New LCM state is PROLOG_MIGRATE
Tue Oct 25 14:44:09 2022 [Z0][VM][I]: New LCM state is BOOT_MIGRATE
Tue Oct 25 14:44:10 2022 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: ExitCode: 0
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: Command execution fail (exit code: 1): cat << EOT | /var/tmp/one/vmm/kvm/restore ‘/var/lib/one//datastores/0/11346/checkpoint’ ‘nebulacaos2’ ‘3f187475-0df9-4b5f-a54f-36532f1207a0’ 11346 nebulacaos2
Tue Oct 25 14:44:11 2022 [Z0][VMM][E]: restore: Command “set -e -o pipefail
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]:
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: # extract the xml from the checkpoint
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]:
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: virsh --connect qemu:///system save-image-dumpxml /var/lib/one//datastores/0/11346/checkpoint > /var/lib/one//datastores/0/11346/checkpoint.xml
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]:
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: # Eeplace all occurrences of the DS_LOCATION/<DS_ID>/<VM_ID> with the specific
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: # DS_ID where the checkpoint is placed. This is done in case there was a
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: # system DS migration
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]:
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: sed -i “s%/var/lib/one//datastores/[0-9]+/11346/%/var/lib/one//datastores/0/11346/%g” /var/lib/one//datastores/0/11346/checkpoint.xml
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: sed -i “s%/var/lib/one/datastores/[0-9]+/11346/%/var/lib/one//datastores/0/11346/%g” /var/lib/one//datastores/0/11346/checkpoint.xml” failed: error: Failed to open file ‘/var/lib/one//datastores/0/11346/checkpoint’: No such file or directory
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: Could not recalculate paths in /var/lib/one//datastores/0/11346/checkpoint.xml
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: ExitCode: 1
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: Failed to execute virtualization driver operation: restore.
Tue Oct 25 14:44:11 2022 [Z0][VMM][E]: RESTORE: ERROR: restore: Command “set -e -o pipefail # extract the xml from the checkpoint virsh --connect qemu:///system save-image-dumpxml /var/lib/one//datastores/0/11346/checkpoint > /var/lib/one//datastores/0/11346/checkpoint.xml # Eeplace all occurrences of the DS_LOCATION/<DS_ID>/<VM_ID> with the specific # DS_ID where the checkpoint is placed. This is done in case there was a # system DS migration sed -i “s%/var/lib/one//datastores/[0-9]+/11346/%/var/lib/one//datastores/0/11346/%g” /var/lib/one//datastores/0/11346/checkpoint.xml sed -i “s%/var/lib/one/datastores/[0-9]+/11346/%/var/lib/one//datastores/0/11346/%g” /var/lib/one//datastores/0/11346/checkpoint.xml” failed: error: Failed to open file ‘/var/lib/one//datastores/0/11346/checkpoint’: No such file or directory Could not recalculate paths in /var/lib/one//datastores/0/11346/checkpoint.xml ExitCode: 1
Tue Oct 25 14:44:11 2022 [Z0][VM][I]: New LCM state is BOOT_MIGRATE_FAILURE

How can I debug “restore” script? Maybe have I wrote a “typo” in some other file?

I don’t understand anything…

Thanks.