Error when restarting VM

Daniel_Ruiz_Molina · October 19, 2022, 10:54am

I have got the following problem, but it is better first you look at this screenshot, taken from “Placement” tab:

Then, and I don’t understand why, in line #1 appears another “undeploy”, but this time VM appears in host “localhost”… but nobody has replaced this VM (anyway…)

In line #3, a user with admin group privileges, executes a “poweroff-hard”. Then, it seems VM falls in “some” error state, reason why I need to execute two “resume” operations and one “migrate” operation.
However, VMs tries to start but always fails. Then, I check logs and I find these lines:

Driver Error
Wed Oct 19 10:53:05 2022: DEPLOY: error: Failed to create domain from /var/lib/one//datastores/0/11128/deployment.8 error: Cannot access storage file '/var/lib/one//datastores/0/11128/disk.0' (as uid:9869, gid:9869): No such file or directory Could not create domain from /var/lib/one//datastores/0/11128/deployment.8 ExitCode: 255
Dismiss

Wed Oct 19 10:52:34 2022 [Z0][VM][I]: New state is PENDING
Wed Oct 19 10:53:02 2022 [Z0][VM][I]: New state is ACTIVE
Wed Oct 19 10:53:02 2022 [Z0][VM][I]: New LCM state is PROLOG_UNDEPLOY
Wed Oct 19 10:53:03 2022 [Z0][VM][I]: New LCM state is BOOT_UNDEPLOY
Wed Oct 19 10:53:03 2022 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/11128/deployment.8
Wed Oct 19 10:53:03 2022 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: ExitCode: 0
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: ExitCode: 0
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/mkdir -p.
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: ExitCode: 0
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/0/11128/vm.xml.
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: ExitCode: 0
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/0/11128/ds.xml.
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: Command execution fail (exit code: 255): cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/0/11128/deployment.8' 'localhost' 11128 localhost
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/11128/deployment.8
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/0/11128/disk.0' (as uid:9869, gid:9869): No such file or directory
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: Could not create domain from /var/lib/one//datastores/0/11128/deployment.8
Wed Oct 19 10:53:04 2022 [Z0][VMM][I]: ExitCode: 255
Wed Oct 19 10:53:05 2022 [Z0][VMM][I]: ExitCode: 0
Wed Oct 19 10:53:05 2022 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Wed Oct 19 10:53:05 2022 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Wed Oct 19 10:53:05 2022 [Z0][VMM][E]: DEPLOY: error: Failed to create domain from /var/lib/one//datastores/0/11128/deployment.8 error: Cannot access storage file '/var/lib/one//datastores/0/11128/disk.0' (as uid:9869, gid:9869): No such file or directory Could not create domain from /var/lib/one//datastores/0/11128/deployment.8 ExitCode: 255
Wed Oct 19 10:53:05 2022 [Z0][VM][I]: New LCM state is BOOT_UNDEPLOY_FAILURE
Wed Oct 19 12:00:05 2022 [Z0][VM][I]: New LCM state is RUNNING
Wed Oct 19 12:01:55 2022 [Z0][LCM][I]: VM running but monitor state is POWEROFF
Wed Oct 19 12:01:55 2022 [Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
Wed Oct 19 12:01:55 2022 [Z0][VM][I]: New state is POWEROFF
Wed Oct 19 12:01:55 2022 [Z0][VM][I]: New LCM state is LCM_INIT

After comparing folder “11128” in /var/lib/one/datastores/0" in both servers, I find out that in nebulacaos2 (where VM was created in the first instantiation time) exists a “disk.0.snap” subfolder and a disk.0 to disk.0.snap soft-link, but in server “localhost”, folder “11128” not contains this subfolder and this soft-lnk, so VM can’t start.

User “oneadmin” has same UID and GID in both servers and, also, both belongs to the same groups.

Why has happened this?

Thanks.

Daniel_Ruiz_Molina · October 19, 2022, 12:50pm

Hi again,

Now, the same problem has occured with another VM. This VM was created in “nebulacaos2” and, I supose that, when user has restarted it again, scheduler policy “-CPU_USAGE” has determinated that VM needed to be reallocated to “localhost” server (my server that acts as head server and KVM too). Then, and I don’t know why, transfer between servers has been executed with “some problem” (???) and subfolder “disk.0.snap”, all its content and soft-link disk.0 to disk.0.snap folder hasn’t been created.

Here, I attach “Placement” of this VM:

and, also, I attach a screenshot where you will see differences between folder “11126” in host “localhost” and host “nebulacaos2”:

Also, I attach some lines from /var/log/one/11126.log file:

one/11126.log:Mon Oct 17 15:50:03 2022 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/11126/deployment.0
one/11126.log:Mon Oct 17 15:50:05 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/0/11126/vm.xml.
one/11126.log:Mon Oct 17 15:50:05 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/0/11126/ds.xml.
one/11126.log:Wed Oct 19 13:43:26 2022 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/11126/deployment.1
one/11126.log:Wed Oct 19 13:43:27 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/0/11126/vm.xml.
one/11126.log:Wed Oct 19 13:43:27 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/0/11126/ds.xml.
one/11126.log:Wed Oct 19 13:43:28 2022 [Z0][VMM][I]: Command execution fail (exit code: 255): cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/0/11126/deployment.1' 'localhost' 11126 localhost
one/11126.log:Wed Oct 19 13:43:28 2022 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/11126/deployment.1
one/11126.log:Wed Oct 19 13:43:28 2022 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/0/11126/disk.0' (as uid:9869, gid:9869): No such file or directory
one/11126.log:Wed Oct 19 13:43:28 2022 [Z0][VMM][I]: Could not create domain from /var/lib/one//datastores/0/11126/deployment.1
one/11126.log:Wed Oct 19 13:43:28 2022 [Z0][VMM][E]: DEPLOY: error: Failed to create domain from /var/lib/one//datastores/0/11126/deployment.1 error: Cannot access storage file '/var/lib/one//datastores/0/11126/disk.0' (as uid:9869, gid:
9869): No such file or directory Could not create domain from /var/lib/one//datastores/0/11126/deployment.1 ExitCode: 255

Thank you so very much.

Daniel_Ruiz_Molina · October 25, 2022, 5:45am

Hi,

this problem is happening every day when a machine needs to be migrated from one server to the other server…

Help, please.

Daniel_Ruiz_Molina · October 25, 2022, 12:58pm

Hi, @pczerny @ahuertas @cgonzalez

error log is this:

Tue Oct 25 14:42:59 2022 [Z0][VM][I]: New LCM state is RUNNING
Tue Oct 25 14:44:01 2022 [Z0][VM][I]: New LCM state is SAVE_MIGRATE
Tue Oct 25 14:44:06 2022 [Z0][VMM][I]: ExitCode: 0
Tue Oct 25 14:44:06 2022 [Z0][VMM][I]: Successfully execute virtualization driver operation: save.
Tue Oct 25 14:44:08 2022 [Z0][VMM][I]: ExitCode: 0
Tue Oct 25 14:44:08 2022 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Tue Oct 25 14:44:08 2022 [Z0][VM][I]: New LCM state is PROLOG_MIGRATE
Tue Oct 25 14:44:09 2022 [Z0][VM][I]: New LCM state is BOOT_MIGRATE
Tue Oct 25 14:44:10 2022 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: ExitCode: 0
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: Command execution fail (exit code: 1): cat << EOT | /var/tmp/one/vmm/kvm/restore ‘/var/lib/one//datastores/0/11346/checkpoint’ ‘nebulacaos2’ ‘3f187475-0df9-4b5f-a54f-36532f1207a0’ 11346 nebulacaos2
Tue Oct 25 14:44:11 2022 [Z0][VMM][E]: restore: Command “set -e -o pipefail
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]:
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: # extract the xml from the checkpoint
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]:
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: virsh --connect qemu:///system save-image-dumpxml /var/lib/one//datastores/0/11346/checkpoint > /var/lib/one//datastores/0/11346/checkpoint.xml
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]:
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: # Eeplace all occurrences of the DS_LOCATION/<DS_ID>/<VM_ID> with the specific
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: # DS_ID where the checkpoint is placed. This is done in case there was a
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: # system DS migration
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]:
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: sed -i “s%/var/lib/one//datastores/[0-9]+/11346/%/var/lib/one//datastores/0/11346/%g” /var/lib/one//datastores/0/11346/checkpoint.xml
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: sed -i “s%/var/lib/one/datastores/[0-9]+/11346/%/var/lib/one//datastores/0/11346/%g” /var/lib/one//datastores/0/11346/checkpoint.xml” failed: error: Failed to open file ‘/var/lib/one//datastores/0/11346/checkpoint’: No such file or directory
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: Could not recalculate paths in /var/lib/one//datastores/0/11346/checkpoint.xml
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: ExitCode: 1
Tue Oct 25 14:44:11 2022 [Z0][VMM][I]: Failed to execute virtualization driver operation: restore.
Tue Oct 25 14:44:11 2022 [Z0][VMM][E]: RESTORE: ERROR: restore: Command “set -e -o pipefail # extract the xml from the checkpoint virsh --connect qemu:///system save-image-dumpxml /var/lib/one//datastores/0/11346/checkpoint > /var/lib/one//datastores/0/11346/checkpoint.xml # Eeplace all occurrences of the DS_LOCATION/<DS_ID>/<VM_ID> with the specific # DS_ID where the checkpoint is placed. This is done in case there was a # system DS migration sed -i “s%/var/lib/one//datastores/[0-9]+/11346/%/var/lib/one//datastores/0/11346/%g” /var/lib/one//datastores/0/11346/checkpoint.xml sed -i “s%/var/lib/one/datastores/[0-9]+/11346/%/var/lib/one//datastores/0/11346/%g” /var/lib/one//datastores/0/11346/checkpoint.xml” failed: error: Failed to open file ‘/var/lib/one//datastores/0/11346/checkpoint’: No such file or directory Could not recalculate paths in /var/lib/one//datastores/0/11346/checkpoint.xml ExitCode: 1
Tue Oct 25 14:44:11 2022 [Z0][VM][I]: New LCM state is BOOT_MIGRATE_FAILURE

How can I debug “restore” script? Maybe have I wrote a “typo” in some other file?

I don’t understand anything…

Thanks.

Topic		Replies	Views
Fail to launch VM Installation & Configuration	23	2211	January 3, 2020
Virtual machine suspended not boot again Product Support	18	4231	April 9, 2015
Deployment of the VM on CEPH problem Product Support	4	1891	March 9, 2016
One 5.12 - Help! kvm deploys in wrong datastore? Product Support	3	233	May 11, 2023
Onevm migrate and cloned images Product Support	14	2332	June 3, 2015

Error when restarting VM

Related topics