VM Migration Fails After Host Node Down

We are running a high-availability test in which we simulated the failure of one host in the cluster. After bringing the node down, we attempted to migrate a VM that had been running on the affected node, but the migration failed.

Cold and live migration both work correctly when all hosts are up, but neither works in the node-failure scenario.

Could you please suggest how to resolve this?

Please find the logs below.

Mon Jun 3 10:55:06 2024 [Z0][VM][I]: New state is ACTIVE
Mon Jun 3 10:55:06 2024 [Z0][VM][I]: New LCM state is BOOT_POWEROFF
Mon Jun 3 10:55:06 2024 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/107/deployment.8
Mon Jun 3 10:55:08 2024 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_context.
Mon Jun 3 10:55:08 2024 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Mon Jun 3 10:55:08 2024 [Z0][VMM][I]: ExitCode: 0
Mon Jun 3 10:55:08 2024 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/mkdir -p.
Mon Jun 3 10:55:08 2024 [Z0][VMM][I]: ExitCode: 0
Mon Jun 3 10:55:08 2024 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/117/107/vm.xml.
Mon Jun 3 10:55:08 2024 [Z0][VMM][I]: ExitCode: 0
Mon Jun 3 10:55:08 2024 [Z0][VMM][I]: Successfully execute virtualization driver operation: /bin/cat - >/var/lib/one//datastores/117/107/ds.xml.
Mon Jun 3 10:55:09 2024 [Z0][VMM][I]: Command execution fail (exit code: 255): cat << 'EOT' | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/117/107/deployment.8' 'kvm1.apps.ae' 107 kvm1.apps.ae
Mon Jun 3 10:55:09 2024 [Z0][VMM][I]: XPath set is empty
Mon Jun 3 10:55:09 2024 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/117/107/deployment.8
Mon Jun 3 10:55:09 2024 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/117/107/disk.0' (as uid:9869, gid:9869): No such file or directory
Mon Jun 3 10:55:09 2024 [Z0][VMM][I]: Could not create domain from /var/lib/one//datastores/117/107/deployment.8
Mon Jun 3 10:55:09 2024 [Z0][VMM][I]: ExitCode: 255
Mon Jun 3 10:55:09 2024 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Mon Jun 3 10:55:09 2024 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
Mon Jun 3 10:55:09 2024 [Z0][VMM][E]: DEPLOY: XPath set is empty error: Failed to create domain from /var/lib/one//datastores/117/107/deployment.8 error: Cannot access storage file '/var/lib/one//datastores/117/107/disk.0' (as uid:9869, gid:9869): No such file or directory Could not create domain from /var/lib/one//datastores/117/107/deployment.8 ExitCode: 255
Mon Jun 3 10:55:09 2024 [Z0][VM][I]: New state is POWEROFF
Mon Jun 3 10:55:09 2024 [Z0][VM][I]: New LCM state is LCM_INIT

Hi @bwinfra :wave:

Welcome to the OpenNebula Forum! :smiley:

Thanks for sharing the VM logs. Could you tell us which datastore driver (SSH, LVM, Ceph…) your datastores use?
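
If it helps to locate the datastore in question, here is a minimal sketch that pulls the IDs out of the failing path in the log. It assumes the usual `/var/lib/one/datastores/<datastore_id>/<vm_id>/disk.<n>` layout; the helper name `parse_disk_path` is hypothetical and used only for illustration. It just prints the `onedatastore show` command to run on the front-end, whose output should include the `TM_MAD` attribute.

```python
import re

# Error line copied from the VM log (quotes normalized to ASCII).
LOG_LINE = (
    "error: Cannot access storage file "
    "'/var/lib/one//datastores/117/107/disk.0' "
    "(as uid:9869, gid:9869): No such file or directory"
)

# Assumption: disk paths follow .../datastores/<datastore_id>/<vm_id>/disk.<n>
PATH_RE = re.compile(r"/datastores/(\d+)/(\d+)/disk\.(\d+)")


def parse_disk_path(line):
    """Hypothetical helper: return the IDs found in a disk path, or None."""
    match = PATH_RE.search(line)
    if match is None:
        return None
    datastore_id, vm_id, disk_id = (int(group) for group in match.groups())
    return {"datastore_id": datastore_id, "vm_id": vm_id, "disk_id": disk_id}


info = parse_disk_path(LOG_LINE)
if info is not None:
    # Print the command to run manually; nothing is executed here.
    print("onedatastore show {}".format(info["datastore_id"]))
```

For the log above, this points at datastore 117 and VM 107, so the missing file is disk 0 of that VM on the target host.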

Best,
Victor.

Hi,

I am using LINSTOR, from LINBIT.