Question about VM migration

Hello,

A few days ago I attended the “Introduction to OpenNebula Operations” webinar. Although I have been managing an OpenNebula environment since 2018, I think any information I can gather from webinars, forums and so on is always useful and necessary.

The webinar covered “VM live migration” between nodes. However, in my environment, when I try to run a “migrate”, OpenNebula returns an error.
This is my configuration:

  1. One server that acts as the controller (scheduler, Sunstone, etc.) and also as a KVM hypervisor.
  2. One node that acts only as a KVM hypervisor.

The KVM node mounts datastores 1 (images, default) and 2 (files) via NFS and keeps datastore 0 (VMs, system) on local storage. Datastores 0 and 1 both have TM_MAD set to “qcow2”. I configured it this way because I read that qcow2 offers features/capabilities that make running/deploying faster than “ssh”…
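For reference, this is roughly how the drivers can be checked from the front-end (output trimmed to the relevant line; exact fields depend on the OpenNebula version):

    $ onedatastore show 0 | grep TM_MAD
    TM_MAD="qcow2"
    $ onedatastore show 1 | grep TM_MAD
    TM_MAD="qcow2"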

When I instantiate one VM on each KVM node and then try to run a “migrate” to swap nodes, I get this error in “oned.log”:

Mon Feb 24 08:37:56 2025 [Z0][VMM][D]: Message received: SAVE SUCCESS 56 -

Mon Feb 24 08:37:56 2025 [Z0][TrM][D]: Message received: TRANSFER SUCCESS 56 -

Mon Feb 24 08:37:58 2025 [Z0][VMM][D]: Message received: RESTORE FAILURE 56 error: failed to get domain 'one-56' ERROR: restore: Command "set -e -o pipefail  # extract the xml from the checkpoint  virsh --connect qemu:///system save-image-dumpxml /var/lib/one//datastores/0/56/checkpoint > /var/lib/one//datastores/0/56/checkpoint.xml  # Eeplace all occurrences of the DS_LOCATION/<DS_ID>/<VM_ID> with the specific # DS_ID where the checkpoint is placed. This is done in case there was a # system DS migration  sed -i "s%/var/lib/one//datastores/[0-9]\+/56/%/var/lib/one//datastores/0/56/%g" /var/lib/one//datastores/0/56/checkpoint.xml sed -i "s%/var/lib/one/datastores/[0-9]\+/56/%/var/lib/one//datastores/0/56/%g" /var/lib/one//datastores/0/56/checkpoint.xml" failed: error: Failed to open file '/var/lib/one/datastores/0/56/checkpoint': No such file or directory Could not recalculate paths in /var/lib/one//datastores/0/56/checkpoint.xml ExitCode: 1

and this other error in “$vm_id.log”:

Mon Feb 24 08:37:57 2025 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]: Command execution fail (exit code: 1): cat << 'EOT' | /var/tmp/one/vmm/kvm/restore '/var/lib/one//datastores/0/56/checkpoint' 'nebulacaos-1-test' '708e6bc1-0bef-4dd6-9de1-64cb47cc284f' 56 nebulacaos-1-test
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]: error: failed to get domain 'one-56'
Mon Feb 24 08:37:58 2025 [Z0][VMM][E]: restore: Command "set -e -o pipefail
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]:
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]: # extract the xml from the checkpoint
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]:
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]: virsh --connect qemu:///system save-image-dumpxml /var/lib/one//datastores/0/56/checkpoint > /var/lib/one//datastores/0/56/checkpoint.xml
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]:
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]: # Eeplace all occurrences of the DS_LOCATION/<DS_ID>/<VM_ID> with the specific
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]: # DS_ID where the checkpoint is placed. This is done in case there was a
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]: # system DS migration
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]:
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]: sed -i "s%/var/lib/one//datastores/[0-9]\+/56/%/var/lib/one//datastores/0/56/%g" /var/lib/one//datastores/0/56/checkpoint.xml
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]: sed -i "s%/var/lib/one/datastores/[0-9]\+/56/%/var/lib/one//datastores/0/56/%g" /var/lib/one//datastores/0/56/checkpoint.xml" failed: error: Failed to open file '/var/lib/one/datastores/0/56/checkpoint': No such file or directory
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]: Could not recalculate paths in /var/lib/one//datastores/0/56/checkpoint.xml
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]: ExitCode: 1
Mon Feb 24 08:37:58 2025 [Z0][VMM][I]: Failed to execute virtualization driver operation: restore.
Mon Feb 24 08:37:58 2025 [Z0][VMM][E]: RESTORE: error: failed to get domain 'one-56' ERROR: restore: Command "set -e -o pipefail  # extract the xml from the checkpoint  virsh --connect qemu:///system save-image-dumpxml /var/lib/one//datastores/0/56/checkpoint > /var/lib/one//datastores/0/56/checkpoint.xml  # Eeplace all occurrences of the DS_LOCATION/<DS_ID>/<VM_ID> with the specific # DS_ID where the checkpoint is placed. This is done in case there was a # system DS migration  sed -i "s%/var/lib/one//datastores/[0-9]\+/56/%/var/lib/one//datastores/0/56/%g" /var/lib/one//datastores/0/56/checkpoint.xml sed -i "s%/var/lib/one/datastores/[0-9]\+/56/%/var/lib/one//datastores/0/56/%g" /var/lib/one//datastores/0/56/checkpoint.xml" failed: error: Failed to open file '/var/lib/one/datastores/0/56/checkpoint': No such file or directory Could not recalculate paths in /var/lib/one//datastores/0/56/checkpoint.xml ExitCode: 1
Mon Feb 24 08:37:58 2025 [Z0][VM][I]: New LCM state is BOOT_MIGRATE_FAILURE

Why? Where is the configuration problem, and how can I fix it so that “migrate” works with my configuration?

Thanks.

Well, IMO, the qcow2 TM_MAD is the successor of the shared TM_MAD. So for the SYSTEM datastore you should either mount it via NFS on the KVM-only host and keep qcow2, or switch it to ssh.

Hope this helps,

Best Regards,
Anton Todorov

Hi @atodorov_storpool,

Sorry, but I don’t understand what you are explaining. In my environment, I only see three datastores, the ones that belong locally to the controller (which is also a KVM hypervisor).

“default” and “files” are also shared through NFS with the KVM hypervisor node, but its local storage is not shown under “Datastores”.

So, when you say “you should use NFS for the SYSTEM datastore on the KVM-only host”, I don’t know how to configure datastore 0 as “NFS” on the KVM-only host, and I don’t understand either what you mean by “use the qcow2 or switch to ssh for the SYSTEM datastore”.

Thanks in advance.

Hi,
Let’s first recap what you have currently:

The KVM node mounts datastores 1 (images, default) and 2 (files) via NFS and keeps datastore 0 (VMs, system) on local storage
The Files datastore usually does not matter for VM operation unless you are using a kernel and initrd from it. That is an advanced use case, so I’ll set it aside for now.

So I assume that you have datastore 1 (the Image datastore) on the front-end node in /var/lib/one/datastores/1, exported via NFS and mounted on the KVM node as /var/lib/one/datastores/1.
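Something along these lines, I assume (the hostname, network and export options below are placeholders, not your actual values):

    # on the front-end: /etc/exports
    /var/lib/one/datastores/1  192.168.1.0/24(rw,sync,no_subtree_check,no_root_squash)

    # on the KVM node: /etc/fstab
    frontend:/var/lib/one/datastores/1  /var/lib/one/datastores/1  nfs  defaults  0  0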

Datastores 0 and 1 both have TM_MAD set to “qcow2”.

But you say you have configured datastore 0 (the system datastore) with TM_MAD qcow2, which expects a shared filesystem. You can check the driver definition in the oned.conf file, where the SHARED=YES attribute is clearly set.
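In a stock oned.conf the qcow2 driver stanza looks roughly like this (attribute values may vary slightly between versions):

    TM_MAD_CONF = [
        NAME          = "qcow2",
        LN_TARGET     = "NONE",
        CLONE_TARGET  = "SYSTEM",
        SHARED        = "YES",
        DRIVER        = "qcow2"
    ]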

You are not saying that datastore 0 is exported via NFS, so I assume it isn’t, which means you have a misconfiguration. I have already mentioned the possible solutions: export the system datastore via NFS and keep the qcow2 TM_MAD, or reconfigure the system datastore to use a driver that matches the current setup, which is the ssh TM_MAD.
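The first option is the same kind of NFS export/mount sketched above, applied to /var/lib/one/datastores/0 as well. The second option is just an update of the datastore template; a sketch, assuming it is run on the front-end as oneadmin (ds0.tmpl is simply a name I picked):

    $ cat > ds0.tmpl <<'EOF'
    TM_MAD = "ssh"
    EOF
    $ onedatastore update 0 ds0.tmpl --append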

I hope this clears the confusion.

Best Regards,
Anton Todorov

Hi @atodorov_storpool

Yes, this is my configuration

I have always thought that NFS is a shared filesystem, which is why I use “qcow2”. In the driver definitions in the oned.conf file I haven’t found anything about NFS.

But why is it a misconfiguration? Can’t I have “datastore 0” local on each KVM hypervisor node and “datastore 1” mounted via NFS? If I export “datastore 0” via NFS from the server, all VMs will be created on an NFS mountpoint, and what I want is to reduce network I/O. I assumed that writing VMs to a local datastore would be positive.
As for the second possible solution, if I reconfigure TM_MAD as “ssh”, would I get better or worse I/O performance compared with “qcow2”?

Thanks.

Hi,

Yes, NFS is a shared filesystem, and using it (with a shared TM_MAD) for the Image datastore is absolutely fine.

But for the System datastore, the TM_MADs with the SHARED=YES attribute assume that the datastore is available on every host, so on migration they do not create the VM home folder on the destination host. The non-shared TM_MAD, the “ssh” one, makes no such assumption in its logic.
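To make the difference concrete, here is a conceptual illustration only (this is not the actual driver code, just the idea behind the failing restore in your log):

    # shared TM_MAD (e.g. qcow2 with SHARED=YES): the destination host is expected
    # to already see the VM directory, so the restore goes straight for the file
    virsh --connect qemu:///system restore /var/lib/one/datastores/0/56/checkpoint

    # ssh TM_MAD: the VM directory is copied to the destination first, then restored
    scp -rp source-node:/var/lib/one/datastores/0/56 /var/lib/one/datastores/0/
    virsh --connect qemu:///system restore /var/lib/one/datastores/0/56/checkpoint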

You have the Image Datastore mounted on all KVM nodes via NFS, but the System Datastore is not mounted. The System Datastore’s TM_MAD must match the node’s actual setup. In your case, as there is no NFS mounted, you should use TM_MAD=ssh.
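A quick way to confirm this on the KVM-only node (standard tools, nothing OpenNebula-specific):

    $ df -hT /var/lib/one/datastores/0    # shows a local filesystem (ext4/xfs/...), not nfs
    $ df -hT /var/lib/one/datastores/1    # this one should show type nfs or nfs4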

If you are looking for such a hybrid setup, you should take a look at the mixed-mode setup of NFS/NAS and Local Storage. In this mode, the VM disk images are copied to the KVM node that is “local” to the VM. This has its pros and cons: a performance gain, but at the cost of data loss if the local KVM node’s disk fails. But it is your call…

I do not have much experience with file-backed storage, so I cannot give advice about performance.

Hope this helps,

Best Regards,
Anton Todorov