VM migration fails with TM_MAD=qcow2

Hello @pczerny @dclavijo @atodorov_storpool @cgonzalez,

I have configured a small OpenNebula environment with a configuration similar to OneStor. My main server exports datastores 1 (images) and 2 (files) over NFS, and each KVM or Firecracker node mounts these datastores and has its own local datastore 0 (system). With this configuration, TM_MAD must be qcow2 for datastores 0 and 1 (datastore 2 could be ssh).
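In case it helps, this is roughly how that layout can be checked from the front-end (an illustrative sketch, not output pasted from my system; the datastore IDs are the ones described above):

```
# List the datastores: 0 (system, local to each node), 1 (images, NFS), 2 (files, NFS)
onedatastore list

# The attributes that matter for migration, e.g. for the system datastore:
onedatastore show 0 | grep -E 'TM_MAD|SHARED'
#   TM_MAD="qcow2"
#   SHARED="YES"
```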

VMs deploy quickly, so even big VMs start in a few seconds. However, I can’t migrate (or live-migrate) any VM, because I get errors. This prevents me from distributing VMs across all my nodes (KVM and/or Firecracker) in situations where one KVM host is full and a VM is “forced” to be scheduled and placed on a specific host.

I have run a small test and taken some screenshots (rough CLI equivalents of the steps are sketched after the list):

  1. I instantiate 2 VMs (IDs 57 and 58):

Those VMs are running on different nodes: one on “nodo1” and the other on “servidor” (which also acts as a KVM node).

  2. Now, I migrate VM 58:

  3. However, the migration of VM 58 fails:

  4. This is the error message:

  5. Now, I live-migrate VM 57:

  6. However, the VM is not live-migrated. The VM log reports this message:

  7. This is the VM error message:

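For completeness, the CLI equivalents of the steps above would be roughly the following (I did everything through Sunstone, so the template ID is a placeholder and the target host is just my other node):

```
# Instantiate two VMs from the same template (they got IDs 57 and 58)
onetemplate instantiate <template_id> --multiple 2

# Cold migration of VM 58 to the other node
onevm migrate 58 nodo1

# Live migration of VM 57
onevm migrate --live 57 nodo1
```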
So my question is: am I doing something wrong, or does OpenNebula not allow migration with TM_MAD=qcow2?

Thanks!!

The above statement implies that the system datastore is not on a shared filesystem, right?

I think in your case you should try the mixed mode described in the docs. Try using TM_MAD=qcow2 instead of shared.
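If it helps, that change would look roughly like this on the front-end (a sketch assuming the system datastore has ID 0; please check the Datastores guide for the exact procedure):

```
# Check which transfer driver the system datastore currently uses
onedatastore show 0 | grep TM_MAD

# Edit the datastore template and set the driver
onedatastore update 0
#   TM_MAD="qcow2"
```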

Best,
Anton

Hi,

Yes, system datastore is local to each node, so it is NOT a shared filesystem.

Regarding the “mixed mode”: I’m already using TM_MAD=qcow2, not shared, in my “system” datastore, as you can see below:

and I can’t change SHARED=YES to SHARED=NO. The only way I could change that is by switching to TM_MAD=ssh.

This is the configuration of my “image” datastore:

Thanks.

Did you try setting the TM_MAD_SYSTEM=ssh attribute to use the alternate deployment method?
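For reference, a sketch of how that attribute could be set per VM (as far as I recall from the Datastores guide it is added to the VM template or to the image rather than to the datastore itself, but please double-check the docs; the template ID is a placeholder):

```
# Edit the VM template and add the attribute
onetemplate update <template_id>
#   TM_MAD_SYSTEM="ssh"
```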

Hi,

I have added “TM_MAD_SYSTEM=ssh” to all three of my datastores (system, default and files), but when I tried to migrate a VM the error appeared again:


**Driver Error**
    Mon Apr 24 13:49:30 2023: RESTORE: ERROR: restore: Command "set -e -o pipefail

    # extract the xml from the checkpoint
    virsh --connect qemu:///system save-image-dumpxml /var/lib/one//datastores/0/64/checkpoint > /var/lib/one//datastores/0/64/checkpoint.xml

    # Replace all occurrences of the DS_LOCATION/<DS_ID>/<VM_ID> with the specific
    # DS_ID where the checkpoint is placed. This is done in case there was a
    # system DS migration
    sed -i "s%/var/lib/one//datastores/[0-9]\+/64/%/var/lib/one//datastores/0/64/%g" /var/lib/one//datastores/0/64/checkpoint.xml
    sed -i "s%/var/lib/one/datastores/[0-9]\+/64/%/var/lib/one//datastores/0/64/%g" /var/lib/one//datastores/0/64/checkpoint.xml" failed:
    error: Failed to open the disk '/var/lib/one//datastores/0/64/checkpoint': No such file or directory
    Could not recalculate paths in /var/lib/one//datastores/0/64/checkpoint.xml
    ExitCode: 1
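The failure happens in the first virsh step, which makes me think the checkpoint file never reached datastore 0 on the host running the restore. A quick check I would run on both hosts (the path is taken from the log above, so VM ID 64 on datastore 0):

```
# Run on both the source and the destination host:
# is the checkpoint file the restore step expects actually present?
ls -l /var/lib/one/datastores/0/64/checkpoint
```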

This still prevents me from distributing VMs across all my nodes (KVM and/or Firecracker) when one KVM host is full and a VM has to be placed on a specific host.

Hi,

Any new ideas?

Thanks.