[Solved] OpenNebula 5.0.1 - Migrate Failed

Hi all,

I’ve just installed a fresh OpenNebula 5.0.1 on a three-node cluster.
I can launch VMs without any problem, but when I try to migrate one from one host to another, it fails.
The system datastore and the image datastore are both QCOW2 (so the hosts can use symlinks to the SAN LUN).

Below is the log:

Tue Jul 19 19:42:32 2016 [Z0][VM][I]: New LCM state is MIGRATE
Tue Jul 19 19:42:32 2016 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_premigrate.
Tue Jul 19 19:42:32 2016 [Z0][VMM][I]: ExitCode: 0
Tue Jul 19 19:42:32 2016 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Tue Jul 19 19:42:32 2016 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/migrate 'one-4' 'adnpvirt07' 'adnpvirt13' 4 adnpvirt13
Tue Jul 19 19:42:32 2016 [Z0][VMM][E]: migrate: Command "virsh --connect qemu:///system migrate --live one-4 qemu+ssh://adnpvirt07/system" failed: error: Cannot access storage file '/var/lib/one//datastores/0/4/disk.1' (as uid:9869, gid:9869): No such file or directory
Tue Jul 19 19:42:32 2016 [Z0][VMM][E]: Could not migrate one-4 to adnpvirt07
Tue Jul 19 19:42:32 2016 [Z0][VMM][I]: ExitCode: 1
Tue Jul 19 19:42:32 2016 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_failmigrate.
Tue Jul 19 19:42:32 2016 [Z0][VMM][I]: Failed to execute virtualization driver operation: migrate.
Tue Jul 19 19:42:32 2016 [Z0][VMM][E]: Error live migrating VM: Could not migrate one-4 to adnpvirt07
Tue Jul 19 19:42:32 2016 [Z0][VM][I]: New LCM state is RUNNING
Tue Jul 19 19:42:33 2016 [Z0][LCM][I]: Fail to live migrate VM. Assuming that the VM is still RUNNING (will poll VM).
Thanks for the help,
Yannick

Update 1: disk.1 is the context image.
Update 2: the migration shows as successful when the VM is in POWEROFF, but the VM then fails to boot with this log:
> Tue Jul 19 20:10:50 2016 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/deploy '/var/lib/one//datastores/0/4/deployment.13' 'adnpvirt07' 4 adnpvirt07
> Tue Jul 19 20:10:50 2016 [Z0][VMM][I]: error: Failed to create domain from /var/lib/one//datastores/0/4/deployment.13
> Tue Jul 19 20:10:50 2016 [Z0][VMM][I]: error: Cannot access storage file '/var/lib/one//datastores/0/4/disk.0' (as uid:9869, gid:9869): No such file or directory
> Tue Jul 19 20:10:50 2016 [Z0][VMM][E]: Could not create domain from /var/lib/one//datastores/0/4/deployment.13
> Tue Jul 19 20:10:50 2016 [Z0][VMM][I]: ExitCode: 255
> Tue Jul 19 20:10:50 2016 [Z0][VMM][I]: Failed to execute virtualization driver operation: deploy.
> Tue Jul 19 20:10:50 2016 [Z0][VMM][E]: Error deploying virtual machine: Could not create domain from /var/lib/one//datastores/0/4/deployment.13
> Tue Jul 19 20:10:50 2016 [Z0][VM][I]: New state is POWEROFF
> Tue Jul 19 20:10:50 2016 [Z0][VM][I]: New LCM state is LCM_INIT
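In both logs the error comes from libvirt on the destination host, which cannot see the VM directory. A quick way to confirm that (hostnames taken from the logs above) is to compare the directory on both hosts:

    # on the source host, the disk symlinks exist
    ls -l /var/lib/one/datastores/0/4/

    # on the destination host, the same path is missing or empty
    ssh adnpvirt07 ls -l /var/lib/one/datastores/0/4/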

Hi Yannick,

It looks like your system datastore is not on a shared file system.

Did you follow these instructions?

http://docs.opennebula.org/5.0/deployment/open_cloud_storage_setup/fs_ds.html#shared-qcow2-transfer-modes
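You can check which transfer driver (TM_MAD) each datastore currently uses with something like this (datastore ID 0 is the default system datastore; adjust if yours differs):

    # list datastores, including the TM_MAD column
    onedatastore list

    # inspect the system datastore template in detail
    onedatastore show 0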

Kind Regards,
Anton Todorov

Hi Anton,

Thanks for your answer.
In fact, I spent a lot of time understanding all the OpenNebula mechanisms, especially the storage.
I understood that the qcow2 driver makes it possible to symlink an image from the image datastore, but not that the system datastore must also be shared.
What is the best practice? GlusterFS in replication mode? Only on the KVM hosts, not on the OpenNebula server?

Thanks,
Yannick

Hi Yannick,

Try the system DS as shared and the image DS as qcow2. It should work…
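A minimal sketch of that combination, assuming the default datastore IDs (0 = system, 1 = images); the file names here are just placeholders:

    # switch the system datastore to the shared TM driver
    cat > system-ds.txt <<'EOF'
    TM_MAD = "shared"
    EOF
    onedatastore update 0 system-ds.txt --append

    # keep the image datastore on the qcow2 drivers
    cat > image-ds.txt <<'EOF'
    DS_MAD = "fs"
    TM_MAD = "qcow2"
    EOF
    onedatastore update 1 image-ds.txt --append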

As Anton suggested, I have set up GlusterFS replication of the system datastore across all the KVM hosts (keeping the qcow2 driver). It is working perfectly.
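For reference, a rough sketch of that kind of setup; the volume name, brick paths, and third hostname are placeholders (only adnpvirt07 and adnpvirt13 appear in this thread):

    # on one host, after peering the nodes: create a 3-way replicated volume
    gluster peer probe adnpvirt13
    gluster peer probe adnpvirt14
    gluster volume create one-system replica 3 \
        adnpvirt07:/bricks/one adnpvirt13:/bricks/one adnpvirt14:/bricks/one
    gluster volume start one-system

    # on every KVM host: mount the volume over the system datastore path
    mount -t glusterfs adnpvirt07:/one-system /var/lib/one/datastores/0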