Live Migration fails on OpenNebula 4.14.2

Hi, I have upgraded from OpenNebula 4.12 to 4.14. Everything loads fine, but live migration fails. We have Ceph storage with KVM on Ubuntu 14.04. It shows the following error; it seems that the datastore cannot be found. Could you please help me troubleshoot this error?

Jan 26 16:22:51 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][VM][I]: New LCM state is MIGRATE
Jan 26 16:22:53 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_premigrate.
Jan 26 16:22:55 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][VMM][I]: ExitCode: 0
Jan 26 16:22:55 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][VMM][I]: Successfully execute network driver operation: pre.
Jan 26 16:22:57 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][VMM][I]: Command execution fail: /var/tmp/one/vmm/kvm/migrate 'one-494' 'lpr-01-hvm08.cloud.internal' 'lpr-01-hvm07.cloud.internal' 494 lpr-01-hvm07.cloud.internal
Jan 26 16:22:57 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][VMM][E]: migrate: Command "virsh --connect qemu:///system migrate --live one-494 qemu+ssh://lpr-01-hvm08.cloud.internal/system" failed: error: Failed to open file '/var/lib/one//datastores/101/494/disk.2': No such file or directory
Jan 26 16:22:57 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][VMM][E]: Could not migrate one-494 to lpr-01-hvm08.cloud.internal
Jan 26 16:22:57 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][VMM][I]: ExitCode: 1
Jan 26 16:22:57 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_failmigrate.
Jan 26 16:22:57 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][VMM][I]: Failed to execute virtualization driver operation: migrate.
Jan 26 16:22:57 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][VMM][E]: Error live migrating VM: Could not migrate one-494 to lpr-01-hvm08.cloud.internal
Jan 26 16:22:57 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][VM][I]: New LCM state is RUNNING
Jan 26 16:22:57 lpr-vm-nebu01 oned[18460]: [VM 494][Z0][LCM][I]: Fail to live migrate VM. Assuming that the VM is still RUNNING (will poll VM).

Probably disk.2 is a volatile one; in that case you need to share the
system DS using CephFS (or another shared FS). The next release will also
support Ceph as a system DS.
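
For example (just a rough sketch, not tested against your setup), sharing the system datastore via CephFS would mean mounting it on the system DS path on every KVM host. The monitor address, user and secret file below are placeholders; the datastore ID 101 is taken from your log:

  # on every hypervisor host, assuming a CephFS filesystem/MDS is already available
  mount -t ceph 192.168.1.10:6789:/ /var/lib/one/datastores/101 \
        -o name=admin,secretfile=/etc/ceph/admin.secret

If you go this route you would also switch the system datastore's TM_MAD from ssh to shared.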

Hi Ruben, thanks for your answer, but let me explain a bit more. We have the following scenario:

  • Storage: Ceph
  • System Datastore: File system with the ssh driver

Before the upgrade (OpenNebula 4.12): all functionality worked fine (deploy, redeploy, live migration, etc.).
After the upgrade (OpenNebula 4.14): live migration fails for both new and existing virtual machine instances, but we can still deploy new virtual machines. The problem is specific to live migration.

Depending on the storage requirements of the VM, it may or may not work with ssh + Ceph. In general it will fail on both 4.12 and 4.14.
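
To double-check which driver the system datastore is actually using, you can run (datastore ID taken from your log):

  onedatastore show 101

and look at the TM_MAD attribute, which should read "ssh" in your setup.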

Can you post the output of onevm show -x for a VM that fails live-migration? I’m only interested in the DISK and CONTEXT attributes.
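
Something like this should extract just those two sections (assuming VM ID 494 from your log and that xmllint from libxml2 is installed on the frontend):

  onevm show -x 494 | xmllint --xpath '//TEMPLATE/DISK | //TEMPLATE/CONTEXT' -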

Cheers

Sure, I have attached the output: onevm.txt (4.4 KB)

OK, yes, in your case the problem is the context disk (disk.2). This disk is not present on the target host, so this VM (or any other VM using contextualization) will fail to live-migrate. The solution is simply to scp the directory to the destination host, as suggested here:

http://docs.opennebula.org/4.14/administration/storage/ceph_ds.html#using-ssh-system-datastore
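
In practice that boils down to making sure the VM directory (including the context disk) exists on the destination host before the live migration is triggered. A rough sketch of a manual copy, using the hosts and paths from your log (run as oneadmin on the source host, adjust to your setup):

  ssh lpr-01-hvm08.cloud.internal mkdir -p /var/lib/one/datastores/101
  scp -rp /var/lib/one/datastores/101/494 \
      lpr-01-hvm08.cloud.internal:/var/lib/one/datastores/101/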

Note that this is not a 4.12 -> 4.14 issue; it may simply be that you weren't hit by this before…

Cheers