ONE 6.0.0.2 (CE): live migration fails

Hi everyone :slightly_smiling_face:,

at the moment I’m working on setting up a new OpenNebula instance for our company. Everything is working so far, with the exception of the live migration feature.

If I try to live migrate a VM to another node, the following error appears in the log files:

Thu Jul 22 15:25:27 2021 [Z0][ReM][D]: Req:1856 UID:2 IP:127.0.0.1 one.vm.migrate invoked , 17, 2, true, false, -1, 0
Thu Jul 22 15:25:27 2021 [Z0][DiM][D]: Live-migrating VM 17
Thu Jul 22 15:25:27 2021 [Z0][ReM][D]: Req:1856 UID:2 one.vm.migrate result SUCCESS, 17
Thu Jul 22 15:25:27 2021 [Z0][ReM][D]: Req:864 UID:2 IP:127.0.0.1 one.vm.info invoked , 17, false
Thu Jul 22 15:25:27 2021 [Z0][ReM][D]: Req:864 UID:2 one.vm.info result SUCCESS, "<VM><ID>17</ID><UID>..."
Thu Jul 22 15:25:28 2021 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_premigrate.
Thu Jul 22 15:25:29 2021 [Z0][VMM][I]: ExitCode: 0
Thu Jul 22 15:25:29 2021 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Thu Jul 22 15:25:31 2021 [Z0][InM][D]: VM_STATE update from host: 3. VM id: 10, state: RUNNING
Thu Jul 22 15:25:31 2021 [Z0][InM][D]: VM_STATE update from host: 3. VM id: 17, state: RUNNING
Thu Jul 22 15:25:31 2021 [Z0][ReM][D]: Req:2592 UID:0 IP:127.0.0.1 one.zone.raftstatus invoked 
Thu Jul 22 15:25:31 2021 [Z0][ReM][D]: Req:2592 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>-1<..."
Thu Jul 22 15:25:31 2021 [Z0][ReM][D]: Req:6368 UID:0 IP:127.0.0.1 one.vmpool.infoextended invoked , -2, -1, -1, -1
Thu Jul 22 15:25:31 2021 [Z0][ReM][D]: Req:6368 UID:0 one.vmpool.infoextended result SUCCESS, "<VM_POOL><VM><ID>17<..."
Thu Jul 22 15:25:31 2021 [Z0][ReM][D]: Req:0 UID:0 IP:127.0.0.1 one.vmpool.infoextended invoked , -2, -1, -1, -1
Thu Jul 22 15:25:31 2021 [Z0][ReM][D]: Req:0 UID:0 one.vmpool.infoextended result SUCCESS, "<VM_POOL><VM><ID>17<..."
Thu Jul 22 15:25:34 2021 [Z0][VMM][I]: Command execution fail: cat << EOT | /var/tmp/one/vmm/kvm/migrate '6b3c1edb-2611-485e-8032-2f784ae94f46' 'node03' 'node04' 17 node04
Thu Jul 22 15:25:34 2021 [Z0][VMM][I]: qemu-img: Could not open '/media/vm_store/opennebula/datastores/0/17/disk.0': Could not open '/media/vm_store/opennebula/datastores/0/17/disk.0': No such file or directory
Thu Jul 22 15:25:34 2021 [Z0][VMM][I]: qemu-img: Could not open '/media/vm_store/opennebula/datastores/0/17/disk.1': Could not open '/media/vm_store/opennebula/datastores/0/17/disk.1': No such file or directory
Thu Jul 22 15:25:34 2021 [Z0][VMM][I]: error: Cannot access storage file '/media/vm_store/opennebula/datastores/0/17/disk.1': No such file or directory
Thu Jul 22 15:25:34 2021 [Z0][VMM][E]: Could not migrate 6b3c1edb-2611-485e-8032-2f784ae94f46 to node03
Thu Jul 22 15:25:34 2021 [Z0][VMM][I]: ExitCode: 1
Thu Jul 22 15:25:34 2021 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_failmigrate.
Thu Jul 22 15:25:34 2021 [Z0][VMM][I]: Failed to execute virtualization driver operation: migrate.
Thu Jul 22 15:25:34 2021 [Z0][IPM][D]: Message received: MIGRATE FAILURE 17 Could not migrate 6b3c1edb-2611-485e-8032-2f784ae94f46 to node03

I’ve set up both the Image Datastore and the File Datastore as shared datastores. The System Datastore is in SSH mode.
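For context, the driver assignment can be inspected on the frontend like this (datastore ID 0 is the stock System Datastore; your IDs may differ):

# List all datastores; the TM column shows the transfer manager driver
onedatastore list

# Show the TM_MAD setting of the System Datastore
onedatastore show 0 | grep -i tm_mad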

So I have checked whether every node has access to the datastores, and whether passwordless SSH connections are possible for the oneadmin user, both from the frontend to every node and from any node to any other node. Both seem to be okay.
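In case it helps, the SSH checks looked roughly like this (node03 and node04 are the hosts from the log above; adjust the names to your setup):

# From the frontend, as oneadmin: must return the hostname without a password prompt
ssh oneadmin@node03 hostname
ssh oneadmin@node04 hostname

# Node-to-node, which live migration also relies on
ssh oneadmin@node03 'ssh oneadmin@node04 hostname'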

If I have a look at the mentioned directory /media/vm_store/opennebula/datastores/0/, there is indeed no directory 17. So I checked the file and directory permissions, but they seem to be fine too: the oneadmin user is set up as the owner of every directory. I also tested the write permissions by creating a small test file in datastore 0 as the oneadmin user, which worked without trouble.
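Concretely, the permission test was along these lines (run as oneadmin on the node; the path is the one from the error message):

# Ownership and permissions of the datastore directory
ls -ld /media/vm_store/opennebula/datastores/0

# Write test: create and remove a small test file
touch /media/vm_store/opennebula/datastores/0/testfile
rm /media/vm_store/opennebula/datastores/0/testfile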

The filesystem underneath the datastores is OCFS2.

The “normal” migration works perfectly fine; it’s only live migration that fails.

Now I’m a bit clueless as to where else I could look for the error. If somebody could help me, I would be very grateful.

Thanks for your help in advance :slightly_smiling_face:

It seems that the shared datastore is not mounted on the hypervisors? In that case the folder /media/vm_store/opennebula/datastores/0/ should be mounted on the hosts. Could you double-check?


Hi Rubén,

thank you for your help and the tip. :slightly_smiling_face:

I’ve double-checked whether the datastores are correctly mounted on every node, but couldn’t find any problems. The System, Image and File Datastores all live under the same path, /media/vm_store/opennebula/datastores. The OCFS2 filesystem is mounted correctly at boot time under /media/vm_store.
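The mount check on each node was essentially this (paths as in my setup):

# Confirm the OCFS2 volume is mounted where the datastores expect it
mount | grep /media/vm_store

# And that the datastore directories are actually reachable through it
df -h /media/vm_store/opennebula/datastores
ls /media/vm_store/opennebula/datastores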

But in the meantime I’ve found a solution on my own. I changed the type of the System Datastore from SSH to shared, so it now has the same type definition as the other two datastores. That did the trick: live migration works now. I had overlooked the note in the official OpenNebula docs that you should use the same TM_MAD for both the Image and the System Datastore. That was probably the problem, and at the same time my own fault.
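For anyone running into the same issue, the change itself boils down to editing the datastore template on the frontend (onedatastore update opens it in an editor; ID 0 is my System Datastore):

# Edit the System Datastore template
onedatastore update 0

# In the editor, change
#   TM_MAD = "ssh"
# to
#   TM_MAD = "shared"

# Verify afterwards
onedatastore show 0 | grep TM_MAD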