Live migration failure on Xen

Hi guys,

I am very new to OpenNebula. I have set up a private cloud on OpenNebula with the Xen hypervisor. I am trying to perform a live VM migration, but I am getting the error below. Can somebody please advise what I am doing wrong or what is missing here?

Mon Jun 27 13:37:19 2016 [Z0][LCM][I]: New VM state is MIGRATE
Mon Jun 27 13:37:19 2016 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_premigrate.
Mon Jun 27 13:37:19 2016 [Z0][VMM][I]: ExitCode: 0
Mon Jun 27 13:37:19 2016 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: Command execution fail: /var/tmp/one/vmm/xen4/migrate 'one-13' '192.168.100.4' '192.168.100.2' 13 192.168.100.2
Mon Jun 27 13:37:20 2016 [Z0][VMM][E]: migrate: Command "sudo /usr/sbin/xl migrate -s "su - oneadmin -c 'ssh 192.168.100.4 sudo /usr/sbin/xl migrate-receive'" one-13 192.168.100.4" failed: migration target: Ready to receive domain.
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: Saving to migration stream new xl format (info 0x0/0x0/150)
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: Loading new save file (new xl fmt info 0x0/0x0/150)
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: Savefile contains xl domain config
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: libxl: error: libxl_device.c:265:libxl__device_disk_set_backend: Disk vdev=sda failed to stat: /var/lib/one//datastores/0/13/disk.0: No such file or directory
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: libxl: error: libxl_dm.c:1489:kill_device_model: unable to find device model pid in /local/domain/12/image/device-model-pid
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: libxl: error: libxl.c:1421:libxl__destroy_domid: libxl__destroy_device_model failed for 12
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: migration target: Domain creation failed (code -3).
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: libxl: error: libxl_utils.c:396:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 2 save/restore helper stdout pipe
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: libxl: error: libxl_exec.c:129:libxl_report_child_exitstatus: domain 2 save/restore helper [-1] died due to fatal signal Broken pipe
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: migration sender: libxl_domain_suspend failed (rc=-3)
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration target process [12620] exited with error status 3
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: Migration failed, resuming at sender.
Mon Jun 27 13:37:20 2016 [Z0][VMM][E]: Could not migrate one-13 to 192.168.100.4
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: ExitCode: 3
Mon Jun 27 13:37:20 2016 [Z0][VMM][I]: Failed to execute virtualization driver operation: migrate.
Mon Jun 27 13:37:20 2016 [Z0][VMM][E]: Error live migrating VM: Could not migrate one-13 to 192.168.100.4
Mon Jun 27 13:37:21 2016 [Z0][LCM][I]: Fail to live migrate VM. Assuming that the VM is still RUNNING (will poll VM).

Thanks and regards,
Arshad

Make sure that /var/lib/one//datastores/0 is shared between both nodes.
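
The libxl error in the log above (failed to stat /var/lib/one//datastores/0/13/disk.0) suggests the target host cannot see the VM's disk. Here is a minimal sketch for checking that over ssh. It assumes passwordless ssh between the nodes, and the helper names `path_visible` and `missing_on` are made up for illustration; they are not part of OpenNebula.

```python
import subprocess


def path_visible(host, path, run=subprocess.run):
    """Return True if `path` exists on `host`, checked over ssh."""
    # `test -e` exits with 0 when the path exists and non-zero otherwise.
    # Note: an ssh connection failure also gives a non-zero exit code,
    # so a False here can mean "unreachable" as well as "missing".
    result = run(["ssh", host, "test", "-e", path])
    return result.returncode == 0


def missing_on(hosts, path, run=subprocess.run):
    """Return the hosts on which `path` could not be found."""
    return [host for host in hosts if not path_visible(host, path, run)]
```

For example, `missing_on(["192.168.100.2", "192.168.100.4"], "/var/lib/one//datastores/0/13/disk.0")` should return an empty list once the datastore really is shared. The two addresses are the hosts that appear in the migrate command in the log; treating them as source and target is an assumption.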

Thanks for replying, Javi.

I have configured the shared storage on all the nodes and now I am getting the following error when I try to do the live migration.

Tue Jul 26 15:48:18 2016 [Z0][DiM][I]: New VM state is ACTIVE.
Tue Jul 26 15:48:20 2016 [Z0][LCM][I]: New VM state is BOOT_POWEROFF
Tue Jul 26 15:48:20 2016 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/31/deployment.9
Tue Jul 26 15:48:36 2016 [Z0][VMM][I]: ExitCode: 0
Tue Jul 26 15:48:36 2016 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Tue Jul 26 15:48:43 2016 [Z0][VMM][D]: deploy: Credits set to 256
Tue Jul 26 15:48:43 2016 [Z0][VMM][I]: ExitCode: 0
Tue Jul 26 15:48:43 2016 [Z0][VMM][I]: Successfully execute virtualization driver operation: deploy.
Tue Jul 26 15:48:43 2016 [Z0][VMM][I]: ExitCode: 0
Tue Jul 26 15:48:43 2016 [Z0][VMM][I]: Successfully execute network driver operation: post.
Tue Jul 26 15:48:44 2016 [Z0][LCM][I]: New VM state is RUNNING
Tue Jul 26 15:58:48 2016 [Z0][LCM][I]: New VM state is MIGRATE
Tue Jul 26 15:58:48 2016 [Z0][VMM][I]: Successfully execute transfer manager driver operation: tm_premigrate.
Tue Jul 26 15:58:48 2016 [Z0][VMM][I]: ExitCode: 0
Tue Jul 26 15:58:48 2016 [Z0][VMM][I]: Successfully execute network driver operation: pre.
Tue Jul 26 15:58:49 2016 [Z0][VMM][I]: Command execution fail: /var/tmp/one/vmm/xen4/migrate 'one-31' '192.168.100.3' '192.168.100.2' 31 192.168.100.2
Tue Jul 26 15:58:49 2016 [Z0][VMM][E]: migrate: Command "sudo /usr/sbin/xl migrate -s "su - oneadmin -c 'ssh 192.168.100.3 sudo /usr/sbin/xl migrate-receive'" one-31 192.168.100.3" failed: one-31 is an invalid domain identifier (rc=-6)
Tue Jul 26 15:58:49 2016 [Z0][VMM][E]: Could not migrate one-31 to 192.168.100.3
Tue Jul 26 15:58:49 2016 [Z0][VMM][I]: ExitCode: 2
Tue Jul 26 15:58:49 2016 [Z0][VMM][I]: Failed to execute virtualization driver operation: migrate.
Tue Jul 26 15:58:49 2016 [Z0][VMM][E]: Error live migrating VM: Could not migrate one-31 to 192.168.100.3
Tue Jul 26 15:58:51 2016 [Z0][LCM][I]: Fail to live migrate VM. Assuming that the VM is still RUNNING (will poll VM).

The error seems to be related to ssh, but I have checked that and ssh is working fine. I am able to ssh from any node to any other node in the cloud. Any suggestions?

Regards,
Arshad
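
One possible reading of the second log: `xl` reports "is an invalid domain identifier" when it cannot find a domain with that name on the host where the command runs, so this may not be an ssh problem at all. It may be worth checking whether one-31 shows up in `xl list` on the source host. Below is a rough sketch, assuming the usual `xl list` layout (one header line, then one domain per line with the name in the first column). The function names are hypothetical, and 192.168.100.2 being the source host is a guess based on the migrate command in the log.

```python
import subprocess


def domain_names(xl_list_output):
    """Extract domain names from the text printed by `xl list`."""
    lines = xl_list_output.strip().splitlines()
    names = []
    for line in lines[1:]:  # the first line is the column header
        fields = line.split()
        if fields:
            names.append(fields[0])  # the name is the first column
    return names


def domain_is_listed(name, xl_list_output):
    """Return True if a domain called `name` appears in the listing."""
    return name in domain_names(xl_list_output)


def fetch_xl_list(host):
    """Run `sudo /usr/sbin/xl list` on `host` over ssh and return stdout."""
    result = subprocess.run(
        ["ssh", host, "sudo", "/usr/sbin/xl", "list"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Usage would look like `domain_is_listed("one-31", fetch_xl_list("192.168.100.2"))`. If that returns False, the domain is not known to `xl` on that host, and it would be worth finding out where (and with which toolstack) the VM is actually running.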