Migrate VM to another host

Hello,
I am trying to migrate an instance running on Node1 to Node2 in the same cluster.
The message I get is:

Command execution fail: /var/lib/one/remotes/tm/ssh/mv 192.168.233.102:/var/lib/one//datastores/0/29 192.168.233.101:/var/lib/one//datastores/0/29 29 0
Fri Aug 3 11:05:19 2018 [Z0][TM][I]: mv: Moving 192.168.233.102:/var/lib/one/datastores/0/29 to 192.168.233.101:/var/lib/one/datastores/0/29
Fri Aug 3 11:05:19 2018 [Z0][TM][E]: mv: Command "set -e -o pipefail
Fri Aug 3 11:05:19 2018 [Z0][TM][I]:
Fri Aug 3 11:05:19 2018 [Z0][TM][I]: tar -C /var/lib/one/datastores/0 --sparse -cf - 29 | ssh 192.168.233.101 'tar -C /var/lib/one/datastores/0 --sparse -xf -'
Fri Aug 3 11:05:19 2018 [Z0][TM][I]: rm -rf /var/lib/one/datastores/0/29" failed: tar: 29: Cannot stat: No such file or directory
Fri Aug 3 11:05:19 2018 [Z0][TM][I]: tar: Exiting with failure status due to previous errors
Fri Aug 3 11:05:19 2018 [Z0][TM][E]: Error copying disk directory to target host
Fri Aug 3 11:05:19 2018 [Z0][TM][I]: ExitCode: 2
Fri Aug 3 11:05:19 2018 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host

The nodes have the same architecture and the version is OpenNebula 5.4.13.
The SSH connections work fine.
Have you got an idea?

"I'm a newbie in OpenNebula."

Thanks…

Hi,

Could you share some more background regarding your setup?

It looks like /var/lib/one/datastores/0 is on a shared filesystem, but OpenNebula has not been reconfigured and is still using the default "ssh" transfer manager.
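One quick way to verify that (a minimal check, assuming the stock system datastore has ID 0) is to look at the TM_MAD attribute of your datastores:

    # List all datastores and print the transfer manager of the system datastore
    onedatastore list
    onedatastore show 0 | grep TM_MAD

If this prints TM_MAD="ssh" while the directory is actually on NFS, that mismatch would explain the error.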

BR,
Anton

Hi,

The share is on the front-end. It's an NFS share.
Here are the mounts:
The node1:
192.168.233.100:/var/lib/one on /var/lib/one type nfs4 (rw,relatime,vers=4.2,rsize=8192,wsize=8192,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.233.101,local_lock=none,addr=192.168.233.100)

The node2:
192.168.233.100:/var/lib/one on /var/lib/one type nfs4 (rw,relatime,vers=4.2,rsize=8192,wsize=8192,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.233.102,local_lock=none,addr=192.168.233.100)

The .100 is the front-end.
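One way to confirm both nodes really see the same share (a minimal check, assuming oneadmin can SSH to both nodes from the front-end; the marker file name is just an example):

    # Create a marker file via node1 and look for it via node2
    ssh node1 'touch /var/lib/one/datastores/0/.nfs-check'
    ssh node2 'ls -l /var/lib/one/datastores/0/.nfs-check'
    ssh node1 'rm /var/lib/one/datastores/0/.nfs-check'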

Is it OK?

thanks
Hugues

Can you go to your OpenNebula server, log in as the oneadmin user and do this:

ssh node1
ls -altr /var/lib/one/datastores/0/29

This could be a permissions issue, because everything is done as the oneadmin user.
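A more direct version of that check (a sketch, assuming sudo rights on the front-end; the test file name is just an example) is to run it as oneadmin, since that is the user the transfer scripts use:

    # Verify oneadmin can read and write the VM directory on node1
    sudo -u oneadmin ssh node1 \
      'ls -la /var/lib/one/datastores/0/29 && touch /var/lib/one/datastores/0/29/.rw-test && rm /var/lib/one/datastores/0/29/.rw-test'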

Also, I don't really understand your log; it seems that you are trying to migrate a VM from 192.168.233.102 to 192.168.233.101, is that right?

Nicolas

I don't know the exact solution, but maybe you can look at the SELinux rules on RedHat/CentOS or the AppArmor rules on Debian/Ubuntu.
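For example, a quick status check for either one (assuming the respective tools are installed):

    # RedHat/CentOS: print the SELinux mode (Enforcing/Permissive/Disabled)
    getenforce
    # Debian/Ubuntu: list the loaded AppArmor profiles
    sudo aa-status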

Hi Nicolas,

That's correct.
I am trying to migrate from node2 to node1.
Here is the output; apparently the permissions are fine.

    root@node1:~# ls -altr /var/lib/one/datastores/0/
    15/       16/       17/       24/       29/       .monitor  
    root@node1:~# ls -altr /var/lib/one/datastores/0/29/
    total 10723528
    drwxr-xr-x 7 oneadmin oneadmin        4096 août   3 11:07 ..
    -rw-r--r-- 1 oneadmin oneadmin      372736 août   3 11:10 disk.1
    drwxr-xr-x 2 oneadmin oneadmin        4096 août   3 11:10 .
    -rw-r--r-- 1 oneadmin oneadmin        1110 août   3 11:10 deployment.6
    -rw-r--r-- 1 oneadmin oneadmin 11276255232 août   8 22:13 disk.0

oneadmin@node2:~$ ls -altr /var/lib/one/datastores/0/29/
total 10723528
drwxr-xr-x 7 oneadmin oneadmin        4096 août   3 11:07 ..
-rw-r--r-- 1 oneadmin oneadmin      372736 août   3 11:10 disk.1
drwxr-xr-x 2 oneadmin oneadmin        4096 août   3 11:10 .
-rw-r--r-- 1 oneadmin oneadmin        1110 août   3 11:10 deployment.6
-rw-r--r-- 1 oneadmin oneadmin 11276255232 août   8 22:22 disk.0

Here is another test:
Migrating a VM from Node1 to Node2.
The logs are:
Wed Aug 8 22:45:51 2018 [Z0][VM][I]: New LCM state is SAVE_MIGRATE
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: /var/tmp/one/vmm/kvm/save: line 58: warning: command substitution: ignored null byte in input
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: ExitCode: 0
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: Successfully execute virtualization driver operation: save.
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: ExitCode: 0
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Wed Aug 8 22:46:12 2018 [Z0][VM][I]: New LCM state is PROLOG_MIGRATE
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/ssh/mv 192.168.233.101:/var/lib/one//datastores/0/17 192.168.233.102:/var/lib/one//datastores/0/17 17 0
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: mv: Moving 192.168.233.101:/var/lib/one/datastores/0/17 to 192.168.233.102:/var/lib/one/datastores/0/17
Wed Aug 8 22:46:14 2018 [Z0][TM][E]: mv: Command "set -e -o pipefail
Wed Aug 8 22:46:14 2018 [Z0][TM][I]:
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: tar -C /var/lib/one/datastores/0 --sparse -cf - 17 | ssh 192.168.233.102 'tar -C /var/lib/one/datastores/0 --sparse -xf -'
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: rm -rf /var/lib/one/datastores/0/17" failed: tar: 17: Cannot stat: No such file or directory
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: tar: Exiting with failure status due to previous errors
Wed Aug 8 22:46:14 2018 [Z0][TM][E]: Error copying disk directory to target host
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: ExitCode: 2
Wed Aug 8 22:46:14 2018 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host
Wed Aug 8 22:46:14 2018 [Z0][VM][I]: New LCM state is PROLOG_MIGRATE_FAILURE

Thanks for your help.
Hugues

Hi,

The install is on Debian 9. AppArmor is not installed.

Thank you
Hugues

OK! I have an idea.

It seems you are using the migration in non-live mode. In that case, the disks actually don't need to be shared between the nodes.

At some point, the script tars the directories on one node and removes them on the other. The rm -rf wins, so the tar command fails… It's all done over ssh/scp; you shouldn't use NFS in this case.
But you could tell me: "if there is an rm, why is my directory still there!?" => OpenNebula makes a backup of your datastore before a migration, just in case, and restores it after a failure.
If you have a shared datastore (with Ceph, for example), you could try Migration (Live).
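To make this concrete, the failing command from your log essentially does the following (commands and paths copied from the log above; the comments are my reading of it):

    set -e -o pipefail
    # Pack the VM directory on the source node and unpack it on the target node
    tar -C /var/lib/one/datastores/0 --sparse -cf - 17 \
        | ssh 192.168.233.102 'tar -C /var/lib/one/datastores/0 --sparse -xf -'
    # Then remove the source copy
    rm -rf /var/lib/one/datastores/0/17
    # On a shared NFS mount the source and target paths are the same directory,
    # so the copy and the cleanup operate on the same files, which breaks the
    # assumptions of the ssh driver.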

Maybe I'm not being clear in my explanation, but I think the answer is in the log.

Nicolas

As I already said, if you have a shared filesystem under the datastore locations, you must use "shared" as the TM_MAD of both the system and image datastores.

@Nicolas_Beguier
Live migration will work, but the undeploy action will then fail, because it uses the same (wrong) mv script to "park" the image files back on the front-end…

BR,
Anton.

OK, it works.
I modified the configuration in the GUI and, as you said, changed the TM_MAD from ssh to shared.
Can I also modify this via the configuration file?
Thank you
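For reference, the same change can also be made from the command line with the onedatastore CLI (a minimal sketch; datastore ID 0 is the stock system datastore and may differ in your setup):

    # Open the datastore template in $EDITOR and set TM_MAD="shared"
    onedatastore update 0
    # Verify the change
    onedatastore show 0 | grep TM_MAD

Remember to apply the same change to the image datastore, as Anton pointed out.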

It looks like a lot of users have this issue; ON should be able to detect whether the storage is shared…

Please excuse me in advance if I sound rude.

It looks like a lot of users are not reading the docs, especially Open Cloud Storage Setup and its chapters on the datastore layout and how to set up the front-end and the nodes depending on the chosen environment.

OpenNebula is very flexible regarding configuration, and the authors decided on a default setup that uses the ssh driver. If you follow the installation guide step by step, there is no instruction to set up a shared filesystem for the datastores.

Regards,
Anton Todorov


Thank you for your answer; you are right. Unfortunately, I didn't notice the datastore chapter.
I was just wondering, because in the past, with older versions, I never had issues like this one.

Regards