Migrate VM to another host

Hello,
I am trying to migrate an instance running on Node1 to Node2 in the same cluster.
The message I get is:

Command execution fail: /var/lib/one/remotes/tm/ssh/mv 192.168.233.102:/var/lib/one//datastores/0/29 192.168.233.101:/var/lib/one//datastores/0/29 29 0
Fri Aug 3 11:05:19 2018 [Z0][TM][I]: mv: Moving 192.168.233.102:/var/lib/one/datastores/0/29 to 192.168.233.101:/var/lib/one/datastores/0/29
Fri Aug 3 11:05:19 2018 [Z0][TM][E]: mv: Command "set -e -o pipefail
Fri Aug 3 11:05:19 2018 [Z0][TM][I]:
Fri Aug 3 11:05:19 2018 [Z0][TM][I]: tar -C /var/lib/one/datastores/0 --sparse -cf - 29 | ssh 192.168.233.101 'tar -C /var/lib/one/datastores/0 --sparse -xf -'
Fri Aug 3 11:05:19 2018 [Z0][TM][I]: rm -rf /var/lib/one/datastores/0/29" failed: tar: 29: Cannot stat: No such file or directory
Fri Aug 3 11:05:19 2018 [Z0][TM][I]: tar: Exiting with failure status due to previous errors
Fri Aug 3 11:05:19 2018 [Z0][TM][E]: Error copying disk directory to target host
Fri Aug 3 11:05:19 2018 [Z0][TM][I]: ExitCode: 2
Fri Aug 3 11:05:19 2018 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host

The nodes are the same architecture and the version is OpenNebula 5.4.13.
The SSH connections work fine.
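For example, passwordless SSH as the oneadmin user works in both directions (a quick check; the IPs are the two nodes from the log above):

    su - oneadmin
    ssh 192.168.233.101 hostname
    ssh 192.168.233.102 hostname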
Have you got an idea?

I’m a newbie in OpenNebula.

Thanks…

Hi,

Could you share some more background regarding your setup?

It looks like /var/lib/one/datastores/0 is on a shared filesystem, but OpenNebula is not reconfigured and is still using the default "ssh" transfer manager.
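For example, you can check which transfer driver your datastores are using (a quick sketch, assuming the default datastore IDs):

    onedatastore list
    onedatastore show 0 | grep TM_MAD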

BR,
Anton

Hi,

The share is on the front-end. It’s an NFS share.
Here are the mounts:
Node1:
192.168.233.100:/var/lib/one on /var/lib/one type nfs4 (rw,relatime,vers=4.2,rsize=8192,wsize=8192,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.233.101,local_lock=none,addr=192.168.233.100)

Node2:
192.168.233.100:/var/lib/one on /var/lib/one type nfs4 (rw,relatime,vers=4.2,rsize=8192,wsize=8192,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.233.102,local_lock=none,addr=192.168.233.100)

The .100 is the front-end.

Is that OK?

Thanks,
Hugues

Can you go to your OpenNebula server, log in as the oneadmin user and do this:

    ssh node1
    ls -altr /var/lib/one/datastores/0/29

This could be a permissions issue, because everything is done as the oneadmin user.

Also, I don’t really understand your log: it seems you are trying to migrate a VM from 192.168.233.102 to 192.168.233.101, is that right?

Nicolas

I don’t know the exact solution, but maybe you can look at SELinux rules on RedHat/CentOS and AppArmor rules on Debian/Ubuntu.
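For example, to check whether they are active (standard status commands on each distribution):

    getenforce      # SELinux on RedHat/CentOS
    sudo aa-status  # AppArmor on Debian/Ubuntu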

Hi Nicolas,

That’s right.
I am trying to migrate from Node2 to Node1.
Here is the output. Apparently the permissions are fine.

    root@node1:~# ls -altr /var/lib/one/datastores/0/
    15/       16/       17/       24/       29/       .monitor  
    root@node1:~# ls -altr /var/lib/one/datastores/0/29/
    total 10723528
    drwxr-xr-x 7 oneadmin oneadmin        4096 août   3 11:07 ..
    -rw-r--r-- 1 oneadmin oneadmin      372736 août   3 11:10 disk.1
    drwxr-xr-x 2 oneadmin oneadmin        4096 août   3 11:10 .
    -rw-r--r-- 1 oneadmin oneadmin        1110 août   3 11:10 deployment.6
    -rw-r--r-- 1 oneadmin oneadmin 11276255232 août   8 22:13 disk.0

    oneadmin@node2:~$ ls -altr /var/lib/one/datastores/0/29/
    total 10723528
    drwxr-xr-x 7 oneadmin oneadmin        4096 août   3 11:07 ..
    -rw-r--r-- 1 oneadmin oneadmin      372736 août   3 11:10 disk.1
    drwxr-xr-x 2 oneadmin oneadmin        4096 août   3 11:10 .
    -rw-r--r-- 1 oneadmin oneadmin        1110 août   3 11:10 deployment.6
    -rw-r--r-- 1 oneadmin oneadmin 11276255232 août   8 22:22 disk.0

Here is another test: migrating a VM from Node1 to Node2.
The logs are:
Wed Aug 8 22:45:51 2018 [Z0][VM][I]: New LCM state is SAVE_MIGRATE
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: /var/tmp/one/vmm/kvm/save: line 58: warning: command substitution: ignored null byte in input
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: ExitCode: 0
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: Successfully execute virtualization driver operation: save.
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: ExitCode: 0
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Wed Aug 8 22:46:12 2018 [Z0][VM][I]: New LCM state is PROLOG_MIGRATE
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/ssh/mv 192.168.233.101:/var/lib/one//datastores/0/17 192.168.233.102:/var/lib/one//datastores/0/17 17 0
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: mv: Moving 192.168.233.101:/var/lib/one/datastores/0/17 to 192.168.233.102:/var/lib/one/datastores/0/17
Wed Aug 8 22:46:14 2018 [Z0][TM][E]: mv: Command "set -e -o pipefail
Wed Aug 8 22:46:14 2018 [Z0][TM][I]:
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: tar -C /var/lib/one/datastores/0 --sparse -cf - 17 | ssh 192.168.233.102 'tar -C /var/lib/one/datastores/0 --sparse -xf -'
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: rm -rf /var/lib/one/datastores/0/17" failed: tar: 17: Cannot stat: No such file or directory
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: tar: Exiting with failure status due to previous errors
Wed Aug 8 22:46:14 2018 [Z0][TM][E]: Error copying disk directory to target host
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: ExitCode: 2
Wed Aug 8 22:46:14 2018 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host
Wed Aug 8 22:46:14 2018 [Z0][VM][I]: New LCM state is PROLOG_MIGRATE_FAILURE

Thanks for your help.
Hugues

Hi,

The install is on Debian 9. AppArmor is not installed.

Thank you
Hugues

OK! I have an idea.

It seems you are using migration in non-live mode. In that mode the disks don’t need to be shared between the nodes.

At some point the driver tries to rm the directory on one node while tar’ing it on the other. With NFS, both nodes see the same directory, so the rm -rf wins and the tar command fails (see the annotated sketch below). It’s all about copying over SSH; you shouldn’t use NFS in this case.
But you could ask me: “if there is an rm, why is my directory still there!?” => OpenNebula makes a backup of your datastore directory before a migration, just in case, and restores it after a failure.
If you have a shared datastore (with Ceph also, for example), you could try Migration (Live).
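To illustrate, here is the command from your log again, annotated (a simplified view of what happens on shared NFS, not the exact tm/ssh/mv script):

    set -e -o pipefail

    # Pack the VM directory on the source host, unpack it on the target.
    # With NFS, /var/lib/one/datastores/0/17 is the SAME directory on both
    # hosts, so the two tar processes and the rm below step on each other.
    tar -C /var/lib/one/datastores/0 --sparse -cf - 17 | \
        ssh 192.168.233.102 'tar -C /var/lib/one/datastores/0 --sparse -xf -'

    # Remove the "source" directory -- on shared storage this is also the
    # directory that was just unpacked on the "target".
    rm -rf /var/lib/one/datastores/0/17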

Maybe my explanation isn’t clear, but I think the answer is in the log.

Nicolas

As I already said, if you have a shared filesystem under the datastore locations, you must use "shared" for TM_MAD in both the system and image datastores.
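For example (a sketch, assuming the default IDs: 0 for the system datastore, 1 for the image datastore):

    onedatastore update 0   # opens an editor; set TM_MAD="shared"
    onedatastore update 1   # same change for the image datastore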

@Nicolas_Beguier
Live migration will work, but the undeploy action will then fail because it uses the same (wrong) mv script to “park” the image files back on the front-end…

BR,
Anton.

OK, it works.
I modified it in the graphical interface and, as you said, changed TM_MAD from ssh to shared.
Can I modify this via the configuration file?
Thank you

Looks like a lot of users have this issue; ON should be able to detect whether the storage is shared…

Please excuse me in advance if I am sounding rude.

It looks like a lot of users are not reading the docs, especially Open Cloud Storage Setup and its chapters on the Datastore Layout and how to set up the front-end and the nodes depending on the chosen environment.

OpenNebula is very flexible regarding configuration, and the authors decided on a default setup that uses the ssh driver. If you follow the installation guide step by step, there is no instruction to set up a shared filesystem for the datastores.

Regards,
Anton Todorov

Thank you for your answer; you are right, unfortunately I hadn’t noticed the datastore chapter.
I was just wondering because, in the past, with older versions, I never had an issue like this one.

Regards