The share is on the frontend. It's an NFS share.
Here are the mounts.
Node1:
192.168.233.100:/var/lib/one on /var/lib/one type nfs4 (rw,relatime,vers=4.2,rsize=8192,wsize=8192,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.233.101,local_lock=none,addr=192.168.233.100)
Node2:
192.168.233.100:/var/lib/one on /var/lib/one type nfs4 (rw,relatime,vers=4.2,rsize=8192,wsize=8192,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.233.102,local_lock=none,addr=192.168.233.100)
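For reference, mounts like these usually come from an /etc/fstab entry on each node along the following lines (a sketch that reuses the export shown above; the options are illustrative, not a recommendation):

    # /etc/fstab on node1 and node2 (illustrative options only)
    192.168.233.100:/var/lib/one  /var/lib/one  nfs4  rw,soft,timeo=600,retrans=2  0  0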
This is another test: migrating the VM from Node1 to Node2.
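From the CLI, this kind of (cold) migration corresponds to something like the following (a sketch; VM ID 17 comes from the logs below, and node2 is a hypothetical host name as registered in OpenNebula):

    onevm migrate 17 node2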
The logs are:
Wed Aug 8 22:45:51 2018 [Z0][VM][I]: New LCM state is SAVE_MIGRATE
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: /var/tmp/one/vmm/kvm/save: line 58: warning: command substitution: ignored null byte in input
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: ExitCode: 0
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: Successfully execute virtualization driver operation: save.
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: ExitCode: 0
Wed Aug 8 22:46:12 2018 [Z0][VMM][I]: Successfully execute network driver operation: clean.
Wed Aug 8 22:46:12 2018 [Z0][VM][I]: New LCM state is PROLOG_MIGRATE
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: Command execution fail: /var/lib/one/remotes/tm/ssh/mv 192.168.233.101:/var/lib/one//datastores/0/17 192.168.233.102:/var/lib/one//datastores/0/17 17 0
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: mv: Moving 192.168.233.101:/var/lib/one/datastores/0/17 to 192.168.233.102:/var/lib/one/datastores/0/17
Wed Aug 8 22:46:14 2018 [Z0][TM][E]: mv: Command "set -e -o pipefail
Wed Aug 8 22:46:14 2018 [Z0][TM][I]:
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: tar -C /var/lib/one/datastores/0 --sparse -cf - 17 | ssh 192.168.233.102 'tar -C /var/lib/one/datastores/0 --sparse -xf -'
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: rm -rf /var/lib/one/datastores/0/17" failed: tar: 17: Cannot stat: No such file or directory
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: tar: Exiting with failure status due to previous errors
Wed Aug 8 22:46:14 2018 [Z0][TM][E]: Error copying disk directory to target host
Wed Aug 8 22:46:14 2018 [Z0][TM][I]: ExitCode: 2
Wed Aug 8 22:46:14 2018 [Z0][TM][E]: Error executing image transfer script: Error copying disk directory to target host
Wed Aug 8 22:46:14 2018 [Z0][VM][I]: New LCM state is PROLOG_MIGRATE_FAILURE
It seems like you are not using live migration. Actually, it's not necessary to share the disks between the nodes in that case.
At some point it tries to rm the directories on one node and tar them on the other, and since both nodes see the same NFS share, the rm -rf wins and the tar command fails… The ssh driver is all about scp/ssh copies; you shouldn't use NFS in this case.
But you could ask me: "if there is an rm, why is my directory still there?!" OpenNebula makes a backup of your datastore before a migration, just in case, and restores it after a failure.
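To make the failure mode concrete, here is a simplified sketch of what the ssh driver's mv step boils down to, reconstructed from the commands visible in the log above (not the literal /var/lib/one/remotes/tm/ssh/mv script):

    set -e -o pipefail

    # Pack the VM directory (ID 17) on the source node and unpack it on the target.
    tar -C /var/lib/one/datastores/0 --sparse -cf - 17 \
        | ssh 192.168.233.102 'tar -C /var/lib/one/datastores/0 --sparse -xf -'

    # Then remove the source copy.
    rm -rf /var/lib/one/datastores/0/17

    # Because /var/lib/one is the same NFS directory on every host, an rm -rf
    # issued anywhere through the shared mount removes exactly what the tar
    # still needs, which is why the log shows
    # "tar: 17: Cannot stat: No such file or directory".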
If you have a shared datastore (for example with Ceph as well), you could try Migration (Live).
Maybe I'm not being clear in my explanation, but I think the answer is in the logs.
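For reference, once the system datastore is really shared, a live migration can be requested from the CLI with something like this (a sketch; VM ID 17 is taken from the logs above and node2 is the same hypothetical host name as before):

    onevm migrate --live 17 node2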
As I already said, if you have a shared filesystem under the datastore locations, you must use shared for TM_MAD in both the system and image datastores.
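A quick way to check and change this from the front-end (a sketch; 0 and 1 are the default system and image datastore IDs):

    onedatastore list        # the TM column shows the current TM_MAD of each datastore
    onedatastore update 0    # opens an editor: set TM_MAD="shared" for the system datastore
    onedatastore update 1    # do the same for the default image datastore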
@Nicolas_Beguier
Live migration will work, but the undeploy action will then fail, because it uses the same (wrong) mv script to "park" the image files back on the front end…
Ok, it works.
I modified the configuration in the graphical interface and, as you said, changed TM_MAD from ssh to shared.
Can I modify this with the configuration file?
Thank you
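For what it's worth, the same change can also be made without the GUI by feeding onedatastore update a small template file (a sketch; the file name is just an example, and --append keeps the rest of the datastore template untouched):

    echo 'TM_MAD="shared"' > tm_shared.txt
    onedatastore update 0 tm_shared.txt --append
    onedatastore update 1 tm_shared.txt --append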
Please excuse me in advance if I am sounding rude.
It looks like a lot of users are not reading the docs, especially Open Cloud Storage Setup and its chapters on the Datastore Layout and how to set up the front-end and the nodes depending on the chosen environment.
OpenNebula is very flexible regarding configuration, and the authors decided on a default setup that uses the ssh driver. If you follow the installation guide step by step, there is no instruction to set up a shared filesystem for the datastores.
Thank you for your answer, you are right; unfortunately I didn't notice the datastore chapter.
I was just wondering, because in the past, with the older versions I never had issues like this one.