Non-persistent to persistent and back again (CEPH)

Hello,

[TL;DR: when I switch an image from non-persistent to persistent, make some changes, and switch it back to non-persistent, the modifications made while it was persistent are lost.]

Longer description:
I want to make some changes to an existing image which has previously been used as non-persistent. I delete all VMs which have been using it, change the status to persistent, instantiate a new VM using this image (the image is visible as USED_PERS in Sunstone), make some changes, shut the VM down, wait for its state to be poweroff/lcm_init in Sunstone, delete the VM, make the image non-persistent, and instantiate yet another VM on top of it. The disk looks the same as it did at the beginning; the modifications made during the last persistent state are lost.

What works is to clone the image, immediately make it persistent (before instantiating any VMs on top of it), only then instantiate the first VM on top of it, make the necessary modifications, shut the VM down, make the image non-persistent (delete the original image and rename the cloned one), and instantiate VMs on top of it as needed.

It seems that the difference is whether or not a non-persistent VM has previously been instantiated on top of the image. Is this a bug or expected behaviour?

Thanks!

Hi

No, there should not be any functional difference. In fact, the underlying
logic used in both scenarios is the same, so I am not really sure
why clone+persistent works but persistent to non-persistent doesn’t.
We’ll try to reproduce this and update the thread if we find any difference.

OK, thanks. I have repeatedly tested this on several different source images (Fedora, Windows), but obviously on the same ONe/CEPH cluster.

Maybe the switch to persistent takes some time, and the new VM is instantiated before it completes? I can test it again if you give me hints on what to look for. Would there be, for example, a visible difference in the qemu command line for persistent and non-persistent images?

Not really. When an image is persistent, it simply takes the original rbd as
the source for the disk. Maybe you can try to instantiate a VM and check the
deployment file /var/lib/one/datastores/<system_ds_id>/<vm_id>/deployment.0;
there you should see that the VM is using the original Ceph volume…
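
A rough sketch of that check, in case it helps: it assumes the deployment file is a libvirt domain XML in which Ceph disks appear as <source protocol='rbd' name='pool/volume'/> elements. The function name rbd_disk_sources and the sample XML are made up for illustration and are not taken from a real deployment.

```python
import xml.etree.ElementTree as ET
from typing import List


def rbd_disk_sources(deployment_xml: str) -> List[str]:
    """Return the 'name' attribute of every RBD-backed disk source.

    Assumption: the deployment file is a libvirt domain XML and Ceph
    disks are declared as <source protocol='rbd' name='pool/volume'/>.
    """
    root = ET.fromstring(deployment_xml)
    names = []
    for source in root.findall("./devices/disk/source"):
        if source.get("protocol") == "rbd":
            names.append(source.get("name"))
    return names


# Hypothetical, trimmed-down deployment file, for illustration only.
SAMPLE = """
<domain type='kvm'>
  <name>one-42</name>
  <devices>
    <disk type='network' device='disk'>
      <source protocol='rbd' name='one/one-7'/>
      <target dev='vda'/>
    </disk>
    <disk type='file' device='cdrom'>
      <source file='/var/lib/one/datastores/0/42/disk.1'/>
      <target dev='hda'/>
    </disk>
  </devices>
</domain>
"""

print(rbd_disk_sources(SAMPLE))
```

On a real host you would read the contents of deployment.0 and pass them to the function instead of SAMPLE.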

Okay, I captured the deployment.0 files for the following cases:

  • clone the image (as non-persistent), instantiate a VM on top of it
  • the above, make the image persistent, and instantiate another VM on top of it
  • the above, make the image non-persistent, and instantiate another VM on top of it
  • clone another image, make it persistent, and instantiate a VM on top of it
  • the above, make the image non-persistent, and instantiate another VM on top of it

The only differences are (obviously) the VM ID and, for non-persistent instances, the name of the disk, which is one-X-Y-0 instead of one-X.
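
To make the comparison less error-prone, the naming pattern can be checked mechanically. This is only a sketch: it assumes that X is the image ID, Y the VM ID, and the trailing number the disk ID, which is my reading of the names and not something the deployment files confirm. The helper classify_rbd_volume is hypothetical.

```python
import re

# Matches "one-X" (persistent) or "one-X-Y-Z" (non-persistent).
# Assumption: X = image ID, Y = VM ID, Z = disk ID.
_RBD_NAME = re.compile(r"^one-(\d+)(?:-(\d+)-(\d+))?$")


def classify_rbd_volume(name: str) -> dict:
    """Classify an RBD volume name as a persistent or non-persistent disk.

    An optional "pool/" prefix is stripped before matching.
    """
    volume = name.rsplit("/", 1)[-1]
    match = _RBD_NAME.match(volume)
    if match is None:
        raise ValueError("unrecognised volume name: %r" % name)
    image_id, vm_id, disk_id = match.groups()
    if vm_id is None:
        # The VM uses the original volume directly.
        return {"image_id": int(image_id), "persistent": True}
    # The VM uses a per-VM volume derived from the image.
    return {
        "image_id": int(image_id),
        "vm_id": int(vm_id),
        "disk_id": int(disk_id),
        "persistent": False,
    }


print(classify_rbd_volume("one/one-7"))
print(classify_rbd_volume("one/one-7-42-0"))
```

If a VM started from an image marked persistent reports a one-X-Y-0 style name, or the other way round, that would point at the persistent/non-persistent switch and not at the instantiation.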

So I think the problem is not in instantiating the VM itself, but in making the image persistent and non-persistent again. How is it done? I tried to look at /var/lib/one/remotes/datastore/ceph/, but I am not sure which scripts are called when the image is made persistent and non-persistent.

Reading the latest release notes, this seems to be yet another instance of the following issue, fixed in 5.2.1:

https://dev.opennebula.org/issues/4878