CEPH snapshots broken

Hello,

I think disk snapshots do not work with the CEPH datastore at all, but maybe I am doing something wrong.

What I am trying to do is to create a snapshot of a disk attached to a running VM, and then copy the snapshotted state to a new image.


Versions of the related components and OS (frontend, hypervisors, VMs):
CentOS 7, ONe 5.4, Ceph 10-based system datastore.

Steps to reproduce (a shell version follows the list):

  • create a VM running Linux on a persistent disk, let’s say the VM id is 1234
  • ssh to the VM
  • touch "before-snapshot.txt"; sync
  • create a snapshot of the root disk (onevm disk-snapshot-create 1234 0 testsnap)
  • wait till the snapshot completes
  • ssh to the VM again, and run this: touch "after-snapshot.txt"; sync
  • try to copy the snapshot as a new image: onevm disk-saveas 1234 0 --snapshot testsnap testsnap-copy
  • now attach the disk testsnap-copy to another VM (it cannot be mounted in the same VM, because it has the same UUID as the original disk), and look inside.
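
In shell terms, the whole test is roughly this (the VM address vm1234 is only an illustration, and the waiting step is a manual check):

$ ssh root@vm1234 'touch before-snapshot.txt; sync'
$ onevm disk-snapshot-create 1234 0 testsnap
$ onevm show 1234        # repeat until the snapshot is reported as ready
$ ssh root@vm1234 'touch after-snapshot.txt; sync'
$ onevm disk-saveas 1234 0 --snapshot testsnap testsnap-copy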

Current results:
the “after-snapshot.txt” file is present on the testsnap-copy image, even though it was created after the snapshot was taken.

Expected results:
the testsnap-copy image should not contain the “after-snapshot.txt” file.

I get the same results when using Sunstone instead of the command line: when I open the VM and its “Storage” tab, I can create a snapshot of disk 0; after it completes, I create a new file inside the VM, then click on the snapshot and use the “Save As” button, but the resulting saved image still contains the data created after the snapshot was taken.

So as far as I can tell, there is no difference between onevm disk-saveas of the active disk and onevm disk-saveas --snapshot X.

Maybe I am doing something wrong. Thanks for any help.

-Yenya

Anyway, I have examined the created snapshots using direct access to CEPH (i.e. the rbd export command), and the snapshots look OK: they contain only the data that was there when the snapshot was created, nothing newer. So the problem seems to be that both onevm disk-saveas and the Sunstone “Save As” button use the current disk state instead of copying the requested snapshot.
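
For the record, the direct check was along these lines (the image name one-627 and the snapshot name 0 are only examples; the real names can be listed with rbd ls and rbd snap ls):

$ rbd --pool one snap ls one-627
$ rbd --pool one export one-627@0 /tmp/one-627-snap0.raw
$ rbd --pool one export one-627 /tmp/one-627-live.raw

The exported snapshot contains only the data written before it was taken, while the export of the live image also contains the newer data.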

Should I convert this to a bug report?

-Yenya

Hello Yenya,

Can you open a bug in our GitHub?

Thanks!

There is indeed a bug: disk-saveas does not respect the selected disk snapshot ID for Ceph:
https://github.com/OpenNebula/one/issues/2429

@vholer - thanks very much for fixing this. I have applied your diff to my ONe 5.4.x /var/lib/one/remotes/tm/ceph/cpds, and it works now. Thanks again!
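
(In case it helps anyone else: as far as I understand, the TM scripts under /var/lib/one/remotes/ are executed on the hypervisors, so the edited cpds most likely has to be pushed to the hosts as well, e.g.:)

# vi /var/lib/one/remotes/tm/ceph/cpds
# onehost sync --force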

Now that my biggest problem with snapshots on CEPH is fixed, there are some other issues. One of them is that it is not possible to delete snapshots:

# onevm disk-snapshot-create 1992 0 testsnap1
# onevm disk-snapshot-create 1992 0 testsnap2
# onevm disk-snapshot-delete 1992 0 testsnap2
[one.vm.disksnapshotdelete] Cannot delete snapshot with children
# onevm disk-snapshot-delete 1992 0 testsnap1
[one.vm.disksnapshotdelete] Cannot delete snapshot with children
# 

Another interesting thing is that trying to delete the newest (only) snapshot in Sunstone instead of from the command line yields a different error message:

[one.vm.disksnapshotdelete] Cannot delete the active snapshot

One layer deeper, in CEPH itself, the snapshots can be removed without problems, and nothing breaks. The only non-trivial thing is that ONe apparently creates the snapshots as protected (rbd snap protect), even though this is not needed for ordinary snapshots, only for layering, clones, or writable snapshots (none of which ONe uses, as far as I can tell. EDIT: nope, ONe apparently calls rbd clone in tm/ceph/snap_revert). So the fix is probably to try to unprotect the snapshot and let CEPH decide whether there is something depending on it: http://docs.ceph.com/docs/master/rbd/rbd-snapshot/

# rbd --pool one snap list one-627
SNAPID NAME    SIZE
   988 0    4096 MB
   989 1    4096 MB
# rbd --pool one snap rm one-627@0
rbd: snapshot '0' is protected from removal.
# rbd --pool one snap rm one-627@1
rbd: snapshot '1' is protected from removal.
# rbd --pool one snap unprotect one-627@0
# rbd --pool one snap unprotect one-627@1
# rbd --pool one snap rm one-627@1
# rbd --pool one snap rm one-627@0
#
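
My idea of the fix, expressed as a sketch (this is not the actual tm/ceph driver code; RBD_SRC and SNAP_ID are just placeholders for whatever variables the driver uses):

# try to unprotect the snapshot first; if a clone still depends on it,
# "rbd snap unprotect" fails and the snapshot stays protected
rbd snap unprotect "$RBD_SRC@$SNAP_ID" 2>/dev/null || true

# the removal then fails only if CEPH itself has a reason to refuse it
rbd snap rm "$RBD_SRC@$SNAP_ID"

This way ONe would not have to track by itself whether a snapshot is still needed as a clone parent.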