Ceph datastore and snapshots

Hi! How do I work with snapshots if I have a Ceph datastore? I don't understand how it works from Sunstone! I create a snapshot but
I can't delete it, not even from the console. The snapshot is protected.

[root@node01 ~]# rbd --pool one snap unprotect --image one-0 --snap 2016-07-26
rbd: unprotecting snap failed: (22) Invalid argument
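For what it's worth, "(22) Invalid argument" from `snap unprotect` usually indicates the snapshot is not currently protected in the first place (a protected snapshot that still has clones fails with "(16) Device or resource busy" instead). A hedged diagnostic sequence, reusing the pool/image/snapshot names from the command above — and bearing in mind that touching these volumes behind oned's back can leave OpenNebula's metadata inconsistent:

```shell
# Inspect the snapshot first: the output includes a "protected: True/False" line
rbd info one/one-0@2016-07-26

# List clones that still depend on the snapshot (empty output = none)
rbd children one/one-0@2016-07-26

# If it IS protected and has children, flatten (or remove) each child first:
#   rbd flatten one/<child-image>
# then unprotect and remove:
rbd snap unprotect one/one-0@2016-07-26
rbd snap rm one/one-0@2016-07-26
```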

I guess you are not supposed to modify snapshots directly using rbd(8) behind oned's/Sunstone's back.

That said, I have discovered I have some rbd volumes in my “one” CEPH pool, which are not referenced by OpenNebula by any means I am aware of, but which cannot be deleted using the rbd(8) command.

I found it! The snapshots in the "Snapshots" tab don't work on Ceph; the ones in "Storage" do. But the behavior is strange: I can create a snapshot and revert to an old one, but how do I delete old snapshots if a snapshot with children can't be deleted?

Hi UAnton,

Are you using Ceph in production? Could you run some benchmarks of the disk I/O of your VMs?

Thanks,

I have a Dell C6100 with 3 nodes (2x Xeon E6520 / 64 GB / LSI SAS2008 / HP D2700). Each node has 8x 450 GB SAS drives + 3 SSDs, on a 10Gb network.
Standard Ceph speed test: 1300 MB/s read / 400 MB/s write.
Test in Windows VM:

I will do the tests in Linux later.

How can I delete snapshots 0, 2, 3, 4, 5?

Hi @UAnton ,

In short, AFAIK on Ceph you can't delete a snapshot if it has children.

According to http://docs.ceph.com/docs/master/dev/rbd-layering/#command-line-interface

“Before cloning a snapshot, you must mark it as protected, to prevent
it from being deleted while child images refer to it:”

“To delete the parent, you must first mark it unprotected, which checks
that there are no children left:”
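The quoted lifecycle can be sketched end to end. A minimal example with made-up image names in the "one" pool, following the rbd(8) layering commands from the Ceph docs:

```shell
# Create a base snapshot and protect it so clones can safely reference it
rbd snap create one/base-image@base-snap
rbd snap protect one/base-image@base-snap

# Clone a copy-on-write child from the protected snapshot
rbd clone one/base-image@base-snap one/child-image

# To delete the parent snapshot later, first detach the child...
rbd flatten one/child-image        # copies the shared data into the child
# ...then unprotect (this fails while children still exist) and remove
rbd snap unprotect one/base-image@base-snap
rbd snap rm one/base-image@base-snap
```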

There is a feature request in the backlog to make the snapshot-deletion limitations implemented in OpenNebula optional, because some storage backends do not suffer from such limitations ;).

IMO the current implementation is suitable for cases when you want to test something and, if it succeeds, revert and do it permanently. I am using it this way to test the automated installation of our addon.

Kind Regards,
Anton Todorov


Hi @atodorov_storpool,
Thanks for the answer! Now I understand how it works :wink:

Hi UAnton,

Thank you for your reply. Judging from your disk benchmark, I think your Ceph cluster is not performing well. Are you using the VirtIO driver? If yes, I think you should consider deploying SSDs to use for journals (write cache).

Hi! It's a test cluster with 3G SSDs. In production we can use SAS SSDs. Now we are waiting for a Ceph update so we can use BlueStore.


Hi UAnton,

BlueStore is an entirely new OSD storage backend. It's awesome, but it's not stable yet. If you can, please share some benchmarks when you deploy Ceph with BlueStore.

Regards,

Hello Anton and all,

you opened up a feature request to support the unprotection and deletion of intermediate snapshots.

I also think that this would be a nice feature, because the current implementation "overprotects" snapshots from being deleted accidentally.
I already had a look into the (C++) code of OpenNebula's oned and identified the file and the place where that deletion could possibly be allowed: Snapshots.cc (around line 325). But I have not tested my changes yet.

I also assume that the snapshot tree shown is not a correct representation of the snapshots for every datastore type.
With Ceph, for example, a new snapshot is not necessarily based on its "predecessor", as far as I know.
Still, in Sunstone the tree looks like this:

But the representation inside the rbd pool looks like this:

# rbd -p one_ssd ls -l
NAME                    SIZE PARENT              FMT PROT LOCK 
one-3                  8192M                       2           
one-3@snap             8192M                       2 yes       
one-3-72-0            65536M one_ssd/one-3@snap    2      excl 
one-3-72-0@0          65536M one_ssd/one-3@snap    2 yes       
one-3-72-0@1          65536M one_ssd/one-3@snap    2 yes       
one-3-72-0@2          65536M one_ssd/one-3@snap    2 yes       

# rbd -p one_ssd children one-3@snap
one_ssd/one-3-72-0
# rbd -p one_ssd children one-3-72-0@0
[no output -> no children]
# rbd -p one_ssd children one-3-72-0@1
[no output -> no children]
# rbd -p one_ssd children one-3-72-0@2
[no output -> no children]

From my understanding this means that there is no relationship between those snapshots, and it would be possible to delete intermediate snapshots without losing the ability to revert to any later snapshot.
This would also mean that the snapshot tree is actually a flat list where every snapshot should have -1 as its parent.
Am I right with this assumption?
If so - can this behavior be changed?

Thank you in advance.

Bernhard J. M. Grün

Hi Bernhard,

Agree.

IMO this is only one part of the puzzle. Here is only the deletion check; in another place the 'tree' structure is kept, and it must be rebuilt after a successful deletion of an intermediate snapshot.

I have a golden rule: avoid final conclusions, because the universe is too big not to have an exception ;). For example, for qcow2 and StorPool this representation is correct. And in qcow2 you can't delete the parent of the snapshots... So I think that we should have an option to select the type of view depending on the storage backend.

Currently I have no access to a Ceph cluster, so my conclusions are based on your example and the Ceph documentation. I think that OpenNebula is following the layering approach described in the docs. But given your example, I think that snapshots should be protected before cloning an image from them, not when they are created. And yes, with the current implementation it seems that the deletion of intermediate snapshots is possible :).

I believe it can :). Just back/comment on the feature request to note the importance of the feature. Also, I think a separate feature request should be opened for the option to define "snapshot views" depending on the backend.

In brief: a feature to allow intermediate snapshot deletion, plus the type of visualization in the CLI/Sunstone, both configurable in the datastore MAD, I think.
Besides that, judging from your observations, the Ceph MAD should get some refactoring: protect a snapshot when creating a clone, not when the snapshot is created.
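A sketch of that proposed change, with hypothetical image names, assuming the driver currently protects every snapshot at creation time (an inference from the listings earlier in the thread, not confirmed from the driver code):

```shell
# Assumed current behavior: every snapshot is protected immediately
rbd snap create one_ssd/one-3-72-0@3
rbd snap protect one_ssd/one-3-72-0@3   # blocks later deletion even with no clones

# Proposed behavior: create plain snapshots, protect lazily
rbd snap create one_ssd/one-3-72-0@3    # deletable with a plain 'rbd snap rm'
# ...and only when an image is actually cloned from it:
rbd snap protect one_ssd/one-3-72-0@3
rbd clone one_ssd/one-3-72-0@3 one_ssd/new-image
```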

I think that some of the core developers should comment on the matter too :wink: (is it possible to move the thread to the Development category?)

Kind Regards,
Anton Todorov