HOTPLUG_SNAPSHOT takes about 1h after changing cache=writethrough to none

OpenNebula version:5.12.01

I changed the cache mode from writethrough to none, increased the VM memory from 8 GB to 16 GB, and doubled the size of the swap file.

Before these changes, the VM live snapshot (HOTPLUG_SNAPSHOT: create snapshot) took about 3-5 minutes. Now (on NVMe SSD storage!) it takes about 55 minutes to complete.

HOTPLUG_SNAPSHOT: delete snapshot. The time to delete a snapshot is about the same before and after the changes mentioned above.

I'm aware that doubling the memory and swap size increases the time for the live snapshot, because the memory contents are dumped into the snapshot.

Is this behavior related to OpenNebula or to qemu-img? (The live snapshot operation takes much longer to complete when cache=none is used.)

In this scenario a live snapshot doesn't make much sense for me … I could just as well make an offline backup of the VM disk, which takes more or less the same time to complete.

I'd appreciate any input. Thanks a lot.

OK … I ran some tests myself.

I reverted the templates to disk cache=writethrough.

Now HOTPLUG_SNAPSHOT creation and deletion are very fast again.

Check this too:

Qcow2 - Performance
KVM qcow2
QEMU has supported a feature called lazy_refcounts since version 1.2. This will noticeably improve the performance of qcow2 disk images when the guest is set to use the writethrough caching mode (which is the default). The tradeoff is that if the guest experiences a sudden power loss, an fsck-like pass will need to be made on the disk image before it can be used again. Luckily, qemu-img check can now repair qcow2 and QED images with the new -r option.
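As a rough sketch of the two operations mentioned above, the following creates an image with lazy_refcounts enabled and runs a repairing check on it. The file name demo.qcow2, the 10G size, and the run/DRY_RUN helper are placeholders for illustration, not something from the quoted article. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# Create a new qcow2 image with lazy refcounts enabled.
create_lazy_image() {
    run qemu-img create -f qcow2 -o lazy_refcounts=on "$1" "$2"
}

# Check an image and repair what can be repaired.
# Only do this while no guest is using the image.
repair_image() {
    run qemu-img check -r all "$1"
}

# demo.qcow2 and 10G are placeholder values.
create_lazy_image demo.qcow2 10G
repair_image demo.qcow2
```

Set DRY_RUN=0 to actually execute the commands once the printed output looks right.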
Setting Cache Type
Below is an example section of an xml configuration of one of my guests where I have set the cache mode to none (on second line). I noticed that I didn’t have a cache mode set, so it was probably defaulting to writethrough.
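The XML from the quoted article did not survive the copy and paste. As a stand-in, here is a small sketch that prints the cache mode of every disk driver found in libvirt domain XML. The sample disk definition and the function name disk_cache_modes are made up for illustration; only the cache attribute on the driver element is standard libvirt syntax.

```shell
#!/bin/sh
# Print the cache attribute of every <driver> element in libvirt domain XML
# read from stdin; prints "(unset)" when the attribute is missing.
disk_cache_modes() {
    grep '<driver ' | while IFS= read -r line; do
        case "$line" in
            *cache=*)
                printf '%s\n' "$line" |
                    sed -n "s/.*cache=['\"]\([a-z]*\)['\"].*/\1/p"
                ;;
            *)
                echo "(unset)"
                ;;
        esac
    done
}

# Illustrative disk definition (not taken from the original post).
sample_xml="<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/var/lib/libvirt/images/guest.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>"

printf '%s\n' "$sample_xml" | disk_cache_modes
```

On a real host the same function could be fed from virsh, for example virsh dumpxml one-42 | disk_cache_modes, where one-42 is a hypothetical domain name.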
There are four levels of preallocation of qcow2 disk images, listed below in order of least performance to greatest (for guest writes).
• preallocation=off (no preallocation)
• preallocation=metadata
• preallocation=falloc
• preallocation=full
My rule of thumb is to use the metadata preallocation as it is quick to deploy small sparse images with a noticeable improvement over having no preallocation at all. If you really need more performance, then go with falloc or full.
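A minimal sketch of creating an image with a chosen preallocation mode could look like this. Note that qemu-img spells the "no preallocation" mode as off. The create_image helper, the file name guest.qcow2, and the 20G size are assumptions for illustration, and by default the script only prints the command.

```shell
#!/bin/sh
# Sketch only: DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# usage: create_image MODE PATH SIZE
create_image() {
    case "$1" in
        off|metadata|falloc|full) ;;
        *)
            echo "unknown preallocation mode: $1" >&2
            return 1
            ;;
    esac
    run qemu-img create -f qcow2 -o "preallocation=$1" "$2" "$3"
}

# guest.qcow2 and 20G are placeholder values.
create_image metadata guest.qcow2 20G
```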
Converting Existing Disk Images
If you wish to retrospectively “fix” existing disk images, then you can use the qemu-img convert command with the relevant options. E.g.
mv disk.qcow2 disk.qcow2.bak
qemu-img convert \
  -O qcow2 \
  -o lazy_refcounts=on,preallocation=metadata \
  disk.qcow2.bak \
  disk.qcow2

# check disk is fine before removing the original
rm disk.qcow2.bak

Slow Snapshotting
I had recently been suffering from incredibly slow snapshotting of one of my guests. It appears that I may have resolved it by using a script that shuts down the guest before taking the snapshot. In this case the guest had 4GB of RAM, and it appears that saving the memory state was taking up quite a bit of the time.
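The article's script is not included, so the following is only a guess at its general shape, written against plain libvirt (virsh) rather than OpenNebula: shut the guest down, wait until it is off, snapshot, and start it again. The domain name one-42, the snapshot name nightly, and the run/DRY_RUN helper are hypothetical. With the guest shut off there is no memory state to save.

```shell
#!/bin/sh
# Sketch only: DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# usage: snapshot_offline DOMAIN SNAPSHOT_NAME
snapshot_offline() {
    dom="$1"
    name="$2"

    run virsh shutdown "$dom"

    if [ "$DRY_RUN" != "1" ]; then
        # Wait until libvirt reports the guest as shut off.
        while [ "$(virsh domstate "$dom")" != "shut off" ]; do
            sleep 5
        done
    fi

    run virsh snapshot-create-as "$dom" "$name"
    run virsh start "$dom"
}

# one-42 and nightly are placeholder values.
snapshot_offline one-42 nightly
```

The wait loop has no timeout, so a guest that ignores the shutdown request would block it forever; a real script would want an upper limit there.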