Qcow2 disk-efficient spawning of many instances as delta-only copy-on-write disks on NFS Shared datastore

I cannot find a proper solution guide for this use case:

  1. Goal: spawn many instances of the same VM template for application scalability, but store only the blocks that differ from the template image for each new instance, using copy-on-write mode (see the qemu-img sketch after this list)
  2. On my setup with NFS Shared datastores, every new instance seems to duplicate the full template hard disk instead of storing only the differences from the template image. This is easy to see at instantiation: the image size grows to, say, 10GB, whereas the differences should only amount to a few tens of MB of logs or so.
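For reference, this is what copy-on-write means at the qemu-img level; a minimal sketch with illustrative file names, not OpenNebula-specific commands:

# Create a delta-only overlay on top of the template image; only blocks
# written by the instance end up in the overlay (file names are examples)
qemu-img create -f qcow2 -b ubuntu-16.04-template.qcow2 instance-disk.qcow2

# The overlay starts out tiny and only grows with the instance's own changes
qemu-img info instance-disk.qcow2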

Versions of the related components and OS (frontend, hypervisors, VMs):
OpenNebula 5.2.1
Steps to reproduce:
a/ Create an NFS datastore of type Shared
b/ Create a VM template with a non-persistent disk image of type qcow2 (example: Ubuntu 16.04 with a 10GB HDD)
c/ Instantiate a VM from this template
d/ In the VM's Storage tab in Sunstone, watch the size of the image slowly grow to 10GB over a long time

Current results:
The instance image file is the same size as the template: 10GB
Expected results:
The image file should be much smaller and contain only the blocks that differ, as per the qcow2 principle (see the check below)
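A quick way to check on the KVM host whether the instance disk is a flat copy or a delta overlay (the path below is the default datastore layout; adjust the IDs to your deployment). A flat copy shows no "backing file" line and a large "disk size":

# Inspect the deployed disk of the instance on the hypervisor
qemu-img info /var/lib/one/datastores/<SYSTEM_DS_ID>/<VM_ID>/disk.0

# Check the actual space used on the NFS share
du -h /var/lib/one/datastores/<SYSTEM_DS_ID>/<VM_ID>/disk.0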

This is a great idea, but unfortunately something like this is not supported in OpenNebula at the moment. That said, spinning up VMs from differential backups might have important repercussions on some VM features, such as snapshots, disk save-as and disk snapshots, that will have to be considered.

I cannot find a reference to the differential backup you mention; I did find the qemu-img incremental backup feature here: https://wiki.qemu.org/Features/IncrementalBackup. Is that what you referred to?

The feature I had in mind was actually similar to a tree of snapshots, but maybe the issue would be that only one snapshot can be active at the same time?

If so, then maybe a differential backup-like feature is more suitable.
It's great that qemu-img already supports that.

Even considering the repercussions on snapshots, disk save-as and disk snapshots, I think that this use case is totally in line with the auto-scaling / auto-healing vision of OpenNebula.

While it is always better to be able to save data that changed at any time, in this use case where numerous VMs are spawned from a non-persistent image model, chances are that we never need the data in the spawned VMs, as we actually would not be able to manage that much data. They are just dispensable, throwaway VMs meant to die quickly and be replaced.

I found this page showing this exact application: QEMU / KVM: Using the Copy-On-Write mode.

It looks quite simple and very powerful; I thought from the beginning that this was the idea behind the non-persistent image feature and service templates.

That could make deploying service templates orders of magnitude faster, and make the advertised auto-scaling feature of OpenNebula a much more practical and competitive reality.

Hello,

I may have missed something, but I think that we are using exactly this setup.

Just configure your datastores correctly to use the qcow2 transfer mode.
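For reference, the key attribute is TM_MAD="qcow2" on both the image and the system datastore. A rough sketch of the definitions (names are illustrative; check the Filesystem Datastore documentation for your version):

# images.ds -- image datastore using the qcow2 transfer driver
NAME   = "qcow2_images"
DS_MAD = "fs"
TM_MAD = "qcow2"

# system.ds -- system datastore using the qcow2 transfer driver
NAME   = "qcow2_system"
TYPE   = "SYSTEM_DS"
TM_MAD = "qcow2"

# Create them with:
#   onedatastore create images.ds
#   onedatastore create system.ds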

Here is an example from one of my VMs:

qemu-img info datastores/100/331387/disk.0.snap/0 
image: datastores/100/331387/disk.0.snap/0
file format: qcow2
virtual size: 40G (42949672960 bytes)
disk size: 6.0M
cluster_size: 2097152
backing file: /var/lib/one/datastores/101/06e06bcc29f43402d0af88ce9fec04bb
backing file format: qcow2
Format specific information:
    compat: 1.1
    lazy refcounts: true
    refcount bits: 16
    corrupt: false

It uses only 6MB instead of the 8GB of the backing file.

Regards.

Indeed! Looks promising, thank you DaD!

I already use the Shared (NFS) type of datastores, because I need the live migration feature.

Live migration even seems to be possible with my older 5.0 version, with qcow2:

(screenshot)

I actually have the qcow2 option, great:
(screenshot)

I gather from the docs that the qcow2 TM would work with my existing NFS back-end.

Is that something you could confirm, or are you using a different back-end like Gluster, which provides local FS access, unlike NFS?
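For what it is worth, this is how I would double-check that the hypervisors see the datastores over NFS (default mount point shown; adjust if yours differs):

# On a KVM host: confirm the datastores directory is an NFS mount
mount | grep /var/lib/one/datastores
df -hT /var/lib/one/datastores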

Also, I guess I would have to create new datastores rather than alter the existing shared DS configuration (below for reference).

Could you let me know if your working DS configuration looks very different from mine, and where, so that I can check I already have everything in place for this change?
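In case it helps the comparison, the current definitions can be dumped with the standard CLI (IDs 0 and 1 are the default system and image datastores; adjust to yours):

# Show the full template of an existing datastore
onedatastore show 0
onedatastore show 1

# List all datastores with their TM drivers
onedatastore list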


Thanks a 1000!

@svega I would be fine if this were supported only for the qcow2 format within the OpenCloud/KVM backend. Could you please confirm whether this is the case, or more precisely what is/is not supported regarding this very powerful use case?