Resize disk leads to qcow2 corruption

Until OpenNebula 5.4 we had a manual procedure to (live) resize a disk image. Since OpenNebula 5.4, live disk resize is supported. We have noticed, however, that certain qcow2 images get corrupted during a live virtual resize. OpenNebula reports the disk as resized just fine (we will create a bug report for that), but what actually happens is that the VM stops working and qemu/libvirt logs the following:

qcow2: Marking image as corrupt: Preventing invalid write on metadata (overlaps with active L1 table); further corruption events will be suppressed

At this point you can only fix the image with "qemu-img check -r all /path/to/image/file". It does not seem to lead to actual data loss. The disk is not resized at that point, and OpenNebula reports the wrong VIRTUAL size.
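For reference, a small wrapper around that repair step (the `repair_qcow2` name and the `DRY_RUN` guard are our own additions, not a qemu tool; the path placeholder is the same as above):

```shell
# Hypothetical helper: wraps the repair command above. With DRY_RUN=1 it only
# prints what it would do, so you can review before touching the image.
repair_qcow2() {
    img="$1"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: qemu-img check -r all $img"
    else
        # Only run the repair while the image is NOT attached to a running VM;
        # repairing an in-use qcow2 can make the corruption worse.
        qemu-img check -r all "$img"
    fi
}

# Preview first, then run for real once the VM is powered off:
DRY_RUN=1 repair_qcow2 /path/to/image/file
```

Running `qemu-img check` without `-r` first is a safe read-only way to confirm the image is actually flagged as corrupt.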

We do not know what triggers this behaviour; clones of the same golden image behave differently.

OpenNebula resize_disk driver uses qemu-monitor command:

virsh --connect $LIBVIRT_URI qemu-monitor-command $DOMAIN --hmp "block_resize $drive ${DISK_SIZE}M"

We have used the following command instead, and it has not given us any issues, not even on the "problematic" images where the qemu-monitor-command can trigger corruption:

virsh blockresize one-$ID vda --size 150GiB

Has anyone run into the same "corruption" issues we have? I would recommend changing the resize_disk driver to use virsh blockresize instead of qemu-monitor-command, to prevent corruption / downtime on (certain) qcow2 images.
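To sketch what that driver change could look like, here is the blockresize command built from the same kind of variables the driver already has (the variable names and values below are illustrative, not the actual driver code; OpenNebula passes the size in MiB):

```shell
# Assumed example values, standing in for what the resize_disk driver receives:
LIBVIRT_URI=qemu:///system   # same URI the current driver already uses
ID=42                        # VM id, so the libvirt domain is one-42
TARGET=vda                   # disk target device inside the guest
DISK_SIZE=153600             # new size in MiB, as in the current driver

# virsh blockresize goes through libvirt's own resize path instead of the
# raw HMP block_resize monitor command:
CMD="virsh --connect $LIBVIRT_URI blockresize one-$ID $TARGET --size ${DISK_SIZE}MiB"
echo "$CMD"
```

virsh accepts scaled-integer suffixes (M/MiB, G/GiB), so the GiB form in the example above and this MiB form are equivalent ways of expressing the new size.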


I can confirm the behaviour: I resized a VM disk in Sunstone, got errors inside the VM operating system, and restarting did not help.
Migrating the VM to a different datastore produced a "corrupt image" error.
I fixed the image and restarted the machine.
OpenNebula now reports the new size, while the OS reports the old size.

Thanks for confirming, I'll create a GitHub issue.