Hello Francisco,
Thanks for the reply. Sorry for the delay with my follow up. I was trying figure out alternative ways in the meantime, but I was not realizing that “I am walking on such shallow ice”… I tried using the native CBT feature of libvirt by using a script that wraps a proxy qcow2 file in front of my RBD images (using the data_file property) at the deploy stage of the VM in OpenNebula. But I was not realizing that these are features that are very recent (I use libvirt 10.0.0), and apparently not supported through the whole stack. So, while the higher level mechanism seems to work, the approach fails on my system because apparently qemu / libvirt does not realize that it needs to use the RBD driver when it finds a qcow2 file with a data_file setting that starts with rbd:.
What concerns alternative methods, I still have benji backup as an option, but was initially not very keen on using, because generating a benji configuration on each KVM host seemed like a headache at first. But injecting the benji configuration directives using the RAW data option of VM templates might be a way to do this in a clean way.
And back to the OpenNebula backup, here some further explanations:
For example, I do a full backup with a reset on VM38. An incremental backup (snapshot) that is started a couple of minutes later (on small VM, around 30GB) usually terminates OK.
Then I start an incremental backup (snapshot) after a couple of hours, typically the day after. The VM switches to BACKUP state as expected: vm.log
Sat Jan 31 08:24:15 2026 [Z0][VM][I]: New LCM state is BACKUP
Sat Jan 31 08:24:42 2026 [Z0][VMM][I]: Ignoring VM state update
Then the the backup jobs hangs ad infinitum with the characteristics mentioned in my initial message. Then, after a long wait with the “prepare backup folder job” in sleeping state, I kill the parent ruby process on the KVM host, and I get plenty of messages in the vm.log:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: Command execution failed (exit code: 143): /var/lib/one/remotes/tm/ceph/prebackup_live dewey.tweax.net:/var/lib/one//datastores/100/38 0 529563bf-72ec-4071-b3bb-413
b2e162c5e - 38 100
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: Command failed:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: export LANG=C
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: export LC_ALL=C
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: set -ex -o pipefail
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: # ----------------------
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: # Prepare backup folder
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: # ----------------------
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: [ -d /var/lib/one//datastores/100/38/backup ] && rm -rf /var/lib/one//datastores/100/38/backup
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: mkdir -p /var/lib/one//datastores/100/38/backup
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: echo "PFZNPgogIDxJRD4zODwvSUQ+CiAgPFVJRD4wPC9VSUQ+CiAgPEdJRD4wPC9H
...
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: QUNLVVBTPgo8L1ZNPg==
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: " > /var/lib/one//datastores/100/38/backup/vm.xml
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: # --------------------------------
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: # Create snapshots for disks
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: # --------------------------------
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: rbd --id libvirt snap create one/one-17@one_backup_2
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: rbd --id libvirt snap protect one/one-17@one_backup_2
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: ssh 'dewey.tweax.net' ' script="$(mktemp)"; echo "c2V0IC1leCAtbyBwa
...
oKCgo=" | base64 -d > "$script"; trap "rm $script" EXIT; bash "$script"; '
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: set -ex -o pipefail
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: # --------------------------
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: # Cleanup snapshots
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: # --------------------------
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]: rbd --id libvirt snap ls one/one-17 --format json | jq -r '.[] | select(.name == "one_backup_1") .name' | xargs -rn1 sh -c ' rbd --id libvirt snap unprotect one/one-17@$1 > /dev/null 2>&1 || true; rbd --id libvirt snap rm one/one-17@$1; ' sh
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][I]:
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live: Command failed:
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live: export LANG=C
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live: export LC_ALL=C
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live: set -ex -o pipefail
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live:
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live: # ----------------------
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live: # Prepare backup folder
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live: # ----------------------
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live: [ -d /var/lib/one//datastores/100/38/backup ] && rm -rf /var/lib/one//datastores/100/38/backup
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live:
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live: mkdir -p /var/lib/one//datastores/100/38/backup
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live:
Sat Jan 31 08:33:24 2026 [Z0][VMM][E]: prebackup_live: echo "PFZNPgogIDxJRD4zODwvSUQ+CiAgPFVJRD4wPC9VSUQ+CiAgPEdJRD4wPC9H
...
Sat Jan 31 08:33:25 2026 [Z0][VMM][I]: QUNLVVBTPgo8L1ZNPg==
Sat Jan 31 08:33:25 2026 [Z0][VMM][I]: '
Sat Jan 31 08:33:25 2026 [Z0][VMM][I]: + rbd --id libvirt snap create one/one-17@one_backup_2
Sat Jan 31 08:33:25 2026 [Z0][VMM][I]: Creating snap: 10% complete...^MCreating snap: 100% complete...done.
Sat Jan 31 08:33:25 2026 [Z0][VMM][I]: + rbd --id libvirt snap protect one/one-17@one_backup_2
Sat Jan 31 08:33:25 2026 [Z0][VMM][I]: + ssh dewey.tweax.net ' script="$(mktemp)"; echo "c2V0IC1leCAtbyBwaXBlZmFpbAoKIyAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQojIFNhdmUgVFBNIHN0YXRlCiM
gLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0KCgojIC0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tCiMgRXhwb3J0LCBjb252ZXJ0ICYgY2xlYW51cAojIC0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tClsgLWQgL3Zhci9saWIvb25lLy9kYXRhc3
RvcmVzLzEwMC8zOC9iYWNrdXAgXSB8fCBta2RpciAtcCAvdmFyL2xpYi9vbmUvL2RhdGFzdG9yZXMvMTAwLzM4L2JhY2t1cAoKZW52IENFUEhfVVNFUj0ibGlidmlydCIgcnVieSAvdmFyL3RtcC9vbmUvdG0vbGliL2JhY2t1cF9yYmQucmIgb25lL29uZS0xNyBvbmVfY
mFja3VwXzEgb25lL29uZS0xN0BvbmVfYmFja3VwXzIgL3Zhci9saWIvb25lLy9kYXRhc3RvcmVzLzEwMC8zOC9iYWNrdXAvZGlzay4wLjIKCgpjZCAvdmFyL2xpYi9vbmUvL2RhdGFzdG9yZXMvMTAwLzM4L2JhY2t1cAoKCgo=" | base64 -d > "$script"; t
rap "rm $script" EXIT; bash "$script"; '
Sat Jan 31 08:33:25 2026 [Z0][VMM][I]: + '[' -d /var/lib/one//datastores/100/38/backup ']'
Sat Jan 31 08:33:25 2026 [Z0][VMM][I]: + env CEPH_USER=libvirt ruby /var/tmp/one/tm/lib/backup_rbd.rb one/one-17 one_backup_1 one/one-17@one_backup_2 /var/lib/one//datastores/100/38/backup/disk.0.2
Sat Jan 31 08:33:25 2026 [Z0][VMM][I]: backup_rbd.rb: Starting Ceph full backup.
Sat Jan 31 08:33:25 2026 [Z0][VMM][I]: Terminated
Sat Jan 31 08:33:26 2026 [Z0][VMM][I]: Successfully execute transfer manager driver operation: postbackup_live.
Sat Jan 31 08:33:26 2026 [Z0][VMM][I]: Failed to execute transfer manager driver operation: prebackup_live.
Sat Jan 31 08:33:26 2026 [Z0][VMM][E]: BACKUP: Command failed:export LANG=Cexport LC_ALL=Cset -ex -o pipefail# ----------------------# Prepare backup folder# ----------------------[ -d /var/lib/one//data
stores/100/38/backup ] && rm -rf /var/lib/one//datastores/100/38/backupmkdir -p /var/lib/one//datastores/100/38/backupecho "PFZNPgogIDxJRD4zODwv
...
CA8SUQ+MTIxPC9JRD4KICAgICAgPElEPjEyNDwvSUQ+CiAgICA8L0JBQ0tVUF9JRFM+CiAgPC9CQUNLVVBTPgo8L1ZNPg=='+ rbd --id libvirt snap create one/one-17@one_backup_2Creating snap: 10% complete...^MCreating snap: 100% complete...done.+ rbd --id libvirt snap protect one/one-17@one_backup_2+ ssh dewey.tweax.net ' script="$(mktemp)"; echo "c2V0IC1leCAtbyBwaXBlZmFpbAoKIyAtLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQojIFNhdmUgVFBNIHN0YXRlCiMgLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0KCgojIC0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tCiMgRXhwb3J0LCBjb252ZXJ0ICYgY2xlYW51cAojIC0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tClsgLWQgL3Zhci9saWIvb25lLy9kYXRhc3RvcmVzLzEwMC8zOC9iYWNrdXAgXSB8fCBta2RpciAtcCAvdmFyL2xpYi9vbmUvL2RhdGFzdG9yZXMvMTAwLzM4L2JhY2t1cAoKZW52IENFUEhfVVNFUj0ibGlidmlydCIgcnVieSAvdmFyL3RtcC9vbmUvdG0vbGliL2JhY2t1cF9yYmQucmIgb25lL29uZS0xNyBvbmVfYmFja3VwXzEgb25lL29uZS0xN0BvbmVfYmFja3VwXzIgL3Zhci9saWIvb25lLy9kYXRhc3RvcmVzLzEwMC8zOC9iYWNrdXAvZGlzay4wLjIKCgpjZCAvdmFyL2xpYi9vbmUvL2RhdGFzdG9yZXMvMTAwLzM4L2JhY2t1cAoKCgo=" | base64 -d > "$script"; trap "rm $script" EXIT; bash "$script"; '+ '[' -d /var/lib/one//datastores/100/38/backup ']'+ env CEPH_USER=libvirt ruby /var/tmp/one/tm/lib/backup_rbd.rb one/one-17 one_backup_1 one/one-17@one_backup_2 /var/lib/one//datastores/100/38/backup/disk.0.2Terminated
Sat Jan 31 08:33:26 2026 [Z0][VM][I]: New LCM state is RUNNING
If I try to understand what I find in the logs, it looks like creating the snapshot works fine. Then it’s the script:
CEPH_USER=libvirt ruby /var/tmp/one/tm/lib/backup_rbd.rb one/one-17 one_backup_1 one/one-17@one_backup_2 /var/lib/one//datastores/100/38/backup/disk.0.2
That keeps waiting. There is no timeout. The script is in sleep state until I kill it manually. The rsync transport never gets hit because this script does not terminate with success.
Best regards,
Tobias