Backup live VMs with snapshots (--quiesce --atomic)

Hi all,

I want to configure backups for live VMs without interrupting them. From what I found, the only way to do this is with the QEMU guest agent in the VM and the --quiesce --atomic options. Unfortunately I am not able to get it working:

virsh snapshot-create-as --domain one-53 tempsnap "Temporary snapshot used while backing up one-53" --disk-only --diskspec vda,file="/tank/backup/one-53-tempsnap.qcow2" --quiesce --atomic
error: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': failed to freeze /: Device or resource busy

My goal is to use Borg for backups (http://borgbackup.readthedocs.io/). The plan:

  1. create an external snapshot of the qcow2 image
  2. back up the snapshotted (now static) disk image with borg
  3. blockcommit the snapshot so the changes written in the meantime are merged back
  4. delete the snapshot

This way I should be able to back up whole live VMs without interrupting them, and I will have precise and consistent backups. For more info: http://borgbackup.readthedocs.io/en/stable/faq.html?highlight=VM#can-i-backup-vm-disk-images
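The four steps above can be sketched as one shell function. This is only a minimal sketch under assumptions that are not from the thread: BORG_REPO points at an already initialised borg repository, the snapshot name tempsnap is arbitrary, and --no-metadata is used so that "deleting the snapshot" reduces to removing the overlay file.

```shell
#!/bin/sh
# Sketch of snapshot -> backup -> blockcommit -> cleanup for one VM disk.
# Assumption: BORG_REPO is set and the repository has been initialised.
backup_vm_disk() {
    domain=$1     # libvirt domain name, e.g. one-53
    disk=$2       # disk target, e.g. vda
    base=$3       # qcow2 image currently backing that disk
    overlay=$4    # temporary overlay file created by the snapshot

    # 1. External disk-only snapshot: new writes go to the overlay,
    #    so the base image stops changing.
    virsh snapshot-create-as --domain "$domain" tempsnap \
        --disk-only --atomic --quiesce --no-metadata \
        --diskspec "$disk,file=$overlay" || return 1

    # 2. Back up the now-static base image.
    borg create "$BORG_REPO::$domain-$(date +%Y%m%d-%H%M%S)" "$base"
    rc=$?

    # 3. Merge the overlay back into the base image and pivot the VM onto it,
    #    even if the backup failed, so the VM does not stay on the overlay.
    virsh blockcommit "$domain" "$disk" --active --pivot --wait || return 1

    # 4. Remove the overlay, which is unused after the pivot.
    rm -f "$overlay"
    return "$rc"
}
```

A call would look like: backup_vm_disk one-53 vda /path/to/disk.qcow2 /tank/backup/one-53-tempsnap.qcow2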

Can someone tell me what I am doing wrong? Why does the QEMU agent command 'guest-fsfreeze-freeze' fail with "failed to freeze /: Device or resource busy"?

BR

EDIT: I am running OpenNebula 5

Hi Martin,

You should make sure that qemu-guest-agent is running in the guest VM and that the libvirt socket is defined and available in libvirt. For example:

[root@s05 ~]# virsh dumpxml one-5 | grep guest_agent
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-18-one-5/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>

When the agent is running you’ll have

[root@s05 ~]# virsh dumpxml one-5 | grep guest_agent
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-18-one-5/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
[root@s05 ~]# virsh qemu-agent-command one-5 '{"execute":"guest-fsfreeze-status"}'
{"return":"thawed"}

I’d check whether the socket created by libvirtd exists.
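The two checks above can be combined into a small helper. This is a sketch, not a definitive test: the function name agent_ready is made up for illustration, and it assumes the channel is named org.qemu.guest_agent.0 as in the XML shown. guest-ping is a standard agent command that only confirms the agent replies.

```shell
# Returns 0 only if the agent channel is connected and the agent answers.
agent_ready() {
    domain=$1
    # The channel's <target> element shows state='connected' once
    # qemu-guest-agent has opened the virtio port inside the guest.
    virsh dumpxml "$domain" \
        | grep "name='org.qemu.guest_agent.0'" \
        | grep "state='connected'" >/dev/null || return 1
    # guest-ping does nothing in the guest; it fails if the agent is silent.
    virsh qemu-agent-command "$domain" '{"execute":"guest-ping"}' >/dev/null
}
```

Usage: agent_ready one-5 && echo "agent reachable"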

Best Regards,
Anton Todorov

Hi,
agent is OK.

virsh dumpxml one-53 | grep guest_agent

  <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-one-53/org.qemu.guest_agent.0'/>
  <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>

virsh qemu-agent-command one-53 '{"execute":"guest-fsfreeze-status"}'

{"return":"thawed"}

virsh qemu-agent-command one-53 '{"execute":"guest-fsfreeze-freeze"}'

error: internal error: unable to execute QEMU agent command 'guest-fsfreeze-freeze': failed to freeze /: Device or resource busy

Hm. Then you should check the qemu-ga logs in the VM for clues (even try enabling debug mode), and the libvirt logs on the hypervisor too.
Also, is SELinux or something similar running in the VM and blocking the request?

BR,
Anton

I just found that fsfreeze does not work when docker is installed on the VM.
When docker is running, the mount command output looks like this:

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=1014764k,nr_inodes=253691,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=204812k,mode=755)
/dev/vda1 on / type ext4 (rw,relatime,data=ordered)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=24,pgrp=1,timeout=0,minproto=5,maxproto=5,direct)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
/dev/vda1 on /var/lib/docker/overlay type ext4 (rw,relatime,data=ordered)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=204812k,mode=700,uid=1001,gid=1002)
tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)

/dev/vda1 is listed twice. This is probably a bug in fsfreeze, because when I tried the same on a normal VM without docker installed, it throws a different error (one-77 is a VM without docker installed):

virsh snapshot-create-as --domain one-77 tempsnap "Temporary snapshot used while backing up one-77" --disk-only --diskspec vda,file="/tank/volume1/77/one-77-tempsnap.qcow2" --quiesce --atomic
error: internal error: unable to execute QEMU command 'transaction': Could not create file: Permission denied
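To spot a block device that is mounted more than once, as /dev/vda1 is in the mount output earlier in this post, something like the following can be run inside the guest. It is a rough sketch: the function name duplicate_mounts is made up, it assumes mount points without spaces, and whether duplicate mounts are really what makes the freeze fail is only a guess at this point.

```shell
# Print every /dev/* source that appears more than once in `mount` output
# read from stdin, together with its mount points.
duplicate_mounts() {
    awk '$1 ~ /^\/dev\// && $2 == "on" {
             count[$1]++
             points[$1] = points[$1] " " $3
         }
         END {
             for (dev in count)
                 if (count[dev] > 1) print dev ":" points[dev]
         }'
}
```

Usage: mount | duplicate_mounts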

Moreover, I realized that on the VM where docker is installed I am able to run "/sbin/fsfreeze --freeze /" and the FS will freeze. Then I am able to "thaw" the FS with this command:

virsh qemu-agent-command one-53 '{"execute":"guest-fsfreeze-status"}'
{"return":"thawed"}

Are you familiar with the following error? error: internal error: unable to execute QEMU command 'transaction': Could not create file: Permission denied

As it is executed on the hypervisor, it looks like qemu-kvm has no rights to write (I am guessing) the snapshot file or some transient lockfile…

BR,
Anton

Yes, it's on the hypervisor, but I am executing virsh as root. Can you be more specific about what the snapshot file or transient lockfile is? I don't know what steps I should take to fix this. I am running gluster as /tank/volume1, maybe it's related to that.

Related to the previous error I also found: https://bugs.launchpad.net/qemu/+bug/1587065

Is the second error caused by the dynamic_ownership=0 setting in /etc/libvirt/qemu.conf? Can someone confirm whether this setting is required for OpenNebula?

When I touch the snapshot file and chown it to oneadmin, the virsh snapshot command works as intended.
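As a sketch, that workaround can be wrapped in a helper that runs before the snapshot command. The function name prepare_overlay is made up, and the group name oneadmin is an assumption; only the user oneadmin is mentioned above.

```shell
# Pre-create the overlay file with the ownership QEMU needs, since libvirt
# does not adjust file ownership itself when dynamic_ownership=0.
# Assumption: the group is also called oneadmin; adjust to your install.
prepare_overlay() {
    overlay=$1
    owner=${2:-oneadmin:oneadmin}
    touch "$overlay" && chown "$owner" "$overlay"
}
```

Usage: prepare_overlay /tank/volume1/77/one-77-tempsnap.qcow2, followed by the virsh snapshot-create-as command.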

I have OpenNebula 5, but dynamic_ownership=0 appears in the 4.x guidelines: http://docs.opennebula.org/4.12/design_and_installation/quick_starts/qs_ubuntu_kvm.html#configure-qemu .

EDIT: @jfontan, sorry to bother you, but do you happen to know the answer? :slight_smile:

hello, would you share your backup script? thanks

I don’t have one, I am only testing whether my approach is possible and OK. I can share it later, but for now it’s only manual commands.

Hi @feldsam ,

did you manage to back up VMs online without downtime with OpenNebula?
Can you share some ideas?

I did not find any suitable solution though. :frowning:

BR

Hi @Snowman, yes, I wrote a backup script in Node.js using the OpenNebula XML-RPC API and ssh.

In principle I use external snapshot, backup, and blockcommit, as described in this official article:

http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit

Is there any interruption (freeze) of services like ping or SSH while the external snapshot is being created or while the blockcommit is in progress? Thanks.

Hi, I don’t see any interruption

Hi @feldsam,

do you also use some backup solution? Or are you just rsyncing images to backup storage outside of the datastore storage?

BR,
Martin

Hello, I wrote my own backup script in Node.js. I select images from the XML-RPC API and do various checks, such as whether the image is persistent and whether it is attached to some VM or not… For live snapshots I use the previously linked libvirt article; for copying over the network I use rsync. At the end I copy the deployment files, except non-persistent images, from the system datastores. The script also maintains a directory structure identical to the OpenNebula datastores, so in case of a crash I can just copy everything back and start the VMs (except non-persistent ones).
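The actual script is not shown in this thread, so the following is only an illustration of the "identical directory structure" idea using rsync. The function name mirror_image, the datastore root and the destination are hypothetical.

```shell
# Copy one image to the backup target, recreating its datastore-relative path.
# Assumptions (hypothetical): datastores live under a common root such as
# /var/lib/one/datastores, and dest is an rsync destination such as
# backup-host:/srv/one-backup.
mirror_image() {
    ds_root=$1   # common root of the datastores
    rel=$2       # path below ds_root, e.g. <datastore id>/<image file>
    dest=$3
    # With --relative, the "/./" marker tells rsync to recreate only the part
    # of the path after it below the destination. --sparse keeps holes in
    # disk images from being written out as zeros.
    rsync -a --sparse --relative "$ds_root/./$rel" "$dest/"
}
```

Restoring is then a plain copy in the opposite direction, because the relative paths match the datastore layout.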

Hi Kristian,

when you create the snapshot with

$ virsh snapshot-create-as --domain vm1 guest-state1 \
    --diskspec vda,file=/export/images/overlay1.qcow2 \
    --disk-only --atomic

are you also using the --disk-only and --atomic parameters? I’ve read that atomic snapshots require the guest agent to be running in the VM, and that snapshot creation will fail if the domain has no guest agent.

Hello, yes, and if the VM has more disks attached I specify snapshot=no for the other disks.

Example snapshotting only the second disk:
virsh -c qemu+tcp://localhost/system snapshot-create-as --domain one-7738 weekly-backup --diskspec vda,snapshot=no --diskspec vdb,file=/var/ds/datastore2/snapshots/one-7738-weekly-backup --disk-only --atomic --no-metadata
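For VMs with several disks, the --diskspec arguments can be generated instead of typed by hand. A minimal sketch, assuming the list of disk targets is already known (for example from virsh domblklist) and that paths contain no whitespace; the function name build_diskspecs is made up.

```shell
# Print the --diskspec arguments for a snapshot of one disk: the chosen disk
# gets an external overlay file, every other disk is excluded with snapshot=no.
build_diskspecs() {
    target=$1    # disk to snapshot, e.g. vdb
    overlay=$2   # overlay file for that disk
    shift 2      # remaining arguments: all disk targets of the domain
    specs=
    for disk in "$@"; do
        if [ "$disk" = "$target" ]; then
            specs="$specs --diskspec $disk,file=$overlay"
        else
            specs="$specs --diskspec $disk,snapshot=no"
        fi
    done
    # Strip the leading space before printing.
    printf '%s\n' "${specs# }"
}
```

The output is meant to be expanded unquoted into the virsh snapshot-create-as command line, which is why whitespace in paths is not supported by this sketch.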

I have the guest agent in all machines, but I think it is required only for the --quiesce option:

NOTE-1: Above, if you have QEMU guest agent installed in your virtual machine, try '--quiesce' option with virsh snapshot-create-as[. . .] to ensure you have a consistent disk state.

In the near future I will probably open-source my backup script.

It would be great! I am glad to hear that it may become open source.

I am looking for a solution that also lets me back up VMs that are not running the QEMU agent, but maybe this is impossible. Owners of VMs can shut down the agent from within the VM OS, and then I will not be able to back up those VMs.

Yes, I thought about that when considering the '--quiesce' option. The script can test whether the command was successful and, if not, try running it without that option. I still have to tune my script. I also want to add some 'retry' functionality like Ansible playbooks have, so that after a script crash it can continue from the image where it crashed, etc.
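The fallback idea could look roughly like this. It is a sketch, not the author's script: the function name snapshot_with_fallback is made up, and it prints which kind of snapshot was taken so the caller can record that in the backup log.

```shell
# Try a quiesced snapshot first; if that fails (agent missing, stopped, or
# fsfreeze error), fall back to a crash-consistent snapshot without --quiesce.
snapshot_with_fallback() {
    domain=$1
    name=$2
    diskspec=$3   # e.g. vda,file=/path/to/overlay.qcow2
    if virsh snapshot-create-as --domain "$domain" "$name" \
            --diskspec "$diskspec" --disk-only --atomic --no-metadata \
            --quiesce >/dev/null
    then
        echo quiesced
    elif virsh snapshot-create-as --domain "$domain" "$name" \
            --diskspec "$diskspec" --disk-only --atomic --no-metadata >/dev/null
    then
        echo crash-consistent
    else
        return 1
    fi
}
```

A snapshot taken without --quiesce is comparable to the disk state after a power loss, so it is worth recording which mode was used for each backup.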

Also, you should prepare e.g. MySQL appliances with the qemu-guest-agent hook, which forces memory buffers to be flushed to disk before the fs freeze and snapshotting (https://github.com/qemu/qemu/blob/master/scripts/qemu-guest-agent/fsfreeze-hook.d/mysql-flush.sh.sample), and explain to your customers that this service and functionality are crucial for the backup process.
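For services other than MySQL, a hook follows the same pattern: the agent's hook runner calls each script in its fsfreeze-hook.d directory with "freeze" before the freeze and "thaw" after the thaw. Below is a generic skeleton under assumptions: it only flushes dirty pages and logs, HOOK_LOG is a hypothetical log location, and a real database hook like the linked sample would also take and release a database lock.

```shell
# Skeleton of a qemu-ga fsfreeze hook, written as a function for clarity.
fsfreeze_hook() {
    log=${HOOK_LOG:-/var/log/fsfreeze-hook-example.log}
    case $1 in
        freeze)
            sync    # write dirty page-cache data to disk before the freeze
            echo "$(date) freeze: buffers flushed" >> "$log"
            ;;
        thaw)
            # Runs after the filesystems are writable again.
            echo "$(date) thaw: resuming" >> "$log"
            ;;
        *)
            echo "usage: fsfreeze_hook freeze|thaw" >&2
            return 1
            ;;
    esac
}
```

As a standalone hook file, the script would end with the line fsfreeze_hook "$@" and must be executable.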
