We run several very busy VMs that dirty a lot of memory, and when we have to live-migrate them to another host (for example, for maintenance of a VMM host) we run into problems: the migration job runs forever.
Of course we modified the migrate driver (/var/lib/one/remotes/vmm/kvm/migrate) to use more bandwidth and played with libvirt parameters such as adding "--timeout <seconds>". Currently we allow up to 500 MB/s for live-migration traffic on our 2x 10G VMM network with the following addition in the driver:
exec_and_log "virsh --connect $LIBVIRT_URI migrate-setspeed $deploy_id 500" \
    "Setspeed to 500M is set!"
But we also had to set virsh migrate-setmaxdowntime one-$id <downtime>, and that is not possible from within the driver: the command can only be issued while the migration is in progress, and the driver is a serial BASH script that sits blocked in the virsh migrate call.
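One possible workaround (an untested sketch, not part of the stock driver; $LIBVIRT_URI and $deploy_id are the variables the driver already uses, and the 5000 ms value is an arbitrary example) would be to start a small helper in the background just before the blocking virsh migrate command:

    (
        # wait until the migration job is actually running
        for i in $(seq 1 60); do
            virsh --connect $LIBVIRT_URI domjobinfo $deploy_id 2>/dev/null \
                | grep -q "Unbounded" && break
            sleep 1
        done
        # then raise the allowed downtime (value in milliseconds)
        virsh --connect $LIBVIRT_URI migrate-setmaxdowntime $deploy_id 5000
    ) &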
Is there any other option for setting migration speed and max downtime on the fly? Currently we do this by manual intervention when busy VMs have to be live-migrated.
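For reference, the manual intervention boils down to something like this, run on the source host while the migration is in flight (one-42 is a placeholder; OpenNebula names KVM domains one-<vmid>):

    virsh --connect qemu:///system migrate-setspeed one-42 500        # bandwidth in MiB/s
    virsh --connect qemu:///system migrate-setmaxdowntime one-42 5000 # max downtime in ms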
Did you solve this problem after all? I am running into the same problem: one of my big-ish VMs is practically unmigratable because of this, and that makes installing host updates (such as the current flood of Meltdown/Spectre-related kernel updates) a nightmare.
FWIW, here is the dirty memory size several hours after onevm resched:
I even tried to log in to the host where the VM was running, kill the virsh migrate process, and run virsh migrate-setmaxdowntime 2000 (the default was 300; the value is in milliseconds), but it did not help. Only after killing the migration process once again, setting the max downtime to 20000 (20 seconds), and rescheduling the VM did the migration finish in several minutes. The last "memory remaining" measurements, taken every 5 seconds, were these:
So I guess a max downtime of 5-10 seconds would be sufficient.
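(In case it helps anyone reproducing this: the remaining-memory figures can be read from virsh domjobinfo while the migration runs; the domain name below is a placeholder.)

    # show the not-yet-transferred (still dirty) memory every 5 seconds
    watch -n 5 "virsh --connect qemu:///system domjobinfo one-42 | grep -i remaining"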
According to this presentation from 2015 it should be possible to set a host-wide time limit for migrating all VMs away from the host, but I don't know where to configure it in either libvirt or OpenNebula. I think the onevm flush command should use it, if possible.
Hello, personally I use compression. In newer libvirt and QEMU there is also the possibility of using post-copy instead of pre-copy migration, but you need support for it in the kernel. There is also a --postcopy-after-precopy option.
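For illustration, the corresponding virsh invocations look roughly like this (a sketch assuming a recent libvirt/QEMU; the domain name and destination host are placeholders, and post-copy additionally needs userfaultfd support in the kernel on both hosts):

    # pre-copy migration with memory compression
    virsh migrate --live --compressed one-42 qemu+ssh://otherhost/system

    # start as pre-copy and switch to post-copy automatically after the first pass
    virsh migrate --live --postcopy --postcopy-after-precopy \
        one-42 qemu+ssh://otherhost/system

    # (or, with --postcopy but without --postcopy-after-precopy, switch
    # manually from another shell: virsh migrate-postcopy one-42)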