The migration time improvement

Hello,

I am working on a project to improve live migration in OpenNebula.
For now, my idea is to analyze the migration scripts, see which files are sent over the network during a migration, and either compress them before sending or, if OpenNebula already compresses them, try a different compression method.

If you have any documents that could help me, or a direction to follow, I would be grateful if you shared them.


Hi,

OpenNebula uses the migrate script of the VMM actions to initiate live migrations. Take a look at the VMM drivers, located at /var/lib/one/remotes/vmm/

More info here:
http://docs.opennebula.org/4.14/integration/infrastructure_integration/devel-vmm.html


Hi,

IMO there are different places that can be improved.

First of all, the VM's disks should be on a “shared” datastore, because the disk images must be accessible from both the source and the destination hosts. Before VMM_MAD/migrate is called, the TM_MAD/premigrate script is called to handle the disks. At this stage the speed depends on the exact TM_MAD driver used for the disk images and on the number of attached disks.

Then VMM_MAD/migrate is called on the source host. The migration is done by libvirt over the qemu+ssh protocol, so there is not much room for improvement. If you have a faster network available (10G, 40G), you can tweak VMM_MAD/migrate to migrate over it (we have a variant in our pipeline that we will push to our addon soon). Another option is to try different migration scenarios.
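To make the libvirt step concrete, here is a minimal sketch of the kind of virsh command a KVM migrate script ends up running. The helper name build_migrate_cmd is hypothetical, and the variable names LIBVIRT_URI, QEMU_PROTOCOL and MIGRATE_OPTIONS are assumed to be supplied by kvmrc; check your own vmm/kvm/migrate script for the exact form.

```shell
#!/bin/bash
# Sketch only: prints the command instead of running it.
# Assumed defaults; in a real setup these would come from kvmrc.
LIBVIRT_URI="${LIBVIRT_URI:-qemu:///system}"
QEMU_PROTOCOL="${QEMU_PROTOCOL:-qemu+ssh}"
MIGRATE_OPTIONS="${MIGRATE_OPTIONS:-}"

# build_migrate_cmd DEPLOY_ID DEST_HOST  (hypothetical helper)
build_migrate_cmd() {
    local deploy_id="$1" dest_host="$2"
    # Extra options are inserted only when MIGRATE_OPTIONS is non-empty.
    echo "virsh --connect $LIBVIRT_URI migrate --live ${MIGRATE_OPTIONS:+$MIGRATE_OPTIONS }$deploy_id $QEMU_PROTOCOL://$dest_host/system"
}

build_migrate_cmd one-42 hostname2
```

Extra virsh flags, for example --compressed, could be passed through MIGRATE_OPTIONS to try a different migration scenario. Whether they help depends on your libvirt/QEMU versions and on the workload.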

After a successful VM migration, the TM_MAD/postmigrate script is called to clean up the source host.

By default TM_MAD/ssh has live migration disabled because it is not shared, and the volatile disks located on the SYSTEM datastore are very expensive to transfer over the network.

In the VM home, located in the SYSTEM datastore, there are 3 or 4 larger files that need special handling:

  1. persistent/non-persistent disk images
  2. volatile disk images
  3. the contextualization ISO disk image
  4. the checkpoint file (not related to the live migration)

If the TM_MAD supports both SYSTEM and IMAGE datastores and can provide fast access to the persistent/non-persistent, volatile and context disk images from both the source and the destination hosts, there is a chance of really fast live migration.

As our addon-storpool supports both SYSTEM and IMAGE datastores, VM migration is very fast. Even “cold” migration is faster, because only the deployment XML of the VM is transferred over the network. This is because we extend TM_MAD/ssh and TM_MAD/shared with our pre/postmigrate scripts, which handle all of the bigger files in the VM home (the checkpoint import/export to a storpool block device is WIP in the “next” branch).

Take a look at our addon and use the README.md file (or the install.sh file, but it is messy because of the upgrade handling) as a guide to where and how we integrate with the other modules of OpenNebula. I’ll be happy to answer any technical questions if something is not clear.

Kind Regards,
Anton Todorov


Hi @soufian @cmartin,

Finally I have some time to play with the VM migrations.

With a little tweak I’ve managed to live migrate a VM over a 10G interface. The migration (as expected) is almost 10x faster. My tests show that the change is non-destructive: by default the scripts behave as usual.

You can find attached a patch file: 00-vmm_kvm_migrate.patch (848 Bytes)

The patch introduces an extra argument for the migration interface: the default dest_host with the optional variable DEST_HOST_APPEND appended to it.
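The idea can be sketched as follows. This is not the patch itself: the helper name migration_target is hypothetical and the real patch may be structured differently. Only the DEST_HOST_APPEND variable comes from the description above.

```shell
#!/bin/bash
# migration_target DEST_HOST  (hypothetical helper)
# Prints the hostname the migration should connect to.
migration_target() {
    local dest_host="$1"
    # DEST_HOST_APPEND is optional: when it is unset or empty the
    # original hostname is returned unchanged (default behaviour).
    echo "${dest_host}${DEST_HOST_APPEND:-}"
}

DEST_HOST_APPEND="-10g1"
migration_target hostname2
```

Because an unset variable expands to nothing, hosts without the kvmrc setting keep migrating over the usual interface.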

Here is a simple walkthrough of how to enable the change:

# on each host:
echo "IP_OF_THE_10G_IFACE1 hostname1-10g1" >>/etc/hosts
echo "IP_OF_THE_10G_IFACE2 hostname2-10g1" >>/etc/hosts
echo "IP_OF_THE_10G_IFACE3 hostname3-10g1" >>/etc/hosts

# add the append to the kvmrc
echo "DEST_HOST_APPEND=\"-10g1\"" >>/var/lib/one/remotes/vmm/kvm/kvmrc

# assuming the patch is in the user's home
cd /var/lib/one/remotes/vmm/kvm/
patch -p0 < ~/00-vmm_kvm_migrate.patch

# sync the changes (run as the oneadmin user)
su - oneadmin -c "onehost sync --force"
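Before triggering a migration it may be worth checking that every suffixed name resolves on each host, since a missing /etc/hosts entry would make the migration fail. A minimal sketch, assuming getent is available; check_migration_names is a hypothetical helper, not part of OpenNebula:

```shell
#!/bin/bash
# check_migration_names SUFFIX HOST...  (hypothetical helper)
# Prints every suffixed name that does not resolve and returns
# non-zero if at least one is missing.
check_migration_names() {
    local suffix="$1"; shift
    local missing=0 host
    for host in "$@"; do
        if ! getent hosts "${host}${suffix}" >/dev/null; then
            echo "unresolved: ${host}${suffix}"
            missing=1
        fi
    done
    return "$missing"
}

# Example call, using the placeholder names from the steps above:
# check_migration_names "-10g1" hostname1 hostname2 hostname3
```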

Carlos, I’ve spotted a small bug in the migrate_local script; here is the pull request against the current master branch.

Kind Regards,
Anton Todorov

Hi,

I’ve added the issues to the dev portal too:

  • bug4291 - missing MIGRATE_OPTIONS variable in vmm/kvm/migrate_live
  • feature4292 - add possibility to live migrate via another interface

Cheers,
Anton Todorov