Migrate from one Ceph cluster to another

We would like to migrate all VMs from one Ceph cluster to another one. All the messages in this forum related to this topic are several years old. What is, these days, the easiest way to migrate them one after another, with the least downtime possible?

Considering the scarcity of attention to my post CONF 2017 / Storage Management: biggest pain to report in daily use of ONE, I would infer that not much has changed in this field since then.

Hoping to be proven wrong though, with pointers to new features. :grinning:

So if I were to do it manually while keeping downtime low, one approach for live image migration would be (see the shell sketch after this list):

  1. Sync the live images (e.g. using rsync), temporarily accepting that the copy may be inconsistent
  2. Stop the VM
  3. Sync again (hopefully rsync finds only a small number of changed blocks this time)
  4. Detach the old image, attach the new image
  5. Start the VM
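A minimal sketch of steps 1–3, assuming the images are plain files on a filesystem-backed datastore (e.g. qcow2); the datastore IDs, paths, host name, and VM ID below are placeholders. For RBD volumes, rbd export-diff/import-diff would play the same role as rsync's delta transfer.

```bash
# Pass 1: copy while the VM is still running; the result may be inconsistent
rsync -av --inplace /var/lib/one/datastores/100/42/disk.0 \
      newhost:/var/lib/one/datastores/101/42/disk.0

# Stop the VM
onevm poweroff 42

# Pass 2: rsync's delta algorithm resends only the blocks changed since pass 1
rsync -av --inplace /var/lib/one/datastores/100/42/disk.0 \
      newhost:/var/lib/one/datastores/101/42/disk.0
```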

However, depending on your setup, you may have other solutions.

For example, if you use LVM inside the guest, you may (see the pvmove sketch after this list):

  1. create a new empty image on the new Ceph DS
  2. attach it to the VM as a new LVM PV, and live-migrate all blocks from the old PV to the new one
  3. delete the old PV once it is empty, and detach the related image

You might have to treat separately things like /boot that may not be part of LVM, but the bulk of the data can be migrated live.
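A minimal sketch of those steps as run inside the guest, assuming the old PV is /dev/vda2, the new disk shows up as /dev/vdb, and the volume group is called vg0 (all placeholders):

```bash
pvcreate /dev/vdb        # initialize the newly attached disk as a PV
vgextend vg0 /dev/vdb    # add it to the volume group
pvmove /dev/vda2         # move all extents off the old PV, fully online
vgreduce vg0 /dev/vda2   # drop the now-empty PV from the VG
pvremove /dev/vda2       # wipe its PV label; the old disk can now be detached
```

pvmove can be interrupted and restarted, so even a long-running move is relatively safe.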

Hi @kitasupport
Thank you very much for your reply; I have now seen your post as well.
Based on the other reply, it looks to me like OpenNebula tries to support so many different technologies that it cannot support any single one really well. Hence topics like storage live migration are lagging a bit behind.

My biggest problem at the moment is how a single ONE node can connect to two different Ceph clusters at the same time.

Connecting to several clusters should work (see the sketch below); what does not work is live-migrating an image from one to the other.
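On the Ceph side, a client node can talk to several clusters by keeping one conf/keyring pair per cluster under /etc/ceph/ and selecting between them with --cluster. On the ONE side, each cluster is simply a separate Ceph datastore. A minimal sketch, where "newceph", the monitor names, pool, user, and the CEPH_SECRET value (the libvirt secret UUID for the new cluster) are all placeholders:

```bash
# Two conf/keyring pairs on the node; "ceph" is the default cluster name
ls /etc/ceph/
#   ceph.conf     ceph.client.oneadmin.keyring
#   newceph.conf  newceph.client.oneadmin.keyring

rbd --id oneadmin ls one                     # pool "one" on the old cluster
rbd --cluster newceph --id oneadmin ls one   # pool "one" on the new cluster

# A second Ceph datastore in ONE, pointing at the new cluster's monitors
cat > ds_newceph.txt <<'EOF'
NAME        = "ceph_new"
DS_MAD      = ceph
TM_MAD      = ceph
DISK_TYPE   = RBD
POOL_NAME   = one
CEPH_HOST   = "mon1-new mon2-new mon3-new"
CEPH_USER   = oneadmin
CEPH_SECRET = "00000000-0000-0000-0000-000000000000"
EOF
onedatastore create ds_newceph.txt
```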

Well, I think the support for multiple technologies has its benefits. Anyway, I think they simply included either contributed or funded developments, which is the way open source works. Some improvements were made to storage management in the latest versions (like offline cloning to a different datastore), but I don't think this one (live migration) was among them.

So I believe you can already use offline cloning of images from one Ceph datastore to another (see the example below).
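For instance, something along these lines with the ONE CLI; the image ID, clone name, datastore ID, and VM/disk IDs are placeholders:

```bash
# Clone image 7 into the new Ceph datastore (ID 101);
# "offline": the VM should not be writing to the image while it is cloned
oneimage clone 7 web-disk-new --datastore 101

# Once the clone is READY, swap the VM's disk over to it
onevm disk-detach 42 0
onevm disk-attach 42 --image web-disk-new
```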

Because a basic image migration feature is not that complex, you may also consider contributing or funding a development. Basic meaning that all the copy traffic would flow through the front-end, which would work for most moderate-size installations but would not scale much.

For a more scalable solution, much more work is needed to consider:

  1. the full variety of storage types for source and target
  2. the full variety of network connectivity between the two
  3. the full variety of possible optimizations, including live migration

There is a large variety of possible solutions, improvements, and manual workarounds for this critical need of live image migration.

Critical to the point that I would even be satisfied with a "half" solution between offline and live migration: a migration requiring only a short extra downtime (steps like: main sync, slow but online; shutdown, fast; final offline sync, fast; detach/attach, fast; reboot, fast). Even if it works only on a subset of the available datastore types (NFS or qcow2 would be my preference). For Ceph specifically, something like the rbd sketch below can already be done by hand.
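A manual version of that "half" solution for the Ceph case, using RBD snapshot diffs; a minimal sketch, assuming pool one, image one-abc123, a second client config named newceph, and VM ID 42 (all placeholders), with both clusters reachable from the same node:

```bash
# 1) Full copy while the VM is still running (slow, but online)
rbd snap create one/one-abc123@mig0
rbd export one/one-abc123@mig0 - \
    | rbd --cluster newceph import - one/one-abc123
rbd --cluster newceph snap create one/one-abc123@mig0

# 2) Short downtime: power off, then send only the blocks written since mig0
onevm poweroff 42
rbd snap create one/one-abc123@mig1
rbd export-diff --from-snap mig0 one/one-abc123@mig1 - \
    | rbd --cluster newceph import-diff - one/one-abc123

# 3) Detach the old disk, attach the image from the new datastore, start the VM
```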

When we cannot immediately have a perfect solution, we can always shoot for a partial but solid improvement that is easily attainable.