Given how little attention my post "CONF 2017 / Storage Management: biggest pain to report in daily use of ONE" received, I gather that not much has changed in this area since then.
I hope to be proven wrong, though, with pointers to new features.
So if I had to do this manually and keep downtime low when migrating the images of a running VM, one approach I can think of would be:
- Sync live images (e.g. using rsync), accepting temporarily that they may not be consistent
- Stop the VM
- Sync again (hopefully rsync will find a very limited number of blocks to sync this time)
- Detach old image, attach new image
- Start the VM
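
The steps above can be sketched as an ordered command plan. This is only a rough illustration, not a tested procedure: the image paths and IDs are hypothetical, the `onevm` subcommands are the ones I would expect to use for these steps, and it assumes the copied file is already registered in ONE as the new image. The function only builds the commands; it does not run anything.

```python
import shlex


def plan_two_pass_migration(vm_id, disk_id, new_image_id, src_path, dst_path):
    """Return the ordered commands for the two-pass rsync approach.

    Nothing is executed here. All paths and IDs are placeholders
    supplied by the caller (assumptions, not values from a real setup).
    """
    # --inplace updates the destination file directly instead of
    # writing a temporary copy, which matters for large disk images.
    rsync = ["rsync", "-a", "--inplace", src_path, dst_path]
    return [
        list(rsync),                                   # pass 1: VM still running
        ["onevm", "poweroff", str(vm_id)],             # downtime starts
        list(rsync),                                   # pass 2: only the changes
        ["onevm", "disk-detach", str(vm_id), str(disk_id)],
        ["onevm", "disk-attach", str(vm_id), "--image", str(new_image_id)],
        ["onevm", "resume", str(vm_id)],               # downtime ends
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for cmd in plan_two_pass_migration(42, 1, 7, "/old_ds/disk.img", "/new_ds/disk.img"):
        print(shlex.join(cmd))
```

In practice you would wait for the VM to actually reach the powered-off state before starting the second pass, since the poweroff request is asynchronous.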
However, depending on your setup, you may have other solutions.
For example, if you use LVM inside the VM, you could:
- Create a new empty image on the new Ceph datastore
- Attach it to the VM as a new LVM PV, and live-migrate all blocks from the old PV to the new PV
- Remove the old PV once it is empty, and detach the related image
You may have to handle things like /boot separately, since they may not be part of LVM. But the bulk of the data can be migrated live.
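
For the LVM variant, the guest-side part could look roughly like the plan below. It is a sketch under assumptions: the volume group name and device names are hypothetical, and it assumes the new image is already created and attached to the VM (how you create it depends on your datastore). Again, the function only returns the commands, to be run as root inside the VM.

```python
import shlex


def plan_lvm_pv_migration(vg_name, old_pv, new_pv):
    """Return the ordered LVM commands to move a VG from old_pv to new_pv.

    Nothing is executed here. vg_name, old_pv and new_pv are placeholders
    chosen by the caller (assumptions, not values from a real setup).
    """
    return [
        ["pvcreate", new_pv],            # initialise the new disk as a PV
        ["vgextend", vg_name, new_pv],   # add it to the existing volume group
        ["pvmove", old_pv, new_pv],      # move all extents while the VM runs
        ["vgreduce", vg_name, old_pv],   # drop the now-empty PV from the VG
        ["pvremove", old_pv],            # wipe the LVM label from the old disk
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in plan_lvm_pv_migration("vg0", "/dev/vdb", "/dev/vdc"):
        print(shlex.join(cmd))
```

Once `pvremove` has completed, the old image is no longer used by LVM and can be detached from the VM.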