Issues with fs_lvm

Hi,

I wanted to share some issues we experienced with fs_lvm, and how we mitigated them.

Problems

  1. VM migration was broken. The datastore is expected to be shared, even though the documentation says it doesn’t need to be. And if we do share it, volatile disks end up stored on the NFS host instead of on the local machine.

  2. Upon host restart, the VMs couldn’t boot again. The VM directory structure was still there and in good shape, but the LVs were no longer activated, causing the VMs to fail to boot.

  3. fs_lvm forces conversion of qcow2 images to raw during clone. This would break thin provisioning (if our SAN supported it) and also causes a longer “prolog”, since the full disk (filled with zeros) is copied instead of the much smaller image (8 GB vs 600 MB, in our case).

  4. Since the image datastore’s “TM_MAD” is used to trigger the LVM “copy”, we need two image datastores: a standard one for everything else (local storage, etc.) and a second one just for fs_lvm. And this even though the images are not really stored on LVM; both datastores live on local storage anyway.

Mitigation

  1. Modified the mv script to actually copy (via rsync + ssh) the symlinks and volatile disks before activating the LV.

  2. Added a hook on VM start to double-check that the LV is activated if an LV symlink is present but broken.

  3. Modified the clone script to use dd instead of qemu-img convert. Also, we run qemu-img resize to fix the qcow2 image size.

  4. We haven’t fixed this one yet, but I included it anyway. Looking at the code, I understand why this is. It would seem natural, however, that if we select an SSH or shared image and deploy the VM on a “fs_lvm” datastore, it should just work. Do you have any plans to improve this? We thought of making a new “wrapper” TM, say “ssh+fs_lvm”, that would wrap SSH and FS_LVM together and trigger the real TM according to the destination DS’s TM.
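
For illustration, the copy-then-activate ordering from mitigation 1 could be sketched like this (hosts, paths, and the RUN wrapper are made up for the example, this is not our actual mv script; RUN=echo makes it a dry run):

```shell
#!/bin/bash
# Hypothetical excerpt of a patched fs_lvm mv script: copy the VM
# directory -- disk symlinks and volatile disks -- to the destination
# host with rsync over ssh, and only then activate the LV there.
RUN=${RUN:-echo}   # set RUN= (empty) to really execute

move_vm_dir() {  # src_host dst_host vm_dir lv_path
    # -a preserves symlinks as symlinks; volatile disks travel as files
    $RUN ssh "$1" "rsync -a $3/ $2:$3/"
    # activate the LV on the destination only after the copy succeeded
    $RUN ssh "$2" "sudo lvchange -ay $4"
}

move_vm_dir host01 host02 /var/lib/one/datastores/0/42 /dev/vg-one-0/lv-one-42-0
```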
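
The VM-start hook from mitigation 2 boils down to something like the following sketch (directory layout and variable names are assumptions; LVCHANGE is overridable so it can be tried without touching LVM):

```shell
#!/bin/bash
# Hypothetical VM-start hook: if a disk symlink in the VM directory
# points into /dev but its target is missing, the LV was most likely
# left deactivated by a host reboot, so reactivate it.
LVCHANGE=${LVCHANGE:-sudo lvchange}

reactivate_broken_lvs() {  # vm_dir
    for link in "$1"/disk.*; do
        [ -L "$link" ] || continue        # skip regular (non-symlink) disks
        target=$(readlink "$link")
        # symlink present but broken, and pointing at a device node:
        # the LV behind it is probably deactivated
        if [ ! -e "$target" ]; then
            case "$target" in
                /dev/*) $LVCHANGE -ay "$target" ;;
            esac
        fi
    done
}
```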
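
The clone change from mitigation 3 is essentially the following two commands (image path, device, and size are examples, not our real driver code; RUN=echo keeps it a dry run):

```shell
#!/bin/bash
# Hypothetical clone-script change: dd only the qcow2 image bytes into
# the LV instead of letting qemu-img convert write out a full raw disk,
# then grow the qcow2 virtual size with qemu-img resize.
RUN=${RUN:-echo}

clone_to_lv() {  # src_image lv_device disk_size
    $RUN dd if="$1" of="$2" bs=4M conv=fsync    # ~600 MB copied, not 8 GB
    $RUN qemu-img resize -f qcow2 "$2" "$3"     # restore the requested capacity
}

clone_to_lv /var/tmp/image.qcow2 /dev/vg-one-0/lv-one-42-0 8G
```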
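
The “wrapper” TM idea in point 4 could look roughly like this stub (lookup_tm_mad is a placeholder; a real driver would query oned, e.g. via onedatastore show -x, instead of returning a hardcoded value, and would exec rather than echo):

```shell
#!/bin/bash
# Hypothetical "ssh+fs_lvm" wrapper TM: look up the real TM_MAD of the
# destination system datastore and dispatch to the matching driver
# script with the same arguments.
lookup_tm_mad() {  # ds_id
    echo "fs_lvm"  # stub for illustration only
}

dispatch() {  # script_name ds_id driver-args...
    script="$1"; ds_id="$2"; shift 2
    tm=$(lookup_tm_mad "$ds_id")
    # a real wrapper would `exec` this path instead of echoing it
    echo "/var/lib/one/remotes/tm/$tm/$script $*"
}

dispatch clone 104 host01:/src host02:/dst
```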

Finally

  • Here are the added and modified files (gist)

  • Those changes will work for us in the short term, but is there any chance they could be included upstream? Or do you have other solutions in mind to fix these issues?

  • Which of these should be considered bugs and filed in the bug tracker?

  • Hopefully, these tricks can be used by other organizations running into the same problems.

I have opened a ticket to change the documentation and state that the system datastore needs NFS:

https://dev.opennebula.org/issues/4950

We can take a look at the volatile disks problem; maybe we can alleviate it by changing how the volatile disks are created. Would this be enough for your use case?

You’re right and thanks for the hook. I’ve opened another issue for this:

https://dev.opennebula.org/issues/4951

This one is a bit tricky as we would like to have it integrated in the driver itself. We will check what could be the best way to implement activation for already created VMs.

We believe that most people use LVM for performance reasons, so writing qcow2 format into it doesn’t make that much sense. I’ll check whether thin LVs could be used and whether they fix the problem.
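
As a rough sketch of that thin-LV alternative (commands are only printed here, not executed; the volume group name and sizes are made up): a thin pool allocates extents on first write, so a raw image stays thin without putting qcow2 inside the LV.

```shell
# Dry-run sketch of LVM thin provisioning: create a thin pool, then a
# thin LV whose virtual size can exceed its allocated space.
RUN=echo
$RUN lvcreate -L 100G --thinpool one_thin_pool vg-one-0
$RUN lvcreate -V 8G --thin -n lv-one-42-0 vg-one-0/one_thin_pool
```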

Moreover, I think that a resized qcow2 inside the LV can cause problems when the disk is almost full: qcow2 also stores metadata, so the total file size for a full disk will be higher than the LV size.
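
A back-of-the-envelope version of that point, assuming the default 64 KiB qcow2 cluster size and 16-bit refcounts (and ignoring the header and L1 table, so this is a lower bound): each data cluster costs an 8-byte L2 entry plus a 2-byte refcount, so a fully written qcow2 ends up slightly larger than its virtual size.

```shell
# Estimate the minimum on-disk size of a fully allocated 8 GiB qcow2.
virtual=$((8 * 1024 * 1024 * 1024))       # 8 GiB virtual disk
cluster=$((64 * 1024))                    # default qcow2 cluster size
clusters=$((virtual / cluster))           # 131072 data clusters
overhead=$((clusters * 8 + clusters * 2)) # L2 entries + refcounts, ~1.25 MiB
echo $((virtual + overhead))              # 8591245312 -- exceeds the 8 GiB LV
```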

Can you please elaborate on this? We don’t fully understand what you are proposing.

Thanks! I think this could be the start of a document on how to configure fs_lvm for non-shared system datastores.

We are going to check if there is a way to support both shared and non-shared configurations of the system datastore.