Background: I’ve got multiple working OpenNebula 4.12.1 zones with NFS-mounted shared datastores, mostly using the qcow2 transfer driver. This has been adequate for most users, who never notice the performance limitations of their virtual disks.
Root Problem: A prospective customer is convinced that some of his VMs must have at least 150MB/s sustained write capability at all times. This is more than our NFS infrastructure can reliably provide for one VM.
Hypothetical Solution: Since our hosts have adequate unused local disk for the VMs this customer needs, I’d like to just create a system datastore on local disk using the ssh driver. Deployment would then mean having the target host copy the VM’s (persistent, qcow2) disk image from the image datastore to the local DS on the host, OR (better) create a new qcow2 file on the host’s local DS using the original image on the shared image datastore as its backing store, so that guest writes all go to fast local disk. For shutdowns or migrations (not live, obviously) the data on the host’s local system datastore would be copied back to the shared image repository for later redeployment.
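For illustration, the backing-store variant boils down to something like this at the qemu-img level on the host (datastore IDs and image paths below are just examples, not real ones from my setup):

    # Create a thin local qcow2 overlay whose backing file is the read-only
    # image sitting on the NFS-mounted image datastore. Paths are examples only.
    BACKING=/var/lib/one/datastores/1/abc123          # image on the shared NFS image datastore
    OVERLAY=/var/lib/one/datastores/100/53/disk.0     # host-local system datastore
    qemu-img create -f qcow2 -b "$BACKING" "$OVERLAY"
    # Guest writes land in the local overlay; reads of blocks not yet written
    # fall through to the backing file over NFS.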
Proximate problem: Having read the storage documentation many times over the past 3 years and re-read much of it in the past week, I’m unable to find any way to make this sort of deployment work. It seems that if I use a shared-filesystem image repository, OpenNebula always creates a symlink for the system disk image, which means the VM does its writes to the NFS share, which makes it a non-solution. The only thing I’ve been able to come up with is to create another image repository DS (on the NFS server, because that’s where I have space) configured to use the ssh TM driver and clone the images for the relevant VMs into it. This works, but it slows deployment and taxes the frontend needlessly, because the frontend pulls the whole image from an NFS mount and writes it out via ssh (over a lesser network) rather than just telling the host to copy the image from the NFS server directly. Also, because OpenNebula doesn’t know the host can see the image on shared storage, it copies the whole image back and forth as needed rather than just using the handy qcow2 backing-store trick.
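For reference, the workaround image datastore is nothing special, just the fs datastore driver paired with the ssh transfer driver; something along these lines (the name is made up, only DS_MAD/TM_MAD matter):

    # Workaround image datastore: images get copied to hosts over ssh instead of symlinked.
    cat > ssh_images_ds.conf <<'EOF'
    NAME   = "ssh_images"
    TYPE   = IMAGE_DS
    DS_MAD = fs
    TM_MAD = ssh
    EOF
    onedatastore create ssh_images_ds.conf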
I can’t believe that I’m the first person to have wanted this model of deployment and that there’s not some combination of datastore, image, and VM template attributes that will give me a more efficient and less cluttered method of getting a persistent image deployed to host-local storage for performance.
The “dirty” solution is to create a cluster for each node and have the images stored in a datastore belonging to the respective cluster. When you deploy the image, the VM will start on the node that cluster is mapped to. Of course, this solution doesn’t provide HA.
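A rough sketch of that layout with the CLI (host, cluster, and datastore names are made up):

    # One cluster per node; VMs using that cluster's datastore land on that node only.
    onecluster create node01-cluster
    onecluster addhost node01-cluster node01
    # A datastore holding node01's images (e.g. with TM_MAD=ssh), attached to the cluster:
    onecluster adddatastore node01-cluster node01_local_ds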
We enhanced our NFS infrastructure and educated the customer about the nature of reality on ANY “cloud” platform sufficiently to make our standard deployment model using NFS datastores acceptable. Problem solved.
I have found no way to take persistent system images that live, while not in use, on qcow2 (shared) datastores, deploy them efficiently to a host-local system datastore, and have runtime changes persisted back to the original location when the VM is undeployed, without whole-image copies passing through the management interfaces of the front-end and the host in both directions.
As soon as I can add 6 hours of alert & focused coding time to each day for a few weeks, I will try to write a custom driver to do this. Absent such a magical 30-hour day, it shall remain a fantasy.
I made a copy of the ssh driver with some small modifications to use rcp instead of scp/ssh to copy the persistent images to the nodes. I tried different approaches, including the HPN-SSH variant, but the problem was always that scp/ssh was the data-transfer mechanism, which was quite limited and not able to saturate a 10G network. So I decided to go with rcp, as we don’t need encryption for these transfers.
The rcp variant indeed still needs many gigabytes to be transferred during deploy or shutdown, but over a 10G network this is much faster than ssh (I measured up to 700 MB/s, depending on the I/O workload and the remaining capacity available on the nodes).
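The change itself is small; here is a simplified sketch of the kind of edit involved, with example paths, not the actual diff:

    # In a TM script context, exec_and_log comes from OpenNebula's scripts_common.sh.
    source /var/lib/one/remotes/scripts_common.sh

    SRC=/var/lib/one/datastores/101/0a1b2c3d            # persistent image in the frontend-local repo (example)
    DST=node01:/var/lib/one/datastores/100/53/disk.0    # target path on the node (example)

    # Original ssh driver copies with scp (encrypted, CPU-bound, well below 10GbE line rate):
    #   exec_and_log "scp $SRC $DST" "Error copying $SRC to $DST"

    # rcp variant: same semantics, unencrypted transport, requires rshd on the nodes.
    exec_and_log "rcp $SRC $DST" "Error copying $SRC to $DST"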
Before that, I did something similar by modifying the ssh tm_mad to log in to the nodes and copy the images from an NFS server, but I did not want every node to have access to the persistent image datastore.
So this way I got:
- no shared filesystem for the image repository (it is “local” to the frontend only)
- up to 8 times faster deployment compared to ssh
- depending on SLAs (downtime during migrations etc.), it is still usable even for VMs with larger qcow2 disks
- as a result, disk I/O at the speed of the local disks (especially when using SSD-only hosts)
The main requirement is that the front end has enough bandwidth to read from and write to the image repository.
The nodes need only a little preparation (rshd installed; see the sketch below).
This can be perfectly mixed with other storage integrations (NFS, Moose etc.).
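For completeness, node preparation is roughly just installing and enabling an rsh server and trusting the frontend’s oneadmin user; package and service names vary by distribution, so treat this as a sketch (RHEL/CentOS-style names, trusted network assumed):

    # Install and enable the rsh server (names differ per distro).
    yum install -y rsh-server
    systemctl enable rsh.socket
    systemctl start rsh.socket
    # Allow passwordless rcp from the frontend for oneadmin (trusted network only!).
    echo "frontend.example.com oneadmin" >> /var/lib/one/.rhosts
    chown oneadmin:oneadmin /var/lib/one/.rhosts
    chmod 600 /var/lib/one/.rhosts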
If you are interested I can send more details and also a diff of the modifications I made to the ssh driver to create the rcp driver out of it.