Is there an efficient local disk system datastore mechanism for persistent images?

Background: I’ve got multiple working OpenNebula 4.12.1 zones using NFS-mounted shared datastores, mainly using the qcow2 transfer driver. This has been adequate for most users, who never notice the performance issues of their virtual disks.

Root Problem: A prospective customer is convinced that some of his VMs must have at least 150MB/s sustained write capability at all times. This is more than our NFS infrastructure can reliably provide for one VM.

Hypothetical Solution: Since our hosts have adequate unused local disk for the VMs this customer needs, I’d like to create a system datastore on local disk using the ssh driver. Deployment would then mean either having the target host copy the VM’s (persistent, qcow2) disk image from the image datastore to the host’s local DS, or (better) creating a new qcow2 file on the host’s local DS that uses the original image on the shared image datastore as its backing store, so that all guest writes go to fast local disk. On shutdown or migration (not live, obviously), the data on the host’s local system datastore would be copied back to the shared image repository for later redeployment.
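In shell terms, roughly this on deploy and undeploy (just a sketch of what I have in mind; the host name, datastore IDs and image file name are made up, and I’m assuming the image datastore is NFS-mounted on the host under /var/lib/one/datastores/1):

# deploy: create a qcow2 overlay on the host’s local system DS, backed by the image on the shared image DS
ssh node01 "qemu-img create -f qcow2 -b /var/lib/one/datastores/1/abc123 /var/lib/one/datastores/100/42/disk.0"

# undeploy/shutdown: fold the local writes back into the image on the shared repository
ssh node01 "qemu-img commit /var/lib/one/datastores/100/42/disk.0"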

Proximate problem: Having read the storage documentation many times over the past 3 years, and having re-read much of it in the past week, I’m unable to find any way to make this sort of deployment work. If I use a shared-fs image repository, OpenNebula always creates a symlink for the system disk image, which means the VM does its writes to the NFS share; that makes it a non-solution. The only workaround I’ve come up with is to create another image repository DS (on the NFS server, because that’s where I have space) configured to use the ssh TM, and to clone the images for the relevant VMs into it. This works, but it slows deployment and taxes the frontend needlessly, because the frontend pulls the whole image from an NFS mount and writes it out via ssh (over a lesser network) rather than just telling the host to copy the image from the NFS server directly. Also, because OpenNebula doesn’t know the host can see the image on shared storage, it copies the whole image back and forth as needed rather than just using the handy qcow2 backing-store trick.
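For concreteness, that workaround looks roughly like this (names and IDs are made up, and I’m not swearing every flag is exact on 4.12.1):

# an image datastore kept on the NFS server's filesystem, but transferred to hosts via the ssh TM
cat > ssh_images.txt <<'EOF'
NAME   = "ssh_images"
DS_MAD = fs
TM_MAD = ssh
EOF
onedatastore create ssh_images.txt

# clone the customer's persistent image into it
oneimage clone 42 "customer-disk-ssh" --datastore ssh_images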

I can’t believe that I’m the first person to have wanted this model of deployment and that there’s not some combination of datastore, image, and VM template attributes that will give me a more efficient and less cluttered method of getting a persistent image deployed to host-local storage for performance.

Same problem here… did you find a solution?

Regards.

The “dirty” solution is to create a cluster for each node and have the image stored in that cluster. When you deploy the image, the VM will start on the node that cluster is mapped to. Of course, this solution doesn’t provide HA.
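A minimal sketch of that setup, with made-up host and datastore names (untested):

# one cluster per node, each with its own node-local system datastore
onecluster create node01-cluster
onecluster addhost node01-cluster node01
onecluster adddatastore node01-cluster node01-local-system

# templates/images placed in node01-cluster will then always deploy on node01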

Hi!

I think it’s possible to set this up with only one cluster. I have posted another topic with the details. Could you please check it out?

Regards.

Yes and no.

We enhanced our NFS infrastructure and educated the customer about the nature of reality on ANY “cloud” platform sufficiently to make our standard deployment model using NFS datastores acceptable. Problem solved.

I have found no way to make persistent system images that live on qcow2 (shared) datastores when not in use deploy efficiently to a host-local system datastore, with runtime changes persisted back to the original location on undeploy, without whole-image copies passing through the management interfaces of the frontend and host in both directions.

As soon as I can add 6 hours of alert & focused coding time to each day for a few weeks, I will try to write a custom driver to do this. Absent such a magical 30-hour day, it shall remain a fantasy.

Beyond not allowing for HA, that would essentially make OpenNebula nothing but pointless overhead for the hosts configured that way.

Hi,

I made a copy of the ssh driver with some small modifications to use rcp instead of ssh to copy the persistent images to the nodes. I tried different approaches, including the HPN-SSH variant, but the problem was always that scp/ssh data transfer was too limited to saturate a 10G network. So I decided to go for rcp, as we don’t need encryption for these transfers.

The rcp variant still needs lots of GB to be transferred during deploy or shutdown, but over a 10G network this is much faster than ssh (I measured up to 700MB/sec, depending on the I/O workload and the remaining capacity available on the nodes).
Before that, I did something similar by modifying the ssh tm_mad to log in to the nodes and copy the images from an NFS server, but I did not want every node to have access to the persistent image datastore.
So this way I got:

  • No shared filesystem for the image repository (it is “local” to the frontend only)
  • Up to 8 times faster deployment times compared to ssh
  • Depending on SLAs (downtimes during migrations etc.), it is still usable even for VMs with larger qcow2 disks
  • As a result, VM I/O runs at the speed of the local disks (especially on SSD-only hosts)

The main requirement is that the frontend has enough bandwidth to read from and write to the image repository.
The nodes need only a little preparation (rshd has to be installed).
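For anyone curious, the rough shape of the change (just a sketch from memory, not the real diff; paths assume a default frontend install, and the exact scp call sites differ per action script):

# start from a copy of the stock ssh TM driver
cp -a /var/lib/one/remotes/tm/ssh /var/lib/one/remotes/tm/rcp

# in each action script (clone, mvds, ...) replace the scp transfer, e.g. a line like
#   scp $SRC_PATH $DST_HOST:$DST_PATH
# becomes
#   rcp $SRC_PATH $DST_HOST:$DST_PATH

# add "rcp" to the TM_MAD arguments plus a TM_MAD_CONF block in /etc/one/oned.conf,
# restart oned and push the updated remotes to the nodes
onehost sync

# node side: install the rsh server and allow the oneadmin user in ~oneadmin/.rhosts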

This can be perfectly mixed with other storage integrations (NFS, Moose etc.).

If you are interested I can send more details and also a diff of the modifications I made to the ssh driver to create the rcp driver out of it.

Best, M

We’ve done something similar for an old testing environment by creating a new TM driver. Instead of making a link for persistent images in the ln script, you can do:

qemu-img create -f qcow2 -b $SRC_PATH $DST_PATH

This creates a new qcow2 image that points to the original file. All writes will go to this new file. Just make sure that the datastore is local.

Make sure that the mvds script commits the new data back to the original persistent image:

qemu-img commit -f qcow2 $SRC_PATH
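You can sanity-check the overlay with qemu-img info, and note that depending on your qemu-img version you may also need to pass the backing format explicitly:

# verify the new image really points at the original as its backing file
qemu-img info $DST_PATH

# newer qemu-img releases want the backing format spelled out
qemu-img create -f qcow2 -F qcow2 -b $SRC_PATH $DST_PATH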

Hi Michael,

Can you please share the modifications to the ssh driver?
We’re having the same problem and would like to test other approach.

thanks!

Hi emot,

please find attached a short readme and a diff to create the rcp driver from the ssh driver.

Best, Michael

rcp_tm.diff (2,8 KB)

README.rcp (1,6 KB)

If this is of interest, we could open an issue to make the ssh driver able to work in rcp mode…


Javi’s approach seems much more efficient.
Did you guys somehow skip over those lines?

  • it would not need the superfluous initial transfer
  • better cache efficiency, since the base image is shared
  • it works as a simple driver mod, I guess

Another thing: if you see bad SSH performance, did you triple-check that the AES-NI kernel modules are loaded?
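For example (assuming an Intel CPU on a Linux host):

# does the CPU advertise the AES instructions?
grep -m1 -o aes /proc/cpuinfo

# is the kernel module loaded?
lsmod | grep aesni_intel

# rough AES throughput check (AES-NI is used when available)
openssl speed -evp aes-128-cbc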

I think anyone would need more real flexibility in this, not just a switch here and a driver that forces you to replace other drivers. :slight_smile: