Background: I’ve got multiple working OpenNebula 4.12.1 zones with NFS-mounted shared datastores, mostly using the qcow2 transfer driver. This has been adequate for most users, who never notice the performance limitations of their virtual disks.
Root Problem: A prospective customer is convinced that some of his VMs must have at least 150MB/s sustained write capability at all times. This is more than our NFS infrastructure can reliably provide for one VM.
Hypothetical Solution: Since our hosts have adequate unused local disk for the VMs this customer needs, I’d like to just create a system datastore on local disk using the ssh driver. Deployment would then mean having the target host copy the VM’s (persistent, qcow2) disk image from the image datastore to the local DS on the host, OR (better) create a new qcow2 file on the host’s local DS using the original image on the shared image datastore as its backing store, so that guest writes all go to fast local disk. For shutdowns or migrations (not live, obviously) the data on the host’s local system datastore would be copied back to the shared image repository for later redeployment.
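For illustration, the backing-store variant boils down to something like this at the qemu-img level on the host (datastore IDs and image paths below are just examples, not real ones from my setup):

    # Create a thin local qcow2 overlay whose backing file is the read-only
    # image sitting on the NFS-mounted image datastore. Paths are examples only.
    BACKING=/var/lib/one/datastores/1/abc123          # image on the shared NFS image datastore
    OVERLAY=/var/lib/one/datastores/100/53/disk.0     # host-local system datastore
    qemu-img create -f qcow2 -b "$BACKING" "$OVERLAY"
    # Guest writes land in the local overlay; reads of blocks not yet written
    # fall through to the backing file over NFS.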
Proximate problem: Having read the storage documentation many times over the past 3 years and re-read much of it in the past week, I’m unable to find any way to make this sort of deployment work. It seems that if I use a shared-filesystem image repository, OpenNebula always creates a symlink for the system disk image, which means the VM does its writes to the NFS share, which makes it a non-solution. The only thing I’ve been able to come up with is to create another image repository DS (on the NFS server, because that’s where I have space) configured to use the ssh TM driver and clone the images for the relevant VMs into it. This works, but it slows deployment and taxes the frontend needlessly, because the frontend pulls the whole image from an NFS mount and writes it out via ssh (over a lesser network) rather than just telling the host to copy the image from the NFS server directly. Also, because OpenNebula doesn’t know the host can see the image on shared storage, it copies the whole image back and forth as needed rather than just using the handy qcow2 backing-store trick.
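For reference, the workaround image datastore is nothing special, just the fs datastore driver paired with the ssh transfer driver; something along these lines (the name is made up, only DS_MAD/TM_MAD matter):

    # Workaround image datastore: images get copied to hosts over ssh instead of symlinked.
    cat > ssh_images_ds.conf <<'EOF'
    NAME   = "ssh_images"
    TYPE   = IMAGE_DS
    DS_MAD = fs
    TM_MAD = ssh
    EOF
    onedatastore create ssh_images_ds.conf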
I can’t believe that I’m the first person to have wanted this model of deployment and that there’s not some combination of datastore, image, and VM template attributes that will give me a more efficient and less cluttered method of getting a persistent image deployed to host-local storage for performance.
The “dirty” solution is to create a cluster for each node and have the images stored in a datastore belonging to the respective cluster. When you deploy the image, the VM will start on the node that cluster is mapped to. Of course, this solution doesn’t provide HA.
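A rough sketch of that layout with the CLI (host, cluster, and datastore names are made up):

    # One cluster per node; VMs using that cluster's datastore land on that node only.
    onecluster create node01-cluster
    onecluster addhost node01-cluster node01
    # A datastore holding node01's images (e.g. with TM_MAD=ssh), attached to the cluster:
    onecluster adddatastore node01-cluster node01_local_ds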
We enhanced our NFS infrastructure and educated the customer about the nature of reality on ANY “cloud” platform sufficiently to make our standard deployment model using NFS datastores acceptable. Problem solved.
I have found no way to take persistent system images that live, while not in use, on qcow2 (shared) datastores, deploy them efficiently to a host-local system datastore, and have runtime changes persisted back to the original location when the VM is undeployed, without whole-image copies passing through the management interfaces of the front-end and the host in both directions.
As soon as I can add 6 hours of alert & focused coding time to each day for a few weeks, I will try to write a custom driver to do this. Absent such a magical 30-hour day, it shall remain a fantasy.
I made a copy of the ssh driver with some small modifications to use rcp instead of scp/ssh to copy the persistent images to the nodes. I tried different approaches, including the HPN-SSH variant, but the problem was always that scp/ssh was the data-transfer mechanism, which was quite limited and not able to saturate a 10G network. So I decided to go with rcp, as we don’t need encryption for these transfers.
The rcp variant indeed still needs many gigabytes to be transferred during deploy or shutdown, but over a 10G network this is much faster than ssh (I measured up to 700 MB/s, depending on the I/O workload and the remaining capacity available on the nodes).
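The change itself is small; here is a simplified sketch of the kind of edit involved, with example paths, not the actual diff:

    # In a TM script context, exec_and_log comes from OpenNebula's scripts_common.sh.
    source /var/lib/one/remotes/scripts_common.sh

    SRC=/var/lib/one/datastores/101/0a1b2c3d            # persistent image in the frontend-local repo (example)
    DST=node01:/var/lib/one/datastores/100/53/disk.0    # target path on the node (example)

    # Original ssh driver copies with scp (encrypted, CPU-bound, well below 10GbE line rate):
    #   exec_and_log "scp $SRC $DST" "Error copying $SRC to $DST"

    # rcp variant: same semantics, unencrypted transport, requires rshd on the nodes.
    exec_and_log "rcp $SRC $DST" "Error copying $SRC to $DST"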
Before that, I did something similar by modifying the ssh tm_mad to log in to the nodes and copy the images from an NFS server, but I did not want every node to have access to the persistent image datastore.
So this way I got:
- no shared filesystem for the image repository (it is “local” to the frontend only)
- up to 8 times faster deployment compared to ssh
- depending on SLAs (downtime during migrations etc.), it is still usable even for VMs with larger qcow2 disks
- as a result, disk I/O at the speed of the local disks (especially when using SSD-only hosts)
The main requirement is that the front end has enough bandwidth to read from and write to the image repository.
The nodes need only a little preparation (rshd installed; see the sketch below).
This can be perfectly mixed with other storage integrations (NFS, Moose etc.).
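For completeness, node preparation is roughly just installing and enabling an rsh server and trusting the frontend’s oneadmin user; package and service names vary by distribution, so treat this as a sketch (RHEL/CentOS-style names, trusted network assumed):

    # Install and enable the rsh server (names differ per distro).
    yum install -y rsh-server
    systemctl enable rsh.socket
    systemctl start rsh.socket
    # Allow passwordless rcp from the frontend for oneadmin (trusted network only!).
    echo "frontend.example.com oneadmin" >> /var/lib/one/.rhosts
    chown oneadmin:oneadmin /var/lib/one/.rhosts
    chmod 600 /var/lib/one/.rhosts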
If you are interested I can send more details and also a diff of the modifications I made to the ssh driver to create the rcp driver out of it.