Random Failure Cloning from Image Storage to System Storage on ONE 4.14

Hello Everybody,

Our ONE 4.14 installation is running mostly fine. We have split Sunstone and the core
daemons onto 2 separate servers (VMs), use ESXi 5.1 and later as hypervisors, and mount
the system (0) and image (1) datastores on each of these hypervisors as iSCSI targets.
Lately we have run into a problem where, once in a while, the deployment of a VM fails
with the following error message:
Mon Aug 1 14:21:21 2016 : Error executing image transfer script: + echo 'Error cloning one-core.jaspersoft.com:/vmfs/volumes/1/7ee728aa83c17f809f730980439acc6a to vmware8.jaspersoft.com:/vmfs/volumes/0/5346/disk.0'
(one-core is the server running the core daemons, i.e. the scheduler and oned; vmware8
is one of the ESXi hypervisors.)
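To check which host each failed clone actually used as its source, the error line can be split into its two endpoints. This is a minimal sketch that assumes every failure message has the form "Error cloning SRC_HOST:SRC_PATH to DST_HOST:DST_PATH", as in the line above. The helper `split_clone_error` and the `CloneEndpoints` type are hypothetical and are not part of OpenNebula.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: assumes the message format shown in the transfer-script
# error above ("Error cloning SRC_HOST:SRC_PATH to DST_HOST:DST_PATH").
CLONE_ERROR = re.compile(
    r"Error cloning (?P<src_host>[^:\s]+):(?P<src_path>\S+) "
    r"to (?P<dst_host>[^:\s]+):(?P<dst_path>[^\s'\"]+)"
)


class CloneEndpoints(NamedTuple):
    src_host: str
    src_path: str
    dst_host: str
    dst_path: str


def split_clone_error(line: str) -> Optional[CloneEndpoints]:
    """Return the source and destination of a failed clone, or None if absent."""
    match = CLONE_ERROR.search(line)
    if match is None:
        return None
    return CloneEndpoints(**match.groupdict())


if __name__ == "__main__":
    log_line = (
        "Mon Aug 1 14:21:21 2016 : Error executing image transfer script: "
        "+ echo 'Error cloning one-core.jaspersoft.com:/vmfs/volumes/1/"
        "7ee728aa83c17f809f730980439acc6a to "
        "vmware8.jaspersoft.com:/vmfs/volumes/0/5346/disk.0'"
    )
    endpoints = split_clone_error(log_line)
    print(endpoints.src_host)  # one-core.jaspersoft.com
    print(endpoints.dst_host)  # vmware8.jaspersoft.com
```

Applied to the failing line, the source host comes out as the front-end (one-core) rather than a hypervisor, which is exactly the oddity described here.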
We first noticed this issue when another admin started deploying multiple VMs in
parallel via salt-cloud. I was also able to reproduce it by deploying the same template
multiple times (usually 10x) via the Instantiate command.
What puzzles me is that the path on one-core doesn't contain the image. The front-end
has no access to the image datastore; only the hypervisors do. Yet for a failed
deployment, this location (one-core) is used as the source to copy/clone the image.
The script used in that case is the …/remotes/tm/vmfs/clone command. On a successful
instantiation this script does not appear to be used.
The only explanation I can think of is that ONE tries to stage the image in that
location, or to pick it up from there. The documentation mentions that the vmfs
datastore sometimes needs to stage images on the front-end.
In that case it would look like the front-end doesn't provide the image at the
front-end location but still tries to pick it up from there.

Does this sound familiar at all?

Regards

Jurgen