Error instantiating multiple ESX VMs from one image: failed to lock source file

Hello!

I faced the following problem while trying to instantiate multiple ESX VMs from one template.
The template contains one image, so all VMs are cloned from the same OS image.
The datastore is VMFS, shared via SAN.

The very first VM is created successfully, but all the others fail with the following error:

Fri May 22 13:24:09 2015 [Z0][TM][I]: DiskLib_Check() failed for source disk Failed to lock the file (16392).

How can I fix this?

Thanks!

Mikhail

P.S. OpenNebula 4.12.1, ESX 5.5.0

Hi,

Can you manually check whether there are snapshots associated with the original file? You can get the path of the file with oneimage show.

Best,

-Tino

Hello!

There are no existing snapshots for the image:

NEBULA:

[oneadmin@nebula ~]$ oneimage list
ID USER GROUP NAME DATASTORE SIZE TYPE PER STAT RVMS
5 oneadmin demogroup redhat6_64_mts ds-esx-tb- 1.5G OS No used 3

[oneadmin@nebula ~]$ oneimage show 5 | grep SOURCE
SOURCE : /vmfs/volumes/102/a5e57dde34f0ea443512db1268823810

ESX:

~ # ls -la /vmfs/volumes/102/a5e57dde34f0ea443512db1268823810
total 1527816
drwxr-xr-x 1 root root 420 May 15 12:58 .
drwxr-xr-t 1 root root 2520 May 19 04:45 ..
-rw-r--r-- 1 root root 1562735104 May 22 08:24 disk.vmdk
~ #

As far as I can see from the logs, OpenNebula uses 'cp' to clone the image, not snapshots.
It seems that ESX doesn't allow multiple processes to lock and read one VMDK file.
So OpenNebula cannot instantiate multiple VMs simultaneously from one image.

Full error log attached.

Thanks

Mikhail
error.txt (3.0 KB)

Hi,

The vmfs TM drivers should use vmkfstools to clone any image that is not a CDROM.

In any case, to rule out a concurrency issue, could you lower the number of TM threads in the oned.conf file:

TM_MAD = [
    executable = "one_tm",
    arguments = "-t 1 -d dummy,lvm,shared,fs_lvm,qcow2,ssh,vmfs,ceph,dev"
]

Change it as above (-t 15 --> -t 1), and try again.
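If you prefer to script this change, here is a minimal bash sketch. It assumes oned.conf is at /etc/one/oned.conf (the usual location for a packaged install), that GNU sed is available, and that the TM_MAD block is laid out as above; the helper name set_tm_threads is hypothetical. Only the TM_MAD block is edited, because other driver sections (for example DATASTORE_MAD) carry their own "-t" value. oned must be restarted afterwards, since it only reads oned.conf at startup.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set the "-t N" thread count of the TM_MAD driver.
# Usage: set_tm_threads <threads> [path/to/oned.conf]
set_tm_threads() {
    local threads="$1"
    local conf="${2:-/etc/one/oned.conf}"   # assumed default location

    # Keep a backup of the original file before editing it in place.
    cp -p "$conf" "$conf.bak" || return 1

    # Restrict the substitution to the lines between "TM_MAD = [" and the
    # next "]", so DATASTORE_MAD and other sections keep their own -t value.
    # -i (in-place) and -E (extended regex) assume GNU sed.
    sed -i -E "/^TM_MAD[[:space:]]*=[[:space:]]*\[/,/\]/ s/-t [0-9]+/-t ${threads}/" "$conf"
}
```

For example, run `set_tm_threads 1` as a user allowed to write the file, restart oned, and later `set_tm_threads 15` to go back; the previous version is left in oned.conf.bak.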

Best,

-Tino

Hi!

Change it as above (-t 15 → -t 1), and try again.

Done. It works.
But the virtual machines were created sequentially, not in parallel.

Is it possible to create them in parallel?

Thanks!

Mikhail

As far as I can see from the logs, OpenNebula uses 'cp' to clone the image, not snapshots.

Could you please share these logs?

Hi!

I was mistaken: that phase was the creation of the CDROM image.
The CDROM image is created with cp.

Sorry.

Mikhail

Hi, Tino!

Is there a solution for parallel VM instantiation?

Thanks!

Mikhail

We need to analyze the problem, please share the oned.log and the log of a VM when the image is locked.

-Tino

Hello!

We need to analyze the problem, please share the oned.log and the log of a VM when the image is locked.
-Tino

I’ve tried to instantiate two VMs at Wed May 27 17:34:57 2015.
oned.log attached.

Mikhail

oned.log (93.6 KB)

Hi Mikhail,

There is no useful information in the log you provided. Could you change "-t 1" back to "-t 15", launch several VMs, wait for the failure, and send us the oned.log file again?

Thanks for your feedback,

-Tino

Hi, Tino!

That log was already generated while -t 15 was in effect.

Mikhail

Hi Mikhail,

Indeed it was. Apologies, I somehow missed that.

vmkfstools allows concurrent copies, so it should allow deploying multiple VMs from the same non-persistent image.

We cannot reproduce the problem, so we need to ask you for more information. The steps would be:

1- Increase the threads to "-t 15"
2- Launch VMs until they fail
3- From the log, pick the clone line that fails, something like the following:

/var/lib/one/remotes/tm/vmfs/clone nebula.ural.inside.mts.ru:/vmfs/volumes/102/a5e57dde34f0ea443512db1268823810 neb-zoo-14.ural.inside.mts.ru:/vmfs/volumes/100/73/disk.0 73 102

4- Execute the line with “bash -xv” to extract more information:

bash -xv /var/lib/one/remotes/tm/vmfs/clone nebula.ural.inside.mts.ru:/vmfs/volumes/102/a5e57dde34f0ea443512db1268823810 neb-zoo-14.ural.inside.mts.ru:/vmfs/volumes/100/73/disk.0 73 102

Then send the output, please.
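Steps 3 and 4 can be scripted roughly as follows. This is a minimal bash sketch, assuming the failing command is logged on a single line with the same shape as the example above (the driver path followed by four arguments: source, destination, VM ID and datastore ID); the helper names find_clone_cmds and rerun_traced are hypothetical. Re-running the command performs the copy again on the target host, exactly as the manual step does.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the vmfs clone commands found in a log file.
# Assumes each command is on one line as the driver path plus four
# space-separated arguments (source, destination, VM ID, datastore ID).
find_clone_cmds() {
    local log="$1"
    grep -oE '/var/lib/one/remotes/tm/vmfs/clone( [^ ]+){4}' "$log"
}

# Hypothetical helper: re-run one command under "bash -xv" and save the
# verbose trace (stdout and stderr) to a file.
rerun_traced() {
    local cmd="$1" out="$2"
    local -a argv
    # Split the command line into script + arguments without glob expansion.
    read -r -a argv <<< "$cmd"
    bash -xv "${argv[@]}" > "$out" 2>&1
}
```

On the front-end, as oneadmin, something like `rerun_traced "$(find_clone_cmds /var/log/one/oned.log | tail -n 1)" clone-trace.log` would trace the most recent clone invocation found in the log.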

Hi, Tino!

Requested output attached.

Mikhail

20150529.log.txt (25.8 KB)

Hi Mikhail,

This is going to be tricky to debug, since I cannot reproduce it and the clone operation from which you sent the output did work.

What I'm trying to ascertain is whether the failure comes from the CDROM (copied using cp) or from the image (copied using vmkfstools). Instead of intrusively enabling debugging, the best approach may be for you to perform the following tests:

  • Launch VMs with the CDROM and no image
  • Launch VMs with the image and no CDROM

This way we can find out whether cp is the culprit, and try to work around it, or whether vmkfstools is to blame.

Hello Mikhail,

I found your topic, and it sounds a lot like the issue I am experiencing on our
OpenNebula installation. Did you ever resolve your issue?
Or did you have to live with "-t 1", i.e. serial deployment, on your ESXi servers?

Regards

Jurgen