OpenNebula: after attaching a disk to a VM, getting "failed to lock the file" when starting the machine

Here is the scenario :

I have created a VM using OpenNebula and it deployed successfully. All features work except the following case:

  1. Create a 10 GB image, then go to the VM and attach the disk to it.
  2. Log in to the VM; the newly added disk is visible inside the guest.
  3. Reboot the machine (either from the OpenNebula dashboard or from inside the VM) to test whether the disk is persistent. It is: everything works as expected, and the newly attached disk is still visible inside the operating system after the reboot.
  4. But if I power off or shut down the VM (from the OpenNebula dashboard or from inside the VM), the real issue starts. When I try to power the VM on using the play button in the dashboard, it does not start. When I check vCenter I can see the error "VM - Failed to lock the file", and the VM will not start.
  5. What is happening is that when I click the play (start) button in OpenNebula, it adds the 10 GB disk to the server again. That means if you go to vCenter and check the VM settings, you can see a third disk has been added to the same server, pointing to the same 10 GB file location. That is why it throws the error when I start the machine.
  6. So if you try to start the VM 5 times, that many additional disks get added to the VM in vCenter, and the VM never starts unless you go to vCenter and delete the extra hard disks manually.

Let me know how to solve this issue.

Thanks,

Hi Sanal,
what OpenNebula version are you running?

Could you paste the output of the following commands?

  • oneimage show IMAGE_ID -x, where IMAGE_ID is the identifier of the persistent image that you're having problems with.
  • onedatastore show DATASTORE_ID -x, where DATASTORE_ID is the identifier of the datastore where that image is stored.
  • onevm show VMID -x, where VMID is the ID of the VM that you're managing with OpenNebula.
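
As a side note, the -x flag makes these commands print plain XML, so individual fields can be pulled out programmatically. A minimal Ruby sketch; the inlined fragment is abridged from a real onedatastore show output, with most fields omitted:

```ruby
require 'rexml/document'

# Abridged, hypothetical fragment of what `onedatastore show DATASTORE_ID -x`
# prints; the real output contains many more fields.
xml = <<~XML
  <DATASTORE>
    <ID>100</ID>
    <NAME>VNX5200_DS27</NAME>
    <TOTAL_MB>4194048</TOTAL_MB>
    <FREE_MB>3842818</FREE_MB>
  </DATASTORE>
XML

doc = REXML::Document.new(xml)

# XPath-style lookups against the parsed document.
name    = doc.elements['DATASTORE/NAME'].text
free_mb = doc.elements['DATASTORE/FREE_MB'].text.to_i

# Integer division: MB to whole GB.
summary = "#{name}: #{free_mb / 1024} GB free"
```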

Cheers!

Thank you for your quick reply.

I am using OpenNebula version 5.2.1. I have also tested the same on version 5.2.0; I get the same error on both versions.

[root@localhost ~]# onedatastore show 100 -x
<DATASTORE>
  <ID>100</ID>
  <UID>0</UID>
  <GID>101</GID>
  <UNAME>oneadmin</UNAME>
  <GNAME>PMG-BCS</GNAME>
  <NAME>VNX5200_DS27</NAME>
  <PERMISSIONS>
    <OWNER_U>1</OWNER_U>
    <OWNER_M>1</OWNER_M>
    <OWNER_A>0</OWNER_A>
    <GROUP_U>1</GROUP_U>
    <GROUP_M>0</GROUP_M>
    <GROUP_A>0</GROUP_A>
    <OTHER_U>0</OTHER_U>
    <OTHER_M>0</OTHER_M>
    <OTHER_A>0</OTHER_A>
  </PERMISSIONS>
  <DS_MAD></DS_MAD>
  <TM_MAD></TM_MAD>
  <BASE_PATH></BASE_PATH>
  <TYPE>0</TYPE>
  <DISK_TYPE>0</DISK_TYPE>
  <STATE>0</STATE>
  <CLUSTERS>
    <ID>0</ID>
  </CLUSTERS>
  <TOTAL_MB>4194048</TOTAL_MB>
  <FREE_MB>3842818</FREE_MB>
  <USED_MB>351230</USED_MB>
  <IMAGES>
    <ID>2</ID>
    <ID>3</ID>
    <ID>4</ID>
    <ID>5</ID>
    <ID>6</ID>
  </IMAGES>
  <TEMPLATE>
    <CLONE_TARGET></CLONE_TARGET>
    <DISK_TYPE></DISK_TYPE>
    <DS_MAD></DS_MAD>
    <LN_TARGET></LN_TARGET>
    <RESTRICTED_DIRS></RESTRICTED_DIRS>
    <SAFE_DIRS></SAFE_DIRS>
    <TM_MAD></TM_MAD>
    <VCENTER_CLUSTER></VCENTER_CLUSTER>
  </TEMPLATE>
</DATASTORE>
[root@localhost ~]#


[root@localhost ~]# onevm show 8
VIRTUAL MACHINE 8 INFORMATION
ID : 8
NAME : ran_jiva1
USER : rsanal
GROUP : PMG-BCS
STATE : POWEROFF
LCM_STATE : LCM_INIT
RESCHED : No
HOST : POC-Cluster
CLUSTER ID : 0
CLUSTER : default
START TIME : 05/23 13:13:20
END TIME : -
DEPLOY ID : 4224ea6d-1e31-3d71-b356-ff29ffd47c6a

VIRTUAL MACHINE MONITORING
CPU : 0.0
MEMORY : 0K
NETTX : 2.8M
NETRX : 3.3M
ESX_HOST : poc.xxxxx.loc
GUEST_IP : 172.xxx…xxx.xxx.
GUEST_IP_ADDRESSES : fe80::d121:ccc3:c23a:bd10,172.xxxx.xxxx.xxxx.
GUEST_STATE : running
LAST_MON : 1495604785
RESOURCE_POOL : Resources
VMWARETOOLS_RUNNING_STATUS: guestToolsRunning
VMWARETOOLS_VERSION : 10240
VMWARETOOLS_VERSION_STATUS: guestToolsCurrent

PERMISSIONS
OWNER : um-
GROUP : u--
OTHER : ---

VM DISKS
ID DATASTORE TARGET IMAGE SIZE TYPE SAVE
0 VNX5200_DS hda 2GBDisk -/2G file YES

VM NICS
ID NETWORK BRIDGE IP MAC PCI_ID

  • Additional IP - xxxx.xxxx.xxxx.xxxx
  • Additional IP - fe80::d121:ccc3:c23a:bd10

VIRTUAL MACHINE HISTORY
SEQ HOST ACTION DS START TIME PROLOG
0 POC-Cluster none -1 05/23 13:13:46 0d 00h13m 0h00m00s
1 POC-Cluster disk-attach -1 05/23 13:34:09 0d 00h21m 0h00m00s
2 POC-Cluster poweroff -1 05/23 13:55:49 0d 00h02m 0h00m00s
3 POC-Cluster resume -1 05/23 13:58:36 0d 00h00m 0h00m00s
4 POC-Cluster resume -1 05/23 13:59:24 0d 00h00m 0h00m00s
5 POC-Cluster none -1 05/23 14:04:25 0d 00h09m 0h00m00s
6 POC-Cluster poweroff -1 05/23 14:22:01 0d 16h29m 0h00m00s
7 POC-Cluster poweroff -1 05/24 06:51:40 0d 00h01m 0h00m00s
8 POC-Cluster resume -1 05/24 06:53:15 0d 00h00m 0h00m00s
9 POC-Cluster poweroff -1 05/24 06:57:33 0d 00h00m 0h00m00s
10 POC-Cluster resume -1 05/24 06:58:35 0d 00h00m 0h00m00s

USER TEMPLATE
DESCRIPTION="Windows 2012 with 40GB C drive"
ERROR="Tue May 23 13:23:37 2017 : Ignoring monitoring information, error:syntax error, unexpected COMMA, expecting $end at line 1, columns 68:69. Monitor information was: GUEST_IP=xxxx.xxxx.xxxx.xxxx GUEST_IP_ADDRESSES=\"fe80::d121:ccc3:c23a:bd10,xxxx.xxxx.xxxx.xxxx\" LAST_MON=1495542217 STATE=a CPU=30.97 MEMORY=1562624 NETRX=0 NETTX=0 ESX_HOST=\"poc.xxxx.loc\" GUEST_STATE=running VMWARETOOLS_RUNNING_STATUS=guestToolsRunning VMWARETOOLS_VERSION=10240 VMWARETOOLS_VERSION_STATUS=guestToolsCurrent RESOURCE_POOL=\"Resources\" "
HYPERVISOR="vcenter"
KEEP_DISKS_ON_DONE="NO"
LOGO="images/logos/windows8.png"
PUBLIC_CLOUD=[
CUSTOMIZATION_SPEC="Dev-Ops Windows 2012",
TYPE="vcenter",
VM_TEMPLATE="4224bde6-c758-47ee-20bf-819d3e8ca06f" ]
SCHED_REQUIREMENTS="NAME=\"POC-Cluster\""
SUNSTONE=[
NETWORK_SELECT="NO" ]
USER_INPUTS=[
MEMORY="M|list||2048,4096,6144,8192,10240,12288,14336,16384,18432,20480,22528,24576|4096",
VCPU="O|list||2,4,6,8,10|2" ]
VCENTER_DATASTORE="VNX5200_DS27"

VIRTUAL MACHINE TEMPLATE
AUTOMATIC_REQUIREMENTS="!(PUBLIC_CLOUD = YES) | (PUBLIC_CLOUD = YES & (HYPERVISOR = vcenter))"
CPU="2"
GRAPHICS=[
LISTEN="0.0.0.0",
PORT="5908",
TYPE="VNC" ]
MEMORY="4096"
TEMPLATE_ID="6"
VCPU="2"
VMID="8"
[root@localhost ~]#


[root@localhost ~]# oneimage show 5
IMAGE 5 INFORMATION
ID : 5
NAME : ran_jiva1
USER : rsanal
GROUP : PMG-BCS
DATASTORE : VNX5200_DS27
TYPE : DATABLOCK
REGISTER TIME : 05/23 13:15:57
PERSISTENT : Yes
SOURCE : ran_jiva1.vmdk
FSTYPE : raw
SIZE : 2G
STATE : rdy
RUNNING_VMS : 0

PERMISSIONS
OWNER : um-
GROUP : ---
OTHER : ---

IMAGE TEMPLATE
ADAPTER_TYPE="ide"
DESCRIPTION="ran_jiva1"
DEV_PREFIX="hd"
DISK_TYPE="thick"
DRIVER="raw"

VIRTUAL MACHINES


Hi Sanal!
thanks for the feedback, but I'd also need the XML representation of the previous commands, so please use the -x flag with them. It would also be useful to have the output of onehost show POC-Cluster -x, to check that the host is being monitored successfully. Try uploading the files as XML if you have problems with formatting.

From the output above I think the image ID in VM 8 is 0, but you've posted the output of image 5. Could you check that?

Also, when you attached the disk to the VM for the first time:

  • Did you add it to the template and then deploy the VM from that template?
  • Was the VM running when you hotplugged the new disk?
  • Or was the VM in the powered-off state when you attached the new disk and then resumed the VM?

Finally, is this post different from the other post, i.e. are Ranveer and you working together on the same team? It's just so we can treat that post as a duplicate and focus on this conversation.

Cheers!

Here are the answers to your questions.

  1. Yes, Ranveer and I are working on the same team.
  2. The template has only one basic disk, which is for the OS. I have not added any extra disks at the template level.
  3. I added the extra disk to the VM while it was in the running state.

Here is the XML format of the above commands:


onevm show 8 -x

8 3 101 rsanal PMG-BCS ran_jiva1 1 1 0 1 0 0 0 0 0 0 8 0 8 0 0 1495541600 0 4224ea6d-1e31-3d71-b356-ff29ffd47c6a
8 0 POC-Cluster 0 0 1495541626 1495542449 -1 1495541626 1495541626 1495541626 1495542449 0 0 2 0
8 1 POC-Cluster 0 0 1495542849 1495544149 -1 0 0 1495542849 1495544149 0 0 2 21
8 2 POC-Cluster 0 0 1495544149 1495544303 -1 0 0 1495544149 1495544303 0 0 2 19
8 3 POC-Cluster 0 0 1495544316 1495544328 -1 0 0 1495544316 1495544328 0 0 1 11
8 4 POC-Cluster 0 0 1495544364 1495544380 -1 0 0 1495544364 1495544380 0 0 1 11
8 5 POC-Cluster 0 0 1495544665 1495545247 -1 0 0 1495544665 1495545247 0 0 2 0
8 6 POC-Cluster 0 0 1495545721 1495605069 -1 0 0 1495545721 1495605069 0 0 2 19
8 7 POC-Cluster 0 0 1495605100 1495605170 -1 0 0 1495605100 1495605170 0 0 2 19
8 8 POC-Cluster 0 0 1495605195 1495605205 -1 0 0 1495605195 1495605205 0 0 1 11
8 9 POC-Cluster 0 0 1495605453 1495605503 -1 0 0 1495605453 1495605503 0 0 2 19
8 10 POC-Cluster 0 0 1495605515 1495605536 -1 0 0 1495605515 1495605536 0 0 1 11

oneimage show 6 -x

6 2 0 CloudAdmin oneadmin 2GBDisk 1 1 0 0 0 0 0 0 0 2 0 1 1495605383 2048 8 1 0 -1 -1 100 VNX5200_DS27 8

onedatastore show 100 -x

100 0 101 oneadmin PMG-BCS VNX5200_DS27 1 1 0 1 0 0 0 0 0 0 0 0 0 4194048 3842818 351230 2 3 4 5 6

Since we have hundreds of servers, the output of onehost show POC-Cluster -x is too large to paste. I have gone through it and the host seems to be monitored properly.

Please let me know if you need any further information.

Hi Sanal!
I've been reviewing the code and I've reproduced the issue, so thanks for the info you've provided.

I think I've found a fix, so if you could try it, here it is; the instructions to test it follow.

Instructions:

  1. First, create a backup of your vcenter_driver.rb file; it is located at /usr/lib/one/ruby/vcenter_driver.rb.
  2. The line numbers I provide may be different from yours, so check that you're changing the right lines with the info I give you.
  3. Edit the vcenter_driver.rb file and locate the self.reconfigure_vm function; it should be near line 2467.
  4. Inside that function, go to lines 2636-2637, where you should see:

        if !newvm
            vm.config.hardware.device.select {

Now replace the lines between if !newvm and end (lines 2636 to 2647) with the code I've linked in a GitHub gist. It should look like this after the change:

        # If the VM is not new, avoid reading DISKS
        if !newvm
            vm.config.hardware.device.select { |d|
                if is_disk?(d)
                   disks.each{|disk|
                      if d.backing.respond_to?(:fileName) &&
                         ("[#{disk.elements["DATASTORE"].text}] #{disk.elements["SOURCE"].text}" == d.backing.fileName ||
                          disk.elements["SOURCE"].text == d.backing.fileName)
                         disks.delete(disk)
                      end
                   }
                end
            }
        end

Finally, stop OpenNebula's daemon and start it again so the new code is used. Before trying to resume the affected VM, remove the duplicated disks in vCenter, as you mentioned before, so the file is not locked.
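
To make the intent of the patch clearer, the same filtering logic can be exercised in isolation with plain Ruby. Backing, DiskDevice and disk_element below are illustrative stand-ins I made up for this sketch, not the real vSphere classes the driver uses, and delete_if is a safer equivalent of the disks.delete call inside the iteration:

```ruby
require 'rexml/document'

# Illustrative stand-ins for the vSphere device objects (hypothetical names):
# a virtual disk device whose backing knows its datastore file path.
Backing    = Struct.new(:fileName)
DiskDevice = Struct.new(:backing)

# Build a DISK element shaped like the ones the driver iterates over.
def disk_element(datastore, source)
  xml = "<DISK><DATASTORE>#{datastore}</DATASTORE><SOURCE>#{source}</SOURCE></DISK>"
  REXML::Document.new(xml).root
end

# Disks OpenNebula wants to (re)attach on resume.
disks = [disk_element("VNX5200_DS27", "ran_jiva1.vmdk"),
         disk_element("VNX5200_DS27", "new_disk.vmdk")]

# A disk device already present on the (not new) VM in vCenter.
devices = [DiskDevice.new(Backing.new("[VNX5200_DS27] ran_jiva1.vmdk"))]

# Same comparison as the patch: drop any disk whose backing file is already
# attached, so it is not added a second time (which is what locked the file).
devices.each do |d|
  disks.delete_if do |disk|
    d.backing.respond_to?(:fileName) &&
      ("[#{disk.elements['DATASTORE'].text}] #{disk.elements['SOURCE'].text}" == d.backing.fileName ||
       disk.elements['SOURCE'].text == d.backing.fileName)
  end
end

# Only the genuinely new disk remains to be attached.
remaining = disks.map { |d| d.elements['SOURCE'].text }
```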

Try it and let’s see if it solves your issue!

Cheers!

Thank you very much for your support. I am going to try these changes in my environment and will let you know the status.

Thanks,
Sanal

Awesome, the issue is resolved now.

I appreciate your quick help on this. Thank you once again.

Thanks,
Sanal

Excellent!
Thanks for your feedback and testing. If you find any other issue, the community is here to help :sunny:

Cheers!