One in ten deletes fail with vCenter driver

Hi Devs,

I don’t know if this is a bug with the vCenter driver, but what I’m seen when I initiate a delete from command line, or Sunstone UI is that the VM that I delete gets purged from OpenNebula, but the VM is still in vCenter.

Below is the section of the log, I grabbed:

Sun Oct 25 22:31:04 2015 [Z0][ReM][D]: Req:5280 UID:0 VirtualMachineAction invoked , "delete", 527
Sun Oct 25 22:31:04 2015 [Z0][DiM][D]: Finalizing VM 527
Sun Oct 25 22:31:04 2015 [Z0][ReM][D]: Req:5280 UID:0 VirtualMachineAction result SUCCESS, 527 
Sun Oct 25 22:31:04 2015 [Z0][VMM][D]: Message received: LOG I 527 Driver command for 527 cancelled

Sun Oct 25 22:31:04 2015 [Z0][ReM][D]: Req:2688 UID:0 VirtualMachineInfo invoked , 527
Sun Oct 25 22:31:04 2015 [Z0][ReM][D]: Req:2688 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>527</ID><UID..."
Sun Oct 25 22:31:04 2015 [Z0][ReM][D]: Req:1200 UID:0 HostPoolInfo invoked
Sun Oct 25 22:31:04 2015 [Z0][ReM][D]: Req:1200 UID:0 HostPoolInfo result SUCCESS, "<HOST_POOL><HOST><ID..."
Sun Oct 25 22:31:04 2015 [Z0][ReM][D]: Req:2192 UID:0 SystemConfig invoked
Sun Oct 25 22:31:04 2015 [Z0][ReM][D]: Req:2192 UID:0 SystemConfig result SUCCESS, "<TEMPLATE><AUTH_MAD>..."
Sun Oct 25 22:31:04 2015 [Z0][ReM][D]: Req:7952 UID:0 HostInfo invoked , 2
Sun Oct 25 22:31:04 2015 [Z0][ReM][D]: Req:7952 UID:0 HostInfo result SUCCESS, "<HOST><ID>2</ID><NAM..."
Sun Oct 25 22:31:05 2015 [Z0][VMM][D]: Message received: LOG I 526 Successfully execute virtualization driver operation: cancel.

Sun Oct 25 22:31:05 2015 [Z0][VMM][D]: Message received: LOG I 526 Successfully execute network driver operation: clean.

Sun Oct 25 22:31:05 2015 [Z0][VMM][D]: Message received: CLEANUP SUCCESS 526

Sun Oct 25 22:31:05 2015 [Z0][VMM][D]: Message received: LOG I 527 Command execution fail: /var/lib/one/remotes/vmm/vcenter/cancel '4221c123-b4c9-4fa7-e0b2-e25f1c0afaa5' 'SJCCLUSTER' 527 SJCCLUSTER

Sun Oct 25 22:31:05 2015 [Z0][VMM][D]: Message received: LOG I 527 Cancel of VM 4221c123-b4c9-4fa7-e0b2-e25f1c0afaa5 on host SJCCLUSTER failed due to "uninitialized constant VCenterDriver::VIClient::ManagedObjectNotFound"

Sun Oct 25 22:31:05 2015 [Z0][VMM][D]: Message received: LOG I 527 ExitCode: 255

Sun Oct 25 22:31:05 2015 [Z0][VMM][D]: Message received: LOG I 527 Failed to execute virtualization driver operation: cancel.

Sun Oct 25 22:31:05 2015 [Z0][VMM][D]: Message received: LOG I 527 Successfully execute network driver operation: clean.

Sun Oct 25 22:31:05 2015 [Z0][VMM][D]: Message received: CLEANUP SUCCESS 527

In this case the object did exist and was available, but for some reason the vCenter driver could not find it. This happens on average 1 to 2 times for every 10 deletes, which means that by the end of a run that deletes about 180 VMs, I have over 20 to 30 VMs that did not get deleted.

Let me know if there is debug option I can turn on in OpenNebula so that I can experiment and test to get to the root cause.

Jesus

We are trying to reproduce this problem without luck so far. We would like indeed more debug information. Assuming you are running OpenNebula 4.14.0, please replace file /var/lib/one/remotes/vmm/vcenter/cancel with this one:

Thanks for the great feedback!

Hi Tino,

So a couple of things.

  1. I always pull from Git to deploy OpenNebula. The last pull which I am currently running is commit: 35851705c1ec31af0ccbaccb680ddcb17f5895ad

  2. Below is what I’ve gathered after replacing the cancel file under /var/lib/one//remotes/vmm/vcenter/

     Mon Oct 26 10:35:04 2015 [Z0][ReM][D]: Req:8160 UID:0 VirtualMachineAction invoked , "delete", 550
     Mon Oct 26 10:35:04 2015 [Z0][DiM][D]: Finalizing VM 550
     Mon Oct 26 10:35:04 2015 [Z0][ReM][D]: Req:8160 UID:0 VirtualMachineAction result SUCCESS, 550
     Mon Oct 26 10:35:04 2015 [Z0][VMM][D]: Message received: LOG I 550 Driver command for 550 cancelled
    
     Mon Oct 26 10:35:04 2015 [Z0][ReM][D]: Req:9072 UID:0 VirtualMachineInfo invoked , 550
     Mon Oct 26 10:35:04 2015 [Z0][ReM][D]: Req:9072 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>550</ID><UID..."
     Mon Oct 26 10:35:04 2015 [Z0][ReM][D]: Req:9200 UID:0 HostPoolInfo invoked 
     Mon Oct 26 10:35:04 2015 [Z0][ReM][D]: Req:9200 UID:0 HostPoolInfo result SUCCESS, "<HOST_POOL><HOST><ID..."
     Mon Oct 26 10:35:04 2015 [Z0][ReM][D]: Req:9600 UID:0 SystemConfig invoked 
     Mon Oct 26 10:35:04 2015 [Z0][ReM][D]: Req:9600 UID:0 SystemConfig result SUCCESS, "<TEMPLATE><AUTH_MAD>..."
     Mon Oct 26 10:35:04 2015 [Z0][ReM][D]: Req:6864 UID:0 HostInfo invoked , 2
     Mon Oct 26 10:35:04 2015 [Z0][ReM][D]: Req:6864 UID:0 HostInfo result SUCCESS, "<HOST><ID>2</ID><NAM..."
     Mon Oct 26 10:35:06 2015 [Z0][VMM][D]: Message received: LOG I 549 Successfully execute virtualization driver operation: cancel.
    
     Mon Oct 26 10:35:06 2015 [Z0][VMM][D]: Message received: LOG I 549 Successfully execute network driver operation: clean.
    
     Mon Oct 26 10:35:06 2015 [Z0][VMM][D]: Message received: CLEANUP SUCCESS 549 
    
     Mon Oct 26 10:35:08 2015 [Z0][ReM][D]: Req:3744 UID:0 VirtualMachinePoolInfo invoked , -2, -1, -1, -1
     Mon Oct 26 10:35:08 2015 [Z0][ReM][D]: Req:3744 UID:0 VirtualMachinePoolInfo result SUCCESS, "<VM_POOL><VM><ID>176..."
     Mon Oct 26 10:35:08 2015 [Z0][ReM][D]: Req:7968 UID:0 VirtualMachinePoolInfo invoked , -2, -1, -1, -1
     Mon Oct 26 10:35:08 2015 [Z0][ReM][D]: Req:7968 UID:0 VirtualMachinePoolInfo result SUCCESS, "<VM_POOL><VM><ID>176..."
     Mon Oct 26 10:35:09 2015 [Z0][VMM][D]: Message received: LOG I 550 Command execution fail: /var/lib/one/remotes/vmm/vcenter/cancel '42210ba1-0db0-4c89-8eda-6a2902da866e' 'SJCCLUSTER' 550 SJCCLUSTER
    
     Mon Oct 26 10:35:09 2015 [Z0][VMM][D]: Message received: LOG I 550 Cancel of VM 42210ba1-0db0-4c89-8eda-6a2902da866e on host SJCCLUSTER failed due to "uninitialized constant VCenterDriver::VIClient::ManagedObjectNotFound". Backtrace "/usr/lib/one/ruby/vcenter_driver.rb:206:in `find_vm_template'/usr/lib/one/ruby/vendors/rbvmomi/lib/rbvmomi/type_loader.rb:97:in `find'/usr/lib/one/ruby/vcenter_driver.rb:203:in `each'/usr/lib/one/ruby/vcenter_driver.rb:203:in `find'/usr/lib/one/ruby/vcenter_driver.rb:203:in `find_vm_template'/usr/lib/one/ruby/vcenter_driver.rb:830:in `cancel'/var/lib/one/remotes/vmm/vcenter/cancel:43"
    
       Mon Oct 26 10:35:09 2015 [Z0][VMM][D]: Message received: LOG I 550 ExitCode: 255
    
     Mon Oct 26 10:35:09 2015 [Z0][VMM][D]: Message received: LOG I 550 Failed to execute virtualization driver operation: cancel.
    
     Mon Oct 26 10:35:09 2015 [Z0][VMM][D]: Message received: LOG I 550 Successfully execute network driver operation: clean.
    
     Mon Oct 26 10:35:09 2015 [Z0][VMM][D]: Message received: CLEANUP SUCCESS 550
    
    
     Mon Oct 26 10:36:01 2015 [Z0][ReM][D]: Req:9440 UID:0 VirtualMachineAction invoked , "delete", 561
     Mon Oct 26 10:36:01 2015 [Z0][DiM][D]: Finalizing VM 561
     Mon Oct 26 10:36:01 2015 [Z0][ReM][D]: Req:9440 UID:0 VirtualMachineAction result SUCCESS, 561
     Mon Oct 26 10:36:01 2015 [Z0][VMM][D]: Message received: LOG I 561 Driver command for 561 cancelled
    
     Mon Oct 26 10:36:01 2015 [Z0][ReM][D]: Req:8912 UID:0 VirtualMachineInfo invoked , 561
     Mon Oct 26 10:36:01 2015 [Z0][ReM][D]: Req:8912 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>561</ID><UID..."
     Mon Oct 26 10:36:01 2015 [Z0][ReM][D]: Req:6800 UID:0 HostPoolInfo invoked 
     Mon Oct 26 10:36:01 2015 [Z0][ReM][D]: Req:6800 UID:0 HostPoolInfo result SUCCESS, "<HOST_POOL><HOST><ID..."
     Mon Oct 26 10:36:01 2015 [Z0][ReM][D]: Req:6096 UID:0 SystemConfig invoked 
     Mon Oct 26 10:36:01 2015 [Z0][ReM][D]: Req:6096 UID:0 SystemConfig result SUCCESS, "<TEMPLATE><AUTH_MAD>..."
     Mon Oct 26 10:36:01 2015 [Z0][ReM][D]: Req:3248 UID:0 HostInfo invoked , 2
     Mon Oct 26 10:36:01 2015 [Z0][ReM][D]: Req:3248 UID:0 HostInfo result SUCCESS, "<HOST><ID>2</ID><NAM..."
     Mon Oct 26 10:36:02 2015 [Z0][VMM][D]: Message received: LOG I 560 Successfully execute virtualization driver operation: cancel.
    
     Mon Oct 26 10:36:02 2015 [Z0][VMM][D]: Message received: LOG I 560 Successfully execute network driver operation: clean.
    
     Mon Oct 26 10:36:02 2015 [Z0][VMM][D]: Message received: CLEANUP SUCCESS 560 
    
     Mon Oct 26 10:36:06 2015 [Z0][VMM][D]: Message received: LOG I 561 Command execution fail: /var/lib/one/remotes/vmm/vcenter/cancel '4221a53a-31d0-e62c-a7c9-1f1d294c0aeb' 'SJCCLUSTER' 561 SJCCLUSTER
    
     Mon Oct 26 10:36:06 2015 [Z0][VMM][D]: Message received: LOG I 561 Cancel of VM 4221a53a-31d0-e62c-a7c9-1f1d294c0aeb on host SJCCLUSTER failed due to "uninitialized constant VCenterDriver::VIClient::ManagedObjectNotFound". Backtrace "/usr/lib/one/ruby/vcenter_driver.rb:206:in `find_vm_template'/usr/lib/one/ruby/vendors/rbvmomi/lib/rbvmomi/type_loader.rb:97:in `find'/usr/lib/one/ruby/vcenter_driver.rb:203:in `each'/usr/lib/one/ruby/vcenter_driver.rb:203:in `find'/usr/lib/one/ruby/vcenter_driver.rb:203:in `find_vm_template'/usr/lib/one/ruby/vcenter_driver.rb:830:in `cancel'/var/lib/one/remotes/vmm/vcenter/cancel:43"
    
     Mon Oct 26 10:36:06 2015 [Z0][VMM][D]: Message received: LOG I 561 ExitCode: 255
    
     Mon Oct 26 10:36:06 2015 [Z0][VMM][D]: Message received: LOG I 561 Failed to execute virtualization driver operation: cancel.
    
     Mon Oct 26 10:36:06 2015 [Z0][VMM][D]: Message received: LOG I 561 Successfully execute network driver operation: clean.
    
     Mon Oct 26 10:36:06 2015 [Z0][VMM][D]: Message received: CLEANUP SUCCESS 561 
    
     Mon Oct 26 10:36:53 2015 [Z0][ReM][D]: Req:3280 UID:0 VirtualMachineAction invoked , "delete", 571
     Mon Oct 26 10:36:53 2015 [Z0][DiM][D]: Finalizing VM 571
     Mon Oct 26 10:36:53 2015 [Z0][ReM][D]: Req:3280 UID:0 VirtualMachineAction result SUCCESS, 571
     Mon Oct 26 10:36:53 2015 [Z0][VMM][D]: Message received: LOG I 571 Driver command for 571 cancelled
    
     Mon Oct 26 10:36:53 2015 [Z0][ReM][D]: Req:3312 UID:0 VirtualMachineInfo invoked , 571
     Mon Oct 26 10:36:53 2015 [Z0][ReM][D]: Req:3312 UID:0 VirtualMachineInfo result SUCCESS, "<VM><ID>571</ID><UID..."
     Mon Oct 26 10:36:53 2015 [Z0][ReM][D]: Req:1200 UID:0 HostPoolInfo invoked 
     Mon Oct 26 10:36:53 2015 [Z0][ReM][D]: Req:1200 UID:0 HostPoolInfo result SUCCESS, "<HOST_POOL><HOST><ID..."
     Mon Oct 26 10:36:53 2015 [Z0][ReM][D]: Req:2976 UID:0 SystemConfig invoked 
     Mon Oct 26 10:36:53 2015 [Z0][ReM][D]: Req:2976 UID:0 SystemConfig result SUCCESS, "<TEMPLATE><AUTH_MAD>..."
     Mon Oct 26 10:36:53 2015 [Z0][ReM][D]: Req:3776 UID:0 HostInfo invoked , 2
     Mon Oct 26 10:36:53 2015 [Z0][ReM][D]: Req:3776 UID:0 HostInfo result SUCCESS, "<HOST><ID>2</ID><NAM..."
     Mon Oct 26 10:36:54 2015 [Z0][VMM][D]: Message received: LOG I 571 Command execution fail: /var/lib/one/remotes/vmm/vcenter/cancel '42216809-5abf-b161-4f41-6bb99bb36205' 'SJCCLUSTER' 571 SJCCLUSTER
    
     Mon Oct 26 10:36:54 2015 [Z0][VMM][D]: Message received: LOG I 571 Cancel of VM 42216809-5abf-b161-4f41-6bb99bb36205 on host SJCCLUSTER failed due to "uninitialized constant VCenterDriver::VIClient::ManagedObjectNotFound". Backtrace "/usr/lib/one/ruby/vcenter_driver.rb:206:in `find_vm_template'/usr/lib/one/ruby/vendors/rbvmomi/lib/rbvmomi/type_loader.rb:97:in `find'/usr/lib/one/ruby/vcenter_driver.rb:203:in `each'/usr/lib/one/ruby/vcenter_driver.rb:203:in `find'/usr/lib/one/ruby/vcenter_driver.rb:203:in `find_vm_template'/usr/lib/one/ruby/vcenter_driver.rb:830:in `cancel'/var/lib/one/remotes/vmm/vcenter/cancel:43"
    
      Mon Oct 26 10:36:54 2015 [Z0][VMM][D]: Message received: LOG I 571 ExitCode: 255
    
     Mon Oct 26 10:36:54 2015 [Z0][VMM][D]: Message received: LOG I 571 Failed to execute virtualization driver operation: cancel.
    
     Mon Oct 26 10:36:54 2015 [Z0][VMM][D]: Message received: LOG I 570 Successfully execute virtualization driver operation: cancel.
    
     Mon Oct 26 10:36:54 2015 [Z0][VMM][D]: Message received: LOG I 571 Successfully execute network driver operation: clean.
    
     Mon Oct 26 10:36:54 2015 [Z0][VMM][D]: Message received: CLEANUP SUCCESS 571
    

I’ve also attached a link to a picture with the VMs that I have created that were left behind showing the VMWare UUIDs which match.

http://imgur.com/4yjzz5J

Hope this helps.

Jesus

Hi Jesus,

Thanks for the detailed feedback.

First of all, I would like to state that DELETE is an admin operation, only indicated for situations where the VM is in a non recoverable state. For other scenarios, please use SHUTDOWN, since any failure will be reflected in OpenNebula (failures for DELETED would be painfully forgotten).

Having said that, we can certainly make this operation more resilient. I’ve opened a bug ticket to keep track of the solution:

http://dev.opennebula.org/issues/4103