Hello, I'm using the latest version of OpenNebula for my VMware cluster (22 hosts, 12 datastores); only 30 VMs are active and monitored in OpenNebula. For the past few weeks my OpenNebula server has been running under high load. I checked /var/log/one/oned.log and found no errors, but the VM status became UNKNOWN.
I ran some tests and found that after restarting OpenNebula the load is low… after 10 to 15 minutes the first CPU is at 100%, and then the same happens to all the server's CPUs, so the only solution for me is to restart OpenNebula.
I suspect that Ruby is consuming all the server's resources; in fact, after 1 hour I can see several processes like this:
It seems that monitoring takes more than 60 seconds to finish, which is the default monitoring interval. This causes new monitoring processes to start before the previous one has finished, so they pile up and eventually saturate the CPUs.
Stop OpenNebula
Kill any leftover run_probes / ruby processes on the host
Time the execution of the monitoring agent (as oneadmin):
$ time /bin/bash /var/lib/one/remotes/im/run_probes vcenter /var/lib/one//datastores 4124 20 4 Cluster
Change the monitoring interval to a value higher than the time you got in the previous step
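As a rough sketch of how the last two steps fit together, the bash helpers below time a command in whole seconds and suggest an interval with some headroom on top of the measured time. The function names (measure_seconds, suggest_interval) and the 50% default margin are my own assumptions for illustration, not anything OpenNebula provides.

```shell
#!/bin/bash
# Hypothetical helpers, not part of OpenNebula.

# measure_seconds CMD [ARGS...]
# Runs the command, discards its output, and prints the elapsed whole seconds
# using bash's built-in SECONDS counter.
measure_seconds() {
    local start=$SECONDS
    "$@" >/dev/null 2>&1
    echo $(( SECONDS - start ))
}

# suggest_interval ELAPSED CURRENT [MARGIN_PERCENT]
# Prints CURRENT if ELAPSED plus the margin still fits inside it,
# otherwise prints ELAPSED plus the margin (default margin: 50%, an assumption).
suggest_interval() {
    local elapsed=$1 current=$2 margin=${3:-50}
    local needed=$(( elapsed + elapsed * margin / 100 ))
    if [ "$needed" -le "$current" ]; then
        echo "$current"
    else
        echo "$needed"
    fi
}
```

For example, as oneadmin you could pass the run_probes command line from the step above to measure_seconds, then call suggest_interval with the result and 60 (the default interval) to get a value to use as the new monitoring interval. A probe run that takes 90 seconds would give 135.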