Error executing probes sometimes but not always?


I have been running OpenNebula since 4.0, but I have seen strange behavior since I upgraded my KVM hypervisor to RHEL 7.
I get this error:
Thu Sep 24 18:04:21 2015 [Z0][ONE][E]: Error monitoring Host hv04 (5): Error executing probes
It only happens on the RHEL 7 hypervisor, not on the other hypervisors running RHEL 6 …
I have read "[SOLVED] Error monitoring Host (2): Error executing probes", but I have seen nothing about tmpwatch …
The problem does not appear every day, or even every week …

/var/log/one/oned.log-20150912.gz:Sat Sep 12 01:32:10 2015 [Z0][ONE][E]: Error monitoring Host hv04 (5): Error executing probes
/var/log/one/oned.log-20150913.gz:Sat Sep 12 12:36:40 2015 [Z0][ONE][E]: Error monitoring Host hv04 (5): Error executing probes

Nothing between 20150913 and today …

Is there a way to get a more verbose error message?
Can you suggest where I should look?

Thank you

** Reply to myself **

After reproducing the problem, I think I have found a bug in the probes.
First, how to reproduce:
On the HV:
cd /var/tmp/one
vi im/kvm.d/collectd-client.rb … and wait
If you are lucky, you will get:
Vim: Caught deadly signal ABRT

Vim: Finished.

If not, you will get:
Wed Sep 30 00:45:12 2015 [Z0][ONE][E]: Error monitoring Host HV04 (5): Error executing probes
Then … OpenNebula cancels ALL my VMs on HV04 (they are all reset, then scheduled again).

The kill signal was sent by: ./im/kvm-probes.d/
Killing the running probes returns a bad exit status, so oned thinks there is a problem with the probes.
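
To illustrate why an editor session can be caught by the probe's filter, here is a minimal sketch that applies the same grep/awk pipeline to a made-up ps listing instead of the real process table. The helper name match_collectd_pids, the sample PIDs and the sample users are assumptions for the demonstration only; nothing is killed.

```shell
# Hypothetical helper: applies the same filter as the probe's pipeline to
# ps-style lines read from stdin, so no real process is inspected or killed.
match_collectd_pids() {
    running_pid="$1"
    grep /collectd-client.rb | grep -v grep | awk '{ print $2 }' | grep -v "^${running_pid}$"
}

# Made-up sample of `ps axuwww` output (users, PIDs and columns are illustrative).
sample_ps='oneadmin  4242  0.0  0.1 ruby /var/tmp/one/im/kvm.d/collectd-client.rb
root      5151  0.0  0.0 vi im/kvm.d/collectd-client.rb
root      6161  0.0  0.0 sshd: root@pts/0'

# Prints the PIDs the filter would select: the real probe AND the vi session.
printf '%s\n' "$sample_ps" | match_collectd_pids 9999
```

Because the filter only looks for the string /collectd-client.rb anywhere in the command line, any process whose arguments contain it is selected, whatever the program actually is.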

I think this is critical because, in certain cases, any user can run a command with that string in its parameters and wait for the probes to kill it, which ends with ALL VMs on the HV being cancelled. (I agree that no user should log into the HV …)

What do you think of my analysis? Is it correct?
We should harden this line:
pids=$(ps axuwww | grep /collectd-client.rb | grep -v grep | awk '{ print $2 }' | grep -v "^${running_pid}$")
Perhaps by also checking that /proc/$pids/comm == "ruby", as a second protection?
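
The second protection could look something like the sketch below, which drops every candidate PID whose /proc/PID/comm is not the expected interpreter name. This is only an idea, not the actual probe code: the function name filter_pids_by_comm and the PROC_ROOT variable (there so the function can be tried against a fake directory) are my own assumptions, and the right value to compare against depends on how the probe is launched ("ruby" is just my guess from above).

```shell
# Hypothetical second check: read candidate PIDs from stdin and print only
# those whose kernel-reported command name matches the expected one.
# $1 is the expected name (defaults to "ruby", an assumption).
# PROC_ROOT is an assumption added for testing; it defaults to the real /proc.
filter_pids_by_comm() {
    expected="${1:-ruby}"
    proc_root="${PROC_ROOT:-/proc}"
    while read -r pid; do
        # Skip empty lines and PIDs that have already exited.
        [ -n "$pid" ] && [ -r "$proc_root/$pid/comm" ] || continue
        comm=$(cat "$proc_root/$pid/comm")
        [ "$comm" = "$expected" ] && echo "$pid"
    done
    return 0
}
```

It would be chained after the existing pipeline, for example: pids=$(ps axuwww | grep /collectd-client.rb | grep -v grep | awk '{ print $2 }' | grep -v "^${running_pid}$" | filter_pids_by_comm ruby). A vi or any other process that only has the string in its arguments would then be left alone.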

Thank you for your replies :smile:
PS: version 4.12.1