I have a question about adding a new host. When the master node connects to node1 over SSH, it always fails to execute run_probes. While debugging the script, I found that this line in "collectd-client.rb" throws an error:
data = `#{@run_probes_cmd} 2>&1`
A Google search for this issue turns up three known causes or fixes:
a) the passwordless SSH issue
b) the SSH_CLIENT issue
c) running "onehost sync --force"
I have confirmed that the first two are already resolved on my system, and the third does not help. All required files are valid, accessible, and executable by oneadmin.
Even if the node doesn't have virtualization extensions, the monitoring part should work. Please send us the part of /var/log/one/oned.log where monitoring fails.
Also, which process is creating the core? You can find out with:
I had the same thought: run_probes only collects statistics via collectd.
Today I set up another system in VMware but still had no luck; the crash file looks different this time.
Thanks for your advice. I run two guest hosts on my machine; each has 2 GB of memory, and the system is up to date (using the "updater" tool). I monitored memory when collectd-client.rb crashed, and more than 1 GB was available, so I don't think it is a computing-resource issue.
I tried debugging again in VMware. The /var/tmp/one/im/run_probes kvm script runs all scripts in the kvm.d and kvm-probes.d folders. I ran each file manually and they all work fine, so the root cause seems to lie in Ruby itself.
I installed the ruby package (sudo apt-get install ruby). Are any additional Ruby packages required?
(The collectd-core package is installed.)
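The manual check described above can be sketched as a small loop. This is only an illustration: `run_each_probe` is a hypothetical helper, not part of OpenNebula, and it assumes the probes accept the same arguments that run_probes would pass to them.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenNebula): run every executable file
# in a probe directory one at a time and print its exit status.
# Usage: run_each_probe DIR [ARGS...]
run_each_probe() (
    # The body runs in a subshell, so its variables do not leak to the caller.
    dir="$1"
    shift
    failed=0
    for f in "$dir"/*; do
        # Skip anything that is not a regular executable file.
        [ -f "$f" ] && [ -x "$f" ] || continue
        "$f" "$@" >/dev/null 2>&1
        status=$?
        echo "$(basename "$f"): exit $status"
        [ "$status" -eq 0 ] || failed=1
    done
    # Return non-zero if at least one probe failed.
    exit "$failed"
)
```

For example, `run_each_probe /var/tmp/one/im/kvm-probes.d` would list each probe with its exit code, which makes it easier to see whether any single script fails outside of collectd-client.rb.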
If I bypass collectd-client.rb and call run_probes kvm-probes manually, it works fine.
Why does collectd-client.rb fail to execute line 4 below? Or does it not support recursive calls?
I tried changing how collectd-client.rb executes that line, but had no luck:
from
data = `#{@run_probes_cmd} 2>&1`
to
data = `./../run_probes kvm-probes /var/lib/one/datastores 4124 20 1 node1 2>&1`
The issue is now solved if I change /var/tmp/one/im/run_probes
from
if [ -x "$i" ]; then
to
if [[ (-x "$i") && ("$i" != "collectd-client.rb") ]]; then
The reason is that the run_probes script runs every file in the kvm.d folder. The first file it executes is collectd-client_control.sh, which starts collectd-client.rb as a background process and keeps its PID in /tmp/one-collectd-client.pid.
It looks to me like collectd-client.rb is already running in the background, so run_probes does not need to execute it again.
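To make the effect of that change easier to see, here is a minimal sketch of a probe loop with the same filter, written with POSIX `[ ]` tests instead of `[[ ]]`. It is a stand-in, not the real run_probes script: `run_probe_dir` and `SKIP` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for the probe loop in run_probes; not the real script.
# SKIP names the probe that the control script already runs in the background.
SKIP="collectd-client.rb"

# Usage: run_probe_dir DIR [ARGS...]
run_probe_dir() (
    # The body runs in a subshell, so the cd does not affect the caller.
    cd "$1" || exit 1
    shift
    for i in *; do
        # Same condition as the modified line: executable and not the skipped file.
        if [ -x "$i" ] && [ "$i" != "$SKIP" ]; then
            "./$i" "$@"
        fi
    done
)
```

With the two files shown in kvm.d, this loop would run only collectd-client_control.sh and leave collectd-client.rb to the background process that the control script starts.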
I am not 100% sure this solution is the correct one. I will post any follow-up issues this change may cause.
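One way to gain some confidence in the workaround is to check that the background client really is alive before relying on it. The sketch below assumes the PID file contains a single numeric PID; `pid_file_alive` is a hypothetical helper, not something shipped with OpenNebula.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the PID file exists and names a
# process that the current user can signal.
# Usage: pid_file_alive PID_FILE
pid_file_alive() (
    pid_file="$1"
    [ -r "$pid_file" ] || exit 1
    pid=$(cat "$pid_file")
    # Reject empty or non-numeric content before calling kill.
    case "$pid" in
        ''|*[!0-9]*) exit 1 ;;
    esac
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    kill -0 "$pid" 2>/dev/null
)
```

Running `pid_file_alive /tmp/one-collectd-client.pid && echo running` as oneadmin on the node would then indicate whether the process recorded by collectd-client_control.sh still exists.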
oneadmin@master:/var/tmp/one/im/kvm.d$ ls -al
total 20
drwxr-xr-x 2 oneadmin oneadmin 4096 Jan 17 21:00 .
drwxr-xr-x 7 oneadmin oneadmin 4096 Jan 17 21:00 ..
-rwxr-xr-x 1 oneadmin oneadmin 2901 Jan 17 21:00 collectd-client_control.sh
-rwxr-xr-x 1 oneadmin oneadmin 4151 Jan 17 21:00 collectd-client.rb
I don't know the real root cause either. I agree that executing collectd-client.rb from run_probes should work fine, but unfortunately that is not the case for me. Below is a summary of how I fixed this issue.