Collectd-client.rb core dumped

sutthipongl · January 11, 2017, 10:44pm

Hi friends,

I have a question about adding a new host. When the master node connects to node1 over ssh, it always fails to execute run_probes. While debugging the script, I found that this line in “collectd-client.rb” throws the error:

data = `#{@run_probes_cmd} 2>&1`

A Google search for this issue returns three possible causes/solutions:
a) passwordless ssh issue
b) SSH_CLIENT issue
c) try running “onehost sync --force”

I confirm that the first two issues are solved on my system, and the third one doesn’t help. All required files are valid, accessible and executable by oneadmin.

This thread looks similar to my issue. Unfortunately, it contains no valid solution:
http://users.opennebula.narkive.com/SEKiQERb/one-users-opennebula-node-ubuntu-14-04-saying-error-executing-collectd-client-rb-when-creating-a

The OpenNebula version is 5.2.0, on Ubuntu 16.04 with Ruby 2.3.

Checking the core dump file in /var/crash, it looks like the problem is in Ruby itself. I am not sure whether it’s a bug in Ruby 2.3.

I tried copying collectd-client.rb to test.rb and removing all lines except the line in question, and it runs fine without error. Below is its content.

#!/usr/bin/env ruby

data = `./../run_probes kvm-probes /var/lib/one/datastores 4124 20 1 node1 2>&1`
code = $?.exitstatus == 0
puts "#{code}"

Does anyone have a solution, or can anyone shed some light on where the issue is? Is it Ruby?

thanks in advance,
Tor.

sutthipongl · January 11, 2017, 11:18pm

Checking further in the core dump file, I found this error (among others):

Jan 10 21:16:59 node1 libvirtd[12096]: internal error: QEMU / QMP failed: Could not access KVM kernel module: No such file or directory

I guess the root cause is that I am running Ubuntu in VirtualBox. Although Ubuntu is KVM-ready, VirtualBox can’t nest a VM inside the guest OS.

What do you think?

jfontan · January 12, 2017, 11:37am

Even if the node doesn’t have virtualization extensions the monitoring part should work. Send us the /var/log/one/oned.log part where monitoring fails.

Also, which process is creating the core? You can find out with:

$ file <core file>

sutthipongl · January 12, 2017, 11:58pm

Thanks for your response Javi.

I had the same thought: run_probes just collects statistics via collectd.
Today I set up another system in VMware but still had no luck; the crash file looks different this time.

Unfortunately, I can’t upload log files in this thread. Please find them on my shared drive:
https://drive.google.com/drive/folders/0B0EZTM0AkotHRU1Qem9vNDFaVDQ?usp=sharing

cheers!

jfontan · January 13, 2017, 7:49pm

The process that is crashing is ruby. It’s strange that it fails. Can you check that you have enough memory and the system is up to date?

sutthipongl · January 13, 2017, 10:15pm

Thanks for your advice. I run 2 guest hosts on my machine; each has 2GB of memory and the system is up to date (using the “updater” tool). I monitored memory when collectd-client.rb crashed, and there was 1GB+ available. I don’t think it’s a computing resource issue.

I tried debugging again in VMware. The /var/tmp/one/im/run_probes kvm script runs all scripts in the kvm.d and kvm-probes.d folders. I tried running each file manually and they all work fine. So the root cause would seem to sit in Ruby itself.

I installed the ruby package (sudo apt-get install ruby). Are any additional Ruby packages required?
(The collectd-core package is installed.)

sutthipongl · January 13, 2017, 11:09pm

If I bypass collectd-client.rb and call run_probes kvm-probes manually, it works fine.
Why does collectd-client.rb fail to execute line 4 below? Or does it not support being run recursively?

oneadmin@node1:/var/tmp/one/im/kvm.d$ ls
collectd-client_control.sh collectd-client.rb
oneadmin@node1:/var/tmp/one/im/kvm.d$ ./collectd-client_control.sh /var/lib/one/datastores 4124 20 1 node1 2>&1
oneadmin@node1:/var/tmp/one/im/kvm.d$ ./../run_probes kvm-probes /var/lib/one/datastores 4124 20 1 node1 2>&1
ARCH=x86_64
MODELNAME="Intel(R) Core™ i5-4288U CPU @ 2.60GHz"
HYPERVISOR=kvm
TOTALCPU=200
CPUSPEED=2599
TOTALMEMORY=2964420
USEDMEMORY=1117900
FREEMEMORY=1846520
FREECPU=198
USEDCPU=2
NETRX=0
NETTX=0
DS_LOCATION_USED_MB=5060
DS_LOCATION_TOTAL_MB=97814
DS_LOCATION_FREE_MB=87764
HOSTNAME=node1
VM_POLL=YES
VERSION="5.2.0"

I tried changing how collectd-client.rb executes that line, but had no luck:
from
data = `#{@run_probes_cmd} 2>&1`
to
data = `./../run_probes kvm-probes /var/lib/one/datastores 4124 20 1 node1 2>&1`

sutthipongl · January 14, 2017, 11:58pm

The issue is now solved if I change /var/tmp/one/im/run_probes

from
if [ -x "$i" ]; then

to
if [[ (-x "$i") && ("$i" != "collectd-client.rb") ]]; then

The reason is that the run_probes script runs every file in the kvm.d folder. The first file it executes is collectd-client_control.sh, which starts collectd-client.rb as a background process and keeps its PID in /tmp/one-collectd-client.pid.

It looks to me like collectd-client.rb is already running in the background, so we don’t need the run_probes script to execute collectd-client.rb again.
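
One way to confirm that the background client is really alive is to test the PID stored in that file. The sketch below is only an illustration: it uses a throwaway sleep process and a temporary PID file instead of the real /tmp/one-collectd-client.pid, and the helper name is_running is made up.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the PID stored in file $1 belongs to a live process.
is_running() {
    [ -s "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null
}

# Demo with a throwaway process standing in for collectd-client.rb.
pid_file=$(mktemp)
sleep 30 &
echo "$!" > "$pid_file"

if is_running "$pid_file"; then before=running; else before=stopped; fi

# Stop the stand-in process, reap it, and check again.
kill "$(cat "$pid_file")"
wait 2>/dev/null
if is_running "$pid_file"; then after=running; else after=stopped; fi

echo "before: $before, after: $after"
rm -f "$pid_file"
```

On the node itself, the same helper could be pointed at /tmp/one-collectd-client.pid, assuming that file holds a single PID.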

I am not 100% sure this solution is the correct one. I will post any follow-on issues this change may cause.

sutthipongl · January 15, 2017, 12:21am

create VM in that host … ok
host statistic … ok
vnc to VM … ok

Looks good!

Hi Javi, do you think it’s a bug?

jfontan · January 17, 2017, 9:26am

I don’t really understand what could be happening. We’ve been using the same system to start collectd client without problems.

Can you check that kvm.d/collectd-client.rb is not executable? That may be the problem. Here are the files in kvm.d from the CentOS 7 packages.

[root@scw-ceab44 kvm.d]# ls -l
total 12
-rwxr-xr-x 1 oneadmin oneadmin 2901 Oct 17 11:09 collectd-client_control.sh
-rw-r--r-- 1 oneadmin oneadmin 4151 Oct 17 11:09 collectd-client.rb
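
A check along these lines can be scripted. This is a minimal sketch that runs against a temporary stand-in file rather than the real probe; the helper name exec_state is made up, and whether clearing the bit is the right fix on a given distribution is not verified here.

```shell
#!/bin/sh
# Hypothetical helper: prints whether file $1 is executable by the caller.
exec_state() {
    if [ -x "$1" ]; then echo "executable"; else echo "not executable"; fi
}

# Demo on a temporary stand-in file, not the real collectd-client.rb.
probe=$(mktemp)
chmod 755 "$probe"
before=$(exec_state "$probe")

# Clear the executable bit for everyone, matching the -rw-r--r-- mode in the listing above.
chmod a-x "$probe"
after=$(exec_state "$probe")

echo "before: $before, after: $after"
rm -f "$probe"
```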

sutthipongl · January 17, 2017, 9:13pm

Hi Javi

Yes, those files are executable

oneadmin@master:/var/tmp/one/im/kvm.d$ ls -al
total 20
drwxr-xr-x 2 oneadmin oneadmin 4096 Jan 17 21:00 .
drwxr-xr-x 7 oneadmin oneadmin 4096 Jan 17 21:00 ..
-rwxr-xr-x 1 oneadmin oneadmin 2901 Jan 17 21:00 collectd-client_control.sh
-rwxr-xr-x 1 oneadmin oneadmin 4151 Jan 17 21:00 collectd-client.rb

I don’t know the real root cause either. I agree that executing collectd-client.rb from run_probes should work fine, but unfortunately that’s not the case for me. Below is a summary of what happens and how I fixed the issue:

1. run_probes lists all files in the kvm.d folder
2. run_probes executes collectd-client_control.sh
3. collectd-client_control.sh runs collectd-client.rb as a background process and returns
4. run_probes executes collectd-client.rb again

The code I amended in the run_probes script removes step 4 above, and now my system runs totally fine.
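
The amended loop can be sketched as follows. This is a minimal sketch, not the actual run_probes source: the function name run_dir, the temporary directory and the two stub scripts are made up for the demo, and it uses the POSIX [ ... ] test rather than [[ ... ]].

```shell
#!/bin/sh
# Stub probe directory standing in for kvm.d (hypothetical demo files).
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho control\n' > "$demo_dir/collectd-client_control.sh"
printf '#!/bin/sh\necho client\n' > "$demo_dir/collectd-client.rb"
chmod +x "$demo_dir/collectd-client_control.sh" "$demo_dir/collectd-client.rb"

# Run every executable in directory $1 except collectd-client.rb.
run_dir() {
    (
        cd "$1" || exit 1
        for i in *; do
            if [ -x "$i" ] && [ "$i" != "collectd-client.rb" ]; then
                # Stop at the first failing probe and pass its exit code up.
                "./$i" || exit $?
            fi
        done
    )
}

output=$(run_dir "$demo_dir")
echo "$output"
rm -rf "$demo_dir"
```

Only the control script’s stub runs here; the stub named collectd-client.rb is skipped even though it is executable, which is the effect of dropping step 4.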

thanks for your help !!
Tor.
