Timeout executing 'if [ -x "/var/tmp/one/im/run_probes" ]

Version 5.8

I just set up 20 new hosts, and one of them shows the following error after being freshly added:
Thu Jul 18 08:52:13 2019 : Error monitoring Host 10.0.3.58 (53): Timeout executing 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 53 10.0.3.58; else exit 42; fi'

I am not sure why. SSH works fine in both directions, and the rest is the same as on the other hosts. I tried to re-enable it, tried to delete /var/tmp/one, tried to scp /var/tmp/one…

Any ideas?

Hello @superxor

Do those hosts have running VMs in libvirt?

No, these are brand new OS installs (Ubuntu 18.04).
Curiously, it only happened on one node.

Could you please try to execute the command manually on the host and see the error?

Edit: if I run it manually, I get the whole node info:

ARCH=x86_64
MODELNAME="Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz"
HYPERVISOR=kvm
TOTALCPU=1600
CPUSPEED=1200
TOTALMEMORY=131719688
USEDMEMORY=321904
FREEMEMORY=131397784
FREECPU=1600
USEDCPU=0
NETRX=35640648
NETTX=647040
KVM_MACHINES="pc-i440fx-bionic ubuntu isapc pc-1.1 pc-1.2 pc-1.3 pc-i440fx-zesty pc-i440fx-2.8 pc-1.0 pc-i440fx-2.9 pc-i440fx-2.6 pc-i440fx-2.7 xenfv pc-i440fx-wily pc-i440fx-2.3 pc-i440fx-2.4 pc-i440fx-2.5 pc-i440fx-yakkety pc-i440fx-2.1 pc-i440fx-2.2 pc-i440fx-2.0 pc-q35-yakkety pc-i440fx-bionic-hpb pc-q35-2.11 q35 pc-i440fx-xenial xenpv pc-q35-2.10 pc-q35-bionic-hpb pc-q35-xenial pc-i440fx-artful pc-i440fx-1.7 pc-q35-2.9 pc-0.15 pc-i440fx-1.5 pc-q35-2.7 pc-i440fx-1.6 pc-i440fx-2.11 pc pc-q35-2.8 pc-q35-zesty pc-0.13 pc-q35-artful pc-0.14 pc-q35-2.4 pc-i440fx-trusty pc-q35-2.5 pc-q35-2.6 pc-i440fx-1.4 pc-i440fx-2.10 pc-0.11 pc-0.12 pc-q35-bionic pc-0.10"
KVM_CPU_MODELS="486 pentium pentium2 pentium3 pentiumpro coreduo n270 core2duo qemu32 kvm32 cpu64-rhel5 cpu64-rhel6 kvm64 qemu64 Conroe Penryn Nehalem Nehalem-IBRS Westmere Westmere-IBRS SandyBridge SandyBridge-IBRS IvyBridge IvyBridge-IBRS Haswell-noTSX Haswell-noTSX-IBRS Haswell Haswell-IBRS Broadwell-noTSX Broadwell-noTSX-IBRS Broadwell Broadwell-IBRS Skylake-Client Skylake-Client-IBRS Skylake-Server Skylake-Server-IBRS athlon phenom Opteron_G1 Opteron_G2 Opteron_G3 Opteron_G4 Opteron_G5 EPYC EPYC-IBPB"
DS_LOCATION_USED_MB=2444
DS_LOCATION_TOTAL_MB=896131
DS_LOCATION_FREE_MB=848097
HOSTNAME=xxx
VM_POLL=YES
VERSION="5.8.1"

But the host still shows ERROR in Nodes.
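
For readers hitting the same error: since the manual run succeeds while the log reports a timeout, it may help to measure how long the probes actually take. The following is a minimal sketch, not something from this thread. `run_with_deadline` is a hypothetical helper name, and it assumes coreutils `timeout` is installed (it exits with 124 when the limit is reached).

```shell
# Run a command with a hard time limit and report how it ended.
# Hypothetical helper; relies on coreutils `timeout`, which returns 124
# when the command had to be stopped because the limit was reached.
run_with_deadline() {
  limit=$1; shift
  start=$(date +%s)
  rc=0
  timeout "$limit" "$@" || rc=$?
  elapsed=$(( $(date +%s) - start ))
  if [ "$rc" -eq 124 ]; then
    echo "timed out after ${limit}s" >&2
  else
    echo "exit code $rc after ${elapsed}s" >&2
  fi
  return "$rc"
}

# Example, on the host as the user that runs the probes. The arguments are
# copied from the error message; the 60 second limit is an arbitrary choice:
# run_with_deadline 60 /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 53 10.0.3.58
```

If the probes finish in a few seconds here, the delay is more likely somewhere between the frontend and the host than in the probes themselves.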

Try to set the host offline and then enable it again.

Tried that, still errors out

Could you please send me the output of onehost show <host_id> -x?

<HOST>
  <ID>53</ID>
  <NAME>10.0.3.58</NAME>
  <STATE>7</STATE>
  <IM_MAD><![CDATA[kvm]]></IM_MAD>
  <VM_MAD><![CDATA[kvm]]></VM_MAD>
  <LAST_MON_TIME>1563458327</LAST_MON_TIME>
  <CLUSTER_ID>0</CLUSTER_ID>
  <CLUSTER>default</CLUSTER>
  <HOST_SHARE>
    <DISK_USAGE>0</DISK_USAGE>
    <MEM_USAGE>0</MEM_USAGE>
    <CPU_USAGE>0</CPU_USAGE>
    <TOTAL_MEM>0</TOTAL_MEM>
    <TOTAL_CPU>0</TOTAL_CPU>
    <MAX_DISK>0</MAX_DISK>
    <MAX_MEM>0</MAX_MEM>
    <MAX_CPU>0</MAX_CPU>
    <FREE_DISK>0</FREE_DISK>
    <FREE_MEM>0</FREE_MEM>
    <FREE_CPU>0</FREE_CPU>
    <USED_DISK>0</USED_DISK>
    <USED_MEM>0</USED_MEM>
    <USED_CPU>0</USED_CPU>
    <RUNNING_VMS>0</RUNNING_VMS>
    <DATASTORES/>
    <PCI_DEVICES/>
  </HOST_SHARE>
  <VMS/>
  <TEMPLATE>
    <CLUSTER_ID><![CDATA[0]]></CLUSTER_ID>
    <ERROR><![CDATA[Thu Jul 18 15:55:33 2019 : Error monitoring Host 10.0.3.58 (53): Timeout executing 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 53 10.0.3.58; else exit 42; fi']]></ERROR>
    <IM_MAD><![CDATA[kvm]]></IM_MAD>
    <NAME><![CDATA[10.0.3.58]]></NAME>
    <RESERVED_CPU><![CDATA[]]></RESERVED_CPU>
    <RESERVED_MEM><![CDATA[]]></RESERVED_MEM>
    <VM_MAD><![CDATA[kvm]]></VM_MAD>
  </TEMPLATE>
</HOST>
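
When comparing several hosts, it can be handy to pull just the state and the error text out of this XML. Below is a rough sketch that assumes each element sits on a single line, as in the output above. `host_field` is a hypothetical helper name, and the line-based `sed` matching is a shortcut, not a real XML parser.

```shell
# Print the text of the first <TAG>...</TAG> element found on one line of stdin.
# Hypothetical helper: naive line-based matching, which also strips a
# surrounding CDATA wrapper if one is present.
host_field() {
  sed -n "s:.*<$1>\(.*\)</$1>.*:\1:p" \
    | sed -e 's/^<!\[CDATA\[//' -e 's/]]>$//' \
    | head -n 1
}

# Example, using the command from this thread:
# onehost show 53 -x | host_field STATE
# onehost show 53 -x | host_field ERROR
```

A proper XML tool would be more robust if the output is ever reformatted or an element spans several lines.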

Does ssh 10.0.3.58 work passwordless from the frontend?

Yes, first thing I tested

Could you try to execute the polling command from the frontend, i.e. ssh 10.0.3.58 '<COMMAND>'?

The output is empty, and it still errors out.

So when you execute the command over SSH, you don't get the same output as when executing it on the host?

Checked again: yes, I do. It's the same output as printed above.
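
A side note for anyone repeating this check: an interactive SSH login that works does not guarantee that a non-interactive run behaves the same way. A minimal sketch of a stricter test follows. `remote_run` and the `SSH_BIN` override are hypothetical names introduced here, while `-n`, `BatchMode` and `ConnectTimeout` are standard OpenSSH options. With `BatchMode=yes`, ssh fails instead of prompting for a password or passphrase.

```shell
# Run a command on a remote host without any interactive prompting.
# Hypothetical helper; SSH_BIN is an assumed override so that the ssh
# binary can be swapped out (for example when testing this function).
remote_run() {
  host=$1; shift
  "${SSH_BIN:-ssh}" -n -o BatchMode=yes -o ConnectTimeout=10 "$host" "$@"
}

# Example, from the frontend as the user that monitors the hosts:
# remote_run 10.0.3.58 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 53 10.0.3.58; else exit 42; fi'
```

If this hangs or fails while a plain interactive login works, the difference between the two sessions is worth a closer look.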

What value do you have in oned.conf for MONITORING_INTERVAL_HOST?

The default value. I didn't touch anything there.

Try onehost sync --force and then offline/enable that host.

Either it got deleted or I posted it in the wrong thread, but I fixed it by simply reinstalling the package. For some magical reason it worked, despite my having attempted this before.

Perfect then!