v 5.8
I just set up 20 new hosts, and one of them shows the following error right after being added:
Thu Jul 18 08:52:13 2019 : Error monitoring Host 10.0.3.58 (53): Timeout executing 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 53 10.0.3.58; else exit 42; fi'
I am not sure why. SSH works fine in both directions, and the rest of the setup is the same as on the other hosts. I tried to re-enable it, tried to delete /var/tmp/one, tried to scp /var/tmp/one…
Any ideas?
ahuertas
(Alejandro Huertas)
July 18, 2019, 7:50am
Hello @superxor
Do those hosts have running VMs in libvirt?
No, these are brand new OS installs, Ubuntu 18.04.
It only happened on one node, which is curious.
ahuertas
(Alejandro Huertas)
July 18, 2019, 9:57am
Could you please try to execute the command manually in the host and see the error?
Edit: if I run it manually, I get the whole node info:
ARCH=x86_64
MODELNAME="Intel(R) Xeon(R) W-2145 CPU @ 3.70GHz"
HYPERVISOR=kvm
TOTALCPU=1600
CPUSPEED=1200
TOTALMEMORY=131719688
USEDMEMORY=321904
FREEMEMORY=131397784
FREECPU=1600
USEDCPU=0
NETRX=35640648
NETTX=647040
KVM_MACHINES="pc-i440fx-bionic ubuntu isapc pc-1.1 pc-1.2 pc-1.3 pc-i440fx-zesty pc-i440fx-2.8 pc-1.0 pc-i440fx-2.9 pc-i440fx-2.6 pc-i440fx-2.7 xenfv pc-i440fx-wily pc-i440fx-2.3 pc-i440fx-2.4 pc-i440fx-2.5 pc-i440fx-yakkety pc-i440fx-2.1 pc-i440fx-2.2 pc-i440fx-2.0 pc-q35-yakkety pc-i440fx-bionic-hpb pc-q35-2.11 q35 pc-i440fx-xenial xenpv pc-q35-2.10 pc-q35-bionic-hpb pc-q35-xenial pc-i440fx-artful pc-i440fx-1.7 pc-q35-2.9 pc-0.15 pc-i440fx-1.5 pc-q35-2.7 pc-i440fx-1.6 pc-i440fx-2.11 pc pc-q35-2.8 pc-q35-zesty pc-0.13 pc-q35-artful pc-0.14 pc-q35-2.4 pc-i440fx-trusty pc-q35-2.5 pc-q35-2.6 pc-i440fx-1.4 pc-i440fx-2.10 pc-0.11 pc-0.12 pc-q35-bionic pc-0.10"
KVM_CPU_MODELS="486 pentium pentium2 pentium3 pentiumpro coreduo n270 core2duo qemu32 kvm32 cpu64-rhel5 cpu64-rhel6 kvm64 qemu64 Conroe Penryn Nehalem Nehalem-IBRS Westmere Westmere-IBRS SandyBridge SandyBridge-IBRS IvyBridge IvyBridge-IBRS Haswell-noTSX Haswell-noTSX-IBRS Haswell Haswell-IBRS Broadwell-noTSX Broadwell-noTSX-IBRS Broadwell Broadwell-IBRS Skylake-Client Skylake-Client-IBRS Skylake-Server Skylake-Server-IBRS athlon phenom Opteron_G1 Opteron_G2 Opteron_G3 Opteron_G4 Opteron_G5 EPYC EPYC-IBPB"
DS_LOCATION_USED_MB=2444
DS_LOCATION_TOTAL_MB=896131
DS_LOCATION_FREE_MB=848097
HOSTNAME=xxx
VM_POLL=YES
VERSION="5.8.1"
But the host still shows ERROR under Nodes.
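Since the frontend reports a timeout rather than a failed probe, it may help to measure how long the probe actually takes on this host. A minimal sketch, assuming a POSIX shell; time_cmd is a hypothetical helper name, not part of OpenNebula, and the probe arguments are copied from the error message above:

```shell
# Hypothetical helper: run a command, discard its output, and report
# the exit status and the elapsed wall-clock time in whole seconds.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo "exit=$status seconds=$((end - start))"
}

# Example, run on the host as the user the frontend connects as
# (assumed here to be oneadmin):
# time_cmd /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 53 10.0.3.58
```

If the reported time on the failing node is much higher than on a healthy node, a single slow probe would be the first thing to look for.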
ahuertas
(Alejandro Huertas)
July 18, 2019, 11:10am
Try to set the host offline and then enable it again.
Tried that, still errors out
ahuertas
(Alejandro Huertas)
July 18, 2019, 1:29pm
Could you please send me the output of onehost show <host_id> -x?
<HOST>
<ID>53</ID>
<NAME>10.0.3.58</NAME>
<STATE>7</STATE>
<IM_MAD><![CDATA[kvm]]></IM_MAD>
<VM_MAD><![CDATA[kvm]]></VM_MAD>
<LAST_MON_TIME>1563458327</LAST_MON_TIME>
<CLUSTER_ID>0</CLUSTER_ID>
<CLUSTER>default</CLUSTER>
<HOST_SHARE>
<DISK_USAGE>0</DISK_USAGE>
<MEM_USAGE>0</MEM_USAGE>
<CPU_USAGE>0</CPU_USAGE>
<TOTAL_MEM>0</TOTAL_MEM>
<TOTAL_CPU>0</TOTAL_CPU>
<MAX_DISK>0</MAX_DISK>
<MAX_MEM>0</MAX_MEM>
<MAX_CPU>0</MAX_CPU>
<FREE_DISK>0</FREE_DISK>
<FREE_MEM>0</FREE_MEM>
<FREE_CPU>0</FREE_CPU>
<USED_DISK>0</USED_DISK>
<USED_MEM>0</USED_MEM>
<USED_CPU>0</USED_CPU>
<RUNNING_VMS>0</RUNNING_VMS>
<DATASTORES/>
<PCI_DEVICES/>
</HOST_SHARE>
<VMS/>
<TEMPLATE>
<CLUSTER_ID><![CDATA[0]]></CLUSTER_ID>
<ERROR><![CDATA[Thu Jul 18 15:55:33 2019 : Error monitoring Host 10.0.3.58 (53): Timeout executing 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 53 10.0.3.58; else exit 42; fi']]></ERROR>
<IM_MAD><![CDATA[kvm]]></IM_MAD>
<NAME><![CDATA[10.0.3.58]]></NAME>
<RESERVED_CPU><![CDATA[]]></RESERVED_CPU>
<RESERVED_MEM><![CDATA[]]></RESERVED_MEM>
<VM_MAD><![CDATA[kvm]]></VM_MAD>
</TEMPLATE>
</HOST>
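To pull single fields such as STATE or ERROR out of XML like the above without reading the whole dump, a small filter can be enough. This is a rough sketch, assuming each element sits on one line as in this output; xml_field is a hypothetical helper name, and a real XML tool would be more robust:

```shell
# Hypothetical helper: print the text of the first <TAG>...</TAG> element
# found on a single line of stdin, with any CDATA wrapper removed.
xml_field() {
    sed -n "s:.*<$1>\(.*\)</$1>.*:\1:p" |
        head -n 1 |
        sed -e 's/^<!\[CDATA\[//' -e 's/]]>$//'
}

# Example on the frontend:
# onehost show 53 -x | xml_field STATE
# onehost show 53 -x | xml_field ERROR
```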
ahuertas
(Alejandro Huertas)
July 19, 2019, 7:39am
Does ssh 10.0.3.58 work passwordless from the frontend?
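One way to test this non-interactively is to disable password prompts, so the login fails instead of waiting for input. A minimal sketch, assuming OpenSSH on the frontend; check_passwordless and the SSH_BIN override are hypothetical names introduced here for illustration:

```shell
# Hypothetical helper: succeed only if key-based login works.
# BatchMode=yes makes ssh fail rather than prompt for a password.
# SSH_BIN is an assumed override for dry runs; it defaults to ssh.
check_passwordless() {
    "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# Example, run on the frontend as the user that performs monitoring
# (assumed here to be oneadmin):
# check_passwordless 10.0.3.58 && echo "passwordless OK"
```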
Yes, first thing I tested
ahuertas
(Alejandro Huertas)
July 22, 2019, 7:48am
Could you try to execute the polling command from the frontend instead, with ssh 10.0.3.58 '<COMMAND>'?
output is empty, still errors out
ahuertas
(Alejandro Huertas)
July 22, 2019, 11:21am
When you execute the command, you don’t get the same output as executing it on the host?
Checked again: no, I do get output. It's the same output as printed above.
ahuertas
(Alejandro Huertas)
July 22, 2019, 12:35pm
What value do you have in oned.conf for MONITORING_INTERVAL_HOST?
The default value, I didn't touch anything there.
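To confirm which value is actually in effect, the active (uncommented) line can be read straight from the file. A rough sketch, assuming the usual KEY = VALUE layout and a configuration file at /etc/one/oned.conf; conf_value is a hypothetical helper name:

```shell
# Hypothetical helper: print the value of the last uncommented
# "KEY = VALUE" line for the given key in the given file.
conf_value() {
    sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*\([^#[:space:]]*\).*/\1/p" "$2" |
        tail -n 1
}

# Example on the frontend (path is an assumption):
# conf_value MONITORING_INTERVAL_HOST /etc/one/oned.conf
```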
ahuertas
(Alejandro Huertas)
July 23, 2019, 7:49am
Try onehost sync --force and then offline/enable that host.
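The suggested sequence can be sketched as a single function, so the steps stop at the first failure. This assumes the onehost CLI is on the PATH of the frontend user; resync_host and the ONEHOST_BIN override are hypothetical names used only for this example:

```shell
# Hypothetical helper: push the probes again, then cycle the host
# through offline and enable. Stops at the first failing step.
# ONEHOST_BIN is an assumed override for dry runs; defaults to onehost.
resync_host() {
    one="${ONEHOST_BIN:-onehost}"
    "$one" sync --force &&
    "$one" offline "$1" &&
    "$one" enable "$1"
}

# Example on the frontend, using the host ID from the error message:
# resync_host 53
```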
Either it got deleted or I posted it in the wrong thread, but I fixed it by just reinstalling the package. For some magical reason it worked, despite my having attempted this before.