Hello, I’m getting the following error in oned.log when trying to execute the host_error hook:
Command execution failed (exit code: 255): /var/lib/one/remotes/datastore/ceph/monitor PERTX0RSSVZF.....some big encoded stuff
[Z0][ImM][E]: Error monitoring datastore 102: LQ==. Decoded info: -
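For reference, the payload in that last line is Base64. A minimal Python sketch (standard library only) suggests that it decodes to a single dash, matching the "Decoded info" printed by oned, so the monitor script apparently returned no usable error text:

```python
import base64

# Payload copied from the "Error monitoring datastore 102" line above.
encoded = "LQ=="

# Decode the Base64 payload into readable text.
decoded = base64.b64decode(encoded).decode("utf-8")
print(repr(decoded))  # prints '-'
```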
My OpenNebula cluster consists of 4 machines: one is the master (frontend), and the other 3 are the nodes, which also form their own Ceph cluster. Creating VMs on the Ceph datastore works, and live-migrating a VM from one node to another works as well.
But when I turn off, for example, Node-2 while a VM is running on it, the VM does not get redeployed onto another node unless I choose the “Reschedule” option in Sunstone. It also takes some time until Sunstone recognizes that the node is down and shows the VM state as UNKNOWN.
I used the existing host_error template and changed the -p value (the number of monitoring cycles) from 5 to 2:
ARGUMENTS = "$TEMPLATE -m -p 2"
COMMAND = "/var/lib/one/remotes/hooks/ft/host_error.rb"
NAME = "host_error"
STATE = "ERROR"
REMOTE = "no"
RESOURCE = HOST
TYPE = state
From the onehem.log file:
BYZW9uKFIpIEUtMjE3NkcgQ1BVIEAgMy43MEdIel1dPjwvTU9ERUxOQU1FPjxOQU1FPjwhW0NEQVRBW25vZGUxXV0+PC9OQU1FPjxSRVNFUlZFRF9DUFU+PCFbQ0RBVEFbXV0+PC9SRVNFUlZFRF9DUFU+PFJFU0VSVkVEX01FTT48IVtDREFUQVtdXT48L1JFU0VSVkVEX01FTT48VkVSU0lPTj48IVtDREFUQVs2LjAuMC4yXV0+PC9WRVJTSU9OPjxWTV9NQUQ+PCFbQ0RBVEFba3ZtXV0+PC9WTV9NQUQ+PC9URU1QTEFURT48TU9OSVRPUklORy8+PC9IT1NUPjwvSE9PS19NRVNTQUdFPg==
Tue Sep 21 13:12:06 2021 [I]: Executing hook 6 for HOST/ERROR/
Tue Sep 21 13:14:06 2021 [E]: Failure executing hook 6 for HOST/ERROR/
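One detail in these two log lines is the gap between the start and the failure of the hook. The small sketch below only does the timestamp arithmetic; the format string is my assumption about how onehem writes its timestamps. The result is exactly two minutes, which might point to something waiting and timing out rather than failing immediately, although I have not confirmed that:

```python
from datetime import datetime

# Assumed timestamp layout of the onehem.log lines above.
LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"

# Timestamps copied from the "Executing hook" and "Failure executing hook" lines.
started = datetime.strptime("Tue Sep 21 13:12:06 2021", LOG_TIME_FORMAT)
failed = datetime.strptime("Tue Sep 21 13:14:06 2021", LOG_TIME_FORMAT)

# Time the hook ran before it was reported as failed.
elapsed_seconds = (failed - started).total_seconds()
print(elapsed_seconds)  # prints 120.0
```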
onehook show provides the following information:
onehook show 6 -e 0
HOOK 6 INFORMATION
ID : 6
NAME : host_error
TYPE : state
LOCK : None
HOOK EXECUTION RECORD
EXECUTION ID : 0
TIMESTAMP : 09/21 13:14:06
COMMAND : /var/lib/one/remotes/hooks/ft/host_error.rb
ARGUMENTS : <HOST>
<ID>0</ID>
<NAME>node1</NAME>
<STATE>3</STATE>
<PREV_STATE>2</PREV_STATE>
<IM_MAD><![CDATA[kvm]]></IM_MAD>
<VM_MAD><![CDATA[kvm]]></VM_MAD>
<CLUSTER_ID>0</CLUSTER_ID>
<CLUSTER>default</CLUSTER>
<HOST_SHARE>
<MEM_USAGE>8388608</MEM_USAGE>
<CPU_USAGE>100</CPU_USAGE>
<TOTAL_MEM>65659868</TOTAL_MEM>
<TOTAL_CPU>1200</TOTAL_CPU>
<MAX_MEM>65659868</MAX_MEM>
<MAX_CPU>1200</MAX_CPU>
<RUNNING_VMS>1</RUNNING_VMS>
<VMS_THREAD>1</VMS_THREAD>
<DATASTORES>
<DISK_USAGE><![CDATA[0]]></DISK_USAGE>
<FREE_DISK><![CDATA[49113]]></FREE_DISK>
<MAX_DISK><![CDATA[51175]]></MAX_DISK>
<USED_DISK><![CDATA[2063]]></USED_DISK>
</DATASTORES>
<PCI_DEVICES/>
<NUMA_NODES>
<NODE>
<CORE>
<CPUS><![CDATA[0:-1,6:-1]]></CPUS>
<DEDICATED><![CDATA[NO]]></DEDICATED>
<FREE><![CDATA[2]]></FREE>
<ID><![CDATA[0]]></ID>
</CORE>
<CORE>
<CPUS><![CDATA[1:-1,7:-1]]></CPUS>
<DEDICATED><![CDATA[NO]]></DEDICATED>
<FREE><![CDATA[2]]></FREE>
<ID><![CDATA[1]]></ID>
</CORE>
<CORE>
<CPUS><![CDATA[2:-1,8:-1]]></CPUS>
<DEDICATED><![CDATA[NO]]></DEDICATED>
<FREE><![CDATA[2]]></FREE>
<ID><![CDATA[2]]></ID>
</CORE>
<CORE>
<CPUS><![CDATA[3:-1,9:-1]]></CPUS>
<DEDICATED><![CDATA[NO]]></DEDICATED>
<FREE><![CDATA[2]]></FREE>
<ID><![CDATA[3]]></ID>
</CORE>
<CORE>
<CPUS><![CDATA[4:-1,10:-1]]></CPUS>
<DEDICATED><![CDATA[NO]]></DEDICATED>
<FREE><![CDATA[2]]></FREE>
<ID><![CDATA[4]]></ID>
</CORE>
<CORE>
<CPUS><![CDATA[5:-1,11:-1]]></CPUS>
<DEDICATED><![CDATA[NO]]></DEDICATED>
<FREE><![CDATA[2]]></FREE>
<ID><![CDATA[5]]></ID>
</CORE>
<HUGEPAGE>
<FREE><![CDATA[0]]></FREE>
<PAGES><![CDATA[0]]></PAGES>
<SIZE><![CDATA[1048576]]></SIZE>
<USAGE><![CDATA[0]]></USAGE>
</HUGEPAGE>
<HUGEPAGE>
<FREE><![CDATA[0]]></FREE>
<PAGES><![CDATA[0]]></PAGES>
<SIZE><![CDATA[2048]]></SIZE>
<USAGE><![CDATA[0]]></USAGE>
</HUGEPAGE>
<MEMORY>
<DISTANCE><![CDATA[0]]></DISTANCE>
<FREE><![CDATA[0]]></FREE>
<TOTAL><![CDATA[65659868]]></TOTAL>
<USAGE><![CDATA[0]]></USAGE>
<USED><![CDATA[0]]></USED>
</MEMORY>
<NODE_ID><![CDATA[0]]></NODE_ID>
</NODE>
</NUMA_NODES>
</HOST_SHARE>
<VMS>
<ID>2</ID>
</VMS>
<TEMPLATE>
<ARCH><![CDATA[x86_64]]></ARCH>
<CLUSTER_ID><![CDATA[0]]></CLUSTER_ID>
<CPUSPEED><![CDATA[847]]></CPUSPEED>
<ERROR><![CDATA[Tue Sep 21 13:12:06 2021 : Error monitoring Host node1 (0): ]]></ERROR>
<HOSTNAME><![CDATA[node1]]></HOSTNAME>
<HYPERVISOR><![CDATA[kvm]]></HYPERVISOR>
<IM_MAD><![CDATA[kvm]]></IM_MAD>
<KVM_CPU_MODEL><![CDATA[Skylake-Client-IBRS]]></KVM_CPU_MODEL>
<KVM_CPU_MODELS><![CDATA[486 pentium pentium2 pentium3 pentiumpro coreduo n270 core2duo qemu32 kvm32 cpu64-rhel5 cpu64-rhel6 kvm64 qemu64 Conroe Penryn Nehalem Nehalem-IBRS Westmere Westmere-IBRS SandyBridge SandyBridge-IBRS IvyBridge IvyBridge-IBRS Haswell-noTSX Haswell-noTSX-IBRS Haswell Haswell-IBRS Broadwell-noTSX Broadwell-noTSX-IBRS Broadwell Broadwell-IBRS Skylake-Client Skylake-Client-IBRS Skylake-Server Skylake-Server-IBRS Icelake-Client Icelake-Server athlon phenom Opteron_G1 Opteron_G2 Opteron_G3 Opteron_G4 Opteron_G5 EPYC EPYC-IBPB]]></KVM_CPU_MODELS>
<KVM_MACHINES><![CDATA[pc-i440fx-rhel7.0.0 pc rhel6.0.0 rhel6.1.0 rhel6.2.0 rhel6.3.0 rhel6.4.0 rhel6.5.0 rhel6.6.0]]></KVM_MACHINES>
<MODELNAME><![CDATA[Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz]]></MODELNAME>
<NAME><![CDATA[node1]]></NAME>
<RESERVED_CPU><![CDATA[]]></RESERVED_CPU>
<RESERVED_MEM><![CDATA[]]></RESERVED_MEM>
<VERSION><![CDATA[6.0.0.2]]></VERSION>
<VM_MAD><![CDATA[kvm]]></VM_MAD>
</TEMPLATE>
<MONITORING/>
</HOST> -m -p 2
EXIT CODE : 255
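To make the long XML argument easier to read, here is a minimal Python sketch that pulls out the fields that seem relevant to the hook. The XML is a trimmed copy of the record above, and the state names in `STATE_NAMES` are my assumption about how the numeric host states map to names, so please check them against the documentation for your version:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <HOST> document from the execution record above;
# only the elements read below are kept.
host_xml = """
<HOST>
  <ID>0</ID>
  <NAME>node1</NAME>
  <STATE>3</STATE>
  <PREV_STATE>2</PREV_STATE>
  <VMS>
    <ID>2</ID>
  </VMS>
</HOST>
"""

# Assumed mapping of numeric host states to names (not taken from the log).
STATE_NAMES = {2: "MONITORED", 3: "ERROR"}

host = ET.fromstring(host_xml)
state = int(host.findtext("STATE"))
prev_state = int(host.findtext("PREV_STATE"))
# IDs of the VMs that were running on the host when the hook fired.
vm_ids = [int(element.text) for element in host.findall("VMS/ID")]

summary = {
    "host": host.findtext("NAME"),
    "transition": (STATE_NAMES.get(prev_state), STATE_NAMES.get(state)),
    "vm_ids": vm_ids,
}
print(summary)
```

So the hook was triggered for node1 with one VM (ID 2) on it, and the arguments `-m -p 2` were passed through as configured in the template.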
I hope that someone can help me out.
Thanks in advance.