HOST failure HOOK execution failed

Hi,

I’ve enabled the host hook in oned.conf:

$ grep -A5 ^HOST_HOOK /etc/one/oned.conf 
HOST_HOOK = [
    NAME      = "error",
    ON        = "ERROR",
    COMMAND   = "ft/host_error.rb",
    ARGUMENTS = "$ID -m -p 2",
    REMOTE    = "no" ]

Then I restarted the opennebula service and shut down the KVM node.

The VM status has changed:

/var/log/one/276.log
Fri Oct 11 16:17:02 2019 [Z0][VM][I]: New LCM state is RUNNING
Fri Oct 11 16:27:44 2019 [Z0][VM][I]: New LCM state is UNKNOWN

oned noticed a host error and tried to execute the hook, but something went wrong:

/var/log/one/oned.log
Fri Oct 11 16:31:06 2019 [Z0][InM][I]: ssh: connect to host one-kvm-node-02-int port 22: No route to host

$ grep HKM /var/log/one/oned.log
Fri Oct 11 16:24:48 2019 [Z0][HKM][I]: Starting Hook Manager...
Fri Oct 11 16:24:48 2019 [Z0][HKM][I]: Loading Hook Manager driver.
Fri Oct 11 16:24:48 2019 [Z0][HKM][I]: Hook Manager started.
Fri Oct 11 16:24:48 2019 [Z0][HKM][I]:  Hook Manager loaded
Fri Oct 11 16:33:44 2019 [Z0][HKM][D]: Message received: LOG I 7 Command execution failed (exit code: 255): /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
Fri Oct 11 16:33:44 2019 [Z0][HKM][D]: Message received: EXECUTE FAILURE 7 error: -

Where should I dig deeper to find the cause of the error?


Versions of the related components and OS (frontend, hypervisors, VMs):
OpenNebula 5.8.5
KVM nodes: CentOS Linux release 7.7.1908 (Core)
VM: Ubuntu 19.04

That name must be resolvable from the frontend, so add it to /etc/hosts and make sure you can ping it.
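
A quick way to check resolution from the frontend, as a minimal sketch: `resolves` is a hypothetical helper name, and it assumes `getent` is available (it is on glibc-based systems such as CentOS).

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the name resolves through the
# system resolver (/etc/hosts, DNS, ...) as configured in nsswitch.conf.
resolves() {
    getent hosts "$1" > /dev/null
}

# Example with the node name from the log:
# resolves one-kvm-node-02-int && echo "resolves" || echo "does not resolve"
```

Resolution alone does not prove the node is reachable; `ping -c 1 <name>` covers that part.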

Hi,

Sorry, you’ve missed my point. I shut down the node on purpose to test the hook, so this behavior is expected:

Fri Oct 11 16:40:55 2019 [Z0][InM][I]: ssh: connect to host one-kvm-node-02-int port 22: No route to host
Fri Oct 11 16:40:58 2019 [Z0][InM][I]: Command execution failed (exit code: 255): 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 7 one-kvm-node-02-int; else                              exit 42; fi'
Fri Oct 11 16:40:59 2019 [Z0][InM][I]: Command execution failed (exit code: 255): 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 7 one-kvm-node-02-int; else                              exit 42; fi'
Fri Oct 11 16:40:59 2019 [Z0][ONE][E]: Error monitoring Host one-kvm-node-02-int (7): -

The node name already resolves.

Hi,

I think that most of your open issues are due to incomplete reconfiguration after the change of the default XMLRPC port.

Yes, it seems I should have grepped for “2633” in all conf files before changing the default XMLRPC port.
But this case is not related to the port configuration, imo.

So I’ve reverted all the port settings:

$ grep -r 2633 /etc/one/* | grep -v ':#'
/etc/one/econe.conf::one_xmlrpc: http://localhost:2633/RPC2
/etc/one/oned.conf:PORT = 2633
/etc/one/oneflow-server.conf::one_xmlrpc: http://localhost:2633/RPC2
/etc/one/onegate-server.conf::one_xmlrpc: http://localhost:2633/RPC2
/etc/one/sched.conf:ONE_XMLRPC = "http://localhost:2633/RPC2"
/etc/one/sunstone-server.conf::one_xmlrpc: http://localhost:2633/RPC2

I rebooted the OpenNebula server, but the hook still failed to execute:

Wed Oct 16 13:25:53 2019 [Z0][HKM][D]: Message received: LOG I 7 Command execution failed (exit code: 255): /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
Wed Oct 16 13:25:53 2019 [Z0][HKM][D]: Message received: EXECUTE FAILURE 7 error: -

Hello @gray380

If you execute the hook manually, does it fail?

I don’t know, it outputs nothing to the console:

[oneadmin@one-srv-01 ~]$ onehost list
  ID NAME            CLUSTER   TVM      ALLOCATED_CPU      ALLOCATED_MEM STAT  
  10 one-lxd-node-01 tsu_kvm     1    100 / 800 (12%)   2G / 19.6G (10%) on    
   9 one-kvm-node-01 tsu_kvm     7   800 / 1600 (50%)    20G / 48G (41%) on    
   7 one-kvm-node-02 tsu_kvm     4   600 / 3200 (18%)    18G / 48G (37%) err   
[oneadmin@one-srv-01 ~]$ /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
[oneadmin@one-srv-01 ~]$

Can you please check the exit code (echo $?)?

Of course:

$ echo $?
0

That exit code is right after executing the hook failing command, right?
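
`$?` only holds the status of the most recent command, so anything run in between overwrites it. A minimal sketch of capturing it safely; `run_and_report` is a hypothetical helper name, not part of OpenNebula.

```shell
#!/bin/bash
# Hypothetical helper: run a command and save its exit status straight away,
# before any other command can overwrite $?.
run_and_report() {
    "$@"
    local rc=$?
    echo "exit code: $rc"
    return "$rc"
}

# Example (hook path and arguments as they appear in oned.log):
# run_and_report /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
```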

Sorry, that was not right after…
So, here we are:

Wed Oct 23 12:12:21 2019 [Z0][HKM][I]: Starting Hook Manager...
Wed Oct 23 12:12:21 2019 [Z0][HKM][I]: Loading Hook Manager driver.
Wed Oct 23 12:12:21 2019 [Z0][HKM][I]: Hook Manager started.
Wed Oct 23 12:12:21 2019 [Z0][HKM][I]:  Hook Manager loaded

12:30 one-kvm-node-02 turned off

Wed Oct 23 12:38:21 2019 [Z0][HKM][D]: Message received: LOG I 7 Command execution failed (exit code: 255): /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
Wed Oct 23 12:38:21 2019 [Z0][HKM][D]: Message received: EXECUTE FAILURE 7 error: -


[oneadmin@one-srv-01 ~]$ /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
[oneadmin@one-srv-01 ~]$ echo $?
255

[oneadmin@one-srv-01 ~]$ logout
[root@one-srv-01 sadm]# /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
[root@one-srv-01 sadm]# echo $?
255

As you can see, the exit code is 255, so please check that your script returns 0 when everything works correctly. If it fails, you can add some log output to see what’s going wrong.

That was my very first question :slight_smile:

Which logs do I have to check?

BTW, if the node is up and running, the script returns 0:

[oneadmin@one-srv-01 ~]$ onehost list
  ID NAME            CLUSTER   TVM      ALLOCATED_CPU      ALLOCATED_MEM STAT  
  10 one-lxd-node-01 tsu_kvm     1    100 / 800 (12%)   2G / 19.6G (10%) on    
   9 one-kvm-node-01 tsu_kvm     7   800 / 1600 (50%)    20G / 48G (41%) on    
   7 one-kvm-node-02 tsu_kvm     5   700 / 3200 (21%)    20G / 48G (41%) on 

[oneadmin@one-srv-01 ~]$ /var/lib/one/remotes//hooks/ft/host_error.rb 9 -m -p 2
[oneadmin@one-srv-01 ~]$ echo $?
0

Okay, there is a /var/log/one/host_error.log:

[2019-10-23 12:38:21 +0300][HOST 7][E] Fencing error
[2019-10-23 12:38:21 +0300][HOST 7][E] Exiting due to previous error.
[2019-10-23 12:39:17 +0300][HOST 7][I] Hook launched
[2019-10-23 12:39:17 +0300][HOST 7][I] hostname: one-kvm-node-02-int
[2019-10-23 12:39:17 +0300][HOST 7][I] Wait 2 cycles.
[2019-10-23 12:39:17 +0300][HOST 7][I] Sleeping 360 seconds.
[2019-10-23 12:45:17 +0300][HOST 7][I] Fencing enabled
[2019-10-23 12:45:17 +0300][HOST 7][E] Fence host not configured, please edit ft/fence_host.sh

So I will look for fence_host.sh in the manuals… if any.
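
For reference, the error lines for a single host can be pulled out of /var/log/one/host_error.log with a small filter. This is only a sketch: `host_errors` is a hypothetical helper name, and it assumes the "[timestamp][HOST id][E] message" layout seen in the log above.

```shell
#!/bin/bash
# Hypothetical helper: print only the error-level lines for one host ID.
# The log path defaults to the file shown above; pass another as argument 2.
host_errors() {
    local host_id="$1" log_file="${2:-/var/log/one/host_error.log}"
    # -F treats the pattern as a fixed string, so the brackets are literal.
    grep -F "[HOST ${host_id}][E]" "$log_file"
}

# Example:
# host_errors 7
```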

@gray380 maybe this link is useful for you.

Thanks, but not really, because it lacks examples of using a fencing mechanism from the OpenNebula perspective. So it will be some kind of homework :slight_smile:

There is an example with iLO fencing and a plaintext password in the conf file, which is not good, imo.
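
One way to keep the password out of the script itself might look like the sketch below. It is not the fence_host.sh shipped with OpenNebula: `FENCE_BIN`, `FENCE_SECRET_FILE`, the `fence_host` function and the option names passed to the agent are all hypothetical and would have to be replaced with whatever your fence agent really accepts. By default it is a dry run that only prints the command.

```shell
#!/bin/bash
# Sketch only: NOT the fence_host.sh shipped with OpenNebula.
# FENCE_BIN, FENCE_SECRET_FILE and the option names below are hypothetical.
FENCE_BIN="${FENCE_BIN:-echo}"   # defaults to a dry run that only prints
FENCE_SECRET_FILE="${FENCE_SECRET_FILE:-$HOME/.fence_secret}"

fence_host() {
    local bmc_addr="$1" user="$2" mode

    # Refuse to continue unless the secret file exists (GNU stat assumed).
    mode=$(stat -c '%a' "$FENCE_SECRET_FILE" 2>/dev/null) || {
        echo "secret file missing: $FENCE_SECRET_FILE" >&2
        return 1
    }
    # Refuse to continue unless the file is readable by its owner only.
    case "$mode" in
        600|400) ;;
        *) echo "secret file mode $mode is too permissive" >&2; return 1 ;;
    esac

    # Hand the agent the file path, not the password, so the secret stays
    # out of this script and out of the process list.
    "$FENCE_BIN" --address "$bmc_addr" --user "$user" \
        --secret-file "$FENCE_SECRET_FILE"
}
```

The function returns the agent's exit status, so a failed fence stays non-zero and the hook is reported as failed rather than silently succeeding.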