Hi,
I’ve enabled the host error hook in oned.conf:
$ grep -A5 ^HOST_HOOK /etc/one/oned.conf
HOST_HOOK = [
NAME = "error",
ON = "ERROR",
COMMAND = "ft/host_error.rb",
ARGUMENTS = "$ID -m -p 2",
REMOTE = "no" ]
Then I restarted the opennebula service and shut down the KVM node.
The VM status has changed:
/var/log/one/276.log
Fri Oct 11 16:17:02 2019 [Z0][VM][I]: New LCM state is RUNNING
Fri Oct 11 16:27:44 2019 [Z0][VM][I]: New LCM state is UNKNOWN
oned noticed the host error and tried to execute the hook, but something went wrong:
/var/log/one/oned.log
Fri Oct 11 16:31:06 2019 [Z0][InM][I]: ssh: connect to host one-kvm-node-02-int port 22: No route to host
$ grep HKM /var/log/one/oned.log
Fri Oct 11 16:24:48 2019 [Z0][HKM][I]: Starting Hook Manager...
Fri Oct 11 16:24:48 2019 [Z0][HKM][I]: Loading Hook Manager driver.
Fri Oct 11 16:24:48 2019 [Z0][HKM][I]: Hook Manager started.
Fri Oct 11 16:24:48 2019 [Z0][HKM][I]: Hook Manager loaded
Fri Oct 11 16:33:44 2019 [Z0][HKM][D]: Message received: LOG I 7 Command execution failed (exit code: 255): /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
Fri Oct 11 16:33:44 2019 [Z0][HKM][D]: Message received: EXECUTE FAILURE 7 error: -
Where should I dig deeper to find the cause of the error?
Versions of the related components and OS (frontend, hypervisors, VMs):
OpenNebula 5.8.5
KVM nodes: CentOS Linux release 7.7.1908 (Core)
VM: Ubuntu 19.04
ahuertas
(Alejandro Huertas)
October 11, 2019, 2:01pm
2
gray380:
one-kvm-node-02-int
That name must be resolvable from the frontend. Add it to /etc/hosts
and make sure you can ping it.
Hi,
Sorry, you’ve missed my point. I shut the node down on purpose to test the hook, so this behavior is expected:
Fri Oct 11 16:40:55 2019 [Z0][InM][I]: ssh: connect to host one-kvm-node-02-int port 22: No route to host
Fri Oct 11 16:40:58 2019 [Z0][InM][I]: Command execution failed (exit code: 255): 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 7 one-kvm-node-02-int; else exit 42; fi'
Fri Oct 11 16:40:59 2019 [Z0][InM][I]: Command execution failed (exit code: 255): 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 60 7 one-kvm-node-02-int; else exit 42; fi'
Fri Oct 11 16:40:59 2019 [Z0][ONE][E]: Error monitoring Host one-kvm-node-02-int (7): -
The node name already resolves.
Hi,
I think that most of your open issues are due to incomplete reconfiguration after the change of the default XMLRPC port.
Hello,
I’ve changed the default port from 2633 to 26633 in /etc/one/oned.conf:
PORT = 26633
and, accordingly, in /etc/one/sunstone-server.conf:
:one_xmlrpc: http://localhost:26633/RPC2
so Sunstone works, but there is an issue with the CLI:
[oneadmin@one-srv-01 ~]$ onevm list
Connection refused - connect(2)
I assume it still tries to connect to port 2633.
Could you advise how to solve this problem?
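For reference, the CLI tools do not read the port from oned.conf. They take the endpoint from the ONE_XMLRPC environment variable (or from ~/.one/one_endpoint) and otherwise fall back to http://localhost:2633/RPC2. A minimal sketch, assuming oned is listening on port 26633 as configured above:

```shell
# Assumption: oned listens on 26633 (PORT = 26633 in oned.conf, as above).
# Point the CLI tools at the non-default XML-RPC endpoint for this session.
export ONE_XMLRPC="http://localhost:26633/RPC2"
echo "CLI endpoint: $ONE_XMLRPC"

# After this, a command such as "onevm list" should contact port 26633
# instead of the default 2633.
```

Putting the export line in the oneadmin user's shell profile would make it persistent. That is a suggestion, not something tested in this thread.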
gray380
October 15, 2019, 11:12am
5
Yes, it seems I should have grepped all the conf files for “2633” before changing the default XMLRPC port.
But I don’t think this issue is related to the port configuration.
gray380
October 16, 2019, 10:49am
6
So I’ve reverted all the port settings:
$ grep -r 2633 /etc/one/* | grep -v ':#'
/etc/one/econe.conf::one_xmlrpc: http://localhost:2633/RPC2
/etc/one/oned.conf:PORT = 2633
/etc/one/oneflow-server.conf::one_xmlrpc: http://localhost:2633/RPC2
/etc/one/onegate-server.conf::one_xmlrpc: http://localhost:2633/RPC2
/etc/one/sched.conf:ONE_XMLRPC = "http://localhost:2633/RPC2"
/etc/one/sunstone-server.conf::one_xmlrpc: http://localhost:2633/RPC2
and rebooted the OpenNebula server, but the hook still fails to execute:
Wed Oct 16 13:25:53 2019 [Z0][HKM][D]: Message received: LOG I 7 Command execution failed (exit code: 255): /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
Wed Oct 16 13:25:53 2019 [Z0][HKM][D]: Message received: EXECUTE FAILURE 7 error: -
ahuertas
(Alejandro Huertas)
October 16, 2019, 1:13pm
7
Hello @gray380
If you execute the hook manually, does it fail?
I don’t know, it outputs nothing to the console:
[oneadmin@one-srv-01 ~]$ onehost list
ID NAME CLUSTER TVM ALLOCATED_CPU ALLOCATED_MEM STAT
10 one-lxd-node-01 tsu_kvm 1 100 / 800 (12%) 2G / 19.6G (10%) on
9 one-kvm-node-01 tsu_kvm 7 800 / 1600 (50%) 20G / 48G (41%) on
7 one-kvm-node-02 tsu_kvm 4 600 / 3200 (18%) 18G / 48G (37%) err
[oneadmin@one-srv-01 ~]$ /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
[oneadmin@one-srv-01 ~]$
ahuertas
(Alejandro Huertas)
October 16, 2019, 2:36pm
9
Can you please check the exit code (echo $?)?
ahuertas
(Alejandro Huertas)
October 22, 2019, 7:23am
11
That exit code was taken right after running the failing hook command, right?
gray380
October 23, 2019, 11:02am
12
Sorry, that was not right after…
So, here we are:
Wed Oct 23 12:12:21 2019 [Z0][HKM][I]: Starting Hook Manager...
Wed Oct 23 12:12:21 2019 [Z0][HKM][I]: Loading Hook Manager driver.
Wed Oct 23 12:12:21 2019 [Z0][HKM][I]: Hook Manager started.
Wed Oct 23 12:12:21 2019 [Z0][HKM][I]: Hook Manager loaded
12:30 one-kvm-node-02 turned off
Wed Oct 23 12:38:21 2019 [Z0][HKM][D]: Message received: LOG I 7 Command execution failed (exit code: 255): /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
Wed Oct 23 12:38:21 2019 [Z0][HKM][D]: Message received: EXECUTE FAILURE 7 error: -
[oneadmin@one-srv-01 ~]$ /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
[oneadmin@one-srv-01 ~]$ echo $?
255
[oneadmin@one-srv-01 ~]$ logout
[root@one-srv-01 sadm]# /var/lib/one/remotes//hooks/ft/host_error.rb 7 -m -p 2
[root@one-srv-01 sadm]# echo $?
255
ahuertas
(Alejandro Huertas)
October 23, 2019, 11:24am
13
As you can see, the exit code is 255, so please check that your script returns 0 when everything works correctly. If it fails, you can add some log output to see what is going wrong.
gray380
October 23, 2019, 11:27am
14
That was my very first question: which logs do I have to check?
gray380
October 23, 2019, 11:52am
15
BTW, if the node is up and running, the script returns 0:
[oneadmin@one-srv-01 ~]$ onehost list
ID NAME CLUSTER TVM ALLOCATED_CPU ALLOCATED_MEM STAT
10 one-lxd-node-01 tsu_kvm 1 100 / 800 (12%) 2G / 19.6G (10%) on
9 one-kvm-node-01 tsu_kvm 7 800 / 1600 (50%) 20G / 48G (41%) on
7 one-kvm-node-02 tsu_kvm 5 700 / 3200 (21%) 20G / 48G (41%) on
[oneadmin@one-srv-01 ~]$ /var/lib/one/remotes//hooks/ft/host_error.rb 9 -m -p 2
[oneadmin@one-srv-01 ~]$ echo $?
0
gray380
October 23, 2019, 12:01pm
16
Okay, there is a /var/log/one/host_error.log:
[2019-10-23 12:38:21 +0300][HOST 7][E] Fencing error
[2019-10-23 12:38:21 +0300][HOST 7][E] Exiting due to previous error.
[2019-10-23 12:39:17 +0300][HOST 7][I] Hook launched
[2019-10-23 12:39:17 +0300][HOST 7][I] hostname: one-kvm-node-02-int
[2019-10-23 12:39:17 +0300][HOST 7][I] Wait 2 cycles.
[2019-10-23 12:39:17 +0300][HOST 7][I] Sleeping 360 seconds.
[2019-10-23 12:45:17 +0300][HOST 7][I] Fencing enabled
[2019-10-23 12:45:17 +0300][HOST 7][E] Fence host not configured, please edit ft/fence_host.sh
So I’ll look for fence_host.sh in the manuals… if it is covered at all.
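The manual check used in this thread (run the hook, read the exit code, then read the hook's own log) can be wrapped in a small helper. This is only a sketch: run_and_report is a made-up name, and the example paths are the ones quoted in the posts above.

```shell
#!/bin/bash
# Hypothetical helper: run a command, print its exit code, then show the
# last lines of a log file so the failure reason sits next to the status.
# Usage: run_and_report <logfile> <command> [args...]
run_and_report() {
    local log="$1"
    shift
    local rc=0
    "$@" || rc=$?              # run the command and keep its exit status
    echo "exit code: $rc"
    if [ -r "$log" ]; then
        tail -n 20 "$log"      # most recent log entries
    else
        echo "log not readable: $log"
    fi
    return "$rc"
}

# Example with the paths from this thread (7 is the ID of the failed host):
# run_and_report /var/log/one/host_error.log \
#     /var/lib/one/remotes/hooks/ft/host_error.rb 7 -m -p 2
```

The helper returns the command's own exit status, so it can be used in a script the same way as the hook itself.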
ahuertas
(Alejandro Huertas)
October 23, 2019, 4:09pm
17
@gray380 maybe this link is useful for you.
gray380
October 24, 2019, 7:34am
18
Thanks, but not really, because it lacks examples of using a fencing mechanism from the OpenNebula perspective, so it will be some kind of homework.
There is an example with iLO fencing and a plaintext password in the conf file, which is not good, imo.
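One way to keep the password out of the script itself is to read it from a separate file that only oneadmin can read. The following is a sketch under assumptions, not the documented OpenNebula method: read_secret and the /var/lib/one/.fence_pass path are invented for illustration, the FENCE_IP variable is assumed to be set earlier in fence_host.sh, and stat -c is the GNU coreutils form (as on CentOS 7).

```shell
#!/bin/bash
# Hypothetical helper: print a secret only if its file is private to the owner.
read_secret() {
    local file="$1" mode
    mode=$(stat -c '%a' "$file" 2>/dev/null) || {
        echo "cannot stat $file" >&2
        return 1
    }
    case "$mode" in
        600|400) cat "$file" ;;
        *)
            echo "refusing $file: mode $mode, expected 600 or 400" >&2
            return 1
            ;;
    esac
}

# Possible use inside ft/fence_host.sh (file path and user name are placeholders):
# FENCE_PASS=$(read_secret /var/lib/one/.fence_pass) || exit 1
# fence_ilo -a "$FENCE_IP" -l <username> -p "$FENCE_PASS"
```

The password still ends up on the fence agent's command line, so this only removes it from the script and from version control. The script must also exit non-zero when fencing fails, since the hook treats exit code 0 as a successful fence.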