I have a three-node Raft cluster:
onenode-1 (leader and FIP)
onenode-2 (follower)
onenode-3 (follower)
At 09:31:08 the RPC on the leader node became unavailable.
The web interface (Sunstone) works.
The MySQL database is working.
The leader node still sends keepalive messages.
The FIP remains on the leader node (10.191.171.100):
2: ens3 inet 10.191.171.9/23 brd 10.191.171.255 scope global ens3
2: ens3 inet 10.191.171.100/23 scope global secondary ens3
The OpenNebula service is running:
root@onenode-1:~# systemctl status opennebula
● opennebula.service - OpenNebula Cloud Controller Daemon
Loaded: loaded (/lib/systemd/system/opennebula.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2019-04-19 15:00:01 +07; 2 days ago
Port 2633 is open, but its listening socket shows a non-zero Recv-Q (16):
root@onenode-1:~# netstat -lpn4
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 16 0 0.0.0.0:2633 0.0.0.0:* LISTEN 30262/oned
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 2053/mysqld
tcp 0 0 0.0.0.0:9869 0.0.0.0:* LISTEN 2545/ruby
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 612/rpcbind
tcp 0 0 0.0.0.0:29876 0.0.0.0:* LISTEN 2171/python2
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 614/systemd-resolve
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 2071/sshd
tcp 0 0 0.0.0.0:10050 0.0.0.0:* LISTEN 20003/zabbix_agentd
udp 178944 0 0.0.0.0:4124 0.0.0.0:* 30398/collectd
udp 0 0 127.0.0.53:53 0.0.0.0:* 614/systemd-resolve
udp 0 0 0.0.0.0:111 0.0.0.0:* 612/rpcbind
udp 0 0 0.0.0.0:788 0.0.0.0:* 612/rpcbind
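On Linux, the Recv-Q of a listening TCP socket normally counts connections that are waiting to be accepted, so a value that stays above zero on port 2633 would be consistent with oned no longer accepting new RPC connections. The sketch below flags such sockets in netstat output. The function name is hypothetical, and the column layout is assumed to match the listing above.

```python
def listen_backlog(netstat_text):
    """Return {local_address: recv_q} for TCP LISTEN sockets with Recv-Q > 0."""
    flagged = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            recv_q = int(fields[1])
            if recv_q > 0:
                flagged[fields[3]] = recv_q
    return flagged


# Three lines copied from the netstat output in this post.
SAMPLE = """\
tcp 16 0 0.0.0.0:2633 0.0.0.0:* LISTEN 30262/oned
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN 2053/mysqld
udp 178944 0 0.0.0.0:4124 0.0.0.0:* 30398/collectd
"""

print(listen_backlog(SAMPLE))  # {'0.0.0.0:2633': 16}
```

Only the oned socket is reported; the UDP line is skipped because it has no LISTEN state.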
As a result, the cluster stops working. This has already happened twice.
Restarting the opennebula service fixes the problem, but that is not an acceptable solution,
because clients in the cloud cannot work until an administrator notices the problem.
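Until the cause is found, the hang could at least be detected automatically instead of waiting for an administrator. The sketch below is one possible probe, not an OpenNebula tool: it sends an XML-RPC request with a timeout and reports whether any HTTP response comes back. The function name, the 5 second timeout, and the use of `system.listMethods` as the request are assumptions. A plain TCP connect would not be enough, because the kernel can complete the handshake even when oned never accepts the connection.

```python
import http.client
import xmlrpc.client


def rpc_responds(host, port=2633, path="/RPC2", timeout=5.0):
    """Return True if the XML-RPC endpoint sends any HTTP response in time."""
    # Assumption: system.listMethods only serves as a harmless request body.
    # Any HTTP reply, even an XML-RPC fault, counts as "responding".
    body = xmlrpc.client.dumps((), methodname="system.listMethods")
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("POST", path, body=body,
                     headers={"Content-Type": "text/xml"})
        conn.getresponse().read()
        return True
    except (OSError, http.client.HTTPException):
        # Covers timeouts, refused or reset connections, and malformed replies.
        return False
    finally:
        conn.close()
```

For example, `rpc_responds("10.191.171.9")` could be called periodically from a monitoring agent, which would then raise an alert or restart the service when it returns False. This only shortens the outage; it does not explain why oned stops answering.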
Logs
onenode-1
onezone
root@onenode-1:~# onezone show 0
execution expired
/var/log/one/oned.log
Sun Apr 21 09:30:19 2019 [Z0][ReM][D]: Req:1472 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:30:19 2019 [Z0][ReM][D]: Req:1472 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>0</..."
Sun Apr 21 09:30:19 2019 [Z0][ReM][D]: Req:9936 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:30:19 2019 [Z0][ReM][D]: Req:9936 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:30:19 2019 [Z0][ReM][D]: Req:2000 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:30:19 2019 [Z0][ReM][D]: Req:2000 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:30:44 2019 [Z0][InM][D]: Host nodehost-3 (4) successfully monitored.
Sun Apr 21 09:30:49 2019 [Z0][ReM][D]: Req:8016 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:30:49 2019 [Z0][ReM][D]: Req:8016 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>0</..."
Sun Apr 21 09:30:49 2019 [Z0][ReM][D]: Req:2752 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:30:49 2019 [Z0][ReM][D]: Req:2752 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:30:49 2019 [Z0][ReM][D]: Req:6800 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:30:49 2019 [Z0][ReM][D]: Req:6800 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:31:00 2019 [Z0][InM][D]: Host nodehost-2 (1) successfully monitored.
Sun Apr 21 09:31:00 2019 [Z0][VMM][D]: VM 1 successfully monitored: DISK_SIZE=[ID=0,SIZE=409] DISK_SIZE=[ID=1,SIZE=1] STATE=a CPU=0.0 MEMORY=786432 NETRX=20816 NETTX=20956 DISKRDBYTES=193706860 DISKWRBYTES=214742528 DISKRDIOPS=15754 DISKWRIOPS=23959
Sun Apr 21 09:31:19 2019 [Z0][InM][D]: Host nodehost-1 (0) successfully monitored.
Sun Apr 21 09:31:19 2019 [Z0][VMM][D]: VM 2 successfully monitored: DISK_SIZE=[ID=0,SIZE=28] DISK_SIZE=[ID=1,SIZE=1] STATE=a CPU=0.0 MEMORY=131072 DISKRDBYTES=13868632 DISKWRBYTES=2501632 DISKRDIOPS=986 DISKWRIOPS=1676
Sun Apr 21 09:31:19 2019 [Z0][ReM][D]: Req:6672 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:31:19 2019 [Z0][ReM][D]: Req:6672 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>0</..."
Sun Apr 21 09:31:19 2019 [Z0][ReM][D]: Req:1616 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:31:19 2019 [Z0][ReM][D]: Req:1616 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:31:19 2019 [Z0][ReM][D]: Req:7600 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:31:19 2019 [Z0][ReM][D]: Req:7600 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:31:45 2019 [Z0][InM][D]: Host nodehost-3 (4) successfully monitored.
Sun Apr 21 09:31:49 2019 [Z0][ReM][D]: Req:2800 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:31:49 2019 [Z0][ReM][D]: Req:2800 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>0</..."
Sun Apr 21 09:31:49 2019 [Z0][ReM][D]: Req:4336 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:31:49 2019 [Z0][ReM][D]: Req:4336 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:31:49 2019 [Z0][ReM][D]: Req:9856 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:31:49 2019 [Z0][ReM][D]: Req:9856 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:32:01 2019 [Z0][InM][D]: Host nodehost-2 (1) successfully monitored.
Sun Apr 21 09:32:01 2019 [Z0][VMM][D]: VM 1 successfully monitored: DISK_SIZE=[ID=0,SIZE=409] DISK_SIZE=[ID=1,SIZE=1] STATE=a CPU=0.0 MEMORY=786432 NETRX=20816 NETTX=20956 DISKRDBYTES=193706860 DISKWRBYTES=214763008 DISKRDIOPS=15754 DISKWRIOPS=23963
Sun Apr 21 09:32:19 2019 [Z0][InM][D]: Host nodehost-1 (0) successfully monitored.
Sun Apr 21 09:32:19 2019 [Z0][VMM][D]: VM 2 successfully monitored: DISK_SIZE=[ID=0,SIZE=28] DISK_SIZE=[ID=1,SIZE=1] STATE=a CPU=0.0 MEMORY=131072 DISKRDBYTES=13868632 DISKWRBYTES=2501632 DISKRDIOPS=986 DISKWRIOPS=1676
Sun Apr 21 09:32:19 2019 [Z0][ReM][D]: Req:2112 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:32:19 2019 [Z0][ReM][D]: Req:2112 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>0</..."
Sun Apr 21 09:32:19 2019 [Z0][ReM][D]: Req:8560 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:32:19 2019 [Z0][ReM][D]: Req:8560 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:32:19 2019 [Z0][ReM][D]: Req:320 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:32:19 2019 [Z0][ReM][D]: Req:320 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:32:41 2019 [Z0][VMM][I]: --Mark--
Sun Apr 21 09:32:46 2019 [Z0][InM][D]: Host nodehost-3 (4) successfully monitored.
onenode-2
onezone
root@onenode-2:~# onezone show 0
ZONE 0 INFORMATION
ID : 0
NAME : OpenNebula
ZONE SERVERS
ID NAME ENDPOINT
0 onenode-1 http://10.191.171.9:2633/RPC2
1 onenode-2 http://10.191.171.30:2633/RPC2
2 onenode-3 http://10.191.171.21:2633/RPC2
HA & FEDERATION SYNC STATUS
ID NAME STATE TERM INDEX COMMIT VOTE FED_INDEX
0 onenode-1 error - - - - -
1 onenode-2 follower 1102 205424 205424 -1 -1
2 onenode-3 follower 1102 194443 194443 0 -1
ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"
/var/log/one/oned.log
Sun Apr 21 09:28:21 2019 [Z0][ReM][D]: Req:8384 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:28:21 2019 [Z0][ReM][D]: Req:8384 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:28:51 2019 [Z0][ReM][D]: Req:9200 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:28:51 2019 [Z0][ReM][D]: Req:9200 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:29:21 2019 [Z0][ReM][D]: Req:8880 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:29:21 2019 [Z0][ReM][D]: Req:8880 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:29:51 2019 [Z0][ReM][D]: Req:3216 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:29:51 2019 [Z0][ReM][D]: Req:3216 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:30:21 2019 [Z0][ReM][D]: Req:3248 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:30:21 2019 [Z0][ReM][D]: Req:3248 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:30:38 2019 [Z0][MKP][I]: --Mark--
Sun Apr 21 09:30:38 2019 [Z0][ImM][I]: --Mark--
Sun Apr 21 09:30:38 2019 [Z0][InM][I]: --Mark--
Sun Apr 21 09:30:51 2019 [Z0][ReM][D]: Req:9936 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:30:51 2019 [Z0][ReM][D]: Req:9936 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:30:57 2019 [Z0][VMM][I]: --Mark--
Sun Apr 21 09:31:21 2019 [Z0][ReM][D]: Req:1840 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:31:21 2019 [Z0][ReM][D]: Req:1840 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:31:51 2019 [Z0][ReM][D]: Req:3440 UID:0 one.zone.raftstatus invoked
Sun Apr 21 09:31:51 2019 [Z0][ReM][D]: Req:3440 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:32:21 2019 [Z0][ReM][D]: Req:5344 UID:0 one.zone.raftstatus invoked
onenode-3
onezone
root@onenode-3:~# onezone show 0
ZONE 0 INFORMATION
ID : 0
NAME : OpenNebula
ZONE SERVERS
ID NAME ENDPOINT
0 onenode-1 http://10.191.171.9:2633/RPC2
1 onenode-2 http://10.191.171.30:2633/RPC2
2 onenode-3 http://10.191.171.21:2633/RPC2
HA & FEDERATION SYNC STATUS
ID NAME STATE TERM INDEX COMMIT VOTE FED_INDEX
0 onenode-1 error - - - - -
1 onenode-2 follower 1102 205424 205424 -1 -1
2 onenode-3 follower 1102 194443 194443 0 -1
ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"
/var/log/one/oned.log
Mon Apr 22 09:29:53 2019 [Z0][ReM][D]: Req:8160 UID:0 one.zone.raftstatus invoked
Mon Apr 22 09:29:53 2019 [Z0][ReM][D]: Req:8160 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Mon Apr 22 09:30:23 2019 [Z0][ReM][D]: Req:9664 UID:0 one.zone.raftstatus invoked
Mon Apr 22 09:30:23 2019 [Z0][ReM][D]: Req:9664 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Mon Apr 22 09:30:53 2019 [Z0][ReM][D]: Req:7648 UID:0 one.zone.raftstatus invoked
Mon Apr 22 09:30:53 2019 [Z0][ReM][D]: Req:7648 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Mon Apr 22 09:31:23 2019 [Z0][ReM][D]: Req:256 UID:0 one.zone.raftstatus invoked
Mon Apr 22 09:31:23 2019 [Z0][ReM][D]: Req:256 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Mon Apr 22 09:31:23 2019 [Z0][MKP][I]: --Mark--
Mon Apr 22 09:31:23 2019 [Z0][ImM][I]: --Mark--
Mon Apr 22 09:31:23 2019 [Z0][InM][I]: --Mark--
Mon Apr 22 09:31:40 2019 [Z0][VMM][I]: --Mark--
Mon Apr 22 09:31:53 2019 [Z0][ReM][D]: Req:7008 UID:0 one.zone.raftstatus invoked
Mon Apr 22 09:31:53 2019 [Z0][ReM][D]: Req:7008 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Mon Apr 22 09:32:23 2019 [Z0][ReM][D]: Req:8496 UID:0 one.zone.raftstatus invoked
Mon Apr 22 09:32:23 2019 [Z0][ReM][D]: Req:8496 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Mon Apr 22 09:32:53 2019 [Z0][ReM][D]: Req:9472 UID:0 one.zone.raftstatus invoked
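The `onezone show 0` output from the two followers can also be checked mechanically. The sketch below parses the "HA & FEDERATION SYNC STATUS" table and reports whether any server is in the leader state, which servers are in error, and how far apart the log indexes are. The function names are hypothetical, and the eight-column layout is assumed from the output pasted above.

```python
def parse_sync_status(text):
    """Parse the HA sync status rows of `onezone show` into dicts."""
    rows = []
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:3] == ["ID", "NAME", "STATE"]:
            in_table = True  # header of the sync status table
            continue
        if in_table:
            # Assumed layout: ID NAME STATE TERM INDEX COMMIT VOTE FED_INDEX
            if len(fields) != 8 or not fields[0].isdigit():
                break  # first line that is not a table row ends the table
            index = None if fields[4] == "-" else int(fields[4])
            rows.append({"name": fields[1], "state": fields[2], "index": index})
    return rows


def summarize(rows):
    """Reduce the parsed rows to a few health indicators."""
    indexes = [r["index"] for r in rows if r["index"] is not None]
    return {
        "has_leader": any(r["state"] == "leader" for r in rows),
        "errored": [r["name"] for r in rows if r["state"] == "error"],
        "index_gap": max(indexes) - min(indexes) if indexes else 0,
    }


# Table copied from the onezone output in this post.
SAMPLE = """\
HA & FEDERATION SYNC STATUS
ID NAME STATE TERM INDEX COMMIT VOTE FED_INDEX
0 onenode-1 error - - - - -
1 onenode-2 follower 1102 205424 205424 -1 -1
2 onenode-3 follower 1102 194443 194443 0 -1
ZONE TEMPLATE
"""

print(summarize(parse_sync_status(SAMPLE)))
# {'has_leader': False, 'errored': ['onenode-1'], 'index_gap': 10981}
```

For the output in this post, no server is reported as leader, onenode-1 is in error, and onenode-3 is 10981 log entries behind onenode-2.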