RPC unavailable on the leader node. RAFT on OpenNebula 5.6.2

I have a three-node RAFT cluster:
onenode-1 (leader and FIP)
onenode-2 (follower)
onenode-3 (follower)

At 09:31:08 the RPC on the leader node became unavailable.
The web interface (Sunstone) works.
The MySQL DB is working.
The leader node still sends keepalive messages.
The FIP remains on the leader node (10.191.171.100):

2: ens3    inet 10.191.171.9/23 brd 10.191.171.255 scope global ens3
2: ens3    inet 10.191.171.100/23 scope global secondary ens3

The OpenNebula service is running:

root@onenode-1:~# systemctl status  opennebula
● opennebula.service - OpenNebula Cloud Controller Daemon
   Loaded: loaded (/lib/systemd/system/opennebula.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2019-04-19 15:00:01 +07; 2 days ago

Port 2633 is open, but its listening socket shows a non-zero Recv-Q (16):

 root@onenode-1:~# netstat -lpn4
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
    tcp       16      0 0.0.0.0:2633            0.0.0.0:*               LISTEN      30262/oned          
    tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      2053/mysqld         
    tcp        0      0 0.0.0.0:9869            0.0.0.0:*               LISTEN      2545/ruby           
    tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      612/rpcbind         
    tcp        0      0 0.0.0.0:29876           0.0.0.0:*               LISTEN      2171/python2        
    tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      614/systemd-resolve 
    tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      2071/sshd           
    tcp        0      0 0.0.0.0:10050           0.0.0.0:*               LISTEN      20003/zabbix_agentd 
    udp   178944      0 0.0.0.0:4124            0.0.0.0:*                           30398/collectd      
    udp        0      0 127.0.0.53:53           0.0.0.0:*                           614/systemd-resolve 
    udp        0      0 0.0.0.0:111             0.0.0.0:*                           612/rpcbind         
    udp        0      0 0.0.0.0:788             0.0.0.0:*                           612/rpcbind 
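
A non-zero Recv-Q on a LISTEN socket is easy to miss in a long listing. On recent Linux kernels it is commonly read as connections queued by the kernel that the process has not yet accepted, which would fit an oned that has stopped serving RPC. The following is a minimal sketch, not an OpenNebula tool: the function name `listen_backlog` is hypothetical, and it assumes the column layout of the `netstat -lpn4` output pasted above.

```python
def listen_backlog(netstat_output):
    """Return {local_address: recv_q} for TCP LISTEN sockets with Recv-Q > 0."""
    backlog = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State [PID/Program]
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips headers and UDP rows (which have no State column)
        if fields[5] != "LISTEN":
            continue
        recv_q = int(fields[1])
        if recv_q > 0:
            backlog[fields[3]] = recv_q
    return backlog


# Three rows copied from the listing above.
SAMPLE = """\
tcp       16      0 0.0.0.0:2633            0.0.0.0:*               LISTEN      30262/oned
tcp        0      0 0.0.0.0:3306            0.0.0.0:*               LISTEN      2053/mysqld
udp   178944      0 0.0.0.0:4124            0.0.0.0:*                           30398/collectd
"""

print(listen_backlog(SAMPLE))
```

Fed with the full listing, this would flag only the oned socket on port 2633.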

As a result, the cluster is not working. This has already happened twice.
Restarting the OpenNebula service fixes the problem, but that is not an option,
because clients in the cloud cannot work until an administrator notices the problem.
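
Since the service stays "active (running)" and the port stays open, a plain port check would not catch this state. A probe has to wait for an actual reply. Below is a minimal sketch of such a check, under stated assumptions: `rpc_responds` is a hypothetical helper, the request is a bare HTTP POST to `/RPC2` rather than a valid XML-RPC call, and any reply at all (even an error) is treated as "alive".

```python
import socket


def rpc_responds(host, port, timeout=5.0):
    """Return True if the endpoint sends back any bytes within `timeout` seconds."""
    # Deliberately minimal request: we only care whether anything answers.
    request = (
        "POST /RPC2 HTTP/1.0\r\n"
        "Host: {}\r\n"
        "Content-Type: text/xml\r\n"
        "Content-Length: 0\r\n\r\n"
    ).format(host).encode("ascii")
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            conn.sendall(request)
            # A stuck daemon lets the kernel complete the TCP handshake,
            # but never reads the request, so recv() times out.
            return len(conn.recv(1)) > 0
    except OSError:  # covers timeouts, refused and reset connections
        return False
```

A monitoring job could call `rpc_responds("10.191.171.100", 2633)` periodically and raise an alert on `False`, so the outage is noticed before the clients report it.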

LOGs
onenode-1
onezone

root@onenode-1:~# onezone show 0 
execution expired

/var/log/one/oned.log

Sun Apr 21 09:30:19 2019 [Z0][ReM][D]: Req:1472 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:30:19 2019 [Z0][ReM][D]: Req:1472 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>0</..."
Sun Apr 21 09:30:19 2019 [Z0][ReM][D]: Req:9936 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:30:19 2019 [Z0][ReM][D]: Req:9936 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:30:19 2019 [Z0][ReM][D]: Req:2000 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:30:19 2019 [Z0][ReM][D]: Req:2000 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:30:44 2019 [Z0][InM][D]: Host nodehost-3 (4) successfully monitored.
Sun Apr 21 09:30:49 2019 [Z0][ReM][D]: Req:8016 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:30:49 2019 [Z0][ReM][D]: Req:8016 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>0</..."
Sun Apr 21 09:30:49 2019 [Z0][ReM][D]: Req:2752 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:30:49 2019 [Z0][ReM][D]: Req:2752 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:30:49 2019 [Z0][ReM][D]: Req:6800 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:30:49 2019 [Z0][ReM][D]: Req:6800 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:31:00 2019 [Z0][InM][D]: Host nodehost-2 (1) successfully monitored.
Sun Apr 21 09:31:00 2019 [Z0][VMM][D]: VM 1 successfully monitored: DISK_SIZE=[ID=0,SIZE=409] DISK_SIZE=[ID=1,SIZE=1]  STATE=a CPU=0.0 MEMORY=786432 NETRX=20816 NETTX=20956 DISKRDBYTES=193706860 DISKWRBYTES=214742528 DISKRDIOPS=15754 DISKWRIOPS=23959
Sun Apr 21 09:31:19 2019 [Z0][InM][D]: Host nodehost-1 (0) successfully monitored.
Sun Apr 21 09:31:19 2019 [Z0][VMM][D]: VM 2 successfully monitored: DISK_SIZE=[ID=0,SIZE=28] DISK_SIZE=[ID=1,SIZE=1]  STATE=a CPU=0.0 MEMORY=131072 DISKRDBYTES=13868632 DISKWRBYTES=2501632 DISKRDIOPS=986 DISKWRIOPS=1676
Sun Apr 21 09:31:19 2019 [Z0][ReM][D]: Req:6672 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:31:19 2019 [Z0][ReM][D]: Req:6672 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>0</..."
Sun Apr 21 09:31:19 2019 [Z0][ReM][D]: Req:1616 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:31:19 2019 [Z0][ReM][D]: Req:1616 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:31:19 2019 [Z0][ReM][D]: Req:7600 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:31:19 2019 [Z0][ReM][D]: Req:7600 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:31:45 2019 [Z0][InM][D]: Host nodehost-3 (4) successfully monitored.
Sun Apr 21 09:31:49 2019 [Z0][ReM][D]: Req:2800 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:31:49 2019 [Z0][ReM][D]: Req:2800 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>0</..."
Sun Apr 21 09:31:49 2019 [Z0][ReM][D]: Req:4336 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:31:49 2019 [Z0][ReM][D]: Req:4336 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:31:49 2019 [Z0][ReM][D]: Req:9856 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:31:49 2019 [Z0][ReM][D]: Req:9856 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:32:01 2019 [Z0][InM][D]: Host nodehost-2 (1) successfully monitored.
Sun Apr 21 09:32:01 2019 [Z0][VMM][D]: VM 1 successfully monitored: DISK_SIZE=[ID=0,SIZE=409] DISK_SIZE=[ID=1,SIZE=1]  STATE=a CPU=0.0 MEMORY=786432 NETRX=20816 NETTX=20956 DISKRDBYTES=193706860 DISKWRBYTES=214763008 DISKRDIOPS=15754 DISKWRIOPS=23963
Sun Apr 21 09:32:19 2019 [Z0][InM][D]: Host nodehost-1 (0) successfully monitored.
Sun Apr 21 09:32:19 2019 [Z0][VMM][D]: VM 2 successfully monitored: DISK_SIZE=[ID=0,SIZE=28] DISK_SIZE=[ID=1,SIZE=1]  STATE=a CPU=0.0 MEMORY=131072 DISKRDBYTES=13868632 DISKWRBYTES=2501632 DISKRDIOPS=986 DISKWRIOPS=1676
Sun Apr 21 09:32:19 2019 [Z0][ReM][D]: Req:2112 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:32:19 2019 [Z0][ReM][D]: Req:2112 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>0</..."
Sun Apr 21 09:32:19 2019 [Z0][ReM][D]: Req:8560 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:32:19 2019 [Z0][ReM][D]: Req:8560 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:32:19 2019 [Z0][ReM][D]: Req:320 UID:0 one.vmpool.info invoked , -2, -1, -1, -1
Sun Apr 21 09:32:19 2019 [Z0][ReM][D]: Req:320 UID:0 one.vmpool.info result SUCCESS, "<VM_POOL><VM><ID>1</..."
Sun Apr 21 09:32:41 2019 [Z0][VMM][I]: --Mark--
Sun Apr 21 09:32:46 2019 [Z0][InM][D]: Host nodehost-3 (4) successfully monitored.

onenode-2
onezone

root@onenode-2:~# onezone show 0 
ZONE 0 INFORMATION                                                              
ID                : 0                   
NAME              : OpenNebula          

ZONE SERVERS                                                                    
ID NAME            ENDPOINT                                                       
 0 onenode-1       http://10.191.171.9:2633/RPC2
 1 onenode-2       http://10.191.171.30:2633/RPC2
 2 onenode-3       http://10.191.171.21:2633/RPC2

HA & FEDERATION SYNC STATUS                                                     
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX 
 0 onenode-1       error      -          -          -          -     -
 1 onenode-2       follower   1102       205424     205424     -1    -1
 2 onenode-3       follower   1102       194443     194443     0     -1

ZONE TEMPLATE                                                                   
ENDPOINT="http://localhost:2633/RPC2"
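
The table above shows one server in the error state and no server in the leader state. A check for that condition can be sketched as follows. This is only an illustration: `parse_ha_status` and `has_leader` are hypothetical names, and the parser assumes the exact text layout printed by `onezone show 0` in this post.

```python
def parse_ha_status(onezone_output):
    """Return one dict per row of the 'HA & FEDERATION SYNC STATUS' table."""
    rows = []
    in_table = False
    for line in onezone_output.splitlines():
        if line.startswith("HA & FEDERATION SYNC STATUS"):
            in_table = True
            continue
        if not in_table:
            continue
        fields = line.split()
        if not fields:
            break  # a blank line ends the table
        if fields[0] == "ID":
            continue  # column header row
        rows.append({"id": int(fields[0]), "name": fields[1], "state": fields[2]})
    return rows


def has_leader(rows):
    """True if at least one server reports the 'leader' state."""
    return any(row["state"] == "leader" for row in rows)


# Table copied from the onenode-2 output above.
SAMPLE = """\
HA & FEDERATION SYNC STATUS
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX
 0 onenode-1       error      -          -          -          -     -
 1 onenode-2       follower   1102       205424     205424     -1    -1
 2 onenode-3       follower   1102       194443     194443     0     -1

ZONE TEMPLATE
"""

rows = parse_ha_status(SAMPLE)
print([(r["name"], r["state"]) for r in rows], has_leader(rows))
```

Run on a follower, where `onezone show 0` still answers, this reports that the zone currently has no reachable leader.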

/var/log/one/oned.log

Sun Apr 21 09:28:21 2019 [Z0][ReM][D]: Req:8384 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:28:21 2019 [Z0][ReM][D]: Req:8384 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:28:51 2019 [Z0][ReM][D]: Req:9200 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:28:51 2019 [Z0][ReM][D]: Req:9200 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:29:21 2019 [Z0][ReM][D]: Req:8880 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:29:21 2019 [Z0][ReM][D]: Req:8880 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:29:51 2019 [Z0][ReM][D]: Req:3216 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:29:51 2019 [Z0][ReM][D]: Req:3216 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:30:21 2019 [Z0][ReM][D]: Req:3248 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:30:21 2019 [Z0][ReM][D]: Req:3248 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:30:38 2019 [Z0][MKP][I]: --Mark--
Sun Apr 21 09:30:38 2019 [Z0][ImM][I]: --Mark--
Sun Apr 21 09:30:38 2019 [Z0][InM][I]: --Mark--
Sun Apr 21 09:30:51 2019 [Z0][ReM][D]: Req:9936 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:30:51 2019 [Z0][ReM][D]: Req:9936 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:30:57 2019 [Z0][VMM][I]: --Mark--
Sun Apr 21 09:31:21 2019 [Z0][ReM][D]: Req:1840 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:31:21 2019 [Z0][ReM][D]: Req:1840 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:31:51 2019 [Z0][ReM][D]: Req:3440 UID:0 one.zone.raftstatus invoked 
Sun Apr 21 09:31:51 2019 [Z0][ReM][D]: Req:3440 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>1</..."
Sun Apr 21 09:32:21 2019 [Z0][ReM][D]: Req:5344 UID:0 one.zone.raftstatus invoked

onenode-3
onezone

root@onenode-3:~# onezone show 0 
ZONE 0 INFORMATION                                                              
ID                : 0                   
NAME              : OpenNebula          

ZONE SERVERS                                                                    
ID NAME            ENDPOINT                                                       
 0 onenode-1       http://10.191.171.9:2633/RPC2
 1 onenode-2       http://10.191.171.30:2633/RPC2
 2 onenode-3       http://10.191.171.21:2633/RPC2

HA & FEDERATION SYNC STATUS                                                     
ID NAME            STATE      TERM       INDEX      COMMIT     VOTE  FED_INDEX 
 0 onenode-1       error      -          -          -          -     -
 1 onenode-2       follower   1102       205424     205424     -1    -1
 2 onenode-3       follower   1102       194443     194443     0     -1

ZONE TEMPLATE                                                                   
ENDPOINT="http://localhost:2633/RPC2"

/var/log/one/oned.log

Mon Apr 22 09:29:53 2019 [Z0][ReM][D]: Req:8160 UID:0 one.zone.raftstatus invoked 
Mon Apr 22 09:29:53 2019 [Z0][ReM][D]: Req:8160 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Mon Apr 22 09:30:23 2019 [Z0][ReM][D]: Req:9664 UID:0 one.zone.raftstatus invoked 
Mon Apr 22 09:30:23 2019 [Z0][ReM][D]: Req:9664 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Mon Apr 22 09:30:53 2019 [Z0][ReM][D]: Req:7648 UID:0 one.zone.raftstatus invoked 
Mon Apr 22 09:30:53 2019 [Z0][ReM][D]: Req:7648 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Mon Apr 22 09:31:23 2019 [Z0][ReM][D]: Req:256 UID:0 one.zone.raftstatus invoked 
Mon Apr 22 09:31:23 2019 [Z0][ReM][D]: Req:256 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Mon Apr 22 09:31:23 2019 [Z0][MKP][I]: --Mark--
Mon Apr 22 09:31:23 2019 [Z0][ImM][I]: --Mark--
Mon Apr 22 09:31:23 2019 [Z0][InM][I]: --Mark--
Mon Apr 22 09:31:40 2019 [Z0][VMM][I]: --Mark--
Mon Apr 22 09:31:53 2019 [Z0][ReM][D]: Req:7008 UID:0 one.zone.raftstatus invoked 
Mon Apr 22 09:31:53 2019 [Z0][ReM][D]: Req:7008 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Mon Apr 22 09:32:23 2019 [Z0][ReM][D]: Req:8496 UID:0 one.zone.raftstatus invoked 
Mon Apr 22 09:32:23 2019 [Z0][ReM][D]: Req:8496 UID:0 one.zone.raftstatus result SUCCESS, "<RAFT><SERVER_ID>2</..."
Mon Apr 22 09:32:53 2019 [Z0][ReM][D]: Req:9472 UID:0 one.zone.raftstatus invoked

Hi @barte1by, could you share which OS is running in your frontends?

Hi @cgonzalez
cat /etc/issue.net
Ubuntu 18.04.1 LTS

It seems that your issue is related to this bug: https://github.com/OpenNebula/one/issues/3182.

It is already fixed and will be released with the next OpenNebula version.

The case is similar,
except that in my case the API stopped responding after two days of uptime.

Did you try different API calls? For example, could you try running the onevm list command?

I only tried "onezone show 0".
RAFT and the API are working again now.
When the problem happens again, I will check the other commands.
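
To compare several calls quickly the next time it happens, each CLI command can be run with a hard timeout and the outcome recorded. This is a rough sketch: `check_commands` is a hypothetical helper, and the commands to pass in (for example `onezone show 0`, `onevm list`) are only the ones mentioned in this thread.

```python
import subprocess


def check_commands(commands, timeout=10.0):
    """Run each argv list; map it to 'ok', 'failed', 'timeout' or 'missing'."""
    results = {}
    for argv in commands:
        key = " ".join(argv)
        try:
            proc = subprocess.run(
                argv,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            # A CLI that gives up by itself (e.g. "execution expired")
            # is expected to exit non-zero and shows up as 'failed'.
            results[key] = "ok" if proc.returncode == 0 else "failed"
        except subprocess.TimeoutExpired:
            results[key] = "timeout"
        except FileNotFoundError:
            results[key] = "missing"
    return results
```

For example, `check_commands([["onezone", "show", "0"], ["onevm", "list"]])` on the leader would show whether every call hangs or only some of them.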

I will upgrade to 5.8.1,
because such a problem exists there too.