Hi All!
I have been testing an HA Raft cluster consisting of 3 members (OpenNebula 5.6).
Unfortunately, I discovered that one of the members (follower or leader, it doesn't matter which) changed its own status to error.
oneadmin@csor3:~$ onezone show 0
ZONE 0 INFORMATION
ID : 0
NAME : OpenNebula
ZONE SERVERS
ID NAME ENDPOINT
0 server-1 http://10.93.221.94:2633/RPC2
1 server-2 http://10.93.221.126:2633/RPC2
2 server-3 http://10.93.221.61:2633/RPC2
HA & FEDERATION SYNC STATUS
ID NAME STATE TERM INDEX COMMIT VOTE FED_INDEX
0 server-1 error - - - - -
1 server-2 follower 20505 97955812 97955812 2 -1
2 server-3 leader 20505 97955812 97955812 2 -1
ZONE TEMPLATE
ENDPOINT="-"
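To catch this state without watching the output by hand, the status table can be parsed with a small script. This is only a minimal sketch: the function name `servers_in_state` is hypothetical, and it assumes the output keeps the whitespace-separated layout shown above, with STATE as the third column.

```python
def servers_in_state(onezone_output, state="error"):
    """Return the names of zone servers whose STATE column equals `state`.

    Assumes the layout of `onezone show` pasted above: a section titled
    "HA & FEDERATION SYNC STATUS" whose rows are "ID NAME STATE ...".
    """
    names = []
    in_status = False
    for raw in onezone_output.splitlines():
        line = raw.strip()
        if line.startswith("HA & FEDERATION SYNC STATUS"):
            in_status = True
            continue
        if line.startswith("ZONE TEMPLATE"):
            in_status = False
        fields = line.split()
        # Data rows start with a numeric server ID; the header row does not.
        if in_status and len(fields) >= 3 and fields[0].isdigit():
            if fields[2] == state:
                names.append(fields[1])
    return names


sample = """\
ZONE SERVERS
ID NAME ENDPOINT
0 server-1 http://10.93.221.94:2633/RPC2
1 server-2 http://10.93.221.126:2633/RPC2
2 server-3 http://10.93.221.61:2633/RPC2
HA & FEDERATION SYNC STATUS
ID NAME STATE TERM INDEX COMMIT VOTE FED_INDEX
0 server-1 error - - - - -
1 server-2 follower 20505 97955812 97955812 2 -1
2 server-3 leader 20505 97955812 97955812 2 -1
ZONE TEMPLATE
ENDPOINT="-"
"""

print(servers_in_state(sample))
```

In practice the text would come from running `onezone show 0` (for example through `subprocess.run`) instead of the pasted sample.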
The last lines in oned.log:
Thu Aug 2 11:55:05 2018 [Z0][InM][D]: Host 10.93.221.54 (143) successfully monitored.
Thu Aug 2 11:55:05 2018 [Z0][InM][D]: Host 10.93.221.49 (144) successfully monitored.
Thu Aug 2 11:55:06 2018 [Z0][ACL][I]: ACL Manager stopped.
Thu Aug 2 11:55:06 2018 [Z0][VMM][I]: Stopping Virtual Machine Manager...
Thu Aug 2 11:55:06 2018 [Z0][LCM][I]: Stopping Life-cycle Manager...
Thu Aug 2 11:55:06 2018 [Z0][LCM][I]: Life-cycle Manager stopped.
Thu Aug 2 11:55:06 2018 [Z0][TM][I]: Stopping Transfer Manager...
Thu Aug 2 11:55:06 2018 [Z0][DiM][I]: Stopping Dispatch Manager...
Thu Aug 2 11:55:06 2018 [Z0][DiM][I]: Dispatch Manager stopped.
Thu Aug 2 11:55:06 2018 [Z0][InM][I]: Stopping Information Manager...
Thu Aug 2 11:55:06 2018 [Z0][ReM][I]: Stopping Request Manager...
Thu Aug 2 11:55:06 2018 [Z0][AuM][I]: Stopping Authorization Manager...
Thu Aug 2 11:55:06 2018 [Z0][HKM][I]: Stopping Hook Manager...
Thu Aug 2 11:55:06 2018 [Z0][ImM][I]: Stopping Image Manager...
Thu Aug 2 11:55:06 2018 [Z0][MKP][I]: Stopping Marketplace Manager...
Thu Aug 2 11:55:06 2018 [Z0][IPM][I]: Stopping IPAM Manager...
Thu Aug 2 11:55:06 2018 [Z0][RCM][I]: Raft Consensus Manager...
Thu Aug 2 11:55:06 2018 [Z0][FRM][I]: Federation Replica Manager...
Thu Aug 2 11:55:06 2018 [Z0][FRM][I]: Federation Replica Manger stopped.
Thu Aug 2 11:55:06 2018 [Z0][ReM][I]: XML-RPC server stopped.
Thu Aug 2 11:55:06 2018 [Z0][ReM][I]: Request Manager stopped.
Thu Aug 2 11:55:07 2018 [Z0][AuM][I]: Authorization Manager stopped.
Thu Aug 2 11:55:07 2018 [Z0][MKP][I]: Marketplace Manager stopped.
Thu Aug 2 11:55:07 2018 [Z0][HKM][I]: Hook Manager stopped.
Thu Aug 2 11:55:07 2018 [Z0][ImM][I]: Image Manager stopped.
Thu Aug 2 11:55:07 2018 [Z0][RCM][I]: oned is set to follower mode
Thu Aug 2 11:55:07 2018 [Z0][RCM][I]: Raft Consensus Manager stopped.
Thu Aug 2 11:55:07 2018 [Z0][InM][I]: Information Manager stopped.
Thu Aug 2 11:55:07 2018 [Z0][IPM][I]: IPAM Manager stopped.
Thu Aug 2 11:55:07 2018 [Z0][VMM][I]: Virtual Machine Manager stopped.
Thu Aug 2 11:55:07 2018 [Z0][RCM][I]: Replication thread stopped
Thu Aug 2 11:55:07 2018 [Z0][RCM][I]: Replication thread stopped
Thu Aug 2 11:55:07 2018 [Z0][RCM][I]: Replication thread stopped
Thu Aug 2 11:55:07 2018 [Z0][RCM][I]: Replication thread stopped
Thu Aug 2 11:55:07 2018 [Z0][TrM][I]: Transfer Manager stopped.
Thu Aug 2 11:55:07 2018 [Z0][ONE][I]: All modules finalized, exiting.
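As far as I can tell, the excerpt ends with "All modules finalized, exiting.", which looks like an orderly shutdown of oned and not a crash. A rough sketch for finding where the shutdown starts and whether it completed is below. The function name `shutdown_summary` and the regular expression are my own assumptions, based only on the line format visible in this log.

```python
import re

# Matches lines such as:
# Thu Aug 2 11:55:06 2018 [Z0][ACL][I]: ACL Manager stopped.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d+ [\d:]+ \d{4}) "
    r"\[Z\d+\]\[(?P<mod>\w+)\]\[(?P<lvl>\w)\]: (?P<msg>.*)$"
)


def shutdown_summary(log_text):
    """Return ((timestamp, module) of the first stop message, clean_exit flag)."""
    first_stop = None
    clean_exit = False
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m:
            continue
        msg = m.group("msg")
        is_stop = msg.startswith("Stopping") or msg.endswith("stopped.")
        if first_stop is None and is_stop:
            first_stop = (m.group("ts"), m.group("mod"))
        if msg == "All modules finalized, exiting.":
            clean_exit = True
    return first_stop, clean_exit


sample_log = """\
Thu Aug 2 11:55:05 2018 [Z0][InM][D]: Host 10.93.221.49 (144) successfully monitored.
Thu Aug 2 11:55:06 2018 [Z0][ACL][I]: ACL Manager stopped.
Thu Aug 2 11:55:06 2018 [Z0][VMM][I]: Stopping Virtual Machine Manager...
Thu Aug 2 11:55:07 2018 [Z0][ONE][I]: All modules finalized, exiting.
"""

print(shutdown_summary(sample_log))
```

If the flag is true, the next thing to look for is whatever asked oned to stop at that timestamp (service manager, a script, an operator), since the log itself does not say.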
RAFT parameters:
RAFT = [
LIMIT_PURGE = 100000,
LOG_RETENTION = 100000, #500000
LOG_PURGE_TIMEOUT = 60, #600
ELECTION_TIMEOUT_MS = 5000, #2500
BROADCAST_TIMEOUT_MS = 500,
XMLRPC_TIMEOUT_MS = 1500 #450
]
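One sanity check I can do on these values is the general Raft guideline that the heartbeat (broadcast) interval should be well below the election timeout. The sketch below parses the block above and applies that check. The helper names are hypothetical, and the comparison of XMLRPC_TIMEOUT_MS against the election timeout is my own assumption, not something taken from the OpenNebula documentation.

```python
import re


def parse_raft_block(text):
    """Parse 'NAME = 123,' lines into a dict, ignoring '#...' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]
        m = re.match(r"\s*([A-Z_]+)\s*=\s*(\d+)", line)
        if m:
            params[m.group(1)] = int(m.group(2))
    return params


def timeout_warnings(params):
    """Return a list of human-readable warnings about timeout ordering."""
    warnings = []
    election = params["ELECTION_TIMEOUT_MS"]
    broadcast = params["BROADCAST_TIMEOUT_MS"]
    xmlrpc = params["XMLRPC_TIMEOUT_MS"]
    # General Raft guideline: heartbeats must arrive well before a
    # follower's election timeout expires.
    if broadcast >= election:
        warnings.append("BROADCAST_TIMEOUT_MS should be below ELECTION_TIMEOUT_MS")
    # Assumption: a Raft API call that can outlast the election timeout
    # may trigger needless elections. 0 is skipped here as "no limit".
    if xmlrpc != 0 and xmlrpc >= election:
        warnings.append("XMLRPC_TIMEOUT_MS should be below ELECTION_TIMEOUT_MS")
    return warnings


raft_conf = """\
RAFT = [
LIMIT_PURGE = 100000,
LOG_RETENTION = 100000, #500000
LOG_PURGE_TIMEOUT = 60, #600
ELECTION_TIMEOUT_MS = 5000, #2500
BROADCAST_TIMEOUT_MS = 500,
XMLRPC_TIMEOUT_MS = 1500 #450
]
"""

params = parse_raft_block(raft_conf)
print(params)
print(timeout_warnings(params))
```

An empty warning list here only means the timeouts are ordered consistently with each other. It does not show that the values are right for this network.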
Could this be caused by wrong RAFT parameters?