Hi,
While setting up HA on our existing standalone 5.4.1 test environment (on Ubuntu 16.04), I made a mistake and ended up with Zone 0 containing a single follower (which used to be the leader), and OpenNebula is no longer functional.
If I try to remove that single follower, the command fails saying that the zone has no leader.
/var/log/one# onezone show 0
ZONE 0 INFORMATION
ID : 0
NAME : OpenNebula
ZONE SERVERS
ID NAME ENDPOINT
0 server-0 http://xxxxx:2633/RPC2
HA & FEDERATION SYNC STATUS
ID NAME STATE TERM INDEX COMMIT VOTE FED_INDEX
0 server-0 follower 2 147 147 -1 -1
ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"
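For anyone who wants to check for this state from a script, here is a minimal sketch that reads the text printed by "onezone show" and reports whether any server is in the leader state. It assumes the column layout shown above (ID, NAME, STATE, ...) under the "HA & FEDERATION SYNC STATUS" heading. The function names server_states and has_leader are hypothetical helpers, not part of OpenNebula, and the sample text is copied from the output above.

```python
def server_states(onezone_output: str) -> dict:
    """Map server name -> Raft state, read from the
    'HA & FEDERATION SYNC STATUS' table of `onezone show` text output.

    Assumption: rows look like 'ID NAME STATE TERM INDEX ...' as in the post.
    """
    states = {}
    in_section = False
    for line in onezone_output.splitlines():
        fields = line.split()
        if line.strip().startswith("HA & FEDERATION SYNC STATUS"):
            in_section = True
            continue
        if not in_section or not fields:
            continue  # not in the table yet, or a blank line
        if fields[0] == "ID":
            continue  # column header row
        if len(fields) >= 3 and fields[0].isdigit():
            states[fields[1]] = fields[2]
        else:
            break  # next section heading (e.g. 'ZONE TEMPLATE') ends the table
    return states


def has_leader(onezone_output: str) -> bool:
    """True if at least one server in the zone reports the 'leader' state."""
    return any(state == "leader" for state in server_states(onezone_output).values())


SAMPLE = """\
HA & FEDERATION SYNC STATUS
ID NAME STATE TERM INDEX COMMIT VOTE FED_INDEX
0 server-0 follower 2 147 147 -1 -1
ZONE TEMPLATE
ENDPOINT="http://localhost:2633/RPC2"
"""

if __name__ == "__main__":
    print(server_states(SAMPLE))
    print("zone has a leader:", has_leader(SAMPLE))
```

Run against the output above, this shows server-0 as a follower and no leader in the zone, which matches the error I get when trying to remove the server.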
I tried to recover by restoring a DB export and the oned.conf backups, but that didn’t help (note the error from “/usr/share/one/follower_cleanup” in the output below).
root@coenebula01:/# systemctl status opennebula
● opennebula.service - OpenNebula Cloud Controller Daemon
Loaded: loaded (/lib/systemd/system/opennebula.service; disabled; vendor preset: enabled)
Active: active (running) since Fri 2017-12-22 10:13:38 MST; 57min ago
Process: 3241 ExecStopPost=/usr/share/one/follower_cleanup (code=exited, status=2)
Process: 3238 ExecStopPost=/bin/rm -f /var/lock/one/one (code=exited, status=0/SUCCESS)
Process: 3222 ExecStop=/bin/kill -TERM $MAINPID (code=exited, status=0/SUCCESS)
Process: 3356 ExecStartPre=/usr/sbin/logrotate -s /tmp/logrotate.state -f /etc/logrotate.d/opennebula (code=exited, status=0/SUCCESS)
Process: 3351 ExecStartPre=/bin/chown oneadmin:oneadmin /var/log/one (code=exited, status=0/SUCCESS)
Process: 3348 ExecStartPre=/bin/mkdir -p /var/log/one (code=exited, status=0/SUCCESS)
Main PID: 3362 (oned)
Tasks: 103
Memory: 92.4M
CPU: 16.283s
CGroup: /system.slice/opennebula.service
├─3362 /usr/bin/oned -f
├─3374 ruby /usr/lib/one/mads/one_hm.rb
├─3410 ruby /usr/lib/one/mads/one_vmm_exec.rb -t 15 -r 0 kvm
├─3427 ruby /usr/lib/one/mads/one_vmm_exec.rb -l deploy,shutdown,reboot,cancel,save,restore,migrate,poll,pre,post,clean,snapshotcreate,snapshotrevert,snapshotdelete,attach_nic,de
├─3444 /usr/lib/one/mads/collectd -p 4124 -f 5 -t 50 -i 20
├─3497 ruby /usr/lib/one/mads/one_im_exec.rb -r 3 -t 15 kvm
├─3512 ruby /usr/lib/one/mads/one_tm.rb -t 15 -d dummy,lvm,shared,fs_lvm,qcow2,ssh,ceph,dev,vcenter,iscsi_libvirt
├─3532 ruby /usr/lib/one/mads/one_datastore.rb -t 15 -d dummy,fs,lvm,ceph,dev,iscsi_libvirt,vcenter -s shared,ssh,ceph,fs_lvm,qcow2,vcenter
├─3548 ruby /usr/lib/one/mads/one_market.rb -t 15 -m http,s3,one
├─3564 ruby /usr/lib/one/mads/one_ipam.rb -t 1 -i dummy
└─3577 ruby /usr/lib/one/mads/one_auth_mad.rb --authn ssh,x509,ldap,server_cipher,server_x509
Dec 22 10:13:38 coenebula01 systemd[1]: Starting OpenNebula Cloud Controller Daemon...
Dec 22 10:13:38 coenebula01 systemd[1]: Started OpenNebula Cloud Controller Daemon.
Before I go ahead and rebuild the whole environment, does anybody have an idea how I could recover from this state?
oned.log and sched.log are being updated with these lines:
root@coenebula01:/var/log/one# tail oned.log
Fri Dec 22 11:13:44 2017 [Z0][ReM][D]: Req:6368 UID:0 one.zone.raftstatus invoked
Fri Dec 22 11:13:44 2017 [Z0][ReM][D]: Req:6368 UID:0 one.zone.raftstatus result SUCCESS, "<SERVER_ID>-1<…"
Fri Dec 22 11:14:14 2017 [Z0][ReM][D]: Req:6080 UID:0 one.zone.raftstatus invoked
Fri Dec 22 11:14:14 2017 [Z0][ReM][D]: Req:6080 UID:0 one.zone.raftstatus result SUCCESS, "<SERVER_ID>-1<…"
Fri Dec 22 11:14:44 2017 [Z0][ReM][D]: Req:8000 UID:0 one.zone.raftstatus invoked
Fri Dec 22 11:14:44 2017 [Z0][ReM][D]: Req:8000 UID:0 one.zone.raftstatus result SUCCESS, "<SERVER_ID>-1<…"
Fri Dec 22 11:15:14 2017 [Z0][ReM][D]: Req:2000 UID:0 one.zone.raftstatus invoked
Fri Dec 22 11:15:14 2017 [Z0][ReM][D]: Req:2000 UID:0 one.zone.raftstatus result SUCCESS, "<SERVER_ID>-1<…"
Fri Dec 22 11:15:44 2017 [Z0][ReM][D]: Req:9728 UID:0 one.zone.raftstatus invoked
Fri Dec 22 11:15:44 2017 [Z0][ReM][D]: Req:9728 UID:0 one.zone.raftstatus result SUCCESS, "<SERVER_ID>-1<…"
root@coenebula01:/var/log/one# tail sched.log
Fri Dec 22 11:11:44 2017 [Z0][SCHED][E]: oned is not leader
Fri Dec 22 11:12:14 2017 [Z0][SCHED][E]: oned is not leader
Fri Dec 22 11:12:44 2017 [Z0][SCHED][E]: oned is not leader
Fri Dec 22 11:13:14 2017 [Z0][SCHED][E]: oned is not leader
Fri Dec 22 11:13:44 2017 [Z0][SCHED][E]: oned is not leader
Fri Dec 22 11:14:14 2017 [Z0][SCHED][E]: oned is not leader
Fri Dec 22 11:14:44 2017 [Z0][SCHED][E]: oned is not leader
Fri Dec 22 11:15:14 2017 [Z0][SCHED][E]: oned is not leader
Fri Dec 22 11:15:44 2017 [Z0][SCHED][E]: oned is not leader
Fri Dec 22 11:16:14 2017 [Z0][SCHED][E]: oned is not leader
root@coenebula01:/var/log/one#
Thanks a lot,
Alex