Hi Thomas and Guillaume,
I looked through the raftmanager code but didn’t see how I could add another RAFT_LEADER_HOOK and RAFT_FOLLOWER_HOOK vector entry without modifying more than I wanted to, so I took a similar route to Guillaume and just modified the default vip.sh script to do more when passed the leader and follower actions. It’s a bit annoying because I’ll have to track and re-merge my changes to vip.sh through upgrades, but that’s not an impossible task.
I did manage to get the outgoing leader node to transfer the VM provisioning logs to the new leader node. The leader action now configures sshd on the leader to listen on the VIP address. The follower action disables sshd for the VIP address and then copies just the VM provisioning logs to the new leader node that has claimed the VIP.
Here’s the updated /var/lib/one/remotes/hooks/raft/vip.sh:
#!/bin/bash

ACTION="$1"
INTERFACE="$2"
IFADDR="$3"
# strip the prefix length, e.g. 10.0.0.5/24 -> 10.0.0.5
IP="${IFADDR%%/*}"
# escape the dots so the address matches literally in a regex
REGIP="${IP//./\\.}"

if [ -z "$INTERFACE" ]; then
    echo "Missing interface." >&2
    exit 1
fi

if [ -z "$IFADDR" ]; then
    echo "Missing IP." >&2
    exit 1
fi

###

if which systemctl &>/dev/null && [ -d /etc/systemd ]; then
    IS_SYSTEMD=yes
else
    IS_SYSTEMD=no
fi

case $ACTION in
leader)
    sudo ip address add $IFADDR dev $INTERFACE

    for i in $(seq 5); do
        sudo arping -c 1 -U -I $INTERFACE ${IP}
        sleep 1
        sudo arping -c 1 -A -I $INTERFACE ${IP}
        sleep 1
    done

    # make sshd listen on the VIP; skip if an active entry already exists
    # so repeated leader transitions don't append duplicate lines
    if ! sudo -n egrep -q "^ *ListenAddress +${REGIP}( |\$)" /etc/ssh/sshd_config; then
        if sudo -n egrep -q "^#+ *ListenAddress +${REGIP}( |\$)" /etc/ssh/sshd_config; then
            # entry is commented out: uncomment it
            sudo -n sed -i -E "/^#+ *ListenAddress +${REGIP}( |\$)/s/^#+ *//" /etc/ssh/sshd_config
        else
            # entry is not present: append it to the file
            echo "ListenAddress $IP" | sudo -n tee -a /etc/ssh/sshd_config >/dev/null
        fi
    fi

    if [ "${IS_SYSTEMD}" = 'yes' ]; then
        sudo -n systemctl restart sshd >/dev/null 2>&1
    else
        sudo -n service sshd restart >/dev/null 2>&1
    fi

    if [ "${IS_SYSTEMD}" = 'yes' ]; then
        if systemctl is-enabled opennebula-flow >/dev/null 2>&1; then
            sudo -n systemctl start opennebula-flow
        fi
        if systemctl is-enabled opennebula-gate >/dev/null 2>&1; then
            sudo -n systemctl start opennebula-gate
        fi
    else
        if [ -e /usr/lib/one/oneflow/oneflow-server.rb ]; then
            sudo -n service opennebula-flow start
        fi
        if [ -e /usr/lib/one/onegate/onegate-server.rb ]; then
            sudo -n service opennebula-gate start
        fi
    fi
    ;;

follower)
    if sudo ip address show dev $INTERFACE | grep -qi " ${IP}/"; then
        sudo ip address del $IFADDR dev $INTERFACE
    fi

    # if the VIP ListenAddress is active in sshd_config, comment it out
    # (the trailing " |$" keeps e.g. 10.0.0.5 from also matching 10.0.0.50)
    sudo -n sed -i -E "s/^ *(ListenAddress +${REGIP}( |\$))/#\1/" /etc/ssh/sshd_config

    if [ "${IS_SYSTEMD}" = 'yes' ]; then
        sudo -n systemctl restart sshd >/dev/null 2>&1
    else
        sudo -n service sshd restart >/dev/null 2>&1
    fi

    if [ "${IS_SYSTEMD}" = 'yes' ]; then
        if systemctl is-enabled opennebula-flow >/dev/null 2>&1 ||
           systemctl is-active opennebula-flow >/dev/null 2>&1;
        then
            sudo -n systemctl stop opennebula-flow
        fi
        if systemctl is-enabled opennebula-gate >/dev/null 2>&1 ||
           systemctl is-active opennebula-gate >/dev/null 2>&1;
        then
            sudo -n systemctl stop opennebula-gate
        fi
    else
        if [ -e /usr/lib/one/oneflow/oneflow-server.rb ]; then
            sudo -n service opennebula-flow stop
        fi
        if [ -e /usr/lib/one/onegate/onegate-server.rb ]; then
            sudo -n service opennebula-gate stop
        fi
    fi

    # copy over only the VM provisioning logfiles (e.g. 125.log);
    # the sleep gives the new leader time to claim the VIP and restart sshd
    sleep 10
    find /var/log/one -type f -regextype posix-egrep -regex ".*/[[:digit:]]+\.log" |
        cut -d / -f 5 |
        rsync -az --include-from=- --exclude='*' /var/log/one/ oneadmin@"${IP}":/var/log/one >/dev/null 2>&1
    ;;

*)
    echo "Unknown action '$ACTION'" >&2
    exit 1
    ;;
esac

exit 0
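
If you want to try the sshd_config toggling without touching the real file, here is a minimal sketch of the same logic as two functions that take the file path as a parameter. The function names (escape_ip, enable_listen, disable_listen) are hypothetical and not part of vip.sh, sudo is left out so it can run against a scratch copy, and it assumes GNU sed and grep.

```shell
#!/bin/bash
# Sketch only: function names are hypothetical, not part of the stock vip.sh.

# Escape dots so the address matches literally in a regex.
escape_ip() {
    printf '%s' "${1//./\\.}"
}

# enable_listen FILE IP: ensure FILE has an active "ListenAddress IP" line.
enable_listen() {
    local file="$1" ip="$2" re
    re="$(escape_ip "$ip")"
    # already active: nothing to do, so repeated calls never add duplicates
    if grep -Eq "^ *ListenAddress +${re}( |\$)" "$file"; then
        return 0
    fi
    if grep -Eq "^#+ *ListenAddress +${re}( |\$)" "$file"; then
        # commented out: uncomment it
        sed -i -E "/^#+ *ListenAddress +${re}( |\$)/s/^#+ *//" "$file"
    else
        # not present: append it
        echo "ListenAddress $ip" >> "$file"
    fi
}

# disable_listen FILE IP: comment out an active "ListenAddress IP" line.
disable_listen() {
    local file="$1" ip="$2" re
    re="$(escape_ip "$ip")"
    sed -i -E "s/^ *(ListenAddress +${re}( |\$))/#\1/" "$file"
}
```

For example, copy /etc/ssh/sshd_config to /tmp, run enable_listen and disable_listen against the copy with your VIP, and diff the result against the original.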
And here’s just the updated content of /etc/sudoers.d/opennebula:
Cmnd_Alias ONE_HA = /usr/bin/systemctl start opennebula-flow, /usr/bin/systemctl stop opennebula-flow, /usr/bin/systemctl start opennebula-gate, /usr/bin/systemctl stop opennebula-gate, /usr/sbin/service opennebula-flow start, /usr/sbin/service opennebula-flow stop, /usr/sbin/service opennebula-gate start, /usr/sbin/service opennebula-gate stop, /usr/bin/sed, /usr/bin/egrep, /usr/bin/tee, /usr/bin/systemctl restart sshd, /usr/sbin/service sshd restart
oneadmin ALL=(ALL) NOPASSWD: ONE_MISC, ONE_NET, ONE_LVM, ONE_ISCSI, ONE_OVS, ONE_XEN, ONE_CEPH, ONE_MARKET, ONE_HA
Look here if you’re interested in locking down the additional commands in sudoers to only allow running specific explicit commands with specific parameters.
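
As a rough illustration of why that is worth doing: allowing sed and tee with no argument restrictions lets oneadmin edit any file as root. One possible way to narrow it, sketched here under the assumption that the sshd_config edits are moved into a small root-owned wrapper script, is to allow only that wrapper with fixed arguments. The wrapper path /usr/local/sbin/one-vip-sshd and the alias name ONE_HA_SSHD are hypothetical.

```
# sketch for /etc/sudoers.d/opennebula; the wrapper path is hypothetical
# the wrapper would hard-code the VIP and accept only "enable" or "disable"
Cmnd_Alias ONE_HA_SSHD = /usr/local/sbin/one-vip-sshd enable, /usr/local/sbin/one-vip-sshd disable, /usr/bin/systemctl restart sshd
```

ONE_HA_SSHD would then replace the /usr/bin/sed, /usr/bin/egrep and /usr/bin/tee entries. Check the file with visudo -c before relying on it.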
If you need to troubleshoot the log-copying regex, temporarily add a -vv flag to the rsync command in vip.sh on all controller nodes, change the /dev/null redirect to a writable temp file such as >>/tmp/rafttest, and force a leader change.
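
A quicker check that doesn’t need a leader change is to run just the find filter against a scratch directory. This is a minimal sketch: list_vm_logs is a hypothetical helper name, it assumes GNU find, and it uses -printf '%f\n' instead of the cut step so it works at any directory depth.

```shell
#!/bin/bash
# Sketch only: list_vm_logs is a hypothetical helper, not part of vip.sh.
# Prints the basenames of files named <digits>.log under the given directory.
list_vm_logs() {
    find "$1" -type f -regextype posix-egrep -regex '.*/[[:digit:]]+\.log' -printf '%f\n'
}

# Try it on throwaway files: only the VM log should be listed.
tmp="$(mktemp -d)"
touch "$tmp/125.log" "$tmp/oned.log" "$tmp/sched.log" "$tmp/7.log.1"
list_vm_logs "$tmp"     # prints: 125.log
rm -rf "$tmp"
```

The same list can be piped into rsync with --dry-run added (e.g. rsync -az -vv --dry-run --include-from=- --exclude='*' ...) to see what would be transferred without copying anything.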
Thanks,
Dave