Hi! I tested VM HA on 5.0.2 and everything worked fine. After updating to 5.2 the hook does not work: the VMs stay in the UNKNOWN state. I checked the configs on the nodes and everything looks OK.
Hi Anton,
Please provide some more info. What is your HOST_ERROR hook configuration in /etc/one/oned.conf?
It is extremely dangerous (with a high risk of data corruption on the VM disks) to do a failover when the host error is only a false positive or a monitoring error and the VM is still running on the “old” host. So in 5.2 a fencing script is expected to be configured and working by default. To get the 5.0.x behavior you must add ‘-u’ to disable fencing.
In any case, there should be clues in the logs.
Kind Regards,
Anton Todorov
Hi! In /etc/one/oned.conf I have:
HOST_HOOK = [
    NAME      = "error",
    ON        = "ERROR",
    COMMAND   = "ft/host_error.rb",
    ARGUMENTS = "$ID -m -p 1",
    REMOTE    = "no" ]
I have this error in /var/log/one/host_error.log:
[2016-10-25 23:00:39 +0300][HOST 5][I] Hook launched
[2016-10-25 23:00:39 +0300][HOST 5][I] hostname: node03
[2016-10-25 23:00:39 +0300][HOST 5][I] Wait 1 cycles.
[2016-10-25 23:00:39 +0300][HOST 5][I] Sleeping 60 seconds.
[2016-10-25 23:01:39 +0300][HOST 5][I] Fencing enabled
[2016-10-25 23:01:39 +0300][HOST 5][E]
[2016-10-25 23:01:39 +0300][HOST 5][E] Fencing error
[2016-10-25 23:01:39 +0300][HOST 5][E] Exiting due to previous error.
There are no other errors.
Thanks!
Hi,
The script is working as designed. You can add ‘-u’ to the arguments to get the “old” behavior, but I strongly recommend implementing fencing.
In your HOST_HOOK, the ARGUMENTS line should be changed to include ‘-u’.
The docs are lagging, though. For more details please check the comments inside the /var/lib/one/remotes/hooks/ft/host_error.rb and fence_host.sh files.
Kind Regards,
Anton Todorov
I use fencing for the cluster, but in OpenNebula fencing is not working. I have tried many times, and every time I get a fencing error!
Can you show the fencing_script that you are using?
It is OK to black out any sensitive info such as credentials/passwords…
Kind Regards,
Anton Todorov
# @param $1 the host information in base64
# @return 0 on success. Make sure this script does not return 0 if it fails.
# To enable remove this line
exit 1
#-------------------------------------------------------------------------------
# Get host parameters with XPATH
#-------------------------------------------------------------------------------
if [ -z "$ONE_LOCATION" ]; then
    XPATH=/var/lib/one/remotes/datastore/xpath.rb
else
    XPATH=$ONE_LOCATION/var/remotes/datastore/xpath.rb
fi
if [ ! -x "$XPATH" ]; then
    echo "XPATH not found: $XPATH"
    exit 1
fi
XPATH="${XPATH} -b $1"
unset i j XPATH_ELEMENTS
while IFS= read -r -d '' element; do
    XPATH_ELEMENTS[i++]="$element"
done < <($XPATH /HOST/ID \
                /HOST/NAME \
                /HOST/TEMPLATE/FENCE_IP )
HOST_ID="${XPATH_ELEMENTS[j++]}"
NAME="${XPATH_ELEMENTS[j++]}"
FENCE_IP="${XPATH_ELEMENTS[j++]}"
if [ -z "$FENCE_IP" ]; then
    echo "Fence ip not found"
    exit 1
fi
#-------------------------------------------------------------------------------
# Fence
#-------------------------------------------------------------------------------
# Example:
# fence_ilo -a $FENCE_IP -l <username> -p <password>
/bin/ipmitool -I lanplus -H $FENCE_IP -U root -P mypassword chassis power off
Hi,
You should comment out (or remove) the ‘exit 1’ line that follows the “# To enable remove this line” comment.
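As long as the script still reaches the exit 1 under "# To enable remove this line", it returns 1 before any fence command runs. A minimal sketch for checking this by hand follows. It assumes the hook reports a fencing error whenever the script returns non-zero, which is what the @return comment in the script suggests. The helper name fence_status is hypothetical, and the commented example call assumes host ID 5 and GNU base64.

```shell
#!/bin/bash
# Sketch only: run a fencing script by hand and report its exit status.

# fence_status: hypothetical helper, not part of OpenNebula.
# $1 = path to the fencing script, $2 = host document in base64.
# Prints the exit status returned by the script.
fence_status() {
    local script="$1" host_b64="$2" rc=0
    bash "$script" "$host_b64" >/dev/null 2>&1 || rc=$?
    echo "$rc"
}

# Example call (commented out; assumes host ID 5 and GNU base64):
# fence_status /var/lib/one/remotes/hooks/ft/fence_host.sh \
#     "$(onehost show -x 5 | base64 -w0)"
```

Anything other than 0 printed here means the script itself is failing, independently of the hook configuration in oned.conf.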
Kind Regards,
Anton Todorov
No changes!
Hm.
If it is still failing, it must be for a different reason…
The next step is to add some debugging lines to figure out what is going on. I prefer logging to syslog, so before each key line (for example before each ‘if’ clause) you can add something like:
logger -t fence_host "something meaningful"
Then you will know whether the script is called and how far it gets without issues. Then you can take measures depending on what the output is.
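A minimal sketch of what such tracing could look like around the FENCE_IP check from the script. The helper names log and check_fence_ip are hypothetical and not part of OpenNebula; the fallback to stderr is only there for systems where logger is missing or cannot reach syslog.

```shell
#!/bin/bash
# Sketch only: debug tracing for a fencing script.

# log: hypothetical helper. Sends a tagged message to syslog via logger,
# or to stderr when logger is missing or fails.
log() {
    logger -t fence_host -- "$*" 2>/dev/null || echo "fence_host: $*" >&2
}

# check_fence_ip: hypothetical wrapper around the FENCE_IP test, with a
# log line on each branch. Returns 0 if an address is set, 1 otherwise.
check_fence_ip() {
    local fence_ip="$1"
    log "checking FENCE_IP, value='${fence_ip}'"
    if [ -z "$fence_ip" ]; then
        log "FENCE_IP is empty, the script would exit 1 here"
        return 1
    fi
    log "FENCE_IP is set, continuing to the fence command"
    return 0
}
```

The messages can then be searched for by the fence_host tag in the system log. If no message with that tag shows up at all after a host error, the script is not being reached.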
Kind Regards,
Anton Todorov
Thank you for the help, but it does not work. I will use the old options.
Even after adding -u to the HOST_HOOK arguments, the fencing error still occurs. Why? How can I make it work?