Deploy High Availability on version 5.12.x

Hi all,

I am deploying HA for 3 nodes in my OpenNebula cluster and get the error below when I shut down node1:

  • /var/log/one/host_error.log

[2021-04-17 23:20:29 +0700][HOST 9][I] Hook launched
[2021-04-17 23:20:29 +0700][HOST 9][I] hostname: node1
[2021-04-17 23:20:29 +0700][HOST 9][I] Wait 5 cycles.
[2021-04-17 23:20:29 +0700][HOST 9][I] Sleeping 120 seconds.
[2021-04-17 23:22:29 +0700][HOST 9][I] Fencing enabled
[2021-04-17 23:22:29 +0700][HOST 9][E] Fence ip not found
[2021-04-17 23:22:29 +0700][HOST 9][E]
[2021-04-17 23:22:29 +0700][HOST 9][E] Fencing error
[2021-04-17 23:22:29 +0700][HOST 9][E] Exiting due to previous error.

  • /usr/share/one/examples/host_hooks/error_hook

ARGUMENTS = "$TEMPLATE -m -p 5 -u"
COMMAND = "/var/lib/one/remotes/hooks/ft/host_error.rb"
NAME = "host_error"
STATE = "ERROR"
REMOTE = "no"
RESOURCE = HOST
TYPE = state

  • /var/lib/one/remotes/hooks/ft

I’ve commented out the line 'echo ""Fence host not configured, please edit ft/fence_host.sh"" && exit 1' and added the fence action as the last line:

fence_ilo -a $FENCE_IP -l oneadmin -p -o
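For comparison, here is a minimal sketch of a fence step in which every option carries a value. The variable names FENCE_AGENT, FENCE_USER and FENCE_PASS are hypothetical and not part of the stock fence_host.sh; the "off" action is only one possible choice.

```shell
#!/bin/bash
# Sketch of a fence step. FENCE_AGENT, FENCE_USER and FENCE_PASS are
# hypothetical variable names, not part of the stock fence_host.sh.

fence_node() {
    local ip="$1"
    if [ -z "$ip" ]; then
        echo "Fence ip not found" >&2
        return 1
    fi
    # -a address, -l login, -p password, -o action: each option takes a value.
    # The function returns the exit status of the fence agent.
    "${FENCE_AGENT:-fence_ilo}" -a "$ip" -l "$FENCE_USER" -p "$FENCE_PASS" -o off
}
```

Inside fence_host.sh this could be called as: fence_node "$FENCE_IP" || exit 1, so that the script does not return 0 when fencing fails.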

Can anyone tell me whether I configured something wrong? I’ve searched both the OpenNebula site and the forum, but found no topic about how to configure FENCE_IP.
Any help will be very much appreciated.

Regards,

Tuong Nguyen

Hi, I use this script, with FENCE_IP defined in each host’s template.

#!/bin/bash

# -------------------------------------------------------------------------- #
# Copyright 2002-2019, OpenNebula Project, OpenNebula Systems                #
#                                                                            #
# Licensed under the Apache License, Version 2.0 (the "License"); you may    #
# not use this file except in compliance with the License. You may obtain    #
# a copy of the License at                                                   #
#                                                                            #
# http://www.apache.org/licenses/LICENSE-2.0                                 #
#                                                                            #
# Unless required by applicable law or agreed to in writing, software        #
# distributed under the License is distributed on an "AS IS" BASIS,          #
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.   #
# See the License for the specific language governing permissions and        #
# limitations under the License.                                             #
#--------------------------------------------------------------------------- #

##############################################################################
# WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING!
#
# This script needs to be modified to enable fencing of the host. By default it
# will fail, as the first line is 'exit 1'. You will need to remove it.
#
# In order to perform the fencing, you will probably need to install a fencing
# utility. They are typically found in: fence-agents-all (CentOS) and fence-
# agents (Ubuntu). They come with many utilities: fence_ilo, fence_ipmilan,
# fence_apc, etc...
#
# To call the fencing utility, you will need to pass some parameters, which are
# typically the iLO IP of the host, etc. We recommend you enter this information
# in the host's template, and pick it up using the xpath example below. AS AN
# EXAMPLE (only an example) the script below expects that you have defined a
# parameter called FENCE_IP in the Host's template, and it will rely on that to
# call the fencing mechanism. You should customize this to your needs. It is
# perfectly OK to discard the code below and use a different mechanism, like
# storing the information required to perform the fencing in a separate CMDB,
# etc. However, you will probably need to get the host's NAME, which should be
# done as shown below.
#
# WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING! WARNING!
#############################################################################

# @param $1 the host information in base64
# @return 0 on success. Make sure this script does not return 0 if it fails.

#-------------------------------------------------------------------------------
# Get host parameters with XPATH
#-------------------------------------------------------------------------------

if [ -z "$ONE_LOCATION" ]; then
    XPATH=/var/lib/one/remotes/datastore/xpath.rb
else
    XPATH=$ONE_LOCATION/var/remotes/datastore/xpath.rb
fi

if [ ! -x "$XPATH" ]; then
    echo "XPATH not found: $XPATH"
    exit 1
fi

XPATH="${XPATH} -b $1"

unset i j XPATH_ELEMENTS

while IFS= read -r -d '' element; do
    XPATH_ELEMENTS[i++]="$element"
done < <($XPATH     /HOST/ID \
                    /HOST/NAME \
                    /HOST/TEMPLATE/FENCE_IP )

HOST_ID="${XPATH_ELEMENTS[j++]}"
NAME="${XPATH_ELEMENTS[j++]}"
FENCE_IP="${XPATH_ELEMENTS[j++]}"

echo "Host $HOST_ID $NAME: State Error!" | mail -s "$HOST_ID $NAME Error!" admin@feldhost.cz

#-------------------------------------------------------------------------------
# Fence
#-------------------------------------------------------------------------------

if [ -z "$FENCE_IP" ]; then
    echo "Fence ip not found"
    exit 1
fi

# Example:
# fence_ilo -a $FENCE_IP -l <username> -p <password>
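
The script reads /HOST/TEMPLATE/FENCE_IP, so the attribute has to exist in the host's template first. A minimal sketch of adding it from the front-end, assuming onehost update with --append is used; the host ID and address below are placeholders, and ONEHOST is a hypothetical override that only exists so the call can be dry-run.

```shell
#!/bin/bash
# Sketch: add FENCE_IP to a host template so fence_host.sh can read
# /HOST/TEMPLATE/FENCE_IP. Host ID and address are placeholders.

set_fence_ip() {
    local host_id="$1" fence_ip="$2" tmpl rc
    tmpl=$(mktemp) || return 1
    # Template attributes use NAME="value" syntax.
    printf 'FENCE_IP="%s"\n' "$fence_ip" > "$tmpl"
    # ONEHOST is a hypothetical override; by default the real CLI is called.
    "${ONEHOST:-onehost}" update "$host_id" "$tmpl" --append
    rc=$?
    rm -f "$tmpl"
    return "$rc"
}
```

For example, set_fence_ip 9 192.0.2.10 would target host 9; onehost show 9 should then list FENCE_IP among the template attributes.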

Hi @feldsam ,

Thank you for your response.

Do you use this script only on the front-end? I cannot find it on any KVM node.

Also, could you show me the steps you took to configure HA after creating the host hook in /usr/share/one/examples/host_hooks/error_hook?

Hello, yes, only on the front-end. There is something more about HA setup problems
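
On 5.12 a hook definition such as the error_hook file from the first post only takes effect once it is registered on the front-end. A minimal sketch, assuming registration is done with onehook create as oneadmin; ONEHOOK is a hypothetical override used only to make the function testable.

```shell
#!/bin/bash
# Sketch: register the host_error hook on the front-end (run as oneadmin).
# The path is the example file mentioned in the thread.

HOOK_FILE="${HOOK_FILE:-/usr/share/one/examples/host_hooks/error_hook}"

register_hook() {
    if [ ! -r "$1" ]; then
        echo "Hook template not readable: $1" >&2
        return 1
    fi
    # ONEHOOK is a hypothetical override; by default the real CLI is called.
    "${ONEHOOK:-onehook}" create "$1"
}
```

Typical use would be register_hook "$HOOK_FILE", followed by onehook list to confirm that host_error appears.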


Hi @feldsam ,

  • I was able to set up FENCE_IP, and the hook now executes successfully.
  • /var/log/one/host_error.log

[2021-04-23 14:42:25 +0700][HOST 9][I] Hook launched
[2021-04-23 14:42:25 +0700][HOST 9][I] hostname: node134
[2021-04-23 14:42:25 +0700][HOST 9][I] WARNING: Fencing disabled
[2021-04-23 14:42:25 +0700][HOST 9][I] states: 3, 5, 8
[2021-04-23 14:42:25 +0700][HOST 9][I] vms: ["30", "28"]
[2021-04-23 14:42:25 +0700][HOST 9][I] resched 30
[2021-04-23 14:42:25 +0700][HOST 9][I] resched 28
[2021-04-23 14:42:25 +0700][HOST 9][I] Hook finished

  • But it failed to migrate the VMs to the remaining hosts, although I can migrate them manually without any issue.
  • /var/log/one/sched.log

Fri Apr 23 15:25:49 2021 [Z0][VM][D]: Found 2 pending/rescheduling VMs.
Fri Apr 23 15:25:49 2021 [Z0][HOST][D]: Discovered 2 enabled hosts.
Fri Apr 23 15:25:49 2021 [Z0][VM][D]: VMs in VMGroups:
Fri Apr 23 15:25:49 2021 [Z0][VNET][D]: Discovered 0 vnets.
Fri Apr 23 15:25:49 2021 [Z0][SCHED][D]: Match-making results for VM 28:
Cannot schedule VM, there is no suitable host.
Fri Apr 23 15:25:49 2021 [Z0][SCHED][D]: Match-making results for VM 30:
Cannot schedule VM, there is no suitable host.
Fri Apr 23 15:25:50 2021 [Z0][SCHED][D]: Dispatching VMs to hosts:
VMID Priority Host System DS
--------------------------------------------------------------

Do you have any idea about this?

You can test and debug this with the “VM Reschedule” function in OpenNebula. You need a host in the same cluster with sufficient resources to run the VM.
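
From the CLI, onevm resched <vmid> sets the reschedule flag, and the next scheduler cycle logs its decision to sched.log. A minimal sketch for listing the VMs the scheduler could not place, assuming the log format shown in the excerpt above; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: list VM IDs for which the scheduler reported "no suitable host".
# Assumes the sched.log line format shown in the excerpt above.

unschedulable_vms() {
    awk '
        # Remember the VM ID from "Match-making results for VM 28:".
        /Match-making results for VM/ { vm = $NF; sub(/:$/, "", vm); next }
        # Print it if the scheduler then reports that it cannot place it.
        /Cannot schedule VM/ && vm != "" { print vm; vm = "" }
    ' "${1:-/var/log/one/sched.log}" | sort -un
}
```

Each ID printed can then be inspected with onevm show <vmid> to see which placement requirements rule out the remaining hosts.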

Hi @feldsam ,

Thanks for your help. HA now works for a VM deployed from a brand-new template.

But it does not work for the existing VMs. Is there any way to change the placement host here?

  • New VM from a new template:

opennebula_vm41

  • Existing VM:

opennebula_vm4

Hi, you can use the onevm update <vmid> command and edit the XML manually.

https://docs.opennebula.io/doc/5.12/cli/onevm.1.html
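
A non-interactive variant of onevm update can be sketched as follows, assuming the placement constraint is the SCHED_REQUIREMENTS attribute in the VM's user template. The VM ID and host IDs are placeholders, ONEVM is a hypothetical override for dry runs, and whether --append replaces an existing value is worth confirming with onevm show afterwards.

```shell
#!/bin/bash
# Sketch: set the placement expression of an existing VM.
# Assumes the constraint is SCHED_REQUIREMENTS in the user template.

set_placement() {
    local vm_id="$1" expr="$2" tmpl esc rc
    tmpl=$(mktemp) || return 1
    # Escape embedded double quotes for the template syntax.
    esc=$(printf '%s' "$expr" | sed 's/"/\\"/g')
    printf 'SCHED_REQUIREMENTS="%s"\n' "$esc" > "$tmpl"
    # ONEVM is a hypothetical override; by default the real CLI is called.
    "${ONEVM:-onevm}" update "$vm_id" "$tmpl" --append
    rc=$?
    rm -f "$tmpl"
    return "$rc"
}
```

For example, set_placement 28 'ID="10" | ID="11"' would allow VM 28 on hosts 10 or 11 (placeholder IDs).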

Hi,

Thank you very much for your help. I can now deploy HA for my server.