HA. VM stuck in BOOT_POWEROFF state

When one of the hosts goes down, HA for the VMs works incorrectly in several specific cases.

For example:

  • when we stop one of the hosts, ONE waits for several monitoring cycles before using the stonith script;
  • the host shuts down and is already inaccessible from ONE, but the VMs are still in RUNNING;
  • at this step we send POWEROFF to the VMs;
  • while the host is down, the VMs stay in SHUTDOWN state;
  • when the host comes back up, the VMs go to POWEROFF state;
  • then, when we try to start those VMs, we get:

Mon Feb 20 12:14:47 2017 [Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
Mon Feb 20 12:21:00 2017 [Z0][LCM][I]: VM reported SHUTDOWN by the drivers
Mon Feb 20 12:21:00 2017 [Z0][VM][I]: New state is POWEROFF
Mon Feb 20 12:21:00 2017 [Z0][VM][I]: New LCM state is LCM_INIT
Mon Feb 20 12:22:22 2017 [Z0][VM][I]: New state is ACTIVE
Mon Feb 20 12:22:22 2017 [Z0][VM][I]: New LCM state is BOOT_POWEROFF
Mon Feb 20 12:22:22 2017 [Z0][VMM][I]: Generating deployment file: /var/lib/one/vms/68/deployment.2
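For reference, the stuck VM can be inspected and pushed back to a consistent state from the CLI roughly like this (VM ID 68 is the one from the log above; the available options may differ between versions, see onevm recover --help):

onevm show 68 | grep -i state    # check STATE / LCM_STATE of the affected VM
onevm recover --failure 68       # should move a VM stuck in BOOT_POWEROFF back to POWEROFF
onevm resume 68                  # try to boot it again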

Hello, you can adjust the monitoring driver settings in /etc/one/oned.conf:

#-------------------------------------------------------------------------------
#  KVM UDP-push Information Driver Manager Configuration
#    -r number of retries when monitoring a host
#    -t number of threads, i.e. number of hosts monitored at the same time
#    -w Timeout in seconds to execute external commands (default unlimited)
#-------------------------------------------------------------------------------
IM_MAD = [
      NAME          = "kvm",
      SUNSTONE_NAME = "KVM",
      EXECUTABLE    = "one_im_ssh",
      ARGUMENTS     = "-r 3 -t 15 kvm" ]
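How quickly a failed host is flagged also depends on the global monitoring interval in the same file. As a sketch (60 here is just an example value, not a recommendation):

# /etc/one/oned.conf - seconds between host monitoring rounds; together with
# the -r retries above this defines how long it takes to put the host in ERROR
MONITORING_INTERVAL = 60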

The “stonith” script should automatically migrate the VMs to another host:

https://docs.opennebula.org/5.2/advanced_components/ha/ftguide.html#host-failures
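The piece that actually resubmits the VMs is the host error hook described in that guide. Roughly (please double-check the exact arguments against the guide for your version):

#  Host error hook (from the HA guide): -m reschedules the VMs to another
#  host (requires shared storage), -p 5 skips resubmission if the host
#  comes back within 5 monitoring cycles
HOST_HOOK = [
      NAME      = "error",
      ON        = "ERROR",
      COMMAND   = "ft/host_error.rb",
      ARGUMENTS = "$ID -m -p 5",
      REMOTE    = "no" ]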

Hello,

I already enabled the KVM UDP push driver, but the VM still ends up in the POWEROFF state when I turn off one of the hosts.

Any suggestion or solution?

Hello,

  • when we stop one of the hosts, ONE waits for several monitoring cycles before using the stonith script;
  • the host shuts down and is already inaccessible from ONE, but the VMs are still in RUNNING;

Why do you do this?

  • at this step we send POWEROFF to the VMs;
  • while the host is down, the VMs stay in SHUTDOWN state;
  • when the host comes back up, the VMs go to POWEROFF state;

Have you implemented a proper fencing mechanism in the host error hook?
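In case it helps: with that hook the VMs are only resubmitted after the failed host has been fenced, so the fence script called by host_error.rb has to be adapted to your hardware. A very rough sketch using ipmitool (the script location, how the host name reaches the script, and the BMC naming and credentials are all assumptions here; check the hook source for the real interface):

#!/bin/bash
# Sketch of a fencing script for the host error hook (assumptions marked below).
# Assumption: the failed host's name arrives as the first argument.
HOST_NAME="$1"

# Assumption: each hypervisor has a BMC reachable as <hostname>-ipmi
BMC_IP="${HOST_NAME}-ipmi"

# Power the failed host off; a non-zero exit should stop the hook from
# resubmitting the VMs while the host might still be writing to storage.
ipmitool -I lanplus -H "$BMC_IP" -U admin -P secret chassis power off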

Hi Feldsam,

I was trying to set up high availability for host failures:

https://docs.opennebula.org/5.2/advanced_components/ha/ftguide.html#host-failures

When I test the host-failure HA feature, with fencing enabled or disabled in the oned.conf file, the VM won't migrate to the other host and still ends up in the POWEROFF state.
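For reference, whether the host actually reaches the ERROR state and whether the hook fires at all can be checked like this:

onehost list                          # the failed host should show the error state before the hook runs
grep -i hook /var/log/one/oned.log    # hook executions (and failures) are logged by oned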

Hi,

FYI, this is the layout of our cloud design:

host1, front-end (HA)
host2, KVM node
host3, KVM node

and for the shared storage I am using GlusterFS on each host node.
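Since the -m option of the host error hook only reschedules VMs whose images are on shared storage, it may be worth confirming that OpenNebula actually sees the GlusterFS-backed datastores as shared, for example:

onedatastore list                                  # overview of datastores and their drivers
onedatastore show <datastore_id> | grep TM_MAD     # should report a shared transfer driver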