VM stuck in RUNNING/POWEROFF cycle

Hi,

Some VMs get into a continual RUNNING/POWEROFF cycle in ON 4.12 (see excerpt from the logs below). Previously we thought it was only Windows machines, but this is not the case; it happens to VMs of any type.

This is very bad for Sunstone users, as they cannot access VNC while the VM is in the POWEROFF state, even though the VM is actually running and can be reached via ssh.

There seem to be multiple instances of run_probes running on the host. Is this correct?

Any help would be much appreciated.

  Regards.
    Gerry

Fri Sep 30 17:32:39 2016 [Z0][LCM][I]: New VM state is RUNNING
Fri Sep 30 17:33:11 2016 [Z0][DiM][I]: New VM state is POWEROFF
Fri Sep 30 17:37:33 2016 [Z0][VMM][I]: VM found again, state is RUNNING
Fri Sep 30 17:37:33 2016 [Z0][LCM][I]: New VM state is RUNNING
Fri Sep 30 17:38:53 2016 [Z0][DiM][I]: New VM state is POWEROFF

Hi, it looks like a problem with monitoring. Which monitoring type do you use, UDP or SSH? And why are you using the relatively old 4.12 release? Monitoring was improved in newer versions.
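To check which driver is configured, something like this should work (Debian default paths assumed; just a sketch, adjust to your install):

```shell
# Show the active IM driver stanza in oned.conf
# (Debian default path; adjust if your install differs):
grep -A4 '^IM_MAD' /etc/one/oned.conf

# Recent monitoring-related messages from the central daemon log:
grep -i 'monitor' /var/log/one/oned.log | tail -n 20
```

The IM_MAD stanza shows whether the collectd (UDP-push) or ssh-pull driver is in use, and oned.log records the monitoring failures centrally, in addition to the per-VM logs.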

Hi Kristian,

As far as I’m aware we use ssh monitoring. Where can I look, apart from the individual machine log quoted above, to see what issues are being recorded? Are there timeout settings that can be modified somewhere?

We are using 4.12 as we are still running on Debian Wheezy. This never happened on earlier versions.

 Regards,
   Gerry

I’m afraid Kristian is right; there were a couple of issues with the state
transitions (some race conditions between the driver callbacks and the
monitoring). I’d strongly suggest upgrading to 5.0; if that involves too
much work, at least 4.14 addresses most of those issues.

Hello Ruben,

The reason we are stuck at 4.12 at the moment is that I believe this is the highest version that will run on Debian Wheezy. Will 4.14 run on Wheezy?

We plan to migrate to Debian Jessie / ON 5. In the meantime, is there any workaround for this issue, e.g. lengthening timeouts?

 Regards,
   Gerry

Hello, I personally think there is no problem running the latest version on Wheezy too.

Try updating the repo config and then upgrading:

echo "deb http://downloads.opennebula.org/repo/5.0/Debian/8 stable opennebula" > /etc/apt/sources.list.d/opennebula.list

But it would be better to upgrade to Jessie; it is relatively simple and safe.

I have done it on several servers without problems.

Hello Ruben,

Below is an example of a subprocess of “ruby /usr/lib/one/mads/one_im_exec.rb -r 3 -t 15 kvm”. Am I correct in thinking that we are running in “UDP-push” mode? We have 120+ nodes, so I think we should be running in this mode.

Are there any parameters we can tweak to avoid the race condition you mentioned until we get the opportunity to upgrade to 4.14 or 5? I know we can’t mix hosts running Debian Wheezy and Jessie as the kvm/libvirt versions differ, but is it possible to run 4.14 on Wheezy?

This issue is causing real problems for users and we would like to put in a temporary fix.

    Regards,
      Gerry

oneadmin 11897 12078 0 10:29 pts/1 00:00:00 sh -c ssh -n host128.X.Y.Z 'if [ -x "/var/tmp/one/im/run_probes" ]; then /var/tmp/one/im/run_probes kvm /var/lib/one//datastores 4124 20 140 host128.X.Y.Z; else exit 42; fi' ; echo ExitCode: $? 1>&2
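To see whether several probe runs are overlapping on a host, a plain ps/grep check like this can be used (nothing OpenNebula-specific, just a sketch):

```shell
# List any running monitoring probe processes with their elapsed time;
# several long-lived run_probes instances suggest probe runs overlap
# (each run takes longer than the monitoring interval).
ps -eo pid,etime,args | grep '[r]un_probes' || echo "no run_probes processes"
```

The `[r]un_probes` bracket trick stops grep from matching its own command line, so only real probe processes are listed.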

Try increasing the monitoring interval.

In the IM_MAD section of /etc/one/oned.conf, increase the -i value (the push interval, in seconds) to a few minutes, e.g. 180:

IM_MAD = [
      NAME       = "collectd",
      EXECUTABLE = "collectd",
      ARGUMENTS  = "-p 4124 -f 5 -t 50 -i 20" ]

And also:

MONITORING_INTERVAL = 240
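Putting both changes together, the relevant part of /etc/one/oned.conf would look something like this (the values are examples to tune for your site, not recommended settings):

```
IM_MAD = [
      NAME       = "collectd",
      EXECUTABLE = "collectd",
      ARGUMENTS  = "-p 4124 -f 5 -t 50 -i 180" ]

MONITORING_INTERVAL = 240
```

Restart oned afterwards so the new values take effect.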

This may help in some cases…

Cheers