After update to 5.4, VMs turn to POWEROFF (random)

Hi.

First, some context: we have a pre-production setup with 1 master and 1 dom0 (KVM only).

I updated from 4.14 to 5.4; the upgrade itself went fine, no problems.

MASTER
[root@pre one]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)

DOM0
[root@compute-11-25 ~]# cat /etc/redhat-release
Scientific Linux release 6.7 (Carbon)

I know there are some known issues with dom0 on Scientific Linux 6.7, but I don't know whether there is any solution for them.

Now, the problem.

After the update, some machines that are running switch to POWEROFF (although the machines themselves keep working fine), and in the log I can find this:

Mon Aug 14 13:50:23 2017 [Z0][LCM][I]: VM running but monitor state is POWEROFF
Mon Aug 14 13:50:23 2017 [Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
Mon Aug 14 13:50:23 2017 [Z0][VM][I]: New state is POWEROFF
Mon Aug 14 13:50:23 2017 [Z0][VM][I]: New LCM state is LCM_INIT

In pre-production this is not a big problem; I think it is related to the outdated OS on the dom0… but if this happens in production we have a real problem…

What is the problem? And is it possible to recover the state back to RUNNING? I can't update the dom0 while it has running machines, and I can't stop all the VMs to update OpenNebula…

Thanks

I think the state only changes to POWEROFF if the VM has no IP… is that normal?

Have you issued onehost sync --force to update the remote scripts after the upgrade?

You can try executing the IM probes manually on the remote machine to check whether the VMs are detected and look for any error messages. As oneadmin on the hypervisor:

$ cd /var/tmp/one/im/kvm-probes.d
$ ./poll.sh

Do you get info about VMs running there? Any error message?

Hi @jfontan

I did that and there are no errors.

The output with 2 machines running (1421 has an IP, 1429 doesn't):
-bash-4.1$ ./poll.sh
VM_POLL=YES
VM=[
ID=1421,
DEPLOY_ID=one-1421,
POLL="DISKRDBYTES=115217548 NETTX=1490 DISKWRBYTES=49864704 DISKRDIOPS=7334 MEMORY=1048576 DISKWRIOPS=4781 CPU=0 STATE=a NETRX=188144967" ]
VM=[
ID=1429,
DEPLOY_ID=one-1429,
POLL="DISKRDBYTES=25423936 DISKWRBYTES=4096 DISKRDIOPS=1636 MEMORY=1048576 DISKWRIOPS=1 CPU=11 STATE=a" ]

About 5-10 minutes later:

-bash-4.1$ ./poll.sh
VM_POLL=YES
VM=[
ID=1421,
DEPLOY_ID=one-1421,
POLL="DISKRDBYTES=115217548 NETTX=1490 DISKWRBYTES=49864704 DISKRDIOPS=7334 MEMORY=1048576 DISKWRIOPS=4781 CPU=0 STATE=a NETRX=188334867" ]
VM=[
ID=1429,
DEPLOY_ID=one-1429,
POLL="DISKRDBYTES=100729996 DISKWRBYTES=4653056 DISKRDIOPS=5209 MEMORY=1048576 DISKWRIOPS=300 CPU=0 STATE=a" ]
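One detail worth noting in the two POLL strings: VM 1429 never reports the NETRX/NETTX counters, while 1421 does. A quick standalone way to see which keys a POLL string carries (the POLL line below is copied from the probe output; check_key is just a hypothetical helper name, the check itself is plain grep):

```shell
# POLL string for VM 1429, copied verbatim from the probe output above
poll_1429='DISKRDBYTES=100729996 DISKWRBYTES=4653056 DISKRDIOPS=5209 MEMORY=1048576 DISKWRIOPS=300 CPU=0 STATE=a'

# Report whether a given KEY=VALUE pair appears in a POLL string
check_key() {
  if grep -q "$1=" <<<"$2"; then
    echo "$1 present"
  else
    echo "$1 missing"
  fi
}

# STATE=a means the probe sees the domain as active; NETRX/NETTX only
# show up when the probe could read network counters for that VM
for key in STATE NETRX NETTX; do
  check_key "$key" "$poll_1429"
done
```

This prints "STATE present" but "NETRX missing" and "NETTX missing" for 1429, which matches the hypothesis earlier in the thread that the VMs without an IP are the ones affected.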

but in Sunstone I still see the machines as POWEROFF.

The output using onehost is very similar:

Is it possible that the im_mad has some problem? Because I use the UDP-push model and not the SSH pull one (the kvm IM_MAD argument vs kvm-probes).
[screenshot: captura-area-2017-08-21-125209]

Thanks in advance.

I’ve tested machines without network just in case and they are monitored correctly. Also the KVM IM configuration seems correct to me. Monitoring should be working or the host wouldn’t be in MONITORED state.

Can you take a look at oned.log? Maybe the problem is in some other probe and oned is not able to parse some of the data.
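For that check, something along these lines works. This is a sketch: /var/log/one/oned.log is the default front-end log location (adjust if your install differs), and the grep patterns are examples of things worth searching for, not exact oned message text. The sample file below is built from the log excerpt earlier in the thread so the command can be tried standalone:

```shell
# Sample oned.log lines, copied from the log excerpt earlier in the thread
cat > /tmp/oned-sample.log <<'EOF'
Mon Aug 14 13:50:23 2017 [Z0][LCM][I]: VM running but monitor state is POWEROFF
Mon Aug 14 13:50:23 2017 [Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
Mon Aug 14 13:50:23 2017 [Z0][VM][I]: New state is POWEROFF
EOF

# On the real front-end you would point this at /var/log/one/oned.log:
#   grep -iE 'error|cannot parse|monitor state' /var/log/one/oned.log
grep -iE 'error|cannot parse|monitor state' /tmp/oned-sample.log
```

On this sample only the "VM running but monitor state is POWEROFF" line matches; on a real log, any parse-related error lines around those timestamps would point at the misbehaving probe.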