After update to 5.4, VMs turn to POWEROFF (random)

Hi.

First, some context: we have a pre-production setup with 1 master and 1 dom0 (KVM only).

I updated from 4.14 to 5.4; the upgrade itself went fine, no problems.

MASTER
[root@pre one]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)

DOM0
[root@compute-11-25 ~]# cat /etc/redhat-release
Scientific Linux release 6.7 (Carbon)

I know there are some known issues with dom0 on Scientific Linux 6.7, but I don't know whether there is any solution for them.

Now, the problem.

After the update, some machines that are running switch to POWEROFF (although the machines themselves keep working fine), and in the log I can find this:

Mon Aug 14 13:50:23 2017 [Z0][LCM][I]: VM running but monitor state is POWEROFF
Mon Aug 14 13:50:23 2017 [Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
Mon Aug 14 13:50:23 2017 [Z0][VM][I]: New state is POWEROFF
Mon Aug 14 13:50:23 2017 [Z0][VM][I]: New LCM state is LCM_INIT

In pre-production this is not a big problem; I think it is related to the outdated OS on the dom0… but if this happens in production we have a real problem…

What is the problem? And is it possible to recover the state back to RUNNING? I can't update the dom0 while it has running machines, and I can't stop all the VMs to update OpenNebula…

Thanks

I think the state only changes to POWEROFF if the VM has no IP… is that normal?

Have you issued onehost sync --force to update the remote scripts after the upgrade?

You can try executing the IM probes manually on the remote machine to check whether the VMs are detected and look for any error messages. As oneadmin on the hypervisor:

$ cd /var/tmp/one/im/kvm-probes.d
$ ./poll.sh

Do you get info about VMs running there? Any error message?

Hi @jfontan

I did that and there are no errors.

The output with 2 machines running (1421 has an IP, 1429 doesn't):
-bash-4.1$ ./poll.sh
VM_POLL=YES
VM=[
ID=1421,
DEPLOY_ID=one-1421,
POLL="DISKRDBYTES=115217548 NETTX=1490 DISKWRBYTES=49864704 DISKRDIOPS=7334 MEMORY=1048576 DISKWRIOPS=4781 CPU=0 STATE=a NETRX=188144967" ]
VM=[
ID=1429,
DEPLOY_ID=one-1429,
POLL="DISKRDBYTES=25423936 DISKWRBYTES=4096 DISKRDIOPS=1636 MEMORY=1048576 DISKWRIOPS=1 CPU=11 STATE=a" ]

About 5-10 minutes later:

-bash-4.1$ ./poll.sh
VM_POLL=YES
VM=[
ID=1421,
DEPLOY_ID=one-1421,
POLL="DISKRDBYTES=115217548 NETTX=1490 DISKWRBYTES=49864704 DISKRDIOPS=7334 MEMORY=1048576 DISKWRIOPS=4781 CPU=0 STATE=a NETRX=188334867" ]
VM=[
ID=1429,
DEPLOY_ID=one-1429,
POLL="DISKRDBYTES=100729996 DISKWRBYTES=4653056 DISKRDIOPS=5209 MEMORY=1048576 DISKWRIOPS=300 CPU=0 STATE=a" ]
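One detail worth noting in the two POLL strings: VM 1429 never reports the NETRX/NETTX counters, while 1421 does. A quick standalone way to see which keys a POLL string carries (the POLL line below is copied from the probe output; check_key is just a hypothetical helper name, the check itself is plain grep):

```shell
# POLL string for VM 1429, copied verbatim from the probe output above
poll_1429='DISKRDBYTES=100729996 DISKWRBYTES=4653056 DISKRDIOPS=5209 MEMORY=1048576 DISKWRIOPS=300 CPU=0 STATE=a'

# Report whether a given KEY=VALUE pair appears in a POLL string
check_key() {
  if grep -q "$1=" <<<"$2"; then
    echo "$1 present"
  else
    echo "$1 missing"
  fi
}

# STATE=a means the probe sees the domain as active; NETRX/NETTX only
# show up when the probe could read network counters for that VM
for key in STATE NETRX NETTX; do
  check_key "$key" "$poll_1429"
done
```

This prints "STATE present" but "NETRX missing" and "NETTX missing" for 1429, which matches the hypothesis earlier in the thread that the VMs without an IP are the ones affected.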

but in Sunstone I still see the machines as POWEROFF.

The output using onehost is very similar:

Is it possible that the im_mad has some problem? Because I use the UDP-push model and not the SSH pull one (the kvm IM_MAD argument vs kvm-probes).
[screenshot: captura-area-2017-08-21-125209]

Thanks in advance.

I’ve tested machines without network just in case and they are monitored correctly. Also the KVM IM configuration seems correct to me. Monitoring should be working or the host wouldn’t be in MONITORED state.

Can you take a look at oned.log? Maybe the problem is in some other probe and oned is not able to parse some of the data.
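For that check, something along these lines works. This is a sketch: /var/log/one/oned.log is the default front-end log location (adjust if your install differs), and the grep patterns are examples of things worth searching for, not exact oned message text. The sample file below is built from the log excerpt earlier in the thread so the command can be tried standalone:

```shell
# Sample oned.log lines, copied from the log excerpt earlier in the thread
cat > /tmp/oned-sample.log <<'EOF'
Mon Aug 14 13:50:23 2017 [Z0][LCM][I]: VM running but monitor state is POWEROFF
Mon Aug 14 13:50:23 2017 [Z0][VM][I]: New LCM state is SHUTDOWN_POWEROFF
Mon Aug 14 13:50:23 2017 [Z0][VM][I]: New state is POWEROFF
EOF

# On the real front-end you would point this at /var/log/one/oned.log:
#   grep -iE 'error|cannot parse|monitor state' /var/log/one/oned.log
grep -iE 'error|cannot parse|monitor state' /tmp/oned-sample.log
```

On this sample only the "VM running but monitor state is POWEROFF" line matches; on a real log, any parse-related error lines around those timestamps would point at the misbehaving probe.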