I'm running OpenNebula 5.2.0 on Debian 8.6 with KVM. Unfortunately, my VMs crash one at a time: no VNC, no network, nothing. OpenNebula still reports that everything is OK: the VM is monitored normally and its state is RUNNING.
I have to do a hard poweroff of the VM and then resume it.
The oned log looks fine. Has anyone had the same experience? Any hints on where to look for the problem?
Real memory: 113 GB, Allocated: 218 GB, Total: 284 GB
Real CPU: 500, Allocated: 6400, Total: 6000
Number of VMs: 50
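As a quick sanity check on these figures, the allocation ratios can be worked out in a few lines. This is only a sketch: it assumes the memory values are in GB and that the CPU values are in OpenNebula's usual units of 100 per core. The function name `ratio` is made up for illustration.

```python
def ratio(part, total):
    """Return part/total as a fraction (1.0 means fully allocated)."""
    return part / total

# Figures copied from the post above.
mem_real, mem_alloc, mem_total = 113, 218, 284    # assumed to be GB
cpu_real, cpu_alloc, cpu_total = 500, 6400, 6000  # assumed 100 units per core

mem_alloc_ratio = ratio(mem_alloc, mem_total)
cpu_alloc_ratio = ratio(cpu_alloc, cpu_total)

print(f"memory allocated: {mem_alloc_ratio:.0%}, in real use: {ratio(mem_real, mem_total):.0%}")
print(f"CPU allocated:    {cpu_alloc_ratio:.0%}, in real use: {ratio(cpu_real, cpu_total):.0%}")
```

With these numbers the allocated CPU is slightly above the host total, while real usage of both CPU and memory is well below it. That does not prove overcommitment is the cause, but it is worth keeping in mind while reading the logs.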
Hello @UAnton
I checked that earlier. /var/log/libvirt/qemu/one-NUM.log shows only variables and the starting up and shutting down messages.
/var/log/one/NUM.log shows the usual log entries.
Which log do you mean?
Thanks
@UAnton I checked syslog and messages.
It seems the server went into a blackout:
Jan 15 20:49:44 mail postfix/qmgr[21830]: C1AD4A025A: removed
Jan 16 11:59:36 mail rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="393" x-info="http://www.rsyslog.com"] start
There are no log entries between 20:49:44 and 11:59:36. I think the VM crashed at 20:49 and came back when I started it again at 11:59.
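A silent window like this can also be located automatically. The following is a minimal sketch, assuming the traditional syslog timestamp format shown above (which carries no year, so the year is passed in as an assumption). The function name `find_gaps` and the one-hour threshold are made up for illustration.

```python
from datetime import datetime, timedelta

def find_gaps(lines, min_gap=timedelta(hours=1), year=2017):
    """Return (last_seen, next_seen) pairs where the log is silent for min_gap or longer.

    The year is an assumption: classic syslog timestamps do not include it.
    """
    gaps = []
    prev = None
    for line in lines:
        try:
            # The first 15 characters look like "Jan 15 20:49:44".
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if prev is not None and ts - prev >= min_gap:
            gaps.append((prev, ts))
        prev = ts
    return gaps

# The two lines quoted in the post above.
sample = [
    "Jan 15 20:49:44 mail postfix/qmgr[21830]: C1AD4A025A: removed",
    'Jan 16 11:59:36 mail rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" '
    'x-pid="393" x-info="http://www.rsyslog.com"] start',
]

gaps = find_gaps(sample)
for start, end in gaps:
    print(f"silent from {start} to {end} ({end - start})")
```

Running the same function over the guest's whole syslog (for example by passing an open file object) would list every earlier blackout, which helps to tell whether the crashes follow a pattern.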
Hi Arash!
Let's see if we can find more information about what happened.
Could you run onehost show X | grep ERROR, where X is the ID of the KVM node on which the VMs that you had to reboot were running? Let's see if OpenNebula noticed any error on the KVM node.
If all VMs showed the RUNNING state and not UNKNOWN, that would mean the KVM process was telling OpenNebula that the VMs were indeed running, even though you could not access them. The fact that you could hard poweroff the VMs means that the KVM process was still able to answer requests.
Can you filter your KVM node's /var/log/syslog for libvirtd messages that may explain what happened to the VMs running on that node? Are there any out-of-memory errors, stack traces or I/O errors in your node's logs?
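The filtering suggested here can be sketched as a small script. This is an illustration only: the pattern list is an assumption based on common kernel and libvirt message wording (such as "Out of memory", "oom-killer", "I/O error" and "Call Trace"), not an exhaustive list. The names `PATTERNS` and `classify` are made up.

```python
import re

# Assumed patterns; extend them to match what your kernel and libvirt actually log.
PATTERNS = {
    "libvirtd": re.compile(r"\blibvirtd\b"),
    "oom": re.compile(r"out of memory|oom-killer", re.IGNORECASE),
    "io_error": re.compile(r"I/O error", re.IGNORECASE),
    "stack_trace": re.compile(r"Call Trace", re.IGNORECASE),
}

def classify(lines):
    """Group log lines by the first-level category of every pattern they match."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits

# Usage against a real file (path taken from the thread):
#   with open("/var/log/syslog") as f:
#       hits = classify(f)
# Here only a line quoted in this thread is checked, and it matches nothing.
hits = classify(["Jan 15 20:49:44 mail postfix/qmgr[21830]: C1AD4A025A: removed"])
for name, matched in hits.items():
    print(f"{name}: {len(matched)} line(s)")
```

An empty result on the KVM node would be consistent with what is reported later in the thread, and would point towards looking inside the guest or at the kernel rather than at libvirt.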
Is this the first time you have seen this issue, or has it been happening periodically?
Hello @mcabrerizo
Thank you for your answer.
The command onehost show X | grep ERROR doesn't show anything; the output is empty. I also ran the command without grep and there are no errors.
There are no error entries about that particular machine in /var/log/syslog or /var/log/messages between 20:49 and 11:59.
Only one message is repeated many times, something like: the device for one-NUM entered promiscuous mode.
All VMs are in the RUNNING state now, but I will have to check again when the problem comes back. There is nothing in /var/log/syslog about libvirt, errors or anything else.
It has happened many times, one machine at a time. For example, last week we had this issue with another VM on this host.
Hi Arash!
I'm not a Debian guy, so I hope I'm not suggesting odd things for KVM troubleshooting. I'm installing a Debian VM so I can see which other files you could check.
As you haven't found any error in the files mentioned, I would also check whether there is anything odd in the /var/log/dmesg file of your KVM node. The point is that if you can't find any error or hint related to KVM or the kernel, it will be quite difficult to tell whether those VMs are failing because of a storage problem, a QEMU option, blocked I/O, memory... As the KVM process reports those VMs to OpenNebula as running, I'd focus on KVM troubleshooting. If you're using shared storage, I'd also check whether you have any performance issues... Sorry for being so vague, but if no log message is found, I can't imagine what the issue could be.
Maybe there's a bug in Debian (kernel or KVM related), so if you haven't already done so, I'd check whether there are updates for your node's packages, or look for a bug in the Debian list related to the qemu-kvm package version that you have... Proceed with caution, of course.
I have nearly the same issue, except it's a group of 8 VMs out of 80 that all crash at exactly the same time. There is nothing on the hypervisors indicating a problem. The VNC console for the VM is completely unresponsive and shows no indication of a kernel panic or anything of the sort. I've tried bringing up the VMs on different hypervisors and still have the same issue: they crash hard every 20-30 hours.
@Jake_Burns,
I couldn't find any solution to fix the problem. I changed the hypervisor's OS from Debian 8 to Ubuntu 16.04 and everything has been fine so far.
I think the problem was somehow related to the kernel: by changing the OS I upgraded from the 3.18 kernel to 4.4.