After a few weeks of testing with no major issues, we moved our OpenNebula deployment into production this weekend.
We have 36 VMs on 3 hosts.
Today, we ran into problems:
Each host was detected as down by OpenNebula, although the OS was running correctly and was accessible over SSH.
We have this error in syslog:
Aug 22 22:13:58 adnpvirt07 libvirtd: error from service: CheckAuthorization: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
Aug 22 22:13:58 adnpvirt07 libvirtd: End of file while reading data: Erreur d’entrée/sortie
oneadmin is a member of the libvirt group.
This issue corrupted the running virtual disks and caused problems in our production environment.
In addition, the hook on host error is enabled to provide HA for our hosts.
After rebooting each host (and applying an apt-get upgrade), everything seems fine, but I want to understand where the problem comes from so I can fix it.
We are using OpenNebula 5.0.2.
Thanks for your help,