Lately we’ve been suffering from some weird network-related problems.
One is that ARP responses never reach the virtual machine. The ARP response doesn’t even reach our bond0 interface. With port mirroring to another host we can see the ARP response arriving on the mirrored port, but for some reason the host running the OpenNebula VM doesn’t receive it. The odd thing is that if we empty the host of VMs, which causes br0 to be deleted, and then create a VM, which recreates br0, everything has a good chance of working again. Another weird thing is that even though the ARP response never gets through, ‘arp -a’ shows the entry as if the response had actually been received. Yet if we ping, for example, the ping is still stuck waiting for the IP to be resolved. How can the ARP table get updated when we can’t see the ARP response in tshark?
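To check whether the entry that ‘arp -a’ shows is really resolved or just a pending placeholder, something like the following might help. It is only a sketch: neigh_state is a hypothetical helper name, and it assumes the usual `ip neigh show` output where the neighbour state (REACHABLE, STALE, INCOMPLETE, FAILED, ...) is the last field on each line.

```shell
# Hypothetical helper: print the neighbour state for one IP address.
# Reads `ip neigh show`-style lines on stdin; $1 is the IP to look up.
# Exits non-zero if the address is not in the table at all.
neigh_state() {
    awk -v ip="$1" '
        $1 == ip { print $NF; found = 1 }   # state is the last field
        END      { exit !found }
    '
}
```

For example, `ip neigh show dev br0 | neigh_state 10.0.0.1` (the address is made up). An INCOMPLETE or FAILED state would mean the kernel is still waiting for the ARP response, which would match a ping that hangs on resolution.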
Because of the nature of this problem, today we tried reverting the firmware of the network card on our Dell host… but ended up with this:
…yet another weird thing! We have a script that loops, creating and deleting VMs, to try to reproduce this flaky situation. After a few successfully created virtual machines whose networking worked just fine, we managed to create a VM that never received an IP address to begin with. The reason was that br0 never had bond0 attached as a port. So br0 was created, but incorrectly. As soon as I ran “brctl addif br0 bond0.100”, everything began working again.
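A check like the one below could be dropped into the reproduction loop to catch the half-built bridge right away. It is a rough sketch, not what our script actually does: the function names and the SYSFS_NET variable are made up for illustration, and it relies on the Linux bridge exposing its ports under /sys/class/net/<bridge>/brif/.

```shell
# SYSFS_NET is a hypothetical override so the check can be tried
# against a fake directory tree; normally it is /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Succeeds if port $2 (e.g. bond0.100) is attached to bridge $1 (e.g. br0).
bridge_has_port() {
    [ -e "$SYSFS_NET/$1/brif/$2" ]
}

# Reports whether the port is attached; returns 1 if it is missing.
ensure_bridge_port() {
    if bridge_has_port "$1" "$2"; then
        echo "ok: $2 is attached to $1"
    else
        echo "missing: $2 is not attached to $1"
        # Uncomment to repair automatically (needs root):
        # brctl addif "$1" "$2"
        return 1
    fi
}
```

Calling `ensure_bridge_port br0 bond0.100` after each VM is created would at least log when the bridge comes up without its uplink, instead of us finding out when DHCP fails.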
Has anyone else stumbled upon any of these mysterious random network issues?