Br0 not connected to bond0

Lately we’ve been suffering from weird network-related problems.
One is that ARP responses never reach the virtual machine. The ARP response doesn’t even reach our bond0 interface. With port mirroring to another host we can see the ARP response arriving at the mirrored port, but for some reason the host running the OpenNebula VM doesn’t receive it. The odd thing is that if we empty the host of VMs, which causes br0 to be deleted, and then create a VM, which recreates br0, everything has a good chance of working again. Another weird thing is that even though the ARP response never gets through, ‘arp -a’ shows the data actually being received. But if we ping, for example, the ping is still stuck waiting for the IP to be resolved. How can the ARP table get updated when we can’t see the ARP response while capturing with tshark?
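One thing that may help separate "the entry exists" from "the entry is actually resolved" is the neighbour state reported by `ip neigh show`, whose last field is a state such as REACHABLE, STALE, INCOMPLETE or FAILED. The following is only a rough sketch: `neigh_state` is a hypothetical helper name, and 192.0.2.1 is a placeholder address, not one from this setup.

```shell
#!/bin/sh
# neigh_state IP
# Reads `ip neigh show` output on stdin and prints the state (last field)
# of the entry whose address is IP. Prints nothing if there is no entry.
neigh_state() {
    awk -v ip="$1" '$1 == ip { print $NF }'
}

# Example usage (placeholder address):
#   ip neigh show | neigh_state 192.0.2.1
```

An INCOMPLETE or FAILED state would be consistent with the ping waiting on resolution, while REACHABLE or STALE would suggest the reply arrived by some path the capture did not cover.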

Because of the nature of this problem, today we tried reverting the firmware of the network card on our Dell host… but ended up with this:

…yet another weird thing! We have a script loop that creates and deletes VMs to try to reproduce this flaky situation. After a few successfully created virtual machines where the network worked just fine, we managed to create a VM that never received an IP address to begin with. The reason was that br0 never had bond0 attached as a port. So br0 was created, but incorrectly. As soon as I ran “brctl addif br0 bond0.100”, everything began working again.
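For the reproduction loop, the missing port can be detected without parsing `brctl show`, because the kernel lists a bridge's ports under `/sys/class/net/<bridge>/brif/`. Below is a minimal sketch of such a check. The function names are hypothetical, the optional third argument exists only so the check can be pointed at a fake directory for testing, and the `brctl addif` call is the same manual workaround described above, not a fix for the root cause.

```shell
#!/bin/sh
# bridge_has_port BRIDGE PORT [SYSFS_ROOT]
# Succeeds if PORT is attached to BRIDGE, judged by the presence of
# SYSFS_ROOT/BRIDGE/brif/PORT (SYSFS_ROOT defaults to /sys/class/net).
bridge_has_port() {
    [ -e "${3:-/sys/class/net}/$1/brif/$2" ]
}

# ensure_bridge_port BRIDGE PORT
# Re-attaches PORT to BRIDGE if it is missing (needs root).
ensure_bridge_port() {
    if bridge_has_port "$1" "$2"; then
        echo "$1: $2 already attached"
    else
        echo "$1: $2 missing, attaching"
        brctl addif "$1" "$2"
    fi
}

# Example (as root): ensure_bridge_port br0 bond0.100
```

Logging the "missing" case with a timestamp in the loop would also show how often the bridge comes up without its uplink.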

Has anyone else stumbled upon any of these mysterious random network issues?

Hi @tosaraja,

You can prevent bridge removal → Bridged Networking — OpenNebula 6.8.3 documentation

Could you share your virtual network template? :thinking:

Preventing bridge removal would surely help in the case where the network works on the first try. I’ll have to add that. It still leaves the possibility that it doesn’t work on the first try, so it’s an improvement, but not perfect.
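For anyone following along: the setting in the linked page is, as far as I understand it, `:keep_empty_bridge: true` in `/var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf` on the frontend, pushed to the hosts with `onehost sync -f`. Treat the path and option name as assumptions to verify against the documentation for your version. A small sketch for checking whether a given copy of the file has it enabled (`keep_empty_bridge_enabled` is a hypothetical helper name):

```shell
#!/bin/sh
# keep_empty_bridge_enabled CONF_FILE
# Succeeds if CONF_FILE contains an uncommented ":keep_empty_bridge: true".
keep_empty_bridge_enabled() {
    grep -Eq '^[[:space:]]*:keep_empty_bridge:[[:space:]]*true' "$1"
}

# Example (path is an assumption, see above):
#   keep_empty_bridge_enabled /var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf \
#       && echo "empty bridges are kept"
```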

I don’t use virtual network templates. I’ve just set it up manually with Ansible… what are those templates even for?

Hi @tosaraja,

I wanted to see how you define VNET in OpenNebula, i.e. onevnet show X -j, those settings grouped together are sometimes called “template”, sorry for the confusion. :slight_smile:

[OT]

Indeed… The duality of the "VM Template" meaning is confusing · Issue #6082 · OpenNebula/one · GitHub


{
  "VNET": {
    "ID": "1",
    "UID": "0",
    "GID": "1",
    "UNAME": "oneadmin",
    "GNAME": "users",
    "NAME": "ONE-1-PROD",
    "PERMISSIONS": {
      "OWNER_U": "1",
      "OWNER_M": "1",
      "OWNER_A": "0",
      "GROUP_U": "1",
      "GROUP_M": "0",
      "GROUP_A": "0",
      "OTHER_U": "1",
      "OTHER_M": "0",
      "OTHER_A": "0"
    },
    "CLUSTERS": {
      "ID": [
        "0",
        "101",
        "103",
        "104"
      ]
    },
    "BRIDGE": "br0",
    "BRIDGE_TYPE": "linux",
    "STATE": "1",
    "PREV_STATE": "1",
    "PARENT_NETWORK_ID": {},
    "VN_MAD": "802.1Q",
    "PHYDEV": "bond0",
    "VLAN_ID": "123",
    "OUTER_VLAN_ID": {},
    "VLAN_ID_AUTOMATIC": "0",
    "OUTER_VLAN_ID_AUTOMATIC": "0",
    "USED_LEASES": "368",
    "VROUTERS": {},
    "UPDATED_VMS": {
      "ID": [
        "2666588",
        "2669415",

        "3484294",
        "3484295"
      ]
    },
    "OUTDATED_VMS": {},
    "UPDATING_VMS": {},
    "ERROR_VMS": {},
    "TEMPLATE": {
      "BRIDGE": "br0",
      "BRIDGE_TYPE": "linux",
      "OUTER_VLAN_ID": "",
      "PHYDEV": "bond0",
      "SECURITY_GROUPS": "0",
      "VLAN_ID": "123",
      "VN_MAD": "802.1Q"
    },
    "AR_POOL": {
      "AR": {
        "AR_ID": "0",
        "MAC": "02:00:e7:ef:16:d3",
        "SIZE": "4096",
        "TYPE": "ETHER",
        "MAC_END": "02:00:e7:ef:26:d2",
        "USED_LEASES": "368",
        "LEASES": {
          "LEASE": [
            {
              "MAC": "02:00:e7:ef:16:d3",
              "VM": "3483181"
            },
            {
              "MAC": "02:00:e7:ef:16:d4",
              "VM": "3291250"
            },

            {
              "MAC": "02:00:e7:ef:19:12",
              "VM": "3416224"
            },
            {
              "MAC": "02:00:e7:ef:19:30",
              "VM": "3451027"
            }
          ]
        }
      }
    }
  }
}