Hello everyone,
I have some serious problems getting our large setup working. The main problem seems to be with ARP.
First of all, some information about my setup:
OS: CentOS 7
OpenNebula: 5.4
Network: VLAN setup
Storage: Ceph, but this is not the problem at all :)
I have several nodes using KVM-based virtualization. I have set up a bond named bond1, which is trunked to carry several VLANs to my nodes. The bond uses 802.3ad with LACP, and so far this is working.
I have now created several VMs, which are attached to the following virtual network (IPs are changed…):
[root@sun01 ~]# onevnet show 0
VIRTUAL NETWORK 0 INFORMATION
ID : 0
NAME : cloudtest-193
USER : oneadmin
GROUP : oneadmin
CLUSTERS : 0
BRIDGE : onebr.57
VN_MAD : 802.1Q
PHYSICAL DEVICE: bond1
VLAN ID : 57
USED LEASES : 7
PERMISSIONS
OWNER : um-
GROUP : ---
OTHER : ---
VIRTUAL NETWORK TEMPLATE
BRIDGE="onebr.57"
DNS="8.8.8.8"
GATEWAY="aa.bb.193.1"
NETWORK_MASK="255.255.255.0"
PHYDEV="bond1"
SECURITY_GROUPS="0"
VLAN_ID="57"
VN_MAD="802.1Q"
ADDRESS RANGE POOL
AR 0
SIZE : 50
LEASES : 7
RANGE FIRST LAST
MAC 02:00:3e:71:c1:0a 02:00:3e:71:c1:3b
IP aa.bb.193.10 aa.bb.193.59
LEASES
AR OWNER MAC IP IP6
0 V:0 02:00:3e:71:c1:0a aa.bb.193.10 -
0 V:1 02:00:3e:71:c1:0b aa.bb.193.11 -
0 V:3 02:00:3e:71:c1:0c aa.bb.193.12 -
0 V:5 02:00:3e:71:c1:0d aa.bb.193.13 -
0 V:6 02:00:3e:71:c1:0e aa.bb.193.14 -
0 V:7 02:00:3e:71:c1:0f aa.bb.193.15 -
0 V:8 02:00:3e:71:c1:10 aa.bb.193.16 -
VIRTUAL ROUTERS
[root@sun01 ~]#
So the VMs are created fine, the bridges are created correctly, and the MAC addresses seem to be learned. The problem is that I have no connectivity: I cannot ping my VMs, nor can the VMs ping the outside world.
Here are the MAC addresses on the bridge of one of the virtualization nodes:
[root@virt01 ~]# brctl showmacs onebr.57
port no mac addr is local? ageing timer
2 02:00:3e:71:c1:0a no 135.40
3 02:00:3e:71:c1:10 no 111.37
1 48:df:37:03:00:10 yes 0.00
1 ec:3e:f7:93:9b:c0 no 111.36
2 fe:00:3e:71:c1:0a yes 0.00
2 fe:00:3e:71:c1:0a yes 0.00
3 fe:00:3e:71:c1:10 yes 0.00
3 fe:00:3e:71:c1:10 yes 0.00
[root@virt01 ~]#
The MAC addresses are also learned correctly on the switch (the switch sees some more MAC addresses from other nodes):
user@fra1-pod02-c18-vccs01> show ethernet-switching table vlan 57
Ethernet-switching table: 6 unicast entries
VLAN MAC address Type Age Interfaces
vlan57 * Flood - All-members
vlan57 02:00:3e:71:c1:0a Learn 0 ae14.0
vlan57 02:00:3e:71:c1:0b Learn 52 ae15.0
vlan57 02:00:3e:71:c1:0d Learn 2:57 ae15.0
vlan57 02:00:3e:71:c1:0e Learn 2:06 ae15.0
vlan57 02:00:3e:71:c1:10 Learn 3:02 ae14.0
vlan57 ec:3e:f7:93:9b:c0 Learn 0 ae0.0
{master:0}
user@fra1-pod02-c18-vccs01>
I have double-checked my network configuration on our switches and routes, and everything is fine. I have even connected a server directly to the VLAN over the bond, and this also works.
From what I have found so far, the problem seems to be related to ARP. I ran some basic tcpdump tests and saw that ARP packets do not seem to reach the VMs. Somehow the ARP packets are not being distributed correctly.
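To narrow down where the ARP packets get lost, a capture can be taken at every hop between the trunk and the VM. The following is only a sketch of such a test, not the exact commands I ran. The helper name arp_trace_cmds is made up for illustration, and the interface names bond1.57 (VLAN sub-interface) and one-0-0 (tap device of a VM) are assumptions that have to be checked against `ip link` and `brctl show` on the node. The function only prints the tcpdump commands; each one is then run as root in its own terminal.

```shell
# Print one tcpdump command per hop on the path trunk -> VLAN -> bridge -> VM.
# $1 = physical device, $2 = VLAN ID, $3 = bridge, $4 = VM tap device
# (all names passed in below are assumptions about this host)
arp_trace_cmds() {
    phydev=$1; vlan=$2; bridge=$3; tap=$4
    # On the trunk the frames still carry the 802.1Q tag, so match on it explicitly.
    echo "tcpdump -e -n -i $phydev vlan $vlan and arp"
    # Behind the VLAN sub-interface the tag is gone; a plain arp filter is enough.
    for ifc in "$phydev.$vlan" "$bridge" "$tap"; do
        echo "tcpdump -e -n -i $ifc arp"
    done
}

arp_trace_cmds bond1 57 onebr.57 one-0-0
```

The first interface in this list on which the ARP requests no longer show up should indicate which layer (bond, VLAN sub-interface, bridge, or tap) is dropping them.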
Is there anyone out there who has faced a similar problem or has an idea what might be causing this? Maybe I am just missing a kernel option?
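In case it helps with the kernel option question, here is a small sketch for dumping a few settings that can influence ARP handling on a Linux bridge. The selection of keys is my own guess at likely candidates and not a confirmed cause, and the function name show_bridge_sysctls is made up for illustration.

```shell
# Print the current value of a few kernel settings that can affect ARP on a bridge.
# The list of keys is an assumption about likely candidates, not a diagnosis.
show_bridge_sysctls() {
    for key in net/bridge/bridge-nf-call-arptables \
               net/bridge/bridge-nf-call-iptables \
               net/ipv4/conf/all/arp_filter; do
        if [ -r "/proc/sys/$key" ]; then
            echo "$key = $(cat "/proc/sys/$key")"
        else
            # bridge-nf-* entries exist only when bridge netfilter support is loaded
            echo "$key = (not present)"
        fi
    done
}

show_bridge_sysctls
```

I can post the output from one of the virtualization nodes if that is useful.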
Thank you for your input.