VXLAN, 2 KVM Hosts, 50K multicasts/second

Hello!

TL;DR:
Using 1 VXLAN across 2 KVM hosts causes instability due to high multicast packet volume. Is this normal, or just me?

Physical Environment:
-1 CentOS Linux release 7.6.1810 front-end server, mode 4/802.3ad/LACP bond
-2 CentOS Linux release 7.6.1810 KVM hosts, mode 4/802.3ad/LACP bond
-1 Windows Server 2016 Hyper-V host for support VMs (remote access etc.), NIC Teaming (Aggregate)
NICs for the ONe nodes are Mellanox ConnectX-4, and ConnectX-2 for the Hyper-V host (10 Gb/s interfaces and switch).

OpenNebula Version:
5.4.13 #This issue also occurred on 5.6. I rolled back to practice the upgrade process

Networking:
Bond Creation:
nmcli con add type bond con-name bond0 ifname bond0 mode 4 ipv4.method disabled ipv6.method ignore
nmcli con add type bond-slave ifname enp2s0f0 master bond0
nmcli con add type bond-slave ifname enp2s0f1 master bond0
nmcli con up bond-slave-enp2s0f0
nmcli con up bond-slave-enp2s0f1
nmcli con up bond0
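
Before layering anything on top, it is worth confirming that the bond actually negotiated 802.3ad (a quick sanity check):

cat /proc/net/bonding/bond0 #Should report "Bonding Mode: IEEE 802.3ad Dynamic link aggregation" with both slaves up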

KVM Hosts have 4 Tagged Bridges created via the following commands:
'nmcli c add type bridge con-name br1002 ifname br1002 ipv4.method manual ipv6.method ignore' #Management network for internet and remote access
'nmcli c add type bridge con-name br1003 ifname br1003 ipv4.method disabled ipv6.method ignore' #For ONe virtual networks
'nmcli c add type bridge con-name br1004 ifname br1004 ipv4.method manual ipv6.method ignore' #For the Ceph cluster
'nmcli c add type bridge con-name br1011 ifname br1011 ipv4.method disabled ipv6.method ignore' #For external-facing IPs for tenants' routers/firewalls
'nmcli con add type vlan ifname bond0.1002 dev bond0 id 1002 master br1002 slave-type bridge' #Repeated for each bridge with the matching VLAN ID

This followed the guide here:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-vlan_on_bond_and_bridge_using_the_networkmanager_command_line_tool_nmcli
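
After creating each bridge/VLAN pair, the connections still need to be brought up; a minimal sanity check looks roughly like this (connection names assumed to match the nmcli output further down):

nmcli con up br1003
nmcli con up bridge-slave-bond0.1003
brctl show br1003 #bond0.1003 should appear as a port of br1003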

I then created a VXLAN via the ONe web ui.
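
For reference, the same network can be defined from the CLI with a template; a rough equivalent of what the UI generated (reconstructed from the onevnet show output below) would be:

#vxlan-upgrade.tmpl
NAME = "vxlan-upgrade"
VN_MAD = "vxlan"
PHYDEV = "bond0.1003"
BRIDGE = "br1003"
NETWORK_ADDRESS = "10.0.0.0"
NETWORK_MASK = "255.255.255.0"
GATEWAY = "10.0.0.1"
AR = [ TYPE = "IP4", IP = "10.0.0.1", SIZE = "100" ]

su oneadmin -c 'onevnet create vxlan-upgrade.tmpl' #VLAN ID 2 was assigned automatically in my case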

[root@cloud1 ~]# su oneadmin -c 'onevnet list'
ID USER GROUP NAME CLUSTERS BRIDGE LEASES
0 oneadmin oneadmin vxlan-upgrade 0 br1003 6
6 oneadmin oneadmin edge 0 br1011 1

[root@cloud1 ~]# su oneadmin -c 'onevnet show 0'
VIRTUAL NETWORK 0 INFORMATION
ID : 0
NAME : vxlan-upgrade
USER : oneadmin
GROUP : oneadmin
CLUSTERS : 0
BRIDGE : br1003
VN_MAD : vxlan
PHYSICAL DEVICE: bond0.1003
VLAN ID : 2
USED LEASES : 6

PERMISSIONS
OWNER : um-
GROUP : ---
OTHER : ---

VIRTUAL NETWORK TEMPLATE
BRIDGE="br1003"
GATEWAY="10.0.0.1"
NETWORK_ADDRESS="10.0.0.0"
NETWORK_MASK="255.255.255.0"
PHYDEV="bond0.1003"
SECURITY_GROUPS="0"
VN_MAD="vxlan"

ADDRESS RANGE POOL
AR 0
SIZE : 100
LEASES : 6

RANGE FIRST LAST
MAC 02:00:0a:00:00:01 02:00:0a:00:00:64
IP 10.0.0.1 10.0.0.100

LEASES
AR OWNER MAC IP IP6
0 V:184 02:00:0a:00:00:01 10.0.0.1 -
0 V:185 02:00:0a:00:00:02 10.0.0.2 -
0 V:186 02:00:0a:00:00:03 10.0.0.3 -
0 V:187 02:00:0a:00:00:04 10.0.0.4 -
0 V:188 02:00:0a:00:00:05 10.0.0.5 -
0 V:178 02:00:0a:00:00:06 10.0.0.6 -

VIRTUAL ROUTERS
6

[root@cloud3 ~]# brctl showmacs br1003
port no mac addr is local? ageing timer
1 00:01:e8:8b:2e:9c no 1.78
6 02:00:0a:00:00:01 no 0.88
7 02:00:0a:00:00:02 no 0.87
4 02:00:0a:00:00:03 no 0.87
5 02:00:0a:00:00:04 no 2.94
3 02:00:0a:00:00:05 no 0.88
8 02:00:0a:00:00:06 no 1.30
2 e2:58:51:a9:8f:a6 yes 0.00
2 e2:58:51:a9:8f:a6 yes 0.00
1 ec:0d:9a:9c:79:52 yes 0.00
1 ec:0d:9a:9c:79:52 yes 0.00
1 ec:0d:9a:9c:79:5e no 0.00
6 fe:00:0a:00:00:01 yes 0.00
6 fe:00:0a:00:00:01 yes 0.00
7 fe:00:0a:00:00:02 yes 0.00
7 fe:00:0a:00:00:02 yes 0.00
4 fe:00:0a:00:00:03 yes 0.00
4 fe:00:0a:00:00:03 yes 0.00
5 fe:00:0a:00:00:04 yes 0.00
5 fe:00:0a:00:00:04 yes 0.00
3 fe:00:0a:00:00:05 yes 0.00
3 fe:00:0a:00:00:05 yes 0.00
8 fe:00:0a:00:00:06 yes 0.00
8 fe:00:0a:00:00:06 yes 0.00

Currently all VMs are on this KVM host.

[root@cloud3 ~]# nmcli c
NAME UUID TYPE DEVICE
bond0 9154687c-309a-4ccb-aa2c-03b212b6a9a1 bond bond0
bond0.1003.2 881ba273-627c-40cf-9d94-3d0e597d7be0 vxlan bond0.1003.2
bond-slave-enp2s0f0 fbf1fc1a-3af8-4315-b705-3fefb8cbdc63 ethernet enp2s0f0
bond-slave-enp2s0f1 79494087-b028-4f54-96c8-cec802bd29f3 ethernet enp2s0f1
br1002 07f971c9-94bd-41e4-917c-278cf546740b bridge br1002
br1003 169e3e6d-9398-4694-90c4-7752277236c0 bridge br1003
br1004 59e8e5a6-c465-445b-8a5b-84c1ccb9ce3b bridge br1004
br1011 0915a3c2-649a-4fa2-9fd0-0627c14abcbc bridge br1011
bridge-slave-bond0.1002 c2160853-0c08-4df8-bb3a-4edb579f148e vlan bond0.1002
bridge-slave-bond0.1003 40a8e165-aad0-4669-a777-0ced65d8ca4d vlan bond0.1003
bridge-slave-bond0.1004 21d6ad24-1343-44ca-9aae-0765a2aa616b vlan bond0.1004
bridge-slave-bond0.1011 98f2ec39-03ad-4fe0-b2fa-97f9e1adfeff vlan bond0.1011
one-178-0 f813a77d-4fec-4955-b566-c0ba266c43f9 tun one-178-0
one-184-0 3e77c05b-e509-4453-b695-7c7954e92a43 tun one-184-0
one-184-1 0e04bb78-bb3c-418b-ab08-d1092d1501c6 tun one-184-1
one-185-0 4fc12c26-5a9f-4662-8ff0-3b14f2e5f4eb tun one-185-0
one-186-0 36c521ec-f215-44af-84ce-b81acfd99c56 tun one-186-0
one-187-0 0bbf24cf-2dcf-4047-9641-9c72bba6cc55 tun one-187-0
one-188-0 b33a7c79-6d57-46da-b6f5-ba1d5c0d07e4 tun one-188-0
enp2s0f0 abec93f5-45b0-4809-aae0-3687f7a929b5 ethernet --
enp2s0f1 7d570c77-11f3-4df1-8b26-229531766edb ethernet --
enp4s0f0 76ef8b34-4b97-42d6-bfc2-970ecd5b046b ethernet --
enp4s0f1 3bdc9e25-ef49-4d94-b326-8d4545a72d60 ethernet --
enp4s0f2 722e5282-bc81-47bd-ab14-0b81d184d1ad ethernet --
enp4s0f3 16adcdcc-8958-434e-a0f1-54691eca525e ethernet --
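
The bond0.1003.2 entry above is the VXLAN device OpenNebula created on top of bond0.1003. Its VNI and multicast group can be inspected with ip; as far as I can tell the group is the :vxlan_mc base address (239.0.0.0 by default) plus the VLAN ID, which lines up with the 239.0.0.2 seen in the tcpdump further down:

ip -d link show bond0.1003.2 #Shows something like: vxlan id 2 group 239.0.0.2 dev bond0.1003 dstport 8472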

Virtual Machines:
one-184 is a router with 2 vNICs, 1 on the edge network and 1 on the VXLAN, performing NAT.
one-178-0 is a Windows server that was used for Ceph testing.
one-187-0 is a Zabbix monitoring server, which pings out to Google every 5 seconds.
The rest are Zabbix monitoring agents, which transfer a small amount of data every half hour to update what to monitor, plus a 5-second server->agent ping.
These workloads are not generating 50K+ multicast packets/second.

Problems that occurred and prompted this investigation:
The switch I use for ONe was trunked to another switch used for a Hyper-V lab cluster. One day the Hyper-V lab cluster's networking became interrupted. I deleted my ONe bridges, executed systemctl restart network, and the network stabilized. We then isolated our ONe test environment. The current issue appeared when I tried to create another VM on my Hyper-V host: the moment I attached vNICs to the VM, the Hyper-V host locked up. Disconnecting the physical networking from the server made the host operable again.

Investigative Findings:
My coworker ran a packet counter on the switch and reported roughly 500K multicast packets per second being transmitted through the LACP port channels associated with my KVM hosts.
On the switch we mirrored one port channel to a laptop running a Wireshark capture.

Some context for the following PDFs:
At first my VMs were balanced across my KVM hosts. I migrated all VMs to the second KVM host and then, on the now-empty host, executed: nmcli c down br1003; nmcli c up br1003. This cleared the multicasts and provided a baseline.
Then I live migrated one VM onto the empty host and ran a one-minute packet capture to measure the number of packets over time.
Each subsequent test adds one more VM.
I finished by migrating all VMs back to the original KVM host and running a final packet capture. The main point here is that the multicasts were still being sent.


https://www.scribd.com/document/401908515/1vm1min
https://www.scribd.com/document/401908553/2vm1min
https://www.scribd.com/document/401908560/3vm1min
https://www.scribd.com/document/401908569/Post-Migrate-1-Min
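
For reference, an equivalent one-minute count can be taken directly on a KVM host (rough sketch; capinfos comes with Wireshark):

timeout 60 tcpdump -ni bond0.1003 -w 1min.pcap multicast
capinfos 1min.pcap #Reports packet count and average packet rate for the capture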

tcpdump on the KVM host (tcpdump -n multicast) shows packets like the following:

IP .filenet-tms > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .filenet-tms > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .filenet-tms > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .filenet-tms > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .filenet-tms > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .44278 > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .41381 > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP > 239.0.0.2: ip-proto-17
15:01:13.059226 IP .41381 > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP .38244 > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP truncated-ip - 50 bytes missing! .41381 > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
IP truncated-ip - 50 bytes missing! .38244 > 239.0.0.2.otv: OTV, flags [I] (0x08), overlay 0, instance 2
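
The .otv in the decode is just tcpdump labelling UDP port 8472, which IANA assigned to OTV but which the Linux kernel uses as its legacy VXLAN port, so these are the VXLAN-encapsulated frames for VNI 2 being flooded to the 239.0.0.2 group. A narrower filter (sketch) isolates them:

tcpdump -ni bond0.1003 'udp dst port 8472 and dst host 239.0.0.2'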

Related thoughts/questions/ignorance:
-I do not think I have a loop, as the volume is not rising exponentially.
-Do I need to put an IP on the VLAN-tagged bridge that I use for my ONe VNets? Traffic passes between my VMs on different hosts without it, but the "Not getting traffic through vxlan bridge" thread states that I should. I am confused!

VXLAN as it is now is taking up an insane amount of bandwidth. Does anybody have any config or troubleshooting ideas?

I resolved this issue by upgrading the kernel to the ELRepo LT 4.4 kernel (kernel-lt).
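
For anyone wanting to try the same, the usual ELRepo procedure looks roughly like this (a sketch; check the ELRepo site for the current release RPM):

yum install https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install kernel-lt
grub2-set-default 0 #The newly installed kernel is the first menu entry
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot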

If anyone has any ideas on where to look, or how I should submit a bug report to the CentOS community, that would be super helpful.

Hello, I’m fairly new to VxLAN but I think this is the expected behavior of this technology.

Each packet your virtual machines send to a destination whose VTEP is not yet known is flooded via multicast to discover that VTEP.
To solve this it is recommended to use a switch that supports BGP EVPN, which can learn the routes without the need to flood the network with multicast.
You can take a look at a recent post on the forum that raises questions about the suitability of implementing these technologies and how to do it.
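
As a rough illustration of the alternative on plain Linux (a sketch with placeholder addresses): instead of a multicast group, the VXLAN device can be given static unicast entries for each remote VTEP, so unknown/broadcast traffic is head-end replicated rather than flooded via multicast:

ip link add vxlan2 type vxlan id 2 dstport 8472 dev bond0.1003
bridge fdb append 00:00:00:00:00:00 dev vxlan2 dst 192.0.2.11 #192.0.2.11 is a placeholder for the other VTEP's IP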

As a source of information about how this protocol works (VxLAN) you can see a series of videos in the following url:
Vxlan Videos

In particular, I think it will be very interesting to watch number 4 in the series.

Regards.