VM can only be pinged from node : Solved

I have been setting up a test environment for OpenNebula, and I can create VMs using the marketplace templates. However, a VM can only communicate with its host node: nothing else on the network can ping or SSH to it, and it cannot ping or SSH to anything else. How would I go about fixing this? I have currently disabled the firewall on the KVM node to rule that out. The setup was done following the guide below.

http://docs.opennebula.org/4.12/design_and_installation/quick_starts/qs_centos7_kvm.html#step-2-installation-in-the-nodes

KVM node is running CentOS 7
Frontend is running CentOS 7
Test VM is running Ubuntu 16.04

ifcfg-br0

EVICE=br0
TYPE=Bridge
IPADDR="100.77.2.1"
PREFIX="8"
ONBOOT=yes
BOOTPROTO=static
NM_CONTROLLED=no
DNS1="100.10.5.1"
GATEWAY="100.0.0.1"

ifcfg-eth0

DEVICE=eth0
BOOTPROTO=none
NM_CONTROLLED=no
ONBOOT=yes
TYPE=Ethernet
BRIDGE=br0

Hi Kyle!
Let’s see if we can help you.

So you have instantiated two VMs and ping doesn’t work between them, is that right?

Are your VMs using private addresses or public 100.X.X.X addresses?

What is the gateway for your VMs: an Internet router, the frontend, or the node?

This is probably unrelated, and I guess it’s just a typo from pasting into the forum, but in ifcfg-br0 EVICE should be DEVICE.

Cheers!

The typo is only here and not in the config file. It is a “public” range of addresses, but they are all located on a large private test network. The VM can ping the host it’s running on. I have not tried having it ping another VM on the same host yet. I have tried to have it ping the frontend machine on the same network segment, and that fails. The gateway for the VM is the router on the network.

Update: I created a few other VMs, they can ping each other as long as they are on the same node.

Hi!
Your bridge configuration looks OK, and if all IP addresses are in the same LAN then your VMs should be able to reach the router. I guess you can’t check whether your router is receiving layer 2 traffic from the VMs, can you?

Maybe you can run the following command on your KVM node and check whether MAC addresses are being learned: brctl showmacs br0

Some providers like Hetzner need special configuration so that MAC/ARP resolution works (https://www.centos.org/forums/viewtopic.php?t=8042), but I guess you don’t have that problem. I’ll keep thinking about what else you could test, but I don’t think it’s an OpenNebula issue.

Cheers

Here is the output from that command. I can check whether anything is arriving at the router, and it is getting nothing. The router and switch on this end of the network are a Ubiquiti ER-8 Pro and a Ubiquiti ES-48 750W. However, I’m fairly confident the issue exists somewhere in the bridge on the node: the node itself can connect to everything just fine, and it can connect to the VMs. The VMs can connect to each other and the node, just nothing else on the network.

[root@compute01 ~]# brctl showmacs br0
port no mac addr is local? ageing timer
1 00:04:23:c2:69:a2 no 21.47
1 00:15:17:86:5f:00 no 0.36
1 00:15:17:86:5f:01 no 0.36
1 00:15:17:86:5f:02 no 0.36
1 00:15:17:86:5f:03 no 0.36
1 00:15:17:86:8a:58 no 28.69
1 00:15:17:86:8a:59 no 2.33
1 00:15:17:86:8a:5a no 28.69
1 00:15:5d:05:7b:1a no 6.45
1 00:15:5d:05:7b:1d yes 0.00
1 00:15:5d:05:7b:1d yes 0.00
1 00:15:5d:38:01:0b no 104.79
1 00:15:5d:38:01:13 no 116.59
1 00:15:5d:38:01:14 no 57.47
1 00:15:5d:38:01:15 no 111.81
1 00:15:5d:38:01:1b no 10.77
1 00:15:5d:38:01:1c no 180.18
1 00:15:5d:44:dc:00 no 4.02
1 00:15:5d:44:dc:08 no 0.13
1 00:15:5d:44:dc:09 no 21.10
1 00:15:5d:44:dc:0d no 273.52
1 00:15:5d:44:dc:10 no 0.00
1 00:15:5d:e0:9d:00 no 0.73
1 00:23:7d:fc:6d:7c no 17.00
1 00:23:7d:fc:6d:7d no 17.00
1 00:23:7d:fc:6d:7e no 17.00
1 00:23:7d:fc:6d:7f no 17.00
1 00:25:90:05:0c:b2 no 7.05
1 00:25:90:05:0c:b3 no 25.84
1 00:25:90:05:2e:15 no 2.77
1 00:25:90:18:42:d2 no 17.00
1 00:25:90:18:42:d3 no 11.72
2 02:00:64:5a:00:02 no 0.30
3 02:00:64:5a:00:03 no 0.27
1 04:18:d6:31:55:74 no 9.33
1 04:18:d6:31:55:77 no 2.24
1 04:18:d6:31:55:7b no 4.89
1 3c:d9:2b:fe:6e:c6 no 28.69
1 3c:d9:2b:fe:6e:c7 no 13.94
1 3c:d9:2b:ff:03:b8 no 0.36
1 3c:d9:2b:ff:03:b9 no 0.36
1 44:d9:e7:06:c7:e7 no 15.08
1 44:d9:e7:93:c3:d9 no 8.14
1 a0:8c:fd:55:53:86 no 18.73
2 fe:00:64:5a:00:02 yes 0.00
2 fe:00:64:5a:00:02 yes 0.00
3 fe:00:64:5a:00:03 yes 0.00
3 fe:00:64:5a:00:03 yes 0.00
4 fe:00:64:5a:00:04 yes 0.00
4 fe:00:64:5a:00:04 yes 0.00
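
A long `brctl showmacs` listing like this is easier to read when grouped by port. Below is a minimal Python sketch, assuming the four-column row layout shown above (port, MAC, local flag, ageing timer); the `macs_by_port` helper and the shortened `sample` are illustrative only and not part of any tool. In the output above, port 1 is presumably the eth0 uplink, while the 02:00:... addresses on ports 2 and 3 match the VM MACs.

```python
from collections import defaultdict


def macs_by_port(brctl_output):
    """Group non-local (learned) MAC addresses by bridge port number."""
    ports = defaultdict(list)
    for line in brctl_output.splitlines():
        fields = line.split()
        # Data rows look like: <port> <mac> <yes|no> <ageing timer>
        if len(fields) != 4 or not fields[0].isdigit():
            continue  # skips the header row and blank lines
        port, mac, is_local, _age = fields
        if is_local == "no":
            ports[int(port)].append(mac)
    return dict(ports)


# Shortened sample taken from the listing above.
sample = """\
port no mac addr is local? ageing timer
1 00:04:23:c2:69:a2 no 21.47
2 02:00:64:5a:00:02 no 0.30
3 02:00:64:5a:00:03 no 0.27
2 fe:00:64:5a:00:02 yes 0.00
"""

for port, macs in sorted(macs_by_port(sample).items()):
    print(port, macs)
```

If the VM MACs show up as learned entries on their own ports, the bridge itself is at least seeing frames from the guests.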

Ok, then let’s check OpenNebula :smiley:

Let’s see what your virtual network configuration looks like. Can you paste the output of the following command:

onevnet show X

where X is the ID of your virtual network?

Cheers!

[root@controller ~]# onevnet show 0
VIRTUAL NETWORK 0 INFORMATION
ID : 0
NAME : General Network
USER : oneadmin
GROUP : oneadmin
CLUSTERS : 0
BRIDGE : br0
VN_MAD : dummy
USED LEASES : 3

PERMISSIONS
OWNER : um-
GROUP : ---
OTHER : ---

VIRTUAL NETWORK TEMPLATE
BRIDGE="br0"
DNS="100.10.5.1"
GATEWAY="100.0.0.1"
NETWORK_ADDRESS="100.0.0.0"
NETWORK_MASK="255.0.0.0"
PHYDEV=""
SECURITY_GROUPS="0"
VLAN_ID=""
VN_MAD="dummy"

ADDRESS RANGE POOL
AR 0
SIZE : 254
LEASES : 3

RANGE FIRST LAST
MAC 02:00:64:5a:00:01 02:00:64:5a:00:fe
IP 100.90.0.1 100.90.0.254

LEASES
AR OWNER MAC IP IP6_GLOBAL
0 V:8 02:00:64:5a:00:02 100.90.0.2 -
0 V:9 02:00:64:5a:00:03 100.90.0.3 -
0 V:9 02:00:64:5a:00:04 100.90.0.4 -

VIRTUAL ROUTERS
[root@controller ~]#
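
In the lease table above, each MAC appears to be the prefix 02:00 followed by the IPv4 address in hex (64:5a:00:02 is 100.90.0.2). Assuming that pattern holds for every lease in this network, a small Python sketch can map a MAC seen on the bridge back to its lease; `mac_to_ipv4` is a hypothetical helper name, not an OpenNebula function.

```python
import ipaddress


def mac_to_ipv4(mac):
    """Decode the IPv4 address carried in the last four octets of a MAC."""
    octets = [int(part, 16) for part in mac.split(":")]
    if len(octets) != 6:
        raise ValueError("expected a 6-octet MAC address")
    # Octets 0-1 are the prefix (02:00 here); octets 2-5 encode the address.
    return str(ipaddress.IPv4Address(bytes(octets[2:])))


print(mac_to_ipv4("02:00:64:5a:00:02"))
```

This makes it easy to match entries from `brctl showmacs` against the leases without looking them up one by one.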

Ok, a simple “dummy” bridged virtual network, so no filtering should be happening in OpenNebula.

I’m sorry for asking you so many questions, but I’m trying to rule out common network problems. Could you paste the results of these commands from your VM guest OS:

  • ip route

  • ip -d addr

Cheers

No problem, I appreciate the help. Here is the output of the commands.

root@ubuntu:~# ip route
default via 100.0.0.1 dev ens3 onlink
100.0.0.0/8 dev ens3 proto kernel scope link src 100.90.0.2
root@ubuntu:~# ip -d addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 02:00:64:5a:00:02 brd ff:ff:ff:ff:ff:ff promiscuity 0
inet 100.90.0.2/8 brd 100.255.255.255 scope global ens3
valid_lft forever preferred_lft forever
inet6 fe80::64ff:fe5a:2/64 scope link
valid_lft forever preferred_lft forever
root@ubuntu:~#
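
As a sanity check on the addressing, the sketch below uses Python’s standard `ipaddress` module to confirm that the addresses mentioned in this thread all fall inside the 100.0.0.0/8 network configured on the VM, so no routing should be needed between them. The labels in `hosts` and the `on_link` helper are just illustrative names.

```python
import ipaddress

# Network as reported by `ip route` inside the guest.
network = ipaddress.ip_network("100.0.0.0/8")

# Addresses taken from the configuration and output posted in this thread.
hosts = {
    "VM (ens3)": "100.90.0.2",
    "gateway": "100.0.0.1",
    "KVM node (br0)": "100.77.2.1",
    "DNS": "100.10.5.1",
}


def on_link(address, net=network):
    """Return True if the address belongs to the given network."""
    return ipaddress.ip_address(address) in net


for name, address in hosts.items():
    print(f"{name:15} {address:12} on-link: {on_link(address)}")
```

With everything on-link, a failing ping points at layer 2 (ARP or frame filtering) rather than at the routing table.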

Oh my… I’m running out of ideas. This kind of configuration shouldn’t be a problem; I use CentOS 7 too.

If you were using routing, I would check that net.ipv4.ip_forward equals 1 by running the following command:
sudo sysctl -a | grep net.ipv4.ip_forward
but you don’t need any routing.

You could try modifying the virtual network so it uses Bridged & Security Groups, but on CentOS 7 you would need to enable iptables, which is not enabled by default. That is too much work for a simple configuration.

And as you can ping from the node to the frontend and gateway it shouldn’t be a problem with VLAN tagging, trunking…

It’s as if something were blocking ARP traffic. Maybe you could install tcpdump and run:
tcpdump -i br0 -v "icmp or arp"
to at least check whether ARP is going out of the bridge when you ping…

Sorry if it seems odd, but I suppose there are no restrictions on the switch ports so that they only accept one MAC address, like Cisco’s switchport port-security, or STP?

What would happen if you set STP=yes in the bridge configuration? PLEASE DON’T DO THAT UNLESS YOU CAN REACH THE SERVER BY SOME OTHER MEANS, AS YOU MAY LOSE NETWORK CONNECTIVITY. Sorry for the caps, but it’s quite frustrating if you change the network config and suddenly there’s no more SSH.

I’ll go for a walk and think about what I might be missing!

Here is the tcpdump output. Enabling STP did not seem to have any effect. Forwarding is already set to 1; I double-checked anyway to make sure it was still set that way.

[root@compute01 ~]# tcpdump -i br0 -v "icmp or arp"
tcpdump: listening on br0, link-type EN10MB (Ethernet), capture size 65535 bytes
15:03:28.499578 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has pool-100-10-7-3.prvdri.fios.verizon.net tell lo0-100.bstnma-vfttp-361.verizon-gni.net, length 46
15:03:28.502928 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 100.90.0.2 tell 100.77.1.1, length 28
15:03:28.503909 ARP, Ethernet (len 6), IPv4 (len 4), Reply 100.90.0.2 is-at 02:00:64:5a:00:02 (oui Unknown), length 28
15:03:30.499294 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 100.90.0.2 tell 100.77.1.1, length 28
15:03:30.500183 ARP, Ethernet (len 6), IPv4 (len 4), Reply 100.90.0.2 is-at 02:00:64:5a:00:02 (oui Unknown), length 28
15:03:30.652292 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 100.180.0.20 tell lo0-100.phlapa-vfttp-344.verizon-gni.net, length 46
15:03:31.455778 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has pool-100-10-7-2.prvdri.fios.verizon.net tell l102.prvdri-vfttp-21.verizon-gni.net, length 46
15:03:31.500878 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 100.90.0.2 tell 100.77.1.1, length 28
15:03:31.501746 ARP, Ethernet (len 6), IPv4 (len 4), Reply 100.90.0.2 is-at 02:00:64:5a:00:02 (oui Unknown), length 28
15:03:31.605153 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 100.180.0.20 tell lo0-100.phlapa-vfttp-344.verizon-gni.net, length 46
15:03:32.220537 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has pool-100-10-7-2.prvdri.fios.verizon.net tell l102.prvdri-vfttp-21.verizon-gni.net, length 46
15:03:32.503025 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 100.90.0.2 tell 100.77.1.1, length 28
15:03:32.504144 ARP, Ethernet (len 6), IPv4 (len 4), Reply 100.90.0.2 is-at 02:00:64:5a:00:02 (oui Unknown), length 28
15:03:34.500254 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 100.90.0.2 tell 100.77.1.1, length 28
15:03:34.501206 ARP, Ethernet (len 6), IPv4 (len 4), Reply 100.90.0.2 is-at 02:00:64:5a:00:02 (oui Unknown), length 28
^C15:03:34.710487 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has l102.prvdri-vfttp-21.verizon-gni.net tell lo0-100.phlapa-vfttp-323.verizon-gni.net, length 46
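
To summarize a capture like this, the following Python sketch counts ARP requests and replies for one target address. It assumes the `tcpdump -v` line format shown above; `summarize_arp` and the shortened `capture` sample are illustrative, not part of tcpdump.

```python
import re
from collections import Counter

# Patterns for the two ARP line shapes seen in the capture above.
REQUEST = re.compile(r"Request who-has (\S+) tell (\S+),")
REPLY = re.compile(r"Reply (\S+) is-at ([0-9a-f:]{17})")


def summarize_arp(capture, target):
    """Count ARP requests for `target` and replies sent on its behalf."""
    counts = Counter()
    for line in capture.splitlines():
        match = REQUEST.search(line)
        if match:
            if match.group(1) == target:
                counts["requests"] += 1
            continue
        match = REPLY.search(line)
        if match and match.group(1) == target:
            counts["replies"] += 1
    return counts


# Shortened sample taken from the capture above.
capture = """\
15:03:28.502928 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 100.90.0.2 tell 100.77.1.1, length 28
15:03:28.503909 ARP, Ethernet (len 6), IPv4 (len 4), Reply 100.90.0.2 is-at 02:00:64:5a:00:02 (oui Unknown), length 28
15:03:30.499294 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 100.90.0.2 tell 100.77.1.1, length 28
15:03:30.500183 ARP, Ethernet (len 6), IPv4 (len 4), Reply 100.90.0.2 is-at 02:00:64:5a:00:02 (oui Unknown), length 28
15:03:30.652292 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 100.180.0.20 tell lo0-100.phlapa-vfttp-344.verizon-gni.net, length 46
"""

print(summarize_arp(capture, "100.90.0.2"))
```

In the full capture every request for 100.90.0.2 is followed by a reply on br0, yet 100.77.1.1 keeps asking every second or two. That pattern suggests the replies may not be reaching the requester, although the capture alone cannot show where they are dropped.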

Morning!
The “Reply 100.90.0.2 is-at 02:00:64:5a:00:02” messages confirm that the bridge is behaving as expected, with ARP requests being answered. So I would check whether your switch is doing something. I’d check the MAC address table in the switch, whether port security is disabled, and whether you can see any log explaining why this ARP information would be discarded.

Cheers!

Solved:

The issue turned out to be that the compute node was nested in Hyper-V, which has MAC spoofing protection that filtered out the VMs’ ARP traffic. After turning that protection off under Settings > Network Adapter > Advanced Features > MAC address spoofing, it is now working. Thanks for your help.

Awesome!
I’m sure this post will help others who have problems with the bridge. Thanks for the feedback.