"one-contextd" restarts network during runtime

Hi!

We just upgraded to 5.12.0.1 and from Debian 9 to Debian 10.
We run a k8s cluster with 5 nodes on CentOS 7.

For some reason, one-contextd occasionally restarts the network on my nodes. This crashes the k8s underlay network (Calico).

I can see the following log on one affected node:

Oct 6 14:57:04 node-0 one-contextd: Started for type all to reconfigure
Oct 6 14:57:04 node-0 one-contextd: Processing local scripts
Oct 6 14:57:04 node-0 one-contextd: Script loc-05-grow-rootfs output: NOCHANGE: partition 1 is size 83883999. it cannot be grown
Oct 6 14:57:04 node-0 one-contextd: meta-data=/dev/vda1 isize=512 agcount=21, agsize=524224 blks
Oct 6 14:57:04 node-0 one-contextd: = sectsz=512 attr=2, projid32bit=1
Oct 6 14:57:04 node-0 one-contextd: = crc=1 finobt=0 spinodes=0
Oct 6 14:57:04 node-0 one-contextd: data = bsize=4096 blocks=10485499, imaxpct=25
Oct 6 14:57:04 node-0 one-contextd: = sunit=0 swidth=0 blks
Oct 6 14:57:04 node-0 one-contextd: naming =version 2 bsize=4096 ascii-ci=0 ftype=1
Oct 6 14:57:04 node-0 one-contextd: log =internal bsize=4096 blocks=2560, version=2
Oct 6 14:57:04 node-0 one-contextd: = sectsz=512 sunit=0 blks, lazy-count=1
Oct 6 14:57:04 node-0 one-contextd: realtime =none extsz=4096 blocks=0, rtextents=0
Oct 6 14:57:21 node-0 one-contextd: Script loc-10-network output: RTNETLINK answers: File exists
Oct 6 14:57:21 node-0 one-contextd: Restarting network (via systemctl): [ OK ]
Oct 6 14:57:21 node-0 one-contextd: Script loc-16-gen-env output: cat: /var/run/one-context/mount.lg4M31/token.txt: No such file or directory
Oct 6 14:57:21 node-0 one-contextd: Processing network scripts
Oct 6 14:57:21 node-0 one-contextd: Script net-97-start-script output: Setup already completed
Oct 6 14:57:21 node-0 one-contextd: Done

Why is one-contextd processing these scripts during runtime? Does anybody have an idea?

Thanks,
Thomas

Hello.

Context scripts are triggered on various state changes during the VM lifecycle (when a NIC is added or removed, a block device changes size, etc.; see /lib/udev/rules.d/65-context.rules). If these changes do not result from an action in OpenNebula but are made by the application/PaaS running inside the VM, the resulting reconfiguration might break your application. In that case, disable or uninstall the context package inside the VM and use it only for the initial configuration.
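
To see which device events can start a reconfiguration on a given VM, the rules file mentioned above can be summarised with a short script. This is only a minimal sketch: it assumes the file uses ordinary udev rule syntax (comma-separated KEY=="value" matches and RUN+="..." assignments), and the helper names parse_rules and triggers are made up for illustration. The actual contents of the file depend on the installed context package version.

```python
import re
from pathlib import Path

# Path taken from the post above; adjust if your distribution installs it elsewhere.
RULES_PATH = Path("/lib/udev/rules.d/65-context.rules")

# Matches udev key/operator/value triples such as SUBSYSTEM=="net" or RUN+="..."
PAIR_RE = re.compile(r'([A-Z_]+(?:\{[^}]*\})?)\s*(==|!=|\+=|-=|:=|=)\s*"([^"]*)"')


def parse_rules(text):
    """Return one dict per rule line: {'match': [...], 'assign': [...]}."""
    rules = []
    # A trailing backslash continues a rule on the next line.
    for line in text.replace("\\\n", " ").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rule = {"match": [], "assign": []}
        for key, op, value in PAIR_RE.findall(line):
            kind = "match" if op in ("==", "!=") else "assign"
            rule[kind].append((key, op, value))
        rules.append(rule)
    return rules


def triggers(rules):
    """Summarise which (SUBSYSTEM, ACTION) matches lead to a RUN command."""
    out = []
    for rule in rules:
        match = {k: v for k, op, v in rule["match"] if op == "=="}
        runs = [v for k, _, v in rule["assign"] if k.startswith("RUN")]
        if runs:
            out.append((match.get("SUBSYSTEM"), match.get("ACTION"), runs))
    return out


if __name__ == "__main__":
    if RULES_PATH.exists():
        for subsystem, action, runs in triggers(parse_rules(RULES_PATH.read_text())):
            print(f"subsystem={subsystem} action={action} -> {runs}")
    else:
        print(f"{RULES_PATH} not found; is the context package installed?")
```

If a rule matches net device events in general, virtual interfaces created inside the VM may also match it, which is worth checking against what your application does.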

Best,
Vlastimil

Hello @vholer,

thanks for your feedback.
We are running Kubernetes on several nodes. We found out that the script is triggered each time a k8s job starts a new pod. In 80% of the cases the script finds that there is nothing to do (“Setup already completed”). But in 20% of the cases the script restarts the network, which makes the overlay network (in our case Calico) crash.
We did not see this behavior in the previous version of OpenNebula. To me this looks like some kind of race condition.
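
For anyone who wants to measure this split on their own nodes, here is a minimal sketch that counts one-contextd runs in a syslog file and how many of them restarted the network. It assumes the log lines look like the excerpt in the first post ("Started for type ..." opens a run, "Done" closes it); the function name summarise and the log path passed on the command line are placeholders, not part of the context package.

```python
import re
import sys

# Line layout as in the excerpt above:
# "Oct 6 14:57:04 node-0 one-contextd: message"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]+)\s+(\S+)\s+one-contextd:\s+(.*)$")


def summarise(lines):
    """Return (total runs, runs that restarted the network, per-run details)."""
    runs = []
    current = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a one-contextd line
        stamp, host, msg = m.groups()
        if msg.startswith("Started for type"):
            current = {"start": stamp, "host": host, "restart": False}
            runs.append(current)
        elif current is not None and "Restarting network" in msg:
            current["restart"] = True
        elif msg == "Done":
            current = None
    restarts = sum(r["restart"] for r in runs)
    return len(runs), restarts, runs


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open(sys.argv[1], errors="replace") as fh:
            total, restarts, runs = summarise(fh)
        print(f"{total} runs, {restarts} with a network restart")
        for run in runs:
            if run["restart"]:
                print(f"  {run['start']} {run['host']}")
    else:
        print("usage: python3 contextd_runs.py <syslog file>")
```

Comparing the printed timestamps with the pod start times should show whether every restart lines up with a new pod.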

To work around the problem, we removed the context package from the VMs. But maybe you have an idea about the root cause.

Best,
Thomas