"one-contextd" restarts network during runtime

tku · October 6, 2020, 3:27pm

Hi!

We just upgraded to 5.12.0.1 and from Debian 9 to Debian 10.
We run a k8s cluster with 5 nodes on CentOS 7.

For some reason one-contextd occasionally restarts the network on my nodes. This crashes the k8s underlay network (Calico).

I can see the following log on one affected node:

Oct 6 14:57:04 node-0 one-contextd: Started for type all to reconfigure
Oct 6 14:57:04 node-0 one-contextd: Processing local scripts
Oct 6 14:57:04 node-0 one-contextd: Script loc-05-grow-rootfs output: NOCHANGE: partition 1 is size 83883999. it cannot be grown
Oct 6 14:57:04 node-0 one-contextd: meta-data=/dev/vda1 isize=512 agcount=21, agsize=524224 blks
Oct 6 14:57:04 node-0 one-contextd: = sectsz=512 attr=2, projid32bit=1
Oct 6 14:57:04 node-0 one-contextd: = crc=1 finobt=0 spinodes=0
Oct 6 14:57:04 node-0 one-contextd: data = bsize=4096 blocks=10485499, imaxpct=25
Oct 6 14:57:04 node-0 one-contextd: = sunit=0 swidth=0 blks
Oct 6 14:57:04 node-0 one-contextd: naming =version 2 bsize=4096 ascii-ci=0 ftype=1
Oct 6 14:57:04 node-0 one-contextd: log =internal bsize=4096 blocks=2560, version=2
Oct 6 14:57:04 node-0 one-contextd: = sectsz=512 sunit=0 blks, lazy-count=1
Oct 6 14:57:04 node-0 one-contextd: realtime =none extsz=4096 blocks=0, rtextents=0
Oct 6 14:57:21 node-0 one-contextd: Script loc-10-network output: RTNETLINK answers: File exists
Oct 6 14:57:21 node-0 one-contextd: Restarting network (via systemctl): [ OK ]
Oct 6 14:57:21 node-0 one-contextd: Script loc-16-gen-env output: cat: /var/run/one-context/mount.lg4M31/token.txt: No such file or directory
Oct 6 14:57:21 node-0 one-contextd: Processing network scripts
Oct 6 14:57:21 node-0 one-contextd: Script net-97-start-script output: Setup already completed
Oct 6 14:57:21 node-0 one-contextd: Done

Why is one-contextd processing these scripts during runtime? Does anybody have an idea?

Thanks,
Thomas

vholer · October 7, 2020, 8:10am

Hello.

Context scripts are triggered by various state changes during the VM lifecycle (when a NIC is added or removed, when a block device changes size, etc.; see /lib/udev/rules.d/65-context.rules). If the changes are not the result of an action in OpenNebula but are made by the application/PaaS running inside the VM, the context scripts they trigger might break your application. In that case, disable or uninstall the context package inside the VM and use it only for initial configuration.
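
To see which device events can trigger a context run on a given VM, it may help to list the match conditions in that rules file. The following is only a rough sketch: `summarize_rules` is a hypothetical helper, it assumes each rule sits on a single line (no backslash continuations), and it makes no claim about what the shipped rules actually contain.

```python
import re
from pathlib import Path

# Path quoted in the post above; adjust if your distribution installs it elsewhere.
RULES_PATH = Path("/lib/udev/rules.d/65-context.rules")

# One udev key/operator/value triple, e.g. SUBSYSTEM=="net" or RUN+="/some/command".
PAIR = re.compile(r'(\w+(?:\{[^}]*\})?)\s*(==|!=|\+=|-=|:=|=)\s*"([^"]*)"')


def summarize_rules(text):
    """Return one dict per rule line, split into match conditions and assignments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        matches, actions = [], []
        for key, op, value in PAIR.findall(line):
            # "==" and "!=" are comparisons; every other operator assigns a value.
            target = matches if op in ("==", "!=") else actions
            target.append((key, op, value))
        rules.append({"matches": matches, "actions": actions})
    return rules


if __name__ == "__main__":
    if RULES_PATH.exists():
        for rule in summarize_rules(RULES_PATH.read_text()):
            print(rule["matches"], "->", rule["actions"])
    else:
        print(f"{RULES_PATH} not found; is the context package installed?")
```

Any rule whose match conditions also cover interfaces created by the container runtime would be a candidate explanation for context runs that OpenNebula itself did not cause.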

Best,
Vlastimil

tku · October 15, 2020, 2:57pm

Hello @vholer,

thanks for your feedback.
We are running Kubernetes on several nodes. We found out that each time a k8s job starts a new pod, the script is triggered. In 80% of cases the script finds that there is nothing to do (“Setup already completed”). But in 20% of cases the script restarts the network, which makes the overlay network (in our case Calico) crash.
We did not see this behavior in the previous version of OpenNebula. To me this looks like some kind of race condition.
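
For anyone who wants to measure the same ratio on their own nodes, here is a minimal sketch that groups one-contextd syslog lines into runs and flags the ones that restarted the network. It assumes the line format shown in the log excerpt in the first post ("Started for type ..." opens a run, "Done" closes it). `summarize_runs` is a hypothetical helper name, and the log file path is passed as an argument because it differs between distributions.

```python
import re
import sys

# Syslog prefix as seen in the excerpt, e.g.
# "Oct 6 14:57:04 node-0 one-contextd: <message>"
LINE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]+)\s+(\S+)\s+one-contextd:\s+(.*)$")


def summarize_runs(lines):
    """Group one-contextd log lines into runs and record network restarts."""
    runs = []
    current = None
    for raw in lines:
        m = LINE.match(raw.strip())
        if not m:
            continue  # not a one-contextd line
        timestamp, host, message = m.groups()
        if message.startswith("Started for type"):
            current = {"start": timestamp, "host": host, "restarted_network": False}
            runs.append(current)
        elif current is not None and "Restarting network" in message:
            current["restarted_network"] = True
        elif message == "Done":
            current = None  # run finished
    return runs


if __name__ == "__main__":
    # Usage: python context_runs.py <path-to-syslog-file>
    with open(sys.argv[1], errors="replace") as fh:
        runs = summarize_runs(fh)
    restarts = sum(r["restarted_network"] for r in runs)
    print(f"{len(runs)} context runs, {restarts} restarted the network")
    for r in runs:
        if r["restarted_network"]:
            print(f"  {r['start']} on {r['host']}")
```

Comparing the timestamps of the flagged runs with pod start times should show whether the restarts line up with job scheduling.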

To solve the problem we removed the context package from the VMs. But maybe you have an idea about the root cause.

Best,
Thomas
