I have a number of independent clusters deployed, and all of the master nodes currently have the same problem.
After deploying everything I needed on the clusters and letting them run for a couple of weeks without any issues, all the master nodes started logging the following during the scheduled restart of the rke2-server service:
INFO[0721] Container for etcd is running
INFO[0721] Container for kube-apiserver not found (no matching container found), retrying
INFO[0725] Waiting for API server to become available
INFO[0741] Container for etcd is running
INFO[0741] Container for kube-apiserver not found (no matching container found), retrying
INFO[0741] Waiting for API server to become available
INFO[0755] Waiting for API server to become available
.
.
.
INFO[0901] Container for etcd is running
INFO[0901] Container for kube-apiserver not found (no matching container found), retrying
INFO[0905] Waiting for API server to become available
FATA[0921] Failed to get request handlers from apiserver: timed out waiting for the condition, failed to get apiserver /readyz status: Get "https://127.0.0.1:6443/readyz": dial tcp 127.0.0.1:6443: connect: connection refused
This error makes managing the clusters impossible.
Any help would be appreciated,
Thanks
Versions of the related components:
rke2 version v1.27.2+rke2r1 (300a06dabe679c779970112a9cb48b289c17536c)
go version go1.20.4 X:boringcrypto
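For anyone hitting the same symptom, a couple of quick local checks on an affected master can show whether the kube-apiserver container is ever created at all (just a sketch; it assumes crictl is already pointed at RKE2's containerd, as in the commands later in this thread):
# Any kube-apiserver container, even an exited one?
$ crictl ps -a --name kube-apiserver
# Is anything answering on the local API endpoint?
$ curl -k https://127.0.0.1:6443/readyz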
In that case you could check in the VNF whether the HAProxy instance has all the backends configured in /etc/haproxy/haproxy.cfg. I suspect this may be the problem; if so, you may want to replace the VNF image with the latest OneKE one from the marketplace and replace all VNF instances.
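As a rough sketch of that check (standard HAProxy paths; the stats socket path is an assumption and should match whatever "stats socket" line the config actually declares):
# On the VNF: list the configured backends and their servers
$ grep -nE 'backend|server' /etc/haproxy/haproxy.cfg
# Validate the configuration file
$ haproxy -c -f /etc/haproxy/haproxy.cfg
# If a stats socket is enabled, dump the live backend/server state
$ echo "show stat" | socat stdio /run/haproxy/admin.sock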
So this is the old VNF image for sure; we've replaced it completely with a new one: vr_balancing · OpenNebula/one-apps Wiki · GitHub. But if you have just one master and 3 nodes, then it seems to be correct. I guess you need to look for problems in the RKE2 logs themselves.
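For reference, these are the usual places RKE2 writes logs on a server node (default data dir assumed):
# Supervisor (rke2-server) journal
$ journalctl -u rke2-server --no-pager | tail -n 200
# Kubelet log (RKE2 keeps it on disk rather than in the journal)
$ tail -n 100 /var/lib/rancher/rke2/agent/logs/kubelet.log
# Per-container logs of static pods, if any were ever created
$ ls /var/log/pods/ | grep kube-apiserver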
The first logs in this thread are from rke2-server; I don't see anything in them that rings a bell. The only thing I know is that the service is not able to get the kube-apiserver image, which is strange because the rest of the images needed to run the service are pulled without problems.
$ crictl pull docker.io/rancher/hardened-kubernetes:v1.27.2-rke2r1-build20230518
Image is up to date for sha256:77a5bb5822f668bac88c0722c9fa1dd210efef9a5c9896c73cf37bd0859e87d2
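Since the image is clearly present, the more telling question is whether the kube-apiserver static pod is ever created: rke2-server writes the manifest and the kubelet starts it, so both ends are worth checking (default RKE2 paths assumed):
# Was the static pod manifest generated?
$ ls -l /var/lib/rancher/rke2/agent/pod-manifests/
# Did the kubelet try (and fail) to start it?
$ grep -i kube-apiserver /var/lib/rancher/rke2/agent/logs/kubelet.log | tail -n 50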
But I don't think that will help. I'd rather try to verify whether the LB is actually operational, and if it isn't, I'd try replacing the VNF image with the latest one, which has a completely new and much simpler implementation.
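A quick way to verify that from one of the masters (the address below is a placeholder; use whatever server URL is configured in /etc/rancher/rke2/config.yaml):
# Can the node reach the API server port through the LB?
$ curl -vk https://<LB-VIP>:6443/readyz
# Is the RKE2 supervisor port reachable through the LB?
$ nc -zv <LB-VIP> 9345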
Given that this image is already available on the machine, I'll go check the LB and the VNF.
root@oneke-ip-10-100-100-12:~# crictl images | grep build20230518
docker.io/rancher/hardened-kubernetes v1.27.2-rke2r1-build20230518 77a5bb5822f66 695MB
root@oneke-ip-10-100-100-12:~# crictl pull docker.io/rancher/hardened-kubernetes:v1.27.2-rke2r1-build20230518
Image is up to date for sha256:77a5bb5822f668bac88c0722c9fa1dd210efef9a5c9896c73cf37bd0859e87d2