How to fix MicroK8s restart loop
If you have enabled the cis-hardening plugin in your microk8s cluster, you might experience instability issues, especially after a node restart. In this article, I will explain the cause of this issue and how to fix microk8s in this scenario.
Symptoms
Sometimes it is not obvious that we have an issue with our microk8s configuration. The symptoms are subtle and it can take some time to figure out the root cause. Typical symptoms are:
- Pods failing to connect to other services
- Pods not resolving service names
- kubectl port-forward disconnecting after a few seconds
- kubectl exec -ti disconnecting after a few seconds
- kubectl get nodes showing some nodes constantly switching from Ready to NotReady
- kubectl unable to reach the kube-apiserver
If you are experiencing these issues, there is a high chance that your microk8s node is stuck in a restart loop. You can verify this by running the sudo microk8s status command a few times.
marcol@k8s-master:~$ sudo microk8s status
microk8s is running
[...]
marcol@k8s-master:~$ sudo microk8s status
microk8s is not running. Use microk8s inspect for a deeper inspection.
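Another way to spot the loop is to watch the kubelite service directly. This is just a sketch using standard tools; it assumes the default systemd unit name installed by the MicroK8s snap (snap.microk8s.daemon-kubelite, the same service we inspect later).
# Watch the cluster status flip between running and not running
sudo watch -n 2 'microk8s status | head -n 1'
# Count how many times systemd has restarted the kubelite daemon
sudo systemctl show snap.microk8s.daemon-kubelite -p NRestarts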
The --protect-kernel-defaults flag
If you have installed the cis-hardening plugin in your cluster, there is a high chance that the instability is caused by the --protect-kernel-defaults flag, especially if the issues started after a node restart.
There are many ways to confirm this is the case. The easiest is to run sudo microk8s inspect, which packages an inspection report for us at /var/snap/microk8s/current/inspection-report. We can then read the journal of snap.microk8s.daemon-kubelite to figure out whether the root cause is our kernel configuration.
root@k8s-master:/var/snap/microk8s/current# cat inspection-report/snap.microk8s.daemon-kubelite/journal.log | grep -e kernel/panic -e vm/overcommit_memory -e kernel/panic_on_oops
Jan 30 04:56:28 k8s-master microk8s.daemon-kubelite[35998]: E0130 04:56:28.616813 35998 kubelet.go:1511] "Failed to start ContainerManager" err="[invalid kernel flag: kernel/panic_on_oops, expected value: 1, actual value: 0, invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0, invalid kernel flag: kernel/panic, expected value: 10, actual value: 0]"
The issue is now very clear. MicroK8s expects a different configuration for the kernel/panic_on_oops, vm/overcommit_memory, and kernel/panic flags. Since the cis-hardening plugin prevents the kubelite daemon from changing the kernel defaults, the service keeps restarting.
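You can cross-check the values the kubelet complains about against what the host actually has. This is a plain sysctl sanity check, nothing MicroK8s specific:
# Print the current values of the three flags reported in the journal
sysctl vm.overcommit_memory kernel.panic kernel.panic_on_oops
# The kubelet expects: vm.overcommit_memory = 1, kernel.panic = 10, kernel.panic_on_oops = 1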
Solution
You have two options.
You can set the --protect-kernel-defaults flag in /var/snap/microk8s/current/args/kubelet to false.
root@k8s-master:/var/snap/microk8s/current# cat args/kubelet | grep protect-kernel
--protect-kernel-defaults=false
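After editing the file, MicroK8s needs to be restarted so kubelite picks up the new flag. A minimal sketch, assuming the flag is currently set to true in the args file:
# Flip the flag (back up the args file first if you prefer)
sudo sed -i 's/--protect-kernel-defaults=true/--protect-kernel-defaults=false/' /var/snap/microk8s/current/args/kubelet
# Restart MicroK8s so kubelite reads the updated arguments
sudo microk8s stop
sudo microk8s start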
Or you can change the kernel configuration in /etc/sysctl.conf by adding the following:
vm.overcommit_memory = 1
kernel.panic = 10
kernel.panic_on_oops = 1
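The new values only take effect after reloading sysctl (or rebooting). A short sketch of applying and verifying them:
# Reload /etc/sysctl.conf without rebooting
sudo sysctl -p
# Confirm the values match what the kubelet expects
sysctl vm.overcommit_memory kernel.panic kernel.panic_on_oops
# Restart MicroK8s so the kubelet check passes
sudo microk8s stop
sudo microk8s start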
Both options are valid. I would use the first option if you are unfamiliar with the kernel and are not sure what those flags do. Otherwise, you can comply with the cis-hardening recommendations by modifying the kernel configuration. Note that the cis-hardening recommendation does not specify a value for those flags; it simply prevents the daemon from adopting a configuration different from the one defined on the host.
Hope this helps!