
Pods fail after upgrade to Ubuntu 20.04 #1775

Closed

antons1 opened this issue Nov 26, 2020 · 2 comments

antons1 commented Nov 26, 2020

microk8s inspect: inspection-report-20201126_102305.tar.gz

OS: Ubuntu 20.04.1 LTS
MicroK8s version: 1.19/stable (I have also tried installing 1.18, with the same results)

I have been running MicroK8s on my Ubuntu server for a few months, and everything has been working flawlessly. A few days ago, I upgraded the server from 18.04 to 20.04, and since then the cluster has been unable to start any pods. I don't really know which logs to check to find out more, but here are the symptoms:

A bunch of virtual NICs are created and removed while the server is running. Right now I have run microk8s reset and afterwards enabled only dns, so a single interface is continuously being created and removed. When I previously had several addons enabled, there were several such interfaces. The output of ip a shows my physical interfaces plus the one being added by Kubernetes, which is currently at index 33, and the number keeps rising:

vethcdbb9af9@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether ba:f3:ea:bc:56:64 brd ff:ff:ff:ff:ff:ff link-netns cni-7a0bf69a-a6d8-826c-8f15-c93d13ff07eb
    inet 169.254.253.99/16 brd 169.254.255.255 scope global noprefixroute vethcdbb9af9
       valid_lft forever preferred_lft forever
    inet6 fe80::7414:4a4e:872c:eb81/64 scope link
       valid_lft forever preferred_lft forever
    inet6 fe80::b8f3:eaff:febc:5664/64 scope link
       valid_lft forever preferred_lft forever
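
A quick way to watch this churn directly (just a sketch; it assumes the iproute2 ip tool and the watch utility, both present on a stock Ubuntu 20.04 install) is to list only the veth devices once a second:

watch -n 1 "ip -o link show type veth"

Each time a pod sandbox is recreated a new veth pair appears, which is why the interface index keeps climbing.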

All pods are stuck in status ContainerCreating or Unknown. Sometimes a pod will show status Running, but the READY column still says 0/1.

Sometimes, executing kubectl get all --all-namespaces will give me the normal output:

NAME                           READY   STATUS    RESTARTS   AGE
pod/coredns-86f78bb79c-9qddz   0/1     Running   31         33m

NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns   ClusterIP   10.152.183.10   <none>        53/UDP,53/TCP,9153/TCP   33m

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/coredns   0/1     1            0           33m

NAME                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/coredns-86f78bb79c   1         1         0       33m

But other times I just get The connection to the server 127.0.0.1:16443 was refused - did you specify the right host or port?; after a single kubectl ... invocation, the terminal prints that error message 5-15 times.
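
One log that might show whether the apiserver itself keeps getting restarted (assuming the unit name the 1.19 snap uses, snap.microk8s.daemon-apiserver) would be:

sudo journalctl -f -u snap.microk8s.daemon-apiserver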

kubectl describe pod [dns-pod] -n kube-system shows a loop of these events:

  Warning  Unhealthy       104s                   kubelet            Readiness probe failed: Get "http://10.1.49.37:8181/ready": dial tcp 10.1.49.37:8181: connect: connection refused
  Normal   SandboxChanged  64s (x2 over 65s)      kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          63s                    kubelet            Container image "coredns/coredns:1.6.6" already present on machine
  Normal   Created         61s                    kubelet            Created container coredns
  Normal   Started         61s                    kubelet            Started container coredns
  Normal   SandboxChanged  4s (x2 over 12s)       kubelet            Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          3s                     kubelet            Container image "coredns/coredns:1.6.6" already present on machine
  Warning  Failed          2s                     kubelet            Error: transport is closing

If I enable more addons, the same behaviour is exhibited by all of the created pods, and the same was true for my own deployments that were still there when I performed the upgrade.

I have tried removing and reinstalling MicroK8s (sudo snap remove microk8s --purge, then sudo snap install microk8s --channel=1.19/stable --classic), and before that I had tried microk8s reset.

ktsakalozos (Member) commented

Hi @antons1

MicroK8s has a service (called apiserver-kicker [1]) that keeps an eye on the interfaces on the system and triggers certificate refreshes and service restarts. The apiserver-kicker is used, for example, in the case where you have MicroK8s on your laptop and you switch networks. I see that the apiserver-kicker always detects a change and keeps restarting the apiserver.
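
If you want to confirm this yourself, you could follow the kicker's journal while the interfaces flap (the unit name below is the one the 1.19 snap uses; adjust it if yours differs):

sudo journalctl -f -u snap.microk8s.daemon-apiserver-kicker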

Here is what you could try. First temporarily disable this service and see if the cluster gets into a healthy state:

systemctl stop snap.microk8s.daemon-apiserver-kicker
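
With the kicker stopped, give the pods a couple of minutes and check whether they settle, for example (prefix with sudo if your user is not in the microk8s group):

microk8s status --wait-ready
microk8s kubectl get pods --all-namespaces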

If this is indeed the problem, you can stop the apiserver-kicker from restarting the apiserver by editing /var/snap/microk8s/current/args/kube-apiserver and appending a configuration argument that forcibly binds the Kubernetes apiserver to a network interface, eg:

--advertise-address=192.168.1.25
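
Roughly, the whole change would look like this (a sketch only; 192.168.1.25 stands in for your server's own static IP, so substitute your actual address):

echo '--advertise-address=192.168.1.25' | sudo tee -a /var/snap/microk8s/current/args/kube-apiserver
sudo microk8s stop && sudo microk8s start    # restart so the apiserver picks up the new flag
sudo systemctl start snap.microk8s.daemon-apiserver-kicker    # start the kicker again once things are stable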

[1] https://github.com/ubuntu/microk8s/blob/master/microk8s-resources/wrappers/apiservice-kicker


antons1 commented Nov 26, 2020

Thank you, this solved the issue for me!
