
storage extension misconfiguration takes down all controllers #562

Closed

pschichtel opened this issue Sep 20, 2023 · 3 comments · Fixed by #567

Labels
bug Something isn't working

Comments

@pschichtel

pschichtel commented Sep 20, 2023

k0sctl version v0.15.5 on NixOS, k0s 1.27.5 on Rocky 9.

As suggested in #561, misconfiguring the spec.k0s.config.spec.extensions.storage section led k0sctl to take down all 3 of our controller nodes one after the other, until the entire cluster was unavailable and unable to start.

The issue came up because we wanted to enable the openebs extension, but we only switched the create_default_storage_class option to true without also switching the type to openebs_local_storage. This made k0s reject the configuration during startup, because create_default_storage_class: true is not allowed with type: external_storage. k0sctl apparently didn't recognize the error state and kept going for the remaining nodes.
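For reference, the combination we were actually after would presumably be (a sketch of just the extensions.storage section, with both fields switched as described above):

  extensions:
    storage:
      create_default_storage_class: true
      type: openebs_local_storage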

To reproduce:

  1. Set up a working 3-controller cluster with create_default_storage_class: false and type: external_storage
  2. Apply the configuration at the end (example command below)
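For step 2, that means the usual apply run (assuming the configuration below is saved as k0sctl.yaml):

  k0sctl apply --config k0sctl.yaml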

Expected:

The update of the first controller fails and k0sctl stops the rollout.

Actual:

All 3 controller nodes go down.

apiVersion: k0sctl.k0sproject.io/v1beta1
kind: Cluster
metadata:
  name: k0s-cluster
spec:
  hosts: []
  k0s:
    version: v1.27.5+k0s.0
    dynamicConfig: false
    config:
      apiVersion: k0s.k0sproject.io/v1beta1
      kind: ClusterConfig
      metadata:
        name: some-cluster
      spec:
        api:
          k0sApiPort: 9443
          port: 6443
          tunneledNetworkingMode: false
        extensions:
          storage:
            create_default_storage_class: true # <-- this was the issue
            type: external_storage
        installConfig:
          users:
            etcdUser: etcd
            kineUser: kube-apiserver
            konnectivityUser: konnectivity-server
            kubeAPIserverUser: kube-apiserver
            kubeSchedulerUser: kube-scheduler
        konnectivity:
          adminPort: 8133
          agentPort: 8132
        network:
          provider: calico
          clusterDomain: cluster.local
          dualStack: {}
          kubeProxy:
            iptables:
              masqueradeAll: true
              minSyncPeriod: 0s
              syncPeriod: 0s
            ipvs:
              minSyncPeriod: 0s
              syncPeriod: 0s
              tcpFinTimeout: 0s
              tcpTimeout: 0s
              udpTimeout: 0s
            metricsBindAddress: 0.0.0.0:10249
            mode: iptables
          kuberouter:
            autoMTU: true
            hairpin: Enabled
            ipMasq: false
            mtu: 0
          nodeLocalLoadBalancing:
            enabled: true
            envoyProxy:
              apiServerBindPort: 7443
              image:
                image: docker.io/envoyproxy/envoy-distroless
              konnectivityServerBindPort: 7132
            type: EnvoyProxy
          podCIDR: 10.244.0.0/16
          serviceCIDR: 10.96.0.0/12
        scheduler: {}
        storage:
          type: etcd
        telemetry:
          enabled: true
@kke
Contributor

kke commented Sep 20, 2023

K0sctl first writes the config and only after that uses k0s config validate to see if it's valid. It should be done the other way around.

It still should have caught this and errored out before restarting k0s on the controllers.
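Until that is fixed, the same check can be done by hand before applying (a sketch, assuming the rendered k0s config has been saved as k0s.yaml on a host that has the k0s binary):

  k0s config validate --config k0s.yaml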

kke added the bug label Sep 20, 2023
@pschichtel
Author

I found out that the option wasn't accepted by checking the k0scontroller service logs. K0sctl just eventually bailed out because the controller didn't come up.

Early validation would be great, but even worse for me was that it also continued on the other nodes, eventually taking down the entire cluster, even though the first controller didn't come up.
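For anyone hitting the same thing, that log check was just reading the k0scontroller unit on the affected controller (assuming a systemd-based host such as Rocky 9):

  journalctl -u k0scontroller --no-pager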

@kke
Contributor

kke commented Oct 3, 2023

Yeah, no bueno, this will be addressed shortly
