Node removal not supported, however it "works" in unexpected manner #95

Open
danielskowronski opened this issue Nov 13, 2023 · 1 comment

Comments

@danielskowronski

Some time ago, k0sctl added support for node removal.

This provider calls the phase needed to reset controllers, but it doesn't prepare the hosts list so that hosts can actually be removed. The data structure ClusterResourceModelHost lacks a Reset field, and there's no logic that translates a host's removal from state into a flag update that the phase manager could pick up.
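To illustrate the missing translation step, here is a minimal sketch, not the provider's actual code. `Host` is a hypothetical, simplified stand-in for ClusterResourceModelHost, and `markRemovedHosts` is an invented helper name. The idea is that hosts present in prior state but absent from the plan are kept in the list with Reset set, so the reset phase has something to act on.

```go
package main

import "fmt"

// Host is a hypothetical, simplified stand-in for the provider's
// ClusterResourceModelHost. The Reset field is the one the issue says is missing.
type Host struct {
	Address string
	Role    string
	Reset   bool
}

// markRemovedHosts returns all planned hosts, followed by every host that is
// in prior state but not in the plan, with Reset set to true on the latter.
func markRemovedHosts(state, plan []Host) []Host {
	planned := make(map[string]bool, len(plan))
	for _, h := range plan {
		planned[h.Address] = true
	}

	// Copy the plan so the caller's slice is never modified.
	result := append([]Host(nil), plan...)
	for _, h := range state {
		if !planned[h.Address] {
			h.Reset = true // h is a copy, so prior state stays untouched
			result = append(result, h)
		}
	}
	return result
}

func main() {
	state := []Host{
		{Address: "10.0.0.1", Role: "controller"},
		{Address: "10.0.0.2", Role: "controller"},
		{Address: "10.0.0.3", Role: "worker"},
	}
	plan := []Host{
		{Address: "10.0.0.1", Role: "controller"},
		{Address: "10.0.0.3", Role: "worker"},
	}
	for _, h := range markRemovedHosts(state, plan) {
		fmt.Printf("%s role=%s reset=%t\n", h.Address, h.Role, h.Reset)
	}
}
```

Hosts are keyed by address here because the issue describes the IP as the unique ID; a real implementation might need a different key.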

This becomes a real problem when, after a removal, a new host is added with the same IP: the IP serves as the unique ID for many k0s structures, so the result is a split-brain. The cluster keeps trying to reach the new VM through the IP that was never removed (mainly from etcd), while the new VM is stuck in the cluster init phase yet starts serving requests immediately. Control-plane HA requires a load balancer, so without sophisticated health checks it can easily end up serving two clusters at the same time.
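One way to guard against this would be to check, before adding a host, whether its IP is still registered as an etcd peer. The sketch below assumes the member list is available as JSON with a `members` object mapping member names to peer URLs, which is how I understand the output of `k0s etcd member-list`; treat that shape as an assumption. `staleMember` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// memberList mirrors only the "members" key of the assumed JSON output;
// any other keys in the payload are ignored.
type memberList struct {
	Members map[string]string `json:"members"`
}

// staleMember reports the name of an etcd member whose peer URL still
// points at ip, if there is one.
func staleMember(raw []byte, ip string) (string, bool, error) {
	var ml memberList
	if err := json.Unmarshal(raw, &ml); err != nil {
		return "", false, fmt.Errorf("parse member list: %w", err)
	}
	for name, peer := range ml.Members {
		u, err := url.Parse(peer)
		if err != nil {
			continue // skip entries that are not valid URLs
		}
		if u.Hostname() == ip {
			return name, true, nil
		}
	}
	return "", false, nil
}

func main() {
	// Sample payload in the assumed shape; names and addresses are made up.
	raw := []byte(`{"members":{"ctrl-0":"https://10.0.0.1:2380","ctrl-1":"https://10.0.0.2:2380"}}`)

	name, found, err := staleMember(raw, "10.0.0.2")
	if err != nil {
		panic(err)
	}
	fmt.Println(name, found)
}
```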

As per the docs, the workaround seems to be to manually run k0s etcd leave --peer-address IP_ADDR on a node that is still alive. In most cases that is the node we want to delete, but it gets tricky when rebuilding a crashed VM. It is trickier still because destroy-time provisioners in Terraform only run on a clean destroy, not even on taint.
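The node-selection part of this workaround can be sketched as follows. This is only an illustration under assumptions: `Controller` and its `Alive` flag are hypothetical (liveness would have to come from something like an SSH reachability check), and `leaveCommand` is an invented helper. Only the `k0s etcd leave --peer-address` command line comes from the text above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Controller is a hypothetical view of a control-plane node.
// How Alive is determined is out of scope for this sketch.
type Controller struct {
	Address string
	Alive   bool
}

// leaveCommand returns the address of the controller to run the command on
// and the command line itself. It prefers the leaving node when it is still
// alive and otherwise falls back to the first alive controller.
func leaveCommand(controllers []Controller, leaving string) (string, string, error) {
	runOn := ""
	for _, c := range controllers {
		if !c.Alive {
			continue
		}
		if c.Address == leaving {
			runOn = c.Address
			break // the leaving node itself is the preferred choice
		}
		if runOn == "" {
			runOn = c.Address // remember the first alive fallback
		}
	}
	if runOn == "" {
		return "", "", errors.New("no alive controller to run etcd leave on")
	}
	cmd := strings.Join([]string{"k0s", "etcd", "leave", "--peer-address", leaving}, " ")
	return runOn, cmd, nil
}

func main() {
	// Crashed-VM case: 10.0.0.2 is gone, so the command runs elsewhere.
	controllers := []Controller{
		{Address: "10.0.0.1", Alive: true},
		{Address: "10.0.0.2", Alive: false},
	}
	runOn, cmd, err := leaveCommand(controllers, "10.0.0.2")
	if err != nil {
		panic(err)
	}
	fmt.Printf("run on %s: %s\n", runOn, cmd)
}
```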

danielskowronski added a commit to danielskowronski/terraform-provider-k0s that referenced this issue Nov 28, 2023
- add other option to pass SSH key - as raw PEM-encoded string alessiodionisi#94
- add k0sctlconfig as output, so it can be used with k0sctl CLI alessiodionisi#76
- handle situation, where k0s leader is not available - attempt to validate cluster on Read phase using all controllers
- investigate ways of actual node removal alessiodionisi#95
@danielskowronski
Author

This is not yet supported in k0sctl itself - k0sproject/k0sctl#603
