danielskowronski added a commit to danielskowronski/terraform-provider-k0s that referenced this issue on Nov 28, 2023
- add other option to pass SSH key - as raw PEM-encoded string alessiodionisi#94
- add k0sctlconfig as output, so it can be used with k0sctl CLI alessiodionisi#76
- handle situation, where k0s leader is not available - attempt to validate cluster on Read phase using all controllers
- investigate ways of actual node removal alessiodionisi#95
Some time ago, k0sctl added support for node removal.

This provider calls the phase needed to reset controllers, but it does not prepare the hosts list so that hosts can actually be removed. The `ClusterResourceModelHost` data structure is missing a `Reset` field, and there is no logic that translates the removal of a host from state into a flag update that the phase manager can pick up.

This is quite problematic when, after removal, a new host is added with the same IP, because the IP is the unique ID for many k0s structures; the result is split-brain. The cluster still tries to connect to the new VM using the IP that was never removed (mainly from etcd), while the new VM is stuck in the cluster init phase but serves requests immediately. Control-plane HA requires a load balancer, so without sophisticated checks it can easily end up serving two clusters at the same time.
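To make the missing translation step concrete, here is a minimal sketch of how host removal could be turned into a reset flag. It is not the provider's actual code: the `Host` type is a simplified, hypothetical stand-in for `ClusterResourceModelHost`, `markRemovedHosts` is an invented helper, and it assumes that the phase manager acts on a per-host `Reset` flag.

```go
package main

import "fmt"

// Host is a hypothetical, simplified stand-in for ClusterResourceModelHost.
// The Reset field is the one the issue reports as missing.
type Host struct {
	Address string
	Role    string
	Reset   bool
}

// markRemovedHosts returns the planned hosts followed by every host that is
// present in the prior state but absent from the plan, flagged with Reset.
// Hosts are matched by address, since the issue names the IP as the unique ID.
func markRemovedHosts(prior, planned []Host) []Host {
	keep := make(map[string]bool, len(planned))
	for _, h := range planned {
		keep[h.Address] = true
	}

	out := make([]Host, 0, len(prior)+len(planned))
	for _, h := range planned {
		h.Reset = false // hosts that stay in the plan must not be reset
		out = append(out, h)
	}
	for _, h := range prior {
		if !keep[h.Address] {
			h.Reset = true // removed from the plan: ask for a reset
			out = append(out, h)
		}
	}
	return out
}

func main() {
	prior := []Host{
		{Address: "10.0.0.1", Role: "controller"},
		{Address: "10.0.0.2", Role: "controller"},
	}
	planned := []Host{
		{Address: "10.0.0.1", Role: "controller"},
	}
	for _, h := range markRemovedHosts(prior, planned) {
		fmt.Printf("%s reset=%t\n", h.Address, h.Reset)
	}
}
```

In a real provider this diff would presumably run during Update, comparing the state and plan models before the host list is handed over to k0sctl.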
As per the docs, the workaround seems to be to manually execute `k0s etcd leave --peer-address IP_ADDR` on an alive node. In most cases that is the node we want to delete, but it gets tricky when rebuilding a crashed VM, all the more so since destroy-time provisioners in TF only run on a clean destroy, not even with taint.