Add k0scontrolplane heathcheck-remediation #824
a90e27f to 4437cae
return fmt.Errorf("failed to filter machines for control plane: %w", err)
}

healthyMachines := machines.Filter(collections.Not(isUnhealthy))
Could we use collections.Not(collections.HasUnhealthyCondition) here? If not, could we then avoid a double negative here? E.g. something like machines.Filter(isHealthy)?
👍 fixed using machines.Filter(isHealthy)
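To illustrate the point about the double negative, here is a minimal, self-contained sketch. The `Machine`, `Predicate`, `Filter`, and `Not` definitions are simplified stand-ins written for this example only (they are not the Cluster API `collections` package), and the `Unhealthy` field is an assumed placeholder for whatever check the real `isHealthy` performs.

```go
package main

import "fmt"

// Machine is a hypothetical, simplified stand-in for a Cluster API Machine.
type Machine struct {
	Name      string
	Unhealthy bool // assumed placeholder for "marked unhealthy by MachineHealthCheck"
}

// Predicate reports whether a machine matches a filter.
type Predicate func(*Machine) bool

// Not inverts a predicate (similar in spirit to collections.Not).
func Not(p Predicate) Predicate {
	return func(m *Machine) bool { return !p(m) }
}

// Filter returns the machines for which p returns true, preserving order.
func Filter(machines []*Machine, p Predicate) []*Machine {
	var out []*Machine
	for _, m := range machines {
		if p(m) {
			out = append(out, m)
		}
	}
	return out
}

func isUnhealthy(m *Machine) bool { return m.Unhealthy }

// isHealthy states the condition positively, so callers do not need Not(...).
func isHealthy(m *Machine) bool { return !m.Unhealthy }

func main() {
	machines := []*Machine{
		{Name: "cp-0"},
		{Name: "cp-1", Unhealthy: true},
		{Name: "cp-2"},
	}

	doubleNegative := Filter(machines, Not(isUnhealthy))
	positive := Filter(machines, isHealthy)

	// Both calls select the same machines; the second reads more directly.
	for i := range positive {
		fmt.Println(doubleNegative[i].Name, positive[i].Name)
	}
}
```

Both forms select the same machines in this sketch; the positive predicate only changes how the call site reads.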
4437cae to 096ed96
Sorry, just noticed a couple of things with annotations.
// Remove the annotation tracking that a remediation is in progress.
// A remediation is completed when the replacement machine has been created above.
delete(kcp.Annotations, cpv1beta1.RemediationInProgressAnnotation)
Shouldn't it be done before creating a machine in kube-api?
This way we are sure the machine has been created in kube-api, which means the remediation is done. Otherwise, an error while creating the machine in kube-api could let a second remediation start even though the first one was not completed. I think it is safer to make sure the machine is created/remediated first. WDYT?
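The ordering argued for here can be sketched as follows. This is a simplified illustration under stated assumptions, not the PR's code: `ControlPlane`, `createFn`, `createReplacement`, and the annotation key are hypothetical names, and the `create` callback stands in for the kube-api call that creates the replacement machine.

```go
package main

import (
	"errors"
	"fmt"
)

// remediationInProgress is a hypothetical annotation key used only in this sketch.
const remediationInProgress = "example.io/remediation-in-progress"

// ControlPlane is a simplified stand-in for the K0sControlPlane object.
type ControlPlane struct {
	Annotations map[string]string
}

// createFn stands in for the API call that creates the replacement machine.
type createFn func(name string) error

// createReplacement clears the in-progress annotation only after the
// replacement machine has been created successfully.
func createReplacement(cp *ControlPlane, name string, create createFn) error {
	if err := create(name); err != nil {
		// The annotation is left in place: the remediation is not finished,
		// so a second remediation should not start yet.
		return fmt.Errorf("failed to create machine %s: %w", name, err)
	}
	delete(cp.Annotations, remediationInProgress)
	return nil
}

func main() {
	cp := &ControlPlane{Annotations: map[string]string{remediationInProgress: "cp-1"}}

	failing := func(string) error { return errors.New("api unavailable") }
	err := createReplacement(cp, "cp-1-new", failing)
	_, stillSet := cp.Annotations[remediationInProgress]
	fmt.Println("error:", err, "annotation kept:", stillSet)

	succeeding := func(string) error { return nil }
	err = createReplacement(cp, "cp-1-new", succeeding)
	_, stillSet = cp.Annotations[remediationInProgress]
	fmt.Println("error:", err, "annotation kept:", stillSet)
}
```

If the annotation were deleted before the create call, a failed creation would leave no record that a remediation was still pending, which is the situation the reply is trying to avoid.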
// Mark controlplane to track that remediation is in progress and do not proceed until machine is gone.
// This annotation is removed when new controlplane creates a new machine.
annotations.AddAnnotations(kcp, map[string]string{
This annotation should probably also be removed from the K0sControlPlane once it recreates all the machines. Somewhere in the updateStatus func or so.
I think removing it before creating the machine is safe, so that we can continue with the next remediations. We could face cases where more than one machine needs to be remediated. This annotation exists to prevent multiple remediations from running at the same time.
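The "one remediation at a time" role of the annotation can be sketched like this. It is a minimal illustration only: `ControlPlane`, `tryStartRemediation`, `finishRemediation`, and the annotation key are hypothetical names, and the point at which the real controller clears the annotation is exactly what this thread is discussing.

```go
package main

import "fmt"

// remediationInProgress is a hypothetical annotation key used only in this sketch.
const remediationInProgress = "example.io/remediation-in-progress"

// ControlPlane is a simplified stand-in for the K0sControlPlane object.
type ControlPlane struct {
	Annotations map[string]string
}

// tryStartRemediation records that the given machine is being remediated.
// It returns false when another remediation is already tracked.
func tryStartRemediation(cp *ControlPlane, machine string) bool {
	if _, inProgress := cp.Annotations[remediationInProgress]; inProgress {
		return false
	}
	if cp.Annotations == nil {
		cp.Annotations = map[string]string{}
	}
	cp.Annotations[remediationInProgress] = machine
	return true
}

// finishRemediation clears the marker so the next unhealthy machine can be handled.
func finishRemediation(cp *ControlPlane) {
	delete(cp.Annotations, remediationInProgress)
}

func main() {
	cp := &ControlPlane{}

	fmt.Println("cp-1 started:", tryStartRemediation(cp, "cp-1"))
	// A second unhealthy machine has to wait while cp-1 is being remediated.
	fmt.Println("cp-2 started:", tryStartRemediation(cp, "cp-2"))

	finishRemediation(cp)
	fmt.Println("cp-2 started after cp-1 finished:", tryStartRemediation(cp, "cp-2"))
}
```

With more than one unhealthy machine, each one is picked up in turn once the marker from the previous remediation has been cleared.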
096ed96 to 9e25f53
…/mkdocs-3ba6cc2ae5 Bump mkdocs-material from 9.5.47 to 9.5.48 in /docs in the mkdocs group
e082659 to e09c775
Signed-off-by: Adrian Pedriza <[email protected]>
e09c775 to c9b6c59
This PR adds reconciliation, by the K0sControlPlane controller, of machines that the MachineHealthCheck controller considers unhealthy. See the MachineHealthCheck contract for more details.
It basically replicates the behavior of KubeadmControlPlane when handling unhealthy machines, except that it does not implement a remediation strategy for more granular control of the process. Currently, machine creation does not take into account the previous state of a machine when creating its replacement, so adding this control would require changes to the machine synchronization process. It can always be added later, but I did not want to compromise that logic in this PR given its sensitivity.