Add k0scontrolplane healthcheck-remediation #824

Open
wants to merge 2 commits into
base: main

@apedriza apedriza commented Nov 22, 2024

This PR adds reconciliation by K0sControlPlane of machines that the MachineHealthCheck controller considers unhealthy. Check MachineHealthCheck contract for more details h

It largely replicates how KubeadmControlPlane handles machines considered unhealthy, except that it does not implement a remediation strategy for more granular control over the process. Machine creation currently does not take into account the previous state of a machine when it is a replacement, so adding this control would require changes to the machine synchronization process. It can always be added later, but I did not want to compromise that logic in this PR given its sensitivity.

@apedriza apedriza requested a review from a team as a code owner November 22, 2024 12:10
@apedriza apedriza marked this pull request as draft November 22, 2024 12:10
@apedriza apedriza force-pushed the support-machinehealthchecks branch 14 times, most recently from a90e27f to 4437cae Compare November 27, 2024 13:43
@apedriza apedriza marked this pull request as ready for review November 27, 2024 14:37
return fmt.Errorf("failed to filter machines for control plane: %w", err)
}

healthyMachines := machines.Filter(collections.Not(isUnhealthy))
Contributor

Could we use collections.Not(collections.HasUnhealthyCondition) here? If not, could we at least avoid the double negative? E.g., something like machines.Filter(isHealthy)?

Contributor Author

👍 fixed using machines.Filter(isHealthy)
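
For readers following the thread, here is a minimal sketch of why the two forms select the same machines. It does not use the real Cluster API types: `Machine`, `Func`, `Not`, and `Filter` are simplified, hypothetical stand-ins for the filter style in the `collections` package, and the `HealthCheckSucceeded` field is an assumption made only for this illustration.

```go
package main

import "fmt"

// Machine is a hypothetical, simplified stand-in for a Cluster API Machine.
type Machine struct {
	Name string
	// HealthCheckSucceeded mirrors the health check verdict in this sketch only.
	HealthCheckSucceeded bool
}

// Func is a predicate over machines, modeled on a filter-function style.
type Func func(m Machine) bool

// Not inverts a predicate.
func Not(f Func) Func {
	return func(m Machine) bool { return !f(m) }
}

// Filter returns the machines for which f is true, preserving input order.
func Filter(machines []Machine, f Func) []Machine {
	var out []Machine
	for _, m := range machines {
		if f(m) {
			out = append(out, m)
		}
	}
	return out
}

// isUnhealthy and isHealthy are exact complements in this sketch.
func isUnhealthy(m Machine) bool { return !m.HealthCheckSucceeded }
func isHealthy(m Machine) bool   { return m.HealthCheckSucceeded }

// names extracts the machine names so results are easy to compare.
func names(machines []Machine) []string {
	out := make([]string, 0, len(machines))
	for _, m := range machines {
		out = append(out, m.Name)
	}
	return out
}

func main() {
	machines := []Machine{
		{Name: "cp-0", HealthCheckSucceeded: true},
		{Name: "cp-1", HealthCheckSucceeded: false},
		{Name: "cp-2", HealthCheckSucceeded: true},
	}
	// Double negative: keep everything that is not unhealthy.
	fmt.Println(names(Filter(machines, Not(isUnhealthy))))
	// Positive form: same selection, easier to read.
	fmt.Println(names(Filter(machines, isHealthy)))
}
```

Both calls print the same list of names; the positive predicate simply states the intent directly.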

@apedriza apedriza force-pushed the support-machinehealthchecks branch from 4437cae to 096ed96 Compare November 28, 2024 14:45
@apedriza apedriza requested a review from makhov November 28, 2024 18:05
Contributor

@makhov makhov left a comment

Sorry, just noticed a couple of things with annotations.


// Remove the annotation tracking that a remediation is in progress.
// A remediation is completed when the replacement machine has been created above.
delete(kcp.Annotations, cpv1beta1.RemediationInProgressAnnotation)
Contributor

Shouldn't it be done before creating a machine in kube-api?

Contributor Author

This way we are sure the machine has been created in kube-api, which means the remediation is done. Otherwise, creating the machine in kube-api could fail and a second remediation could start even though the first one was not completed. I think it is safer to make sure the machine is created/remediated first. WDYT?


// Mark controlplane to track that remediation is in progress and do not proceed until machine is gone.
// This annotation is removed when new controlplane creates a new machine.
annotations.AddAnnotations(kcp, map[string]string{
Contributor

This annotation should probably also be removed from the K0sControlPlane once it has recreated all the machines, somewhere in the updateStatus func or so.

Contributor Author

I think removing it before creating the machine is safe in order to continue with the next remediations. We could face cases where more than one machine needs to be remediated. The annotation exists to prevent multiple remediations from running at the same time.
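
The "one remediation at a time" role of the annotation might look roughly like the sketch below. The names `tryStartRemediation`, `finishRemediation`, `ControlPlane`, and the annotation key are hypothetical and only illustrate the gating idea, not the controller's actual implementation.

```go
package main

import "fmt"

// remediationInProgress is a hypothetical annotation key used only in this sketch.
const remediationInProgress = "example.com/remediation-in-progress"

// ControlPlane is a simplified stand-in for the control plane object.
type ControlPlane struct {
	Annotations map[string]string
}

// tryStartRemediation records machine as being remediated and returns true,
// unless another remediation is already in progress.
func tryStartRemediation(cp *ControlPlane, machine string) bool {
	if _, busy := cp.Annotations[remediationInProgress]; busy {
		return false
	}
	if cp.Annotations == nil {
		cp.Annotations = map[string]string{}
	}
	cp.Annotations[remediationInProgress] = machine
	return true
}

// finishRemediation clears the marker so the next unhealthy machine can be handled.
func finishRemediation(cp *ControlPlane) {
	delete(cp.Annotations, remediationInProgress)
}

func main() {
	cp := &ControlPlane{}
	fmt.Println("start cp-1:", tryStartRemediation(cp, "cp-1")) // accepted
	fmt.Println("start cp-2:", tryStartRemediation(cp, "cp-2")) // blocked while cp-1 is pending
	finishRemediation(cp)
	fmt.Println("start cp-2:", tryStartRemediation(cp, "cp-2")) // accepted after the marker is cleared
}
```

With several unhealthy machines, each one waits for the marker to be cleared before its own remediation begins.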

@apedriza apedriza force-pushed the support-machinehealthchecks branch from 096ed96 to 9e25f53 Compare November 29, 2024 15:37
…/mkdocs-3ba6cc2ae5

Bump mkdocs-material from 9.5.47 to 9.5.48 in /docs in the mkdocs group
@apedriza apedriza force-pushed the support-machinehealthchecks branch 2 times, most recently from e082659 to e09c775 Compare December 12, 2024 13:24
@apedriza apedriza force-pushed the support-machinehealthchecks branch from e09c775 to c9b6c59 Compare December 12, 2024 13:25
3 participants