Remove startup and liveness probes #423

Merged
timuthy merged 1 commit into gardener:master on Sep 1, 2022

timuthy
Member

@timuthy timuthy commented Sep 1, 2022

How to categorize this PR?

/area control-plane
/kind enhancement

What this PR does / why we need it:
This PR removes the livenessProbe and startupProbe for etcd.

Enabling the startup and liveness probes in #396 showed that they cause more harm than good.

Special notes for your reviewer:
/cc @ishan16696 @aaronfern @ashwani2k

Release note:

Liveness and startup probes for etcd were removed. After activating them in the last release, we noticed that they cause more harm than good because the startup time of etcd clusters varies and isn't predictable. Killing the `etcd` container in such a case doesn't resolve the situation and instead results in an endless loop of restarts. This change will cause a restart of etcd clusters.

Enabling the startup/liveness probes through gardener#396 showed that they cause more harm than good:
- The startup time of etcd can vary depending on the state and amount of data.
- If startup does not complete within the expected time, the failing probes kill the container. This does not help to solve the issue at all and instead ends in an endless loop of restarts.
- Liveness probes had previously been disabled for a long time, and in our experience this never caused issues.
- Other communities have come to a similar conclusion, see https://github.com/improbable-eng/etcd-cluster-operator/blob/master/docs/operations.md#why-arent-there-liveness-probes-for-the-etcd-pods
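
The restart loop described above can be illustrated with a small sketch. It is not taken from this repository: the `StartupProbe` class, the `first_successful_attempt` helper, and all numeric values are hypothetical, and the startup budget is approximated as `initialDelaySeconds + periodSeconds * failureThreshold`. The sketch assumes that killing the container does not shorten the next startup, which is the situation the bullets describe.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class StartupProbe:
    """Hypothetical subset of Kubernetes probe settings (illustration only)."""
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int

    def budget_seconds(self) -> int:
        # Approximate time a container has to start before the kubelet
        # kills it because the startup probe kept failing.
        return self.initial_delay_seconds + self.period_seconds * self.failure_threshold


def first_successful_attempt(probe: StartupProbe,
                             startup_seconds: Sequence[int]) -> Optional[int]:
    """Return the index of the first start attempt that fits into the
    probe budget, or None if every attempt is killed before it finishes.

    startup_seconds[i] is the (assumed) time etcd needs on attempt i.
    """
    budget = probe.budget_seconds()
    for attempt, needed in enumerate(startup_seconds):
        if needed <= budget:
            return attempt
    return None


if __name__ == "__main__":
    # Made-up values, not the ones used by this project.
    probe = StartupProbe(initial_delay_seconds=15, period_seconds=5, failure_threshold=6)
    print("budget:", probe.budget_seconds(), "seconds")
    # A small etcd starts in time on the first attempt.
    print("small etcd:", first_successful_attempt(probe, [30]))
    # A large etcd needs the same long startup after every kill, so it never becomes ready.
    print("large etcd:", first_successful_attempt(probe, [120, 120, 120]))
```

Under these assumptions, raising the thresholds only moves the cutoff. Any etcd whose startup time exceeds the budget is still restarted indefinitely, which is consistent with the decision to remove the probes.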
@timuthy timuthy requested a review from a team as a code owner September 1, 2022 08:27
@gardener-robot gardener-robot added area/control-plane Control plane related kind/enhancement Enhancement, improvement, extension needs/review Needs review size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Sep 1, 2022
@timuthy
Member Author

timuthy commented Sep 1, 2022

/needs cherry-pick

@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Sep 1, 2022
@gardener-robot gardener-robot added the needs/cherry-pick Needs to be cherry-picked to older version label Sep 1, 2022
@gardener-robot-ci-3 gardener-robot-ci-3 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Sep 1, 2022
@aaronfern
Contributor

/assign

@timuthy
Member Author

timuthy commented Sep 1, 2022

/priority 1

@gardener-robot gardener-robot added the priority/1 Priority (lower number equals higher priority) label Sep 1, 2022
Contributor

@aaronfern aaronfern left a comment

Thanks for the PR @timuthy!
/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels Sep 1, 2022
@timuthy timuthy merged commit 62d6f25 into gardener:master Sep 1, 2022
@timuthy timuthy deleted the enhancement.probes branch September 1, 2022 11:44
@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Sep 1, 2022
Labels
area/control-plane Control plane related kind/enhancement Enhancement, improvement, extension needs/cherry-pick Needs to be cherry-picked to older version needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) priority/1 Priority (lower number equals higher priority) reviewed/lgtm Has approval for merging size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) status/closed Issue is closed (either delivered or triaged)
4 participants