Add await logic for DaemonSets #609
Raising awareness: we had a Sysdig outage that could have been mitigated sooner if this was in place.
@RichardWLaub @casey-robertson @rquitales I wanted to share the current approach I'm taking with this in case you have any concerns or suggestions. For DaemonSets with a RollingUpdate strategy, we will essentially follow the same behavior as `kubectl rollout status`: Pulumi will wait until the rollout is ready unless the DaemonSet has a `skipAwait` annotation. It's less clear how to handle the OnDelete update strategy. For background, we currently handle StatefulSets with OnDelete update strategies by waiting for all pods to be manually removed (#2473); if the user doesn't want this behavior they must annotate the OnDelete StatefulSet with `skipAwait`. I'm preserving this behavior for OnDelete DaemonSets for consistency, although I see an argument that Pulumi should not wait for this update strategy at all.
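For reference, `kubectl rollout status` for a RollingUpdate DaemonSet reduces to a comparison of a few status fields. A minimal sketch of that condition (illustrative only, not necessarily the exact check the implementation uses):

```go
package readiness

import appsv1 "k8s.io/api/apps/v1"

// daemonSetRolledOut mirrors the condition `kubectl rollout status` applies
// to a RollingUpdate DaemonSet: the controller must have observed the latest
// spec, every node must be running the updated pod template, and all of
// those pods must be available.
func daemonSetRolledOut(ds *appsv1.DaemonSet) bool {
	if ds.Generation > ds.Status.ObservedGeneration {
		return false // controller hasn't observed the latest spec yet
	}
	if ds.Status.UpdatedNumberScheduled < ds.Status.DesiredNumberScheduled {
		return false // some nodes still run the old pod template
	}
	return ds.Status.NumberAvailable >= ds.Status.DesiredNumberScheduled
}
```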
This adds await logic for DaemonSets with RollingUpdate or OnDelete update strategies. The implementation is largely based on the existing StatefulSet logic, with two high-level simplifications:

1. We use [kstatus](https://pkg.go.dev/sigs.k8s.io/cli-utils/pkg/kstatus/status) to decide when a DaemonSet is ready (see the sketch below).
2. We use a `PodAggregator` to handle reporting pod statuses.

Importantly, unlike StatefulSet, this means we do not currently inspect pods to decide readiness -- we only use them for informational purposes. I _think_ this is sufficient, but I could easily be missing something. I haven't been able to simulate a situation where this logic doesn't fully capture readiness and we would need to inspect pod statuses.

A failing e2e test was added in YAML under the awkwardly named `tests/sdk/java` path. Unit tests were added around the public `Creation`, `Update`, etc. methods in order to more fully exercise timeouts and retries. To that end I introduced a mock clock package, which might be controversial. IMO Go doesn't have a great OSS mock clock, but something like this can be very helpful for testing.

I'm still somewhat confused by the role of `await.Read`, since it doesn't actually await anything, but it's implemented similarly to StatefulSet as a one-shot read plus readiness check.

Fixes #609
Refs #2800
Refs #2799
Refs #2798
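As a rough illustration of point 1, readiness can be delegated entirely to kstatus's `status.Compute`. A minimal sketch, assuming the surrounding await loop handles polling and timeouts (the `daemonSetReady` helper is hypothetical, not code from the PR):

```go
package readiness

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/cli-utils/pkg/kstatus/status"
)

// daemonSetReady is a hypothetical helper: it feeds the live DaemonSet
// object to kstatus and treats only the Current status as ready. kstatus
// internally performs generation and rollout-count checks similar to
// `kubectl rollout status`, so no direct pod inspection is needed.
func daemonSetReady(ds *unstructured.Unstructured) (bool, string, error) {
	res, err := status.Compute(ds)
	if err != nil {
		return false, "", err
	}
	return res.Status == status.CurrentStatus, res.Message, nil
}
```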
We are deploying the sysdig Helm chart, which creates a DaemonSet with a `rollingUpdate` updateStrategy, and we immediately run tests afterward to make sure the pods are running correctly. With `deployment`s and `service`s our tests always pass, because Pulumi waits to verify that those kinds deployed successfully. Our DaemonSet tests sometimes fail because there is no await logic for DaemonSets.

Simple repro, with a chart containing a template `ds-chart/templates/daemonset.yaml`: toggle the image tag to `1.17` and observe that the Pulumi program finishes before the new pods are ready.
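The reporter's template isn't reproduced here; a minimal stand-in along these lines would exhibit the same behavior (names, image, and tag are illustrative assumptions, not the original chart):

```yaml
# ds-chart/templates/daemonset.yaml -- hypothetical minimal template
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: repro-ds
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: repro-ds
  template:
    metadata:
      labels:
        app: repro-ds
    spec:
      containers:
        - name: nginx
          image: nginx:1.16  # toggle to nginx:1.17 to trigger a rollout
```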