DRA: handle non graceful node shutdowns #4260
Conversation
/assign @pohly
Force-pushed from 6d43d94 to 51670a6
Looks good to me, some minor spelling suggestions.
@@ -1162,6 +1163,18 @@ Once all of those steps are complete, kubelet will notice that the claims are
ready and run the pod. Until then it will keep checking periodically, just as
it does for other reasons that prevent a pod from running.

### Handling non graceful node shutdowns
Suggested change:
- ### Handling non graceful node shutdowns
+ ### Handling non-graceful node shutdowns
'non graceful' is used in the KEP; that's why I decided to use it here and in the e2e test code.
The KEP issue uses "non-graceful", as does the KEP README in one place; it looks like the original authors weren't sure about the right spelling.
"non-graceful" feels more right to me, but I'm not a native speaker and English is creative... Let's leave it in this PR as you have it now ("non graceful").
@@ -1162,6 +1163,18 @@ Once all of those steps are complete, kubelet will notice that the claims are
ready and run the pod. Until then it will keep checking periodically, just as
it does for other reasons that prevent a pod from running.

### Handling non graceful node shutdowns

When a node is shutdown unexpectedly and is tained with an `out-of-service`
Suggested change:
- When a node is shutdown unexpectedly and is tained with an `out-of-service`
+ When a node is shut down unexpectedly and is tainted with an `out-of-service`
done, thank you for the review!
### Handling non graceful node shutdowns

When a node is shutdown unexpectedly and is tained with an `out-of-service`
taint with NoExecute effect as explained in the [Non graceful node shutdown KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown),
Suggested change:
- taint with NoExecute effect as explained in the [Non graceful node shutdown KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown),
+ taint with NoExecute effect as explained in the [Non-graceful node shutdown KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown),
Same thing here.
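For readers following along: the taint discussed in the quoted section comes from that KEP and is normally applied with `kubectl taint nodes <node> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute`. Below is a minimal client-go sketch of the same operation; the function name is made up for illustration and the clientset is assumed to be configured elsewhere.

```go
package taint

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// taintOutOfService marks a node that is known to be shut down so that its
// pods are evicted and their DRA claims can be deallocated. Illustrative only.
func taintOutOfService(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	node, err := cs.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	// Taint key, value, and effect as described in the non graceful
	// node shutdown KEP.
	node.Spec.Taints = append(node.Spec.Taints, v1.Taint{
		Key:    "node.kubernetes.io/out-of-service",
		Value:  "nodeshutdown",
		Effect: v1.TaintEffectNoExecute,
	})
	_, err = cs.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{})
	return err
}
```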
resources used by the pods will be deallocated. However, they will not be
un-prepared as the node is down and Kubelet is not running on it.

Resource drivers should be able to handle this situation correctly and
If `Deallocate` is called without `UnprepareNodeResources`, does it automatically mean a broken node? Or are there other cases like this?

non-blocking: are there any recommendations that can be shared here on best practices for implementing those? If there is no guarantee, what logic should be implemented in `UnprepareNodeResources`?
> If `Deallocate` is called without `UnprepareNodeResources`, does it automatically mean a broken node? Or are there other cases like this?

Currently this is the only case I'm aware of.

> are there any recommendations that can be shared here on best practices for implementing those? If there is no guarantee, what logic should be implemented in `UnprepareNodeResources`?

It depends on the resource type. For local resources, not much can be done if the node is powered off, but something can be done if it's just a kubelet crash. For network-attached resources, the DRA controller can theoretically detach them from the node. However, all these cases are not generic enough to give recommendations. Plugin authors should know how to handle this type of case, and in most cases it depends on the particular hardware setup, I think.
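To make that expectation concrete, here is a rough sketch of the shape this tolerance can take in a driver's control-plane code. Everything named here (`driver`, `stillPrepared`, `detachFromNode`, `releaseAllocation`) is hypothetical rather than part of the DRA API; it only illustrates the "don't assume unprepare ran" rule discussed above.

```go
package driver

import (
	"context"
	"fmt"
)

type driver struct {
	// driver-specific state (allocation records, attachment tracking, ...)
}

// Hypothetical helpers; a real driver tracks prepared claims itself.
func (d *driver) stillPrepared(claimUID string) bool                           { return false }
func (d *driver) detachFromNode(ctx context.Context, claim, node string) error { return nil }
func (d *driver) releaseAllocation(claimUID string) error                      { return nil }

// Deallocate must not assume that UnprepareNodeResources ran first: after a
// non graceful shutdown the kubelet never had a chance to call it.
func (d *driver) Deallocate(ctx context.Context, claimUID, nodeName string) error {
	if d.stillPrepared(claimUID) {
		// Best effort: network-attached resources can be detached
		// centrally; node-local state is usually cleaned up by the
		// reboot itself (e.g. CDI files under /var/run/cdi).
		if err := d.detachFromNode(ctx, claimUID, nodeName); err != nil {
			return fmt.Errorf("detach claim %s from node %s: %w", claimUID, nodeName, err)
		}
	}
	return d.releaseAllocation(claimUID)
}
```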
The reason I am asking is to understand whether we need to guarantee that in the normal case all callbacks will be called. And if so, are there any guarantees on timing? Can they be called super fast one after another, and how much synchronization is needed there? Can they somehow be called in the opposite order? (Sorry, I haven't looked at the implementation, so my questions may be completely out of context.)
It's already guaranteed that `PrepareResources` is called when a container is created and `UnprepareResources` when a container is terminated. That's part of the initial DRA implementation. The description of the non graceful node shutdown case is just a way to explain how DRA plugins are expected to behave when `UnprepareResources` is not called due to an unexpected node shutdown.
Let me provide a bit more info on this. `NodePrepareResources` and `NodeUnprepareResources` calls are expected to do as little work as possible, as specified in the KEP:

> This operation SHALL do as little work as possible as it's called after a pod is scheduled to a node. All potentially failing operations SHALL be done during allocation phase.

In most cases this means that `NodePrepareResources` only creates a CDI file and returns its fully qualified device name to the kubelet, and `NodeUnprepareResources` removes the file. When the node is rebooted, CDI files are removed as they're usually placed in /var/run/cdi/. In this particular case it means that even if `NodeUnprepareResources` is not called because of an unexpected node shutdown, the file will be removed anyway on node reboot.
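A small sketch of that pattern, assuming a hypothetical vendor/class `example.com/claim` and hand-written JSON (real drivers typically use the CDI helper libraries instead):

```go
package cdi

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// cdiDir lives on a tmpfs on most distros, so its contents vanish on reboot;
// that is what makes the "file is removed anyway" argument above work.
const cdiDir = "/var/run/cdi"

// writeCDISpec is roughly what NodePrepareResources does: write a CDI spec
// for the claim and return the fully qualified device name for the kubelet.
func writeCDISpec(claimUID, devicePath string) (string, error) {
	spec := map[string]any{
		"cdiVersion": "0.5.0",
		"kind":       "example.com/claim", // hypothetical vendor/class
		"devices": []map[string]any{{
			"name": claimUID,
			"containerEdits": map[string]any{
				"deviceNodes": []map[string]any{{"path": devicePath}},
			},
		}},
	}
	data, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	file := filepath.Join(cdiDir, "example.com-claim-"+claimUID+".json")
	if err := os.WriteFile(file, data, 0o644); err != nil {
		return "", err
	}
	return fmt.Sprintf("example.com/claim=%s", claimUID), nil
}

// removeCDISpec is roughly what NodeUnprepareResources does; if the node
// reboots instead, the file under /var/run/cdi is gone anyway.
func removeCDISpec(claimUID string) error {
	return os.RemoveAll(filepath.Join(cdiDir, "example.com-claim-"+claimUID+".json"))
}
```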
Force-pushed from 51670a6 to 4240c3b
Resource drivers should be able to handle this situation correctly and
should not expect `UnprepareNodeResources` to be always called before
`Deallocate`.
I think this sentence is saying that `UnprepareNodeResources` may be omitted, but it can be read as "they can be called in the opposite order". Please make it more explicit what the sentence means.
Thank you for pointing this out. Rephrased, PTAL.
Added a section about handling non graceful node shutdowns (see https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown for more details)
Force-pushed from 4240c3b to aa42236
@SergeyKanzhelev any other questions/concerns? if not, can you lgtm/approve?
/lgtm
/assign @mrunalp @dchen1107
@mrunalp @dchen1107 @derekwaynecarr Can you approve this? It's a minor change and it doesn't require any code changes.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bart0sh, mrunalp, SergeyKanzhelev. The full list of commands accepted by this bot can be found here; the pull request process is described here. Approvers can indicate their approval by writing `/approve` in a comment.
Issue #120421: interaction with unexpected node shutdown KEP
One-line PR description: Added a section about handling non graceful node shutdowns (see https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown for more details)
Other comments: see https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/2268-non-graceful-shutdown for more details