KEP-3063: dra: pre-scheduled pods #4063
Conversation
Force-pushed e5b95b1 to b18ef8f.
overall lgtm, but I think this falls under sig-node more than sig-scheduling.
Yes. Both kubelet and pkg/controller/resourceclaim are SIG Node. Do you want me to find a different reviewer also for kubernetes/kubernetes#118209? If I remember correctly, you are also moonlighting as reviewer for kcm, so I wasn't sure how much you might want to be involved in that.
I'm not an approver in kcm, so you would still need another person. But also, the changes you proposed still belong to SIG Node: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/resourceclaim/OWNERS
@@ -1972,6 +2022,10 @@ exist at all must not be allowed to run. Instead, a suitable event must be
emitted which explains the problem. Such a situation can occur as part of
downgrade scenarios.

In addition, kubelet informs about the claim status with a
`ResourceClaimsReady` `PodConditionType`. It's `false` when NodePrepareResource
still needs to be called. It's `true` when that has succeeded for all claims.
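For illustration, here is a minimal Go sketch of how such a condition could be set on the Pod status, assuming the `ResourceClaimsReady` type name from the diff above; the helper function and reason strings are hypothetical, not actual kubelet code (and the condition was later dropped again, see below):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ResourceClaimsReady is the condition type proposed in the diff above;
// it was later removed from the KEP, so treat it as illustrative only.
const ResourceClaimsReady corev1.PodConditionType = "ResourceClaimsReady"

// setResourceClaimsReady is a hypothetical helper showing the all-or-nothing
// semantics: true once NodePrepareResource has succeeded for every claim of
// the pod, false while any call is still pending.
func setResourceClaimsReady(pod *corev1.Pod, allPrepared bool) {
	status := corev1.ConditionFalse
	reason := "WaitingForNodePrepareResource" // made-up reason string
	if allPrepared {
		status = corev1.ConditionTrue
		reason = "AllClaimsPrepared" // made-up reason string
	}
	cond := corev1.PodCondition{
		Type:               ResourceClaimsReady,
		Status:             status,
		Reason:             reason,
		LastTransitionTime: metav1.Now(),
	}
	for i, c := range pod.Status.Conditions {
		if c.Type == ResourceClaimsReady {
			pod.Status.Conditions[i] = cond
			return
		}
	}
	pod.Status.Conditions = append(pod.Status.Conditions, cond)
}

func main() {
	pod := &corev1.Pod{}
	setResourceClaimsReady(pod, false)
	fmt.Println(pod.Status.Conditions[0].Type, pod.Status.Conditions[0].Status)
}
```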
Any thoughts on whether such a condition is useful? I'm undecided myself.
My thinking: it's all-or-nothing. So, if something is interested in a particular claim, it's not helpful. If you want the side effect of a metric changing to see how long it's taking Pods to get to this point, it might be.
Maybe we can provide another instrumentation approach though. For example, one or more of:
- increment a per-node counter when a Pod is accepted with pending claims, and increment a different counter when claims are resolved (a rough sketch of this option follows after the list)
- put the `.status` in the ResourceClaim instead, with an entry for each associated Pod; teach tooling to look there
- fire an Event [per claim, per Pod] to note that a resource claim has succeeded and is ready
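As referenced in the first item, a rough sketch of the per-node counter option, using plain Prometheus client counters; the metric names and helper functions are made up for illustration and do not exist in kubelet:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical per-node counters for the first option above; neither metric
// exists today, the names are invented for illustration.
var (
	podsAcceptedWithPendingClaims = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "kubelet_pods_accepted_with_pending_resource_claims_total",
			Help: "Pods admitted while at least one ResourceClaim was not yet prepared.",
		},
		[]string{"node"},
	)
	podsWithClaimsResolved = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "kubelet_pods_resource_claims_resolved_total",
			Help: "Pods whose pending ResourceClaims have all been prepared.",
		},
		[]string{"node"},
	)
)

func init() {
	prometheus.MustRegister(podsAcceptedWithPendingClaims, podsWithClaimsResolved)
}

// Comparing the two counters (e.g. via a PromQL difference or rate) shows how
// many pods are currently waiting and how long claims typically take to resolve.
func recordPodAccepted(node string)    { podsAcceptedWithPendingClaims.WithLabelValues(node).Inc() }
func recordClaimsResolved(node string) { podsWithClaimsResolved.WithLabelValues(node).Inc() }

func main() {
	recordPodAccepted("node-1")
	recordClaimsResolved("node-1")
}
```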
(thinking ahead)
Could we ever have a situation where a Pod that is already running might make a resource claim?
For example, a Pod that is using one cryptographic key identifies that it needs to make another (key rotation) and so requests a second one, by directly calling create for the ResourceClaim API.
In that case, we wouldn't expect to update `.status` for the Pod; we might not even have the associated Pod object identity available, depending on how it has authenticated.
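To make the scenario concrete, here is a hedged sketch of what such a direct create from inside a running Pod could look like, using the dynamic client and assuming the `resource.k8s.io/v1alpha2` API version that was current for DRA at the time of this discussion; the claim name and resource class are made up:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster credentials of the running pod (its service account).
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(cfg)

	// GVR for ResourceClaim; v1alpha2 may differ in later releases.
	gvr := schema.GroupVersionResource{
		Group:    "resource.k8s.io",
		Version:  "v1alpha2",
		Resource: "resourceclaims",
	}

	// Hypothetical claim for a second cryptographic key (key rotation).
	claim := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "resource.k8s.io/v1alpha2",
		"kind":       "ResourceClaim",
		"metadata": map[string]interface{}{
			"name": "rotated-signing-key", // made-up name
		},
		"spec": map[string]interface{}{
			"resourceClassName": "crypto-keys", // made-up resource class
		},
	}}

	// Nothing here references the Pod object; the API server only sees the
	// service account identity, which is the point made above.
	_, err = client.Resource(gvr).Namespace("default").
		Create(context.TODO(), claim, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
}
```

As the code notes, the request carries only the Pod's service account identity, which is why the associated Pod object might not be known.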
I believe the only real use case that we have right now is "notify the user why the pod is not starting". An event satisfies that requirement.
A condition or dedicated fields in the claim status seem more about "notify some component so that it can take some action" - we don't have use cases for that right now. My preference is to not add any conditions or fields until we know more about how they would be used.
> Could we ever have a situation where a Pod that is already running might make a resource claim?
I suspect that this would be very tricky to implement. Right now, the `pod.spec.resourceClaims` are fixed. We would have to allow modifying that and then find a way to tell the container runtime to update the containers, ideally without having to restart them.
Mutating some existing claim is already possible, to some extent. For example, if the claim causes some directory to be mounted, the resource driver can change the content of what was mounted. However, the same caveat applies: modifying the container sandbox in response to claim changes is not supported.
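For completeness, a small sketch of the event-based notification mentioned above ("notify the user why the pod is not starting"), using client-go's `EventRecorder`; the reason and message strings are hypothetical, not the wording kubelet actually uses:

```go
package dra

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/tools/record"
)

// emitClaimNotReadyEvent shows the "notify the user why the pod is not
// starting" option: a warning event on the Pod. The reason string is made up
// for illustration; real DRA events may use different wording.
func emitClaimNotReadyEvent(recorder record.EventRecorder, pod *corev1.Pod, claimName string) {
	recorder.Eventf(pod, corev1.EventTypeWarning, "ResourceClaimNotReady",
		"waiting for ResourceClaim %q to be allocated and reserved for this pod", claimName)
}
```

Such an event would show up under `kubectl describe pod`, which is the debuggability signal discussed in the next comment.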
I removed the condition part. That needs more thought about use cases.
One of the debuggability challenges today is that users and developers don't know why a pod has been scheduled but is not running. The scenario of unallocated or unreserved claims makes the situation even more complicated. But I agree this needs more thought.
/cc
Force-pushed b18ef8f to 5b393bb.
/assign @klueska
Would this need another PRR, @johnbelamaric?
My two cents: this is a relatively minor change that doesn't impact any of the PRR aspects. The same API calls are made as before, they just come from kube-controller-manager instead of kube-scheduler. The actual content (= size of PodSchedulingContext) is smaller than normal.
Should we also include the updates to the kubelet plugin API in this or open a separate PR for that?
Given the focus of this PR, I prefer a separate PR for the kubelet changes.
@klueska, yep, on it. I somehow thought this was updated already.
This is an error scenario, but dealing with it automatically when it arises is still useful and not too hard, so it should be worthwhile.
Force-pushed 5b393bb to 96c54f5.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dchen1107, pohly
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
One-line PR description: pods with `spec.nodeName` set while claims are not reserved is an error scenario, but dealing with it automatically when it arises is still useful and not too hard, so it should be worthwhile.
Issue link: DRA: control plane controller ("classic DRA") #3063
Other comments:
See dra: pre-scheduled pods kubernetes#118209 for the kube-controller-manager implementation
/cc @alculquicondor
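As a closing illustration of the scenario in the PR description, here is a hypothetical predicate for the pods the kube-controller-manager change has to handle; only the field accesses are real API, the function itself is a sketch:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// podNeedsClaimReservation sketches which pods the kube-controller-manager
// change (kubernetes#118209) has to care about: the pod was placed directly
// via spec.nodeName (bypassing kube-scheduler) and references ResourceClaims
// that may not be allocated or reserved yet. The actual reservation work
// (allocating the claim and adding the pod to its status.reservedFor list)
// is done by the resourceclaim controller per the KEP.
func podNeedsClaimReservation(pod *corev1.Pod) bool {
	return pod.Spec.NodeName != "" && len(pod.Spec.ResourceClaims) > 0
}

func main() {
	pod := &corev1.Pod{}
	pod.Spec.NodeName = "node-1"
	pod.Spec.ResourceClaims = []corev1.PodResourceClaim{{Name: "gpu"}}
	fmt.Println(podNeedsClaimReservation(pod)) // true
}
```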