
KEP-3063: dra: pre-scheduled pods #4063

Merged: 1 commit merged into kubernetes:master from dra-pre-scheduled-pods on Sep 5, 2023

Conversation

@pohly (Contributor) commented Jun 6, 2023

/cc @alculquicondor

@k8s-ci-robot added labels on Jun 6, 2023: cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA), kind/kep (Categorizes KEP tracking issues and PRs modifying the KEP directory), sig/node (Categorizes an issue or PR as relevant to SIG Node), size/M (Denotes a PR that changes 30-99 lines, ignoring generated files)
@pohly force-pushed the dra-pre-scheduled-pods branch from e5b95b1 to b18ef8f on June 6, 2023 at 15:54
@alculquicondor (Member) left a comment

overall lgtm, but I think this falls under sig-node more than sig-scheduling.

Review comments on keps/sig-node/3063-dynamic-resource-allocation/README.md (outdated, resolved)
@pohly (Contributor, Author) commented Jun 6, 2023

> sig-node more than sig-scheduling

Yes. Both kubelet and pkg/controller/resourceclaim are SIG Node. Do you want me to find a different reviewer also for kubernetes/kubernetes#118209? If I remember correctly, you are also moonlighting as reviewer for kcm, so I wasn't sure how much you might want to be involved in that.

@alculquicondor (Member) commented:

I'm not an approver in kcm, so you would still need another person. But the changes you proposed still belong to SIG Node: https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/resourceclaim/OWNERS

@@ -1972,6 +2022,10 @@
exist at all must not be allowed to run. Instead, a suitable event must be
emitted which explains the problem. Such a situation can occur as part of
downgrade scenarios.

In addition, kubelet informs about the claim status with a
`ResourceClaimsReady` `PodConditionType`. It's `false` when NodePrepareResource
still needs to be called. It's `true` when that has succeeded for all claims.
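
For illustration only, such a condition might surface in the Pod status roughly like this (a sketch; the reason and message strings are assumptions, not API, and the condition proposal was later dropped from this PR, see below):

```yaml
# Sketch only: possible shape of a ResourceClaimsReady condition in Pod status.
# The reason and message strings below are made up for illustration.
status:
  conditions:
  - type: ResourceClaimsReady
    status: "False"                        # NodePrepareResource not yet successful for all claims
    reason: WaitingForResourcePreparation  # hypothetical reason string
    message: NodePrepareResource has not succeeded for all claims yet
```
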
@pohly (Contributor, Author):

Any thoughts on whether such a condition is useful? I'm undecided myself.

@klueska, @smarterclayton ?

Reply from a Contributor:

My thinking: it's all-or-nothing. So, if something is interested in a particular claim, it's not helpful. If you want the side effect of a metric changing to see how long it's taking Pods to get to this point, it might be.

Maybe we can provide another instrumentation approach though. For example, one or more of:

  • increment a per-node counter when a Pod is accepted with pending claims, and increment a different counter when claims are resolved
  • put the .status in the ResourceClaim instead, with an entry for each associated Pod, and teach tooling to look there (sketched below)
  • fire an Event [per claim, per Pod] to note that a resource claim has succeeded and is ready
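
A rough sketch of the second option above, purely to illustrate the shape under discussion: `reservedFor` exists in the resource.k8s.io/v1alpha2 API, but the `readyForPods` field is invented here and does not exist anywhere.

```yaml
# Hypothetical: per-Pod readiness recorded in ResourceClaim status.
# reservedFor is a real v1alpha2 field; readyForPods is invented for this sketch.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: example-claim
status:
  reservedFor:
  - resource: pods
    name: my-pod
    uid: "…"                 # placeholder
  readyForPods:              # does not exist; shown only to discuss the idea
  - name: my-pod
    ready: true
```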

Reply from a Contributor:

(thinking ahead)
Could we ever have a situation where a Pod that is already running might make a resource claim?
For example, a Pod that is using one cryptographic key identifies that it needs another one (key rotation) and so requests a second one by directly creating a ResourceClaim via the API.

In that case, we wouldn't expect to update .status for the Pod; we might not even have the associated Pod object identity available, depending on how it has authenticated.
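
For context, such a directly created claim would look roughly like this (a sketch against the resource.k8s.io/v1alpha2 API of that time; all names are hypothetical):

```yaml
# Sketch: a ResourceClaim created directly via the API rather than via a pod.
# Names are placeholders; create with `kubectl create -f claim.yaml` or the
# equivalent client call.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: rotated-signing-key        # hypothetical
spec:
  resourceClassName: example-keys  # hypothetical ResourceClass
  allocationMode: Immediate        # allocate without waiting for a consuming pod
```
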

@pohly (Contributor, Author) commented Jun 7, 2023:

I believe the only real use case that we have right now is "notify the user why the pod is not starting". An event satisfies that requirement.

A condition or dedicated fields in the claim status seem more about "notify some component so that it can take some action" - we don't have use cases for that right now. My preference is to not add any conditions or fields until we know more about how they would be used.

> Could we ever have a situation where a Pod that is already running might make a resource claim?

I suspect that this would be very tricky to implement. Right now, the pod.spec.resourceClaims are fixed. We would have to allow modifying that list and then find a way to tell the container runtime to update the containers, ideally without having to restart them.

Mutating some existing claim is already possible, to some extent. For example, if the claim causes some directory to be mounted, the resource driver can change the content of what was mounted. However, the same caveat applies: modifying the container sandbox in response to claim changes is not supported.
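
For reference, a minimal sketch of the fields in question, using placeholder names and the API shape of that time: `pod.spec.resourceClaims` is set when the pod is created, and containers reference entries from that list, which is why adding a claim to a running pod would touch both.

```yaml
# Minimal sketch of the wiring between a pod and its claims (placeholder names).
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  resourceClaims:                      # fixed at pod creation
  - name: key
    source:
      resourceClaimName: example-claim # pre-existing ResourceClaim
  containers:
  - name: app
    image: registry.example.com/app    # placeholder
    resources:
      claims:
      - name: key                      # refers to spec.resourceClaims[].name
```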

@pohly (Contributor, Author):

I removed the condition part. That needs more thought about use cases.

Reply from a Member:

One of the debuggability challenges today is that users and developers don't know why a pod has been scheduled but is not running. The scenario of unallocated or unreserved claims makes the situation even more complicated. But I agree this needs more thought.

@swatisehgal (Contributor) commented:

/cc

@pohly (Contributor, Author) commented Jun 15, 2023

/assign @klueska

@Atharva-Shinde (Contributor) commented:

Would this need another PRR @johnbelamaric ?

@pohly (Contributor, Author) commented Jun 15, 2023

My two cents: this is a relatively minor change that doesn't impact any of the PRR aspects. The same API calls are made as before, they just come from kube-controller-manager instead of kube-scheduler. The actual content (= size of PodSchedulingContext) is smaller than normal.
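
To make the size comparison concrete, here is a sketch of the PodSchedulingContext that kube-controller-manager would create for an already-scheduled pod (assuming the v1alpha2 API and the KEP's convention that the object shares the pod's name); kube-scheduler would additionally populate a potentialNodes list, making its object larger.

```yaml
# Sketch: PodSchedulingContext as kube-controller-manager might create it for a
# pod that already has spec.nodeName set. No potentialNodes list is needed,
# which keeps the object smaller than the scheduler-created variant.
apiVersion: resource.k8s.io/v1alpha2
kind: PodSchedulingContext
metadata:
  name: my-pod           # same name/namespace as the pod (per the KEP's convention)
spec:
  selectedNode: node-1   # copied from pod.spec.nodeName
```
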

@klueska (Contributor) commented Aug 28, 2023

Should we also include the updates to the kubelet plugin API in this or open a separate PR for that?

@pohly (Contributor, Author) commented Aug 28, 2023

Given the focus of this PR, I prefer a separate PR for the kubelet changes.

@klueska (Contributor) commented Aug 28, 2023

@bart0sh or @byako Can one of you open a PR to update the KEP with the kubeletplugin API changes? I'd prefer to present the full set of updates all at once in the sig-node meeting (rather than present this PR now and another one later).

@byako (Contributor) commented Aug 28, 2023

@klueska , yep, on it. I somehow thought this was updated already.

@byako (Contributor) commented Aug 28, 2023

> @bart0sh or @byako Can one of you open a PR to update the KEP with the kubeletplugin API changes? I'd prefer to present the full set of updates all at once in the sig-node meeting (rather than present this PR now and another one later).

#4164

This is an error scenario, but dealing with it automatically when it arises is
still useful and not too hard, so it should be worthwhile.
@pohly force-pushed the dra-pre-scheduled-pods branch from 5b393bb to 96c54f5 on August 29, 2023 at 09:40
@dchen1107 (Member) commented:
/lgtm
/approve

@k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) on Sep 5, 2023
@k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files) on Sep 5, 2023
@k8s-ci-robot merged commit 384510b into kubernetes:master on Sep 5, 2023
@k8s-ci-robot added this to the v1.29 milestone on Sep 5, 2023
10 participants