KEP-2594 implementable for alpha #2946
Conversation
rahulkjoshi commented Sep 6, 2021 (edited)
- One-line PR description: PR Review for KEP 2594 and marking implementable for alpha
- Issue link: Enhancing NodeIPAM to support multiple ClusterCIDRs #2593
Hi @rahulkjoshi. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/assign @wojtek-t
Thanks! /lgtm
The branch was force-pushed from 174c029 to 5fd48a4 (compare).
@rahulkjoshi - the PRR for Alpha looks reasonable - I just added a couple of smaller comments. But I also added a couple of comments on the proposal itself. PTAL
- [ ] (R) Production readiness review approved
- [X] (R) Graduation criteria is in place
- [X] (R) Production readiness review completed
- [X] (R) Production readiness review approved
Sorry - I can't comment on unchanged lines, so commenting here:
L226: Will this be a built-in API or a CRD?
L241: I suggest using NodeSelector instead of LabelSelector:
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L2643
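For illustration, a rough sketch of what a spec using `v1.NodeSelector` could look like (the type and field names here are assumptions, not the KEP's proposed API):

```go
package v1alpha1

import (
	v1 "k8s.io/api/core/v1"
)

// ClusterCIDRConfigSpec is an illustrative sketch only; the field names
// are assumptions, not the final API shape.
type ClusterCIDRConfigSpec struct {
	// NodeSelector (core/v1) selects the nodes this config applies to.
	// Unlike metav1.LabelSelector, it also supports matchFields terms and
	// OR-ing of multiple NodeSelectorTerms.
	// +optional
	NodeSelector *v1.NodeSelector `json:"nodeSelector,omitempty"`

	// Example CIDR fields, included only to make the sketch self-contained.
	// +optional
	IPv4CIDR string `json:"ipv4CIDR,omitempty"`
	// +optional
	IPv6CIDR string `json:"ipv6CIDR,omitempty"`
}
```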
L299: I think that we need to mention how we handle cases where there aren't any free IPs anymore.
So basically the algorithm should be sth like:
When trying to allocate a pod-cidr for a node:
(1) discard all ClusterCIDRConfigs that have no free IPs left or that don't match the selector.
(2) From the remaining ones, prefer ....
<--- update ---> I see that being discussed below - it would be more intuitive to merge it into sth like I mentioned above.
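A minimal sketch of that two-step selection, using stand-in types (`clusterCIDRConfig`, `matchesNode`, and the `specificity` tie-break are illustrative, not the KEP's actual names or preference rule):

```go
package main

import (
	"fmt"
	"sort"
)

// clusterCIDRConfig is a minimal stand-in for illustration only.
type clusterCIDRConfig struct {
	name        string
	matchesNode bool // result of evaluating the node selector against the node
	freeCIDRs   int  // unallocated per-node CIDRs remaining in this range
	specificity int  // placeholder preference weight, e.g. selector specificity
}

// pickConfig applies the order described above: (1) discard configs that
// don't match the node or have no free IPs, (2) prefer among the rest.
func pickConfig(configs []clusterCIDRConfig) (clusterCIDRConfig, error) {
	var candidates []clusterCIDRConfig
	for _, c := range configs {
		if !c.matchesNode || c.freeCIDRs == 0 {
			continue // step (1): discard
		}
		candidates = append(candidates, c)
	}
	if len(candidates) == 0 {
		return clusterCIDRConfig{}, fmt.Errorf("no matching ClusterCIDRConfig with free IPs")
	}
	// step (2): placeholder preference - more specific configs first.
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i].specificity > candidates[j].specificity
	})
	return candidates[0], nil
}

func main() {
	cfgs := []clusterCIDRConfig{
		{name: "default", matchesNode: true, freeCIDRs: 0},
		{name: "rack-a", matchesNode: true, freeCIDRs: 4, specificity: 2},
	}
	if c, err := pickConfig(cfgs); err == nil {
		fmt.Println("allocating from", c.name) // "rack-a": "default" has no free IPs
	}
}
```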
L364: It might be a mental shortcut, but we should be clear - periodic polling doesn't scale well - the controller should be watching (and potentially polling from the local cache to check, though even that isn't necessarily needed with our informer machinery).
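For reference, the standard watch-based wiring looks roughly like this (shown with the existing Node informer, since a ClusterCIDRConfig informer doesn't exist yet; the handler body is illustrative):

```go
package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (illustrative bootstrap).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Shared informer: the controller reacts to watch events and reads from the
	// informer's local cache instead of periodically polling the API server.
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Minute)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			node := obj.(*v1.Node)
			fmt.Printf("node %s added - would be enqueued for pod CIDR allocation\n", node.Name)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	cache.WaitForCacheSync(stop, nodeInformer.HasSynced)
	<-stop
}
```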
L452: I think I'm not fully following - are you saying that: whether we apply IPv4, IPv6 or both is decided purely on their availability in the ClusterCIDRConfig object?
[If not, can you clarify? If so, can we also make it more explicit?]
L500: if the IPs from the previously existing created-from-flags-\<hash\> are in use, then the finalizer will block its deletion. Does that block initialization of the new IPAM controller? That sounds like a problem to me...
L515: This shouldn't be different from regular operation - on a failed operation we should simply re-queue the node (with backoff) to retry [we shouldn't need any additional logic]. (This pattern is used in a couple of other controllers.)
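That pattern, sketched with the client-go workqueue (the `syncNode` body here is a placeholder):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

// syncNode is a placeholder for the controller's real reconcile function.
func syncNode(nodeName string) error {
	return fmt.Errorf("no free pod CIDR for %s yet", nodeName)
}

// processNextItem shows the standard requeue-with-backoff loop: on failure
// the key is re-added through the rate limiter; on success the backoff
// history for that key is cleared. No extra retry logic is needed.
func processNextItem(queue workqueue.RateLimitingInterface) bool {
	key, quit := queue.Get()
	if quit {
		return false
	}
	defer queue.Done(key)

	if err := syncNode(key.(string)); err != nil {
		queue.AddRateLimited(key) // retry later with exponential backoff
		return true
	}
	queue.Forget(key) // success: reset backoff for this key
	return true
}

func main() {
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	queue.Add("node-1")
	processNextItem(queue)
}
```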
L732: I would say that we should have a feature gate too. So basically:
- a feature gate that decides whether the new controller can even be started
- and on top of that a flag that decides which controller to use
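A rough sketch of that two-level gating (the gate name, flag value, and allocator constructors are assumptions for illustration, not what the KEP will necessarily ship):

```go
package main

import (
	utilfeature "k8s.io/apiserver/pkg/util/feature"
	"k8s.io/component-base/featuregate"
)

// MultiCIDRRangeAllocator is a hypothetical gate name used for illustration.
const MultiCIDRRangeAllocator featuregate.Feature = "MultiCIDRRangeAllocator"

func init() {
	// Register the hypothetical gate so the Enabled() lookup below works.
	if err := utilfeature.DefaultMutableFeatureGate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		MultiCIDRRangeAllocator: {Default: false, PreRelease: featuregate.Alpha},
	}); err != nil {
		panic(err)
	}
}

func startLegacyRangeAllocator() error    { return nil } // placeholder
func startMultiCIDRRangeAllocator() error { return nil } // placeholder

// startNodeIPAMController: the feature gate decides whether the new
// controller may start at all, and on top of that a flag selects which
// allocator is actually used.
func startNodeIPAMController(cidrAllocatorType string) error {
	if !utilfeature.DefaultFeatureGate.Enabled(MultiCIDRRangeAllocator) {
		return startLegacyRangeAllocator()
	}
	if cidrAllocatorType == "MultiCIDRRangeAllocator" {
		return startMultiCIDRRangeAllocator()
	}
	return startLegacyRangeAllocator()
}

func main() { _ = startNodeIPAMController("RangeAllocator") }
```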
L758: Not really restart nodes - but rather recreate them.
L773: nit: "No - the (...) tests will be added before graduating the feature to Alpha."
[The tests don't yet exist, right?]
I've made changes to reflect the comments above. I hope I clarified everything.
For L500: The bit that's probably getting lost is that the controller chooses a random hash each time it starts up. So if it needs to create a new object and delete an old one, it need not block. However, users will still have to recreate their Nodes -- there's no way around that.
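A small sketch of that startup behavior, with hypothetical names (the `created-from-flags-` handling below is assumed for illustration, not quoted from the KEP):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"strings"
)

// newFlagConfigName picks a fresh random suffix on every controller start.
func newFlagConfigName() string {
	b := make([]byte, 4)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return "created-from-flags-" + hex.EncodeToString(b)
}

// reconcileFlagConfigs sketches the flow: create the new object, then issue
// deletes for older created-from-flags-* objects. Their finalizers may hold
// the actual removal until the IPs are released, but the new controller does
// not block on that.
func reconcileFlagConfigs(existing []string) (create string, deleteOld []string) {
	create = newFlagConfigName()
	for _, name := range existing {
		if strings.HasPrefix(name, "created-from-flags-") && name != create {
			deleteOld = append(deleteOld, name)
		}
	}
	return create, deleteOld
}

func main() {
	create, old := reconcileFlagConfigs([]string{"created-from-flags-a1b2c3d4"})
	fmt.Println("create:", create, "delete later (finalizer may delay):", old)
}
```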
Thank you - that's perfect!
/ok-to-test
The branch was force-pushed from 5fd48a4 to e159865 (compare).
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: rahulkjoshi, thockin, wojtek-t. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.