Support Multiple Cluster CIDRs #2594
Conversation
Welcome @rahulkjoshi!
Hi @rahulkjoshi. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 94f8c10 to 36849ed
Force-pushed from 36849ed to 0a8dbab
/cc
/cc
/cc @sdmodi
@rahulkjoshi: GitHub didn't allow me to request PR reviews from the following users: sdmodi. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Today, IP ranges for podCIDRs for nodes are allocated from a single range assigned to the cluster (the cluster CIDR). Each node gets a range of a fixed size from the overall cluster CIDR. The size is specified at cluster startup time and cannot be modified later.

This proposal enhances how pod CIDRs are allocated for nodes by adding a new CIDR allocator that can be controlled by a new resource `ClusterCIDRRange`. This would enable users to dynamically allocate more IP ranges for pods.
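For context, today's allocator is configured entirely through kube-controller-manager flags. The flag names below are the real ones; the manifest layout and values are illustrative only:

```yaml
# Excerpt of a kube-controller-manager invocation (values illustrative).
# One CIDR covers the whole cluster and every node gets a fixed-size slice.
apiVersion: v1
kind: Pod
metadata:
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - name: kube-controller-manager
    image: k8s.gcr.io/kube-controller-manager:v1.22.0
    command:
    - kube-controller-manager
    - --allocate-node-cidrs=true   # enable the built-in Node IPAM controller
    - --cluster-cidr=10.0.0.0/11   # single range all node pod CIDRs come from
    - --node-cidr-mask-size=24     # every node gets a /24; fixed at startup
```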
Tying into what was discussed in the SIG meeting, it would be worth considering the possibility that instead of adding a new config object for extending the apiserver's `--allocate-node-cidrs` functionality, we should deprecate `--allocate-node-cidrs` and add a new config object so that the network plugin can explain to other components how it is allocating pod IPs itself.
Right now, the way this is scoped, we do not require existing Node IPAM allocators to populate what ranges have been allocated. If we did, it would force existing implementations to create a new object whenever a new range is allocated.
I view this as a good second step if there is consensus that it is a good idea. We can start by requiring the object only for input purposes and then enhance it to be consumable by other parts of the system.
oh, I just meant having the network plugin create an object/objects saying "The following CIDRs are in use by the pod network: ...". Not necessarily saying which specific subnets of it have been allocated where.
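As a rough sketch of that idea (the group, kind, and field names below are hypothetical, not anything from the KEP), such a plugin-published object might look like:

```yaml
# Hypothetical object a network plugin could publish to advertise the CIDRs
# it uses for pod IPs; the group, kind, and field names are illustrative only.
apiVersion: networking.example.com/v1alpha1
kind: PodNetworkInfo
metadata:
  name: default
spec:
  podCIDRs:
  - 10.0.0.0/11
  - fd00:10::/56
```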
This proposal enhances how pod CIDRs are allocated for nodes by adding a new CIDR allocator that can be controlled by a new resource `CIDRRange`. This enables users to dynamically allocate more IP ranges for pods. In addition, it gives users the capability to control which ranges are allocated to specific nodes, as well as the size of the pod CIDR allocated to those nodes.
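To make that concrete, here is a minimal sketch of what such a resource could look like. The field names are my own illustration of the described capabilities (extra range, per-node size, node targeting), not the API shape defined in the KEP:

```yaml
# Hypothetical CIDRRange object; field names are illustrative, not from the KEP.
apiVersion: networking.example.com/v1alpha1
kind: CIDRRange
metadata:
  name: extra-range-1
spec:
  cidr: 10.64.0.0/16        # additional range to carve node pod CIDRs from
  perNodeMaskSize: 26       # size of the pod CIDR handed to each matching node
  nodeSelector:             # limit which nodes may draw from this range
    matchLabels:
      topology.kubernetes.io/zone: us-east1-b
```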
### User Stories (Optional)
I think a good user story is "I want to add more pods to a node after filling up its CIDR, but I don't want to have to kill and recreate the node's existing pods". This is something Calico (IIRC) supports, by letting you add additional CIDRs to a node rather than making you drop its existing CIDR before assigning it a new one.
(And yes, it would be complicated to add this to kcm, and it would require changing the semantics of the `podCIDRs` field on the node, etc. My point was more that this is something people are doing already, without having changed kcm, and maybe that's a better model.)
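For reference, these are the existing Node fields being discussed: `spec.podCIDR` holds the primary range and `spec.podCIDRs` holds at most one CIDR per IP family, and both are immutable once set (example values below):

```yaml
# Existing Node API fields (example values); today a node carries at most one
# pod CIDR per IP family, and these fields cannot be changed once set.
apiVersion: v1
kind: Node
metadata:
  name: node-1
spec:
  podCIDR: 10.0.1.0/24
  podCIDRs:
  - 10.0.1.0/24
  - fd00:10:0:1::/64
```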
But this is NOT the story being told here. Unless I gravely misunderstand, this proposes multiple CIDRs for the cluster, but still one per node. (we probably want that TOO but it's not this KEP)
Yes. We are still saying only one per node.
We do need to decide whether it is a better model to let CNI implementations do this. The fundamental question is whether we want Kubernetes to do the IPAM or not. Today Kubernetes does IPAM out of the box, but it is lacking a lot of functionality. We are proposing to enhance these capabilities.
That is the extreme other end of the spectrum: "We never should have done this, we're sorry to have teased you, please do it yourself."
We'd still need a resource (or two) and admission controller(s). Fundamentally the model would be the same, just not built-in.
I'm not outright against adding this as another built-in, as long as we can do it with reasonable complexity. It definitely pushes the bounds, and we need to think hard about it.
Force-pushed from 0a8dbab to 5a26d81
The controller will wait until the first Node is added before picking a mode. So users who want dual-stack must first add two `ClusterCIDRConfigs` (one each for IPv4 and IPv6). Only after creating those resources should Nodes be added.
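A minimal sketch of that ordering, assuming a `ClusterCIDRConfig` kind with illustrative group and field names: both objects are created before any Node joins, so the controller picks dual-stack mode.

```yaml
# Hypothetical pair of ClusterCIDRConfig objects created before the first Node
# is added; group and field names are illustrative only.
apiVersion: networking.example.com/v1alpha1
kind: ClusterCIDRConfig
metadata:
  name: cluster-cidr-v4
spec:
  cidr: 10.0.0.0/11
  perNodeMaskSize: 24
---
apiVersion: networking.example.com/v1alpha1
kind: ClusterCIDRConfig
metadata:
  name: cluster-cidr-v6
spec:
  cidr: fd00:10::/56
  perNodeMaskSize: 120
```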
Just to confirm: migrating a node from single-stack to dual-stack is not possible; if you have single-stack nodes, you have to create new nodes, right?
That's correct. What I'm unsure of is whether we want to support migrating the cluster from single-stack to dual-stack while nodes are running.
For example, the model proposed here says that if any Node is in single-stack mode, the controller will only allocate single-stack CIDRs to new nodes, even if both IPv4 and IPv6 ClusterCIDRConfigs are provided. The only way to migrate your cluster is to add both IPv4 and IPv6 ClusterCIDRConfigs and then delete all nodes. Once you start adding nodes again, the controller will allocate them in dual-stack mode. Is this an acceptable workflow for us? I have seen concern about mixing single-stack and dual-stack nodes, hence this setup.
I would go for implementing something simple and let users do the heavy lifting with the APIs we provide; they can delete nodes, reassign, and deal with the pod migration themselves.
Agreed, I've reworked the section to that effect. Let me know if it sounds good.
The initial release of this feature will be as a CRD and custom controller. However, as part of the alpha -> beta graduation, this feature will be merged into the core Kubernetes API.
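For the alpha phase that would mean registering a CRD along these lines; the API group and the deliberately loose schema below are assumptions, not the KEP's actual definition:

```yaml
# Minimal sketch of a CRD registration for the alpha phase; the group name and
# schema are assumptions rather than the KEP's actual definition.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: clustercidrconfigs.networking.example.com
spec:
  group: networking.example.com
  scope: Cluster
  names:
    kind: ClusterCIDRConfig
    plural: clustercidrconfigs
    singular: clustercidrconfig
  versions:
  - name: v1alpha1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        x-kubernetes-preserve-unknown-fields: true
```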
The Gateway API folks did this dance recently (kubernetes-sigs/gateway-api#707); we should ask them about the experience.
cc: @robscott
we can omit this comment if we don't start with CRDs/custom controller #2594 (comment)
Sounds like we can dodge this work 😄
Force-pushed from a12185c to 8bd09d0
Force-pushed from 8bd09d0 to b11c70e
Force-pushed from 79fade9 to 7d0fc9c
LGTM
Force-pushed from 7d0fc9c to 4cd8027
Force-pushed from 4cd8027 to c53e0a6
Thanks!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: rahulkjoshi, sdmodi, thockin. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Issue: #2593