
Fix overwriting etcd data when local subnet file exists #1505

Merged
2 commits merged into flannel-io:master on Mar 8, 2022

Conversation

zhangzhangzf
Contributor

Description

Fixes #1289
Remove the code that loads the local subnet file and then updates etcd. That behavior overwrites records for the same key in etcd.
The problem is described in issue #1289: in that case the flannel processes of two nodes end up running on the same subnet, which makes the node network unavailable.
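For context, here is a minimal sketch of the pattern this PR removes. It is not flannel's actual code; the names (readPrevSubnet, registry, PutLease, reclaimPrevSubnet) are hypothetical stand-ins. The point is that the previous subnet read from subnet.env is written back to etcd without checking whether another node now owns that subnet:

```go
// Hypothetical sketch (not flannel's code) of the removed behaviour:
// trust a locally persisted subnet.env and write its subnet back to etcd
// without checking who currently owns that subnet.
package sketch

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// readPrevSubnet parses FLANNEL_SUBNET from a subnet.env-style file.
func readPrevSubnet(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		line := s.Text()
		if strings.HasPrefix(line, "FLANNEL_SUBNET=") {
			return strings.TrimPrefix(line, "FLANNEL_SUBNET="), nil
		}
	}
	return "", fmt.Errorf("FLANNEL_SUBNET not found in %s", path)
}

// registry is a stand-in for the etcd-backed subnet registry.
type registry interface {
	// PutLease writes subnet -> ownerIP, overwriting any existing owner.
	PutLease(subnet, ownerIP string) error
}

// reclaimPrevSubnet is the risky pattern this PR removes: it never checks
// whether another node has since acquired the lease for this subnet, so a
// stale subnet.env can overwrite a live node's record for the same key.
func reclaimPrevSubnet(r registry, path, myIP string) error {
	subnet, err := readPrevSubnet(path)
	if err != nil {
		return err
	}
	return r.PutLease(subnet, myIP) // blindly overwrites the current owner
}
```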

Release Note

None required

@zhangzhangzf changed the title from "Fix overwriting etcd database data when local subnet file exists" to "Fix overwriting etcd data when local subnet file exists" on Nov 19, 2021
@manuelbuil
Collaborator

#1289

@manuelbuil
Collaborator

If host_A goes offline and, when it comes back online, gets a different subnet, what happens to the pods running on that host? Will they still be using the old subnet?

@zhangzhangzf
Contributor Author

I persist the subnet information to the /opt/software/flannel/subnet.env file. I expected the subnet to be reassigned when the node restarts, but flannel loads the local file instead. When the updateSubnet method is called, etcd is not locked, so the data under the same key gets overwritten. You can reproduce it quickly this way.
Let's simulate the scenario where nodeA's lease expires and nodeB joins the cluster:
(1) Stop the flannel daemons on nodeA and nodeB and delete their records from etcd.
(2) Start flannel on nodeA.
(3) Delete nodeA's data in etcd (you will see the process exit because of the watch mechanism).
(4) Configure nodeB's subnet information to match nodeA's.
(5) Start flannel on nodeB.
(6) Start flannel on nodeA.
You will see two flannel instances working in the same network segment (one way to check the etcd state is sketched below).
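As a rough way to verify the final state, here is a small Go sketch that lists the subnet leases with the etcd v3 client. The endpoint and the /coreos.com/network prefix are assumptions (the prefix is configurable, and flannel versions that still register leases through the etcd v2 API store keys a v3 client will not see), so treat this only as a starting point:

```go
// Rough verification sketch: list flannel's subnet leases from etcd using the
// v3 client. Endpoint and key prefix are assumptions; adjust for your cluster.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"}, // assumed local etcd
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Assumed default prefix; flannel stores one key per leased subnet here.
	resp, err := cli.Get(ctx, "/coreos.com/network/subnets/", clientv3.WithPrefix())
	if err != nil {
		log.Fatal(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s -> %s\n", kv.Key, kv.Value)
	}
	// In the buggy scenario, after step (6) there is a single lease key for the
	// shared subnet even though two flannel daemons are actively using it.
}
```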

@manuelbuil
Collaborator

@sjoerdsimons could you review this PR? I am not familiar with etcd as a subnet manager.

@sjoerdsimons
Contributor

This looks correct to me; to be honest I can't really work out what the previous logic was trying to do. I don't see why it would ever update a subnet that's no longer correlated with the node's IP (which should imply someone else is using it). So dropping that whole chunk seems fine.

@manuelbuil
Collaborator

This looks correct to me; to be honest I can't really work out what the previous logic was trying to do. I don't see why it would ever update a subnet that's no longer correlated with the node's IP (which should imply someone else is using it). So dropping that whole chunk seems fine.

Thanks a lot!

@luthermonson
Contributor

@sjoerdsimons @zhangzhangzf can we wrap any testing around this? Our concern is that it has been this way for so long and we're ripping out code which people could be relying on. If this is backwards compatible with no issues, we can merge.

@sjoerdsimons
Contributor

@luthermonson That's one for @zhangzhangzf; I was just asked for (and gave) my opinion on it ;)

It is a bit of a behavioural change for sure; how it will impact users I don't really know. I hope someone from the flannel maintainers can answer that.

After mulling it over for a few days, I suspect what this code was meant to do is allow nodes that change their IP address to recover their previous subnet; however, how it does this is fundamentally racy (as it's only the IPv4 address that "identifies" the owner).

Basically, what the now-removed code handles correctly is:

  • Node A gets a fresh lease
  • Node A shuts down flannel and gets a different IP address (e.g. it rebooted?)
  • Node A rejoins the flannel mesh and the lease has not expired yet
    • In this case it will find its previous lease for the subnet and take ownership again, which means everything is happy

Where it goes wrong is:

  • Node A gets a fresh lease (LA)
  • Node A shuts down flannel (e.g. it is temporarily shut down)
  • LA expires (which can happen after "just" an hour if the default renew margin is used)
  • Node B starts and gets a fresh lease (LB) with the same subnet as the now-gone lease LA
  • Node A starts again and tries to recover its previous subnet
    • In this case it will find LB, which is owned by node B, and hijack it. At that point there are two active nodes with the same subnet, which of course goes badly wrong

The latter case is not that likely to happen in practice, as there is some randomisation involved in lease allocation (it picks a random subnet out of the first 100 free subnets), at least assuming people don't do silly things like cloning machines including a persistently stored subnet.env file. But if it happens it's obviously quite bad, which is what this patch fixes.

The code that's left after this patch, with respect to previous-subnet handling, will only opportunistically re-use the subnet from subnet.env if there is no current lease tied to the node's IP address and there is no current lease for the previous subnet, which is race-free.
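To make that concrete, here is a hypothetical sketch of that race-free decision. It is not flannel's actual implementation; Lease, chooseSubnet, and allocate are illustrative names. The previous subnet is reused only when neither the node's IP nor the subnet itself appears in an existing lease:

```go
package sketch

// Lease is a simplified view of one record in the etcd subnet registry.
type Lease struct {
	Subnet  string
	OwnerIP string
}

// chooseSubnet decides which subnet a (re)joining node should use.
// leases is the current set read from etcd; prevSubnet comes from the locally
// persisted subnet.env and may be empty or stale; allocate picks a fresh one.
func chooseSubnet(leases []Lease, myIP, prevSubnet string, allocate func() string) string {
	// 1) If a lease is already tied to this node's IP, take ownership of it again.
	for _, l := range leases {
		if l.OwnerIP == myIP {
			return l.Subnet
		}
	}
	// 2) Otherwise, reuse the previous subnet only if nobody else holds it now.
	if prevSubnet != "" {
		taken := false
		for _, l := range leases {
			if l.Subnet == prevSubnet {
				taken = true // another node owns our old subnet: do not hijack it
				break
			}
		}
		if !taken {
			return prevSubnet
		}
	}
	// 3) Fall back to allocating a fresh subnet.
	return allocate()
}
```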

So I guess the question is whether deployments rely on flannel keeping/recovering the same subnet after changing the host IP address, which is not something I can really answer.

@zhangzhangzf
Contributor Author

@sjoerdsimons You're right, this patch fixes the latter problem. But I don't agree that this problem happens infrequently. The 100 subnets in the pool are taken from smallest to largest and one is picked randomly, so repeats are easy to hit in large-scale clusters, especially when the cluster is not stable enough. Repetition leads to bad problems; we have hit this many times in our deployment environment.
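As a back-of-the-envelope illustration of why repeats are plausible (under strong simplifying assumptions: each joining node picks uniformly from the first 100 free subnets, and the expired subnet stays inside that window until someone takes it), the chance that at least one of k nodes lands on a specific expired subnet is roughly 1 - 0.99^k:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Chance that at least one of k rejoining nodes picks a specific expired
	// subnet, assuming each pick is uniform over the first 100 free subnets.
	for _, k := range []int{1, 10, 50, 100} {
		p := 1 - math.Pow(0.99, float64(k))
		fmt.Printf("k=%3d nodes -> collision chance ~ %.0f%%\n", k, 100*p)
	}
}
```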

@zhangzhangzf
Contributor Author

@sjoerdsimons @zhangzhangzf can we wrap any testing around this? Our concern is that it has been this way for so long and we're ripping out code which people could be relying on. If this is backwards compatible with no issues, we can merge.

@luthermonson I didn't find any compatibility problem. The maintainers can discuss it as a next step. I have deployed the fixed code in a large-scale cluster, and it solves the latter problem described by sjoerdsimons.

@AleksandrNull
Contributor

Long answer:
I'm trying to wrap my head around the problem. One side of it: a node keeps information about state that gets propagated back to the cluster even if the node was physically deleted from etcd. That's bad in general. Once a node was deleted/re-created, it shouldn't be able to force its subnet/segment to be its own again, or update any data in etcd based on leftovers. From a security perspective that's not good behavior; from a user perspective it may currently be expected, so that needs to be considered.
Regarding the problems in large clusters: I believe it's an exaggeration, and let me explain why. I can understand nodes going down and coming back, but k8s is built in a way that you don't have to remove them on each and every event, unless you build a 'reliable' cluster on spot instances. So I basically see one issue here: if a node was deleted from the cluster, it actually needs to be wiped. The same is true for the certs/envs/etc. of k8s in general.

Short answer:
I believe losing node data regarding subnets/IPs should not impact users in general, because the idea is about scalability and cattle. So no one should statically refer to pod IPs. It will change behavior, but for the better :) So I'm upvoting the change and have only one suggestion: maybe it's worth merging the two IFs :)
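Purely as an illustration of the "merge the two IFs" suggestion (the actual conditions in the PR may differ; these helpers are hypothetical), two nested checks on the previous subnet can be collapsed into one guard:

```go
// Illustrative only: collapsing two nested checks into a single condition.
package sketch

func subnetInUse(leases map[string]string, subnet string) bool {
	_, ok := leases[subnet]
	return ok
}

// before: nested ifs
func reuseNested(leases map[string]string, prevSubnet string) (string, bool) {
	if prevSubnet != "" {
		if !subnetInUse(leases, prevSubnet) {
			return prevSubnet, true
		}
	}
	return "", false
}

// after: one merged condition, same behaviour
func reuseMerged(leases map[string]string, prevSubnet string) (string, bool) {
	if prevSubnet != "" && !subnetInUse(leases, prevSubnet) {
		return prevSubnet, true
	}
	return "", false
}
```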

@manuelbuil
Collaborator

Long answer: I'm trying to wrap my head around the problem. One side of it: a node keeps information about state that gets propagated back to the cluster even if the node was physically deleted from etcd. That's bad in general. Once a node was deleted/re-created, it shouldn't be able to force its subnet/segment to be its own again, or update any data in etcd based on leftovers. From a security perspective that's not good behavior; from a user perspective it may currently be expected, so that needs to be considered. Regarding the problems in large clusters: I believe it's an exaggeration, and let me explain why. I can understand nodes going down and coming back, but k8s is built in a way that you don't have to remove them on each and every event, unless you build a 'reliable' cluster on spot instances. So I basically see one issue here: if a node was deleted from the cluster, it actually needs to be wiped. The same is true for the certs/envs/etc. of k8s in general.

Short answer: I believe losing node data regarding subnets/IPs should not impact users in general, because the idea is about scalability and cattle. So no one should statically refer to pod IPs. It will change behavior, but for the better :) So I'm upvoting the change and have only one suggestion: maybe it's worth merging the two IFs :)

Hey @AleksandrNull, thanks for the comment. If you upvote the PR, would you mind approving it? Then we could merge

@zhangzhangzf could you please rebase? I added a linter

@AleksandrNull
Contributor

@AleksandrNull left a comment

LGTM.

@manuelbuil
Collaborator

manuelbuil commented Feb 25, 2022

@zhangzhangzf could you please rebase? Then we will merge

@manuelbuil
Collaborator

@zhangzhangzf we will release the new version of flannel without this PR, but we can add it in a subsequent patch release once you rebase. Thanks

@zhangzhangzf
Contributor Author

zhangzhangzf commented Mar 6, 2022

workflows awaiting approval

Sorry, I'm fairly new to GitHub, so I'm not sure whether the steps I just took solved the problem.

@manuelbuil
Collaborator

workflows awaiting approval

Sorry, I'm fairly new to GitHub, so I'm not sure whether the steps I just took solved the problem.

No worries! I think they did :)

@manuelbuil merged commit 7d379f0 into flannel-io:master on Mar 8, 2022