In one of our production Kubernetes clusters, two worker nodes competed for the same subnet, which caused connectivity issues on one of them.
What happened:
Host A joined the cluster, obtained a subnet, and saved the config in /run/flannel/subnet.env;
Host A went offline (no lease renewals happened after that);
After the TTL expired, this subnet lease was removed from etcd;
Host B joined the cluster and obtained the same subnet that had been assigned to Host A;
Host A came online again, reused its previous subnet, and renewed its lease;
Now two different hosts compete for the same subnet: whichever host renews the lease overwrites the subnet's public IP in etcd, which causes network issues on the other host.
Note: Host A and Host B have different public IPs.
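The sequence above can be modelled with a small in-memory sketch. This is not flannel code: the registry type, the renew method, the subnet, and both IP addresses are hypothetical stand-ins chosen for illustration. It only assumes that a renewal writes the caller's public IP under the subnet key without checking who currently owns it.

```go
package main

import "fmt"

// registry is a hypothetical stand-in for the lease data in etcd:
// the key is the subnet, the value is the owning host's public IP.
type registry map[string]string

// renew mimics an unconditional lease renewal: it stores the caller's
// public IP under the subnet key without checking the current owner.
func (r registry) renew(subnet, publicIP string) {
	r[subnet] = publicIP
}

// simulate replays the timeline from the report and returns the public IP
// recorded for the subnet after each write.
func simulate() []string {
	const (
		subnet = "10.244.1.0/24" // illustrative subnet
		hostA  = "192.0.2.10"    // illustrative public IP of Host A
		hostB  = "192.0.2.20"    // illustrative public IP of Host B
	)
	r := registry{}
	var owners []string

	r.renew(subnet, hostA) // Host A joins and acquires the subnet
	owners = append(owners, r[subnet])

	delete(r, subnet) // Host A is offline; the lease expires after the TTL

	r.renew(subnet, hostB) // Host B joins and is given the same subnet
	owners = append(owners, r[subnet])

	r.renew(subnet, hostA) // Host A returns and renews its previous subnet
	owners = append(owners, r[subnet])

	r.renew(subnet, hostB) // Host B's next renewal takes the entry back
	owners = append(owners, r[subnet])

	return owners
}

func main() {
	for i, owner := range simulate() {
		fmt.Printf("write %d: subnet owned by %s\n", i+1, owner)
	}
}
```

After Host B joins, every renewal flips the recorded public IP, so whichever host renewed last is the only one whose traffic is routed correctly.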
Expected Behavior
If a host goes offline and then comes online again, it should not reuse its previous subnet if that subnet has already been assigned to another host.
Current Behavior
If a host goes offline and then comes online again, it reuses its previous subnet and renews the lease, regardless of whether that subnet is in use by another host.
Possible Solution
In tryAcquireLease (flannel/subnet/etcdv2/local_manager.go), validate the subnet's public IP before reusing the subnet.
    // no existing match, check if there was a previous subnet to use
    var sn ip.IP4Net
    if !m.previousSubnet.Empty() {
        // use previous subnet
        if l := findLeaseBySubnet(leases, m.previousSubnet); l != nil {
            // Make sure the existing subnet is still within the configured network
            if isSubnetConfigCompat(config, l.Subnet) {
                log.Infof("Found lease (%v) matching previously leased subnet, reusing", l.Subnet)
Steps to Reproduce (for bugs)
Add a worker (Host A) to the Kubernetes cluster and confirm that the subnet info is saved in /run/flannel/subnet.env;
Stop the flanneld service on Host A;
After the TTL (24 hours) expires, confirm that the subnet lease has been removed from etcd;
Add another host (Host B) to the cluster and verify that the same subnet is assigned to it (this does not always happen and may take several attempts);
Restart the flanneld service on Host A;
Host A now seizes the subnet, and both Host A and Host B are using the same subnet.
Context
When two hosts use the same subnet, only one of the nodes works. The other node loses connectivity to the rest of the cluster, which led to a business outage.
Your Environment
Flannel version: 0.9.1
Backend used (e.g. vxlan or udp): vxlan
Etcd version: 3.1.10
Kubernetes version (if used): 1.11.5
Operating System and version: CentOS 7.4
Link to your project (optional):
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.