Adoption fails? #26

Open
issmirnov opened this issue Jan 10, 2023 · 11 comments

@issmirnov

Hey @fabianishere ,

I installed the kernel and was able to get BGP multipath routing working - yay!

However, overnight my devices all rebooted after an upgrade and entered a repeated failed adoption loop. I tried the usual set-inform and reboots, but nothing helped.

After reverting to the stock kernel and rebooting, adoption worked on the whole network again.

Any thoughts? Ideally I need both multipath and my fleet to stay up 😄

@fabianishere
Owner

Can you SSH into the UDM (Pro)? It would be useful to share the kernel log (dmesg) and UniFi log (cat /var/log/messages).

@issmirnov
Author

Will do once I get home. This morning I saw a ton of error messages complaining about multipath, and DHCP error logs.

@issmirnov
Author

This is with the edge kernel:

/var/log/messages

...
Jan 10 16:03:05 UDM-Pro user.warn ubios-udapi-server: netlink: Multipath routes not supported, got 3 nexthops for route 10.5.0.10/32 via 10.3.34.201 dev br0
Jan 10 16:03:05 UDM-Pro user.warn kernel: [   89.072888] [LAN_IN-D-2006] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:7c:d9:5c:2a:04:05:08:00 SRC=192.168.6.39 DST=10.3.33.1 LEN=396 TOS=0x00 PREC=0x00 TTL=63 ID=23536 PROTO=TCP SPT=8009 DPT=44648 WINDOW=4096 RES=0x00 ACK PSH URGP=0 
Jan 10 16:03:05 UDM-Pro user.warn kernel: [   89.141830] [LAN_IN-D-2006] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:7c:d9:5c:2a:04:05:08:00 SRC=192.168.6.39 DST=10.3.33.228 LEN=396 TOS=0x00 PREC=0x00 TTL=63 ID=18633 PROTO=TCP SPT=8009 DPT=54125 WINDOW=4096 RES=0x00 ACK PSH URGP=0 
Jan 10 16:03:05 UDM-Pro user.warn kernel: [   89.283025] [LAN_IN-D-2006] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:1c:f2:9a:0f:27:e9:08:00 SRC=192.168.6.122 DST=10.3.33.1 LEN=397 TOS=0x00 PREC=0x00 TTL=63 ID=6893 PROTO=TCP SPT=8009 DPT=58554 WINDOW=4096 RES=0x00 ACK PSH URGP=0 
Jan 10 16:03:06 UDM-Pro user.warn ubios-udapi-server: netlink: Multipath routes not supported, got 2 nexthops for route 10.50.0.15/32 via 192.168.4.22 dev br5
Jan 10 16:03:06 UDM-Pro user.warn ubios-udapi-server: netlink: Multipath routes not supported, got 2 nexthops for route 10.50.0.11/32 via 192.168.4.22 dev br5
Jan 10 16:03:06 UDM-Pro user.warn ubios-udapi-server: netlink: Multipath routes not supported, got 2 nexthops for route 10.50.0.14/32 via 192.168.4.21 dev br5
Jan 10 16:03:06 UDM-Pro user.warn ubios-udapi-server: netlink: Multipath routes not supported, got 3 nexthops for route 10.50.0.12/32 via 192.168.4.21 dev br5
Jan 10 16:03:06 UDM-Pro user.warn ubios-udapi-server: netlink: Multipath routes not supported, got 2 nexthops for route 10.5.0.13/32 via 10.3.34.202 dev br0
Jan 10 16:03:06 UDM-Pro user.warn ubios-udapi-server: netlink: Multipath routes not supported, got 2 nexthops for route 10.5.0.12/32 via 10.3.34.202 dev br0
Jan 10 16:03:06 UDM-Pro user.warn ubios-udapi-server: netlink: Multipath routes not supported, got 2 nexthops for route 10.5.0.14/32 via 10.3.34.202 dev br0
Jan 10 16:03:06 UDM-Pro user.warn ubios-udapi-server: netlink: Multipath routes not supported, got 3 nexthops for route 10.5.0.10/32 via 10.3.34.201 dev br0
Jan 10 16:03:06 UDM-Pro daemon.warn dnsmasq-dhcp[3156]: no address range available for DHCP request via br0

dmesg:

[  101.938051] [LAN_IN-RET-2005] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:48:d6:d5:7b:57:84:08:00 SRC=192.168.6.55 DST=10.3.37.71 LEN=1343 TOS=0x00 PREC=0x00 TTL=63 ID=11838 DF PROTO=TCP SPT=32244 DPT=55232 WINDOW=538 RES=0x00 ACK PSH URGP=0 
[  102.110401] [LAN_IN-D-2006] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:1c:f2:9a:0f:27:e9:08:00 SRC=192.168.6.122 DST=10.3.33.228 LEN=397 TOS=0x00 PREC=0x00 TTL=63 ID=46966 PROTO=TCP SPT=8009 DPT=54126 WINDOW=4096 RES=0x00 ACK PSH URGP=0 
[  102.314922] [LAN_IN-RET-2005] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:48:d6:d5:7b:57:84:08:00 SRC=192.168.6.55 DST=10.3.37.71 LEN=422 TOS=0x00 PREC=0x00 TTL=63 ID=11839 DF PROTO=TCP SPT=32244 DPT=55232 WINDOW=538 RES=0x00 ACK PSH URGP=0 
[  106.823440] [LAN_IN-RET-2005] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:7c:d9:5c:2a:04:05:08:00 SRC=192.168.6.39 DST=10.3.37.71 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=65085 PROTO=TCP SPT=8009 DPT=33110 WINDOW=4095 RES=0x00 ACK URGP=0 
[  106.825171] [LAN_IN-RET-2005] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:7c:d9:5c:2a:04:05:08:00 SRC=192.168.6.39 DST=10.3.37.71 LEN=162 TOS=0x00 PREC=0x00 TTL=63 ID=65086 PROTO=TCP SPT=8009 DPT=33110 WINDOW=4096 RES=0x00 ACK PSH URGP=0 
[  111.855204] [LAN_IN-RET-2005] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:1c:f2:9a:0f:27:e9:08:00 SRC=192.168.6.122 DST=10.3.37.71 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=42073 PROTO=TCP SPT=8009 DPT=55726 WINDOW=4095 RES=0x00 ACK URGP=0 
[  111.856091] [LAN_IN-RET-2005] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:1c:f2:9a:0f:27:e9:08:00 SRC=192.168.6.122 DST=10.3.37.71 LEN=162 TOS=0x00 PREC=0x00 TTL=63 ID=42074 PROTO=TCP SPT=8009 DPT=55726 WINDOW=4096 RES=0x00 ACK PSH URGP=0 
[  111.867595] [LAN_IN-RET-2005] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:48:d6:d5:7b:57:84:08:00 SRC=192.168.6.55 DST=10.3.37.71 LEN=162 TOS=0x00 PREC=0x00 TTL=63 ID=8444 DF PROTO=TCP SPT=8009 DPT=51754 WINDOW=488 RES=0x00 ACK PSH URGP=0 
[  112.330584] [LAN_IN-RET-2005] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:48:d6:d5:7b:57:84:08:00 SRC=192.168.6.55 DST=10.3.37.71 LEN=162 TOS=0x00 PREC=0x00 TTL=63 ID=11840 DF PROTO=TCP SPT=32244 DPT=55232 WINDOW=560 RES=0x00 ACK PSH URGP=0 
[  114.182943] [LAN_IN-D-2006] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:48:d6:d5:7b:57:84:08:00 SRC=192.168.6.55 DST=10.3.37.190 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=16460 DF PROTO=TCP SPT=35722 DPT=9000 WINDOW=29200 RES=0x00 SYN URGP=0 
[  114.203181] [LAN_IN-D-2006] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:48:d6:d5:7b:57:84:08:00 SRC=192.168.6.55 DST=10.3.33.228 LEN=1241 TOS=0x00 PREC=0x00 TTL=63 ID=50130 DF PROTO=TCP SPT=32244 DPT=54341 WINDOW=503 RES=0x00 ACK PSH URGP=0 
[  115.172967] [LAN_IN-D-2006] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:48:d6:d5:7b:57:84:08:00 SRC=192.168.6.55 DST=10.3.37.190 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=16461 DF PROTO=TCP SPT=35722 DPT=9000 WINDOW=29200 RES=0x00 SYN URGP=0 
[  116.872963] [LAN_IN-RET-2005] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:7c:d9:5c:2a:04:05:08:00 SRC=192.168.6.39 DST=10.3.37.71 LEN=52 TOS=0x00 PREC=0x00 TTL=63 ID=65087 PROTO=TCP SPT=8009 DPT=33110 WINDOW=4095 RES=0x00 ACK URGP=0 
[  116.873628] [LAN_IN-RET-2005] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:7c:d9:5c:2a:04:05:08:00 SRC=192.168.6.39 DST=10.3.37.71 LEN=162 TOS=0x00 PREC=0x00 TTL=63 ID=65088 PROTO=TCP SPT=8009 DPT=33110 WINDOW=4096 RES=0x00 ACK PSH URGP=0 
[  117.179111] [LAN_IN-D-2006] IN=br3 OUT=br0 MAC=78:45:58:86:50:df:48:d6:d5:7b:57:84:08:00 SRC=192.168.6.55 DST=10.3.37.190 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=16462 DF PROTO=TCP SPT=35722 DPT=9000 WINDOW=29200 RES=0x00 SYN URGP=0

Immediately after loading the edge kernel, all devices got un-adopted.
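
The ubios-udapi-server warnings above are easy to lose among the firewall log lines. A minimal sketch for tallying them per route, assuming the syslog format shown above; `summarize_multipath` is a hypothetical helper name, not something shipped with UniFi OS:

```shell
# Count "Multipath routes not supported" warnings per route.
# Reads syslog-style lines on stdin and prints "<count> <route>" per line.
summarize_multipath() {
  grep 'Multipath routes not supported' |
    sed -n 's/.* for route \([^ ]*\) via .*/\1/p' |
    LC_ALL=C sort | uniq -c | awk '{ print $1, $2 }'
}

# Possible usage on the device (path taken from the thread):
#   summarize_multipath < /var/log/messages
```

This only counts how often each route is rejected; it does not show whether the rejected routes are what breaks adoption.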


@fabianishere
Owner

Are the devices reachable from SSH? I guess the UniFi controller is not liking the multipath routes (and might be misconfiguring your router).

@issmirnov
Author

issmirnov commented Jan 11, 2023

Agreed.

No, the devices are not reachable. Interestingly, on the edge kernel there's a route added to the top of the routing table:

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         67.190.92.1     0.0.0.0         UG    0      0        0 eth7

That's quite a strange one, not sure why it's happening.

Here's a full dump of configs: https://gist.github.com/issmirnov/cb467b4aee42442b8b734e77fdf1959f

@fabianishere
Owner

You have no route for 192.168.1.0/24 (I would expect it to be there), so it is going to your default gateway. Also, any idea why your default route is through eth7 and not through the UDM's WAN ports (eth8 or eth9)?
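
A rough way to check for a missing prefix from the shell, as a sketch only: `has_route` is a hypothetical helper that looks for an exact prefix at the start of a line of `ip route` output. It does not do real longest-prefix matching, so a covering prefix such as 192.168.0.0/16 would not be detected.

```shell
# has_route PREFIX
# Reads `ip route` output on stdin; exits 0 if some line starts with
# exactly PREFIX, 1 otherwise. Simplification: no longest-prefix matching.
has_route() {
  awk -v p="$1" '$1 == p { found = 1 } END { exit !found }'
}

# Possible usage on the router:
#   ip route show | has_route 192.168.1.0/24 ||
#     echo "no specific route; traffic follows the default route"
```

`ip route get 192.168.1.20` gives the kernel's own answer for a single address and is the more reliable check when it is available.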

@issmirnov
Author

I didn't change the default route settings, so I'm not sure why it's using eth7. My uplink is connected to port 9, with a failover WAN2 uplink on port 8. I'm using the SFP connector (port 11) to connect to my internal 10 Gbps backbone.

I could try adding a simple on-boot patch to the custom kernel to add the route. FWIW, I don't see the route for 192.168.1.0/24 in the stock kernel route -n output, even though adoption works fine.

I am curious though why simply switching the kernel changes the routing table so much.

And by the way, thank you so much for your responsiveness! I really do appreciate it.

@fabianishere
Owner

Could you perhaps share the output of ip route and ip a as well?

@issmirnov
Author

Here are the latest: https://gist.github.com/issmirnov/3f62343e8221204402d85580b3b9b364

From what I can tell, the outputs are identical on the edge and stock kernels, although the issue is 100% reproducible.

@fabianishere
Owner

I guess the default route (default via 67.190.92.1 dev eth7 proto dhcp) on the edge kernel is causing issues. Could you try removing it from SSH to see if it resolves the issue:

ip route delete default via 67.190.92.1 dev eth7

@issmirnov
Author

Here's what it looks like on the edge kernel.

# ip route delete default via 67.190.92.1 dev eth7
# ip route get 192.168.1.20
192.168.1.20 via 192.168.0.1 dev eth8 table 201 src 192.168.0.4 uid 0 
    cache 
# ping 192.168.1.20
PING 192.168.1.20 (192.168.1.20): 56 data bytes
^C
--- 192.168.1.20 ping statistics ---
7 packets transmitted, 0 packets received, 100% packet loss

I tried messing around with various commands like ip route add 192.168.1.0/24 via 0.0.0.0 interface br0, but I'm not sure what the default management bridge interface looks like, so I was never able to get SSH working.
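
For reference, iproute2 selects the output device with `dev` rather than `interface`, and an on-link route needs no `via 0.0.0.0`. The sketch below only prints the command unless `APPLY=1` is set. `add_onlink_route` is a hypothetical helper, and using br0 as the management bridge is an assumption that nothing in this thread confirms.

```shell
# add_onlink_route PREFIX DEV
# Dry run by default: prints the command. With APPLY=1 it runs it (needs root).
add_onlink_route() {
  if [ "${APPLY:-0}" = "1" ]; then
    ip route add "$1" dev "$2"
  else
    echo "ip route add $1 dev $2"
  fi
}

# br0 is an assumed interface name; adjust to the real management bridge.
add_onlink_route 192.168.1.0/24 br0   # prints: ip route add 192.168.1.0/24 dev br0
```

Since the `ip route get` output above resolves through table 201, policy routing is in play, so a route added to the main table may not be the one that gets used. `ip rule show` and `ip route show table 201` would show which rules and routes actually apply.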
