Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

obscure error message: babeld: send: Destination address required #108

Open
DanielG opened this issue Jul 6, 2023 · 10 comments
Open

obscure error message: babeld: send: Destination address required #108

DanielG opened this issue Jul 6, 2023 · 10 comments

Comments

@DanielG
Copy link
Contributor

DanielG commented Jul 6, 2023

Hi Juliusz,

one of my babeld routers (running 1.12.1-1 from Debian) has started to print this obscure error every second or so

babeld[11278]: send: Destination address required

Not sure what is going on here, just opening this so maybe I remember to debug it. My guess is it's coming from flushbuf() and the error case there should probably be a bit more verbose.

--Daniel

@DanielG
Copy link
Contributor Author

DanielG commented Jul 6, 2023

Looks like this is from wireguard interfaces that don't currently have an endpoint. It's strange I'm not seeing this on any of my other routers since this is fairly common in my deployment because I always have two wg devices per destination host so things work over both v4 and v6 underlay networks, but often only one will actually be working.

The router that started printing the errors has recently been updated to Debian 12 whereas the others aren't yet so it's probably related to a kernel change.

@jech
Copy link
Owner

jech commented Jul 7, 2023

Interesting. Do you think you could capture a trace with strace?

@zorun
Copy link

zorun commented Jul 7, 2023

This error is returned by Wireguard, it makes "interesting" usage of existing error codes and that can sometimes make it hard to understand the error.

It indeed happens if you have Wireguard peers without an endpoint, here is what Wireguard does:

        family = READ_ONCE(peer->endpoint.addr.sa_family);
        if (unlikely(family != AF_INET && family != AF_INET6)) {
                ret = -EDESTADDRREQ;
                net_dbg_ratelimited("%s: No valid endpoint has been configured or discovered for peer %llu\n",
                                    dev->name, peer->internal_id);
                goto err_peer;
        }

@jech
Copy link
Owner

jech commented Jul 7, 2023

Could you please explain what it means for a wireguard peer to not have an endpoint? Is it something that happens in normal usage, or is it an erroneous configuration?

@DanielG
Copy link
Contributor Author

DanielG commented Jul 7, 2023

It's a perfectly valid configuration. Setting the endpoint on startup is optional. Wireguard sets the endpoint based on incoming encapsulated packets automatically, so not having an endpoint set (yet) just means this interface is standing by waiting to get traffic from this peer.

It certainly isn't worth printing a log line on every attempted send(). I've been thinking about how to handle this and I think we have two options 1) special case type wireguard interfaces and ignore this error completely or 2) rate-limit this error to once per startup or once every couple of minutes.

@jech
Copy link
Owner

jech commented Jul 7, 2023

We could of course avoid logging this specific error, or at least rate-limit the logging. But I'd much rather somebody implemented good support for Wireguard in Babel.

Some ideas:

  1. interface wg0 unicast-only true. All packets are send to unicast addresses. Hellos are either converted to multi-unicast, or sent as unicast Hellos (not sure which one is better). This is stronger than unicast true, which send Hellos as multicast.
  2. neighbour fe80::1234 if wg0: statically defines a unicast neighbour, to which it will send Hellos even if unicast-only is true.
  3. Have some form of communication between Wireguard and babeld so that babeld can learn unicast-only peer addresses. (Probably not needed.)

@DanielG
Copy link
Contributor Author

DanielG commented Jul 19, 2023

I've started copying the embeddable-wg-library into the babeld tree more than once but it always defeats me in the end ;)

  1. I'm not sure what a multi-unicast here?

  2. I don't think statically configuring the neighbours is very nice.. I'd much rather babeld just sent it's adverts to every peer that has an fe80::/64 addresses automatically (or at least does this optionally).

The main blocker for wg support is it's internal routing based on allowedips.

We pretty much have to dynamically install allowedips to select which peer traffic should go to but wireguard doesn't have an efficient API for doing this like we have for kernel routes which is disappointing.

Makes me think perhaps there should be a route attribute we can set that wireguard picks up. Not sure how well received that would be by Jason though. Certainly would be more efficient (and easier code wise) for the dynamic routing use-case though. Hmm.

@jech
Copy link
Owner

jech commented Aug 6, 2023

I'm not sure what a multi-unicast here?

A multi-unicast is when instead of sending a multicast packet, you send the equivalent unicast packet to every neighbour. Of course, this requires that you already know the addresses of the neighbours, but it is my understanding that wireguard is able to provide this information.

Makes me think perhaps there should be a route attribute we can set that wireguard picks up. Not sure how well received that would be by Jason though.

Could that not be done in userspace? A daemon that monitor's Babel's routing table and pushes all routes into Wireguard?

@zorun
Copy link

zorun commented Aug 7, 2023

FYI, I wrote a small daemon that does almost this, although it's the other way around: it monitors Wireguard peers and pushes routes to the kernel. It's at https://github.com/zorun/wg-autoroute . I didn't use netlink to ease deployment. Also, we don't have resources constraints, so forking regularly is fine for us.

@DanielG
Copy link
Contributor Author

DanielG commented Aug 7, 2023

@jech

I'm not sure what a multi-unicast here?

A multi-unicast is when instead of sending a multicast packet, you send the equivalent unicast packet to every neighbour. Of course, this requires that you already know the addresses of the neighbours, but it is my understanding that wireguard is able to provide this information.

Right, we can enumerate all wireguard peers and find their allowed-ips, sure. The user would have to ensure every peer has a unique fe80::/64 link-local address though but that's not really a problem either.

Makes me think perhaps there should be a route attribute we can set that wireguard picks up. Not sure how well received that would be by Jason though.

Could that not be done in userspace? A daemon that monitor's Babel's routing table and pushes all routes into Wireguard?

I mean in my mind babeld is exactly that daemon.

It's not that we can't do it, but it's inefficient. AFAICT from the kernel source wireguard's netlink interface only lets us replace all peers and all their allowedips at once. So that's going to be a big scaling bottleneck.

My idea would be to have something like ip route add 2001:db8::/64 dev wg0 wg-peer <base64-pubkey> to bypass wireguard's internal allowedip based routing and select the peer directly in the routing table, but that's probably too much hassle to support in the kernel. I'd settle for just a more efficient wg netlink api to add/remove allowedips instead :)

@zorun

FYI, I wrote a small daemon that does almost this, although it's the other way around: it monitors Wireguard peers and pushes routes to the kernel. It's at https://github.com/zorun/wg-autoroute . I didn't use netlink to ease deployment. Also, we don't have resources constraints, so forking regularly is fine for us.

Thanks! I was pretty much about to implement this myself as I want to play with redundant wg endpoints :)

I had one additional idea I was going to try out: what if the (babel) metric of a peer's route was dependent on the time since the last wg handshake? That way failover could be much faster than just picking a cutoff for when a peer is "active" like you (seem) to do. Alas, again, there is no efficient netlink API for getting notified of handshake events so it would amount to polling every wg device and every peer once a second (or however fast/slow the failover should be) which is not efficient at all.

Anyway this is orthogonal to the issue of supporting dynamic(!) allowedips with babel. The problem with setting AllowedIPs=::/0 statically is that you loose the ability to have more than one peer per wg device and having a static list of prefixes as is the "normal" wireguard deployment kind of defeats the purpose of dynamic routing.

--Daniel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants