Regression: Resolving unqualified DNS names fails #1307
Comments
Memberlist is doing the resolving, and from looking at the code it uses Go's stdlib. Are you able to resolve the address using this Go code running in its own script in your container? |
There's a probably related issue in #1312. Does your status page show any peers? |
Looking a bit further into the code, it is resolving the addresses but apparently unable to join them .. |
@stuartnelson3 it's not showing any peers when using the unqualified name. I'll test with a Go binary using net.LookupIP once I get back to this. |
Just built a simple binary running LookupIP and this seems to work just fine:
|
Thanks for looking at this. From the original log line you provided, there are two errors: One is a failure to resolve, and another is a failure to join. Both of those bits of code are in the same loop here: https://github.com/hashicorp/memberlist/blob/9f5b38f1dc837733754bf57f4ea62726a509c0fc/memberlist.go#L214-L234 The initial lookup seems to be happening on a forwarder local to that kubelet
But then the second error is connecting to an IP that isn't (according to your
The connection failure, I think, would be stale DNS data or something .. I'm not sure where that IP came from. Are you configuring each AM instance to have the full list of peers? So instance1 has Also, how do you have |
So I don't think it's related to the listen address or something, since it works when using the fully qualified name. But yeah, I focused on the DNS error, and you're right, it's confusing that it tries to join some other IP. Not sure what happened. When I try to reproduce it by deleting my pods and recreating them I don't see this, just the DNS error:
I'm as confused as you are about why this fails, given that it's using the stdlib LookupIP, but it's definitely a problem with resolving the name. Just double-checked that my test binary can resolve the unqualified name just fine in this same pod. |
WTH.. I just read the memberlist code and it implements its own resolver and only uses the stdlib when this fails: https://github.com/hashicorp/memberlist/blob/9f5b38f1dc837733754bf57f4ea62726a509c0fc/memberlist.go#L247 I'm going to file an upstream issue. |
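To illustrate why a custom resolver can behave differently from the stdlib here, the sketch below shows the search-list expansion that a stub resolver normally performs on an unqualified name. It is a simplified illustration, not memberlist's or the Go stdlib's actual code. The function name `candidateNames` is hypothetical, and the `ndots` value of 5 is an assumption (the usual Kubernetes pod default), since the pod's real /etc/resolv.conf is not preserved in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// candidateNames sketches the order in which a stub resolver tries names.
// Simplified illustration only: real resolvers handle more edge cases.
func candidateNames(name string, search []string, ndots int) []string {
	// A trailing dot marks the name as absolute: no search list is applied.
	if strings.HasSuffix(name, ".") {
		return []string{name}
	}
	// Names with at least ndots dots are tried as-is first, otherwise last.
	hasNdots := strings.Count(name, ".") >= ndots
	fqdn := name + "."
	var names []string
	if hasNdots {
		names = append(names, fqdn)
	}
	for _, domain := range search {
		names = append(names, fqdn+strings.TrimSuffix(domain, ".")+".")
	}
	if !hasNdots {
		names = append(names, fqdn)
	}
	return names
}

func main() {
	// Search domain taken from the issue description; ndots=5 is assumed.
	search := []string{"default.svc.cluster.local"}
	for _, n := range candidateNames("alertmanager-0.alertmanager", search, 5) {
		fmt.Println(n)
	}
}
```

With these inputs the first candidate is the name with the search domain appended, and the bare name comes last. A resolver that skips this expansion and queries only the bare name as if it were absolute would never ask for the candidate that actually exists, which would be consistent with the FQDN working while the unqualified name fails.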
silence from them after 7 days :/ would it be a lot of work for you to package your own am after patching https://github.com/hashicorp/memberlist/blob/9f5b38f1dc837733754bf57f4ea62726a509c0fc/memberlist.go#L333-L339 to be commented out? I'm just starting back at SC and won't have time to try this for probably 2 weeks. |
As a workaround, I'm using the FQDN. So not urgent but something that should get fixed because others will trip over this too. |
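For anyone applying the same workaround, the fully qualified peer names can be generated rather than typed by hand. This is a small sketch only: `peerFQDN` is a hypothetical helper, the name layout follows the example in the issue description, and the replica count of 3 and the `cluster.local` cluster domain are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// peerFQDN is a hypothetical helper that builds the fully qualified DNS name
// of a StatefulSet pod behind a headless service.
func peerFQDN(pod, service, namespace, clusterDomain string) string {
	return strings.Join([]string{pod, service, namespace, "svc", clusterDomain}, ".")
}

func main() {
	// Three replicas is an assumed value for illustration.
	for i := 0; i < 3; i++ {
		pod := fmt.Sprintf("alertmanager-%d", i)
		fmt.Println(peerFQDN(pod, "alertmanager", "default", "cluster.local"))
	}
}
```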
Hi there, any news about this? |
No idea if this is the same problem, but I do have a problem like this in Kubernetes: gliderlabs/docker-alpine#255. In Debian-based Docker images DNS works fine, but in alertmanager:
|
@zetaab Why do you think your problem might be the same? As you verified, DNS isn't working at all in your container. That doesn't seem to relate to this issue. @alesnav See the upstream issue (hashicorp/memberlist#147); there is nothing we can do here besides replacing memberlist. |
same problem. any help? thanks |
@xkfen Someone would have to fix the upstream issue: hashicorp/memberlist#147 |
What did you do?
Running AM in prometheus as a stateful set with a headless service, giving each AM a name like alertmanager-0.alertmanager.default.svc.cluster.local.
The pod gets, among others, default.svc.cluster.local configured as a search domain in /etc/resolv.conf:
This allows the name alertmanager-0.alertmanager to be resolved unqualified like this:
Alertmanager, though, can't resolve this name unqualified (which was working at least in 0.11.0) and logs this error:
Environment