DNS resolution problems in NixOS #1980
It looks a bit like it lost connectivity -- it failed to fetch new commits, reset to cloning the repo, and failed at that (which is what was reported). It should keep trying and correct itself eventually -- do the logs stop there? |
Hi @squaremo, I'm keeping an eye on it and, incredibly, it has done a successful retry:
What started me looking into this is the fact that flux "pollutes" GitLab's activity feed with entries like:
And I thought it had to do that just once, as a verification. Am I right? |
Yay! About the eventually correcting itself. The git mirroring is basically a state machine, and every time anything fails it goes back to the initial state -- so, when it fails to fetch new commits, it starts again by attempting to clone the repo. As part of that, there's a write check, which is what you see in the gitlab logs. Some possible mitigation:
|
thanks @squaremo, I'm trying to look into the possible connectivity problem... but I don't see any issues. The pod is on a virtual machine, which itself is on a server inside a server farm. That's the only possible connectivity issue I can think of... nasty |
+1 |
@azazel75 Did you find anything more out? It might have been a passing problem with the secret (or filesystem into which it was mounted). It's difficult to tell, since generally git just reports "could not read". If we can't pin it down to a particular problem, shall we close this issue on the basis that it's not clear where the problem was, and fluxd did operate as designed (by recovering eventually). |
@squaremo as you wish, it's certainly unclear what's happening. I'm running several applications and services in the same cluster and none of them is reporting issues. In the previous message you wrote |
So I spent some time digging into this. There's another symptom connected with it: executing a shell from inside the
I've tested the resolution with a busybox container and it has no problem resolving it; I used a for loop in the shell and made it resolve the domain 200 times without issues. Looking for what could be the reason, I first tried lifting the resource limits in Flux's Helm chart, but it didn't change anything. Then I tried to recompile Flux (following the build instructions) with the intent of rebasing the Docker image on something different from Alpine, but with that I'm stuck on the
I'm surrendering on this side ;-)
By the way, its behavior is the same as the base image
So @squaremo, in the end I'm asking you this: can you point out what isn't working in my Go setup, why the |
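The repeated-lookup test mentioned earlier (resolving the domain 200 times from a shell loop) can be sketched like this. `NAME` is a placeholder -- it is set to `localhost` here only so the snippet runs anywhere; in the cluster you would point it at the Git host that Flux fails to resolve:

```shell
# Hammer the resolver to catch intermittent DNS failures.
# NAME is a placeholder -- substitute the host you are testing.
NAME=localhost
COUNT=200
fails=0
for i in $(seq 1 "$COUNT"); do
  getent hosts "$NAME" > /dev/null || fails=$((fails + 1))
done
echo "failures: $fails/$COUNT"
```

A single successful lookup proves little with an intermittent problem; looping like this is what makes the busybox comparison meaningful.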
@azazel75 How did you deploy Kubernetes? To rule out problems with the internal DNS server, can you try testing the same name resolution against an external one? e.g. running |
@2opremio I've deployed Kubernetes using NixOS on a cluster composed of three VMs. The internal DNS is the latest version of CoreDNS. As I said before, the error only happens in Alpine-based containers. I'm seriously thinking of switching to ArgoCD, even though it's a much more complex setup, because at least it compiles out of the box. Anyway, here is the test you wanted:
thanks anyway |
What version of dep are you using? It compiles out of the box for me every day :) :
Thanks, this shows that the resolution problem only happens with the internal DNS server, so I am pretty sure it's a problem with how CoreDNS is set up, combined with Alpine (probably |
This seems to be the culprit gliderlabs/docker-alpine#255 See gliderlabs/docker-alpine#255 (comment) and gliderlabs/docker-alpine#255 (comment) |
The comments I linked above also explain why it works when you supply
My bet is that CoreDNS is configured with strict rate-limiting or similar, causing the (doubled)
@azazel75 can you post the |
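For context on the doubled queries: a typical Kubernetes pod gets an `/etc/resolv.conf` along these lines (the nameserver IP and search domains below are illustrative, they vary per cluster). With `ndots:5`, any name containing fewer than five dots is first tried against each search domain before being resolved absolutely, and musl additionally sends the A and AAAA queries in parallel, which multiplies the load on CoreDNS:

```
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
```

That is why an external name like a Git host can generate many queries per lookup from an Alpine container.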
Unfortunately it doesn't compile for me; dep 0.5.1 fails at resolving the dependencies. I've posted the details above.
No, that shows that the test for
And yes, I agree that the issue is generated by the damn Alpine ;-) Thanks |
Ah true, I misread. |
correct
I doubt that, but I don't have proof. There are various reports of Alpine DNS issues, starting long ago (so probably from when the dnsmasq-based KubeDNS was used), and some have been reported using Docker alone. I've never used Alpine apart from containers, and after encountering and reading about these issues I don't think the hours spent in debugging are worth the few megabytes saved in disk space.
Here it is
|
Yep, I am starting to agree with this. We will have a conversation on the Flux Slack channel about this. We may change the base distro as a result. |
Please try |
The continuous integration system does it from scratch. Please take a look at the |
Or try setting the |
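A pod-level DNS override of the kind being suggested can be sketched as follows (this is illustrative, not the actual Flux manifest; the pod and container names are placeholders). Setting `ndots:1` makes external names resolve absolutely on the first attempt instead of cycling through the cluster search domains:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dns-test          # hypothetical name
spec:
  dnsPolicy: ClusterFirst  # keep cluster DNS, but tweak its options
  dnsConfig:
    options:
      - name: ndots
        value: "1"         # external names resolved absolutely first
  containers:
    - name: main
      image: alpine:3.9
      command: ["sleep", "3600"]
```

The trade-off is that short in-cluster service names then go through the search list last instead of first, so lookups of bare service names get slightly slower.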
>>>> "Alfonso" == Alfonso Acosta ***@***.***> writes:
@azazel75 Also, we just moved away from `dep` (#2083 ), so, you
shouldn't have a problem with it anymore (in case you want to play
with other images yourself)
Thanks Alfonso, that's interesting, I'll try to find the time to test
a debian-based image over the weekend, or early next week. As for the
other suggestion of trying with `ndots 1`, I expect it to work, as I've
already proved that absolute resolution works well; I don't have any
musl-specific knowledge to say whether musl will honor such
configuration. We'll see.
…--
Alberto Berti - Information Technology Consultant
PGP: 9377 A68C C5B5 B534 36BD F20B E3B5 C559 99D6 7CF9
"gutta cavat lapidem"
|
As you may have expected, the |
The conclusion I draw from this is that the issue seems more setup-related than image-related. I have incorporated the pod DNS config in our Helm chart in #2116 to ease altering the |
I am closing this as #2116 is now merged and this should give users enough tools to overcome the issue. |
I'm observing a strange behavior of flux: it starts cloning the repo without problems but then it starts failing... I've checked the free space but there's plenty... Any clue about what it may be?
Flux is at version 1.12.0. I've checked the permissions on the repo and flux's user is "Maintainer"... trying with a `kubectl exec -ti ...` using the same private key completes the `git clone --mirror` without issues...

To Reproduce
I don't know how to reproduce it