
DNS resolution problems in NixOS #1980

Closed
azazel75 opened this issue Apr 25, 2019 · 26 comments

@azazel75

I'm observing some strange behavior in Flux: it starts cloning the repo without problems, but then it starts failing. I've checked the free space and there's plenty. Any clue about what it may be?
Flux is at version 1.12.0, and I've checked the permissions on the repo: Flux's user is "Maintainer". Running the same git clone --mirror by hand via kubectl exec -ti ..., using the same private key, completes without issues.

ts=2019-04-25T14:36:46.506163209Z caller=loop.go:103 component=sync-loop event=refreshed url=ssh://[email protected]/etour/ndn-deploy.git branch=master HEAD=10566cbc3923d0fc0ec2250fba151fe79766db30
ts=2019-04-25T14:36:52.945555213Z caller=sync.go:455 component=cluster method=Sync cmd=apply args= count=1
ts=2019-04-25T14:36:55.66944758Z caller=sync.go:521 component=cluster method=Sync cmd="kubectl apply -f -" took=2.723612826s err=null output="helmrelease.flux.weave.works/hopi-test configured"
ts=2019-04-25T14:41:38.073278704Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-04-25T14:41:48.113528705Z caller=loop.go:103 component=sync-loop event=refreshed url=ssh://[email protected]/etour/ndn-deploy.git branch=master HEAD=10566cbc3923d0fc0ec2250fba151fe79766db30
ts=2019-04-25T14:42:02.572146828Z caller=sync.go:455 component=cluster method=Sync cmd=apply args= count=1
ts=2019-04-25T14:42:02.868525945Z caller=sync.go:521 component=cluster method=Sync cmd="kubectl apply -f -" took=296.217383ms err=null output="helmrelease.flux.weave.works/hopi-test unchanged"
ts=2019-04-25T14:46:38.227383065Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-04-25T14:46:49.733189505Z caller=loop.go:103 component=sync-loop event=refreshed url=ssh://[email protected]/etour/ndn-deploy.git branch=master HEAD=10566cbc3923d0fc0ec2250fba151fe79766db30
ts=2019-04-25T14:47:09.296892066Z caller=sync.go:455 component=cluster method=Sync cmd=apply args= count=1
ts=2019-04-25T14:47:11.287687418Z caller=sync.go:521 component=cluster method=Sync cmd="kubectl apply -f -" took=1.99029949s err=null output="helmrelease.flux.weave.works/hopi-test unchanged"
ts=2019-04-25T14:51:38.305624468Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-04-25T14:52:11.311781787Z caller=loop.go:90 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository."

To Reproduce
I don't know how to reproduce it.

Additional context

  • Flux version: 1.12.0
  • Helm Operator version: the same?
  • Kubernetes version: 1.13.5
  • Git provider: gitlab
  • Container registry provider: gitlab
@squaremo
Member

squaremo commented Apr 25, 2019

It looks a bit like it lost connectivity -- it failed to fetch new commits, reset to cloning the repo, and failed at that (then that's what was reported). It should keep trying and correct itself eventually -- do the logs stop there?

@squaremo squaremo added blocked-needs-validation Issue is waiting to be validated before we can proceed question labels Apr 25, 2019
@azazel75
Author

Hi @squaremo, I'm keeping an eye on it and incredibly it has done a successful retry:

ts=2019-04-25T14:56:38.447718206Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-04-25T14:56:38.447846078Z caller=images.go:28 component=sync-loop msg="no automated workloads"
ts=2019-04-25T14:57:11.312272858Z caller=loop.go:90 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository."
ts=2019-04-25T15:01:38.448380635Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-04-25T15:01:38.44849261Z caller=images.go:28 component=sync-loop msg="no automated workloads"
ts=2019-04-25T15:02:11.314893467Z caller=loop.go:90 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository."
ts=2019-04-25T15:06:38.449780771Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-04-25T15:06:38.449954049Z caller=images.go:28 component=sync-loop msg="no automated workloads"
ts=2019-04-25T15:07:11.315185637Z caller=loop.go:90 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository."
ts=2019-04-25T15:11:38.450535875Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-04-25T15:11:38.450640262Z caller=images.go:28 component=sync-loop msg="no automated workloads"
ts=2019-04-25T15:12:11.315640172Z caller=loop.go:90 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository."
ts=2019-04-25T15:16:38.451160644Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-04-25T15:16:38.451317821Z caller=images.go:28 component=sync-loop msg="no automated workloads"
ts=2019-04-25T15:17:11.316171034Z caller=loop.go:90 component=sync-loop err="git repo not ready: git clone --mirror: fatal: Could not read from remote repository."
ts=2019-04-25T15:21:38.451931012Z caller=images.go:18 component=sync-loop msg="polling images"
ts=2019-04-25T15:21:38.45240027Z caller=images.go:28 component=sync-loop msg="no automated workloads"
ts=2019-04-25T15:21:52.55392213Z caller=loop.go:103 component=sync-loop event=refreshed url=ssh://[email protected]/etour/ndn-deploy.git branch=master HEAD=10566cbc3923d0fc0ec2250fba151fe79766db30
ts=2019-04-25T15:22:17.743163377Z caller=sync.go:455 component=cluster method=Sync cmd=apply args= count=1
ts=2019-04-25T15:22:20.521628247Z caller=sync.go:521 component=cluster method=Sync cmd="kubectl apply -f -" took=2.778006572s err=null output="helmrelease.flux.weave.works/hopi-test unchanged"

What started me looking into this is the fact that Flux "pollutes" GitLab's activity feed with entries like:

@etour_gitops deleted tag flux-write-check at eTour / ndn-deploy 6 minutes ago
@etour_gitops pushed new tag flux-write-check at eTour / ndn-deploy 6 minutes ago
@etour_gitops deleted tag flux-write-check at eTour / ndn-deploy 51 minutes ago
@etour_gitops pushed new tag flux-write-check at eTour / ndn-deploy 51 minutes ago

And I thought it had to do that just one time, as a verification. Am I right?

@squaremo
Member

Yay! That's it correcting itself eventually, as expected.

The git mirroring is basically a state machine, and every time anything fails it goes back to the initial state -- so, when it fails to fetch new commits, it starts again by attempting to clone the repo. As part of that, there's a write check, which is what you see in the gitlab logs.
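
In very rough terms (an illustrative sketch, not the actual fluxd code), the loop behaves like this:

package main

import (
	"errors"
	"fmt"
)

// Simplified sketch of the behaviour described above: any failure drops the
// mirror back to the initial "needs clone" state, and cloning is the step
// that performs the flux-write-check tag push/delete.

type mirrorState int

const (
	needsClone mirrorState = iota
	ready
)

// Stubs standing in for the real git operations.
func cloneMirror() error     { return nil } // also runs the write check
func fetchNewCommits() error { return errors.New("could not read from remote repository") }

func main() {
	state := needsClone
	for tick := 0; tick < 3; tick++ { // stand-in for the sync-loop timer
		switch state {
		case needsClone:
			if err := cloneMirror(); err != nil {
				fmt.Println("git repo not ready:", err)
				continue // stay in needsClone, retry on the next tick
			}
			state = ready
		case ready:
			if err := fetchNewCommits(); err != nil {
				fmt.Println("fetch failed, resetting to clone:", err)
				state = needsClone // any failure resets the state machine
				continue
			}
			fmt.Println("in sync")
		}
	}
}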

Some possible mitigations:

  • There's a read-only mode in development (Flux git read-only mode, #1741); that will not use a write check.
  • It's possible that the state machine could be altered, so that it will only return to the initial state if there are repeated failures -- I'll have a quick look and report back.

@squaremo squaremo removed the blocked-needs-validation Issue is waiting to be validated before we can proceed label Apr 25, 2019
@azazel75
Author

Thanks @squaremo, I'm trying to look into the possible connectivity problem, but I don't see any issues. The pod is on a virtual machine, which itself is on a server inside a server farm; that's the only possible source of connectivity trouble I can think of... nasty.

@sercanacar

+1

@squaremo
Member

squaremo commented May 1, 2019

@azazel75 Did you find anything more out? It might have been a passing problem with the secret (or filesystem into which it was mounted). It's difficult to tell, since generally git just reports "could not read".

If we can't pin it down to a particular problem, shall we close this issue on the basis that it's not clear where the problem was, and that fluxd did operate as designed (by recovering eventually)?

@azazel75
Author

azazel75 commented May 2, 2019

@squaremo as you wish; it's certainly unclear what's happening. I'm running several applications and services in the same cluster and none of them is reporting issues. In a previous message you wrote "It's possible that the state machine could be altered, so that it will only return to the initial state if there are repeated failures -- I'll have a quick look and report back"... did you find anything?

@azazel75
Author

azazel75 commented May 5, 2019

So I spent some time digging into this. There's another symptom connected with it: opening a shell inside the flux pod and trying to resolve gitlab.com (with ping gitlab.com), I found that it has serious issues resolving the name.

/home/flux # ping gitlab.com
ping: bad address 'gitlab.com'

I've tested the resolution with a busybox container and it doesn't have any problem resolving it; I used a for loop in the shell and made it resolve the domain 200 times without issues.
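
From memory, the loop was something like this (run in the busybox pod):

/ # for i in $(seq 1 200); do nslookup gitlab.com >/dev/null || echo "failed at attempt $i"; done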

Looking for what could be the reason, I first tried lifting the resource limits in Flux's Helm chart, but it didn't change anything. Then I tried to recompile Flux (following the build instructions) with the intent of rebasing the Docker image on something different from Alpine, but I'm stuck on the dep ensure step, where it complains of several dependency issues (I'm a Go newbie):

:/tmp/gopath/src/github.com/weaveworks/flux$ dep ensure
Solving failure: No versions of github.com/golang/dep met constraints:
	v0.5.1: Could not introduce github.com/golang/[email protected], as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	v0.5.0: Could not introduce github.com/golang/[email protected], as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	v0.4.1: Could not introduce github.com/golang/[email protected], as it depends on github.com/Masterminds/semver from https://github.com/carolynvs/semver.git, but github.com/Masterminds/semver is already marked as coming from github.com/Masterminds/semver by github.com/weaveworks/flux
	v0.4.0: Could not introduce github.com/golang/[email protected], as it depends on github.com/Masterminds/semver from https://github.com/carolynvs/semver.git, but github.com/Masterminds/semver is already marked as coming from github.com/Masterminds/semver by github.com/weaveworks/flux
	v0.3.2: Could not introduce github.com/golang/[email protected], as its subpackage github.com/golang/dep/gps is missing. (Package is required by (root).)
	v0.3.1: Could not introduce github.com/golang/[email protected], as its subpackage github.com/golang/dep/gps is missing. (Package is required by (root).)
	v0.3.0: Could not introduce github.com/golang/[email protected], as its subpackage github.com/golang/dep/gps is missing. (Package is required by (root).)
	v0.2.1: Could not introduce github.com/golang/[email protected], as its subpackage github.com/golang/dep/gps is missing. (Package is required by (root).)
	v0.2.0: Could not introduce github.com/golang/[email protected], as its subpackage github.com/golang/dep/gps is missing. (Package is required by (root).)
	v0.1.0: Could not introduce github.com/golang/[email protected], as its subpackage github.com/golang/dep/gps is missing. (Package is required by (root).)
	master: Could not introduce github.com/golang/dep@master, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	add_timeout_to_git_ls-remote: Could not introduce github.com/golang/dep@add_timeout_to_git_ls-remote, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	assign-op: Could not introduce github.com/golang/dep@assign-op, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	better-info-panic: Could not introduce github.com/golang/dep@better-info-panic, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	better_bitbucket_resolutions: Could not introduce github.com/golang/dep@better_bitbucket_resolutions, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	cleanupFSTests: Could not introduce github.com/golang/dep@cleanupFSTests, as its subpackage github.com/golang/dep/gps is missing. (Package is required by (root).)
	daixiang0-delete-blank: Could not introduce github.com/golang/dep@daixiang0-delete-blank, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	debug-ls-remote-failure: Could not introduce github.com/golang/dep@debug-ls-remote-failure, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	gabolaev-patch-1: Could not introduce github.com/golang/dep@gabolaev-patch-1, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	gh-pages: Could not introduce github.com/golang/dep@gh-pages due to multiple problematic subpackages:
	Subpackage github.com/golang/dep does not contain usable Go code (*build.NoGoError).. (Package is required by (root).)	Subpackage github.com/golang/dep/gps is missing. (Package is required by (root).)
	installation: Could not introduce github.com/golang/dep@installation, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	kill-child-process: Could not introduce github.com/golang/dep@kill-child-process, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	lock-all-equals-signs: Could not introduce github.com/golang/dep@lock-all-equals-signs, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	release-0.5.1: Could not introduce github.com/golang/[email protected], as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	s390x-releases: Could not introduce github.com/golang/dep@s390x-releases, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	skip-tests: Could not introduce github.com/golang/dep@skip-tests, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)
	unlambda: Could not introduce github.com/golang/dep@unlambda, as it has a dependency on github.com/Masterminds/semver with constraint 2.x, which has no overlap with existing constraint ^1.4.0 from (root)

enter-fhs-chrootenv:azazel@ender:/tmp/gopath/src/github.com/weaveworks/flux$ go version
go version go1.12.1 linux/amd64
enter-fhs-chrootenv:azazel@ender:/tmp/gopath/src/github.com/weaveworks/flux$ dep version
dep:
 version     : 0.5.1
 build date  : 
 git hash    : v0.5.1
 go version  : go1.12.1
 go compiler : gc
 platform    : linux/amd64
 features    : ImportDuringSolve=false

I'm surrendering on this side ;-)
So, back with the standard flux 1.12.1 container, I tried some more commands and discovered that there's certainly something going on with domain searches:

$ kubectl -n flux exec -ti flux-59894f7b7d-wf6fn sh
/home/flux # ping gitlab.com
ping: bad address 'gitlab.com'
/home/flux # nslookup gitlab.com
Server:		10.0.0.254
Address:	10.0.0.254#53

Non-authoritative answer:
** server can't find gitlab.com.etour.tn.it: NXDOMAIN

/home/flux # nslookup gitlab.com.
Server:		10.0.0.254
Address:	10.0.0.254#53

Non-authoritative answer:
Name:	gitlab.com
Address: 35.231.145.151

/home/flux # ping gitlab.com.
PING gitlab.com. (35.231.145.151): 56 data bytes
64 bytes from 35.231.145.151: seq=0 ttl=42 time=103.549 ms

By the way, its behavior is the same as the base image's, alpine:3.9. It turns out that there are a number of known DNS resolution problems in Alpine. It's difficult to establish what's causing them; one could read for hours, ending up on the bug trackers for Kubernetes, Azure and so on... But trying with a more conventional distro like debian:stretch revealed that, while it shows some of the symptoms, it always ends up being able to resolve the names:

$ kubectl run -i --tty  --image debian:stretch dns-tests --restart=Never --rm /bin/sh
If you don't see a command prompt, try pressing enter.

# ping gitlab.com
PING gitlab.com (35.231.145.151) 56(84) bytes of data.
64 bytes from 151.145.231.35.bc.googleusercontent.com (35.231.145.151): icmp_seq=1 ttl=42 time=103 ms
64 bytes from 151.145.231.35.bc.googleusercontent.com (35.231.145.151): icmp_seq=2 ttl=42 time=103 ms
^C
--- gitlab.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 103.161/103.549/103.938/0.504 ms
# nslookup	
/bin/sh: 3: nslookup: not found
# apt update
[...]
# apt install dnsutils
[...]
# nslookup gitlab.com
Server:		10.0.0.254
Address:	10.0.0.254#53

Non-authoritative answer:
*** Can't find gitlab.com: No answer

# ping gitlab.com
PING gitlab.com (35.231.145.151) 56(84) bytes of data.
64 bytes from 151.145.231.35.bc.googleusercontent.com (35.231.145.151): icmp_seq=1 ttl=42 time=103 ms
64

So @squaremo, in the end I'm asking you this: can you point out what isn't working in my Go setup, i.e. why the dep ensure command isn't working for me in a vanilla setup? That would allow me to compile Flux on my own and rebase it on a more reliable base image (i.e. Debian). Or, even better, could you build an alternative but official Flux image based on debian:stretch?

@2opremio
Contributor

2opremio commented May 15, 2019

  • Kubernetes version: 1.13.5

@azazel75 How did you deploy Kubernetes? To rule out problems with the internal DNS server, can you try testing the same name resolutions with an external one? E.g. run nslookup gitlab.com 1.1.1.1 inside the flux container (1.1.1.1 being Cloudflare's DNS server).

@azazel75
Author

@2opremio I've deployed Kubernetes using NixOS on a cluster composed of three VMs. The internal DNS is the latest version of CoreDNS. As I said before, the error only happens in Alpine-based containers. I'm seriously thinking of switching to ArgoCD; even if it's a much more complex setup, it compiles out of the box. Anyway, here is the test you wanted:

$ kubectl -n flux exec -ti flux-59894f7b7d-wf6fn sh
/home/flux # ping gitlab.com
ping: bad address 'gitlab.com'
/home/flux # nslookup gitlab.com
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'gitlab.com': Name does not resolve
/home/flux # nslookup gitlab.com 1.1.1.1
Server:    1.1.1.1
Address 1: 1.1.1.1 one.one.one.one

nslookup: can't resolve 'gitlab.com': Name does not resolve
/home/flux # nslookup gitlab.com. 1.1.1.1  # this one, with a final dot, works
Server:    1.1.1.1
Address 1: 1.1.1.1 one.one.one.one

Name:      gitlab.com.
Address 1: 35.231.145.151 151.145.231.35.bc.googleusercontent.com

thanks anyway

@2opremio 2opremio changed the title Flux fails at cloning the repo, after a while DNS resolution problems in NixOS May 15, 2019
@2opremio
Contributor

Then I tried to recompile Flux (following the build instructions) with the intent of rebasing the Docker image on something different from Alpine, but I'm stuck on the dep ensure step, where it complains of several dependency issues (I'm a Go newbie)

but it compiles out of the box

What version of dep are you using? It compiles out of the box for me every day :) :

apfelmus-2:flux fons$ dep version
dep:
 version     : devel
 build date  : 
 git hash    : 
 go version  : go1.11.4
 go compiler : gc
 platform    : darwin/amd64
 features    : ImportDuringSolve=false

Anyway, here is the test you wanted:

Thanks, this shows that the resolution problem only happens with the internal DNS server, so I am pretty sure it's a problem with how CoreDNS is set up, combined with Alpine (probably musl being more strict than glibc).

@2opremio
Contributor

2opremio commented May 15, 2019

@2opremio
Contributor

The comments I linked above also explain why it works when you supply gitlab.com. (since it ignores the search path)

My bet is that CoreDNS is configured with strict rate-limiting or similar, causing the (doubled) search lookups to end up failing.

@azazel75 can you post the /etc/resolv.conf file the Flux container gets?

@azazel75
Author

Then I tried to recompile Flux (following the build instructions) with the intent of rebasing the Docker image on something different from Alpine, but I'm stuck on the dep ensure step, where it complains of several dependency issues (I'm a Go newbie)

but it compiles out of the box

What version of dep are you using? It compiles out of the box for me every day :) :

Unfortunately it doesn't compile for me; dep 0.5.1 fails at resolving the dependencies. I've posted the details above.

Anyway, here is the test you wanted:

Thanks, this shows that the resolution problem only happens with the internal DNS server, so I am pretty sure it's a problem with how CoreDNS is set up, combined with Alpine (probably musl being more strict than glibc).

No, that shows that the test for gitlab.com doesn't work with 1.1.1.1 either; only gitlab.com. (note the trailing dot, which makes the resolver library skip the search domains) works with both DNS servers, even if above only the resolution using 1.1.1.1 is shown.

And yes, I agree that the issue is generated by the damn Alpine ;-)

Thanks

@2opremio
Contributor

No, that shows that the test for gitlab.com doesn't work with 1.1.1.1 either; only gitlab.com. works with both DNS servers

Ah true, I misread.

@azazel75
Author

The comments I linked above also explain why it works when you supply gitlab.com. (since it ignores the search path)

correct

My bet is that CoreDNS is configured with strict rate-limiting or similar, causing the (doubled) search lookups to end up failing.

I doubt that, but I don't have proof. There are various reports of Alpine DNS issues, starting long ago (so probably when the dnsmasq-based KubeDNS was still used), and some have been reported with Docker alone. I've never used Alpine except for containers, and after encountering and reading about these issues I don't think that the hours spent debugging are worth the few megabytes saved in disk space.

@azazel75 can you post the /etc/resolv.conf file the Flux container gets?

Here it is

$ kubectl -n flux exec  flux-59894f7b7d-wf6fn cat /etc/resolv.conf
nameserver 10.0.0.254
search flux.svc.cluster.local svc.cluster.local cluster.local etour.tn.it
options ndots:5
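
(Which explains the gitlab.com.etour.tn.it lookup seen earlier: with ndots:5, any name with fewer than five dots is tried against each search suffix before being tried as-is, so a single lookup for gitlab.com becomes roughly:)

gitlab.com.flux.svc.cluster.local
gitlab.com.svc.cluster.local
gitlab.com.cluster.local
gitlab.com.etour.tn.it
gitlab.com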

@2opremio
Contributor

I don't think that the hours spent debugging are worth the few megabytes saved in disk space.

Yep, I am starting to agree with this. We will have a conversation about it on the Flux Slack channel. We may change the base distro as a result.

@azazel75
Author

azazel75 commented May 15, 2019

What version of dep are you using? It compiles out of the box for me every day :) :

apfelmus-2:flux fons$ dep version
dep:
 version     : devel
 build date  : 
 git hash    : 
 go version  : go1.11.4
 go compiler : gc
 platform    : darwin/amd64
 features    : ImportDuringSolve=false

Please try dep ensure in a clean environment (i.e. without packages in $GOPATH) with a released version of go dep (the latest seems to be 0.5.2?). I've posted the result of "dep ensure" above, but no one has been able to tell me why it gives such results.

@2opremio
Contributor

The continuous integration system does it from scratch. Please take a look at the .circleci directory.

@2opremio
Contributor

2opremio commented May 21, 2019

@azazel75 It seems Alpine may not be the issue after all. Could you please test the images offered in #2051, to see if they fix the problem? (CC @hiddeco)

@hiddeco
Member

hiddeco commented May 21, 2019

Or try setting the ndots configuration to 1. You can do this without modifying the image by setting a custom DNS config in the pod spec: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-config
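
Untested sketch, but something along these lines in the Flux deployment's pod template should do it:

spec:
  template:
    spec:
      dnsConfig:
        options:
          - name: ndots
            value: "1"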

@2opremio
Contributor

@azazel75 Also, we just moved away from dep (#2083), so you shouldn't have a problem with it anymore (in case you want to play with other images yourself).

@azazel75
Author

azazel75 commented May 24, 2019 via email

@azazel75
Author

azazel75 commented Jun 3, 2019

As you may have expected, the ndots config fixed it, but the Debian-based Flux, on the other hand, did not fix the issue. I tried with my own images, rebased on the latest master, but there's nothing we can be certain of. What's happening unfortunately remains unclear to me. Thank you all for your support.

@hiddeco
Member

hiddeco commented Jun 3, 2019

As you may have expected, the ndots config fixed it, but the Debian-based Flux, on the other hand, did not fix the issue.

The conclusion I draw from this is that the issue seems more setup-related than image-related.

I have incorporated the pod DNS config into our Helm chart in #2116 to make it easier to alter the ndots setting, and made mention of it in the Flux daemon deployment.

@hiddeco
Member

hiddeco commented Jun 3, 2019

I am closing this as #2116 is now merged and this should give users enough tools to overcome the issue.

@hiddeco hiddeco closed this as completed Jun 3, 2019