Dockerfile does not have health check #3290

Closed
agneevX opened this issue Jun 28, 2021 · 27 comments

@agneevX
Contributor

agneevX commented Jun 28, 2021

Continuation of #1426

There's no health check for Docker images.

This is how Pi-hole does it:

HEALTHCHECK CMD dig +norecurse +retry=0 @127.0.0.1 pi.hole || exit 1
@agneevX
Contributor Author

agneevX commented Jun 28, 2021

Using curl for web UI:

HEALTHCHECK CMD curl -fs http://localhost:3000 -o /dev/null || exit 1

@ainar-g
Contributor

ainar-g commented Jun 28, 2021

Unfortunately, it's not that straightforward. AGH's initial HTTP API port is 3000, but the user can change it. In fact, the port that we currently propose by default is 80. And people can change it in the config file. We could include some sort of a script that would read the config file and use that value, but definitely not in this release cycle.
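
For illustration only, here is a minimal Python sketch of what such a script could look like. It assumes the config file contains a line of the form `bind_port: <number>` (the key name is the one mentioned elsewhere in this thread). The function names, the regex-based parsing, and the fallback to port 3000 are assumptions made for this sketch, not the project's actual implementation.

```python
import re
from pathlib import Path

DEFAULT_PORT = 3000  # initial setup port mentioned in this thread

# Matches a line such as "bind_port: 80" (assumed config layout).
_BIND_PORT_RE = re.compile(r"^\s*bind_port:\s*(\d+)\s*$", re.MULTILINE)


def web_port_from_config(text: str, default: int = DEFAULT_PORT) -> int:
    """Return the first bind_port value found in the config text, or the default."""
    match = _BIND_PORT_RE.search(text)
    return int(match.group(1)) if match else default


def web_port_from_file(path: str, default: int = DEFAULT_PORT) -> int:
    """Read the config file if it exists; otherwise fall back to the default."""
    config = Path(path)
    if not config.is_file():
        return default
    return web_port_from_config(config.read_text(encoding="utf-8"), default)
```

A health check could then probe `http://localhost:<port>` with the value returned by `web_port_from_file`, instead of hard-coding 3000.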

@sachinwadhwa

I have used the following healthcheck in my compose file. Port 53 is the most important one for me (as I am not using DoH).

healthcheck:
  test: "/bin/netstat -pant | /bin/grep 53"
  interval: 45s
  timeout: 30s
  retries: 3

@ameshkov
Member

ameshkov commented Jul 5, 2021

The DNS server doesn't work until the initial setup is finished.

@agneevX
Contributor Author

agneevX commented Jul 12, 2021

Seems like a separate script is needed to accomplish this.

@lolgast1987

I made something to do exactly that. It checks whether the setup is done and, if so, which port is configured for bind_port.

https://github.com/lolgast1987/adguard-unbound/blob/master/files/healthcheck.sh

@PeterDaveHello
Contributor

Could we at least accept a basic health check that queries port 53 with nslookup or dig? I'd like to send a pull request for it if that's acceptable. We could improve it later, when a more complete and complex idea comes up.

@ainar-g ainar-g added this to the v0.107.13 milestone Sep 7, 2022
@PeterDaveHello
Contributor

My current healthcheck config using docker-compose:

https://github.com/PeterDaveHello/dnslow.me/blob/master/docker-compose.yml#L9-L14

healthcheck:
  test: nslookup www.google.com || exit 1
  timeout: 5s
  interval: 60s
  start_period: 10s
  retries: 1
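
For images that ship without nslookup or dig, the same idea can be sketched with the Python standard library: send one A query over UDP and treat any reply with a matching ID as a sign of life. This is only a sketch under stated assumptions. The host, port, query name, timeout, and all function names are placeholders chosen for illustration, and it checks only that the server answers, not what it answers.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary ID used to match the reply to the query


def build_query(name: str, query_id: int = QUERY_ID) -> bytes:
    """Build a minimal DNS query (type A, class IN, recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + qname + b"\x00" + struct.pack("!HH", 1, 1)


def looks_like_reply(reply: bytes, query_id: int = QUERY_ID) -> bool:
    """True if the packet has a full header, the same ID, and the QR bit set."""
    if len(reply) < 12:
        return False
    reply_id, flags = struct.unpack("!HH", reply[:4])
    return reply_id == query_id and bool(flags & 0x8000)


def dns_is_healthy(host="127.0.0.1", port=53, name="example.com", timeout=5.0) -> bool:
    """Send one query over UDP and report whether a matching reply came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (host, port))
            reply, _addr = sock.recvfrom(4096)
        except OSError:  # covers timeouts and unreachable ports
            return False
    return looks_like_reply(reply)
```

A wrapper script could exit with status 0 when `dns_is_healthy()` returns True and 1 otherwise, which is the convention Docker health checks expect.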

@ainar-g ainar-g modified the milestones: v0.107.13, v0.107.14 Sep 14, 2022
@ainar-g ainar-g modified the milestones: v0.107.14, v0.107.15 Sep 29, 2022
@EugeneOne1 EugeneOne1 modified the milestones: v0.107.15, v0.107.16 Oct 3, 2022
@JaneJeon

For those of us using DNS-over-HTTPS, what is the best option here?

@JaneJeon

What in the world is the random string of letters in the ?dns= query and how do I construct it myself?

@PeterDaveHello
Contributor

What in the world is the random string of letters in the ?dns= query and how do I construct it myself?

It's not random. I just do it the lazy way: use any DoH client to send an HTTP GET query to an HTTPS server, then find the string in the access log.

@zoonderkins

Another example, a query for www.example.com:

https://doh-jp.blahdns.com/dns-query?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB
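
To show what the string in the URL above contains, here is a small standard-library sketch that decodes it and builds an equivalent one. The `dns` value is a DNS query in wire format, encoded as base64url with the padding stripped, as RFC 8484 describes for GET requests. The function names are made up for this sketch, and the decoder handles only a single uncompressed question.

```python
import base64
import struct


def encode_doh_param(name: str, qtype: int = 1, query_id: int = 0) -> str:
    """Build a DNS query in wire format and return it as unpadded base64url."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class 1 = IN
    return base64.urlsafe_b64encode(header + question).decode("ascii").rstrip("=")


def decode_doh_param(param: str):
    """Return (query_id, name, qtype) for a single-question dns parameter."""
    padded = param + "=" * (-len(param) % 4)  # restore the stripped padding
    wire = base64.urlsafe_b64decode(padded)
    query_id = struct.unpack("!H", wire[:2])[0]
    labels, pos = [], 12  # the fixed DNS header is 12 bytes long
    while wire[pos] != 0:
        length = wire[pos]
        labels.append(wire[pos + 1 : pos + 1 + length].decode("ascii"))
        pos += 1 + length
    qtype = struct.unpack("!H", wire[pos + 1 : pos + 3])[0]
    return query_id, ".".join(labels), qtype


print(decode_doh_param("AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB"))
# (0, 'www.example.com', 1)
```

Calling `encode_doh_param("www.example.com")` produces the same string as in the URL, since that query uses ID 0 and type A (1).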

@alexsannikov

As an example, the Node-RED Docker container uses a healthcheck...

adguard pushed a commit that referenced this issue Mar 26, 2023
Merge in DNS/adguard-home from 3290-docker-healthcheck to master

Updates #3290.

Squashed commit of the following:

commit 3ac8f26
Merge: bc17565 0df3260
Author: Eugene Burkov
Date:   Mon Mar 27 01:09:03 2023 +0500

    Merge branch 'master' into 3290-docker-healthcheck

commit bc17565
Author: Eugene Burkov
Date:   Sun Mar 26 18:04:08 2023 +0500

    all: fix script

commit e150fee
Author: Eugene Burkov
Date:   Sun Mar 26 17:18:12 2023 +0500

    all: imp naming

commit 26b6448
Author: Eugene Burkov
Date:   Sun Mar 26 03:13:47 2023 +0500

    all: support https web

commit b5c09ce
Author: Eugene Burkov
Date:   Sat Mar 25 20:03:45 2023 +0500

    all: imp scripts fmt, naming

commit 8c3798c
Merge: e33b0c5 fb7b8bb
Author: Eugene Burkov
Date:   Sat Mar 25 00:25:38 2023 +0500

    Merge branch 'master' into 3290-docker-healthcheck

commit e33b0c5
Author: Eugene Burkov
Date:   Fri Mar 24 16:47:26 2023 +0500

    all: fix docs

commit 57bfd89
Author: Eugene Burkov
Date:   Fri Mar 24 16:44:40 2023 +0500

    dnsforward: add special-use domain handling

commit f04ae13
Author: Eugene Burkov
Date:   Fri Mar 24 16:05:10 2023 +0500

    all: imp code

commit 32f150f
Author: Eugene Burkov
Date:   Fri Mar 24 04:19:10 2023 +0500

    all: mv Dockerfile, log changes

commit a094a44
Author: Eugene Burkov
Date:   Fri Mar 24 04:04:27 2023 +0500

    all: finish scripts, imp names

commit 4db0d0e
Author: Eugene Burkov
Date:   Thu Mar 23 18:33:47 2023 +0500

    docker: add script and awk program
adguard pushed a commit that referenced this issue Mar 27, 2023
Merge in DNS/adguard-home from 5642-fix-healthcheck-ssl to master

Updates #5642.
Updates #3290.

Squashed commit of the following:

commit c457ecb
Author: Eugene Burkov
Date:   Mon Mar 27 15:35:32 2023 +0500

    docker: imp docs

commit fddabb9
Author: Eugene Burkov
Date:   Mon Mar 27 15:18:22 2023 +0500

    docker: skip ssl check
@EugeneOne1
Member

EugeneOne1 commented Mar 27, 2023

Hello again, @agneevX, and everyone here. The latest edge build now implements the healthcheck via Dockerfile. Could you please check if it works for you?

@stavros-k

@EugeneOne1
Hello, the healthcheck does not account for the --port flag when the container has not yet been run through the wizard.
https://github.com/AdguardTeam/AdGuardHome/blob/132ec556dc206c7f4453b194d58112944b5b96db/docker/healthcheck.sh#LL25C27-L25C31

@EugeneOne1
Member

@stavros-k, thanks for the feedback. Actually, the current implementation expects port 3000, specified in the official Dockerfile, to be used for installation. Since this use case exists, we've filed a separate issue (#5645) about taking custom ports specified via command-line options into account.

Have any other issues come up?

@stavros-k

@EugeneOne1 I haven't noticed anything else yet, but I'm currently working on adding it as an app to TrueNAS SCALE. I'll let you know if I find anything.

@leo15dev

@stavros-k, thanks for the feedback. Actually, the current implementation expects port 3000, specified in the official Dockerfile, to be used for installation. Since this use case exists, we've filed a separate issue (#5645) about taking custom ports specified via command-line options into account.

Have any other issues come up?

If the health check status is unhealthy, it produces a lot of zombie processes.

e.g.
root 119494 0.0 0.0 0 0 ? Z 19:01 0:00 [ssl_client]
root 119495 0.0 0.0 0 0 ? Z 19:01 0:00 [ssl_client]
root 119939 0.0 0.0 0 0 ? Z 19:01 0:00 [ssl_client]
root 119956 0.0 0.0 0 0 ? Z 19:01 0:00 [ssl_client]
root 120419 0.0 0.0 0 0 ? Z 19:02 0:00 [ssl_client]
root 120420 0.0 0.0 0 0 ? Z 19:02 0:00 [ssl_client]
root 120900 0.0 0.0 0 0 ? Z 19:02 0:00 [ssl_client]
root 120901 0.0 0.0 0 0 ? Z 19:02 0:00 [ssl_client]
root 121332 0.0 0.0 0 0 ? Z 19:03 0:00 [ssl_client]
root 121341 0.0 0.0 0 0 ? Z 19:03 0:00 [ssl_client]
root 121816 0.0 0.0 0 0 ? Z 19:04 0:00 [ssl_client]
root 121817 0.0 0.0 0 0 ? Z 19:04 0:00 [ssl_client]
root 122284 0.0 0.0 0 0 ? Z 19:04 0:00 [ssl_client]
root 122285 0.0 0.0 0 0 ? Z 19:04 0:00 [ssl_client]
root 122792 0.0 0.0 0 0 ? Z 19:05 0:00 [ssl_client]
root 122793 0.0 0.0 0 0 ? Z 19:05 0:00 [ssl_client]
root 123281 0.0 0.0 0 0 ? Z 19:05 0:00 [ssl_client]
root 123282 0.0 0.0 0 0 ? Z 19:05 0:00 [ssl_client]
root 123721 0.0 0.0 0 0 ? Z 19:06 0:00 [ssl_client]
root 123722 0.0 0.0 0 0 ? Z 19:06 0:00 [ssl_client]
root 124209 0.0 0.0 0 0 ? Z 19:07 0:00 [ssl_client]

All of the zombie processes above appeared after Watchtower upgraded the edge container at 19:01.

Thank you!

@EugeneOne1
Member

@leo15dev, what exact version are you using? We've investigated the case and have some assumptions. The Docker container runs the command specified by the ENTRYPOINT as PID 1. In UNIX systems, this process is expected to handle orphaned processes (and zombies too), and the PID is usually occupied by an init system. In a Docker container, this PID is occupied by AdGuard Home itself, so these processes become its children and stay in the process table forever because nothing reaps them.

However, we've only been able to reproduce this "process zombification" by requesting the web UI with wget over the HTTPS scheme, which triggers ssl_client. The initial version of our healthcheck script used the HTTPS scheme for these requests, but the current implementation avoids it, so it's not really clear what calls ssl_client now.
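
As a rough illustration of the missing handling described above, this Python sketch shows the core of what an init-like process does: it collects the exit status of every child that has already terminated, so the kernel can drop it from the process table. It is a generic POSIX example under the assumption that the caller is the parent of the zombies. It is not AdGuard Home's code, and the function name is made up.

```python
import os


def reap_children() -> list:
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            # -1 means "any child"; WNOHANG returns immediately if none exited.
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # the process has no children at all
            break
        if pid == 0:  # children exist, but none has exited yet
            break
        reaped.append(pid)
    return reaped
```

An init process would typically call something like this in a loop or from a SIGCHLD handler. Until it does, each exited child shows up as a `Z` entry in `ps`, like the `[ssl_client]` entries reported in this thread.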

@leo15dev

@leo15dev, what exact version are you using? We've investigated the case and have some assumptions. The Docker container runs the command specified by the ENTRYPOINT as PID 1. In UNIX systems, this process is expected to handle orphaned processes (and zombies too), and the PID is usually occupied by an init system. In a Docker container, this PID is occupied by AdGuard Home itself, so these processes become its children and stay in the process table forever because nothing reaps them.

However, we've only been able to reproduce this "process zombification" by requesting the web UI with wget over the HTTPS scheme, which triggers ssl_client. The initial version of our healthcheck script used the HTTPS scheme for these requests, but the current implementation avoids it, so it's not really clear what calls ssl_client now.

AdGuard Home, version v0.108.0-a.493+2eb3bf6e

I run it behind an nginx reverse proxy, but in Encryption settings -> HTTPS port, I set the port to 443, because I assigned an independent Docker network to the AdGuard Home container. In docker-compose.yaml, I do not expose any ports.
Maybe this is why ssl_client gets spawned?

tls:
  enabled: true
  server_name:
  force_https: true
  port_https: 443
  port_dns_over_tls: 853
  port_dns_over_quic: 853
  port_dnscrypt: 0
  dnscrypt_config_file: ""
  allow_unencrypted_doh: false
  certificate_chain: ""
  private_key: ""
  certificate_path: /opt/adguardhome/conf/ssl/cert.pem
  private_key_path: /opt/adguardhome/conf/ssl/key.pem
  strict_sni_check: false

Thank you!

@EugeneOne1
Member

EugeneOne1 commented Mar 31, 2023

@leo15dev, well, that shouldn't be the cause, since the health check script runs inside the same container it checks.

Since v0.108.0-a.493+2eb3bf6e is one revision ahead of the one that started avoiding HTTPS, I'd like to make sure: is the issue still relevant? Are these zombies still present after the update?

@leo15dev

leo15dev commented Mar 31, 2023

Since v0.108.0-a.493+2eb3bf6e is one revision ahead of the one that started avoiding HTTPS, I'd like to make sure: is the issue still relevant? Are these zombies still present after the update?

It still produces the zombie processes.

2023/03/31 19:00:43.655326 [info] AdGuard Home, version v0.108.0-a.493+2eb3bf6e

root 2495058 0.0 0.0 0 0 ? Z 19:01 0:00 [ssl_client]
root 2495059 0.0 0.0 0 0 ? Z 19:01 0:00 [ssl_client]
root 2495509 0.0 0.0 0 0 ? Z 19:01 0:00 [ssl_client]
root 2495510 0.0 0.0 0 0 ? Z 19:01 0:00 [ssl_client]
root 2495974 0.0 0.0 0 0 ? Z 19:02 0:00 [ssl_client]
root 2495990 0.0 0.0 0 0 ? Z 19:02 0:00 [ssl_client]
root 2496455 0.0 0.0 0 0 ? Z 19:02 0:00 [ssl_client]
root 2496456 0.0 0.0 0 0 ? Z 19:02 0:00 [ssl_client]
root 2496895 0.0 0.0 0 0 ? Z 19:03 0:00 [ssl_client]
root 2496896 0.0 0.0 0 0 ? Z 19:03 0:00 [ssl_client]
root 2497386 0.0 0.0 0 0 ? Z 19:04 0:00 [ssl_client]
root 2497387 0.0 0.0 0 0 ? Z 19:04 0:00 [ssl_client]

Same in

2023/03/31 23:00:37.289831 [info] AdGuard Home, version v0.108.0-a.494+1731ce9c

root 2726031 0.0 0.0 0 0 ? Z 23:01 0:00 [ssl_client]
root 2726046 0.0 0.0 0 0 ? Z 23:01 0:00 [ssl_client]
root 2726798 0.1 0.0 0 0 ? Z 23:01 0:00 [ssl_client]
root 2726799 0.1 0.0 0 0 ? Z 23:01 0:00 [ssl_client]
root 2727385 0.2 0.0 0 0 ? Z 23:02 0:00 [ssl_client]
root 2727386 0.2 0.0 0 0 ? Z 23:02 0:00 [ssl_client]

Thank you!

adguard pushed a commit that referenced this issue Mar 31, 2023
Merge in DNS/adguard-home from 3290-kill-zombies to master

Updates #3290.

Squashed commit of the following:

commit 3e06260
Merge: 5aa7aa4 1731ce9
Author: Eugene Burkov
Date:   Fri Mar 31 20:04:04 2023 +0500

    Merge branch 'master' into 3290-kill-zombies

commit 5aa7aa4
Author: Eugene Burkov
Date:   Fri Mar 31 16:38:00 2023 +0500

    docker: add doc

commit 52a0b67
Author: Eugene Burkov
Date:   Fri Mar 31 14:41:41 2023 +0500

    docker: add init emulator
@leo15dev

I think AdGuard Home, version v0.108.0-a.495+f191cb07 fixed it, thank you!

@EugeneOne1
Member

@leo15dev, great, thanks for your help. I suppose the issue can be closed for now. Please feel free to open new ones for any problems you encounter.

@agneevX
Contributor Author

agneevX commented Apr 15, 2023

I sort of regret creating this issue as I've learnt over time that Docker healthchecks create unexpected issues and do more harm than good.

  1. https://forums.unraid.net/topic/110999-guide-on-how-to-stop-excessive-writes-destroying-your-cache-ssd/
  2. https://forum.openmediavault.org/index.php?thread/38480-plex-pms-container-creates-constant-disk-access-load/

I therefore ask that this feature be removed or that docs be updated to mention issues with this and how it can interfere with something as critical as DNS.

@chennin

chennin commented Aug 21, 2023

What in the world is the random string of letters in the ?dns= query and how do I construct it myself?

This is apparently hard to find an answer to, but I found your question via Google when I had the same question, so here's the answer for anyone who needs it.

It seems Adguard doesn't take a name/type parameter. But here's how to craft the dns parameter it does take in one Python line. Note: requires dnspython >= 2.2.0, or for you to remove the ,id=0 piece. Change example.com as you wish.

    DOMAIN="example.com" python3 -c "import os, dns.message, base64; req = dns.message.make_query(os.environ['DOMAIN'], 'A', id=0).to_wire(); print(base64.urlsafe_b64encode(req).decode('utf-8').rstrip('='))"

Note that the response will be binary, so don't fetch it with curl and print it to a terminal. Interpreting the response can also be done in Python, example here (use requests to get the response, then operate on r.content).

(reference re padding and id: here and on)
