Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial commitment of clickhouse official image #15846

Merged
merged 6 commits into from
Nov 8, 2024

Conversation

Felixoid
Copy link
Contributor

@Felixoid Felixoid commented Dec 6, 2023

Hello, dear official library team. Here's our attempt to create an official image clickhouse in the scope of #14136 and ClickHouse/ClickHouse#31473

Both the docs and official images PRs are created simultaneously.

docker-library/docs#2397

Checklist for Review

NOTE: This checklist is intended for the use of the Official Images maintainers both to track the status of your PR and to help inform you and others of where we're at. As such, please leave the "checking" of items to the repository maintainers. If there is a point below for which you would like to provide additional information or note completion, please do so by commenting on the PR. Thanks! (and thanks for staying patient with us ❤️)

  • associated with or contacted upstream?
  • available under an OSI-approved license?
  • does it fit into one of the common categories? ("service", "language stack", "base distribution")
  • is it reasonably popular, or does it solve a particular use case well?
  • does a documentation PR exist? (should be reviewed and merged at roughly the same time so that we don't have an empty image page on the Hub for very long) Initial commitment of documentation for clickhouse official image docs#2397
  • official-images maintainer dockerization review for best practices and cache gotchas/improvements (ala the official review guidelines)?
  • 2+ official-images maintainer dockerization review?
  • existing official images have been considered as a base? (ie, if foobar needs Node.js, has FROM node:... instead of grabbing node via other means been considered?)
  • if FROM scratch, tarballs only exist in a single commit within the associated history?
  • passes current tests? any simple new tests that might be appropriate to add? (https://github.com/docker-library/official-images/tree/master/test)

@Felixoid
Copy link
Contributor Author

Felixoid commented Dec 6, 2023

This PR is still in draft mode to adjust the automation in ClickHouse/ClickHouse#57203

The branch now is in the WIP mode, so it will be regenerated to the refs/heads/master before the merge

@Felixoid
Copy link
Contributor Author

Felixoid commented Dec 6, 2023

Interesting case, I am addressing it in our master, which will regenerate the LDF

@tianon
Copy link
Member

tianon commented Dec 7, 2023

Unfortunately having the Dockerfile in the root of the upstream repository (while useful for local development) doesn't work well as the target for official-images. Since the Dockerfile uses COPY . ..., it essentially means the entire upstream source code is now part of the Docker build context, and thus becomes part of the reviewed artifacts (see diff-pr.sh for how we generate our diff; it is basically taking all docker build contexts from the starting library/foo file and diffing them to the new contexts from the PR).

This is why the "Diff Comment" CI job is failing 🙈

@tianon
Copy link
Member

tianon commented Dec 7, 2023

Apologies, I misread the PR -- I think that might actually be failing because it's backfilling too many versions 😅

@tianon
Copy link
Member

tianon commented Dec 7, 2023

Are all of these major.minor combinations still actively supported? (By which I mean that if they had a problem, there would be a new release/update?)

@Felixoid
Copy link
Contributor Author

Felixoid commented Dec 8, 2023

Hi Tianon, thanks for the review!

Indeed, the failure of Diff Comment is about the POST body.

Are all of these major.minor combinations still actively supported? (By which I mean that if they had a problem, there would be a new release/update?)

No, the file for currently supported versions is the following:

# The file is generated by https://github.com/ClickHouse/ClickHouse/blob/e2f425011c589c5e6d8b1d4f67ccd7a8730158a4/tests/ci/official_docker.py

Maintainers: Misha f. Shiryaev <[email protected]> (@Felixoid),
             Max Kainov <[email protected]> (@mkaynov),
             Alexander Sapin <[email protected]> (@alesapin)
GitRepo: https://github.com/ClickHouse/ClickHouse.git
GitFetch: refs/heads/docker-library
GitCommit: 43208ec354e4873f59508c45b81204b9f7bf439e

Tags: alpine, 23-alpine, 23.11-alpine, 23.11.1-alpine, 23.11.1.2711-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.11.1.2711
File: Dockerfile.alpine

Tags: latest, focal, 23, 23-focal, 23.11, 23.11-focal, 23.11.1, 23.11.1-focal, 23.11.1.2711, 23.11.1.2711-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.11.1.2711
File: Dockerfile.ubuntu

Tags: 23.10-alpine, 23.10.5-alpine, 23.10.5.20-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.10.5.20
File: Dockerfile.alpine

Tags: 23.10, 23.10-focal, 23.10.5, 23.10.5-focal, 23.10.5.20, 23.10.5.20-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.10.5.20
File: Dockerfile.ubuntu

Tags: 23.9-alpine, 23.9.6-alpine, 23.9.6.20-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.9.6.20
File: Dockerfile.alpine

Tags: 23.9, 23.9-focal, 23.9.6, 23.9.6-focal, 23.9.6.20, 23.9.6.20-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.9.6.20
File: Dockerfile.ubuntu

Tags: alpine-lts, 23.8-alpine, 23.8.8-alpine, 23.8.8.20-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.8.8.20
File: Dockerfile.alpine

Tags: lts, focal-lts, 23.8, 23.8-focal, 23.8.8, 23.8.8-focal, 23.8.8.20, 23.8.8.20-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.8.8.20
File: Dockerfile.ubuntu

Tags: 23.3-alpine, 23.3.18-alpine, 23.3.18.15-alpine
Architectures: amd64, arm64v8
Directory: docker/official/server/23.3.18.15
File: Dockerfile.alpine

Tags: 23.3, 23.3-focal, 23.3.18, 23.3.18-focal, 23.3.18.15, 23.3.18.15-focal
Architectures: amd64, arm64v8
Directory: docker/official/server/23.3.18.15
File: Dockerfile.ubuntu

I've generated all the versions that could be built at the moment, so following the next sentence from README:

When a new repository is proposed, it is common to include some older unsupported versions in the initial pull request with the agreement to remove them right after acceptance.

It's pretty easy to rebuild the file to any version, so I am very open to advice on what to use as the minimal one.

@tianon
Copy link
Member

tianon commented Dec 11, 2023

Ah, let's start with just 23.11 (unless there's a significant difference in the Dockerization for older versions) and see if that gets the diff small enough. If we start there and the delta between that and other versions is small, then future diffs will be small too because they'll all diff against the merged version instead of all versions being considered brand new code.

@tianon
Copy link
Member

tianon commented Dec 11, 2023

For a small bit of initial Dockerization review, you'll want to take a look at #10794 (Alpine + glibc is going to be a non-starter).

Additionally, having /bin/sh or /bin/bash as PID1 while the server is running (speaking of the while true loop I see in the entrypoint which is effectively acting as a poor-man's process supervisor) is going to be a very steep uphill battle to convince us it's stable and reliable, especially in the face of receiving signals and error handling / process death edge cases.

Beyond that, please be patient -- new image review requires much higher maintainer bandwidth than update review does, so we do have a backlog.

@Felixoid
Copy link
Contributor Author

Felixoid commented Dec 11, 2023

Ah, let's start with just 23.11

Yep, all versions have the same content now

For a small bit of initial Dockerization review, you'll want to take a look at #10794 (Alpine + glibc is going to be a non-starter).

I think it's related to the part COPY --from=glibc-donor? So, it looks like we won't be able to build alpine images. Looks fair to me, we can take them off the discussion

Additionally, having /bin/sh or /bin/bash as PID1 while the server is running (speaking of the while true loop I see in the entrypoint which is effectively acting as a poor-man's process supervisor) is going to be a very steep uphill battle to convince us it's stable and reliable, especially in the face of receiving signals and error handling / process death edge cases.

I feel you are referencing https://github.com/ClickHouse/ClickHouse/blob/master/docker/server/entrypoint.sh#L132-L139. It's the bringing the clickhouse-server up for the initial setup of DB. And it has the fuse to check that the server is actually finished after it. The real server works as the only process in the container after exec command. It's similar to docker_temp_server_start, docker_process_init_files, docker_temp_server_stop

Beyond that, please be patient -- new image review requires much higher maintainer bandwidth than update review does, so we do have a backlog.

Thank you! It's very nice of you to have a transparent process 🙏

This comment has been minimized.

@Felixoid Felixoid marked this pull request as ready for review January 8, 2024 11:48
@Felixoid
Copy link
Contributor Author

Felixoid commented Jan 8, 2024

Happy New Year, dear colleagues!

Is there anything we could do from our site to boost the review, maybe?

@Felixoid
Copy link
Contributor Author

Felixoid commented Feb 9, 2024

Hello, dear @tianon @yosifkit. It's already two months since the latest update. Any feedback?

@Felixoid
Copy link
Contributor Author

Colleagues? It's already three months since the last review round. Is it even possible to add the image to the library?

@melvynator
Copy link

@tianon Is it possible to have an estimate of when this PR can be reviewed?

@whalelines
Copy link
Contributor

We cannot give an estimate, but we will review this PR as soon as we can. Reviewing new DOI submissions is quite time intensive and we have to balance that commitment with ensuring the normal flow of updates to existing DOI.

It can help to speed things up if you ensure your PR adheres to our contribution guidelines, https://github.com/docker-library/official-images?tab=readme-ov-file#contributing-to-the-standard-library .

We apologize for the delay and truly appreciate your patience through this process.

This comment has been minimized.

@tianon
Copy link
Member

tianon commented Nov 7, 2024

Nice, thank you!

Regarding RUN if ! clickhouse local -q "SELECT ''" > /dev/null 2>&1; then, what if we swapped it to something like if true; then so it still matches really obviously for you, and is then obviously a no-op for us? Would that be an acceptable middle ground?

The last thing I see while re-reviewing (that I thought I'd included in my last round but I guess I missed!) was this block:

        mkdir="clickhouse su ""${USER}:${GROUP}"" mkdir"
    fi
    if ! $mkdir -p "$dir"; then

100% not a blocker, but if you wanted this to quote $USER:$GROUP properly in the actual mkdir invocation you'll want to use eval, something like this instead:

        mkdir='clickhouse su "${USER}:${GROUP}" mkdir'
    fi
    mkdir+=' -p "$dir"'
    if ! eval "$mkdir"; then

or, if you prefer, Bash arrays (which shellcheck might be less loud about):

        mkdir=( mkdir )
...
        mkdir=( clickhouse su "${USER}:${GROUP}" mkdir )
    fi
    if ! "${mkdir[@]}" -p "$dir"; then

Thank you for your patience overall on this -- I'll take a look at the docs PR now. 👀

@Felixoid
Copy link
Contributor Author

Felixoid commented Nov 7, 2024

Regarding RUN if ! clickhouse local -q "SELECT ''" > /dev/null 2>&1; then, what if we swapped it to something like if true; then so it still matches really obviously for you, and is then obviously a no-op for us? Would that be an acceptable middle ground?

Hmm, what about I change it to the other way around? clickhouse local -q "SELECT ''" && exit 0 || : ; \. So, it will work for us, and the RUN block won't be executed for special cases. And it looks a bit less dirty.

mkdir=( clickhouse su "${USER}:${GROUP}" mkdir ) looks like an excellent bashism to me. I'll add it to the review-related PR.

This comment has been minimized.

@Felixoid
Copy link
Contributor Author

Felixoid commented Nov 7, 2024

I've found when and how CLICKHOUSE_UID/CLICKHOUSE_GID was implemented ClickHouse/ClickHouse#11003

Not sure what exactly the issue was with openshift and if we can break it badly.

@tianon
Copy link
Member

tianon commented Nov 7, 2024

OpenShift runs the image with a totally random UID (--user) -- it looks like the use case for that was to run the image as root explicitly to support folks in user namespaces / "rootless mode" where root is mapped to their user on the host, but that's also arguably not a configuration I'd personally recommend (UID 0 in the kernel is still a special case in a lot of places/ways, so non-0 is suggested even in user namespaces). Perhaps a "yes I really want to run the image as root and not switch to the clickhouse user" variable would be sufficient?

Re clickhouse local, I can live with that change, thanks! ❤️

@Felixoid
Copy link
Contributor Author

Felixoid commented Nov 7, 2024

Perhaps a "yes I really want to run the image as root and not switch to the clickhouse user" variable would be sufficient?

So, should we replace CLICKHOUSE_UID/CLICKHOUSE_GID with CLICKHOUSE_RUN_AS_ROOT? That sounds fair and is a pretty solid implementation. Otherwise, people should use the --user argument to set the container UID/GID.

@tianon
Copy link
Member

tianon commented Nov 7, 2024

Yeah, that sounds a lot cleaner IMO (and pushes users to the more secure behavior) 👍 🚀

This comment has been minimized.

Copy link

github-actions bot commented Nov 8, 2024

Diff for fbf8b51:
diff --git a/_bashbrew-arches b/_bashbrew-arches
index 8b13789..e85a97f 100644
--- a/_bashbrew-arches
+++ b/_bashbrew-arches
@@ -1 +1,2 @@
-
+amd64
+arm64v8
diff --git a/_bashbrew-cat b/_bashbrew-cat
index bdfae4a..37c2135 100644
--- a/_bashbrew-cat
+++ b/_bashbrew-cat
@@ -1 +1,9 @@
-Maintainers: New Image! :D (@docker-library-bot)
+Maintainers: Misha f. Shiryaev <[email protected]> (@Felixoid), Max Kainov <[email protected]> (@mkaynov), Alexander Sapin <[email protected]> (@alesapin)
+GitRepo: https://github.com/ClickHouse/docker-library.git
+GitFetch: refs/heads/main
+GitCommit: 8f3278549dd6ac2832287c18c6a35a5f90eb3394
+
+Tags: latest, jammy, 24, 24-jammy, 24.10, 24.10-jammy, 24.10.1, 24.10.1-jammy, 24.10.1.2812, 24.10.1.2812-jammy
+Architectures: amd64, arm64v8
+Directory: server/24.10.1.2812
+File: Dockerfile.ubuntu
diff --git a/_bashbrew-list b/_bashbrew-list
index e69de29..9047fc8 100644
--- a/_bashbrew-list
+++ b/_bashbrew-list
@@ -0,0 +1,10 @@
+clickhouse:24
+clickhouse:24-jammy
+clickhouse:24.10
+clickhouse:24.10-jammy
+clickhouse:24.10.1
+clickhouse:24.10.1-jammy
+clickhouse:24.10.1.2812
+clickhouse:24.10.1.2812-jammy
+clickhouse:jammy
+clickhouse:latest
diff --git a/_bashbrew-list-build-order b/_bashbrew-list-build-order
index e69de29..669c034 100644
--- a/_bashbrew-list-build-order
+++ b/_bashbrew-list-build-order
@@ -0,0 +1 @@
+clickhouse:24.10.1.2812-jammy
diff --git a/clickhouse_24.10.1.2812-jammy/Dockerfile.ubuntu b/clickhouse_24.10.1.2812-jammy/Dockerfile.ubuntu
new file mode 100644
index 0000000..54b9722
--- /dev/null
+++ b/clickhouse_24.10.1.2812-jammy/Dockerfile.ubuntu
@@ -0,0 +1,84 @@
+FROM ubuntu:22.04
+
+# see https://github.com/moby/moby/issues/4032#issuecomment-192327844
+# It could be removed after we move on a version 23:04+
+ARG DEBIAN_FRONTEND=noninteractive
+
+# ARG for quick switch to a given ubuntu mirror
+ARG apt_archive="http://archive.ubuntu.com"
+
+# We shouldn't use `apt upgrade` to not change the upstream image. It's updated biweekly
+
+# user/group precreated explicitly with fixed uid/gid on purpose.
+# It is especially important for rootless containers: in that case entrypoint
+# can't do chown and owners of mounted volumes should be configured externally.
+# We do that in advance at the begining of Dockerfile before any packages will be
+# installed to prevent picking those uid / gid by some unrelated software.
+# The same uid / gid (101) is used both for alpine and ubuntu.
+RUN sed -i "s|http://archive.ubuntu.com|${apt_archive}|g" /etc/apt/sources.list \
+    && groupadd -r clickhouse --gid=101 \
+    && useradd -r -g clickhouse --uid=101 --home-dir=/var/lib/clickhouse --shell=/bin/bash clickhouse \
+    && apt-get update \
+    && apt-get install --yes --no-install-recommends \
+        ca-certificates \
+        locales \
+        tzdata \
+        wget \
+    && rm -rf /var/lib/apt/lists/* /var/cache/debconf /tmp/*
+
+ARG REPO_CHANNEL="stable"
+ARG REPOSITORY="deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb ${REPO_CHANNEL} main"
+ARG VERSION="24.10.1.2812"
+ARG PACKAGES="clickhouse-client clickhouse-server clickhouse-common-static"
+
+
+# A fallback to installation from ClickHouse repository
+# It works unless the clickhouse binary already exists
+RUN clickhouse local -q 'SELECT 1' >/dev/null 2>&1 && exit 0 || : \
+    ; apt-get update \
+    && apt-get install --yes --no-install-recommends \
+        dirmngr \
+        gnupg2 \
+    && mkdir -p /etc/apt/sources.list.d \
+    && GNUPGHOME=$(mktemp -d) \
+    && GNUPGHOME="$GNUPGHOME" gpg --batch --no-default-keyring \
+        --keyring /usr/share/keyrings/clickhouse-keyring.gpg \
+        --keyserver hkp://keyserver.ubuntu.com:80 \
+        --recv-keys 3a9ea1193a97b548be1457d48919f6bd2b48d754 \
+    && rm -rf "$GNUPGHOME" \
+    && chmod +r /usr/share/keyrings/clickhouse-keyring.gpg \
+    && echo "${REPOSITORY}" > /etc/apt/sources.list.d/clickhouse.list \
+    && echo "installing from repository: ${REPOSITORY}" \
+    && apt-get update \
+    && for package in ${PACKAGES}; do \
+        packages="${packages} ${package}=${VERSION}" \
+    ; done \
+    && apt-get install --yes --no-install-recommends ${packages} || exit 1 \
+    && rm -rf \
+        /var/lib/apt/lists/* \
+        /var/cache/debconf \
+        /tmp/* \
+    && apt-get autoremove --purge -yq dirmngr gnupg2
+
+# post install
+# we need to allow "others" access to clickhouse folder, because docker container
+# can be started with arbitrary uid (openshift usecase)
+RUN clickhouse-local -q 'SELECT * FROM system.build_options' \
+    && mkdir -p /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client \
+    && chmod ugo+Xrw -R /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client
+
+RUN locale-gen en_US.UTF-8
+ENV LANG en_US.UTF-8
+ENV TZ UTC
+
+RUN mkdir /docker-entrypoint-initdb.d
+
+COPY docker_related_config.xml /etc/clickhouse-server/config.d/
+COPY entrypoint.sh /entrypoint.sh
+
+EXPOSE 9000 8123 9009
+VOLUME /var/lib/clickhouse
+
+ENV CLICKHOUSE_CONFIG /etc/clickhouse-server/config.xml
+
+ENTRYPOINT ["/entrypoint.sh"]
diff --git a/clickhouse_24.10.1.2812-jammy/docker_related_config.xml b/clickhouse_24.10.1.2812-jammy/docker_related_config.xml
new file mode 100644
index 0000000..3025dc2
--- /dev/null
+++ b/clickhouse_24.10.1.2812-jammy/docker_related_config.xml
@@ -0,0 +1,12 @@
+<clickhouse>
+     <!-- Listen wildcard address to allow accepting connections from other containers and host network. -->
+    <listen_host>::</listen_host>
+    <listen_host>0.0.0.0</listen_host>
+    <listen_try>1</listen_try>
+
+    <!--
+    <logger>
+        <console>1</console>
+    </logger>
+    -->
+</clickhouse>
diff --git a/clickhouse_24.10.1.2812-jammy/entrypoint.sh b/clickhouse_24.10.1.2812-jammy/entrypoint.sh
new file mode 100755
index 0000000..2f87008
--- /dev/null
+++ b/clickhouse_24.10.1.2812-jammy/entrypoint.sh
@@ -0,0 +1,222 @@
+#!/bin/bash
+
+set -eo pipefail
+shopt -s nullglob
+
+DO_CHOWN=1
+if [[ "${CLICKHOUSE_RUN_AS_ROOT:=0}" = "1" || "${CLICKHOUSE_DO_NOT_CHOWN:-0}" = "1" ]]; then
+    DO_CHOWN=0
+fi
+
+# CLICKHOUSE_UID and CLICKHOUSE_GID are kept for backward compatibility, but deprecated
+# One must use either "docker run --user" or CLICKHOUSE_RUN_AS_ROOT=1 to run the process as
+# FIXME: Remove ALL CLICKHOUSE_UID CLICKHOUSE_GID before 25.3
+if [[ "${CLICKHOUSE_UID:-}" || "${CLICKHOUSE_GID:-}" ]]; then
+    echo 'WARNING: Support for CLICKHOUSE_UID/CLICKHOUSE_GID will be removed in a couple of releases.' >&2
+    echo 'WARNING: Either use a proper "docker run --user=xxx:xxxx" argument instead of CLICKHOUSE_UID/CLICKHOUSE_GID' >&2
+    echo 'WARNING: or set "CLICKHOUSE_RUN_AS_ROOT=1" ENV to run the clickhouse-server as root:root' >&2
+fi
+
+# support `docker run --user=xxx:xxxx`
+if [[ "$(id -u)" = "0" ]]; then
+    if [[ "$CLICKHOUSE_RUN_AS_ROOT" = 1 ]]; then
+        USER=0
+        GROUP=0
+    else
+        USER="${CLICKHOUSE_UID:-"$(id -u clickhouse)"}"
+        GROUP="${CLICKHOUSE_GID:-"$(id -g clickhouse)"}"
+    fi
+else
+    USER="$(id -u)"
+    GROUP="$(id -g)"
+    DO_CHOWN=0
+fi
+
+# set some vars
+CLICKHOUSE_CONFIG="${CLICKHOUSE_CONFIG:-/etc/clickhouse-server/config.xml}"
+
+# get CH directories locations
+DATA_DIR="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=path || true)"
+TMP_DIR="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=tmp_path || true)"
+USER_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=user_files_path || true)"
+LOG_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=logger.log || true)"
+LOG_DIR=""
+if [ -n "$LOG_PATH" ]; then LOG_DIR="$(dirname "$LOG_PATH")"; fi
+ERROR_LOG_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=logger.errorlog || true)"
+ERROR_LOG_DIR=""
+if [ -n "$ERROR_LOG_PATH" ]; then ERROR_LOG_DIR="$(dirname "$ERROR_LOG_PATH")"; fi
+FORMAT_SCHEMA_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=format_schema_path || true)"
+
+# There could be many disks declared in config
+readarray -t DISKS_PATHS < <(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key='storage_configuration.disks.*.path' || true)
+readarray -t DISKS_METADATA_PATHS < <(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key='storage_configuration.disks.*.metadata_path' || true)
+
+CLICKHOUSE_USER="${CLICKHOUSE_USER:-default}"
+CLICKHOUSE_PASSWORD_FILE="${CLICKHOUSE_PASSWORD_FILE:-}"
+if [[ -n "${CLICKHOUSE_PASSWORD_FILE}" && -f "${CLICKHOUSE_PASSWORD_FILE}" ]]; then
+  CLICKHOUSE_PASSWORD="$(cat "${CLICKHOUSE_PASSWORD_FILE}")"
+fi
+CLICKHOUSE_PASSWORD="${CLICKHOUSE_PASSWORD:-}"
+CLICKHOUSE_DB="${CLICKHOUSE_DB:-}"
+CLICKHOUSE_ACCESS_MANAGEMENT="${CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT:-0}"
+
+function create_directory_and_do_chown() {
+    local dir=$1
+    # check if variable not empty
+    [ -z "$dir" ] && return
+    # ensure directories exist
+    if [ "$DO_CHOWN" = "1" ]; then
+        mkdir=( mkdir )
+    else
+        # if DO_CHOWN=0 it means that the system does not map root user to "admin" permissions
+        # it mainly happens on NFS mounts where root==nobody for security reasons
+        # thus mkdir MUST run with user id/gid and not from nobody that has zero permissions
+        mkdir=( clickhouse su "${USER}:${GROUP}" mkdir )
+    fi
+    if ! "${mkdir[@]}" -p "$dir"; then
+        echo "Couldn't create necessary directory: $dir"
+        exit 1
+    fi
+
+    if [ "$DO_CHOWN" = "1" ]; then
+        # ensure proper directories permissions
+        # but skip it for if directory already has proper premissions, cause recursive chown may be slow
+        if [ "$(stat -c %u "$dir")" != "$USER" ] || [ "$(stat -c %g "$dir")" != "$GROUP" ]; then
+            chown -R "$USER:$GROUP" "$dir"
+        fi
+    fi
+}
+
+create_directory_and_do_chown "$DATA_DIR"
+
+# Change working directory to $DATA_DIR in case there're paths relative to $DATA_DIR, also avoids running
+# clickhouse-server at root directory.
+cd "$DATA_DIR"
+
+for dir in "$ERROR_LOG_DIR" \
+  "$LOG_DIR" \
+  "$TMP_DIR" \
+  "$USER_PATH" \
+  "$FORMAT_SCHEMA_PATH" \
+  "${DISKS_PATHS[@]}" \
+  "${DISKS_METADATA_PATHS[@]}"
+do
+    create_directory_and_do_chown "$dir"
+done
+
+# if clickhouse user is defined - create it (user "default" already exists out of box)
+if [ -n "$CLICKHOUSE_USER" ] && [ "$CLICKHOUSE_USER" != "default" ] || [ -n "$CLICKHOUSE_PASSWORD" ] || [ "$CLICKHOUSE_ACCESS_MANAGEMENT" != "0" ]; then
+    echo "$0: create new user '$CLICKHOUSE_USER' instead 'default'"
+    cat <<EOT > /etc/clickhouse-server/users.d/default-user.xml
+    <clickhouse>
+      <!-- Docs: <https://clickhouse.com/docs/en/operations/settings/settings_users/> -->
+      <users>
+        <!-- Remove default user -->
+        <default remove="remove">
+        </default>
+
+        <${CLICKHOUSE_USER}>
+          <profile>default</profile>
+          <networks>
+            <ip>::/0</ip>
+          </networks>
+          <password><![CDATA[${CLICKHOUSE_PASSWORD//]]>/]]]]><![CDATA[>}]]></password>
+          <quota>default</quota>
+          <access_management>${CLICKHOUSE_ACCESS_MANAGEMENT}</access_management>
+        </${CLICKHOUSE_USER}>
+      </users>
+    </clickhouse>
+EOT
+fi
+
+CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS="${CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS:-}"
+
+# checking $DATA_DIR for initialization
+if [ -d "${DATA_DIR%/}/data" ]; then
+    DATABASE_ALREADY_EXISTS='true'
+fi
+
+# run initialization if flag CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS is not empty or data directory is empty
+if [[ -n "${CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS}" || -z "${DATABASE_ALREADY_EXISTS}" ]]; then
+  RUN_INITDB_SCRIPTS='true'
+fi
+
+if [ -n "${RUN_INITDB_SCRIPTS}" ]; then
+    if [ -n "$(ls /docker-entrypoint-initdb.d/)" ] || [ -n "$CLICKHOUSE_DB" ]; then
+        # port is needed to check if clickhouse-server is ready for connections
+        HTTP_PORT="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=http_port --try)"
+        HTTPS_PORT="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=https_port --try)"
+
+        if [ -n "$HTTP_PORT" ]; then
+            URL="http://127.0.0.1:$HTTP_PORT/ping"
+        else
+            URL="https://127.0.0.1:$HTTPS_PORT/ping"
+        fi
+
+        # Listen only on localhost until the initialization is done
+        clickhouse su "${USER}:${GROUP}" clickhouse-server --config-file="$CLICKHOUSE_CONFIG" -- --listen_host=127.0.0.1 &
+        pid="$!"
+
+        # check if clickhouse is ready to accept connections
+        # will try to send ping clickhouse via http_port (max 1000 retries by default, with 1 sec timeout and 1 sec delay between retries)
+        tries=${CLICKHOUSE_INIT_TIMEOUT:-1000}
+        while ! wget --spider --no-check-certificate -T 1 -q "$URL" 2>/dev/null; do
+            if [ "$tries" -le "0" ]; then
+                echo >&2 'ClickHouse init process failed.'
+                exit 1
+            fi
+            tries=$(( tries-1 ))
+            sleep 1
+        done
+
+        clickhouseclient=( clickhouse-client --multiquery --host "127.0.0.1" -u "$CLICKHOUSE_USER" --password "$CLICKHOUSE_PASSWORD" )
+
+        echo
+
+        # create default database, if defined
+        if [ -n "$CLICKHOUSE_DB" ]; then
+            echo "$0: create database '$CLICKHOUSE_DB'"
+            "${clickhouseclient[@]}" -q "CREATE DATABASE IF NOT EXISTS $CLICKHOUSE_DB";
+        fi
+
+        for f in /docker-entrypoint-initdb.d/*; do
+            case "$f" in
+                *.sh)
+                    if [ -x "$f" ]; then
+                        echo "$0: running $f"
+                        "$f"
+                    else
+                        echo "$0: sourcing $f"
+                        # shellcheck source=/dev/null
+                        . "$f"
+                    fi
+                    ;;
+                *.sql)    echo "$0: running $f"; "${clickhouseclient[@]}" < "$f" ; echo ;;
+                *.sql.gz) echo "$0: running $f"; gunzip -c "$f" | "${clickhouseclient[@]}"; echo ;;
+                *)        echo "$0: ignoring $f" ;;
+            esac
+            echo
+        done
+
+        if ! kill -s TERM "$pid" || ! wait "$pid"; then
+            echo >&2 'Finishing of ClickHouse init process failed.'
+            exit 1
+        fi
+    fi
+else
+    echo "ClickHouse Database directory appears to contain a database; Skipping initialization"
+fi
+
+# if no args passed to `docker run` or first argument start with `--`, then the user is passing clickhouse-server arguments
+if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then
+    # Watchdog is launched by default, but does not send SIGINT to the main process,
+    # so the container can't be finished by ctrl+c
+    CLICKHOUSE_WATCHDOG_ENABLE=${CLICKHOUSE_WATCHDOG_ENABLE:-0}
+    export CLICKHOUSE_WATCHDOG_ENABLE
+
+    # This replaces the shell script with the server:
+    exec clickhouse su "${USER}:${GROUP}" clickhouse-server --config-file="$CLICKHOUSE_CONFIG" "$@"
+fi
+
+# Otherwise, we assume the user want to run his own process, for example a `bash` shell to explore this image
+exec "$@"

Copy link
Member

@tianon tianon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing docker-library/docs#2397 (and we need to merge both together), but this looks solid IMO now ❤️

@Felixoid
Copy link
Contributor Author

Felixoid commented Nov 8, 2024

Should we add more images from the supported and/or historical branches in another PR?

@tianon
Copy link
Member

tianon commented Nov 8, 2024

Yep, let's do anything else in a follow-up; don't jinx it 😂 ❤️

(from our side, I just want @yosifkit to give this another set of eyes before we pull the trigger)

@yosifkit
Copy link
Member

yosifkit commented Nov 8, 2024

&& chmod ugo+Xrw -R /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client

To minimize duplication across layers, chmod's should be done in the layer that creates the file/folder. Some of these folders are empty or new this layer, but others (/etc/clickhouse-*) have files that come from an earlier layer. Not a blocker this time, but I'd recommend an improvement to chmod them in the RUN layer in which they are created.

@yosifkit yosifkit merged commit 8448627 into docker-library:master Nov 8, 2024
6 checks passed
@Felixoid
Copy link
Contributor Author

Felixoid commented Nov 9, 2024

I can't say how grateful we are for this merge!

Seeing it alive is so satisfying

> docker pull clickhouse
Using default tag: latest
latest: Pulling from library/clickhouse
6414378b6477: Already exists
178d6bab72f0: Pull complete
a239d72431be: Pull complete
d9916e2f35d3: Pull complete
9762342e3969: Pull complete
3683edf9dd01: Pull complete
30f5422ae8a5: Pull complete
c5d92e4d9909: Pull complete
Digest: sha256:b7b9951c19ff195b1e6b832ef8679a46605a96ee5ec36cf8d94f199c30d810b8
Status: Downloaded newer image for clickhouse:latest
docker.io/library/clickhouse:latest

I am preparing the follow-up to supported releases

@Felixoid
Copy link
Contributor Author

Felixoid commented Nov 9, 2024

Maybe the thing worth to leave solved here is the chmod

I've tested it, and the layer has ~100kB of data. It's not nice suboptimal.

On the other hand, we have a lot of different installation steps that aren't in the DOI Dockerfile.

Do you consider the following change as enough compromise to keep the line && chmod ugo+Xrw -R /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client as is?

diff --git a/docker/server/Dockerfile.ubuntu b/docker/server/Dockerfile.ubuntu
index 0fe9a409ee4..e6bde845c4e 100644
--- a/docker/server/Dockerfile.ubuntu
+++ b/docker/server/Dockerfile.ubuntu
@@ -113,7 +113,9 @@ RUN clickhouse local -q 'SELECT 1' >/dev/null 2>&1 && exit 0 || : \
         /var/lib/apt/lists/* \
         /var/cache/debconf \
         /tmp/* \
-    && apt-get autoremove --purge -yq dirmngr gnupg2
+    && apt-get autoremove --purge -yq dirmngr gnupg2 \
+    && chmod ugo+Xrw -R /etc/clickhouse-server /etc/clickhouse-client
+# The last chmod is here to make the next one is No-op in docker official library Dockerfile

I checked the docker history, and with this change the layer has 0B of diff

> docker history --no-trunc clickhouse-server:latest | grep chmod
<missing>                                                                 5 seconds ago   RUN |10 DEBIAN_FRONTEND=noninteractive apt_archive=http://archive.ubuntu.com REPO_CHANNEL=stable REPOSITORY=deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main VERSION=24.10.1.2812 PACKAGES=clickhouse-client clickhouse-server clickhouse-common-static deb_location_url= DIRECT_DOWNLOAD_URLS= single_binary_location_url= TARGETARCH=amd64 /bin/sh -c clickhouse-local -q 'SELECT * FROM system.build_options'     && mkdir -p /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client     && chmod ugo+Xrw -R /var/lib/clickhouse /var/log/clickhouse-server /etc/clickhouse-server /etc/clickhouse-client # buildkit                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           0B        buildkit.dockerfile.v0
<missing>                                                                 5 seconds ago   RUN |10 DEBIAN_FRONTEND=noninteractive apt_archive=http://archive.ubuntu.com REPO_CHANNEL=stable REPOSITORY=deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main VERSION=24.10.1.2812 PACKAGES=clickhouse-client clickhouse-server clickhouse-common-static deb_location_url= DIRECT_DOWNLOAD_URLS= single_binary_location_url= TARGETARCH=amd64 /bin/sh -c clickhouse local -q 'SELECT 1' >/dev/null 2>&1 && exit 0 || :     ; apt-get update     && apt-get install --yes --no-install-recommends         dirmngr         gnupg2     && mkdir -p /etc/apt/sources.list.d     && GNUPGHOME=$(mktemp -d)     && GNUPGHOME="$GNUPGHOME" gpg --batch --no-default-keyring         --keyring /usr/share/keyrings/clickhouse-keyring.gpg         --keyserver hkp://keyserver.ubuntu.com:80         --recv-keys 3a9ea1193a97b548be1457d48919f6bd2b48d754     && rm -rf "$GNUPGHOME"     && chmod +r /usr/share/keyrings/clickhouse-keyring.gpg     && echo "${REPOSITORY}" > /etc/apt/sources.list.d/clickhouse.list    && echo "installing from repository: ${REPOSITORY}"     && apt-get update     && for package in ${PACKAGES}; do         packages="${packages} ${package}=${VERSION}"     ; done     && apt-get install --yes --no-install-recommends ${packages} || exit 1     && rm -rf         /var/lib/apt/lists/*         /var/cache/debconf         /tmp/*     && apt-get autoremove --purge -yq dirmngr gnupg2     && chmod ugo+Xrw -R /etc/clickhouse-server /etc/clickhouse-client # buildkit   517MB     buildkit.dockerfile.v0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants