re-implement daemonize #8011

richvdh · 2020-07-31T17:28:54Z

This has long been something I've wanted to do. Basically the Daemonize code
is both too flexible and not flexible enough, in that it offers a bunch of
features that we don't use (changing UID, closing FDs in the child, logging to
syslog) and doesn't offer a bunch that we could do with (redirecting stdout/err
to a file instead of /dev/null; having the parent not exit until the child is
running).

As a first step, I've lifted the Daemonize code and removed the bits we don't
use. This should be a non-functional change. Fixing everything else will come
later.

The reveiewer is invited to compare with https://github.com/thesharp/daemonize/blob/master/daemonize.py.

This has long been something I've wanted to do. Basically the `Daemonize` code is both too flexible and not flexible enough, in that it offers a bunch of features that we don't use (changing UID, closing FDs in the child, logging to syslog) and doesn't offer a bunch that we could do with (redirecting stdout/err to a file instead of /dev/null; having the parent not exit until the child is running). As a first step, I've lifted the Daemonize code and removed the bits we don't use. This should be a non-functional change. Fixing everything else will come later.

clokep · 2020-08-03T14:22:50Z

The reveiewer is invited to compare with thesharp/daemonize@master/daemonize.py.

See a diff

--- daemonize.py	2020-08-03 10:19:48.000000000 -0400
+++ synapse/util/daemonize.py	2020-08-03 10:19:39.000000000 -0400
@@ -1,251 +1,120 @@
-# #!/usr/bin/python
+# -*- coding: utf-8 -*-
+# Copyright (c) 2012, 2013, 2014 Ilya Otyutskiy <[email protected]>
+# Copyright 2020 The Matrix.org Foundation C.I.C.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
 
+import atexit
 import fcntl
+import logging
 import os
-import pwd
-import grp
-import sys
 import signal
-import resource
-import logging
-import atexit
-from logging import handlers
-import traceback
-
+import sys
 
-__version__ = "2.5.0"
 
+def daemonize_process(pid_file: str, logger: logging.Logger, chdir: str = "/") -> None:
+    """daemonize the current process
 
-class Daemonize(object):
+    This calls fork(), and has the main process exit. When it returns we will be
+    running in the child process.
     """
-    Daemonize object.
 
-    Object constructor expects three arguments.
+    # If pidfile already exists, we should read pid from there; to overwrite it, if
+    # locking will fail, because locking attempt somehow purges the file contents.
+    if os.path.isfile(pid_file):
+        with open(pid_file, "r") as pid_fh:
+            old_pid = pid_fh.read()
+
+    # Create a lockfile so that only one instance of this daemon is running at any time.
+    try:
+        lock_fh = open(pid_file, "w")
+    except IOError:
+        print("Unable to create the pidfile.")
+        sys.exit(1)
+
+    try:
+        # Try to get an exclusive lock on the file. This will fail if another process
+        # has the file locked.
+        fcntl.flock(lock_fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
+    except IOError:
+        print("Unable to lock on the pidfile.")
+        # We need to overwrite the pidfile if we got here.
+        #
+        # XXX better to avoid overwriting it, surely. this looks racey as the pid file
+        # could be created between us trying to read it and us trying to lock it.
+        with open(pid_file, "w") as pid_fh:
+            pid_fh.write(old_pid)
+        sys.exit(1)
 
-    :param app: contains the application name which will be sent to syslog.
-    :param pid: path to the pidfile.
-    :param action: your custom function which will be executed after daemonization.
-    :param keep_fds: optional list of fds which should not be closed.
-    :param auto_close_fds: optional parameter to not close opened fds.
-    :param privileged_action: action that will be executed before drop privileges if user or
-                              group parameter is provided.
-                              If you want to transfer anything from privileged_action to action, such as
-                              opened privileged file descriptor, you should return it from
-                              privileged_action function and catch it inside action function.
-    :param user: drop privileges to this user if provided.
-    :param group: drop privileges to this group if provided.
-    :param verbose: send debug messages to logger if provided.
-    :param logger: use this logger object instead of creating new one, if provided.
-    :param foreground: stay in foreground; do not fork (for debugging)
-    :param chdir: change working directory if provided or /
-    """
-    def __init__(self, app, pid, action,
-                 keep_fds=None, auto_close_fds=True, privileged_action=None,
-                 user=None, group=None, verbose=False, logger=None,
-                 foreground=False, chdir="/"):
-        self.app = app
-        self.pid = os.path.abspath(pid)
-        self.action = action
-        self.keep_fds = keep_fds or []
-        self.privileged_action = privileged_action or (lambda: ())
-        self.user = user
-        self.group = group
-        self.logger = logger
-        self.verbose = verbose
-        self.auto_close_fds = auto_close_fds
-        self.foreground = foreground
-        self.chdir = chdir
-
-    def sigterm(self, signum, frame):
-        """
-        These actions will be done after SIGTERM.
-        """
-        self.logger.warning("Caught signal %s. Stopping daemon." % signum)
+    # Fork, creating a new process for the child.
+    process_id = os.fork()
+
+    if process_id != 0:
+        # parent process
         sys.exit(0)
 
-    def exit(self):
-        """
-        Cleanup pid file at exit.
-        """
-        self.logger.warning("Stopping daemon.")
-        os.remove(self.pid)
+    # This is the child process. Continue.
+
+    # Stop listening for signals that the parent process receives.
+    # This is done by getting a new process id.
+    # setpgrp() is an alternative to setsid().
+    # setsid puts the process in a new parent group and detaches its controlling
+    # terminal.
+
+    os.setsid()
+
+    # point stdin, stdout, stderr at /dev/null
+    devnull = "/dev/null"
+    if hasattr(os, "devnull"):
+        # Python has set os.devnull on this system, use it instead as it might be
+        # different than /dev/null.
+        devnull = os.devnull
+
+    devnull_fd = os.open(devnull, os.O_RDWR)
+    os.dup2(devnull_fd, 0)
+    os.dup2(devnull_fd, 1)
+    os.dup2(devnull_fd, 2)
+    os.close(devnull_fd)
+
+    # Set umask to default to safe file permissions when running as a root daemon. 027
+    # is an octal number which we are typing as 0o27 for Python3 compatibility.
+    os.umask(0o27)
+
+    # Change to a known directory. If this isn't done, starting a daemon in a
+    # subdirectory that needs to be deleted results in "directory busy" errors.
+    os.chdir(chdir)
+
+    try:
+        lock_fh.write("%s" % (os.getpid()))
+        lock_fh.flush()
+    except IOError:
+        logger.error("Unable to write pid to the pidfile.")
+        print("Unable to write pid to the pidfile.")
+        sys.exit(1)
+
+    # write a log line on SIGTERM.
+    def sigterm(signum, frame):
+        logger.warning("Caught signal %s. Stopping daemon." % signum)
         sys.exit(0)
 
-    def start(self):
-        """
-        Start daemonization process.
-        """
-        # If pidfile already exists, we should read pid from there; to overwrite it, if locking
-        # will fail, because locking attempt somehow purges the file contents.
-        if os.path.isfile(self.pid):
-            with open(self.pid, "r") as old_pidfile:
-                old_pid = old_pidfile.read()
-        # Create a lockfile so that only one instance of this daemon is running at any time.
-        try:
-            lockfile = open(self.pid, "w")
-        except IOError:
-            print("Unable to create the pidfile.")
-            sys.exit(1)
-        try:
-            # Try to get an exclusive lock on the file. This will fail if another process has the file
-            # locked.
-            fcntl.flock(lockfile, fcntl.LOCK_EX | fcntl.LOCK_NB)
-        except IOError:
-            print("Unable to lock on the pidfile.")
-            # We need to overwrite the pidfile if we got here.
-            with open(self.pid, "w") as pidfile:
-                pidfile.write(old_pid)
-            sys.exit(1)
-
-        # skip fork if foreground is specified
-        if not self.foreground:
-            # Fork, creating a new process for the child.
-            try:
-                process_id = os.fork()
-            except OSError as e:
-                self.logger.error("Unable to fork, errno: {0}".format(e.errno))
-                sys.exit(1)
-            if process_id != 0:
-                if self.keep_fds:
-                    # This is the parent process. Exit without cleanup,
-                    # see https://github.com/thesharp/daemonize/issues/46
-                    os._exit(0)
-                else:
-                    sys.exit(0)
-            # This is the child process. Continue.
-
-            # Stop listening for signals that the parent process receives.
-            # This is done by getting a new process id.
-            # setpgrp() is an alternative to setsid().
-            # setsid puts the process in a new parent group and detaches its controlling terminal.
-            process_id = os.setsid()
-            if process_id == -1:
-                # Uh oh, there was a problem.
-                sys.exit(1)
-
-            # Add lockfile to self.keep_fds.
-            self.keep_fds.append(lockfile.fileno())
-
-            # Close all file descriptors, except the ones mentioned in self.keep_fds.
-            devnull = "/dev/null"
-            if hasattr(os, "devnull"):
-                # Python has set os.devnull on this system, use it instead as it might be different
-                # than /dev/null.
-                devnull = os.devnull
-
-            if self.auto_close_fds:
-                for fd in range(3, resource.getrlimit(resource.RLIMIT_NOFILE)[0]):
-                    if fd not in self.keep_fds:
-                        try:
-                            os.close(fd)
-                        except OSError:
-                            pass
-
-            devnull_fd = os.open(devnull, os.O_RDWR)
-            os.dup2(devnull_fd, 0)
-            os.dup2(devnull_fd, 1)
-            os.dup2(devnull_fd, 2)
-            os.close(devnull_fd)
-
-        if self.logger is None:
-            # Initialize logging.
-            self.logger = logging.getLogger(self.app)
-            self.logger.setLevel(logging.DEBUG)
-            # Display log messages only on defined handlers.
-            self.logger.propagate = False
-
-            # Initialize syslog.
-            # It will correctly work on OS X, Linux and FreeBSD.
-            if sys.platform == "darwin":
-                syslog_address = "/var/run/syslog"
-            else:
-                syslog_address = "/dev/log"
-
-            # We will continue with syslog initialization only if actually have such capabilities
-            # on the machine we are running this.
-            if os.path.exists(syslog_address):
-                syslog = handlers.SysLogHandler(syslog_address)
-                if self.verbose:
-                    syslog.setLevel(logging.DEBUG)
-                else:
-                    syslog.setLevel(logging.INFO)
-                # Try to mimic to normal syslog messages.
-                formatter = logging.Formatter("%(asctime)s %(name)s: %(message)s",
-                                              "%b %e %H:%M:%S")
-                syslog.setFormatter(formatter)
-
-                self.logger.addHandler(syslog)
-
-        # Set umask to default to safe file permissions when running as a root daemon. 027 is an
-        # octal number which we are typing as 0o27 for Python3 compatibility.
-        os.umask(0o27)
-
-        # Change to a known directory. If this isn't done, starting a daemon in a subdirectory that
-        # needs to be deleted results in "directory busy" errors.
-        os.chdir(self.chdir)
-
-        # Execute privileged action
-        privileged_action_result = self.privileged_action()
-        if not privileged_action_result:
-            privileged_action_result = []
-
-        # Change owner of pid file, it's required because pid file will be removed at exit.
-        uid, gid = -1, -1
-
-        if self.group:
-            try:
-                gid = grp.getgrnam(self.group).gr_gid
-            except KeyError:
-                self.logger.error("Group {0} not found".format(self.group))
-                sys.exit(1)
-
-        if self.user:
-            try:
-                uid = pwd.getpwnam(self.user).pw_uid
-            except KeyError:
-                self.logger.error("User {0} not found.".format(self.user))
-                sys.exit(1)
-
-        if uid != -1 or gid != -1:
-            os.chown(self.pid, uid, gid)
-
-        # Change gid
-        if self.group:
-            try:
-                os.setgid(gid)
-            except OSError:
-                self.logger.error("Unable to change gid.")
-                sys.exit(1)
-
-        # Change uid
-        if self.user:
-            try:
-                uid = pwd.getpwnam(self.user).pw_uid
-            except KeyError:
-                self.logger.error("User {0} not found.".format(self.user))
-                sys.exit(1)
-            try:
-                os.setuid(uid)
-            except OSError:
-                self.logger.error("Unable to change uid.")
-                sys.exit(1)
-
-        try:
-            lockfile.write("%s" % (os.getpid()))
-            lockfile.flush()
-        except IOError:
-            self.logger.error("Unable to write pid to the pidfile.")
-            print("Unable to write pid to the pidfile.")
-            sys.exit(1)
-
-        # Set custom action on SIGTERM.
-        signal.signal(signal.SIGTERM, self.sigterm)
-        atexit.register(self.exit)
-
-        self.logger.warning("Starting daemon.")
-
-        try:
-            self.action(*privileged_action_result)
-        except Exception:
-            for line in traceback.format_exc().split("\n"):
-                self.logger.error(line)
+    signal.signal(signal.SIGTERM, sigterm)
+
+    # Cleanup pid file at exit.
+    def exit():
+        logger.warning("Stopping daemon.")
+        os.remove(pid_file)
+        sys.exit(0)
+
+    atexit.register(exit)
+
+    logger.warning("Starting daemon.")

Edit: This doesn't really seem too useful since there's a conversion from a class to a function. The non-whitespace version didn't seem to help either.

clokep

Seems OK. I left a couple of queries about removed error handling.

clokep · 2020-08-03T14:44:17Z

synapse/util/daemonize.py

+    # Fork, creating a new process for the child.
+    process_id = os.fork()


Curious why the error handling code here was removed from the original library?

basically: I kinda felt that we should be allowing exceptions raised in this function to propagate, rather than suddenly calling sys.exit. It's not particularly clear though. I could be persuaded to go either way.

The result should be the same either way. Showing the stack trace should give more info, I suppose. 👍

clokep · 2020-08-03T14:45:28Z

synapse/util/daemonize.py

+    # Stop listening for signals that the parent process receives.
+    # This is done by getting a new process id.
+    # setpgrp() is an alternative to setsid().
+    # setsid puts the process in a new parent group and detaches its controlling
+    # terminal.
+
+    os.setsid()


Again, this differs from the original code by removing error handling.

os.setsid returns None, so that was dead code.

richvdh · 2020-08-03T14:54:44Z

synapse/app/_base.py

-        else:
-            run()
+            daemonize_process(pid_file, logger)
+        run()


I've realised that the original code wrapped this with a try/except for a reason, which is that after stderr gets redirected to /dev/null, any uncaught exceptions will otherwise get thrown away.

I'm currently wondering how best to solve that.

This replaces the try/catch that Daemonize has around the application callback.

synapse/util/daemonize.py

Co-authored-by: Patrick Cloke <[email protected]>

@lugino-emeritus

Synapse 1.19.0rc1 (2020-08-13) ============================== Removal warning --------------- As outlined in the [previous release](https://github.com/matrix-org/synapse/releases/tag/v1.18.0), we are no longer publishing Docker images with the `-py3` tag suffix. On top of that, we have also removed the `latest-py3` tag. Please see [the announcement in the upgrade notes for 1.18.0](https://github.com/matrix-org/synapse/blob/develop/UPGRADE.rst#upgrading-to-v1180). Features -------- - Add option to allow server admins to join rooms which fail complexity checks. Contributed by @lugino-emeritus. ([\#7902](#7902)) - Add an option to purge room or not with delete room admin endpoint (`POST /_synapse/admin/v1/rooms/<room_id>/delete`). Contributed by @dklimpel. ([\#7964](#7964)) - Add rate limiting to users joining rooms. ([\#8008](#8008)) - Add a `/health` endpoint to every configured HTTP listener that can be used as a health check endpoint by load balancers. ([\#8048](#8048)) - Allow login to be blocked based on the values of SAML attributes. ([\#8052](#8052)) - Allow guest access to the `GET /_matrix/client/r0/rooms/{room_id}/members` endpoint, according to MSC2689. Contributed by Awesome Technologies Innovationslabor GmbH. ([\#7314](#7314)) Bugfixes -------- - Fix a bug introduced in Synapse v1.7.2 which caused inaccurate membership counts in the room directory. ([\#7977](#7977)) - Fix a long standing bug: 'Duplicate key value violates unique constraint "event_relations_id"' when message retention is configured. ([\#7978](#7978)) - Fix "no create event in auth events" when trying to reject invitation after inviter leaves. Bug introduced in Synapse v1.10.0. ([\#7980](#7980)) - Fix various comments and minor discrepencies in server notices code. ([\#7996](#7996)) - Fix a long standing bug where HTTP HEAD requests resulted in a 400 error. ([\#7999](#7999)) - Fix a long-standing bug which caused two copies of some log lines to be written when synctl was used along with a MemoryHandler logger. ([\#8011](#8011), [\#8012](#8012)) Updates to the Docker image --------------------------- - We no longer publish Docker images with the `-py3` tag suffix, as [announced in the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/UPGRADE.rst#upgrading-to-v1180). ([\#8056](#8056)) Improved Documentation ---------------------- - Document how to set up a client .well-known file and fix several pieces of outdated documentation. ([\#7899](#7899)) - Improve workers docs. ([\#7990](#7990), [\#8000](#8000)) - Fix typo in `docs/workers.md`. ([\#7992](#7992)) - Add documentation for how to undo a room shutdown. ([\#7998](#7998), [\#8010](#8010)) Internal Changes ---------------- - Reduce the amount of whitespace in JSON stored and sent in responses. Contributed by David Vo. ([\#7372](#7372)) - Switch to the JSON implementation from the standard library and bump the minimum version of the canonicaljson library to 1.2.0. ([\#7936](#7936), [\#7979](#7979)) - Convert various parts of the codebase to async/await. ([\#7947](#7947), [\#7948](#7948), [\#7949](#7949), [\#7951](#7951), [\#7963](#7963), [\#7973](#7973), [\#7975](#7975), [\#7976](#7976), [\#7981](#7981), [\#7987](#7987), [\#7989](#7989), [\#8003](#8003), [\#8014](#8014), [\#8016](#8016), [\#8027](#8027), [\#8031](#8031), [\#8032](#8032), [\#8035](#8035), [\#8042](#8042), [\#8044](#8044), [\#8045](#8045), [\#8061](#8061), [\#8062](#8062), [\#8063](#8063), [\#8066](#8066), [\#8069](#8069), [\#8070](#8070)) - Move some database-related log lines from the default logger to the database/transaction loggers. ([\#7952](#7952)) - Add a script to detect source code files using non-unix line terminators. ([\#7965](#7965), [\#7970](#7970)) - Log the SAML session ID during creation. ([\#7971](#7971)) - Implement new experimental push rules for some users. ([\#7997](#7997)) - Remove redundant and unreliable signature check for v1 Identity Service lookup responses. ([\#8001](#8001)) - Improve the performance of the register endpoint. ([\#8009](#8009)) - Reduce less useful output in the newsfragment CI step. Add a link to the changelog section of the contributing guide on error. ([\#8024](#8024)) - Rename storage layer objects to be more sensible. ([\#8033](#8033)) - Change the default log config to reduce disk I/O and storage for new servers. ([\#8040](#8040)) - Add an assertion on `prev_events` in `create_new_client_event`. ([\#8041](#8041)) - Add a comment to `ServerContextFactory` about the use of `SSLv23_METHOD`. ([\#8043](#8043)) - Log `OPTIONS` requests at `DEBUG` rather than `INFO` level to reduce amount logged at `INFO`. ([\#8049](#8049)) - Reduce amount of outbound request logging at `INFO` level. ([\#8050](#8050)) - It is no longer necessary to explicitly define `filters` in the logging configuration. (Continuing to do so is redundant but harmless.) ([\#8051](#8051)) - Add and improve type hints. ([\#8058](#8058), [\#8064](#8064), [\#8060](#8060), [\#8067](#8067))

* commit '916cf2d43': re-implement daemonize (#8011)

richvdh force-pushed the rav/daemonize branch from de09dc3 to a529d73 Compare July 31, 2020 17:31

richvdh mentioned this pull request Jul 31, 2020

Stop the parent process flushing the logs on exit #8012

Merged

richvdh requested a review from a team July 31, 2020 17:39

clokep reviewed Aug 3, 2020

View reviewed changes

clokep approved these changes Aug 3, 2020

View reviewed changes

richvdh commented Aug 3, 2020

View reviewed changes

Add an unhandled-exception handler

a016564

This replaces the try/catch that Daemonize has around the application callback.

clokep reviewed Aug 3, 2020

View reviewed changes

synapse/util/daemonize.py Outdated Show resolved Hide resolved

Update synapse/util/daemonize.py

409b744

Co-authored-by: Patrick Cloke <[email protected]>

richvdh merged commit 916cf2d into develop Aug 4, 2020

richvdh deleted the rav/daemonize branch August 4, 2020 09:03

richvdh mentioned this pull request Jan 8, 2021

stderr goes to /dev/null when we run with --daemonize #1539

Open

babolivier pushed a commit that referenced this pull request Sep 1, 2021

Merge commit '916cf2d43' into anoa/dinsic_release_1_21_x

6bd8ea3

* commit '916cf2d43': re-implement daemonize (#8011)

DMRobertson pushed a commit that referenced this pull request May 2, 2022

We haven't used daemonize since #8011.

6162b2c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

re-implement daemonize #8011

re-implement daemonize #8011

richvdh commented Jul 31, 2020

clokep commented Aug 3, 2020 •

edited

Loading

clokep left a comment

clokep Aug 3, 2020

richvdh Aug 3, 2020

clokep Aug 3, 2020

clokep Aug 3, 2020

richvdh Aug 3, 2020

richvdh Aug 3, 2020

		# Fork, creating a new process for the child.
		process_id = os.fork()

re-implement daemonize #8011

re-implement daemonize #8011

Conversation

richvdh commented Jul 31, 2020

clokep commented Aug 3, 2020 • edited Loading

clokep left a comment

Choose a reason for hiding this comment

clokep Aug 3, 2020

Choose a reason for hiding this comment

richvdh Aug 3, 2020

Choose a reason for hiding this comment

clokep Aug 3, 2020

Choose a reason for hiding this comment

clokep Aug 3, 2020

Choose a reason for hiding this comment

richvdh Aug 3, 2020

Choose a reason for hiding this comment

richvdh Aug 3, 2020

Choose a reason for hiding this comment

clokep commented Aug 3, 2020 •

edited

Loading