
Using Docker labels to have per-service configuration #329

Open
muratcorlu opened this issue Jan 17, 2024 · 29 comments
Labels
enhancement (New feature or request), pr welcome

Comments

@muratcorlu

Is your feature request related to a problem? Please describe.

I have a big Docker Swarm machine on which I host all of my projects as Docker containers. I want a central backup solution for all of the Swarm stacks/services that can be configured inside each project's own docker-compose file.

Describe the solution you'd like

I use Traefik for routing incoming traffic to the containers. In Traefik, you don't provide a central configuration; instead, all of the service configuration is set via Docker labels inside each project's docker-compose file. This simplifies things a lot.
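For context, a typical Traefik service in Swarm mode is configured with labels roughly like this (service name, hostname and port here are just placeholders):

```yaml
myservice:
  image: myimage
  deploy:
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.myservice.rule=Host(`myservice.example.com`)"
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"
```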

So I would like to be able to enable backup for a docker service like below:

myservice:
  image: ....
  volumes:
    - myvolume:/data
  deploy:
    labels:
      - "backup.enabled=true"
      - "backup.source=myvolume"
      # and other configurations as well, like a custom schedule, retention etc.

Describe alternatives you've considered

  1. Running separate backup instances per stack doesn't seem efficient, since I'd duplicate a lot of configuration.
  2. I considered backing up the parent folder of the path where Docker keeps all of the volumes. But that doesn't sound like a good idea, or maybe it is? 🤷🏻
  3. Configuring everything in a central place was another consideration, but every project has its own Git repository, docker-compose file and CI/CD pipeline, and keeping the backup logic somewhere else always invites hassle.
@m90
Member

m90 commented Jan 18, 2024

Thanks for this suggestion. I'm not a Traefik user myself, but I have seen its configuration approach become very popular in the Swarm/compose world. The trouble is that such an approach is very different from the way the tool currently sources its configuration, so adding support for this would require some major refactoring (which is probably good). I'll have to think about it a bit. If you have a full-blown "dream API example" (i.e. a compose service definition) of how you think this could work, that'd also be helpful.

For the time being, your options are:

@m90 added the enhancement label Jan 18, 2024
@pixxon
Contributor

pixxon commented Jan 29, 2024

I also considered this approach, but one major obstacle would be figuring out how to mount the volumes nicely for backup. I would not want every single volume to be mounted in the manager, so instead some sort of containerized periodic-task-based approach would be best.

However, I am not sure how to configure the volume that needs to be backed up. The full name of the volume might not be available when the service is being labeled. ( The stack name will be added as a prefix. ) The best way would be to just specify a path in the application and mount the directory from there, but I have no idea if that is possible. ( Something like COPY --from in builds. )

@pixxon
Contributor

pixxon commented Jan 29, 2024

Maybe a wild take, but instead of adding the labels to services, one could add them to the volumes themselves? Docker allows objects to be labeled, including volumes. Both docker volume create and the compose file allow specifying them.

To me it is very appealing, as it would really bind the configuration to the volumes themselves. The labels to stop containers / services would still be applied to those, but when and what to back up could be defined on the volume itself.

Lastly, where to back up could be defined on the manager itself, somewhat like how Traefik defines entrypoints. These would then just be referenced in the volume labels to avoid redundancy.

@muratcorlu
Author

Labelling volumes instead of containers for configuring backup makes so much sense. It would also fix the potential issue of what happens if the same volume is mounted to multiple containers.

volumes:
  my_data:
    labels:
      backup.enabled: "true"
      backup.retention: "7"

Looks awesome! 😊

@m90
Member

m90 commented Jan 29, 2024

I also like the idea of labeling volumes a lot, but I still have a slightly hazy vision of who'd be controlling whom when you can label volumes, services and containers alike. I.e. in a setup that runs multiple schedules, who'd tell containers/services when they need to be stopped? Would each labeled volume create a cron schedule? What happens if labels change, and who's notifying the backup container that it needs to create a new cron? What about bind mounts?

I could continue the list forever, but I guess that doesn't lead anywhere. Maybe a good next step would be translating one of the test cases in https://github.com/offen/docker-volume-backup/tree/main/test into the desired new configuration style, so we can get an idea of what this would really look like, and how (and if even) such a configuration approach could be compatible with the existing one, or whether this would require a hard cut?

@pixxon
Contributor

pixxon commented Jan 29, 2024

I have tried to make a quick mock example of what I was trying to explain above. This would certainly change the whole approach of the backup software.

The need to back up a volume would be labelled on the volume itself. This would define when to back up, along with other options like how long backups should be kept and under what name they should be stored.

The connection to where the backups are stored is only defined on the backup container itself, to avoid redundant definitions. ( There could be some issues, for example trying to store in different buckets; then those labels would need to move onto the volumes. )

Thirdly, the containers to be interacted with are still defined on the containers themselves. So all the executions and which ones need to be stopped would remain there.

version: '3.9'

volumes:
  postgresql_db:
    labels:
      docker-volume-backup.stop-during-backup: postgres
      docker-volume-backup.filename: postgres-%Y-%m-%dT%H-%M-%S.tar.gz
      docker-volume-backup.pruning-prefix: postgres
      docker-volume-backup.retention-days: "7"
      docker-volume-backup.cron-expression: 0 2 * * *

  redis_db:
    labels:
      docker-volume-backup.stop-during-backup: redis
      docker-volume-backup.filename: redis-%Y-%m-%dT%H-%M-%S.tar.gz
      docker-volume-backup.pruning-prefix: redis
      docker-volume-backup.retention-days: "7"
      docker-volume-backup.cron-expression: 0 2 * * *

services:
  postgres:
    image: postgres
    volumes:
      - type: volume
        source: postgresql_db
        target: /var/lib/postgresql/data
    labels:
      docker-volume-backup.archive-pre: pg_dumpall -U postgres > /var/lib/postgresql/data/backup.sql
      docker-volume-backup.exec-label: postgres

  redis:
    image: redis
    volumes:
      - type: volume
        source: redis_db
        target: /data

  backup:
    image: offen/docker-volume-backup
    environment:
      AWS_ENDPOINT: minio
      AWS_S3_BUCKET_NAME: backup
      AWS_ACCESS_KEY_ID: test
      AWS_SECRET_ACCESS_KEY: test
    volumes:
      - type: bind
        source: /var/run/docker.sock
        target: /var/run/docker.sock
        read_only: true

  postgres_user:
    image: testimage:latest
    labels:
      docker-volume-backup.stop-during-backup: postgres

  redis_user:
    image: testimage:latest
    labels:
      docker-volume-backup.stop-during-backup: redis

@pixxon
Contributor

pixxon commented Jan 29, 2024

how (and if even) such a configuration approach could be compatible with the existing one, or if this would require a hard cut?

It would certainly be possible to keep the two compatible. Not sure if you are familiar with Prometheus: it has something called static configs and service discovery. The first corresponds to what the backup tool currently offers; the second is what this feature would become.
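For illustration, the two styles side by side in a Prometheus scrape configuration look roughly like this (sketched from memory, details may differ):

```yaml
scrape_configs:
  # "static config": targets are listed explicitly,
  # analogous to today's env-var based configuration
  - job_name: static-example
    static_configs:
      - targets: ["host1:9100", "host2:9100"]

  # "service discovery": targets are found by querying the Docker daemon,
  # analogous to the label-based approach discussed here
  - job_name: discovered-example
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
```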

@pixxon
Contributor

pixxon commented Jan 29, 2024

What happens if labels change, and who's notifying the backup container that it needs to create a new cron? What about bind mounts?

The backup container would need to poll the Docker socket for changes, and when there is a label change or a new label, update its configuration. A minor help for this task is that labels cannot be added or changed on Docker volumes; they have to be defined when the volume is created. ( At least that's the case via the CLI; I am not sure if it's possible through the API or from Docker plugins. )

who'd tell containers/services when they need to be stopped?

From the above example, the volumes would define the labels postgres and redis, which could be referenced by containers that use the volume ( or have an indirect dependency ) via their own labels.

Would each labeled volume create a cron schedule?

I think yes; I'm not sure if there is a need for one volume to create multiple schedules. ( For something like keeping daily backups for a week and weekly backups for a year. ) In that case the approach could be similar to how Traefik groups routers/endpoints using name segments in the labels.
So in the above example they would become:

  redis_db:
    labels:
      docker-volume-backup.stop-during-backup: redis
      docker-volume-backup.daily.filename: redis-daily-%Y-%m-%dT%H-%M-%S.tar.gz
      docker-volume-backup.daily.pruning-prefix: redis-daily
      docker-volume-backup.daily.retention-days: "7"
      docker-volume-backup.daily.cron-expression: 0 2 * * *
      docker-volume-backup.weekly.filename: redis-weekly-%Y-%m-%dT%H-%M-%S.tar.gz
      docker-volume-backup.weekly.pruning-prefix: redis-weekly
      docker-volume-backup.weekly.retention-days: "365"
      docker-volume-backup.weekly.cron-expression: 0 0 * * 0
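A rough sketch of how such grouped names could be split into per-schedule configs. Only the label names come from the example above; the Schedule struct and parsing code are made up for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// Schedule collects the per-schedule options parsed from volume labels.
type Schedule struct {
	Filename       string
	PruningPrefix  string
	RetentionDays  string
	CronExpression string
}

const labelPrefix = "docker-volume-backup."

// parseSchedules groups labels like "docker-volume-backup.daily.retention-days"
// by their middle segment ("daily", "weekly", ...).
func parseSchedules(labels map[string]string) map[string]*Schedule {
	schedules := map[string]*Schedule{}
	for key, value := range labels {
		if !strings.HasPrefix(key, labelPrefix) {
			continue
		}
		parts := strings.SplitN(strings.TrimPrefix(key, labelPrefix), ".", 2)
		if len(parts) != 2 {
			continue // not a grouped label, e.g. "stop-during-backup"
		}
		name, option := parts[0], parts[1]
		s, ok := schedules[name]
		if !ok {
			s = &Schedule{}
			schedules[name] = s
		}
		switch option {
		case "filename":
			s.Filename = value
		case "pruning-prefix":
			s.PruningPrefix = value
		case "retention-days":
			s.RetentionDays = value
		case "cron-expression":
			s.CronExpression = value
		}
	}
	return schedules
}

func main() {
	labels := map[string]string{
		"docker-volume-backup.daily.retention-days":   "7",
		"docker-volume-backup.daily.cron-expression":  "0 2 * * *",
		"docker-volume-backup.weekly.cron-expression": "0 0 * * 0",
		"docker-volume-backup.stop-during-backup":     "redis",
	}
	for name, s := range parseSchedules(labels) {
		fmt.Printf("%s: cron %q, retain %q days\n", name, s.CronExpression, s.RetentionDays)
	}
}
```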

@m90
Member

m90 commented Jan 29, 2024

Just to manage expectations: I appreciate all of your feedback, and I like the direction this is going, but it also means the entire tool would need to be rearchitected (i.e. supporting both static config and service discovery), so this is not something I can implement easily in my free time (the way this project is currently run).

I'll move this around in the back of my head for a while, maybe I can come up with a way this could be sliced into several "sub-features" that could be worked on one after the other.

If you have further ideas, please leave them here, I'm happy to learn about them.

@pixxon
Contributor

pixxon commented Jan 29, 2024

this is not something I can implement easily in my free time (the way this project is currently run).

Since this is something that I would like to use, I could try to help out with the implementation. Disclaimer: I am not a Go developer. ( I mainly use C++. )

I can come up with a way this could be sliced into several "sub-features" that could be worked on one after the other.

Tasks like #268 would lead up to this. I could also help by making a dummy implementation of the above idea that could be adapted into the tool down the line.

@m90
Member

m90 commented Jan 30, 2024

Thanks for offering your help. I'd be happy if you wanted to work on this.

Still, I won't be able to merge a single PR that basically rewrites the entire tool, so we'd need to plan this out a bit better. I'll still need to understand what's going on as I'll also keep maintaining this.

Right now, my idea would be to maybe proceed something like this:

  1. Remove crond usage and instead allow the tool to run as a long-running process that schedules work itself. It should still be possible to invoke a backup manually. This will also need to support reading configuration from conf.d. Kind of like #268 (Consider using a dedicated cron package), as you already mentioned.
  2. Come up with a mechanism that can pull configuration from basically anything, i.e. env vars, conf.d or Docker labels. This is probably something abstract, and then we write adapters for each means of configuration. Maybe this already exists; I'd think it makes sense to look at how Traefik does this.
  3. Connect the new configuration method with the Docker daemon, polling repeatedly for changes.
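The adapter mechanism in step 2 could look something like the following. All interface and type names here are made up for illustration, and both sources are stubbed rather than reading real env vars or the Docker daemon:

```go
package main

import "fmt"

// Config stands in for the tool's existing configuration struct.
type Config struct {
	CronExpression string
}

// ConfigSource is the abstraction each configuration adapter implements.
type ConfigSource interface {
	// Load returns one Config per schedule the source knows about.
	Load() ([]Config, error)
}

// envSource would read environment variables (stubbed here).
type envSource struct{}

func (envSource) Load() ([]Config, error) {
	return []Config{{CronExpression: "@daily"}}, nil
}

// labelSource would poll the Docker daemon for labeled volumes (stubbed here).
type labelSource struct{}

func (labelSource) Load() ([]Config, error) {
	return []Config{{CronExpression: "0 2 * * *"}}, nil
}

// collect merges the configurations from all registered sources.
func collect(sources ...ConfigSource) ([]Config, error) {
	var all []Config
	for _, s := range sources {
		cs, err := s.Load()
		if err != nil {
			return nil, err
		}
		all = append(all, cs...)
	}
	return all, nil
}

func main() {
	cs, _ := collect(envSource{}, labelSource{})
	fmt.Println(len(cs), "configurations loaded")
}
```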

I'm not sure if this should be worked on as 1, 2, 3 or 2, 1, 3.

Let me know what you think.

@pixxon
Contributor

pixxon commented Jan 30, 2024

I understand the concerns, I was not thinking of a single large PR either.

The 1,2,3 order seems to be easier, especially since 1 is already in the form of an issue. If it's alright, I will start looking into that one.

@m90
Member

m90 commented Jan 30, 2024

#99 is probably also related to what I wrote, although I'm not sure whether this tool should start having any sort of persistence, so maybe I'd not offer that feature even if it were possible.

Also, I wanted to mention a change this big would probably warrant a v3, so some minor breaking changes would probably be ok, see #80

@pixxon
Contributor

pixxon commented Jan 30, 2024

I am not a fan of introducing persistence to a tool whose job is to back things up. ( Would it need to create a backup of its own state? )

Regarding a REST API, the most I could imagine is a read-only visualization of the setup. ( What the storage options are, what volumes are configured, which containers/services are labelled. )

@m90
Member

m90 commented Jan 30, 2024

Yeah, let's not worry about this for now. If we want a read-only visualization, we could also introduce a backup --debug command or something that dumps everything the tool currently thinks it should be doing.

@m90 m90 mentioned this issue Jan 30, 2024
@m90
Member

m90 commented Feb 2, 2024

One situation we should think about (and where I don't have a solution at hand right now) popped into my head just this morning:

Assuming users can create backup schedules by labeling their volumes, the tool will repeatedly poll the Docker daemon for volumes, check their labels and then create schedules. My concern is: how does this work in case I deploy multiple stacks that each run an offen/docker-volume-backup container (which I've seen people do, judging from what is being posted in issues and discussions)? From what I understand, the daemon will always return all volumes when running docker volume ls. How would a backup container know which of the volumes are within its own stack and which ones aren't (and should therefore be skipped)?
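One possible handle here (an assumption worth verifying): Swarm labels every resource of a stack with com.docker.stack.namespace, so a backup container that knows its own stack name could filter the volume list on that label. A minimal sketch operating on plain label maps rather than a real daemon:

```go
package main

import "fmt"

// stackLabel is the label Swarm attaches to resources created by a stack.
const stackLabel = "com.docker.stack.namespace"

// ownVolume reports whether a volume's labels place it in the given stack.
// Volumes created outside any stack carry no namespace label and are skipped.
func ownVolume(labels map[string]string, stack string) bool {
	return labels[stackLabel] == stack
}

func main() {
	volumes := []map[string]string{
		{stackLabel: "myapp", "docker-volume-backup.enabled": "true"},
		{stackLabel: "otherapp"},
		{}, // created via docker volume create, no stack
	}
	for _, labels := range volumes {
		fmt.Println(ownVolume(labels, "myapp"))
	}
}
```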

@pixxon
Contributor

pixxon commented Feb 3, 2024

how does this work in case I deploy multiple stacks that each run a offen/docker-volume-backup container (which I've seen people do from what is being posted in issues and discussions)?

In the new setup this should not be happening. One solution could be to simply exit when a second instance starts up. If this option is really something that needs to be supported, then a flag could disable the Docker service discovery feature so multiple instances can be deployed in the same swarm.

@m90
Member

m90 commented Feb 4, 2024

If this option is really something that needs to be supported

Yes, this definitely needs to stay supported. It's still the easiest way to get multiple schedules up and running in case you just want to get the job done, and it's also what I would pick in such cases. Also, it was the only way of having multiple schedules before v2.14.0, so users from before that version might still be using such setups.

If there is no way around this problem, we'll need to make service discovery disabled by default. When enabled, the container can somehow check if a sibling is already running on the same host and refuse to start in such cases (how exactly this would be implemented I don't know yet).

@MyWay

MyWay commented Feb 14, 2024

This is an interesting approach I was looking for too, though as already said it probably needs some time to find the best solution. Furthermore, the old approach is more convenient in some use cases.

@m90
Member

m90 commented Feb 16, 2024

@pixxon I did some refactoring of the configuration handling in #360 in order to prepare for this and #364.

I think everything should be ready now, so in case you want to start working on this, please feel free to go ahead. No need to hurry or anything though, just wanted to let you know about the changes.

@m90 added the pr welcome label Feb 16, 2024
@pixxon
Contributor

pixxon commented Apr 27, 2024

Hey @m90, sorry for the delay, I did not have time to look into this issue before.

I plan to use traefik's paerser library to process the labels, but to make it nice, it requires modifications to the configs themselves. I think it would also make sense to use paerser for both env vars and files, since it has the capability and this would remove some tech debt. ( Using both the current env var handling and paerser could result in duplicated configs. )

A brief demonstration of how paerser works:

Example 1:

config

type Config struct {
    BackupCronExpression string
    BackupStopDuringBackupLabel string
}

flag/label

--backupcronexpression=@daily
--backupstopduringbackuplabel=test

env var

BACKUPCRONEXPRESSION=@daily
BACKUPSTOPDURINGBACKUPLABEL=test

Example 2

config

type Config struct {
    Backup BackupConfig
}
type BackupConfig struct {
    CronExpression string
    StopDuringBackupLabel string
}

flag/label

--backup.cronexpression=@daily
--backup.stopduringbackuplabel=test

env var

BACKUP_CRONEXPRESSION=@daily
BACKUP_STOPDURINGBACKUPLABEL=test

Example 3

config

type Config struct {
    Backup BackupConfig
}
type BackupConfig struct {
    Cron struct {
        Expression string
    }
    Stop struct {
        During struct {
            Backup struct {
                Label string
            }
        }
    }
}

flag/label

--backup.cron.expression=@daily
--backup.stop.during.backup.label=test

env var

BACKUP_CRON_EXPRESSION=@daily
BACKUP_STOP_DURING_BACKUP_LABEL=test

The first example uses the current config struct, but I personally find it pretty unreadable, and using more structs would help with that.
The third example results in env vars that look like the current ones, but the config becomes way too verbose with anonymous types.

I would personally go with the second option, where settings that are related are stored in a struct. However, this would most certainly create a breaking change: many underscores would be gone from the environment variables. Another thing, not impossible to overcome but worth mentioning, is that traefik likes to start env vars with a common prefix. ( They use TRAEFIK_, which matches the traefik at the start of the labels. )

Something that could be used to avoid breaking changes is to create a mapping between old and new configs and, before sending the map to paerser, manually rename the old variables to their new values. This would result in a lot of deprecated variables though, so I am not sure which one you would prefer.
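The renaming idea could be as simple as a lookup table applied before handing the environment to the parser. The variable names below are only illustrative examples of the old flat style versus the new nested style:

```go
package main

import "fmt"

// deprecatedNames maps old flat variable names to their nested replacements.
var deprecatedNames = map[string]string{
	"BACKUP_CRON_EXPRESSION": "BACKUP_CRONEXPRESSION",
	"BACKUP_RETENTION_DAYS":  "BACKUP_RETENTIONDAYS",
}

// migrate rewrites deprecated keys in env to their new names. A value under
// the new name, if explicitly set, takes precedence over a deprecated one.
func migrate(env map[string]string) map[string]string {
	out := map[string]string{}
	for k, v := range env {
		if newName, ok := deprecatedNames[k]; ok {
			if _, set := env[newName]; !set {
				out[newName] = v
			}
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	env := map[string]string{"BACKUP_CRON_EXPRESSION": "@daily"}
	fmt.Println(migrate(env))
}
```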

@pixxon
Contributor

pixxon commented Apr 29, 2024

Had some time on my hands to look more into this and made a PoC to show how the changes could be made. I moved config into a new package.

Other than normal changes, there are two significant workarounds:

First, the mapping of the old variables: https://github.com/pixxon/docker-volume-backup/blob/refactor-configuration/internal/config/util.go#L23

Second, I had to process the "FILE" variables: https://github.com/pixxon/docker-volume-backup/blob/refactor-configuration/internal/config/util.go#L23
I went with something that I saw in linuxserver projects, where the prefix FILE__ is used; instead of being processed while the variable is read, there is a preprocessing step. ( It could be moved even further outside, into an init script like it is for them, but that might cause other problems. )

@m90
Member

m90 commented Apr 29, 2024

Thanks for starting to work on this, and no rush from my end. I'm a bit busier than usual, so I didn't really dig into the code yet; however, I wanted to check what you think of the following approach to building this:

  • the existing approach to configuring the application is left as is
  • you add a dedicated "serviceDiscovery" (or similar) configuration strategy here, which kicks in when a flag is passed to the command:

        // sourceConfiguration returns a list of config objects using the given
        // strategy. It should be the single entrypoint for retrieving configuration
        // for all consumers.
        func sourceConfiguration(strategy configStrategy) ([]*Config, error) {
        	switch strategy {
        	case configStrategyEnv:
        		c, err := loadConfigFromEnvVars()
        		return []*Config{c}, err
        	case configStrategyConfd:
        		cs, err := loadConfigsFromEnvFiles("/etc/dockervolumebackup/conf.d")
        		if err != nil {
        			if os.IsNotExist(err) {
        				return sourceConfiguration(configStrategyEnv)
        			}
        			return nil, errwrap.Wrap(err, "error loading config files")
        		}
        		return cs, nil
        	default:
        		return nil, errwrap.Wrap(nil, fmt.Sprintf("received unknown config strategy: %v", strategy))
        	}
        }
  • we figure out how this method of configuration is actually working
  • once we know that, we try to reconcile what's there to reuse as much as possible

I'm mostly just worried we'll spend a lot of time reworking the existing approach when we (or at least I) don't really know how service discovery works in detail yet.

Let me know what you think.

@pixxon
Contributor

pixxon commented Apr 29, 2024

I do plan to add a new strategy there for volume labels. However, to achieve that, the current configuration struct has to be changed, or the names of the labels might be really weird. ( I could search for an alternative, but I found traefik's paerser to be pretty handy for creating configuration out of labels. )

If I just use the current Config, which has everything in a flat structure, the expected label names will be unreadable in my opinion:

--backupcronexpression=@daily
--backupstopduringbackuplabel=test

To have some more structure in it, I have to change the Config struct to contain more types. I might be wrong, but envconfig would not be able to load the nested members under the current names, since it concatenates the names with underscores. ( So OUTER_INNER would happen. )

If you want, as a prototype I can implement gathering config from labels even if the names are a bit unreadable. And if it seems fine, then I can do something about the structs to make the names nicer.

@pixxon
Contributor

pixxon commented Apr 30, 2024

Wrote some code to handle the volume discovery. It is very primitive and probably needs some fine-tuning in the future.
Key points to look at:

Some issues that I ran into while making it:

  • The new container should be scheduled on the Docker node where the volume is. ( For local volumes, and obviously just for swarms. )
  • The new container needs access to the same resources to be able to upload. In this case I had to share the network and some environment variables. This could be fixed if the backup container only created a tar and sent it to the manager, which would do the upload.
  • The new container needs access to the Docker host to stop containers and scale down services. This could be factored out: the manager does the stop/start and, between the two steps, creates the new container for the backup.

@m90
Member

m90 commented Apr 30, 2024

Quick question after reading the issues you are describing: why is spawning a new container necessary in the first place? Up until now, this hasn't been done either. Is it about multi-node swarm setups? In case it is, maybe there is a simpler, even if less smart, solution to that problem (which affects a tiny fraction of users anyway).

@pixxon
Contributor

pixxon commented Apr 30, 2024

My assumption is that volumes will be created after the backup container. Therefore they will not be mounted, and I am not sure if it is possible to attach volumes to a running container. If we expect the user to mount the volumes into the container and redeploy it, then a new backup schedule could be added via a new conf.d file.

@m90
Member

m90 commented Apr 30, 2024

Would it be possible to commit the current container and then run the scheduled backup off that committed image without having to copy over the entire configuration?

Alternatively, is there a way to create a container spec off the output of docker inspect with some changes instead?

@pixxon
Contributor

pixxon commented Apr 30, 2024

I had a look, and I think using CopyFromContainer could allow the minimal copying required. It would need to spawn a new container that collects the backup from the volume, then copy the tar back to the manager. It would have more overhead but would allow us to not bother with networks and other resources.
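For the "copy the tar back to the manager" part, the stream the Docker SDK's CopyFromContainer returns is a plain tar archive, so the manager side could unpack it with the standard library. A sketch using an in-memory archive as a stand-in for a real daemon (the function names here are made up):

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// readArchive consumes a tar stream, such as the one CopyFromContainer
// returns, and maps file names to their contents.
func readArchive(r io.Reader) (map[string]string, error) {
	files := map[string]string{}
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		var buf bytes.Buffer
		if _, err := io.Copy(&buf, tr); err != nil {
			return nil, err
		}
		files[hdr.Name] = buf.String()
	}
	return files, nil
}

// makeArchive builds a small tar stream in memory, standing in for the daemon.
func makeArchive() *bytes.Buffer {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	content := []byte("hello")
	tw.WriteHeader(&tar.Header{Name: "backup.tar.gz", Mode: 0600, Size: int64(len(content))})
	tw.Write(content)
	tw.Close()
	return &buf
}

func main() {
	files, err := readArchive(makeArchive())
	if err != nil {
		panic(err)
	}
	fmt.Println(len(files), "file(s) received")
}
```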
