Restore command #146

Open
m90 opened this issue Aug 18, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@m90
Member

m90 commented Aug 18, 2022

A container based off this image could also expose a command to restore a volume from a tar archive. This would have to be run manually in the context of the existing Docker setup, providing an archive location (this could probably also be remote).

@m90 m90 added the enhancement New feature or request label Aug 18, 2022
@Akruidenberg

+1

@octdanb

octdanb commented Dec 13, 2022

+1

@GeneralTao2
Contributor

+1

@m90
Member Author

m90 commented May 26, 2023

Writing down a few thoughts on this before I forget them.


Goals

  • Users can restore the contents of a volume from an archive
  • The command can be run in a Docker compose setup, using the existing image
  • If containers need to be stopped during the procedure, they shall be stopped (just like when backing up)

Non-Goals

  • Fetching archives from remote storages is out of scope for a first version
  • Volume mounts and paths will not be figured out automagically, instead users must provide them

API

This assumes the archive is available on the host that issues the compose command. Is this a problem?

docker compose run --rm -v volume_name:/restore backup restore -path /backup/my_app_backup < archive.tar.gz

Alternatively the archive could be mounted as well:

docker compose run --rm -v ./archive.tar.gz:/tmp/archive.tar.gz -v volume_name:/restore backup restore -path /backup/my_app_backup -archive /tmp/archive.tar.gz

Error handling

How cautious does error handling need to be? Should the command stash the previous contents so it can always roll back to the pre-restoration state?

@MaxJa4
Contributor

MaxJa4 commented Aug 28, 2023

I like the stepped approach: first just concentrate on the copy/restore process using the docker compose command. Add the download etc. later.

How cautious does error handling need to be? Should the command stash the previous contents so it can always roll back to the pre-restoration state?

As a restore is done in either a testing scenario (non-critical) or when it's actually needed (often critical), any sources of issues should be limited as much as possible imo.

I'm thinking of something like this right now:

  • Copy each original object that the restore modifies in any way to a temporary location, so it can be recovered after a failure. This might also allow aborting and later resuming a restore (which would come almost as a "free" feature).
  • Maybe also keep a record of modifications (e.g. with status done, ongoing, queued) for the recovery/resume
  • Delete the temporary files after a successful restore (all at once at the end or optionally object-by-object to limit storage footprint), otherwise resume or undo the restore

Edit: Atomic file writes are possible on Linux-based systems, but not reliably on Windows: https://github.com/google/renameio

@MaxJa4
Contributor

MaxJa4 commented Aug 28, 2023

Restoring workflow concept (draft, open for discussion)

[Figure: RestoreV0_1 flow chart]


Where multiple options exist, I'd prefer to choose one approach and not let the user choose (or only where it really makes sense), so that everything doesn't become too huge and complex.

Also, we might do the more complicated stuff later for a restore v2 including downloading the archive from the specified storage backend.

This is also by no means final, more like notes to get a restore strategy built up step by step.

Edit: Replaced long list with flow chart for easier understanding and thinking.

@m90
Member Author

m90 commented Aug 30, 2023

Some thoughts without having worked through your write up in all detail:

  • If we have a proper recovery on failure in place, are all of the pre-checks even needed? If a restoration process fails for whatever reason and the code is able to recover just fine, I would guess there's no real need to check anything. Checks will be incomplete in any case.
  • You talk about checksums a lot, where are these coming from? If we don't trust the integrity of backups taken, this should maybe rather be addressed at backup time?
  • I like the idea of doing an atomic copy for speed. Is there a Golang library that can do this for us?

I.e. I'd personally maybe focus on a. fast copy / extraction, b. robust recovery.

@MaxJa4
Contributor

MaxJa4 commented Aug 30, 2023

If we have a proper recovery on failure in place, are all of the pre checks even needed?

Not necessarily, no. It would just save time. With many gigabytes of data, errors could occur minutes to hours after starting, as insufficient space or (partially) missing permissions could lead to an error in the middle or at the end of the recovery. This could also lead to unstable system behavior (storage full).
But to be fair, the user needs to be wary of that to a certain extent. Consider the extended validation an optional add-on for a later point, if we feel it's a benefit to add it.

You talk about checksums a lot, where are these coming from?

Without any checksums provided in the backup (which would be a nice addition), we could only stream the extracted contents of each file through a checksum calculation (block-wise for large files that don't fit into memory?) and, once the file is written, verify that the write was successful and complete by comparing against the calculated checksum. But that's arguably quite complex and too much effort.

Is there a Golang library that can do this for us?

Atomic copy is basically just calling os.Rename on Linux-based systems. But there is https://github.com/google/renameio which handles that.

@MaxJa4
Contributor

MaxJa4 commented Aug 30, 2023

Updated workflow above, without checksums or crazy amounts of pre-checks. Also visual now, way easier on the eyes - at least for me.
