Restore command #146

m90 · 2022-08-18T10:24:47Z

A container based off this image could also expose a command to restore a volume from a tar archive. This would have to be run manually in the context of the existing Docker setup, providing an archive location (this could probably also be remote).

Akruidenberg · 2022-09-04T16:47:23Z

+1

octdanb · 2022-12-13T02:45:57Z

+1

GeneralTao2 · 2023-02-16T17:58:35Z

+1

m90 · 2023-05-26T06:15:46Z

Writing down a few thoughts on this before I forget them.

Goals

- Users can restore the contents of a volume from an archive
- The command can be run in a Docker compose setup, using the existing image
- If containers need to be stopped during the procedure, they shall be stopped (just like when backing up)

Non-Goals

- Fetching archives from remote storages is out of scope for a first version
- Volume mounts and paths will not be figured out automagically, instead users must provide them

API

This assumes the archive is available on the host that issues the compose command. Is this a problem?

docker compose run --rm -v volume_name:/restore backup restore -path /backup/my_app_backup < archive.tar.gz

Alternatively the archive could be mounted as well:

docker compose run --rm -v ./archive.tar.gz:/tmp/archive.tar.gz -v volume_name:/restore backup restore -path /backup/my_app_backup -archive /tmp/archive.tar.gz
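To make the proposal a bit more concrete, here is a minimal Go sketch of the extraction step, assuming that `-path` names a directory inside the archive and that its contents should end up in the mounted volume (`/restore` in the commands above). The names `extract`, `writeFile` and `demo` are hypothetical and not part of the existing code base. Because `extract` only needs an `io.Reader`, the same function would serve both variants: `os.Stdin` for the redirect, or an opened file for `-archive`.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"path"
	"path/filepath"
	"strings"
)

// extract unpacks every entry of a gzipped tar stream that lives below
// prefix into target and returns the number of regular files written.
// Symlinks and other special entries are skipped in this sketch.
func extract(r io.Reader, prefix, target string) (int, error) {
	gz, err := gzip.NewReader(r)
	if err != nil {
		return 0, fmt.Errorf("opening gzip stream: %w", err)
	}
	defer gz.Close()
	// Normalize "/backup/app" to "backup/app/"; an empty prefix matches all.
	if prefix = strings.Trim(path.Clean("/"+prefix), "/"); prefix != "" {
		prefix += "/"
	}
	tr := tar.NewReader(gz)
	written := 0
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return written, nil
		}
		if err != nil {
			return written, fmt.Errorf("reading archive: %w", err)
		}
		// Cleaning the name as a rooted path drops ".." segments, so an
		// entry name cannot point outside the target directory.
		name := strings.Trim(path.Clean("/"+hdr.Name), "/") + "/"
		if !strings.HasPrefix(name, prefix) {
			continue
		}
		rel := filepath.FromSlash(strings.TrimPrefix(name, prefix))
		dest := filepath.Join(target, rel)
		switch hdr.Typeflag {
		case tar.TypeDir:
			err = os.MkdirAll(dest, 0o755)
		case tar.TypeReg:
			if err = writeFile(dest, tr, hdr.FileInfo().Mode().Perm()); err == nil {
				written++
			}
		}
		if err != nil {
			return written, err
		}
	}
}

// writeFile copies src into dest, creating parent directories as needed.
func writeFile(dest string, src io.Reader, perm os.FileMode) error {
	if err := os.MkdirAll(filepath.Dir(dest), 0o755); err != nil {
		return err
	}
	f, err := os.OpenFile(dest, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, perm)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, src); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demo builds a tiny archive in memory and restores one subtree of it.
// Errors from the in-memory writes are ignored for brevity.
func demo() (int, string, error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gw)
	for _, name := range []string{"backup/my_app_backup/data.txt", "backup/other/skip.txt"} {
		tw.WriteHeader(&tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o644, Size: 5})
		tw.Write([]byte("hello"))
	}
	tw.Close()
	gw.Close()
	target, err := os.MkdirTemp("", "restore-demo")
	if err != nil {
		return 0, "", err
	}
	defer os.RemoveAll(target)
	n, err := extract(&buf, "/backup/my_app_backup", target)
	if err != nil {
		return n, "", err
	}
	data, err := os.ReadFile(filepath.Join(target, "data.txt"))
	return n, string(data), err
}

func main() {
	fmt.Println(demo())
}
```

The sketch overwrites files in place and does nothing about stopping containers or rolling back, which are the open questions discussed below.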

Error handling

How cautious does error handling need to be? Should the command stash the previous contents so it can always roll back to the pre-restoration state?

MaxJa4 · 2023-08-28T16:33:26Z

I like the stepped approach: first just concentrate on the copy/restore process using the docker compose command. Add the download etc. later.

> How cautious does error handling need to be? Should the command stash the previous contents so it can always roll back to the pre-restoration state?

As a restore is done in either a testing scenario (non-critical) or when it's actually needed (often critical), any sources of issues should be limited as much as possible imo.

I'm thinking of something like this right now:

- Copy each original object that the restore modifies in any way to a temporary location, so that a failed restore can be recovered. This might also allow aborting and then resuming (which would come almost as a "free" feature).
- Maybe also keep a record of modifications (e.g. with status done, ongoing, queued) for the recovery/resume.
- Delete the temporary files after a successful restore (all at once at the end, or optionally object by object to limit the storage footprint); otherwise resume or undo the restore.
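The steps above could be sketched roughly as follows in Go. This is only an illustration under a few assumptions: the type `stash` and its methods are hypothetical names, originals are moved aside with `os.Rename` rather than copied (so the stash directory has to be on the same filesystem as the restored paths), and the record of modifications is kept in memory only, which would not be enough for resuming after a crash.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// entry records one modification so that it can be undone later.
type entry struct {
	target  string // path that the restore replaced
	stashed string // where the original was moved; empty if there was none
}

// stash keeps the pre-restore state of every path a restore touches. dir
// must be on the same filesystem as the targets, because os.Rename cannot
// move files across filesystems.
type stash struct {
	dir     string
	entries []entry
}

// replace moves the current target aside (if it exists), then writes data.
func (s *stash) replace(target string, data []byte) error {
	if err := os.MkdirAll(s.dir, 0o700); err != nil {
		return err
	}
	e := entry{target: target}
	_, err := os.Lstat(target)
	switch {
	case err == nil:
		e.stashed = filepath.Join(s.dir, fmt.Sprint(len(s.entries)))
		if err := os.Rename(target, e.stashed); err != nil {
			return fmt.Errorf("stashing %s: %w", target, err)
		}
	case !errors.Is(err, fs.ErrNotExist):
		return err
	}
	s.entries = append(s.entries, e)
	return os.WriteFile(target, data, 0o644)
}

// rollback undoes all recorded modifications in reverse order.
func (s *stash) rollback() error {
	for i := len(s.entries) - 1; i >= 0; i-- {
		e := s.entries[i]
		if err := os.RemoveAll(e.target); err != nil {
			return err
		}
		if e.stashed != "" {
			if err := os.Rename(e.stashed, e.target); err != nil {
				return err
			}
		}
	}
	return s.commit()
}

// commit discards the stashed originals after a successful restore.
func (s *stash) commit() error {
	s.entries = nil
	return os.RemoveAll(s.dir)
}

// demo replaces one existing and one new file, then rolls both back. It
// returns the content of the existing file and whether the new one is gone.
func demo() (string, bool, error) {
	root, err := os.MkdirTemp("", "stash-demo")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(root)
	existing, fresh := filepath.Join(root, "existing.txt"), filepath.Join(root, "fresh.txt")
	if err := os.WriteFile(existing, []byte("old"), 0o644); err != nil {
		return "", false, err
	}
	s := &stash{dir: filepath.Join(root, ".stash")}
	for _, p := range []string{existing, fresh} {
		if err := s.replace(p, []byte("new")); err != nil {
			return "", false, err
		}
	}
	if err := s.rollback(); err != nil {
		return "", false, err
	}
	data, err := os.ReadFile(existing)
	_, statErr := os.Stat(fresh)
	return string(data), errors.Is(statErr, fs.ErrNotExist), err
}

func main() {
	fmt.Println(demo())
}
```

Calling `commit` instead of `rollback` at the end corresponds to deleting the temporary files all at once after a successful restore.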

Edit: Atomic file writes are only possible on Linux-based systems, not on Windows: https://github.com/google/renameio

MaxJa4 · 2023-08-28T23:01:48Z

Restoring workflow concept (draft, open for discussion)

(Direct link)

Where multiple options exist, I'd prefer to choose one approach and not let the user choose (or only where it really makes sense), so that everything doesn't get too huge and complex.

Also, we might do the more complicated stuff later for a restore v2 including downloading the archive from the specified storage backend.

This is also by no means final, more like notes to get a restore strategy built up step by step.

Edit: Replaced long list with flow chart for easier understanding and thinking.

m90 · 2023-08-30T10:37:44Z

Some thoughts without having worked through your write up in all detail:

- If we have a proper recovery on failure in place, are all of the pre checks even needed? If a restoration process fails for whatever reason and the code is able to recover just fine, I would guess there's no real need to check anything. Checks will be incomplete in any case.
- You talk about checksums a lot, where are these coming from? If we don't trust the integrity of backups taken, this should maybe rather be addressed at backup time?
- I like the idea of doing an atomic copy for speed. Is there a Golang library that can do this for us?

I.e. I'd personally maybe focus on a. fast copy / extraction, b. robust recovery.

MaxJa4 · 2023-08-30T14:31:14Z

> If we have a proper recovery on failure in place, are all of the pre checks even needed?

Not necessarily, no. It would just save time. With many gigabytes of data, errors could occur minutes to hours after starting, as insufficient space or (partial) permission problems could lead to an error in the middle or at the end of the recovery. It could also lead to unstable system behavior (storage full).
But to be fair, the user needs to be wary of that to a certain extent. Consider the extended validation an optional add-on for a later point, if we feel it's a benefit to add it.

> You talk about checksums a lot, where are these coming from?

Without any checksums provided in the backup (which would be a nice addition), we could only stream the extracted contents of files through a checksum calculation (block-wise for large files that don't fit into memory?) and, once a file has been written, verify against the calculated checksum that the write was successful and complete. But that's arguably quite a lot: complex and too much effort.
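For reference, the streaming variant described here is fairly small in Go. The following is a minimal sketch, assuming SHA-256 as the hash and a hypothetical function name `writeVerified`; it hashes the data while it is written, then reads the file back and compares the two digests. Note that the read-back roughly doubles the I/O per file, which is part of the cost mentioned above.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// writeVerified streams src into the file dst while hashing the bytes in
// flight, then re-reads the file from disk and compares both digests.
func writeVerified(dst string, src io.Reader) (string, error) {
	f, err := os.Create(dst)
	if err != nil {
		return "", err
	}
	streamed := sha256.New()
	// TeeReader passes every chunk read from src to the hash as well, so
	// the file never has to fit into memory.
	_, err = io.Copy(f, io.TeeReader(src, streamed))
	if err == nil {
		err = f.Sync()
	}
	if cerr := f.Close(); err == nil {
		err = cerr
	}
	if err != nil {
		return "", err
	}
	onDisk, err := hashFile(dst)
	if err != nil {
		return "", err
	}
	want := streamed.Sum(nil)
	if !bytes.Equal(want, onDisk) {
		return "", fmt.Errorf("checksum mismatch for %s", dst)
	}
	return hex.EncodeToString(want), nil
}

// hashFile returns the SHA-256 digest of the file at path.
func hashFile(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return nil, err
	}
	return h.Sum(nil), nil
}

// demo writes a small payload to a temporary file and returns its digest.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "checksum-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	return writeVerified(filepath.Join(dir, "data.txt"), strings.NewReader("hello"))
}

func main() {
	fmt.Println(demo())
}
```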

> Is there a Golang library that can do this for us?

Atomic copy is basically just doing os.Rename on Linux-based systems (within the same filesystem). But there is https://github.com/google/renameio, which handles that.
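As an illustration of the rename approach, here is a minimal sketch that uses only the Go standard library. It is not the renameio API; `writeFileAtomic` is a hypothetical helper, and the atomicity it relies on is the POSIX rename behaviour, so it should not be expected to hold on Windows.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to a temporary file next to path and then
// renames it into place, so readers see either the old or the new content.
func writeFileAtomic(path string, data []byte, perm os.FileMode) (err error) {
	// The temporary file is created in the target directory so that the
	// rename below stays on one filesystem.
	tmp, err := os.CreateTemp(filepath.Dir(path), ".restore-*")
	if err != nil {
		return err
	}
	defer func() {
		if err != nil {
			tmp.Close() // may already be closed; that error is irrelevant here
			os.Remove(tmp.Name())
		}
	}()
	if _, err = tmp.Write(data); err != nil {
		return err
	}
	if err = tmp.Chmod(perm); err != nil {
		return err
	}
	if err = tmp.Sync(); err != nil {
		return err
	}
	if err = tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// demo overwrites the same file twice and returns its final content along
// with the number of entries left in the directory.
func demo() (string, int, error) {
	dir, err := os.MkdirTemp("", "atomic-demo")
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)
	target := filepath.Join(dir, "data.txt")
	for _, content := range []string{"old", "new"} {
		if err := writeFileAtomic(target, []byte(content), 0o644); err != nil {
			return "", 0, err
		}
	}
	data, err := os.ReadFile(target)
	if err != nil {
		return "", 0, err
	}
	entries, err := os.ReadDir(dir)
	return string(data), len(entries), err
}

func main() {
	fmt.Println(demo())
}
```

A library such as renameio would presumably cover more edge cases than this sketch does.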

MaxJa4 · 2023-08-30T22:12:54Z

Updated workflow above, without checksums or crazy amounts of pre-checks. Also visual now, way easier on the eyes - at least for me.

m90 added the "enhancement" label (New feature or request) on Aug 18, 2022
