
Increase performance of 'dump' #3406

Open
baurmatt opened this issue May 21, 2021 · 3 comments


@baurmatt

Output of restic version

restic 0.9.6 compiled with go1.12.12 on linux/amd64

(Ubuntu 20.04)

What should restic do differently? Which functionality do you think we should add?

There is a major performance difference between using restore and dump. It would be great to get the same performance from both commands.

Restoring the same 50 GB file from an object storage backend showed the following performance difference:

restore: 122 MB/s (time to restore 50 GB: ~7 min)
dump: 17 MB/s (time to restore 50 GB: ~50 min)

This problem was also discussed in the forum: https://forum.restic.net/t/performance-differences-between-restic-dump-mount-restore/3878

@MichaelEischer suggested opening a feature request regarding this topic.

What are you trying to do? What problem would this solve?

I'm trying to restore an LVM snapshot backup from a restic repository stored on object storage:

restic dump latest /vg0-data--snapshot.img > /dev/vg0/data

Others have reported on the forum that they're trying to restore large database backups directly into the database.

Did restic help you today? Did it make you happy in any way?

I discovered that I can do LVM snapshot backups by piping the snapshots directly to restic, leveraging the --stdin and --stdin-filename parameters. Thanks! 👍

@MichaelEischer (Member)

In case anyone wants to help, here are some thoughts on how the performance could be improved:

At least for large files it should be relatively straightforward to improve the performance of the dump command: internal/dump/common.GetNodeData currently iterates over all blobs of a file sequentially. The simplest option would be to add a bit of code that runs multiple LoadBlob calls in parallel (and makes sure not to mix up the blob order).
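
A minimal sketch of that idea (not restic's actual code): blobLoader stands in for the real LoadBlob, whose signature differs, and dumpBlobsParallel and job are invented names. Buffered channels bound how many blobs are in flight, and a second channel carries the jobs in file order so the writer can emit them sequentially:

```go
package dump

import (
	"context"
	"io"
)

// blobLoader stands in for restic's repository.LoadBlob; the real
// signature differs, this is just an assumption for the sketch.
type blobLoader func(ctx context.Context, id string) ([]byte, error)

type job struct {
	id     string
	result chan []byte // buffered(1): the worker never blocks on send
	err    chan error  // buffered(1)
}

// dumpBlobsParallel loads the blobs named by ids using `workers`
// goroutines, but writes them to w strictly in the original order.
// On error the caller should cancel ctx to release the goroutines.
func dumpBlobsParallel(ctx context.Context, load blobLoader, ids []string, w io.Writer, workers int) error {
	jobs := make(chan *job, workers)
	ordered := make(chan *job, workers)

	// Producer: enqueue jobs in file order.
	go func() {
		defer close(jobs)
		defer close(ordered)
		for _, id := range ids {
			j := &job{id: id, result: make(chan []byte, 1), err: make(chan error, 1)}
			select {
			case jobs <- j:
			case <-ctx.Done():
				return
			}
			select {
			case ordered <- j: // same order as enqueued
			case <-ctx.Done():
				return
			}
		}
	}()

	// Workers: load blobs concurrently, in whatever order they pick them up.
	for i := 0; i < workers; i++ {
		go func() {
			for j := range jobs {
				if buf, err := load(ctx, j.id); err != nil {
					j.err <- err
				} else {
					j.result <- buf
				}
			}
		}()
	}

	// Writer: consume results strictly in file order.
	for j := range ordered {
		select {
		case buf := <-j.result:
			if _, err := w.Write(buf); err != nil {
				return err
			}
		case err := <-j.err:
			return err
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}
```

Because every result channel is buffered, a worker never blocks after a load finishes, and at most a small multiple of workers loaded blobs are held in memory at any time.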

A more advanced option would be to keep a cache of limited size to allow reusing blobs where possible. Ideally, the cache would only hold blobs that will be reused in the future; I'm just not sure whether it's worth the complexity that this would introduce.
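
For the simpler variant (plain least-recently-used eviction, without knowledge of future accesses), a rough sketch; blobCache and its methods are hypothetical, and a real implementation would probably want to bound the cache by total bytes rather than entry count:

```go
package dump

import "container/list"

// blobCache is a hypothetical fixed-capacity LRU cache keyed by blob ID.
// A future-aware cache would instead keep only blobs that a pre-scan of
// the remaining files shows will be needed again. capacity must be >= 1.
type blobCache struct {
	capacity int
	order    *list.List               // front = most recently used
	entries  map[string]*list.Element // blob ID -> list element
}

type cacheEntry struct {
	id   string
	data []byte
}

func newBlobCache(capacity int) *blobCache {
	return &blobCache{
		capacity: capacity,
		order:    list.New(),
		entries:  make(map[string]*list.Element),
	}
}

func (c *blobCache) get(id string) ([]byte, bool) {
	elem, ok := c.entries[id]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(elem)
	return elem.Value.(*cacheEntry).data, true
}

func (c *blobCache) put(id string, data []byte) {
	if elem, ok := c.entries[id]; ok {
		c.order.MoveToFront(elem)
		elem.Value.(*cacheEntry).data = data
		return
	}
	// Evict the least recently used entry once the cache is full.
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.entries, oldest.Value.(*cacheEntry).id)
	}
	c.entries[id] = c.order.PushFront(&cacheEntry{id: id, data: data})
}
```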

@mlew980 commented May 27, 2021

I decided to locally implement your first suggestion and attempted to restore a 4.1 GB file.

restic restore
time to restore: 4m34.161s
md5 checksum: c9ea43abe651810576e6f1c197025f1f

original restic dump
time to restore: 31m15.299s
md5 checksum: c9ea43abe651810576e6f1c197025f1f

parallel restic dump
time to restore (using 8 workers): 6m47.095s
md5 checksum: c9ea43abe651810576e6f1c197025f1f

I don't normally code in Go, so my code might be a bit rough; please don't hesitate to point out any errors:

https://github.com/mlew980/restic/blob/master/internal/dump/common.go

@MichaelEischer (Member)

@mlew980 Could you open a PR for your change? That would make it easier to discuss the code.

From a high-level perspective, it is possible that completedJobs grows without bounds. The simplest solution is probably to use a buffered job input queue and fill it with, e.g., up to 16 jobs relative to the blob that should be written next. The output collection loop can then wait for loaded blobs and, once the blob to write next has been loaded, write it out and refill the job queue.
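
A rough sketch of that scheme (hypothetical names, not the code from the linked branch): the writer keeps at most maxAhead load jobs outstanding relative to the write cursor, so the completed-jobs map can never grow past that bound:

```go
package dump

// loadResult pairs a blob's position in the file with its loaded bytes.
type loadResult struct {
	index int
	data  []byte
}

// writeInOrder receives out-of-order load results on results, buffers at
// most maxAhead of them, and hands each blob to emit in index order.
// enqueue submits the load job for the given blob index; the input queue
// is refilled only as the write cursor advances, so at most maxAhead jobs
// are ever outstanding. maxAhead must be at least 1.
func writeInOrder(results <-chan loadResult, total, maxAhead int, enqueue func(index int), emit func([]byte) error) error {
	// Prime the queue with up to maxAhead jobs ahead of index 0.
	inFlight := 0
	for ; inFlight < maxAhead && inFlight < total; inFlight++ {
		enqueue(inFlight)
	}

	completed := make(map[int][]byte) // holds at most maxAhead entries
	next := 0                         // index of the blob to write next
	for next < total {
		res := <-results
		completed[res.index] = res.data
		// Flush every blob that is now ready, in order.
		for buf, ok := completed[next]; ok; buf, ok = completed[next] {
			delete(completed, next)
			if err := emit(buf); err != nil {
				// Outstanding loads should be cancelled or drained by
				// the caller on error; omitted here for brevity.
				return err
			}
			next++
			// Refill the job queue relative to the new write cursor.
			if inFlight < total {
				enqueue(inFlight)
				inFlight++
			}
		}
	}
	return nil
}
```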
