[BUG] Failed to restore snapshot - snapshot missing hash #749
@emaildanwilson @jvassev thanks for opening the issue. Are you folks seeing this on the newly released patch version? Will look into this, since we haven't observed this behavior yet on
@emaildanwilson @jvassev The etcd and backup-restore versions you're using are no longer compatible; please see the release notes of v0.25.0. You're using an older version of the etcd custom image (which uses a base image) with a newer version of
To make it work, you have two options:
If this works, you can close the issue. If it doesn't, please let us know :)
Thank you! We'll make the change and see if the issue still happens.
@anveshreddy18 Thanks for the advice! I decided to downgrade
@jvassev @emaildanwilson, I'd say that's a good enough starting point along with the other files in the
I would also suggest you take a look at
We have quite a major release coming up as well (in the next couple weeks).
Let us know if
We strongly urge that you move away from
/assign
Hi @jvassev @emaildanwilson ,
It looks like etcd failed to append
We have to detect a corrupt snapshot, or a snapshot missing its hash (taken via snapshot), early rather than waiting until restoration. I have the following methods in mind:
This method will work, but starting an embedded etcd while taking a full snapshot and waiting for the restoration to complete can be somewhat time-consuming. Still, it would safeguard us against the scenario where restoration fails due to a corrupt snapshot.
But I'm not sure how to calculate the hash of the db up to revision x. Is there an API call available for that? I guess the HashKV API call won't work here: the value returned by HashKV up to revision x can't equal the hash of a snapshot taken up to revision x (with the appended hash removed), because HashKV calculates the hash of all MVCC key-values, whereas a snapshot is a copy of the whole etcd db, which also contains cluster information, so the hashes will not match.
Running with the suggested versions:
Occasionally this happens:
I think it may have to do with abrupt node restarts, since we mostly see this on spot instances. Also including the full log from an attempted restore:
Describe the bug:
Incomplete full backups can get stored in S3, and the automatic restore then fails on startup.
Expected behavior:
If a backup is missing the hash, it should never get copied to S3, or the restore should be able to skip incomplete backups via an option.
How To Reproduce (as minimally and precisely as possible):
We don't have specific reproduction steps, but we hit this issue almost daily when using spot instances. We suspect that an incomplete file is written and then packaged up and sent to S3, possibly after a restart, but that's just a guess based on the behavior we're seeing.
Logs:
Backup files in S3 and their sizes. Note that the latest full backup is smaller than the previous one:
command to create the restore
restore error message
If run with skip-hash-check=true it segfaults:
Screenshots (if applicable):
Environment (please complete the following information):
Anything else we need to know?:
backup sidecar configuration:
If we manually delete the bad full backup then the restore completes successfully.
@jvassev can provide more details if needed.