core/rawdb: avoid exiting geth from freezer on fsync failure (ref #22112) #22118

holiman · 2021-01-05T08:40:20Z

This is a partial fix for #22112.

In that ticket, an fsync operation failed -- unclear why. When that happens, the current code hard-exits, without any kind of recovery.

The current behaviour is wrong and prone to corrupting the database, since we don't properly close anything. In the particular case where the error occurs, it's actually possible to just un-write the data that we just wrote (truncate), back off and try again later. That would probably have a higher chance of working.

If the error persists, the move-data-from-leveldb-to-ancient step simply becomes a no-op, but it won't cause corruption.

DGKSK8LIFE

AppVeyor CI tests are failing:

--- FAIL: TestTransactionPropagation65 (2.45s)
    handler_eth_test.go:459: sink 0: transaction propagation timed out: have 0, want 1024
FAIL
coverage: 21.4% of statements
FAIL	github.com/ethereum/go-ethereum/eth	16.913s

holiman · 2021-01-19T10:13:39Z

Closing this after discussion - it might be better to close (hard exit) and force a re-open of all files.

vdamle · 2021-01-22T18:07:14Z

Thanks for providing additional details on Discord, @holiman. For context, on Azure we are seeing writes to Azure FS hit transient failures that seem to go away on subsequent retries (which currently means after a restart of geth), such as:

CRIT [01-18|03:49:02.013] Failed to flush frozen tables            err="[sync /<path>/ethereum/geth/chaindata/ancient/bodies.cidx: input/output error]"

In such environments, we believe there's value in retrying on the next timer, to see if such a transient error has gone away. Please let me know your thoughts.

Ref: ethereum/go-ethereum#22118

core/rawdb: don't hard-exit from freezer on fsync failure (ref ethere…

5e1b86a

…um#22112)

holiman requested review from karalabe and rjl493456442 as code owners January 5, 2021 08:40

DGKSK8LIFE suggested changes Jan 5, 2021

holiman added the status:triage label Jan 12, 2021

fjl changed the title ~~core/rawdb: don't hard-exit from freezer on fsync failure (ref #22112)~~ core/rawdb: avoid exiting geth from freezer on fsync failure (ref #22112) Jan 19, 2021

holiman closed this Jan 19, 2021

vdamle pushed a commit to kaleido-io/quorum that referenced this pull request Jan 22, 2021

core/rawdb: avoid exiting geth from freezer on fsync failure

a483a6d

Ref: ethereum/go-ethereum#22118

vdamle pushed a commit to kaleido-io/quorum that referenced this pull request Jan 26, 2021

core/rawdb: avoid exiting geth from freezer on fsync failure

1327fa5

Ref: ethereum/go-ethereum#22118
