Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initialHeight == 0 not treated identically to initialHeight == 1 #5986

Closed
silasdavis opened this issue Jan 26, 2021 · 4 comments · Fixed by #6059
Closed

initialHeight == 0 not treated identically to initialHeight == 1 #5986

silasdavis opened this issue Jan 26, 2021 · 4 comments · Fixed by #6059
Assignees
Labels
C:sync Component: Fast Sync, State Sync T:bug-unconfirmed Type Bug (has been reported but not verified or triaged)

Comments

@silasdavis
Copy link
Contributor

This change: #5191

Introduced support for an arbitrary initialHeight in genesis. The spec:

https://github.com/tendermint/spec/blob/master/rfc/002-nonzero-genesis.md

States:

A new field initial_height is added to the genesis file, defaulting to 1. It can be set to any
non-negative integer, and 0 is considered equivalent to 1.

However, currently this is not the case in the following code:

https://github.com/tendermint/tendermint/blob/master/state/execution.go#L348-L349

Here, if initialHeight is 0 then loading validators will throw. In Burrow we use a checkpointing mechanism that will replay the previous block during a crash, if we have a chain with just the initial block then InfoSync will announce block height 0 (which also has meaning for 'state restored' chains) and this line will get hit with store.LoadValidators(0) and fail. The old behaviour avoided this because it checked block.Height > 1

Here is my suggested fix: #5985

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Jan 27, 2021

As mentioned in the PR, the genesis state is normalized with InitialHeight = 1 in GenesisDoc.ValidateAndComplete(), and this is already called in MakeGenesisState(), so the suggested fix is redundant.

I don't doubt that you're seeing an error, but I've tried to trace all the calls paths to getBeginBlockValidatorInfo() and couldn't find anything obviously wrong -- there's a ton of them though, and they're not always easy to reason about. If you have any further details about how the node is configured and started then that'd be very helpful in tracking this down -- or even better, a repro case.

@tessr tessr added C:sync Component: Fast Sync, State Sync T:bug-unconfirmed Type Bug (has been reported but not verified or triaged) labels Feb 1, 2021
@tessr
Copy link
Contributor

tessr commented Feb 1, 2021

Putting this one in the icebox until we have more details.

@erikgrinaker
Copy link
Contributor

Closing this due to inactivity.

@silasdavis
Copy link
Contributor Author

silasdavis commented Feb 6, 2021

Sorry about the tardiness on getting back on this one, needed to find some time to dig.

You are quite right my kneejerk PR does not fix the issue.

We create the Tendermint node programmatically and pass in a GenesisDocProvider here: https://github.com/hyperledger/burrow/blob/5ec0c8b2fee8e6beb26aed71ee88912750058ce9/consensus/tendermint/tendermint.go#L74-L76.

Tendermint saves the genesis doc from the provider here:

tendermint/node/node.go

Lines 1426 to 1434 in b9b55db

genDoc, err = genesisDocProvider()
if err != nil {
return sm.State{}, nil, err
}
// save genesis doc to prevent a certain class of user errors (e.g. when it
// was changed, accidentally or not). Also good for audit trail.
if err := saveGenesisDoc(stateDB, genDoc); err != nil {
return sm.State{}, nil, err
}

When we restart tendermint performs a replay loading the original genesis doc from the function linked above. This genesis doc in our case has a zeroed initialHeight which is passed down to:

lastValSet, err := store.LoadValidators(block.Height - 1)
if err != nil {
panic(err)
}
which panics. If initialHeight had been normalised to 1 we'd never enter that block.

So it seems like we should be calling ValidateAndComplete() when we ingest the genesis doc earlier than currently in MakeGenesisState

I've opened a PR that I think would fix this (just quickly to illustrate, untested! soz): #6059

Here is the stack trace:

panic: could not find validator set for height #0 [recovered]
        panic: could not find validator set for height #0

goroutine 1 [running]:
github.com/jawher/mow.cli/internal/flow.(*Step).Run(0xc000e58660, 0x15189e0, 0x228be40)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/internal/flow/flow.go:39 +0xd4
github.com/jawher/mow.cli/internal/flow.(*Step).Run(0xc000e587e0, 0x15189e0, 0x228be40)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/internal/flow/flow.go:29 +0xb6
github.com/jawher/mow.cli/internal/flow.(*Step).Run(0xc000e59740, 0x15189e0, 0x228be40)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/internal/flow/flow.go:29 +0xb6
github.com/jawher/mow.cli/internal/flow.(*Step).callDo.func1(0xc000e59770, 0x0, 0x0)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/internal/flow/flow.go:52 +0x4e
panic(0x15189e0, 0x228be40)
        /usr/lib/go/src/runtime/panic.go:969 +0x1b9
github.com/tendermint/tendermint/state.getBeginBlockValidatorInfo(0xc00000c1e0, 0x19cb620, 0xc001050630, 0x0, 0xc000f940c0, 0x20, 0x20, 0xc000f940e0)
        /home/silas/code/go/pkg/mod/github.com/tendermint/[email protected]/state/execution.go:346 +0x4f9
github.com/tendermint/tendermint/state.execBlockOnProxyApp(0x19bb2a0, 0xc000110580, 0x19c6260, 0xc001051100, 0xc00000c1e0, 0x19cb620, 0xc001050630, 0x0, 0x165, 0xc000f6a4e0, ...)
        /home/silas/code/go/pkg/mod/github.com/tendermint/[email protected]/state/execution.go:293 +0x1f2
github.com/tendermint/tendermint/state.ExecCommitBlock(0x19c6260, 0xc001051100, 0xc00000c1e0, 0x19bb2a0, 0xc000110580, 0x19cb620, 0xc001050630, 0x0, 0xc00052afc0, 0x14, ...)
        /home/silas/code/go/pkg/mod/github.com/tendermint/[email protected]/state/execution.go:534 +0x8d
github.com/tendermint/tendermint/consensus.(*Handshaker).replayBlocks(0xc0004eac70, 0xb, 0x0, 0x0, 0x0, 0xc0005a92a0, 0x19, 0x1, 0x1, 0xc0005a9300, ...)
        /home/silas/code/go/pkg/mod/github.com/tendermint/[email protected]/consensus/replay.go:471 +0x26d
github.com/tendermint/tendermint/consensus.(*Handshaker).ReplayBlocks(0xc0004eac70, 0xb, 0x0, 0x0, 0x0, 0xc0005a92a0, 0x19, 0x1, 0x1, 0xc0005a9300, ...)
        /home/silas/code/go/pkg/mod/github.com/tendermint/[email protected]/consensus/replay.go:393 +0x930
github.com/tendermint/tendermint/consensus.(*Handshaker).Handshake(0xc0004eac70, 0x19cebe0, 0xc000593520, 0xc000e67140, 0xaaaaaaaaaaaaaaaa)
        /home/silas/code/go/pkg/mod/github.com/tendermint/[email protected]/consensus/replay.go:268 +0x4d8
github.com/tendermint/tendermint/node.doHandshake(0x19cb620, 0xc001050630, 0xb, 0x0, 0x0, 0x0, 0xc0005a92a0, 0x19, 0x1, 0x1, ...)
        /home/silas/code/go/pkg/mod/github.com/tendermint/[email protected]/node/node.go:277 +0x1d8
github.com/tendermint/tendermint/node.NewNode(0xc001088140, 0x19ac9a0, 0xc00013d120, 0xc000e99380, 0x198abe0, 0xc00013d400, 0xc0004eb5a8, 0xc0004eb598, 0xc000e99310, 0x19bb2a0, ...)
        /home/silas/code/go/pkg/mod/github.com/tendermint/[email protected]/node/node.go:685 +0x1e05
github.com/hyperledger/burrow/consensus/tendermint.NewNode(0xc001088140, 0x19ac9a0, 0xc00013d120, 0xc000115000, 0xc00016b830, 0xc000e99310, 0xc000dee0f0, 0x19c5a20, 0xc0003a9680, 0x198a3e0)
        /home/silas/code/go/src/github.com/hyperledger/burrow/consensus/tendermint/tendermint.go:72 +0x3a5
github.com/hyperledger/burrow/core.(*Kernel).LoadTendermintFromConfig(0xc000e88360, 0xc0001a1630, 0x19ac9a0, 0xc00013d120, 0x19ac9a0, 0xc00013d120)
        /home/silas/code/go/src/github.com/hyperledger/burrow/core/config.go:94 +0x8a6
github.com/hyperledger/burrow/core.LoadKernelFromConfig(0xc0001a1630, 0x16b3e6c, 0x1b, 0xc000e657f0)
        /home/silas/code/go/src/github.com/hyperledger/burrow/core/config.go:133 +0x525
github.com/hyperledger/burrow/cmd/burrow/commands.Start.func1.1()
        /home/silas/code/go/src/github.com/hyperledger/burrow/cmd/burrow/commands/start.go:25 +0x138
github.com/jawher/mow.cli/internal/flow.(*Step).callDo(0xc000e59770, 0x0, 0x0)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/internal/flow/flow.go:55 +0x70
github.com/jawher/mow.cli/internal/flow.(*Step).Run(0xc000e59770, 0x0, 0x0)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/internal/flow/flow.go:25 +0x45
github.com/jawher/mow.cli/internal/flow.(*Step).Run(0xc000e59710, 0x0, 0x0)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/internal/flow/flow.go:29 +0xb6
github.com/jawher/mow.cli/internal/flow.(*Step).Run(0xc000e587b0, 0x0, 0x0)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/internal/flow/flow.go:29 +0xb6
github.com/jawher/mow.cli/internal/flow.(*Step).Run(0xc00101fef0, 0x0, 0x0)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/internal/flow/flow.go:29 +0xb6
github.com/jawher/mow%2ecli.(*Cmd).parse(0xc0001b8b00, 0xc00003c0b0, 0x3, 0x3, 0xc00101fef0, 0xc000e587b0, 0xc000e587e0, 0x203000, 0x7)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/commands.go:693 +0x545
github.com/jawher/mow%2ecli.(*Cmd).parse(0xc0001b8a00, 0xc00003c0a0, 0x4, 0x6, 0xc00101fef0, 0xc00101fef0, 0xc000e58660, 0x30, 0x15ad740)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/commands.go:707 +0x78e
github.com/jawher/mow%2ecli.(*Cli).parse(0xc000e0ad80, 0xc00003c080, 0x6, 0x6, 0xc00101fef0, 0xc00101fef0, 0xc000e58660, 0x16bc17d, 0x1f)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/cli.go:76 +0x85
github.com/jawher/mow%2ecli.(*Cli).Run(0xc000e0ad80, 0xc00003c070, 0x7, 0x7, 0x0, 0x0)
        /home/silas/code/go/pkg/mod/github.com/jawher/[email protected]/cli.go:105 +0x129
main.main()
        /home/silas/code/go/src/github.com/hyperledger/burrow/cmd/burrow/main.go:15 +0x77

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:sync Component: Fast Sync, State Sync T:bug-unconfirmed Type Bug (has been reported but not verified or triaged)
Projects
None yet
3 participants