Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] UnbondSupplier with 0 balance causes consensus failure #841

Closed
7 tasks
okdas opened this issue Sep 26, 2024 · 12 comments
Closed
7 tasks

[Bug] UnbondSupplier with 0 balance causes consensus failure #841

okdas opened this issue Sep 26, 2024 · 12 comments
Assignees
Labels
bug Something isn't working

Comments

@okdas
Copy link
Member

okdas commented Sep 26, 2024

Objective

There's an issue on main branch that can cause a chain halt.

Origin Document

10:29PM INF Stored block hash at height 70 EndBlock=SessionModuleEndBlock module=x/session
10:29PM INF found 0 expiring claims at block height 70 method=SettlePendingClaims module=x/tokenomics
10:29PM INF settled 0 and expired 0 claims at block height 70 method=SettlePendingClaims module=x/tokenomics
10:29PM INF settled 0 claims and expired 0 claims method=EndBlocker module=x/tokenomics
10:29PM INF successfully updated the relay mining difficulty for 0 services method=EndBlocker module=x/tokenomics
10:29PM ERR could not send 0upokt coins from module supplier to account pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj due to 0upokt: invalid coins method=UnbondSupplier module=x/supplier
10:29PM ERR CONSENSUS FAILURE!!! err="runtime error: invalid memory address or nil pointer dereference" module=consensus stack="goroutine 191 [running]:\nruntime/debug.Stack()\n\t/opt/homebrew/Cellar/go/1.23.1/libexec/src/runtime/debug/stack.go:26 +0x64\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:801 +0x4c\npanic({0x3ea9f40?, 0x7079230?})\n\t/opt/homebrew/Cellar/go/1.23.1/libexec/src/runtime/panic.go:785 +0xf0\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock.func1()\n\t/Users/dk/gopath/pkg/mod/github.com/cosmos/[email protected]/baseapp/abci.go:860 +0x124\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock(0x4000298d88, 0x4002d7f680)\n\t/Users/dk/gopath/pkg/mod/github.com/cosmos/[email protected]/baseapp/abci.go:892 +0x374\ngithub.com/cosmos/cosmos-sdk/server.cometABCIWrapper.FinalizeBlock({{0xffff72f58d10, 0x4000a5d308}}, {0x524a248, 0x713f6e0}, 0x4002d7f680)\n\t/Users/dk/gopath/pkg/mod/github.com/cosmos/[email protected]/server/cmt_abci.go:44 +0x54\ngithub.com/cometbft/cometbft/abci/client.(*localClient).FinalizeBlock(0x400327b3e0, {0x524a248, 0x713f6e0}, 0x4002d7f680)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/abci/client/local_client.go:185 +0xf8\ngithub.com/cometbft/cometbft/proxy.(*appConnConsensus).FinalizeBlock(0x4003231518, {0x524a248, 0x713f6e0}, 0x4002d7f680)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/proxy/app_conn.go:104 +0x1d0\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).applyBlock(_, {{{0xb, 0x0}, {0x4001cabd19, 0x7}}, {0x4001cabd50, 0x8}, 0x1, 0x45, {{0x400321ed40, ...}, ...}, ...}, ...)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/state/execution.go:224 +0x3c0\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).ApplyVerifiedBlock(_, {{{0xb, 0x0}, {0x4001cabd19, 0x7}}, {0x4001cabd50, 0x8}, 0x1, 0x45, {{0x400321ed40, ...}, ...}, ...}, ...)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/state/execution.go:202 +0xd8\ngithub.com/cometbft/cometbft/consensus.(*State).finalizeCommit(0x40013ca008, 0x46)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:1772 +0xd50\ngithub.com/cometbft/cometbft/consensus.(*State).tryFinalizeCommit(0x40013ca008, 0x46)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:1682 +0x2c0\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit.func1()\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:1617 +0xb8\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit(0x40013ca008, 0x46, 0x0)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:1655 +0xd90\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0x40013ca008, 0x40011e1520, {0x0, 0x0})\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:2335 +0x26c0\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0x40013ca008, 0x40011e1520, {0x0, 0x0})\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:2067 +0x50\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0x40013ca008, {{0x5213ae0, 0x4001e14030}, {0x0, 0x0}})\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:929 +0x5c0\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0x40013ca008, 0x0)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:856 +0x5fc\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 1\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:398 +0x1e4\n"
10:29PM INF service stop impl=baseWAL module=consensus msg="Stopping baseWAL service" wal=/root/.poktroll/data/cs.wal/wal
10:29PM INF service stop impl=Group module=consensus msg="Stopping Group service" wal=/root/.poktroll/data/cs.wal/wal

Steps to reproduce

  1. Start Localnet.
  2. Execute many requests: hey -n 10000 -c 2 -m POST -H "Content-Type: application/json" -d '{"method":"eth_blockNumber","params":[],"id":1,"jsonrpc":"2.0"}' http://localhost:42069/anvil.
  3. Look at the logs as supplier balance drains. When it hits 0 the exception will appear.

Goals

  • Address the edge case where supplier balance reaches zero
  • Check other places where similar logic needs to be verified (e.g., application balance)

Deliverables

  • Implement a fix for the edge case when supplier balance reaches zero
  • Review and update similar logic in other parts of the codebase if necessary
  • Add unit tests to cover the edge case and prevent regression

Non-goals / Non-deliverables

  • Refactoring unrelated parts of the codebase
  • Implementing new features unrelated to this specific issue

General deliverables

  • Comments: Add/update TODOs and comments alongside the source code so it is easier to follow.
  • Testing: Add new tests (unit and/or E2E) to the test suite.
  • Makefile: Add new targets to the Makefile to make the new functionality easier to use.
  • Documentation: Update architectural or development READMEs; use mermaid diagrams where appropriate.

Creator: @okdas
Co-Owners: TBD

@okdas okdas added the bug Something isn't working label Sep 26, 2024
@okdas okdas added this to the Shannon Beta TestNet Launch milestone Sep 26, 2024
@okdas okdas added this to Shannon Sep 26, 2024
@okdas okdas moved this to 🔖 Ready in Shannon Sep 26, 2024
@okdas
Copy link
Member Author

okdas commented Sep 27, 2024

Seems to be a similar but a different issue.


12:17AM INF Stored block hash at height 650 EndBlock=SessionModuleEndBlock module=x/session
12:17AM INF found 1 expiring claims at block height 650 method=SettlePendingClaims module=x/tokenomics
12:17AM INF claim does not require proof due to claimed amount (78750upokt) being less than the threshold (20000000upokt) and random sample (0.75) being greater than probability (0.25) method=proofRequirementForClaim module=server
12:17AM INF Claim by supplier pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj IS WITHIN LIMITS of servicing application pokt1mrqt5f7qh8uxs27cjm9t7v9e74a9vvdnq5jva4. Max claimable amount >= Claim amount: 6666601310upokt >= 78750 application=pokt1mrqt5f7qh8uxs27cjm9t7v9e74a9vvdnq5jva4 claim_settlement_upokt=78750 helper=ensureClaimAmountLimits method=ProcessTokenLogicModules module=x/tokenomics num_claim_compute_units=1875 num_relays=1875 service_id=anvil session_id=6b8e37440824da10f528cc54fb4766bfbcc760bf1f16433a05124a3689c18db0 supplier_operator=pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
12:17AM INF About to start processing TLMs for (1875) compute units, equal to (78750upokt) claimed actual_settlement_upokt=78750upokt application=pokt1mrqt5f7qh8uxs27cjm9t7v9e74a9vvdnq5jva4 claim_settlement_upokt=78750 method=ProcessTokenLogicModules module=x/tokenomics num_claim_compute_units=1875 num_relays=1875 service_id=anvil session_id=6b8e37440824da10f528cc54fb4766bfbcc760bf1f16433a05124a3689c18db0 supplier_operator=pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
12:17AM INF Starting TLM processing: "TLMRelayBurnEqualsMint" actual_settlement_upokt=78750upokt application=pokt1mrqt5f7qh8uxs27cjm9t7v9e74a9vvdnq5jva4 claim_settlement_upokt=78750 method=ProcessTokenLogicModules module=x/tokenomics num_claim_compute_units=1875 num_relays=1875 service_id=anvil session_id=6b8e37440824da10f528cc54fb4766bfbcc760bf1f16433a05124a3689c18db0 supplier_operator=pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
12:17AM INF sent 78750upokt from the supplier module to the supplier shareholder with address "pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj" method=distributeSupplierRewardsToShareHolders module=x/tokenomics
12:17AM INF distributed 78750 uPOKT to supplier "pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj" shareholders method=distributeSupplierRewardsToShareHolders module=x/tokenomics
12:17AM ERR error processing token logic modules for claim "6b8e37440824da10f528cc54fb4766bfbcc760bf1f16433a05124a3689c18db0": TLM "TLMRelayBurnEqualsMint": burning 78750upokt from the application module account: spendable balance 19662upokt is smaller than 78750upokt: insufficient funds [cosmos/[email protected]/x/bank/keeper/send.go:278]: failed to burn uPOKT from application module account [/Users/dk/gopath/pkg/mod/cosmossdk.io/[email protected]/errors.go:155]: failed to process TLM [/Users/dk/gopath/pkg/mod/cosmossdk.io/[email protected]/errors.go:155] module=server num_claim_compute_units=1875 num_relays_in_session_tree=1875 proof_requirement=NOT_REQUIRED session_id=6b8e37440824da10f528cc54fb4766bfbcc760bf1f16433a05124a3689c18db0 supplier_operator_address=pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
12:17AM ERR could not settle pending claims due to error TLM "TLMRelayBurnEqualsMint": burning 78750upokt from the application module account: spendable balance 19662upokt is smaller than 78750upokt: insufficient funds [cosmos/[email protected]/x/bank/keeper/send.go:278]: failed to burn uPOKT from application module account [/Users/dk/gopath/pkg/mod/cosmossdk.io/[email protected]/errors.go:155]: failed to process TLM [/Users/dk/gopath/pkg/mod/cosmossdk.io/[email protected]/errors.go:155] method=EndBlocker module=x/tokenomics
12:17AM INF Closing application.db module=server
12:17AM INF Closing snapshots/metadata.db module=server

@okdas
Copy link
Member Author

okdas commented Sep 27, 2024

Replicated to gather logs.

Note:

appgate.txt
relayminer.txt
validator.txt

@okdas
Copy link
Member Author

okdas commented Sep 27, 2024

Another similar but different issue:


7:47PM INF Stored block hash at height 590 EndBlock=SessionModuleEndBlock module=x/session
7:47PM INF found 1 expiring claims at block height 590 method=SettlePendingClaims module=x/tokenomics
7:47PM INF claim requires proof due to random sample (0.11) being less than or equal to probability (0.25) method=proofRequirementForClaim module=server
7:47PM INF claim expired due to PROOF_MISSING module=server num_claim_compute_units=1871 num_relays_in_session_tree=1871 proof_requirement=PROBABILISTIC session_id=25a2006d70a05959561149ae36edb13396115ddf20db0a11c2ac83e5aee2fd33 supplier_operator_address=pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
7:47PM INF deleted claim with primary key 25a2006d70a05959561149ae36edb13396115ddf20db0a11c2ac83e5aee2fd33/pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj for supplier pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj and session 25a2006d70a05959561149ae36edb13396115ddf20db0a11c2ac83e5aee2fd33 method=RemoveClaim module=x/proof
7:47PM ERR error slashing supplier pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj: spendable balance 1000068upokt is smaller than 320000000upokt: insufficient funds module=server num_claim_compute_units=1871 num_relays_in_session_tree=1871 proof_requirement=PROBABILISTIC session_id=25a2006d70a05959561149ae36edb13396115ddf20db0a11c2ac83e5aee2fd33 supplier_operator_address=pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj
7:47PM ERR could not settle pending claims due to error spendable balance 1000068upokt is smaller than 320000000upokt: insufficient funds [cosmos/[email protected]/x/bank/keeper/send.go:278] method=EndBlocker module=x/tokenomics
7:47PM ERR CONSENSUS FAILURE!!! err="runtime error: invalid memory address or nil pointer dereference" module=consensus stack="goroutine 191 [running]:\nruntime/debug.Stack()\n\t/opt/homebrew/Cellar/go/1.23.1/libexec/src/runtime/debug/stack.go:26 +0x64\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:801 +0x4c\npanic({0x3ea9f40?, 0x7079210?})\n\t/opt/homebrew/Cellar/go/1.23.1/libexec/src/runtime/panic.go:785 +0xf0\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock.func1()\n\t/Users/dk/gopath/pkg/mod/github.com/cosmos/[email protected]/baseapp/abci.go:860 +0x124\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock(0x40011b2b48, 0x400527b2c0)\n\t/Users/dk/gopath/pkg/mod/github.com/cosmos/[email protected]/baseapp/abci.go:892 +0x374\ngithub.com/cosmos/cosmos-sdk/server.cometABCIWrapper.FinalizeBlock({{0xffff5048a0f8, 0x4000bc0008}}, {0x524a288, 0x713f6c0}, 0x400527b2c0)\n\t/Users/dk/gopath/pkg/mod/github.com/cosmos/[email protected]/server/cmt_abci.go:44 +0x54\ngithub.com/cometbft/cometbft/abci/client.(*localClient).FinalizeBlock(0x4001f1cd80, {0x524a288, 0x713f6c0}, 0x400527b2c0)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/abci/client/local_client.go:185 +0xf8\ngithub.com/cometbft/cometbft/proxy.(*appConnConsensus).FinalizeBlock(0x4001382d80, {0x524a288, 0x713f6c0}, 0x400527b2c0)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/proxy/app_conn.go:104 +0x1d0\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).applyBlock(_, {{{0xb, 0x0}, {0x4002c060f9, 0x7}}, {0x4002c06b80, 0x8}, 0x1, 0x24d, {{0x4006be47c0, ...}, ...}, ...}, ...)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/state/execution.go:224 +0x3c0\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).ApplyVerifiedBlock(_, {{{0xb, 0x0}, {0x4002c060f9, 0x7}}, {0x4002c06b80, 0x8}, 0x1, 0x24d, {{0x4006be47c0, ...}, ...}, ...}, ...)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/state/execution.go:202 +0xd8\ngithub.com/cometbft/cometbft/consensus.(*State).finalizeCommit(0x40012dbc08, 0x24e)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:1772 +0xd50\ngithub.com/cometbft/cometbft/consensus.(*State).tryFinalizeCommit(0x40012dbc08, 0x24e)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:1682 +0x2c0\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit.func1()\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:1617 +0xb8\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit(0x40012dbc08, 0x24e, 0x0)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:1655 +0xd90\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0x40012dbc08, 0x4004f58270, {0x0, 0x0})\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:2335 +0x26c0\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0x40012dbc08, 0x4004f58270, {0x0, 0x0})\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:2067 +0x50\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0x40012dbc08, {{0x5213b20, 0x4005033df8}, {0x0, 0x0}})\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:929 +0x5c0\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0x40012dbc08, 0x0)\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:856 +0x5fc\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 1\n\t/Users/dk/gopath/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:398 +0x1e4\n"
7:47PM INF service stop impl=baseWAL module=consensus msg="Stopping baseWAL service" wal=/root/.poktroll/data/cs.wal/wal
7:47PM INF service stop impl=Group module=consensus msg="Stopping Group service" wal=/root/.poktroll/data/cs.wal/wal

Attaching relayminer logs - though nothing there seems relevant.

relayminer_3.txt

@bryanchriswhite
Copy link
Contributor

bryanchriswhite commented Oct 2, 2024

@okdas Regarding the origin document in the description: it seems to me that what's happening is the supplier module account has a 0 balance (as opposed to any individual supplier).

  1. It's not yet clear to me how the supplier module account balance is reaching zero unless something is wrong with minting or distribution. Perhaps the supplier's address is not included in the rev_share_percent (see our docs; note the warning)?
  2. It seems like the issue happens when a supplier tries to unbond while the supplier module account balance is 0. It's interesting that it seems like the stake amount that unbonding is trying to send back to the supplier is also 0. I'm not sure how a supplier could wind up being staked with 0 stake but that would be consistent with the lack of tokens in the supplier module account. 🤔

@bryanchriswhite
Copy link
Contributor

bryanchriswhite commented Oct 2, 2024

Perhaps the supplier's address is not included in the rev_share_percent (see our docs; note the warning)

This 👆 doesn't seem to be the case, given what's in config.yml. 🤔

Could it be that the supplier is being slashed to 0?

@okdas
Copy link
Member Author

okdas commented Oct 4, 2024

Could it be that the supplier is being slashed to 0?

Just replicated and checked the balances (apps, suppliers, modules) - they didn't seem to go to 0. It's possible they might have and the RPC only provides a presentation of the data from the previous (correct, working) block. 🤔

@okdas
Copy link
Member Author

okdas commented Oct 15, 2024

Just replicated this on main branch, the bug is still active:


{"level":"error","module":"server","session_id":"cdc88b54d7daec21b72d6eb64434a132b633f426478b3a9150e50d9b72d2ace3","supplier_operator_address":"pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj","num_claim_compute_units":894,"num_relays_in_session_tree":894,"num_estimated_compute_units":894,"claimed_upokt":"37548upokt","proof_requirement":"NOT_REQUIRED","time":"2024-10-15T01:24:06Z","message":"error processing token logic modules for claim \"cdc88b54d7daec21b72d6eb64434a132b633f426478b3a9150e50d9b72d2ace3\": TLM \"TLMRelayBurnEqualsMint\": burning 37548upokt from the application module account: spendable balance 28650upokt is smaller than 37548upokt: insufficient funds [cosmos/[email protected]/x/bank/keeper/send.go:278]: failed to burn uPOKT from application module account [/Users/dk/go/pkg/mod/cosmossdk.io/[email protected]/errors.go:155]: failed to process TLM [/Users/dk/go/pkg/mod/cosmossdk.io/[email protected]/errors.go:155]"}
{"level":"error","module":"server","module":"x/tokenomics","method":"EndBlocker","time":"2024-10-15T01:24:06Z","message":"could not settle pending claims due to error TLM \"TLMRelayBurnEqualsMint\": burning 37548upokt from the application module account: spendable balance 28650upokt is smaller than 37548upokt: insufficient funds [cosmos/[email protected]/x/bank/keeper/send.go:278]: failed to burn uPOKT from application module account [/Users/dk/go/pkg/mod/cosmossdk.io/[email protected]/errors.go:155]: failed to process TLM [/Users/dk/go/pkg/mod/cosmossdk.io/[email protected]/errors.go:155]"}
{"level":"error","module":"server","module":"consensus","err":"runtime error: invalid memory address or nil pointer dereference","stack":"goroutine 195 [running]:\nruntime/debug.Stack()\n\t/opt/homebrew/Cellar/go/1.23.2/libexec/src/runtime/debug/stack.go:26 +0x64\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:801 +0x4c\npanic({0x3f0b5a0?, 0x710a990?})\n\t/opt/homebrew/Cellar/go/1.23.2/libexec/src/runtime/panic.go:785 +0xf0\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock.func1()\n\t/Users/dk/go/pkg/mod/github.com/cosmos/[email protected]/baseapp/abci.go:860 +0x124\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock(0x4000384fc8, 0x400507ba40)\n\t/Users/dk/go/pkg/mod/github.com/cosmos/[email protected]/baseapp/abci.go:892 +0x374\ngithub.com/cosmos/cosmos-sdk/server.cometABCIWrapper.FinalizeBlock({{0xffff3a72c660, 0x4001c0b908}}, {0x52b6368, 0x71d17e0}, 0x400507ba40)\n\t/Users/dk/go/pkg/mod/github.com/cosmos/[email protected]/server/cmt_abci.go:44 +0x54\ngithub.com/cometbft/cometbft/abci/client.(*localClient).FinalizeBlock(0x40012898c0, {0x52b6368, 0x71d17e0}, 0x400507ba40)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/abci/client/local_client.go:185 +0xf8\ngithub.com/cometbft/cometbft/proxy.(*appConnConsensus).FinalizeBlock(0x4000d00018, {0x52b6368, 0x71d17e0}, 0x400507ba40)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/proxy/app_conn.go:104 +0x1d0\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).applyBlock(_, {{{0xb, 0x0}, {0x4001a73a08, 0x7}}, {0x4001a73a30, 0x8}, 0x1, 0xb3, {{0x40045e1640, ...}, ...}, ...}, ...)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/state/execution.go:224 +0x3c0\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).ApplyVerifiedBlock(_, {{{0xb, 0x0}, {0x4001a73a08, 0x7}}, {0x4001a73a30, 0x8}, 0x1, 0xb3, {{0x40045e1640, ...}, ...}, ...}, ...)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/state/execution.go:202 +0xd8\ngithub.com/cometbft/cometbft/consensus.(*State).finalizeCommit(0x4001b00008, 0xb4)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:1772 +0xd50\ngithub.com/cometbft/cometbft/consensus.(*State).tryFinalizeCommit(0x4001b00008, 0xb4)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:1682 +0x2c0\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit.func1()\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:1617 +0xb8\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit(0x4001b00008, 0xb4, 0x0)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:1655 +0xd90\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0x4001b00008, 0x4006d8d110, {0x0, 0x0})\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:2335 +0x26c0\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0x4001b00008, 0x4006d8d110, {0x0, 0x0})\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:2067 +0x50\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0x4001b00008, {{0x527f8c0, 0x40052ba518}, {0x0, 0x0}})\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:929 +0x5c0\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0x4001b00008, 0x0)\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:856 +0x5fc\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 1\n\t/Users/dk/go/pkg/mod/github.com/cometbft/[email protected]/consensus/state.go:398 +0x1e4\n","time":"2024-10-15T01:24:06Z","message":"CONSENSUS FAILURE!!!"}
{"level":"info","module":"server","module":"consensus","wal":"/root/.poktroll/data/cs.wal/wal","msg":"Stopping baseWAL service","impl":"baseWAL","time":"2024-10-15T01:24:06Z","message":"service stop"}

@okdas
Copy link
Member Author

okdas commented Oct 18, 2024

I was working on a service that consumes events and there were a couple of events I randomly caught before the node died:


EventDataNewBlock: Block 6fdf1ec07a57e68138457d91391d764f5a95e30dc8b0c06dd10a02081a2bad9d - Height: 110 
Attribute: Key=claim, Value={"supplier_operator_address":"pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj","session_header":{"application_address":"pokt1mrqt5f7qh8uxs27cjm9t7v9e74a9vvdnq5jva4","service_id":"anvil","session_id":"0f33ddca0526c75ad52a107cb9a2188f9312e123752896be75cd0d310f63a397","session_start_block_height":"91","session_end_block_height":"100"},"root_hash":"EjMu2HvB00rrqkZfcm069xE+wyxetUQs09C30yvK40cAAAAAAAAAYwAAAAAAAABj"}
Attribute: Key=claimed_upokt, Value={"denom":"upokt","amount":"4158"}
Attribute: Key=expiration_reason, Value="PROOF_MISSING"
Attribute: Key=num_claimed_compute_units, Value="99"
Attribute: Key=num_estimated_compute_units, Value="99"
Attribute: Key=num_relays, Value="99"
Attribute: Key=mode, Value="EndBlock"
^ poktroll.tokenomics.EventClaimExpired
Attribute: Key=num_expired_claims, Value="1"
Attribute: Key=slashing_amount, Value={"denom":"upokt","amount":"1000068"}
Attribute: Key=supplier_operator_addr, Value="pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj"
Attribute: Key=mode, Value="EndBlock"
^: poktroll.tokenomics.EventSupplierSlashed
EventDataNewBlock: Block fd552373a5131034c9767e5f5a8d2407e4ad53ceaa3d6473bad3c167464576fc - Height: 111 

119 was the last height before the validator had a nil pointer reference:


{"level":"info","module":"server","module":"state","height":119,"block_app_hash":"7CC5EE44F5C6CBB9E5FA4B51B61A9C610DB86614599C8CEA2BF6FA567A97A872","time":"2024-10-18T22:23:15Z","message":"committed state"}
{"level":"info","module":"server","module":"txindex","height":119,"time":"2024-10-18T22:23:15Z","message":"indexed block events"}
{"level":"info","module":"server","module":"consensus","dur":1979.909217,"height":120,"round":0,"step":"RoundStepNewHeight","time":"2024-10-18T22:23:17Z","message":"Timed out"}
{"level":"info","module":"server","module":"consensus","proposal":"Proposal{120/0 (09B71CA43583505BD48E198076DE9F7562674DDB56A6BB78FF38D4475352F896:1:6ECA0745CAFE, -1) B3135B062E2C @ 2024-10-18T22:23:17.175477048Z}","proposer":"CD585EC8B5BA4408462CB95B7B3937E11135D887","time":"2024-10-18T22:23:17Z","message":"received proposal"}
{"level":"info","module":"server","module":"consensus","height":120,"hash":"09B71CA43583505BD48E198076DE9F7562674DDB56A6BB78FF38D4475352F896","time":"2024-10-18T22:23:17Z","message":"received complete proposal block"}
{"level":"info","module":"server","module":"consensus","height":120,"hash":"09B71CA43583505BD48E198076DE9F7562674DDB56A6BB78FF38D4475352F896","root":"36B2306696BDE03FF0B117A62E14DB625F87F49F4CC69347D0EBBB9A3E2B2D69","num_txs":0,"time":"2024-10-18T22:23:17Z","message":"finalizing commit of block"}
{"level":"info","module":"server","module":"x/session","EndBlock":"SessionModuleEndBlock","time":"2024-10-18T22:23:17Z","message":"Stored block hash at height 120"}
{"level":"info","module":"server","module":"x/tokenomics","method":"SettlePendingClaims","time":"2024-10-18T22:23:17Z","message":"found 0 expiring claims at block height 120"}
{"level":"info","module":"server","module":"x/tokenomics","method":"SettlePendingClaims","time":"2024-10-18T22:23:17Z","message":"settled 0 and expired 0 claims at block height 120"}
{"level":"info","module":"server","module":"x/tokenomics","method":"EndBlocker","time":"2024-10-18T22:23:17Z","message":"settled 0 claims and expired 0 claims"}
{"level":"info","module":"server","module":"x/tokenomics","method":"EndBlocker","time":"2024-10-18T22:23:17Z","message":"successfully updated the relay mining difficulty for 0 services"}
{"level":"error","module":"server","module":"x/supplier","method":"UnbondSupplier","time":"2024-10-18T22:23:17Z","message":"could not send 0upokt coins from module supplier to account pokt19a3t4yunp0dlpfjrp7qwnzwlrzd5fzs2gjaaaj due to 0upokt: invalid coins"}

@Olshansk
Copy link
Member

@red-0ne Could you provide an update on whether this is fixed or not?

@red-0ne
Copy link
Contributor

red-0ne commented Oct 21, 2024

@red-0ne Could you provide an update on whether this is fixed or not?

This will be investigated this week. Was planning to have Path integrated first, but I think I'll do without it.

@Olshansk
Copy link
Member

@red-0ne Let's prioritize this as P0 because it's a MUCH smaller effort and is blocking @okdas + launch to Beta TestNet.

@red-0ne red-0ne moved this from 🔖 Ready to 🏗 In progress in Shannon Oct 22, 2024
red-0ne added a commit that referenced this issue Oct 23, 2024
## Summary

This PR fixes a few bugs that were dependent to each other:
1. Gracefully unbond suppliers that have 0upokt due to off-stake
slashing.
2. Fix access to an expired session tree on the realy miner
3. Fix proof block hash seed used on chain.

@okdas , One case I didn't manage to reproduce is having the application
module account go to `0`. Please tell me if you encounter it again.

## Issue

- #841 

## Type of change

Select one or more from the following:

- [ ] New feature, functionality or library
- [ ] Consensus breaking; add the `consensus-breaking` label if so. See
#791 for details
- [x] Bug fix
- [ ] Code health or cleanup
- [ ] Documentation
- [ ] Other (specify)

## Testing

- [ ] **Documentation**: `make docusaurus_start`; only needed if you
make doc changes
- [x] **Unit Tests**: `make go_develop_and_test`
- [x] **LocalNet E2E Tests**: `make test_e2e`
- [ ] **DevNet E2E Tests**: Add the `devnet-test-e2e` label to the PR.

## Sanity Checklist

- [x] I have tested my changes using the available tooling
- [x] I have commented my code
- [x] I have performed a self-review of my own code; both comments &
source code
@okdas
Copy link
Member Author

okdas commented Oct 29, 2024

This issue has been addressed - thank you @red-0ne

@okdas okdas closed this as completed Oct 29, 2024
@github-project-automation github-project-automation bot moved this from 🏗 In progress to ✅ Done in Shannon Oct 29, 2024
okdas pushed a commit that referenced this issue Nov 14, 2024
## Summary

This PR fixes a few bugs that were dependent to each other:
1. Gracefully unbond suppliers that have 0upokt due to off-stake
slashing.
2. Fix access to an expired session tree on the realy miner
3. Fix proof block hash seed used on chain.

@okdas , One case I didn't manage to reproduce is having the application
module account go to `0`. Please tell me if you encounter it again.

## Issue

- #841 

## Type of change

Select one or more from the following:

- [ ] New feature, functionality or library
- [ ] Consensus breaking; add the `consensus-breaking` label if so. See
#791 for details
- [x] Bug fix
- [ ] Code health or cleanup
- [ ] Documentation
- [ ] Other (specify)

## Testing

- [ ] **Documentation**: `make docusaurus_start`; only needed if you
make doc changes
- [x] **Unit Tests**: `make go_develop_and_test`
- [x] **LocalNet E2E Tests**: `make test_e2e`
- [ ] **DevNet E2E Tests**: Add the `devnet-test-e2e` label to the PR.

## Sanity Checklist

- [x] I have tested my changes using the available tooling
- [x] I have commented my code
- [x] I have performed a self-review of my own code; both comments &
source code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Status: ✅ Done
Development

No branches or pull requests

4 participants