
x/slashing: Use single bit array to store all non-signed validator blocks #5461

Closed
wants to merge 3 commits

Conversation

@dshulyak (Contributor) commented Jan 2, 2020

Instead of storing each index of the bit array as a separate key in the IAVL tree, this change stores a single bit array for each validator. This makes the IAVL tree smaller and greatly reduces the time spent in queries to LevelDB.

To profile this change I cherry-picked it onto the version currently used in Gaia and compared CPU profiles with and without it. You can find the Gaia branch with the cherry-picked change at https://github.com/dshulyak/gaia/tree/bitarray-v2.0.3. It comes with two changes:

  • cosmos-sdk cherry-pick: dshulyak@7b76694
  • tendermint change to disable app hash verification, needed due to incompatible changes with the current Gaia application state: dshulyak/tendermint@2bda814. This is just a hack that made it possible to get a profile from a running app; it won't be needed in the final version.

I am attaching the web view for cosmos-sdk.BeginBlock so that you can see the difference with and without these changes:
without, with

If someone needs the full profile I can attach it, but it is also possible to reproduce the same results using the gaiad branch I linked above.

changes

  • two new data types in x/slashing: a bit array and a vote array. The bit array is a simple implementation of the data structure that supports only set and get; the vote array provides a proxy to the bit array and handles a few other functions, e.g. it can change the size of the underlying bit array on request, or pretty-print its contents (see the sketch after this list)
  • changes to the keeper: methods to update/get/delete the bit array for a validator
  • in HandleValidatorSignature the bit array size can change, and based on the new size some indexes might get discarded
  • genesis export now includes only missed blocks
  • some changes in simulation due to the keeper changes
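
As a rough illustration of the first bullet, here is a minimal sketch of the bit array in Go; the names and method shapes are assumptions based on the description above, not the exact code from this PR:

package types

// BitArray is a plain byte slice interpreted as a sequence of bits.
// It supports only the two operations the module needs: Set and Get.
type BitArray []byte

// NewBitArray allocates enough bytes to hold n bits.
func NewBitArray(n int64) BitArray {
	return make(BitArray, (n+7)/8)
}

// Set writes bit i: 1 for a missed block, 0 for a signed one.
func (b BitArray) Set(i int64, missed bool) {
	if missed {
		b[i/8] |= 1 << uint(i%8)
	} else {
		b[i/8] &^= 1 << uint(i%8)
	}
}

// Get reports whether bit i is set, i.e. whether the block was missed.
func (b BitArray) Get(i int64) bool {
	return b[i/8]&(1<<uint(i%8)) != 0
}

The vote array then wraps such a slice, remembers its logical size, and layers resizing and pretty-printing on top.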

todo

  • update the spec once the changes are finalized
  • fetch the signed blocks window and min signed blocks window once per slashing.BeginBlock (will submit a separate PR for that)

For contributor use:

  • Targeted PR against correct branch (see CONTRIBUTING.md)
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the module structure standards.
  • Wrote unit and integration tests
  • Updated relevant documentation (docs/) or specification (x/<module>/spec/)
  • Added relevant godoc comments.
  • Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer

For admin use:

  • Added appropriate labels to PR (ex. WIP, R4R, docs, etc)
  • Reviewers assigned
  • Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

Signed-off-by: dmitry <[email protected]>
array := types.NewVoteArray(size)
addr := sdk.ConsAddress(Addrs[0])

misses := map[int64]struct{}{1: struct{}{}, 10: struct{}{}, 29: struct{}{}}
Contributor

File is not gofmt-ed with -s (from gofmt)

Suggested change
misses := map[int64]struct{}{1: struct{}{}, 10: struct{}{}, 29: struct{}{}}
misses := map[int64]struct{}{1: {}, 10: {}, 29: {}}

@codecov bot commented Jan 2, 2020

Codecov Report

Merging #5461 into master will increase coverage by 0.02%.
The diff coverage is 56.36%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5461      +/-   ##
==========================================
+ Coverage    54.5%   54.53%   +0.02%     
==========================================
  Files         318      320       +2     
  Lines       19131    19200      +69     
==========================================
+ Hits        10428    10471      +43     
- Misses       7917     7943      +26     
  Partials      786      786
Impacted Files Coverage Δ
x/slashing/genesis.go 0% <0%> (ø) ⬆️
x/slashing/internal/types/keys.go 0% <0%> (ø) ⬆️
x/slashing/internal/keeper/signing_info.go 77.19% <100%> (+9.62%) ⬆️
x/slashing/internal/types/votearray.go 33.33% <33.33%> (ø)
x/slashing/internal/keeper/infractions.go 82.35% <83.33%> (+0.68%) ⬆️
x/slashing/internal/types/bitarray.go 93.33% <93.33%> (ø)

@dshulyak (Contributor, Author) commented Jan 6, 2020

btw, this change can be reviewed; there is only a minor warning from the golang bot

@alexanderbez (Contributor)

Thanks @dshulyak -- we'll do our best to get to reviewing this shortly.

@alexanderbez (Contributor) left a comment

@dshulyak I want to apologize for the prolonged delay in reviewing this important PR; we're very appreciative of your time and willingness to contribute to the SDK, especially in such an integral piece of the framework.

Our resources have been fairly limited lately and we needed to focus on getting 0.38 out the door and working out the path forward with encoding, etc.

That being said, this PR is great! The benchmarks provided are inspiring too! Overall, I think the PR accomplishes what we need it to. I mainly left feedback around nomenclature and API design, but otherwise, it LGTM! I would like to get this merged ASAP as well because we're in the process of migrating all modules to protobuf.

It would also probably be of benefit if we had @cwgoes and/or @ValarDragon review as well.

@@ -9,7 +9,6 @@ import (
"github.com/cosmos/cosmos-sdk/x/slashing/internal/types"
Contributor

So this file only tracks liveness now, we should rename it to liveness.go I think.

@@ -31,20 +30,29 @@ func (k Keeper) HandleValidatorSignature(ctx sdk.Context, addr crypto.Address, p
index := signInfo.IndexOffset % k.SignedBlocksWindow(ctx)
Contributor

SignedBlocksWindow is called multiple times. Let's store it in a variable and use that instead 👍

Contributor Author

it would be even better to load SignedBlocksWindow and MinSignedBlocksWindow outside of the loop and reuse them for all validators. I had that change somewhere in an older branch and wanted to submit it once this is merged
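
A sketch of the suggested hoisting, in the style of the quoted diff (window is just a local; nothing else changes):

// Read the window parameter once per call instead of once per use.
window := k.SignedBlocksWindow(ctx)

index := signInfo.IndexOffset % window
signInfo.IndexOffset++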

@@ -31,20 +30,29 @@ func (k Keeper) HandleValidatorSignature(ctx sdk.Context, addr crypto.Address, p
index := signInfo.IndexOffset % k.SignedBlocksWindow(ctx)
signInfo.IndexOffset++

votearray := k.GetVoteArray(ctx, consAddr)
if votearray == nil {
Contributor

When would this be the case? If the validator existed at genesis, then we should obviously populate it there. If the validator was introduced into the voting set later on, then it should be created then via a hook. By the time we're in this method, the vote array should exist, no?

Contributor Author

You are right, I didn't notice that there was a hook.
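
For reference, a hedged sketch of what creating the array in the bonding hook could look like; SetVoteArray is an assumed name based on the keeper changes this PR describes, and the hook signature follows the staking hooks of that SDK era:

// AfterValidatorBonded ensures a vote array exists before the validator
// can ever reach HandleValidatorSignature.
func (k Keeper) AfterValidatorBonded(ctx sdk.Context, consAddr sdk.ConsAddress, _ sdk.ValAddress) {
	if k.GetVoteArray(ctx, consAddr) == nil {
		k.SetVoteArray(ctx, consAddr, types.NewVoteArray(k.SignedBlocksWindow(ctx)))
	}
}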

votearray := k.GetVoteArray(ctx, consAddr)
if votearray == nil {
votearray = types.NewVoteArray(k.SignedBlocksWindow(ctx))
} else if votearray.Size() != k.SignedBlocksWindow(ctx) {
Contributor

When would this ever be the case? Via a parameter change proposal? If so, can you document that here?

Contributor

I believe the previous slashing code did correctly handle param-change-proposals for signed-blocks-window, so that may be what this is for. Agreed on documentation.

Contributor Author

Yes, this is what I was thinking about.
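
The requested documentation could be a comment right on this branch; a sketch, with Resize as an assumed method name for the size change the PR description mentions:

votearray := k.GetVoteArray(ctx, consAddr)
if votearray == nil {
	votearray = types.NewVoteArray(k.SignedBlocksWindow(ctx))
} else if votearray.Size() != k.SignedBlocksWindow(ctx) {
	// SignedBlocksWindow can change via a param-change proposal, so the
	// stored array may no longer match the window. Rebuild it to the new
	// size; indexes beyond the new window are discarded.
	votearray.Resize(k.SignedBlocksWindow(ctx))
}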

// That way we avoid needing to read/write the whole array each time
previous := k.GetValidatorMissedBlockBitArray(ctx, consAddr, index)
vote := votearray.Get(index)
previous := vote.Missed()
Contributor

I always found this variable to be confusing whenever I read this code. I think this helps:

Suggested change
previous := vote.Missed()
missedPrev := vote.Missed()


"github.com/stretchr/testify/require"
)

Contributor

Can we have a test that creates an array, loops over n (doesn't matter the value) and sets/clears each index, instead of just set/clear a few single indexes?
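
Something along these lines, as a sketch; only Get/Missed appear in the diff, so the Miss/Sign setter names here are assumptions:

func TestVoteArraySetClearAll(t *testing.T) {
	const n = int64(100)
	array := types.NewVoteArray(n)

	// Set every index in the window, verifying each bit as we go.
	for i := int64(0); i < n; i++ {
		array.Get(i).Miss()
		require.True(t, array.Get(i).Missed(), "index %d should be set", i)
	}

	// Clear every index again and verify the bits come back to zero.
	for i := int64(0); i < n; i++ {
		array.Get(i).Sign()
		require.False(t, array.Get(i).Missed(), "index %d should be cleared", i)
	}
}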

"fmt"
)

// NewVoteArrayiFromBytes creates VoteArray from stored bytes.
Contributor

See comments earlier on BitArray

store := ctx.KVStore(k.storeKey)
bz := k.cdc.MustMarshalBinaryLengthPrefixed(missed)
store.Set(types.GetValidatorMissedBlockBitArrayKey(address, index), bz)
value := store.Get(types.GetValidatorMissedBlockBitArrayKey(address))
Contributor

The API nomenclature (keys, methods, etc.) is now muddled -- VoteArray and BitArray. Let's pick one and stick with it.

// NewVoteArrayiFromBytes creates VoteArray from stored bytes.
func NewVoteArrayFromBytes(value []byte) *VoteArray {
bitarray := BitArray(value)
return &VoteArray{
Contributor

Why do we return a reference?

v.bitarray = bitarray
}

// Vote can be used to check current state of the vote at a given index or update it.
Contributor

I don't see a reason to have a Vote type at all -- it's redundant IMHO.

You have a VoteArray which internally just wraps a BitArray; then on the VoteArray you return a Vote, which internally references the parent VoteArray, which in turn references the BitArray. I think this is a bit confusing and obfuscates the API.

Just have a single type -- VoteArray. On this VoteArray type, provide the necessary methods:

// ...
func (va VoteArray) Signed() bool
func (va VoteArray) Missed() bool
func (va VoteArray) Sign()

Also, now that I read this more, votearray may not be the best name for this type. Perhaps SignArray?
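
Reading this suggestion as compilable Go, with index parameters added since the array is addressed per block (a sketch of the flattened API on top of the earlier BitArray sketch, not a definitive design):

// SignArray wraps the raw bits directly; no intermediate Vote type.
type SignArray struct {
	bits BitArray
	size int64
}

func (sa *SignArray) Size() int64         { return sa.size }
func (sa *SignArray) Missed(i int64) bool { return sa.bits.Get(i) }
func (sa *SignArray) Signed(i int64) bool { return !sa.bits.Get(i) }
func (sa *SignArray) Miss(i int64)        { sa.bits.Set(i, true) }
func (sa *SignArray) Sign(i int64)        { sa.bits.Set(i, false) }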

@alexanderbez added the labels T: State Machine Breaking (State machine breaking changes, impacts consensus) and T: Performance (Performance improvements) on Feb 7, 2020
@cwgoes (Contributor) left a comment

Concept ACK.

Note that this will impose some upper limit on the length of the downtime slashing window, since we now load/store the entire bit array (which has size proportional to the length of the downtime slashing window) each time. In practice it seems like the concrete gains (plus the ability to cache) are well worth it for the current window, just something to keep in mind.

Generally I agree with @alexanderbez's comments, I don't have strong preferences on the datatypes.
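
To put rough numbers on that upper limit (the window size and validator count here are illustrative, not taken from the PR):

package main

import "fmt"

func main() {
	const (
		windowBits = 10000 // SignedBlocksWindow: one bit per block
		validators = 100
	)
	perValidator := windowBits / 8                  // 1250 bytes per validator
	perBlock := perValidator * validators           // 125,000 bytes read per block
	fmt.Println(perValidator, perBlock, 2*perBlock) // 1250 125000 250000 (read+write)
}

So the per-block I/O grows linearly with the window length, which is the cap being described.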


require.Nil(t, keeper.GetVoteArray(ctx, sdk.ConsAddress(Addrs[0])))
}

func TestValidatorMissedBlockBitArrrayPersisted(t *testing.T) {
Contributor

Suggested change
func TestValidatorMissedBlockBitArrrayPersisted(t *testing.T) {
func TestValidatorMissedBlockBitArrayPersisted(t *testing.T) {

@ValarDragon (Contributor) commented Feb 9, 2020

What's happening here is that we are currently facing high data storage overheads (200MB) due to the data overhead of hashes; see #2562 (comment). However, the I/O activity of this is actually quite decent: it just requires updating 100 Merkle paths. So merging the validators together into one struct ought to fix this.

This change, as I understand it, proposes what I originally thought was happening in the original post of the same issue, where we essentially read/write 400kb (800kb of total I/O activity per block). It then merges them all into one struct. This seems like it ought to be worse, and the reason it's performing concretely better is the merging, not the loading/writing of an entire bit array every block.

I would actually recommend against this PR if it loads the entire bit array every block. It's much better to have a single bit array with one state entry per block as before, and just merge all validators into the same struct.

The optimal thing AFAIK is what I propose in the original post of that issue. (#2562)

@alexanderbez (Contributor)

Good point @ValarDragon and @cwgoes -- having a relative cap on the window size due to the underlying implementation may not be worth it. Perhaps we should go back to the original implementation and update a single struct which we set at the end of the iteration as @ValarDragon suggests.

@dshulyak (Contributor, Author) commented Feb 10, 2020

I guess #2562 wasn't clear to me, and I decided that the suggestion was to partition the bit array for a single validator into multiple bit arrays, e.g. if you have a bit array of size 1kb you can partition it into two of 512b and locate any part of it efficiently based on the current block in the window.

Regarding storing a single vote array per block in the window: I tried that, hence Vote/VoteArray, so that a Vote can be passed separately to a function. There were several implementation issues, and I can't recall them all, but if you want to jail a validator and reset all of its votes then you have to load the bit arrays for the whole window and set the relevant bit to zero for that validator. When a new validator is created you have to ensure that all arrays have sufficient space. Also, if the position of the validator in the bit array is not deterministic, e.g. there is some randomized sortition to select the validator set per block, then the whole idea won't work.

So because of this complexity I decided to go for a bit array per validator, and to partition it into multiple bit arrays if unbounded capacity becomes an issue. Note that I don't think this will be an issue: 10000 votes fit into about one kb, and loading 1kb for 100 validators per block is reasonable and fast. Worst case 100kb (200kb), not counting all the inner nodes that will be loaded (~100b per inner node). Do you expect it to grow 100 times? If that's the case then I would partition the space into multiple bit arrays.

I see that all votes per block may seem optimal, but there will be almost no noticeable difference between total IO of 50kb and 800kb; the problem that needs to be solved is the amplification of reads/writes in the kv store caused by tree growth.

I don't have time right now to rework the solution based on the comments, so I will leave this change open for a while in case there are any other comments and then close it.

@dshulyak (Contributor, Author)

@ValarDragon

This seems like it ought to be worse, and the reason it's performing concretely better is the merging, not the loading/writing of an entire bit array every block.

Somewhat confusing; what ought to be worse? As you mention, a single update inserts 100 Merkle inner nodes, and a single inner node is, let's say, 100 bytes (2 hashes, 2 integers, key preimage). That's 100 writes of 100 bytes each; note that this can't be compared with a single 10kb write.

Maybe you are thinking about storing not an actual "bit array" but rather some sort of mapping for each block, e.g. map[address]bool, where the value is voted/missed. That makes some things easier, but will still require loading the arrays for the whole window in some cases.

On the surface the proposed change will create more leaves: the current change creates one leaf per validator, while the new approach will create a leaf per block in the window (1k vs 10k leaves, or 35k for mainnet). It probably won't cause many issues, but it is still not clear to me which approach will have better performance.

I re-read the proposal in #2562, and it does seem that you wanted to partition the bit array for a validator into multiple bit arrays, like:

I propose instead separating more of the bit array into separate leaves. I propose that in state, per validator, we split the bit array into separate bit arrays of 4096 bits.
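
A sketch of the indexing math that such partitioning implies (the helper name is mine; 4096 is the chunk size from the quoted proposal):

package types

// chunkBits is the fixed chunk size proposed in #2562.
const chunkBits = 4096

// chunkPos locates a window index in the chunked layout: which chunk to
// load from the store, and which bit inside that chunk to touch.
func chunkPos(index int64) (chunk, bit int64) {
	return index / chunkBits, index % chunkBits
}

An update then reads and writes only the 512-byte chunk containing the current index, instead of the whole window.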

@dshulyak closed this on Feb 11, 2020
Labels
C:x/slashing · T: Performance (Performance improvements) · T: State Machine Breaking (State machine breaking changes, impacts consensus)

6 participants