-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/slashing: Use single bit array to store all non-signed validator blocks #5461
Conversation
Signed-off-by: dmitry <[email protected]>
…edBlockWindow Signed-off-by: dmitry <[email protected]>
Signed-off-by: dmitry <[email protected]>
array := types.NewVoteArray(size) | ||
addr := sdk.ConsAddress(Addrs[0]) | ||
|
||
misses := map[int64]struct{}{1: struct{}{}, 10: struct{}{}, 29: struct{}{}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
File is not gofmt
-ed with -s
(from gofmt
)
misses := map[int64]struct{}{1: struct{}{}, 10: struct{}{}, 29: struct{}{}} | |
misses := map[int64]struct{}{1: {}, 10: {}, 29: {}} |
Codecov Report
@@ Coverage Diff @@
## master #5461 +/- ##
==========================================
+ Coverage 54.5% 54.53% +0.02%
==========================================
Files 318 320 +2
Lines 19131 19200 +69
==========================================
+ Hits 10428 10471 +43
- Misses 7917 7943 +26
Partials 786 786
|
btw this change can be reviewed, there is only a minor warning from golang bot |
Thanks @dshulyak -- we'll do our best to get to reviewing this shortly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dshulyak I want to apologize for the prolonged delay in reviewing this important PR, we're very appreciative of your time and willingness to contribute to the SDK, especially in such an integral piece of the framework.
Our resources have been fairly limited lately and we needed to focus on getting 0.38 out the door and working out the path forward with encoding, etc..
That being said, this PR is great! The benchmarks provided are inspiring too! Overall, I think the PR accomplishes what we need it to. I mainly left feedback around nomenclature and API design, but otherwise, it LGTM! I would like to get this merged ASAP as well because we're in the process of migrating all modules to protobuf.
It would also probably be of benefit if we had @cwgoes and/or @ValarDragon review as well.
@@ -9,7 +9,6 @@ import ( | |||
"github.com/cosmos/cosmos-sdk/x/slashing/internal/types" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this file only tracks liveness now, we should rename it to liveness.go
I think.
@@ -31,20 +30,29 @@ func (k Keeper) HandleValidatorSignature(ctx sdk.Context, addr crypto.Address, p | |||
index := signInfo.IndexOffset % k.SignedBlocksWindow(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SignedBlocksWindow
is called multiple times. Let's store it in a variable and use that instead 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be even better to load SignedBlockWindow and MinSignedBlocksWindow outside of the loop, and reuse for all validators. I had that change somewhere in older branch, and wanted to submit once this is merged
@@ -31,20 +30,29 @@ func (k Keeper) HandleValidatorSignature(ctx sdk.Context, addr crypto.Address, p | |||
index := signInfo.IndexOffset % k.SignedBlocksWindow(ctx) | |||
signInfo.IndexOffset++ | |||
|
|||
votearray := k.GetVoteArray(ctx, consAddr) | |||
if votearray == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this be the case? If the validator existed at genesis, then we should obviously populate it there. If the validator was introduced into the voting set later on, then it should be created then via a hook. By the time we're in this method, the vote array should exist, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right, i didn't notice that there was a hook
votearray := k.GetVoteArray(ctx, consAddr) | ||
if votearray == nil { | ||
votearray = types.NewVoteArray(k.SignedBlocksWindow(ctx)) | ||
} else if votearray.Size() != k.SignedBlocksWindow(ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would this ever be the case? Via a parameter change proposal? If so, can you document that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the previous slashing code did correctly handle param-change-proposals for signed-blocks-window, so that may be what this is for. Agreed on documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this is what i was thinking about
// That way we avoid needing to read/write the whole array each time | ||
previous := k.GetValidatorMissedBlockBitArray(ctx, consAddr, index) | ||
vote := votearray.Get(index) | ||
previous := vote.Missed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I always found this variable to be confusing whenever I read this code. I think this helps:
previous := vote.Missed() | |
missedPrev := vote.Missed() |
|
||
"github.com/stretchr/testify/require" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a test that creates an array, loops over n
(doesn't matter the value) and sets/clears each index, instead of just set/clear a few single indexes?
"fmt" | ||
) | ||
|
||
// NewVoteArrayiFromBytes creates VoteArray from stored bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments earlier on BitArray
store := ctx.KVStore(k.storeKey) | ||
bz := k.cdc.MustMarshalBinaryLengthPrefixed(missed) | ||
store.Set(types.GetValidatorMissedBlockBitArrayKey(address, index), bz) | ||
value := store.Get(types.GetValidatorMissedBlockBitArrayKey(address)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API nomenclature (keys, methods, etc...) is now mumbled -- VoteArray and BitArray. Let's pick one and stick with it.
// NewVoteArrayiFromBytes creates VoteArray from stored bytes. | ||
func NewVoteArrayFromBytes(value []byte) *VoteArray { | ||
bitarray := BitArray(value) | ||
return &VoteArray{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we return a reference?
v.bitarray = bitarray | ||
} | ||
|
||
// Vote can be used to check current state of the vote at a given index or update it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a reason to have a Vote
type at all -- it redundant IMHO.
You have a VoteArray
which internally just wraps a BitArray
, then on the VoteArray
you return a Vote
, which internally references the parent VoteArray
which then references the BitArray
. I think this is a bit confusing and obscuates the API.
Just have a single type -- VoteArray
. On this VoteArray
type, provide the necessary methods:
// ...
func (va VoteArray) Signed() bool
func (va VoteArray) Missed() bool
func (va VoteArray) Sign()
Also, now that I read this more, votearray may not be the best name for this type. Perhaps SignArray
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK.
Note that this will impose some upper limit on the length of the downtime slashing window, since we now load/store the entire bit array (which has size proportional to the length of the downtime slashing window) each time. In practice it seems like the concrete gains (plus the ability to cache) are well worth it for the current window, just something to keep in mind.
Generally I agree with @alexanderbez's comments, I don't have strong preferences on the datatypes.
votearray := k.GetVoteArray(ctx, consAddr) | ||
if votearray == nil { | ||
votearray = types.NewVoteArray(k.SignedBlocksWindow(ctx)) | ||
} else if votearray.Size() != k.SignedBlocksWindow(ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the previous slashing code did correctly handle param-change-proposals for signed-blocks-window, so that may be what this is for. Agreed on documentation.
require.Nil(t, keeper.GetVoteArray(ctx, sdk.ConsAddress(Addrs[0]))) | ||
} | ||
|
||
func TestValidatorMissedBlockBitArrrayPersisted(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func TestValidatorMissedBlockBitArrrayPersisted(t *testing.T) { | |
func TestValidatorMissedBlockBitArrayPersisted(t *testing.T) { |
Whats happening here is we are currently facing high data storage overheads (200MB) due to the data overhead of hashes. See #2562 (comment). However the I/O activity of this is actually quite decent, it just requires updating 100 merkle paths. So merging the validators together into one struct ought to fix this. This change, as I understand it, proposes what I originally thought was happening in the original post of the same issue, where we essentially read/write 400kb (800kb of total I/O activity per block). It then merges them all into one struct. This seems like it ought to be worse, and the reason its performing concretely better is the merging, not the loading/writing an entire bit-array every block. I would actually recommend against this PR if it loads the entire bitarray every block. Its much better to have a single bitarray with one state entry per block as before, and just merge all validators into the same struct. The optimal thing AFAIK is what I propose in the original post of that issue. (#2562) |
Good point @ValarDragon and @cwgoes -- having a relative cap on the window size due to the underlying implementation may not be worth it. Perhaps we should go back to the original implementation and update a single struct which we set at the end of the iteration as @ValarDragon suggests. |
I guess #2562 wasn't clear to me, and I decided that the suggestion was to partition bit array for a single validator into multiple bit arrays. e.g. if you have bit array of size 1kb you can partition it into two of 512b and locate any part of it efficiently based on the current block in the window. Regarding storing single vote array per block in the window, i tried that, hence Vote/VoteArray so that Vote can be passed separately to a function. There were several implementation issues, can't recall them all, but if you want to jail a validator and reset all validator votes then you will have to load bit arrays for whole window and set relevant bit to zero for the validator. When new validator is created you have to ensure that all arrays have sufficient space. Also, if position of the validator in the bit array is not deterministic, e.g. there is some randomized sortition to select validator set per block then whole idea won't work. So because of this complexity i decided to go for a bit array per validator, and partition into multiple bit arrays if unbounded capacity will be an issue. Note, that i don't think that this will be an issue, 10000 votes fit into one kb, loading 1kb for 100 validators per block is reasonable and fast. Worst case 100kb (200kb) without all inner nodes that will be loaded (e.g. ~100b per inner node). Do you expect it to grow 100 times? If thats the case then i would partition space into multiple bit arrays. I see that all votes per block may seem optimal, but there will be almost no noticeable difference between total IO of 50kb or 800kb, the problem that needs to be solved is amplification of reads/writes in kv store because of the tree growth. I don't have time right now to make a solution based on comments, so i will leave this change for a while in case there would be any other comments and then close. |
somewhat confusing, what ought to be worth? as you mention single update inserts 100 merkle inner nodes, single inner node is lets say 100 bytes (2 hashes, 2 integers, key preimage). Thats 100 writes with 100 bytes each, note that it can't be compared with single 10kb write. maybe you are thinking about storing not an actual "bit array" but rather some sort of a mapping for each block, e.g. map[address]bool, where value is voted/missed. makes some things easier, but still will require to load bit arrays for whole window in some cases. on the surface proposed change will create more leafs, current change creates one leaf per validator, and new approach will create leaf per block in the window (1k vs 10k leafs, or 35k for mainnet). probably won't cause much issues, but still not clear to me which approach will have better performance. i re-read the proposal in #2562, it does seem that you wanted to partition bit array for a validator into multiple bit arrays, like:
|
Instead of storing each index of the bit array as a separate key in iavl tree, this change stores single bit array for each validator. This makes iavl tree smaller, and greatly reduces time spent in a queries to leveldb.
To profile this change i cherry-picked it on current version used in gaia, and compared cpu profile with and without this change. You can find gaia branch with cherry-picked change at https://github.com/dshulyak/gaia/tree/bitarray-v2.0.3. It comes with two changes:
I am attachiing web view for cosmos-sdk.BeginBlock so that you can see difference with and without these changes:
without , with
If someone will need full profile i can attach it, but it is also possible to reproduce same results using gaiad branch i linked before.
changes
todo
For contributor use:
docs/
) or specification (x/<module>/spec/
)godoc
comments.Unreleased
section inCHANGELOG.md
Files changed
in the Github PR explorerFor admin use:
WIP
,R4R
,docs
, etc)