Proposals to increase state modification performance #4310
Neat! Thanks for writing up this proposal @mossid! Here are some thoughts of mine:

flushkv

This seems like a neat idea, but aren't we now also incurring the storage overhead of the additional `hash` store? As to the drawbacks of the checkpoint idea: can we really guarantee this? What happens if we don't persist? How do we revert to the previous checkpoint? Sounds complicated.

batched values

This seems like a more efficient, yet more complicated, approach. My fear here is that it requires changes to how modules define and set their "segments". This will create a greater burden on SDK developers and users because they'd have to have a decent understanding of such a concept. Perhaps it's best if we can come up with a solution that abstracts everything away from the end developer.

persinterf

I think this is something we can actually do! But does this save us any time complexity on re-merklizing the tree? Also, since we possibly keep the cached items between blocks, do we have to worry about stale data, or are the key-value pairs replaced? Also, how large do we keep the cache, and how long do objects live? Is it a fixed-size LRU cache?

/cc @cwgoes
flushkv

Yes, there will be storage overhead for the `hash` store. I think I misexplained the checkpoint concept. Checkpointing is necessary on the flushkv store. If we don't checkpoint at all, every node has to re-run the whole blockchain whenever the node process is closed, no matter whether the rest of the state is stored on disk or not. If we checkpoint every 100 blocks and the node stops at height 1150, it only has to re-run the blocks from 1100.

State dumping is another functionality, separate from checkpointing. Checkpointing is expected to be done by every validator at some interval of blocks and is part of the state machine, but state dumping is literally dumping all state when a node stops. If the state dump fails, the node can still run from the latest checkpoint. The latest checkpoint is just an IAVL tree, only with a past version, so it can be easily loaded. The rootmultistore has to be updated to check the loadable versions of the trees before actually loading.

batched values

If we implement this "segment" successfully, the interface for end developers may not change drastically. From their perspective, the store that they get with the storekey is just a multistore with a large number of kvstores mounted, where each of them has a limited size for storing key-value pairs. I have also considered using flatbuffers, but it is too complicated for our purpose. However, I agree that this approach is a bit complicated and may cause client-side overhead. I don't think this is for general use cases, but rather a special feature for high-I/O application developers who have a prior understanding of the SDK.

persinterf

In this benchmark the data is defined as protobuf, and persinterf actually worked slower when not used with batch/flushkv. But we are using amino, so it would work in our case. This does not save re-merklizing time, only marshal/unmarshal/get. The benchmark assumed that the keys are a fixed-size set, but we can easily overlay an LRU cache on it, I guess. For now, it persists, forever growing.
Seems like flushkv is the best of the suggested approaches, perhaps with a few small alterations. Also, we should keep in mind how this will play with the upcoming state sync work.
Cross-linking to the meta issue re: performance concerns: #2131
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Closing this as it is covered in the store v2 work that will begin soon.
The current SDK state uses an IAVL tree, which is backed by goleveldb. The IAVL tree is a Merkle tree exposed through a key-value mapping interface, and goleveldb is an on-disk key-value database. There are two major bottlenecks in the current workflow:
Marshal/Unmarshal is the conversion between language-native objects and universal-format byte slices. Certain data has to be marshaled and unmarshaled in each transaction (more specifically, each deliverTx), even if it was marshaled and unmarshaled before within the same block.
Get/Set is database access. A byte slice can be written to disk and read back. In the IAVL implementation, a get can be done with binary search and does not need to modify existing nodes, but a set requires re-merklizing all parent nodes, which takes much more time. This cost grows logarithmically with the depth of the tree, so having more key-value pairs means more time to re-merklize.
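To make the cost concrete, here is a minimal sketch of the access pattern that pays both prices on every transaction. The `mapStore`, `Balance`, and `addToBalance` names are hypothetical, and `encoding/json` stands in for amino; a real module goes through a keeper and an IAVL-backed store instead.

```go
package main

import (
	"encoding/json" // stand-in for the amino encoding the SDK uses
	"fmt"
)

// mapStore is a toy key-value store standing in for an IAVL-backed KVStore.
type mapStore map[string][]byte

func (m mapStore) Get(key []byte) []byte { return m[string(key)] }
func (m mapStore) Set(key, value []byte) { m[string(key)] = value }

// Balance is a hypothetical value type kept under an account key.
type Balance struct {
	Amount int64 `json:"amount"`
}

// addToBalance pays the full cost on every call: unmarshal the stored bytes,
// mutate the object, marshal it back, and write it, even if the same key was
// already touched earlier in the same block. On a real IAVL store the Set
// additionally re-merklizes the path from the leaf to the root.
func addToBalance(store mapStore, key []byte, delta int64) {
	var bal Balance
	if bz := store.Get(key); bz != nil {
		_ = json.Unmarshal(bz, &bal) // unmarshal cost, paid per transaction
	}
	bal.Amount += delta
	bz, _ := json.Marshal(bal) // marshal cost, paid per transaction
	store.Set(key, bz)         // set cost: re-merklization in IAVL
}

func main() {
	store := mapStore{}
	key := []byte("acc/alice")
	addToBalance(store, key, 10) // two txs in the same block touching the
	addToBalance(store, key, 5)  // same key still pay the full cost twice
	fmt.Printf("%s\n", store.Get(key)) // {"amount":15}
}
```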
There are three methods that could solve these problems without modifying IAVL or the underlying database, working purely on the SDK side.
All of the methods expect a fixed-size set of key-value pairs which is updated very frequently, almost every block. Data that is updated in `EndBlocker` is a good example.

The benchmark assumes each value has a size of 200 bytes and the total number of key-value pairs is 160000, for a total size of 200*160000 = 32MB.
flushkv
`flushkv.Store` consists of two parent `CommitKVStore`s, called `main` and `hash`. The store also has `cache`, which is a cache-wrapped `main`. On a `KVStore` method call, it forwards the call to `cache` and `hash`.

When `Commit` is called, it first checks the current height; if it is a multiple of the predefined `flushRate`, it `Commit`s both `cache` and `hash`. If not, it only commits `hash`.

In summary, `flushkv.Store` only stores the hash of each value in order to reduce the setting time, and stores the full values only at the checkpoint heights. The full values are always kept in memory (by the cached `main`), so they can be queried and used by the app. The downside is that when the node process stops, the data stored in the cache is lost, so the node has to restart state machine execution from the last checkpoint. However, we can make the application dump the current cache to disk when it stops on an interrupt, or make the `flushRate` small enough.

The benchmark shows about a 30% time reduction.
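A rough sketch of how such a store could be wired, assuming a stripped-down `CommitKVStore` interface, a plain map as the cache wrap, and SHA-256 as the value hash; this illustrates the proposal, it is not the actual implementation.

```go
package flushkv

import "crypto/sha256"

// CommitKVStore is a stripped-down stand-in for the SDK interface of the same name.
type CommitKVStore interface {
	Get(key []byte) []byte
	Set(key, value []byte)
	Commit()
}

// Store keeps full values in an in-memory cache wrap of main and writes only
// the value hashes to hash every block; full values reach main only on
// checkpoint heights.
type Store struct {
	main, hash CommitKVStore
	cache      map[string][]byte // cache wrap of main, kept across blocks
	flushRate  int64
	height     int64
}

func NewStore(main, hash CommitKVStore, flushRate int64) *Store {
	return &Store{main: main, hash: hash, cache: map[string][]byte{}, flushRate: flushRate}
}

// Get reads from the cache first and falls back to the persistent main store.
func (s *Store) Get(key []byte) []byte {
	if v, ok := s.cache[string(key)]; ok {
		return v
	}
	return s.main.Get(key)
}

// Set forwards to both cache (full value) and hash (hash of the value).
func (s *Store) Set(key, value []byte) {
	s.cache[string(key)] = value
	h := sha256.Sum256(value)
	s.hash.Set(key, h[:])
}

// Commit always commits the hash store; the cached full values are flushed
// into main only when the height is a multiple of flushRate.
func (s *Store) Commit() {
	s.height++
	if s.height%s.flushRate == 0 {
		for k, v := range s.cache {
			s.main.Set([]byte(k), v)
		}
		s.main.Commit()
	}
	s.hash.Commit()
}
```

If the process dies between checkpoints, `main` still holds the state as of the last flush, which is why execution only has to be replayed from that height rather than from genesis.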
batched values
If the values are going to be updated frequently, then keeping them as a single struct will be more efficient, because setting one large key-value pair is faster than setting multiple small key-value pairs. However, it is hard to simply concatenate unrelated objects, as it would require the client to query the whole batch, which contains a lot of unnecessary information.
One way is to treat a `KVStore` as a `MultiStore`. Each leaf of the IAVL store represents a batch of key-value pairs, not a single pair. I haven't researched how we can effectively batch the data, as this is a proposal, not an implementation detail (yet). One of the ideas is forming a virtual memory segment on a byte slice. A segment consists of a table space and a memory space. The table is an array of pointers `{key, offset, size}`, each of which refers to the specific range of the memory space (`space[offset:offset+size]`) where the value associated with the key lives. Since the pointers have a fixed length, a user can simply binary search for the desired key, decode the offset and size, slice the space, and unmarshal only that slice. Since modules can specify which value will be stored under which segment (as it is retrieved through the same `MultiStore` interface), they can optimize by storing related data in the same segment.

The benchmark shows an 88% time reduction (not considering segment overhead), where each batch contains 160 pairs, approximately 32KB, with a total batch count of 1000.
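As an illustration of the segment idea, here is a sketch of one possible leaf encoding and its read path. The `Segment` type, the 8-byte key width, and the 4-byte offset/size fields are arbitrary choices for the sketch, not part of the proposal.

```go
package segment

import (
	"bytes"
	"encoding/binary"
	"sort"
)

const (
	keyWidth   = 8                // keys are assumed padded to a fixed width
	entryWidth = keyWidth + 4 + 4 // key + uint32 offset + uint32 size
)

// Segment is one IAVL leaf value holding many pairs.
// Layout: [uint32 count][count fixed-width {key, offset, size} entries][memory space].
type Segment []byte

// Get binary-searches the pointer table for key and returns the matching
// slice of the memory space, so only that value needs to be unmarshaled.
func (s Segment) Get(key []byte) ([]byte, bool) {
	count := int(binary.BigEndian.Uint32(s[:4]))
	table := s[4 : 4+count*entryWidth]
	mem := s[4+count*entryWidth:]

	// entries are sorted by key, so a binary search over fixed-width rows works
	i := sort.Search(count, func(i int) bool {
		return bytes.Compare(table[i*entryWidth:i*entryWidth+keyWidth], key) >= 0
	})
	if i == count {
		return nil, false
	}
	row := table[i*entryWidth : (i+1)*entryWidth]
	if !bytes.Equal(row[:keyWidth], key) {
		return nil, false
	}
	off := binary.BigEndian.Uint32(row[keyWidth : keyWidth+4])
	size := binary.BigEndian.Uint32(row[keyWidth+4 : keyWidth+8])
	return mem[off : off+size], true
}
```

The write path would rebuild the table and memory space for the batch; a module that groups related keys into the same segment then amortizes one IAVL set over all of them.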
persinterf
`persinterf.Store` is an in-memory persistent interface store. `type InterfaceStore` has a similar interface to `KVStore`, but with different methods for `Get` and `Set`, where `type Interface` has the methods `Marshal` and `Unmarshal`.

`persinterf.Store` persists the pairs in its field `cache map[string]types.Interface`; the cached object is copied to the provided pointer if the requested key exists in the cache. If not, it reads the value from the parent store and caches it.

On commit, unlike cache stores, it does not clear the cache but keeps holding it. This essentially works as an inter-block object cache. Except for that, the store works identically to the IAVL store, and does not alter the appHash when applied.
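A plausible shape for such a store, assuming an amino-style self-marshaling `Interface`; every name and signature below is illustrative rather than taken from an actual implementation.

```go
package persinterf

import "reflect"

// Interface is the constraint on stored values: amino-style self-encoding.
type Interface interface {
	Marshal() ([]byte, error)
	Unmarshal([]byte) error
}

// KVStore is a minimal stand-in for the parent byte-oriented store.
type KVStore interface {
	Get(key []byte) []byte
	Set(key, value []byte)
}

// Store keeps unmarshaled objects in a cache that is never cleared on
// commit, so repeated Gets skip both the parent read and the unmarshal step.
type Store struct {
	parent KVStore
	cache  map[string]Interface
}

func NewStore(parent KVStore) *Store {
	return &Store{parent: parent, cache: map[string]Interface{}}
}

// Get copies the cached object into ptr if present; otherwise it reads the
// bytes from the parent store, unmarshals them into ptr, and caches ptr.
// The copy is a shallow reflect copy and assumes ptr has the same concrete
// pointer type as the cached object.
func (s *Store) Get(key []byte, ptr Interface) error {
	if o, ok := s.cache[string(key)]; ok {
		reflect.ValueOf(ptr).Elem().Set(reflect.ValueOf(o).Elem())
		return nil
	}
	if bz := s.parent.Get(key); bz != nil {
		if err := ptr.Unmarshal(bz); err != nil {
			return err
		}
	}
	s.cache[string(key)] = ptr
	return nil
}

// Set updates the cache and writes the marshaled bytes to the parent, so the
// byte-level state (and hence the appHash) is the same as with a plain store.
func (s *Store) Set(key []byte, value Interface) error {
	s.cache[string(key)] = value
	bz, err := value.Marshal()
	if err != nil {
		return err
	}
	s.parent.Set(key, bz)
	return nil
}
```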
The benchmark shows a 40% time reduction.