Note that KeyspacedDB is still used. KeyspacedDBMut is only used for tests.
Switching to a BTreeMap.
Allocation-wise, unless we break the whole API for a close-to-nothing perf change: switching to a simple, single child info struct.
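To make the two remarks above concrete, here is a minimal sketch of an overlay that uses a BTreeMap together with a single child-info struct. All names (`ChildInfo`, `ChildOverlay`) are illustrative, not the actual Substrate types:

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified version of the "single child info struct"
// idea: one struct carries everything needed to address a child trie.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub struct ChildInfo {
    pub storage_key: Vec<u8>,
}

// Overlay of pending child-trie changes. A BTreeMap (rather than a
// HashMap) gives deterministic iteration order, which matters when the
// changes are replayed or hashed.
#[derive(Default)]
pub struct ChildOverlay {
    pub changes: BTreeMap<ChildInfo, BTreeMap<Vec<u8>, Option<Vec<u8>>>>,
}

impl ChildOverlay {
    // `None` as value marks a pending deletion of the key.
    pub fn set(&mut self, child: ChildInfo, key: Vec<u8>, value: Option<Vec<u8>>) {
        self.changes.entry(child).or_default().insert(key, value);
    }
}
```

The BTreeMap also lets the whole entry for one `ChildInfo` be taken or dropped in a single operation, which is what a bulk child deletion in the overlay would amount to.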
I did start implementing the possibility to remove deleted nodes from the proof, as described in the previous comment: master...cheme:proof_backends. The code is incomplete (no handling of child trie prefix deletion) and needs many tests, but it is enough to formulate a few interesting observations. I ended up with three kinds of proof construction and two kinds of proof verification.
This should be the default choice unless we need to use the transactions (for performance reasons).
The implementation simply switches off proof node recording during prefixed key access and during child trie key access, on delete-prefix and delete-child-trie operations. Both variants need to be verified with a verifier that does not query child trie nodes on deletion, which is the same change in processing as the first variant.
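A minimal sketch of the "switch off the recorder" idea above. The real recorder lives in the trie crate; `PausableRecorder` and its methods are hypothetical names used only to illustrate pausing recording around a deletion:

```rust
// Sketch (hypothetical API): a proof recorder that can be paused while
// a child trie is being deleted, so the nodes touched by the deletion
// never enter the proof. The matching verifier must likewise skip
// querying child trie nodes on deletion.
#[derive(Default)]
pub struct PausableRecorder {
    pub recorded: Vec<Vec<u8>>,
    paused: bool,
}

impl PausableRecorder {
    pub fn record(&mut self, node: Vec<u8>) {
        if !self.paused {
            self.recorded.push(node);
        }
    }

    // Run `f` (e.g. a child trie deletion) without recording the
    // nodes it touches.
    pub fn while_paused<R>(&mut self, f: impl FnOnce(&mut Self) -> R) -> R {
        self.paused = true;
        let result = f(self);
        self.paused = false;
        result
    }
}
```

Note this is only an optimization, as pointed out later in the thread: omitting these nodes shrinks the proof but does not change what a correct verifier accepts.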
Interesting for testing. So my takeaway from this branch would be that it is doable to skip unneeded deleted content in proofs, but it relies on the assumption that the change trie root will never be read during block processing. In both scenarios (the linked branch and this PR's approach), the delete-by-prefix optimization would require additional tools: a radix trie map and a specific trie update operation, plus some reverse death-row logic on non-archive nodes (this would also be needed for bulk child trie deletion if we allowed using a child trie again after deletion).
It is frozen; I need feedback on whether we want to make child trie deletion (and by extension possibly prefix deletion) a first-class operation (currently we have insert-key-value and delete-key in changesets). I mean, @rphmeier did reply that we would not include child trie keys in proofs, so I started implementing that, but ended up with complicated redundant code (the state machine needs to behave differently depending on context), and I still think that registering a specific child trie deletion operation is the easier and cleaner thing to do. From a quick look at the changes, things that could be trimmed:
The only way I see to avoid child info in the client would be to create a new storage change that contains the bulk child deletion, with either the keyspace for the RocksDB delete or the fetched child trie node hashes for ParityDb (but that would mean we fetch the child trie content at the time of block processing, which makes things awkward with proofs, and makes us store a useless journal of keys when we only need the root).
I still think subtree deletion should not leak into the state-db abstraction. It should really be backend agnostic, and not rely on prefixes or backend-provided identifiers. ParityDb uses reference counting, so the same node could be shared by multiple tries, potentially saving a lot of space. This does not play well with prefixed keys. The most generic way to implement it is to walk the tree and delete each individual key, avoiding any special handling of this operation. I'd suggest we don't do any DB-optimized deletion until it is shown to be a problem in practice.

Regarding proofs: pausing recording during the deletion operation looks like the right thing to do. It is an optimization though. The validator would not access nodes when it validates deletion, so there's no need to provide them.

I'm not sure I understand the issue with the change trie. How does skipping the proof collection for deletion affect the change trie root? As far as I know, change tries are only implemented for the main tree.

The long-term plan for ParityDb is to evolve to support trie-related operations. That would involve not just deletion, but also reference-counted insertion and iteration at the DB level. Until then we'd better keep it simple.
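The backend-agnostic approach suggested above (walk the tree, delete each individual key) can be sketched as follows. `Backend` here is an illustrative stand-in, not the real Substrate state backend trait:

```rust
use std::collections::BTreeMap;

// Backend-agnostic subtree deletion: iterate the child trie's keys and
// delete them one by one, with no prefix tricks or backend-provided
// identifiers. `Backend` is a hypothetical stand-in trait.
pub trait Backend {
    fn child_keys(&self, child: &[u8]) -> Vec<Vec<u8>>;
    fn delete(&mut self, child: &[u8], key: &[u8]);
}

// Returns the number of deleted keys; O(n) in the child trie size.
pub fn delete_child_trie<B: Backend>(backend: &mut B, child: &[u8]) -> usize {
    let keys = backend.child_keys(child);
    for key in &keys {
        backend.delete(child, key);
    }
    keys.len()
}

// Toy in-memory backend for demonstration only.
#[derive(Default)]
pub struct MemBackend {
    pub data: BTreeMap<(Vec<u8>, Vec<u8>), Vec<u8>>,
}

impl Backend for MemBackend {
    fn child_keys(&self, child: &[u8]) -> Vec<Vec<u8>> {
        self.data
            .keys()
            .filter(|(c, _)| c.as_slice() == child)
            .map(|(_, k)| k.clone())
            .collect()
    }
    fn delete(&mut self, child: &[u8], key: &[u8]) {
        self.data.remove(&(child.to_vec(), key.to_vec()));
    }
}
```

The design point is that nothing here depends on how the backend stores the keys, so it works unchanged over RocksDB-style prefixed storage or ParityDb-style reference-counted storage.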
This is basically what we have now, right? You argue for keeping child trie deletion a
The PR does not use prefixed keys in this case: for RocksDB the drop-tree operation contains both the child trie root and the keyspace, and only in the RocksDB case do we try to apply a delete-by-prefix. ParityDb still iterates on the trie, but only when pruning is called.
Using trie iteration for RocksDB (the same way as ParityDb) seems doable; it would then only help with the previous point.
That is what I did in master...cheme:proof_backends. It means there are different modes of execution for the state machine: one that produces the transaction payload, and one that doesn't (IIRC I added a third one doing both, tx payload without proof recording; I believe that is what is needed for Cumulus).
The change trie root can be calculated from a storage host function, so if a parachain uses it in its runtime, it would need to include all deletions in its proof (running the proof adds all the child trie deletions to the change trie).
I replied too quickly to the previous point: change tries do also include child trie changes.
Sounds interesting, I will try to keep looking at parity-db branches and issues if there are any.
Note that in the parity-db case we still have an O(n) trie-parsing removal operation. Regarding the perf gains, the trade-off seems fine.
The issue I have with
It is O(n) best case anyway. The only difference is that with RocksDB you get the O(n) on commit, and not on the actual delete operation, unless you keep each subtree in a separate database. We can indeed optimize to perform it in the background at the database level, but I'd argue it is too early for that.
I think it does not make much sense for change tries to collect all removed keys for subtries. If it does, there's your O(n) right there, and there's little sense in optimizing it in the database, since you have to do the iteration anyway.
I see. Thanks for your answers. We then need to cope with the fact that eviction is a linear-time operation.
From my current understanding, if we cannot get around the fact that contract eviction is linear time, we should at least make it possible to perform the clean-up lazily, defer it when possible, or break it up into chunks of clean-up. Otherwise we end up having no way to clean up contract instances that are especially big. Might this even be an attack vector in certain cases? Maybe there are other operations that need to be deferred or chunked up into pieces distributed among several blocks, so that we could find a generalized solution to this problem?
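The chunked clean-up idea could look roughly like this: cap the number of keys deleted per block and report whether work remains. `delete_chunk` is a hypothetical helper, not the actual contracts pallet API:

```rust
// Sketch of chunked eviction: delete at most `limit` keys per block so
// a very large contract's storage can be cleaned up across several
// blocks instead of in one linear-time burst. Hypothetical helper.
pub fn delete_chunk(pending: &mut Vec<Vec<u8>>, limit: usize) -> bool {
    let n = pending.len().min(limit);
    // In a real implementation, each drained key would also be removed
    // from the state backend here.
    pending.drain(..n);
    // `true` means more chunks are needed in later blocks.
    !pending.is_empty()
}
```

The caller would invoke this once per block (e.g. from an on-initialize hook) until it returns `false`, bounding the per-block cost and removing the attack surface of a single huge eviction.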
Closing because of inactivity.
Using a specific child trie delete operation does inherently cover this point. Closing note: |
Also, for #6594, having a single child trie delete info sent back to the client makes the non-O(1) cost only happen at pruning (or never for archive nodes), and it could be easier to implement this way.
Note: this PR lacks a correct trie root calculation in state-machine. Code such as https://github.com/cheme/trie/blob/379bf92720e23b2465badd4739c7d2ba75115c06/trie-db/src/traverse.rs#L658 (old code, please don't use) mainly adds a detach-branch-from-trie action (it could be implemented in TrieDBMut too, but that is more involved).
This PR is built upon #4827.
It implements bulk child deletion: instead of deleting every element of a child trie one by one with `kill_child_storage`, the deletion is done as a whole. This means:
Since this PR depends on #4827 first, I put its status as in progress, but it should be usable for contract testing. cc @Robbepop @pepyakin.
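The changeset shape discussed in this thread (a first-class child trie deletion next to per-key insert/delete) could be sketched as below. The variant and field names are illustrative, not the actual Substrate changeset types:

```rust
// Sketch: a changeset with a dedicated bulk operation, so the client
// receives one record per deleted child trie instead of n per-key
// deletions, deferring the O(n) clean-up to pruning (or never, on
// archive nodes). All names are hypothetical.
pub enum StorageChange {
    Set { key: Vec<u8>, value: Vec<u8> },
    Remove { key: Vec<u8> },
    // Bulk deletion: carries the keyspace so the backend can clean up
    // lazily rather than at block-processing time.
    RemoveChildTrie { keyspace: Vec<u8> },
}

// Count per-key vs bulk operations in a changeset.
pub fn summarize(changes: &[StorageChange]) -> (usize, usize) {
    let mut per_key = 0;
    let mut bulk = 0;
    for change in changes {
        match change {
            StorageChange::Set { .. } | StorageChange::Remove { .. } => per_key += 1,
            StorageChange::RemoveChildTrie { .. } => bulk += 1,
        }
    }
    (per_key, bulk)
}
```

With this shape, killing a child trie contributes a single `RemoveChildTrie` record to the changeset regardless of how many keys the trie holds, which is the O(1)-at-block-processing property argued for above.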