Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Manage child trie content independently #4827

Closed
wants to merge 135 commits into from

Conversation

cheme
Copy link
Contributor

@cheme cheme commented Feb 4, 2020

This PR is a refactoring of child trie transaction.

Before this PR as single transaction of encoded trie node was produced by state-machine, packing the node from the top trie and the child trie without distinction.
As a result, prefixing rocksdb key by a unique id for child trie was done in state machine crate.

This PR move those key prefixing to client/db/src/lib.rs and split all transaction content between child trie.

We move from a single hashmap of key value to a btreemap of hashmaps of key values, but there will be a less data redundancy.

Main changes:

  • all code prepending unique id for child trie is now in client/db/src/lib.rs, there is no longer KeySpacedDB plugged in an artificial way in state machine.
  • child info has been rework by removing 'OwnedChildInfo', at some point I did implement a borrow implementation (this commit removing it is best to check impl : cheme@0d45d85 ) but it was probably bad design.
  • top trie can be seen as a child trie with empty unique id. To keep thing sane this change do not touch directly state-machine where we keep a top_storage and dedicated/duplicated api: that way things do not change much except that child api can now redirect to parent api when it is safe (currently no runtime can call a empty unique id so this is only changing rpc behavior in a way that does not seem problematic). (see ext.rs).
  • A hasher trait with constant is added, this should be moved to hashdb crate (then it will cover all memorydb instance, not only this specific state machine case), but seems fine here for the time (branch with hashdb possible change : https://github.com/cheme/trie/tree/const_empty ).
- change of format for journal of block change and pruning cc @arkpar : - the new format just use a different prefix, so if any change happen we would just change this prefix again. Alternate design would be to include a versioning information in the encoded journal. - the old format is still in use, this costs a double query to fetch old journal. I do not think this is very problematic in the journal case, an alternate way of doing thing would be to migrate all existing journals. I am not sure if it is useful at this point (would need rencode/rewrite of everything to avoid a double query), maybe it is fast, though it should need some testing could be done later.
  • change trie is not using this split: the code do not need key isolation as it is prefixed to avoid any key collision conflict. So in this part of code everything run in top trie.
  • proof is still running without child trie handle, everything is in a top trie, if at some point we got different child trie type the split will be needed. It should be possible to split proof, this would be useful for proof compaction (at this point compacting proof of a call will require to isolate each trie payload by key prefix, that is far from ideal). I think this should switch, but it will break light client proof and cumulus witness, so I keep this question out of the scope of this PR. CC @bkchr @jimpo

Minor changes

  • full storage root can return child_infos for technical reason.

Even if this PR mainly touches internals, it requires changes to polkadot (cheme/polkadot-1@b94eb46 , pr will be created later).

cheme added 30 commits January 28, 2020 09:45
Note that KeyspacedDB are still use.
KeyspacedDBMut only for test.
allocation whise unless we break all api for a close to nothing perf
change: switching to simple single child info struct.
@arkpar
Copy link
Member

arkpar commented Apr 24, 2020

In #5769 I've made prefixing optional for the database that does not require them.

PrefixedMemoryDB that currently acts as the trie transaction does indeed mix keys and prefixes right away. A better implementation would be to keep them separated and let the client-db decide whether to use prefixes or not. I would not bother changing that though, we'll probably drop support for rocksdb at some point, or implement reference counting at the lower level. The long term plan is to remove rocksdb and prefixing all together.

@cheme
Copy link
Contributor Author

cheme commented Apr 25, 2020

In #5769 I've made prefixing optional for the database that does not require them.

So the way it works in #5769 is that you switch at client level the storage accessed by the state machine when querying.

And you also need to probably use a MemoryDB overlay in state machine rather than a PrefixedMemory one (actually you drop of additional content from key in client, which also work but is a bit more costy but code is way more readable than having an additional types everywhere). Ok just realize that it was in your comment :)

My first version of this PR (the one where child info was put in the journals), acts rather similarly by adding the child trie prefix at the place where #5769 is removing it (but that is can not achieve what #5769 does because the trie prefixes are already here).

So the question at this point is 'is this PR of any use' (I am talking from the perspective of the first version where child info was written in journals).
The goods

  • separation of child trie contents: this is not really necessary at this point but if at any time we want a child trie with different hash or a child trie storage that does not work as a merkle trie, or even using different db column depending on child trie (different hash length on a db that do not support that, or for performance purpose), this can prove usefull. The point is that it is a bit of refactoring.

The bads

  • btreemap everywhere, new journal storage format.

TLDR; this PR split all data payload in order to allows the db to use child trie definition, and more generally allow db specific process for child trie as in #5280.

So even if we decide to drop #5280, this could still be interesting, (if dropped, I do not know if it will be easy to bring back to life).
There is also some code improvement that can be extracted (but I see some similar thing in #5769 regarding ephemeral).

It could make sense to drop those two pr (just wish we did this when they were drafted and got no real alternative for the rent use case).

@arkpar
Copy link
Member

arkpar commented Apr 27, 2020

separation of child trie contents: this is not really necessary at this point but if at any time we want a child trie with different hash or a child trie storage that does not work as a merkle trie, or even using different db column depending on child trie (different hash length on a db that do not support that, or for performance purpose), this can prove usefull. The point is that it is a bit of refactoring.

I would not complicate things now because of this potential future implementation. If at some point separate storage for child trees would be required, some changes to the backend would still be needed. E.g. rocksdb does not really support dynamic columns, and neither does parity-db at this point. Also, it might make sense to reuse the existing code, rather than extend it. I.e have a separate instance of StateDb for each chile tree.

@kianenigma kianenigma removed their request for review April 27, 2020 11:45
@cheme
Copy link
Contributor Author

cheme commented Apr 27, 2020

have a separate instance of StateDb for each chilie tree.

Yes, that is a different way to do it, then state-machine is the only abstraction that cannot in my opinion be split (need to share info between child trie).
The glue between top trie and child trie is mainly full_storage_root, and it is in Backend (for proof we consume backend).
But if we consider moving it upward in externality and create a new ProofProcessing ext (currently proof are run over TrieStorageBackend Backend impl).
We would have one Ext containing multiple overlay change and multiple backend with possibly Statedbs as a backends.
That seems like a bigger step from existing code (the code internally was managing child trie content into overlaydb and got a single backend using multiple child trie as needed).
Statedb being directly use as a backend then we could have one statedb by child trie.
It looks like a lot more code change, but indeed better architecture, I remember trying to go in this direction and being blocked (probably on the proof code, but I feel like it is doable).

In #4938 I did rollback these pr change and there was a single place where I needed to keep using child_info as storage input and it was TrieBackendStorage. This is because it is the backend for proof production and I needed to split proof payload for compaction. Doing with multiple backend and multiple proof recorder seems possible (in #4938 I kept the possibility to use a flat proof recorder for perf but it does not make much sense, but generally #4938 is allowing too many configuration but at this point it is mainly for testing the proof compaction).

Edit: maybe not a 'lot' more change, but a lot more state-machine change and no client change (not sure how big is the current client changes).

@gnunicorn gnunicorn marked this pull request as draft April 28, 2020 16:36
@gnunicorn
Copy link
Contributor

converted to draft as this is still in progress.

@cheme
Copy link
Contributor Author

cheme commented Jun 11, 2020

Closing this as it all depends on replies about question in #5280 (comment) and there is no use in keeping two pr frozen behind this, will reopen if 5280 approach get some kind of positive feedback and we proceed this way.

@cheme cheme closed this Jun 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A3-in_progress Pull request is in progress. No review needed at this stage.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants