Manage child trie content independently #4827

cheme · 2020-02-04T19:03:53Z

This PR refactors how child trie transactions are handled.

Before this PR, the state machine produced a single transaction of encoded trie nodes, packing nodes from the top trie and the child tries without distinction.
As a result, prefixing rocksdb keys with a unique id for each child trie was done in the state-machine crate.

This PR moves that key prefixing to client/db/src/lib.rs and splits the transaction content by child trie.

We move from a single hashmap of key values to a btreemap of hashmaps of key values, but there will be less data redundancy.
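As a rough illustration of the layout change described above, here is a minimal sketch of a transaction split by child trie, with key prefixing applied only when the content is flattened for the database. The names `UniqueId`, `NodeChanges`, `SplitTransaction`, `record`, `db_key` and `flatten_for_db` are hypothetical and are not the types used in this PR; representing the top trie by an empty unique id is also an assumption of the sketch.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical child trie identifier; the empty id stands for the top trie here.
type UniqueId = Vec<u8>;
/// Encoded trie nodes keyed by node hash; `None` marks a deletion.
type NodeChanges = HashMap<Vec<u8>, Option<Vec<u8>>>;
/// One map of node changes per trie, keys stored without any prefix.
type SplitTransaction = BTreeMap<UniqueId, NodeChanges>;

/// Record a node change for the trie identified by `unique_id`.
fn record(tx: &mut SplitTransaction, unique_id: &[u8], key: &[u8], value: Option<&[u8]>) {
    tx.entry(unique_id.to_vec())
        .or_default()
        .insert(key.to_vec(), value.map(|v| v.to_vec()));
}

/// Build the database key: unique id followed by the node key.
/// With an empty unique id the key is left unchanged.
fn db_key(unique_id: &[u8], key: &[u8]) -> Vec<u8> {
    [unique_id, key].concat()
}

/// Flatten the split transaction into prefixed database operations.
/// In this sketch the prefixing happens here, on the database side,
/// and not where the transaction is produced.
fn flatten_for_db(tx: &SplitTransaction) -> Vec<(Vec<u8>, Option<Vec<u8>>)> {
    let mut ops = Vec::new();
    for (unique_id, changes) in tx {
        for (key, value) in changes {
            ops.push((db_key(unique_id, key), value.clone()));
        }
    }
    ops
}
```

The unique id is stored once per trie as the outer map key, not once per node key, which is one way to read the "less data redundancy" remark.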

Main changes:

- all code prepending the unique id for child tries is now in client/db/src/lib.rs; there is no longer a KeySpacedDB plugged into the state machine in an artificial way.
- child info has been reworked by removing 'OwnedChildInfo'. At some point I implemented a borrow-based version (the commit removing it is the best place to check that implementation: cheme@0d45d85), but it was probably bad design.
- the top trie can be seen as a child trie with an empty unique id. To keep things sane this change does not directly touch state-machine, where we keep a top_storage and a dedicated/duplicated api: that way things do not change much, except that the child api can now redirect to the parent api when it is safe (currently no runtime can call with an empty unique id, so this only changes rpc behavior, in a way that does not seem problematic). See ext.rs.
- a hasher trait with a constant is added. This should be moved to the hashdb crate (it would then cover all memorydb instances, not only this specific state machine case), but it seems fine here for the time being (branch with a possible hashdb change: https://github.com/cheme/trie/tree/const_empty ).

- change of format for the journal of block changes and pruning, cc @arkpar:
  - the new format just uses a different prefix, so if any change happens we would just change this prefix again. An alternate design would be to include versioning information in the encoded journal.
  - the old format is still in use, which costs a double query to fetch an old journal. I do not think this is very problematic in the journal case. An alternate way of doing things would be to migrate all existing journals; I am not sure it is useful at this point (it would need a re-encode/rewrite of everything to avoid the double query). Maybe it is fast, but that would need some testing, which could be done later.

- change trie is not using this split: the code does not need key isolation, as keys are prefixed to avoid any collision. So in this part of the code everything runs in the top trie.
- proof is still running without child trie handling: everything is in a top trie. If at some point we get different child trie types, the split will be needed. It should be possible to split proofs, which would be useful for proof compaction (at this point compacting the proof of a call requires isolating each trie payload by key prefix, which is far from ideal). I think this should be switched, but it will break light client proofs and cumulus witnesses, so I keep this question out of the scope of this PR. CC @bkchr @jimpo
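The journal lookup mentioned in the list above (a new key prefix, with the old format still readable at the cost of a second query) could look roughly like the following sketch. The prefix values, the key layout and the names `journal_key`, `load_journal` and `JournalRecord` are assumptions made for illustration; the actual values in state-db are not shown in this description.

```rust
use std::collections::HashMap;

// Hypothetical prefixes: the real ones used by state-db are not given here.
const JOURNAL_PREFIX_OLD: &[u8] = b"journal_old";
const JOURNAL_PREFIX_NEW: &[u8] = b"journal_new";

/// Which format a journal record was found in; the payload stays encoded.
#[derive(Debug, PartialEq)]
enum JournalRecord {
    Old(Vec<u8>),
    New(Vec<u8>),
}

/// Journal key: prefix followed by the big-endian block number (assumed layout).
fn journal_key(prefix: &[u8], block: u64) -> Vec<u8> {
    let mut key = prefix.to_vec();
    key.extend_from_slice(&block.to_be_bytes());
    key
}

/// Try the new prefix first, then fall back to the old one.
/// The fallback is the "double query" cost paid for journals written before the change.
fn load_journal(db: &HashMap<Vec<u8>, Vec<u8>>, block: u64) -> Option<JournalRecord> {
    if let Some(raw) = db.get(&journal_key(JOURNAL_PREFIX_NEW, block)) {
        return Some(JournalRecord::New(raw.clone()));
    }
    db.get(&journal_key(JOURNAL_PREFIX_OLD, block))
        .map(|raw| JournalRecord::Old(raw.clone()))
}
```

Because the format is identified by the prefix alone, a later format change would mean adding another prefix and another fallback step, which is the trade-off against storing a version inside the encoded journal.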

Minor changes

full storage root can return child_infos for technical reasons.

Even if this PR mainly touches internals, it requires changes to polkadot (cheme/polkadot-1@b94eb46 , pr will be created later).


arkpar · 2020-04-24T18:21:28Z

In #5769 I've made prefixing optional for databases that do not require it.

PrefixedMemoryDB, which currently acts as the trie transaction, does indeed mix keys and prefixes right away. A better implementation would be to keep them separated and let the client-db decide whether to use prefixes or not. I would not bother changing that though: we'll probably drop support for rocksdb at some point, or implement reference counting at the lower level. The long term plan is to remove rocksdb and prefixing altogether.
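To make the "keep them separated" idea concrete, here is a small sketch in which a node change carries its prefix and hash as separate fields and the database layer picks the key form. `NodeChange`, `KeyMode`, `db_key` and `to_db_op` are hypothetical names for illustration only; this is not the PrefixedMemoryDB API nor the code from #5769.

```rust
/// A trie node change that keeps the prefix apart from the node hash
/// (hypothetical type, not part of memory-db).
struct NodeChange {
    prefix: Vec<u8>,
    hash: Vec<u8>,
    value: Option<Vec<u8>>,
}

/// How the database layer wants its keys built.
#[derive(Clone, Copy)]
enum KeyMode {
    /// Key is prefix followed by hash.
    Prefixed,
    /// Key is the hash alone.
    HashOnly,
}

/// Build the database key for a change according to the chosen mode.
fn db_key(change: &NodeChange, mode: KeyMode) -> Vec<u8> {
    match mode {
        KeyMode::Prefixed => [change.prefix.as_slice(), change.hash.as_slice()].concat(),
        KeyMode::HashOnly => change.hash.clone(),
    }
}

/// Turn a change into a (key, value) database operation; `None` means delete.
fn to_db_op(change: &NodeChange, mode: KeyMode) -> (Vec<u8>, Option<Vec<u8>>) {
    (db_key(change, mode), change.value.clone())
}
```

With this shape the decision is made once, where the transaction is committed, and nothing has to be stripped from already concatenated keys.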

cheme · 2020-04-25T08:33:46Z

> In #5769 I've made prefixing optional for databases that do not require it.

So the way it works in #5769 is that you switch, at client level, the storage accessed by the state machine when querying.

And you also probably need to use a MemoryDB overlay in the state machine rather than a PrefixedMemoryDB one (actually you drop the additional content from the key in the client, which also works; it is a bit more costly, but the code is way more readable than having additional types everywhere). Ok, I just realized that this was in your comment :)

My first version of this PR (the one where child info was put in the journals) acts rather similarly, by adding the child trie prefix at the place where #5769 removes it (but it cannot achieve what #5769 does, because the trie prefixes are already there).

So the question at this point is 'is this PR of any use' (I am talking from the perspective of the first version where child info was written in journals).
The goods

separation of child trie contents: this is not really necessary at this point, but if at any time we want a child trie with a different hash, or a child trie storage that does not work as a merkle trie, or even a different db column depending on the child trie (different hash length on a db that does not support that, or for performance purposes), this can prove useful. The point is that it is a bit of refactoring.

The bads

btreemap everywhere, new journal storage format.

TLDR; this PR splits all data payloads in order to allow the db to use the child trie definition, and more generally to allow db-specific processing for child tries as in #5280.

So even if we decide to drop #5280, this could still be interesting (if dropped, I do not know if it will be easy to bring back to life).
There are also some code improvements that can be extracted (but I see some similar things in #5769 regarding ephemeral).

It could make sense to drop those two PRs (I just wish we had done this when they were drafted and had no real alternative for the rent use case).

arkpar · 2020-04-27T10:20:31Z

> separation of child trie contents: this is not really necessary at this point, but if at any time we want a child trie with a different hash, or a child trie storage that does not work as a merkle trie, or even a different db column depending on the child trie (different hash length on a db that does not support that, or for performance purposes), this can prove useful. The point is that it is a bit of refactoring.

I would not complicate things now because of this potential future implementation. If at some point separate storage for child trees is required, some changes to the backend would still be needed. E.g. rocksdb does not really support dynamic columns, and neither does parity-db at this point. Also, it might make sense to reuse the existing code rather than extend it, i.e. have a separate instance of StateDb for each child tree.
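A very reduced sketch of the "one StateDb instance per child tree" idea follows. The `StateDb` struct below is a toy stand-in (a plain in-memory map), not the real sc-state-db type, and `StateDbs`, `db_mut` and `get` are hypothetical names; the use of an empty id for the top trie is likewise an assumption.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a state database instance (not the real StateDb).
#[derive(Default)]
struct StateDb {
    nodes: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl StateDb {
    fn insert(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.nodes.insert(key, value);
    }

    fn get(&self, key: &[u8]) -> Option<&Vec<u8>> {
        self.nodes.get(key)
    }
}

/// One instance for the top trie plus one per child tree, keyed by unique id.
#[derive(Default)]
struct StateDbs {
    top: StateDb,
    children: BTreeMap<Vec<u8>, StateDb>,
}

impl StateDbs {
    /// Mutable access to the instance of a trie, created on first use.
    /// An empty unique id selects the top trie in this sketch.
    fn db_mut(&mut self, unique_id: &[u8]) -> &mut StateDb {
        if unique_id.is_empty() {
            &mut self.top
        } else {
            self.children.entry(unique_id.to_vec()).or_default()
        }
    }

    /// Read a node from the instance of the given trie, if that instance exists.
    fn get(&self, unique_id: &[u8], key: &[u8]) -> Option<&Vec<u8>> {
        if unique_id.is_empty() {
            self.top.get(key)
        } else {
            self.children.get(unique_id)?.get(key)
        }
    }
}
```

Since every instance has its own key space, the same node key can exist in two tries without any prefixing.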

cheme · 2020-04-27T14:05:52Z

> have a separate instance of StateDb for each child tree.

Yes, that is a different way to do it. Then state-machine is the only abstraction that, in my opinion, cannot be split (it needs to share info between child tries).
The glue between top trie and child trie is mainly full_storage_root, and it is in Backend (for proof we consume the backend).
But we could consider moving it upward into externalities and creating a new ProofProcessing ext (currently proofs are run over the TrieStorageBackend Backend impl).
We would then have one Ext containing multiple overlay changes and multiple backends, possibly with Statedbs as backends.
That seems like a bigger step from the existing code (internally the code was managing child trie content in the overlaydb and had a single backend using multiple child tries as needed).
With Statedb being used directly as a backend, we could then have one statedb per child trie.
It looks like a lot more code change, but it is indeed a better architecture. I remember trying to go in this direction and being blocked (probably on the proof code, but I feel like it is doable).

In #4938 I did roll back this PR's changes, and there was a single place where I needed to keep using child_info as storage input: TrieBackendStorage. This is because it is the backend for proof production and I needed to split the proof payload for compaction. Doing it with multiple backends and multiple proof recorders seems possible (in #4938 I kept the possibility of using a flat proof recorder for perf, but it does not make much sense; generally #4938 allows too many configurations, but at this point it is mainly for testing the proof compaction).

Edit: maybe not a 'lot' more changes, but a lot more state-machine changes and no client changes (not sure how big the current client changes are).

gnunicorn · 2020-04-28T16:36:28Z

converted to draft as this is still in progress.

cheme · 2020-06-11T10:15:56Z

Closing this as it all depends on replies to the question in #5280 (comment), and there is no use in keeping two PRs frozen behind this. Will reopen if the #5280 approach gets some kind of positive feedback and we proceed this way.

cheme added 30 commits January 28, 2020 09:45

change from cache root pr

e97accb

Merge branch 'master' into split_child_payload

d7d69a4

Targetted way of putting keyspace.

cf6393a

Note that KeyspacedDB are still use. KeyspacedDBMut only for test.

changes to state-db

2845d0e

change transaction to be by child trie.

a0532d1

slice index fix, many failing tests.

eb5961f

fix state-db tests

67687f8

vec with multiple entry of a same rc prefixeddb did not make sense,

48df830

switching to a btreemap.

Merge branch 'master' into split_child_payload

9e9a684

change set to btreemap, seems useless (at least do no solve changetrie

cb4c4a9

issue).

moving get_1 to get, state-machine needs rework

a398b82

Resolve a bit of child trie

7b26a93

Merge branch 'split_child_payload' into split_child_payload_map2

4d1378f

small refact

f39ce3f

Merge branch 'master' into split_child_payload

bc4fe52

Merge branch 'split_child_payload' into split_child_payload_map2

f27a2ec

indent

9f0c600

Use const of null hash check on BackendStorageRef.

ecb43d0

Merge branch 'split_child_payload_map2' into split_child_payload

d76a316

Associated null node hash set to a non optional const.

ae29df5

Make ChildInfo borrow of OwnedChildInfo.

6a06c0a

Removing unsafe cast, using ref_cast asumption for borrow case.

25aaa3a

Borrow approach on OwnedChildInfo and ChildInfo did not make sense

0d45d85

allocation whise unless we break all api for a close to nothing perf change: switching to simple single child info struct.

Factoring map of children code, before switching key.

274a923

Switching children key from optional to simple ChildInfo.

2e47a1d

Merge branch 'master' into split_child_payload

328ed66

fix merge test

b07d7ca

clean todos

e5d7b04

fix

3a71669

End up removing all keypacedDB from code.

c846471

cheme and others added 21 commits April 2, 2020 19:54

revert ci changes.

4bbb24d

Merge branch 'master' into child_trie_w3_change

50c3dd7

Merge branch 'child_trie_w3_change' of github.com:cheme/substrate int…

a7c16eb

…o child_trie_w3_change

Merge branch 'master' into child_trie_w3_change

ca40040

name of children in chain spec change.

226f7cd

Merge branch 'master' into child_trie_w3_change

5ab115e

Merge branch 'master' into child_trie_w3_change

1ad4c5a

Merge branch 'child_trie_w3_change' into split_child_payload

94b78f3

remove terminal space

5938d86

Merge branch 'master' into child_trie_w3_change

30e8192

sp-io documentation changes.

bc2a198

Merge branch 'master' into child_trie_w3_change

83ff4a0

Merge branch 'master' into child_trie_w3_change

ecb1c39

Retain compatibility with network protocol.

619b454

Revert "Retain compatibility with network protocol."

fa52a8c

This reverts commit 619b454.

Merge branch 'master' into child_trie_w3_change

167f20c

Merge branch 'master' into child_trie_w3_change

928fac0

fix renamed field related error

69ed118

Merge branch 'master' into child_trie_w3_change

b79bd84

Merge branch 'child_trie_w3_change' into split_child_payload

ae3a3e1

Merge branch 'master' into split_child_payload

ee1f768

kianenigma removed their request for review April 27, 2020 11:45

gnunicorn marked this pull request as draft April 28, 2020 16:36

cheme closed this Jun 11, 2020
