Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Document the difference between sp_io::storage and sp_io::default_child_storage #7574

Open
vporton opened this issue Nov 22, 2020 · 8 comments
Labels
J2-unconfirmed Issue might be valid, but it’s not yet known.

Comments

@vporton
Copy link

vporton commented Nov 22, 2020

The answers to the following questions should be immediately obvious (but now they are not) after reading https://docs.rs/sp-io/2.0.0/sp_io/storage/index.html and/or https://docs.rs/sp-io/2.0.0/sp_io/default_child_storage/index.html:

What is the difference between sp_io::storage and sp_io::default_child_storage? How each of them is used? Which of the two is used to store contract storage variables? What is stored in the other one? What is the "default child trie" and why at all there is a necessity to have a default child trie? How are these two storages related to each other?

Please also answer the questions commenting to this issue.

@github-actions github-actions bot added the J2-unconfirmed Issue might be valid, but it’s not yet known. label Nov 22, 2020
@vporton
Copy link
Author

vporton commented Nov 22, 2020

Also: Why does default_child_storage API has storage_key: &[u8] argument and storage doesn't? (It is counterintuitive, it would be rather the reverse: default_child_storage to be attached to a determined storage_key, so it would be not to be passed; and storage having multiple sub-storages, so may need storage_key argument.)

@vporton
Copy link
Author

vporton commented Nov 22, 2020

It even should be documented what is the path of default_child_storage tries inside storage tries.

@bkchr
Copy link
Member

bkchr commented Nov 22, 2020

CC @cheme

@cheme
Copy link
Contributor

cheme commented Nov 23, 2020

Quick first answer:

  • What is the difference between sp_io::storage and sp_io::default_child_storage?

sp_io::storage only stores state in a single trie. Content can be organized by prefix: the api
do not enforce key to be hashes, the chain designer can unbalance the trie a bit (eg use
specific prefixes for each modules).
The root of this trie state is directly in the block header.

sp_io::default_child_storage:
'Child storage' api can addresses multiple trie states and stores the root of each trie in a parent root.

'Default' kind (only this kind of child state exists) uses the same kind of trie
state as sp_io::storage (see https://github.com/w3f/polkadot-spec/releases , there may also be a part on child trie there).

It always store its root in a single parent trie.
This parent trie is always the one used in sp_io::storage api.
Also to avoid overwriting a root by mistake, there is a dedicated prefix in the parent trie for child root storage (ensure
the state machine do not write into this prefix):

pub const DEFAULT_CHILD_STORAGE_KEY_PREFIX: &'static [u8] = b":child_storage:default:";

  • How each of them is used?

sp_io::storage is used by most frame pallet by using 'decl_storage' macro, and soon the new procedural storage macro.

sp_io::default_child_storage is call almost directly, there is just a very small api in front https://github.com/paritytech/substrate/blob/master/frame/support/src/storage/child.rs , but almost the same as directly call the sp_io api (especially since we got a single kind of child state/trie).

  • Which of the two is used to store contract storage variables?

contract pallet is one of the only frame module using it: the contract variable are in a child storage (one trie per contract account), and the accounts are in the parent trie.

  • What is stored in the other one?

Almost all the chain state is using sp_io::storage with prefixes. I only know contract and crowdfund pallets to use child storage (but did not check new pallets recently).

  • What is the "default child trie" and why at all there is a necessity to have a default child trie?

At the time 'default' is not necessary, because we have a single kind of child trie. I wanted to put different format of trie or different states, but my PR going in this direction where a bit impacting and I did not reach a consensus, so at the time I am focusing on different problematic, but will probably go back to it at some point.

  • How are these two storages related to each other?

They're actually the same, only the location of their roots changes (in header or in parent trie).
I remember rewriting the api to only have child trie (using optional parent trie for the top one) in order to make it more natural
to have multiple level of trie, but using two different api was less impacting (I think child trie did not exist in the first versions of substrate).

  • Why does default_child_storage API has storage_key: &[u8] argument and storage doesn't?

storage_key is the location of the state root in the parent key, while storage stores root in the header.

  • what is the path of default_child_storage tries inside storage tries
    'default_child_trie_prefix ++ storage_key' see previous link.

But the api should use only 'storage_key' when it targets the 'default' child trie (rpc does not, iirc it uses the concatenation as parameter).

@cheme cheme pinned this issue Nov 23, 2020
@bkchr bkchr unpinned this issue Nov 23, 2020
@vporton
Copy link
Author

vporton commented Nov 23, 2020

  • Why does default_child_storage API has storage_key: &[u8] argument and storage doesn't?

storage_key is the location of the state root in the parent key, while storage stores root in the header.

I still don't understand why default_child_storage does use it and storage doesn't. Why not for example vice versa?

@cheme
Copy link
Contributor

cheme commented Nov 23, 2020

for the first a query is done:
header of current block hash -> get state root from header -> fetch value at 'child_prefix' ++ 'storage_key' in state trie with this header root -> decode this value to hash -> fetch value at 'key' in state trie with this hash as root

So you need as input 'storage_key' and 'key' to get your value (header of block is related to context or a block hash for rpc).

for the second one:
header of current block hash -> get state root from header -> fetch value at 'key' in state trie with this header root

so you need only key.

Why not for example vice versa?

it is a two level only structure so if you query first level you need one key, and two key for second level.

For a more hierarchical child trie (n level of trie), the api should be unique, either passing a array of key or using a separtor in key like in uri. I would like it better (as stated in previous message), but the current api is two level only. Probably to avoid changing the existing api (storage) there is another one (default_child_storage).
Note that it should be possible to run a n level hierachical trie with the current host api, just use child tries only and write the root of every child trie wherever you need them at the end of the block processing (using child_trie_storage root calls when running 'on_finalize' system frame hook). That way you only need to work at frame level.

@vporton
Copy link
Author

vporton commented Nov 23, 2020

for the first a query is done:
header of current block hash -> get state root from header -> fetch value at 'child_prefix' ++ 'storage_key' in state trie with this header root -> decode this value to hash -> fetch value at 'key' in state trie with this hash as root

"For the first" - you mean for default_child_storage?

So get value at data(child_prefix ++ storage_key) ++ key (data(child_prefix ++ storage_key) is the hash mentioned by you). Why not just child_prefix ++ storage_key ++ key? It would be simpler. Is it to improve performance not to querying child_prefix ++ storage_key prefix every time?

@cheme
Copy link
Contributor

cheme commented Nov 23, 2020

"For the first" - you mean for default_child_storage?

👍

Why not just child_prefix ++ storage_key ++ key? It would be simpler.

Yeah an unbalanced trie would work as well. With current trie implementation we would need some additional api to extract root for a given prefix (there is a small subtletie related to the fact that the trie node might contain a prefix (the fact that substrate trie do not use 'extension node' makes it a bit more awkward, and I would probably want to attach the size of the prefix with the contract root).

I did not find the discussion where the choice was made (somewhere on github), one of the reason for chosing child trie was
the lesser cost of deleting the whole child trie (can be true also with prefix but the api is not really good for it, I remember drafting some code to attach/detach branch of a trie: it is doable).

Also note that it is rather similar to how ethereum account storages work.

When the choice was made I wanted #5280 so the rent would work well, I still think it is the right way to go, but it did not get merge.
In theory it is super simple: isolate the child trie node in db collection storage so delete the whole child trie is straight forward.
So I realize today that there is #6594 and #7526 open as a consequence, but I am not sure what the propose solution is.

Is it to improve performance not to querying child_prefix ++ storage_key prefix every time?

Yes in theory during processing you only need to access a child trie root once, but in practice, caching makes the prefix approach also fine.

TLDR; the fact that you need to deal with root of the contract storage makes it rather natural: each contract got its own state trie.
Nothing impossible with a prefix in an unbalance trie, but not as direct.
In general a child trie of the same kind as its parent trie is just one more node (in fact if I wanted to implement the same, I would probably also create this node in the trie too to avoid having a prefix in the root of the contract storage).
Also I really like the idea to allow different state implementation as child trie, but today it is only 'default'.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
J2-unconfirmed Issue might be valid, but it’s not yet known.
Projects
None yet
Development

No branches or pull requests

3 participants