Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixup merkle-value #698

Merged
merged 1 commit into from
Sep 6, 2023
Merged

Fixup merkle-value #698

merged 1 commit into from
Sep 6, 2023

Conversation

bkchr
Copy link
Contributor

@bkchr bkchr commented Sep 5, 2023

I assume the merkle value follows how we declare an inline node in the trie crate? If yes, then the size of the node value can be up to 32 bytes.

I assume the merkle value follows how we declare an inline node in the trie crate? If yes, then the size of the  node value can be up to 32 bytes.
@Noc2 Noc2 requested a review from bhargavbh September 6, 2023 06:41
@bhargavbh bhargavbh merged commit 4b4b74e into w3f:main Sep 6, 2023
@bkchr bkchr deleted the patch-1 branch September 6, 2023 08:11
@tjjfvi
Copy link
Contributor

tjjfvi commented Sep 7, 2023

Where is this defined in the source? I only found this, which would suggest that it's only inline if len < 32, like in the original text.

@bkchr
Copy link
Contributor Author

bkchr commented Sep 8, 2023

And threshold being 33.

@tjjfvi
Copy link
Contributor

tjjfvi commented Sep 8, 2023

As far as I can tell, the code you linked relates to whether or not a node has a hashed subvalue (see this, which links the usage of that threshold constant to the state version, and this, which shows that the Value type linked is not related to children).

The section of the spec edited in this PR does not relate to the hashed subvalue, but rather the merkle value of a trie node.

(The format of the hashed subvalue is, except for how it changes the header, almost entirely undocumented in the spec; there is no mention of the threshold you link, nor even how the format of the subvalue changes.)

@bkchr
Copy link
Contributor Author

bkchr commented Sep 11, 2023

I honestly have no idea what the "merkle value" is used for. Our trie crate doesn't use this term nor does it cares. However, I would be surprised why the threshold that decides on inline child nodes should be different to the value that decides if the merkle value is a hash or the node value.

Maybe @cheme can shine some light here on what merkle value is used for. I can not recall on anything outside of the encoding that should care if something is a hash or a value.

@cheme
Copy link

cheme commented Sep 11, 2023

Yes I am not really using "merkle value" wording usually, but it is used in the rpc spec paritytech/json-rpc-interface-spec#37.
In this proposal the goal is to subscribe on this to get alerted if content starting with this node prefix do change (hash or inline node will be different).
Actually for sake of simplicity we could use directly hash of inline node to, so we only return hash nodes, but it would be a bit more costly on size.
Having same threshold for inline node and value nodes comes from the same reasoning:
saving space by avoiding a 32 byte hash storage when storing directly node content or value would take less or equal size.

Actually for inline node the limit would have been better at 31 (since for 32 byte we encode the size and use 33 bytes in the end).
Also 33 in code means start using value node from 33 bytes (so up to 32 bytes, inline values are used). So it kind of a lower value threshold for inline value which can be a bit confusing (probably could change to a threshold for node value which would be 32).

@tjjfvi
Copy link
Contributor

tjjfvi commented Sep 11, 2023

I can not recall on anything outside of the encoding that should care if something is a hash or a value.

If one has only partial information about the trie (i.e. only has the preimages of some of the node hashes, as in the case of a light client), when attempting to read from the trie, one must determine if a child is inlined or hashed. The only way to do so is to check the length – if it's <= 31, it's inline; otherwise, it's hashed.

(For values, on the other hand, whether it's inline or hashed is explicitly stored in the node header.)


If nodes are in fact inlined when len < 32, this PR should be reverted.

@cheme
Copy link

cheme commented Sep 11, 2023

It is when node < 33 (<= 32). I was just mentioning < 32 would have been better.

@tjjfvi
Copy link
Contributor

tjjfvi commented Sep 11, 2023

@cheme Then can you link to where in the source that is defined/handled?

What I linked strongly indicates < 32, and what @bkchr linked does not relate to inline nodes, but rather inline values.

@cheme
Copy link

cheme commented Sep 11, 2023

yes you're right, I got bad memory. Inline node are actually from 0 to 31 encoded len, then one byte is used to encode the len in the node encoding which result to 32 byte so good. (you can see more in triedbmut and lookup (same thing look for comparison againt LENGTH from hasher (32 for blake)).
For inline value, there is though this issue that a 32 byte value is inline encoded to 33 byte (with the additional 1 byte).
So the threshold is different between both. (for code look for comparison against MAX_INLINE_VALUE (33 in polkadot-sdk/primitive/trie crate)).
For inline value, it means that the many node that only store a hash will not use an external node (so little cost when accessing node but not value, but overall might be better given we avoid splitting there).

tjjfvi added a commit to tjjfvi/polkadot-spec that referenced this pull request Sep 11, 2023
This reverts commit 4b4b74e, reversing
changes made to 9f6d48a.
@bkchr
Copy link
Contributor Author

bkchr commented Sep 11, 2023

For inline value, there is though this issue that a 32 byte value is inline encoded to 33 byte (with the additional 1 byte).
So the threshold is different between both. (for code look for comparison against MAX_INLINE_VALUE (33 in polkadot-sdk/primitive/trie crate)).

W00t.... Why do we have done this..?

@cheme
Copy link

cheme commented Sep 11, 2023

For inline value, there is though this issue that a 32 byte value is inline encoded to 33 byte (with the additional 1 byte).
So the threshold is different between both. (for code look for comparison against MAX_INLINE_VALUE (33 in polkadot-sdk/primitive/trie crate)).

W00t.... Why do we have done this..?

tbh partially an overlook, but at the same time I cannot say if putting cursor for using a value node at 32 or 33 or even 40 is the better choice.
strictly for proof size 32 looks better, but this also means that hash storage (which is very commont) will always use an external node value in db.
also considering we currently put proof as a sequence of Vec where all vec are scale encoded we have an additional size byte for the value node too.
I mean a single node trie with a 32 byte value will encode as node to 3 + 1 + 32 = 36 bytes then as proof (Vec<Vec>): 1 + 1 + 36 = 38.
If the value node started at 32 byte and not 33, it would encode to 3 + 32 = 35 bytes (1 less), but as this is two different nodes the Vec<Vec> will encode to 1 + 1 + 1 + 35 = 38.

So unless we do some smarter proof encoding, this is not really impacting at the time (only impacting in a good way).

@bkchr
Copy link
Contributor Author

bkchr commented Sep 13, 2023

IMO it would just have been easier to consume if inline node and inline value use the same threshold. Deciding what exactly this threshold should be, would be some extra step. Just for understanding the entire structure it would have been easier. But yeah, we are now too late :P

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants