
Make chainHead_unstable_storage more powerful #37

Merged (9 commits, May 25, 2023)

Conversation

Contributor

@tomaka tomaka commented May 15, 2023

Close #21

Even though they're unrelated, this PR is based on top of #35. I expect #35 to be merged, and didn't want to have conflicts.

This PR enhances chainHead_unstable_storage by adding back the type field that was removed in #17.
You can now query not just the value of a storage item, but also the hash of its value, the values of the item and all its descendants, the hashes of the item and all its descendants, or the Merkle value of the item or its closest ancestor.

Querying the descendants of a storage item makes it possible to enumerate all the keys under a certain prefix, which is important for knowing the content of maps.
In order to reproduce something similar to state_getKeysPaged, when enumerating descendants the server can now generate a waiting-for-continue event, after which the client must call chainHead_unstable_storageContinue in order to resume the enumeration. This makes it possible for the client to interrupt the enumeration at any point in time. Rather than letting the client decide the number of items (as is the case for state_getKeysPaged), it is now the server that decides, which makes it possible to generate this event based on the size of what was sent rather than the number of items.
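The enumeration flow described above could look roughly like this on the client side. This is a hypothetical sketch: `send_request` and `next_event` stand in for the client's JSON-RPC transport, and the event shapes used here are simplified placeholders rather than the exact ones from the spec.

```python
def enumerate_descendants(send_request, next_event,
                          follow_subscription, block_hash, prefix):
    """Collect all descendants of `prefix`, resuming whenever the server
    pauses the enumeration with a waiting-for-continue event."""
    subscription = send_request("chainHead_unstable_storage", {
        "followSubscription": follow_subscription,
        "hash": block_hash,
        "key": prefix,
        "type": "descendants-values",
    })
    items = []
    while True:
        event = next_event(subscription)
        if event["event"] == "items":
            items.extend(event["items"])
        elif event["event"] == "waiting-for-continue":
            # The server decided to pause; the client decides when (and
            # whether) to resume.
            send_request("chainHead_unstable_storageContinue",
                         {"subscription": subscription})
        elif event["event"] == "done":
            return items
        else:
            raise RuntimeError(f"storage query failed: {event}")
```

Because the server chooses when to emit waiting-for-continue, it can pause based on the number of bytes already sent rather than on a fixed item count.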

The closest-ancestor-merkle-value query type makes it possible to know when the content of a map has changed. It can be seen as similar to hash, except that hash is only the hash of the value while closest-ancestor-merkle-value is the hash of the value plus the hash of all the descendants in the trie.
Basically, in order to watch a map, a JSON-RPC client would periodically query the closest-ancestor-merkle-value of the map, then redownload the list of keys if this Merkle value changes.
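That watching scheme can be sketched as a simple polling loop. `query_merkle_value` and `redownload_keys` are hypothetical helpers standing in for the actual closest-ancestor-merkle-value query and the key enumeration:

```python
def watch_map(query_merkle_value, redownload_keys, map_prefix, polls):
    """Redownload a map's keys only when its Merkle value changes."""
    last_merkle_value = None
    for _ in range(polls):
        merkle_value = query_merkle_value(map_prefix)
        if merkle_value != last_merkle_value:
            # A change to any value under `map_prefix` changes this Merkle
            # value, so this detects insertions, removals, and modifications
            # anywhere in the map.
            redownload_keys(map_prefix)
            last_merkle_value = merkle_value
    return last_merkle_value
```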

@tomaka tomaka mentioned this pull request May 15, 2023
@bkchr bkchr requested review from skunert and lexnv May 15, 2023 15:58
lexnv previously approved these changes May 16, 2023
Contributor

@lexnv lexnv left a comment


Amazing!

This should give us parity with what Substrate previously exposed with regard to iterating over storage keys, while having a more compact and robust API.

Hopefully, we'll be able to make a complete transition in the whole ecosystem to the new methods and deprecate the old RPC ones 👍

Contributor Author

tomaka commented May 16, 2023

I'm actually not completely happy with this design where the client has to call storageContinue.

The function achieves all its goals, but the problem is that the client can delay calling storageContinue indefinitely. There exists a limit to the number of active subscriptions, so I'm not worried about DoS attacks or anything like that.

What I'm worried about is a very specific situation: if a load balancer redirects a storage query to server A, and later wants to shut down server A because it is no longer needed or needs maintenance, the load balancer can currently generate a stop event for all active follow subscriptions, wait for all active requests to finish, then shut down server A.

But with the introduction of this storageContinue thing, the client can prevent this from happening by keeping a storage query active indefinitely.
I feel like there should be something similar to stop that the load balancer can generate. The error event currently means "fatal irrecoverable error", which isn't the case here, as the client should simply try again. I think that inaccessible is appropriate, because the load balancer can no longer access the node in question.

Contributor Author

tomaka commented May 24, 2023

Since it's sitting there, I just wanted to mention that I can't merge this pull request myself as I'm not at Parity anymore.

Contributor

@lexnv lexnv left a comment


LGTM!

// cc @skunert @jsdw @bkchr


If `type` is `hash` or `descendants-hashes`, then the cryptographic hash of each item is provided rather than the full value. The hashing algorithm used is the one of the chain, which is typically blake2b. This can lead to significantly less bandwidth usage, and can be used to compare the value of an item against a known hash, querying the full value only if it differs.

If `type` is `closest-ancestor-merkle-value`, then the so-called trie Merkle value of the `key` is provided. If `key` doesn't exist in the trie, then the Merkle value of the closest ancestor of `key` is provided. Contrary to `hash`, a `closest-ancestor-merkle-value` always exists for every `key`. The Merkle value is similar to a hash of the value and all of its descendants together.
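The hash-based comparison described above can be sketched as follows, assuming the chain hashes with blake2b and a 32-byte digest (as is typical); `fetch_hash` and `fetch_value` are hypothetical stand-ins for the corresponding `chainHead_unstable_storage` queries:

```python
import hashlib

def blake2b_256(data: bytes) -> bytes:
    """Blake2b with a 32-byte digest, as typically used by Substrate chains."""
    return hashlib.blake2b(data, digest_size=32).digest()

def value_if_changed(fetch_hash, fetch_value, cached_value: bytes) -> bytes:
    """Query only the hash first; download the full value on a mismatch."""
    remote_hash = fetch_hash()               # cheap: 32 bytes over the wire
    if remote_hash == blake2b_256(cached_value):
        return cached_value                  # cached copy is still current
    return fetch_value()                     # expensive: the full value
```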
Member


If I understand it correctly, I find the naming "hard/confusing". Why not "closest-ancestor-merkle-node" and then also saying that this is the hash of the closest merkle node.

Contributor Author


I kept the same vocabulary as the spec. There's no such thing as a "merkle node". There are "trie nodes" that have a "node value" and a "merkle value" (the merkle value is either the node value or its hash).

"closest ancestor merkle value" means "the merkle value of the closest ancestor".

Member


I kept the same vocabulary as the spec.

Okay 👍

(the merkle value is either the node value or its hash).

What is the node value? The value attached to a node? Shouldn't we always be interested in the hash as we only use this for tracking changes?

Contributor Author

@tomaka tomaka May 25, 2023


The node value is (basically) concat(node_type, partial_key, storage_value, merkle_values_of_children).

If the node value is less than 32 bytes then the Merkle value is the same as the node value. If the node value is 32 bytes or more, then it's the hash.
https://spec.polkadot.network/chap-state#defn-merkle-value
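Ignoring the special case of the trie root, which is always hashed, that rule can be written down directly. A minimal sketch, assuming blake2b with a 32-byte digest as the chain's hash function:

```python
import hashlib

def merkle_value(node_value: bytes) -> bytes:
    """The node value itself if shorter than 32 bytes, otherwise its hash."""
    if len(node_value) < 32:
        return node_value
    return hashlib.blake2b(node_value, digest_size=32).digest()
```

Either way the result is at most 32 bytes, which is what makes it cheap to poll.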

Member


Ahh okay, ty for the explanation 👍

Successfully merging this pull request may close these issues.

Is it planned to support storage iterations?