Improvements to the networking storage read requests #5
Conversation
Generally looking good! Just some questions.
Implementers of the replier side should be careful to detect early on when a reply would exceed the maximum reply size, rather than unconditionally generating a reply, as this could take a very large amount of CPU, disk I/O, and memory. Existing implementations might currently be accidentally protected from such an attack thanks to the fact that requests have a maximum size, and thus that the list of keys in the query was bounded. After this proposal, this accidental protection would no longer exist.
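A minimal sketch of such an early check, assuming a hypothetical `prove_key` helper and an illustrative (non-normative) `MAX_REPLY_SIZE`:

```rust
const MAX_REPLY_SIZE: usize = 16 * 1024 * 1024; // illustrative, not normative

// Hypothetical stand-in for generating the proof entry of one key.
fn prove_key(_key: &[u8]) -> Vec<u8> { unimplemented!() }

/// Abort as soon as the accumulated size crosses the limit, instead of
/// unconditionally generating the whole oversized reply first.
fn build_reply(keys: &[Vec<u8>]) -> Option<Vec<u8>> {
    let mut reply = Vec::new();
    for key in keys {
        let entry = prove_key(key);
        if reply.len() + entry.len() > MAX_REPLY_SIZE {
            return None; // bail out early; no reply is produced
        }
        reply.extend_from_slice(&entry);
    }
    Some(reply)
}
```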
It is not possible for the querier to know in advance whether its query will lead to a reply that exceeds the maximum size. If the reply is too large, the querier should send a different, smaller request. While this is not an ideal solution, it is believed that this problem is rare enough that it's not worth making things more complicated in order to solve it. Ideal solutions could include proofs that can be verified in a streaming way, or sending a partial proof whose remainder can be queried afterwards.
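One possible querier-side fallback for the "send a smaller request" case, splitting the key list in half on failure (the `send_request` transport call is hypothetical):

```rust
// Hypothetical transport call: `None` means the remote reset the
// substream because the reply would have exceeded the maximum size.
fn send_request(_keys: &[Vec<u8>]) -> Option<Vec<u8>> { unimplemented!() }

/// Retry with progressively smaller requests until the replies fit.
fn fetch_all(keys: &[Vec<u8>]) -> Option<Vec<u8>> {
    match send_request(keys) {
        Some(reply) => Some(reply),
        None if keys.len() > 1 => {
            let (left, right) = keys.split_at(keys.len() / 2);
            let mut out = fetch_all(left)?;
            out.extend(fetch_all(right)?);
            Some(out)
        }
        None => None, // even a single-key reply exceeds the maximum size
    }
}
```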
Do we already have any way to indicate that a request was too big? Or how would the requester find out about this?
The policy right now is the same for all request-response protocols:
- If the request is too big, the remote abruptly closes (i.e. resets) the substream.
- If you're asking what happens if the response is too big: the answerer is supposed to cap the size of the response so that it doesn't exceed that limit. If it's not possible to send back any response because it would be too big, then the answerer abruptly closes (i.e. resets) the substream.
This isn't very optimal, but as I've mentioned we just assume that it can't happen. The limit exists in order to avoid very abusive situations where one party maliciously sends an infinite stream of bytes. This limit isn't meant to ensure granularity in the response.
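A sketch of that policy, with a hypothetical `Substream` handle standing in for the real substream API and illustrative size limits:

```rust
const MAX_REQUEST_SIZE: usize = 512 * 1024;        // illustrative
const MAX_RESPONSE_SIZE: usize = 16 * 1024 * 1024; // illustrative

// Hypothetical substream handle.
struct Substream;
impl Substream {
    fn reset(self) {}                    // abruptly close the substream
    fn respond(self, _bytes: Vec<u8>) {} // send back the response
}

// Hypothetical request handler; `None` when no response can be built.
fn build_response(_req: &[u8]) -> Option<Vec<u8>> { unimplemented!() }

fn handle_request(request: Vec<u8>, substream: Substream) {
    if request.len() > MAX_REQUEST_SIZE {
        return substream.reset(); // request too big: reset, never answer
    }
    match build_response(&request) {
        // A response is only sent if it fits under the cap.
        Some(resp) if resp.len() <= MAX_RESPONSE_SIZE => substream.respond(resp),
        _ => substream.reset(), // nothing sendable: reset as well
    }
}
```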
> If the request is too big, the remote abruptly closes (i.e. resets) the substream.
This is the answer to my question! Ty.
@cheme could you please also check this out again?
```
+message Key {
+  required bytes key = 1;
+  optional bool skipValue = 2; // Defaults to `false` if missing
+  optional bool includeDescendants = 3; // Defaults to `false` if missing
```
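For illustration, this is how a decoder would apply the documented defaults, assuming a prost-style generated struct (names hypothetical):

```rust
// Hypothetical shape a protobuf code generator might produce for `Key`.
struct Key {
    key: Vec<u8>,
    skip_value: Option<bool>,
    include_descendants: Option<bool>,
}

/// Missing optional booleans default to `false`, per the schema comments.
fn interpret(k: &Key) -> (bool, bool) {
    (
        k.skip_value.unwrap_or(false),
        k.include_descendants.unwrap_or(false),
    )
}
```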
Regarding requests being too big, I would find it neat if it was allowed to reply to only part of the query:

- the first n keys when no descendants are requested
- the first n keys and possibly an incomplete set of descendants for key n

as long as it matches the order of the queried content.
(The problem of stopping a process in the middle if the proof gets too big is something that is needed in the PVF context too.)
This would be up to the full node to decide where to stop.
The end of content is simply the first key (main key or key among the descendants) that is not part of the proof.
Then the light client could send the query again minus these first n keys.
But to handle the case where the prefix is too big to include all descendants in a single response, I would propose to change this to:

`includeDescendant: Option<Option<&[u8]>>`

`None` would mean don't include descendants, `Some(None)` would mean include descendants starting with the key itself, and `Some(Some(&[1, 2, 3u8]))` would mean include descendants of the prefix, starting with prefix ++ b"123",
with no need to include the value for this starting key (but a need to prove that the next key indeed follows it).
That way we can query any range (see the sketch below).
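A sketch of the three cases this proposal distinguishes, using an owned `Option<Option<Vec<u8>>>` for simplicity (names hypothetical):

```rust
/// Proposed shape of the field, owned variant for simplicity.
type IncludeDescendants = Option<Option<Vec<u8>>>;

fn describe(key: &[u8], include_descendants: &IncludeDescendants) {
    match include_descendants {
        // Don't include any descendants.
        None => println!("only key {:?}", key),
        // Include descendants, starting at the key itself.
        Some(None) => println!("key {:?} and all of its descendants", key),
        // Include descendants of the prefix, resuming at prefix ++ start.
        Some(Some(start)) => {
            println!("descendants of {:?}, resuming at suffix {:?}", key, start)
        }
    }
}
```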
```
+  }
+
+  required bytes hash = 1;
+  required ChildTrieNamespace namespace = 2;
```
Considering the previous (child trie root hash) mandatory field, this field is not strictly necessary. Though I like having it, as it makes it simple to implement (for RocksDB; otherwise we would need to change storage).
> (child trie root hash)

Are you assuming that `hash` means the trie root hash of the child trie?
The way I intended it, `hash` is the "name" of the child trie.
In other words, the child trie is the one at `concat(":child_storage:", namespace, ":", hash)`.
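A sketch of that key derivation (assuming `namespace` is concatenated as its raw bytes):

```rust
/// Location of the child trie in the top trie, as described above.
fn child_trie_key(namespace: &[u8], hash: &[u8]) -> Vec<u8> {
    let mut key = b":child_storage:".to_vec();
    key.extend_from_slice(namespace);
    key.push(b':');
    key.extend_from_slice(hash);
    key
}
```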
Oh, sorry, I misread; I was assuming the namespace was a different field.
`namespace` is actually the kind of child trie, which is fine.
So here we do pass the child trie location in the top trie (`hash` is the child trie unique id), and we will require the proof to include the proof of the child trie root in the top trie.
I was thinking we would pass the child trie root and then only expect to receive a proof from the child trie alone.
That's fine this way (closer to the previous API).
@cheme: Just to double-check: I'm happy to merge it if you are also okay with it.
Works for me, even if I think …
I think it's not as simple as what you suggest. After all, even if you don't ask for descendants, you could be asking for many keys and the proof could be very large. If we add a system where results are incomplete and you have to resume your query, then IMO it should be done on the entire query and not just for descendants. Basically, my opinion is: this all seems very complicated to do properly, and we can deal with it later.
Since compact proofs are completely deterministic, IMO a simple way to add resumable queries is to add a field in the request that indicates after which storage key you want the proof to start. We could also do this for non-compact proofs if we enforce a deterministic order.
In this case (no descendants but too many keys queried), the query can be resumed later by just removing the keys of the resulting set from the query set.
This is what I did suggest (just that instead of a different field it changes the type of the boolean to `Option<Option<...>>`), but a different optional field sounds good to me.
This seems like two vastly different solutions to me, no? What I suggest is that if you pass a "resume value" in this new field, then the proof would for example not even contain the node value of the root node. You basically take your proof as a …
Note that if you define it at the query level (rather than the key level), you will need to make it a pair `(key, Option<key_under_prefix>)`.
But defining it at the query level (as you propose, IIUC) would make sense, as we should only need to resume a single prefixed key query at a time.
Uhhh... no? To give a concrete example of what I suggest: let's say you do a request for keys …. No matter what the request is, the remote needs to send back N entries. But instead of sending N entries at once, it sends entries 0..M and then later M..N. There's no need to remove anything from the request.
Why don't you resume directly from "0xaaab1"? (Trying to manage things at the node level sounds complex.) In your scenario, if you got a request with prefix "0xaa" and can only reply "0xaaaa" and "0xaaab1", how do you get "0xaaab2"?
You send the same request and pass "0xaaab1" as the resume value.

What I suggest is literally doing something like this, but a bit more optimized:

```rust
// Sketch; the `...` stand in for the actual request/response types.
fn answer_request(requested_keys: ..., resume_value: ...) -> ... {
    let mut proof: Vec<...> = generate_proof(requested_keys);
    // Skip the entries that come before the resume value.
    while proof.first().map_or(false, |entry| entry.key < resume_value) {
        proof.remove(0);
    }
    // Keep entries until the response size limit would be exceeded.
    let mut truncate_pos = 0;
    let mut size = 0;
    for elem in proof.iter() {
        if size + elem.size >= LIMIT { break; }
        size += elem.size;
        truncate_pos += 1;
    }
    proof.truncate(truncate_pos);
    proof.encode()
}
```
Ah OK, I get you: your resume value is a lower bound filter. Exclusive sounds OK (even if I don't think it was already implemented this way for ranges (the includeDescendant case); I think it just means being inclusive of the value hash of the bound), it sounds good.

On the other hand, your code snippet completely lost me; generating a proof with all keys and truncating afterwards is complex. I would rather manage the limit at the key level (only stop recording after a value is accessed, so there are no orphan nodes):

```rust
fn record(
    accessed_node: Vec<u8>,
    proof: &mut Vec<Vec<u8>>,
    last_key_size: &mut usize,
    has_value: bool,
) -> bool {
    // Stop recording once this node would push the proof over the limit,
    // and truncate back to the last node that completed a value, so the
    // proof contains no orphan nodes.
    let proof_size: usize = proof.iter().map(|n| n.len()).sum();
    if proof_size + accessed_node.len() > LIMIT {
        proof.truncate(*last_key_size);
        return false;
    }
    proof.push(accessed_node);
    if has_value {
        // Remember how many nodes were recorded up to this value.
        *last_key_size = proof.len();
    }
    true
}
```

with `has_value` being an inline value in the node or a value node.
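A sketch of how such a recorder could be driven while iterating the trie (the iteration source is hypothetical): stop serving as soon as the recorder refuses a node, so the proof ends at the last complete value.

```rust
const LIMIT: usize = 16 * 1024 * 1024; // illustrative response size limit

// `nodes` yields each accessed node together with a flag telling whether
// it carries a value (inline value or value node); hypothetical source.
fn serve(nodes: impl Iterator<Item = (Vec<u8>, bool)>) -> Vec<Vec<u8>> {
    let mut proof = Vec::new();
    let mut last_key_size = 0;
    for (node, has_value) in nodes {
        if !record(node, &mut proof, &mut last_key_size, has_value) {
            break; // limit hit: the proof is already truncated
        }
    }
    proof
}
```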
I actually think that inclusive is a bit better. This way, an empty resume value and a missing resume value are equivalent.
I don't think it's a problem to send orphan nodes, is it?

The problem that this resume value solves is to handle the situation where a proof is larger than the arbitrary limit that we have decided for the response size. This response size limit exists in order for the sender to be sure that "something is happening", in other words that it is not being sent an infinite stream of randomly-generated data. It's not a limit on the size of a proof, but on the size of a networking message.

In fact, we strictly speaking don't even need a limit. Every piece of information in a Merkle proof is either bounded in size or prefixed with its size. And if nodes in the proof are strictly ordered (like I suggest we enforce), then the recipient of the proof can be sure that what it is receiving is a correct proof while still receiving it. In other words, the receiver can be sure that the sender of the proof isn't malicious while the streaming is still going on.

What I want to express is that to me the resume value isn't really "related" to the content of the proof, but to its form. It's more a way to resume the download of a proof whose download was interrupted for some reason.
I did make up my mind in favour of exclusive (for the prefix case it is better as we avoid a value; we need to restart the resume from the previously obtained value anyway). It makes resuming straightforward (last read value).

No, not really, as long as the nodes are ordered.

For the streaming case, it is then not a limit but a bandwidth, used to insert delays while serving.

The issue of being able to check only at the end of a big value looks similar to what I mentioned for compact proofs; the difference is only that in this case, even with a limit, you cannot know if the value is correct until you get the …

Definitely something that chains should put in place, but I feel like at the Substrate level it could be free (limit defined in the runtime).

Yes. All in all, I believe your idea of passing a lower bound threshold for resume is good (preferably an exclusive one), and I would be happy to see it in the PSP and implement it.
I've updated the proposal.
Thanks for the late change (will try to implement that this week).
```
@@ -84,14 +85,17 @@
 If `skipValue` is `true` for a `Key`, then the value associated with this key is …

 If `includeDescendants` is `true` for a `Key`, then the replier must also include in the proof all keys that are descendants of the given key (in other words, its children, children of children, children of children of children, etc.). It must do so even if `key` itself doesn't have any storage value associated to it. The values of all of these descendants are replaced with their hashes if `skipValue` is `true`, similarly to `key` itself.

 The optional `onlyKeysAfter` field can provide a lower bound for the keys contained in the proof. The responder must not include in its proof any node whose key is inferior or equal to the value in `onlyKeysAfter`. If no `onlyKeysAfter` is provided, the response must start with the root node of the trie.
```
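A sketch of the responder-side rule in the last paragraph above (node keys compared lexicographically; types simplified):

```rust
/// Drop every proof entry whose key is inferior or equal to the bound.
fn apply_only_keys_after(
    proof: Vec<(Vec<u8>, Vec<u8>)>, // ordered (node key, node body) pairs
    only_keys_after: Option<&[u8]>,
) -> Vec<(Vec<u8>, Vec<u8>)> {
    match only_keys_after {
        // No bound provided: the response starts with the root node.
        None => proof,
        Some(bound) => proof
            .into_iter()
            .filter(|(key, _)| key.as_slice() > bound)
            .collect(),
    }
}
```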
Regarding this sentence, it can be a bit confusing for implementors, in the sense that when we are doing a prefix query, we query from this bound (if the bound is under the prefix) and will include in the proof the nodes containing the key (needed to show that the next key follows this bound), but not its value.
Maybe a bit too implementation oriented.
I don't understand what you're saying
Just that the exclusive bound, when we do a range proof (the prefix case), will include the restart key (not the value) even if it is exclusive.
I don't understand why
Let's say your range query stopped at key (nibbles, not bytes) [1, 2, 15]. Then you ask again starting exclusively after [1, 2, 15], so you will need to descend into the key to check whether [1, 2, 15, X] exists before going into [1, 2, 16], and you will include the [1, 2, 15] key (not the value).
> but it means that the emitter of the query also keeps in memory that there are no children to the node

Yes, that's intended.

> I feel like it would be confusing to put nibbles in the query, we only use keys everywhere and the user of the RPC should not need to know anything about the state storage.

That's not the RPC layer, that's the networking protocol.
The concept of nibbles is deeply tied to the trie and to Merkle proofs. You can't do anything with the proof you receive in the response without knowing what nibbles are.
Makes sense. I still think it is too complicated (statelessness has value in my mind).
If I want to do what you suggest, I would need:

```rust
struct Cursor {
    key: Vec<u8>,
    padded: bool,           // indicates whether the last nibble of `key` is padding
    next_child: Option<u8>, // next item of the iteration, or the parent if `None`
}
```
Ah, I get where you're going. `next_child` doesn't need to be an option: if it is the parent, we already got the node in the previous query and should just restart at the parent and the next sibling.
And if you define the next child as the last nibble of `key`, you don't need the field. Then you also don't repeat the parent nodes in the proof.
It would work, and it is stateless from the server point of view, though not from the client point of view. It is not really compatible with compact proofs (or else it also cannot be stateless from the server point of view).
Still sounds a bit complex to me.
Actually, a compact proof local to a single reply would work.
It would be OK then (we just need to define `onlyKeysAfter` as nibbles and add a boolean to indicate whether it is padded).
Actually, when reading "The responder must not include in its proof any node whose key is inferior or equal to the value in `onlyKeysAfter`.", I was thinking of the value key, not the node key.
So this means none of the parent keys will be included; that's a bit fun as the proof will jump in sequence, but it is implementable.
From the client side it means keeping a stack of nodes in memory and passing the cursor; finally I understand what you proposed.
We can just add the padding boolean to `onlyKeysAfter` and we're good.
Just for other readers: in the trie crate used by Substrate, a node key is called a node prefix.
I'm going ahead and merging it. But if we don't move this repo to https://github.com/polkadot-fellows any time soon, I will give you access to this repo and the rights to approve PRs.
@Noc2 This wasn't finished, we were still talking about modifications.
Please revert the merge.
Maybe I should have reverted my approval when we saw a slight issue, but it is @tomaka who saw it. I am still OK to continue the conversation here and have a small patch in a different PR, but reverting would be easier, yes (only a slight change seems needed).
Done.