Block relay improvements #1883
Conversation
Why draft again? Shouldn't this be reviewed already?
Local testing looks good; made this a draft to get a run on the dev net. Please ignore until then.
Enable block relay by default, with override to disable.
Force-pushed from 5bbf59a to 72cf71d
One concern about size estimation and some questions
/// Message to be handled by the protocol
ProtocolRequest(ProtocolRequest),
Protocol(ProtocolRequest),
Everything here seems to be handled by the protocol 🤔 Is there a better name for it?
This looks short/self-explanatory, but open to suggestions.
Well, that is my point. It is not really explanatory: those requests are protocol requests just like the other requests, which are not called protocol requests but clearly still are.
Changed
// Enforce the max limit on response size.
let mut bytes = body.as_ref().map_or(0, |extrinsics| {
    extrinsics.iter().map(|ext| ext.encoded_size()).sum()
});
bytes += partial_block
    .indexed_body
    .as_ref()
    .map_or(0, |entries| entries.iter().map(|entry| entry.len()).sum());
if !blocks.is_empty() && (total_size + bytes) > MAX_RESPONSE_SIZE.into() {
    break;
}
total_size += bytes;
block_id = match block_request.direction {
    Direction::Ascending => BlockId::Number(partial_block.block_number + One::one()),
    Direction::Descending => {
        if partial_block.block_number.is_zero() {
            break;
        }
        BlockId::Hash(partial_block.parent_hash)
    }
};
blocks.push(partial_block.block_data(body));
Looks like you're pushing partial blocks into the response, but measuring something different when checking the byte size. I see Substrate is probably buggy in the same way (will submit a PR there too). I think what you should do instead is push the new entry into blocks, check blocks.encoded_size(), and if it exceeds the limit just pop the last entry before breaking out of this loop.
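A minimal sketch of that suggestion, using a hypothetical helper and assuming blocks holds SCALE-encodable entries (parity-scale-codec's Encode provides encoded_size()):

use parity_scale_codec::Encode;

/// Hypothetical helper: push the entry, measure the whole encoded response,
/// and roll the entry back if it pushed the response over the limit.
/// The first entry is always kept so a single oversized block can still be served.
fn push_if_fits<T: Encode>(blocks: &mut Vec<T>, entry: T, max_response_size: usize) -> bool {
    blocks.push(entry);
    // encoded_size() on the Vec includes the compact length prefix,
    // so this measures what would actually go on the wire.
    if blocks.len() > 1 && blocks.encoded_size() > max_response_size {
        blocks.pop();
        return false;
    }
    true
}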
This tries to mirror Substrate as much as possible. They omit headers and justifications from the size, making it possible to pack more blocks into a response. Unless we have big headers (related to segment index, PoT), this should not make much difference IMO.
Do you know why they would omit them, though? Because to me, having a size limit and then consciously exceeding it so that the response doesn't fit into the protocol message size limit makes no sense whatsoever and is most likely a bug, not a feature. Hence I reported it as such in Substrate: paritytech/polkadot-sdk#1232
So rather than mirroring bugs (which will be hard to debug, BTW), we should fix them here and report upstream as well.
The extrinsics are the bulk of the transfer I really care about; I am fine either way with the other fields. Do you want to include the whole encoded size?
I certainly want to include the whole encoded size. Otherwise we're waiting for an issue to happen, it certainly will, and there will be no recovery from it. I don't see why we would want to intentionally bake such bugs into the protocol.
In fact, I would have liked to just call the default block handler in cases like these instead of duplicating the logic, but that would need more upstream changes.
I think calling .encoded_size() is not difficult, and upstream does have a bug, so it is actually nice that we handle it ourselves and can bypass it.
Will make the change. I am not so much worried about the size of the change itself, but about deviating from upstream behavior during sync and hitting some other issue (this upstream path has proved to be fragile in the past).
Changed, retested
Force-pushed from f367c03 to 602170f
Force-pushed from 602170f to 06146e1
let bytes = block_data.encoded_size();
if !blocks.is_empty() && (total_size + bytes) > MAX_RESPONSE_SIZE.into() {
This isn't 100% correct. The encoding depends on the number of elements, and total_size + bytes will not account for that. This is why I was recommending pushing the new element, using blocks.encoded_size() from the beginning to check the size, and popping the last element if the limit was exceeded after insertion.
See https://docs.substrate.io/reference/scale-codec/#fn-1 for details of how compact numbers are encoded in the SCALE codec.
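For illustration, a small example of why the per-element sum can undercount (assuming parity-scale-codec):

use parity_scale_codec::Encode;

fn main() {
    // 100 single-byte elements: the per-element sizes sum to 100,
    // but the encoded Vec also carries a compact length prefix.
    let blocks: Vec<u8> = vec![0u8; 100];
    let sum_of_parts: usize = blocks.iter().map(|b| b.encoded_size()).sum();
    assert_eq!(sum_of_parts, 100);
    // 100 is encoded as a two-byte compact integer, so the Vec takes 102 bytes.
    assert_eq!(blocks.encoded_size(), 102);
}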
I understand, but that is a bit ugly. This may be off by a few bytes, but that would be negligible compared to the 8MB size limit (which itself is arbitrary).
It is annoying because we're doing more calculation. But those few bytes might be enough for a node to get stuck and be unable to continue sync because it is one or two bytes above the limit.
If you want to optimize this (not checking encoded_size on the same blocks over and over again), you can initialize total_size with the max size of the compact length encoding (so the compact length encoding of 128 in our case; can be done with https://docs.rs/parity-scale-codec/latest/parity_scale_codec/trait.CompactLen.html#tymethod.compact_len), then you can just use addition after that the same way you do now.
This is blockchain; why be approximate when you can be exact?
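A sketch of that optimization, assuming a cap of 128 blocks per response (the figure mentioned above) and the 8MB limit discussed earlier:

use parity_scale_codec::{Compact, CompactLen, Encode};

fn main() {
    const MAX_BLOCKS: u32 = 128; // assumed cap on blocks per response
    const MAX_RESPONSE_SIZE: usize = 8 * 1024 * 1024; // assumed limit

    // Reserve the worst-case compact length prefix up front (2 bytes for 128),
    // so plain addition keeps total_size an upper bound on the encoded Vec size.
    let mut total_size = Compact::<u32>::compact_len(&MAX_BLOCKS);

    let block_data: Vec<u8> = vec![0u8; 1024]; // stand-in for one encoded block entry
    let bytes = block_data.encoded_size();
    if total_size + bytes <= MAX_RESPONSE_SIZE {
        total_size += bytes;
    }
    assert!(total_size <= MAX_RESPONSE_SIZE);
}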
But those few bytes might be enough for a node to get stuck and be unable to continue sync because it is one or two bytes above the limit.

That is not how it works. E.g., say the block has one extrinsic which is larger than the limit: it will still get sent (see the similar check upstream). The right solution for this would be chunking at the source.
I don't mind making the change, but only if there is a strong reason.
That is not how it works. E.g., say the block has one extrinsic which is larger than the limit: it will still get sent (see the similar check upstream). The right solution for this would be chunking at the source.

I do not understand what you mean here. I think we have already established that upstream is buggy and shouldn't be followed. We also know that at least a single block will fit into the response, so we don't need to worry about that. Thus we don't need any chunking; we just send as many blocks as fit into the limit, and the client will reach out again if it needs more blocks later.
Yeah, I was concerned about the repeated encoded_size() calls (plus the duplicate encode() at the end). Another option would be to keep the encoded Vec<> as we add entries and use something like EncodeAppend to append, but the rollback of the last entry becomes expensive/messy.
I could not find anything satisfactory, so I ended up implementing a simpler scheme. It allows appending encoded blocks while being able to check for overflow before the append (no rollback needed).
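A minimal sketch of such a scheme (hypothetical names; the actual implementation landed via #1911): entries are stored pre-encoded and the worst-case length prefix is reserved up front, so the limit can be checked before appending and no rollback is needed.

use parity_scale_codec::{Compact, CompactLen};

/// Hypothetical accumulator for already-encoded entries.
struct EncodedResponse {
    entries: Vec<Vec<u8>>, // each element is one SCALE-encoded block
    total_size: usize,     // upper bound on the final encoded response size
    max_size: usize,
}

impl EncodedResponse {
    fn new(max_entries: u32, max_size: usize) -> Self {
        Self {
            entries: Vec::new(),
            // Reserve the worst-case compact length prefix for the outer Vec.
            total_size: Compact::<u32>::compact_len(&max_entries),
            max_size,
        }
    }

    /// Append an encoded entry if it fits; the first entry is always accepted.
    fn try_push(&mut self, encoded: Vec<u8>) -> bool {
        if !self.entries.is_empty() && self.total_size + encoded.len() > self.max_size {
            return false;
        }
        self.total_size += encoded.len();
        self.entries.push(encoded);
        true
    }
}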
Was merged as part of #1911
Fixes #1901
Code contributor checklist: