Block relay improvements #1883

Closed · wants to merge 6 commits
Conversation

@rahulksnv (Contributor) commented on Aug 25, 2023:

  1. Add support for returning multiple complete blocks without going through the protocol. This is meant for scenarios where a node is syncing upon start.
  2. Enable block relay by default.
  3. Clean-ups.

Fixes #1901


@nazar-pc (Member) commented:
Why draft again, should this be reviewed already?

@rahulksnv (Contributor Author) commented:

> Why draft again, should this be reviewed already?

Local testing looks good; I made this a draft to get a run on the dev net. Please ignore it until then.

Enable block relay by default, with override to disable.
@rahulksnv marked this pull request as ready for review on August 28, 2023 at 22:42
@nazar-pc (Member) left a review:

One concern about size estimation and some questions

crates/sc-subspace-block-relay/src/consensus.rs (outdated; resolved)

```rust
/// Message to be handled by the protocol
ProtocolRequest(ProtocolRequest),
Protocol(ProtocolRequest),
```
@nazar-pc (Member):
Everything here seems to be handled by the protocol 🤔 Is there a better name for it?

@rahulksnv (Contributor Author):

This looks short and self-explanatory, but I'm open to suggestions.

@nazar-pc (Member):

Well, that is my point. It is not really explanatory: these requests are protocol requests just like the other requests, which are not called protocol requests but clearly still are.

@rahulksnv (Contributor Author):

Changed

Comment on lines 468 to 490
```rust
// Enforce the max limit on response size.
let mut bytes = body.as_ref().map_or(0, |extrinsics| {
    extrinsics.iter().map(|ext| ext.encoded_size()).sum()
});
bytes += partial_block
    .indexed_body
    .as_ref()
    .map_or(0, |entries| entries.iter().map(|entry| entry.len()).sum());
if !blocks.is_empty() && (total_size + bytes) > MAX_RESPONSE_SIZE.into() {
    break;
}
total_size += bytes;

block_id = match block_request.direction {
    Direction::Ascending => BlockId::Number(partial_block.block_number + One::one()),
    Direction::Descending => {
        if partial_block.block_number.is_zero() {
            break;
        }
        BlockId::Hash(partial_block.parent_hash)
    }
};
blocks.push(partial_block.block_data(body));
```
@nazar-pc (Member):
It looks like you're pushing partial blocks into the response, but measuring something different when checking the byte size. I see Substrate is probably buggy in the same way (I will submit a PR there too). I think what you should do instead is push the new entry into blocks, check blocks.encoded_size(), and if it exceeds the limit just pop the last entry before breaking out of this loop.
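A minimal sketch of the push-then-measure approach described above, assuming parity-scale-codec; the helper name, generic block type, and signature are illustrative and not taken from the PR:

```rust
use parity_scale_codec::Encode;

/// Hypothetical helper: append the candidate block, measure the whole encoded
/// response, and roll back the append if the size limit is now exceeded.
fn try_push_within_limit<B: Encode>(
    blocks: &mut Vec<B>,
    candidate: B,
    max_response_size: usize,
) -> bool {
    blocks.push(candidate);
    // Keep at least one block so a single oversized block can still be served.
    if blocks.len() > 1 && blocks.encoded_size() > max_response_size {
        blocks.pop();
        return false;
    }
    true
}
```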

@rahulksnv (Contributor Author):

This tries to mirror Substrate as much as possible. They omit headers and justifications from the size calculation (making it possible to pack more blocks into a response). Unless we have big headers (related to segment index, PoT), this should not make much difference IMO.

@nazar-pc (Member):

Do you know why they would omit them, though? To me, having a size limit and then consciously exceeding it so that the response doesn't fit into the protocol message size limit makes no sense whatsoever and is most likely a bug, not a feature. Hence I reported it as such in Substrate: paritytech/polkadot-sdk#1232

So rather than mirroring bugs (which, by the way, will be hard to debug), we should fix them here and report upstream as well.

@rahulksnv (Contributor Author):

The extrinsics are the bulk of the transfer I really care about; I am fine either way with the other fields. Do you want to include the whole encoded size?

@nazar-pc (Member):

I certainly want to include the whole encoded size. Otherwise we're just waiting for an issue to happen; it certainly will, and there will be no recovery from it. I don't see why we would want to intentionally bake such bugs into the protocol.

@rahulksnv (Contributor Author):

In fact, I would have liked to just call the default block handler in cases like these instead of duplicating the logic, but that would need more upstream changes.

@nazar-pc (Member):

I think calling .encoded_size() is not difficult and upstream does have a bug, so it is actually nice that we handle it ourselves and can bypass it.

@rahulksnv (Contributor Author):

Will make the change. I am not so much worried about the size of the change itself as about deviating from upstream behavior during sync and hitting some other issue (this upstream path has proved fragile in the past).

@rahulksnv (Contributor Author):

Changed, retested

Comment on lines 473 to 474
```rust
let bytes = block_data.encoded_size();
if !blocks.is_empty() && (total_size + bytes) > MAX_RESPONSE_SIZE.into() {
```
@nazar-pc (Member):

This isn't 100% correct. The encoding depends on the number of elements, and total_size + bytes will not account for that. This is why I was recommending pushing a new element, using blocks.encoded_size() from the beginning to check the size, and popping the last element if the limit is exceeded after insertion.

See https://docs.substrate.io/reference/scale-codec/#fn-1 for details of how compact numbers are encoded in the SCALE codec.
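A small self-contained illustration of the point about compact encoding, assuming the parity-scale-codec crate; the thresholds shown follow the SCALE compact integer format:

```rust
use parity_scale_codec::{Compact, Encode};

fn main() {
    // The length prefix of a SCALE-encoded Vec is a compact integer, so its
    // size grows with the number of elements in the Vec.
    assert_eq!(Compact(63u32).encode().len(), 1); // values 0..=63 take one byte
    assert_eq!(Compact(64u32).encode().len(), 2); // values 64..=16383 take two bytes
    // Summing per-block sizes alone misses this prefix, so `total_size + bytes`
    // can undercount the real encoded response size as more blocks are added.
}
```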

@rahulksnv (Contributor Author):

I understand, but that is a bit ugly. This may overshoot by a few bytes, but that would be negligible compared to the 8 MB size limit (which is itself arbitrary).

@nazar-pc (Member):

It is annoying because we're doing more computation. But those few bytes might be enough for a node to get stuck and be unable to continue sync because it is one or two bytes above the limit.

If you want to optimize this (not calling encoded_size() on the same blocks over and over again), you can initialize total_size with the maximum size of the compact length encoding (the compact length encoding of 128 in our case, which can be computed with https://docs.rs/parity-scale-codec/latest/parity_scale_codec/trait.CompactLen.html#tymethod.compact_len), and then just use addition after that, the same way you do now.

This is blockchain; why be imprecise when you can be exact?
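A minimal sketch of the suggested initialization, assuming parity-scale-codec and taking the per-response cap of 128 blocks from the comment above; the constant and function names are illustrative:

```rust
use parity_scale_codec::{Compact, CompactLen};

/// Assumed per-response block cap, taken from the discussion above.
const MAX_BLOCKS_PER_RESPONSE: u32 = 128;

/// Start total_size at the worst-case compact length prefix; after that,
/// plain addition of per-block encoded sizes stays accurate.
fn initial_total_size() -> usize {
    Compact::<u32>::compact_len(&MAX_BLOCKS_PER_RESPONSE)
}
```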

@rahulksnv (Contributor Author):

> But those few bytes might be enough for a node to get stuck and be unable to continue sync because it is one or two bytes above the limit.

That is not how it works. For example, say the block has one extrinsic which is larger than the limit: it will still get sent (see the similar check upstream). The right solution for this would be chunking at the source.

I don't mind making the change if there is a strong reason.

@nazar-pc (Member):

> That is not how it works. For example, say the block has one extrinsic which is larger than the limit: it will still get sent (see the similar check upstream). The right solution for this would be chunking at the source.

I do not understand what you mean here. I think we have already established that upstream is buggy and shouldn't be followed. We also know that at least a single block will fit into the response, so we don't need to worry about that. Thus we don't need any chunking; we just send as many blocks as fit into the limit, and the client will reach out again if it needs more blocks later.

@rahulksnv (Contributor Author):

Yeah, I was concerned about the repeated encoded_size() calls (plus the duplicate encode() at the end). Another option would be to keep the encoded Vec<> as we add entries and use something like EncodeAppend to append, but rolling back the last entry becomes expensive/messy.

I could not find anything satisfactory, so I ended up implementing a simpler scheme. It allows appending encoded blocks while checking for overflow before each append (no rollback needed); see the sketch below.
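A minimal sketch of how such a scheme might look, assuming parity-scale-codec; the type and field names are illustrative and not taken from the PR:

```rust
use parity_scale_codec::{Compact, CompactLen, Encode};

/// Hypothetical builder that keeps already-encoded blocks in a byte buffer and
/// checks the prospective response size (length prefix + payload) before each
/// append, so no rollback is needed. The final response would be the compact
/// block count followed by `encoded_blocks`.
struct ResponseBuilder {
    encoded_blocks: Vec<u8>,
    num_blocks: u32,
    max_size: usize,
}

impl ResponseBuilder {
    fn try_append<B: Encode>(&mut self, block: &B) -> bool {
        let prospective = Compact::<u32>::compact_len(&(self.num_blocks + 1))
            + self.encoded_blocks.len()
            + block.encoded_size();
        // Always admit the first block, even if it alone exceeds the limit.
        if self.num_blocks > 0 && prospective > self.max_size {
            return false;
        }
        block.encode_to(&mut self.encoded_blocks);
        self.num_blocks += 1;
        true
    }
}
```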

@nazar-pc mentioned this pull request on Aug 30, 2023
@nazar-pc (Member) commented:
Was merged as part of #1911

@nazar-pc closed this on Aug 30, 2023
@nazar-pc deleted the rsub/br-changes branch on August 30, 2023 at 21:59
Development

Successfully merging this pull request may close these issues: Support download of multiple blocks

2 participants