
delete 'shallow' fh eth blks on rpc+ingestor start #4790

Merged: 8 commits from stepd/delete_shallowblocks_2 into master on Aug 7, 2023

Conversation

sduchesneau (Contributor)

When launching graph-node with an ingestor on an Ethereum chain that uses an RPC provider (i.e. not Firehose), this deletes the 'shallow' blocks in chainX.blocks so that they do not cause issues for the RPC provider.

It only runs on Ethereum chains where Firehose is not used and the ingestor is active.
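
For illustration, a minimal, self-contained Rust sketch of that gating logic; ChainConfig and should_cleanup_shallow_blocks are made-up names for this example, not graph-node's actual types:

// Hypothetical illustration of the startup check, not graph-node code.
struct ChainConfig {
    name: String,
    is_ethereum: bool,
    uses_firehose: bool,
    ingestor_enabled: bool,
}

fn should_cleanup_shallow_blocks(chain: &ChainConfig) -> bool {
    // Only RPC-backed Ethereum chains with an active ingestor qualify.
    chain.is_ethereum && !chain.uses_firehose && chain.ingestor_enabled
}

fn main() {
    let chain = ChainConfig {
        name: "mainnet".to_string(),
        is_ethereum: true,
        uses_firehose: false,
        ingestor_enabled: true,
    };
    if should_cleanup_shallow_blocks(&chain) {
        // This is where the shallow blocks in chainX.blocks would be
        // deleted before the ingestor starts.
        println!("would delete shallow blocks for chain {}", chain.name);
    }
}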

@sduchesneau sduchesneau requested a review from leoyvens July 31, 2023 16:09
@sduchesneau (Contributor, Author)

@leoyvens I am not quite sure how to delete the shallow blocks in a background thread without causing a race condition with the indexer...
So I have two questions:

  1. Do you believe there are risks associated with deleting blocks while the ingestor is ingesting more blocks?
  2. If there are none, could you give me some pointers on what to use for this? :)

@leoyvens (Collaborator) commented Aug 1, 2023

Good thinking that there could be a race condition with the ingestion. One possibility would be that blocks are deleted within the reorg threshold and therefore should be re-ingested, but due to a race condition the ingestor believes they are already ingested and doesn't re-ingest them. I checked and I think we're safe, ultimately because the missing-parent query in fn missing_parent always checks that all the necessary parents are present, so if some are suddenly deleted, they will be re-ingested.
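
To make that argument concrete, here is a toy model of the invariant (not the real fn missing_parent): the set of missing parents is recomputed from what is actually stored on every pass, so a concurrent delete just makes a block show up as missing again:

use std::collections::HashSet;

fn missing_parents(wanted: &[u64], stored: &HashSet<u64>) -> Vec<u64> {
    // Recomputed from the store on every call, never cached.
    wanted.iter().copied().filter(|n| !stored.contains(n)).collect()
}

fn main() {
    let mut stored: HashSet<u64> = (90..=100).collect();
    assert!(missing_parents(&[95, 96], &stored).is_empty());

    // A concurrent cleanup deletes block 95 ...
    stored.remove(&95);

    // ... and the next pass simply reports it as missing again,
    // so it gets re-ingested rather than silently skipped.
    assert_eq!(missing_parents(&[95, 96], &stored), vec![95]);
}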

But here is another idea: what if we only clean up blocks within the reorg threshold? That would make the query much cheaper, as ultimately fewer blocks need to be deleted. It could then use the existing index on the block number, so we wouldn't need the migration, and we also wouldn't need to run the cleanup in the background.

@@ -456,9 +456,15 @@ impl BlockStore {
continue;
};
}
match store.chain_head_block(&&store.chain).unwrap_or(None) {
Collaborator review comment on this line:

How about:

if let Some(head) = store.chain_head_block(&&store.chain)?
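
For comparison, a small self-contained sketch (toy chain_head_block, not the real BlockStore) of why the if let ... ? form reads better: the ? propagates the error, whereas unwrap_or(None) silently swallows it:

fn chain_head_block() -> Result<Option<u64>, String> {
    Ok(Some(17_900_000))
}

fn cleanup() -> Result<(), String> {
    // Instead of: match chain_head_block().unwrap_or(None) { Some(head) => ... }
    // propagate the error with `?` and only handle the Some case.
    if let Some(head) = chain_head_block()? {
        println!("chain head is {head}");
    }
    Ok(())
}

fn main() {
    cleanup().unwrap();
}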

@@ -456,9 +456,15 @@ impl BlockStore {
continue;
};
}
match store.chain_head_block(&&store.chain).unwrap_or(None) {
Some(head) => {
let lower_bound = head - ENV_VARS.reorg_threshold;
Collaborator review comment on this line:

Check for overflow. Also, I'd give some slack here so we don't have to worry about off-by-one errors, race conditions, or whatever; maybe use 2 * ENV_VARS.reorg_threshold.
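
A tiny self-contained sketch of that suggestion (illustrative u64 values; in graph-node the head comes from the chain store and the threshold from ENV_VARS.reorg_threshold):

fn main() {
    // A young chain whose head is still below the threshold, to show why
    // the plain subtraction would underflow.
    let head: u64 = 100;
    let reorg_threshold: u64 = 250;

    // `head - 2 * reorg_threshold` would panic in debug builds here;
    // saturating_sub clamps to 0, and the 2x factor adds slack against
    // off-by-one errors or a slightly stale head.
    let lower_bound = head.saturating_sub(2 * reorg_threshold);
    assert_eq!(lower_bound, 0);
    println!("cleanup shallow blocks with number >= {lower_bound}");
}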

@leoyvens (Collaborator) commented Aug 4, 2023

I did a review of the code to check whether it's safe to keep the final blocks ingested by Firehose in the cache when switching to RPC. The concern was that those blocks could have a data column in a format incompatible with what the RPC block stream accepts. The format really is incompatible, but the RPC block stream is robust to this because:

  • In chain_store.rs, the data json value is returned only in fn ancestor_block and fn blocks.
  • The RPC block stream uses fn ancestor_block only for non-final blocks.
  • When it uses fn blocks, it ignores parsing errors:
    .filter_map(|value| json::from_value(value).ok())

So while the situation with json formats for the data field is a bit of a mess, it seems deleting just the blocks within the reorg threshold will work.
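
To illustrate that filter_map behaviour, here is a self-contained toy (made-up LightBlock type and json shapes, not graph-node's schema) showing that rows which fail to deserialize are skipped instead of failing the whole call:

use serde::Deserialize;
use serde_json::json;

#[derive(Debug, Deserialize)]
struct LightBlock {
    hash: String,
    number: u64,
}

fn main() {
    // One row in the shape the RPC stream expects, one Firehose-style row
    // that does not match it (both shapes are invented for this example).
    let rows = vec![
        json!({ "hash": "0xabc", "number": 123 }),
        json!({ "block": { "data": null } }),
    ];

    let blocks: Vec<LightBlock> = rows
        .into_iter()
        .filter_map(|value| serde_json::from_value(value).ok())
        .collect();

    // Only the row that parses survives; the incompatible one is ignored.
    assert_eq!(blocks.len(), 1);
    println!("{:?}", blocks);
}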

If my review is wrong and something complains of a deserialization failure, we can revisit this and resort to the alternative of truncating the whole blocks table when a switch from Firehose to RPC is detected.

Resolved review threads: store/postgres/src/block_store.rs (two, now outdated) and store/postgres/src/chain_store.rs.
@leoyvens leoyvens merged commit 27cbcdd into master Aug 7, 2023
@leoyvens leoyvens deleted the stepd/delete_shallowblocks_2 branch August 7, 2023 17:56