
delete 'shallow' fh eth blks on rpc+ingestor start #4790

Merged: 8 commits from stepd/delete_shallowblocks_2 into master on Aug 7, 2023

Conversation

sduchesneau (Contributor)

When launching graph-node with an ingestor on an Ethereum chain that uses an RPC provider (i.e. not Firehose), this deletes the 'shallow' blocks in chainX.blocks so that they do not cause issues for the RPC provider.

It only runs on Ethereum chains where Firehose is not used and the ingestor is active.
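
For illustration, a minimal, self-contained Rust sketch of that gating logic; ChainConfig and should_cleanup_shallow_blocks are made-up names for this example, not graph-node's actual types:

// Hypothetical illustration of the startup check, not graph-node code.
struct ChainConfig {
    name: String,
    is_ethereum: bool,
    uses_firehose: bool,
    ingestor_enabled: bool,
}

fn should_cleanup_shallow_blocks(chain: &ChainConfig) -> bool {
    // Only RPC-backed Ethereum chains with an active ingestor qualify.
    chain.is_ethereum && !chain.uses_firehose && chain.ingestor_enabled
}

fn main() {
    let chain = ChainConfig {
        name: "mainnet".to_string(),
        is_ethereum: true,
        uses_firehose: false,
        ingestor_enabled: true,
    };
    if should_cleanup_shallow_blocks(&chain) {
        // This is where the shallow blocks in chainX.blocks would be
        // deleted before the ingestor starts.
        println!("would delete shallow blocks for chain {}", chain.name);
    }
}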

@sduchesneau sduchesneau requested a review from leoyvens July 31, 2023 16:09
@sduchesneau (Contributor, Author)

@leoyvens I am not quite sure how to delete the shallow blocks in a background thread without causing a race condition with the indexer...
So I have two questions:

  1. Do you believe there are risks associated with deleting blocks while the ingestor is ingesting more blocks?
  2. If there are none, could you give me some pointers on what to use for this? :)

@leoyvens (Collaborator) commented Aug 1, 2023

Good thinking that there could be a race condition with the ingestion. One possibility would be that blocks are deleted within the reorg threshold and therefore should be re-ingested, but due to a race condition the ingestor believes they are already ingested and doesn't re-ingest them. I checked and I think we're safe, ultimately because the missing-parent query in fn missing_parent always checks that all the necessary parents are present, so if some are suddenly deleted, they will be re-ingested.
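
To make that argument concrete, here is a toy model of the invariant (not the real fn missing_parent): the set of missing parents is recomputed from what is actually stored on every pass, so a concurrent delete just makes a block show up as missing again:

use std::collections::HashSet;

fn missing_parents(wanted: &[u64], stored: &HashSet<u64>) -> Vec<u64> {
    // Recomputed from the store on every call, never cached.
    wanted.iter().copied().filter(|n| !stored.contains(n)).collect()
}

fn main() {
    let mut stored: HashSet<u64> = (90..=100).collect();
    assert!(missing_parents(&[95, 96], &stored).is_empty());

    // A concurrent cleanup deletes block 95 ...
    stored.remove(&95);

    // ... and the next pass simply reports it as missing again,
    // so it gets re-ingested rather than silently skipped.
    assert_eq!(missing_parents(&[95, 96], &stored), vec![95]);
}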

But here is another idea: what if we only clean up blocks within the reorg threshold? That would make the query much cheaper, as ultimately fewer blocks need to be deleted. It could then use the existing index on the block number, so we wouldn't need the migration, and we also wouldn't need to run the cleanup in the background.

@@ -456,9 +456,15 @@ impl BlockStore {
continue;
};
}
match store.chain_head_block(&&store.chain).unwrap_or(None) {
Collaborator review comment on this line:

How about:

if let Some(head) = store.chain_head_block(&&store.chain)?
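
For comparison, a small self-contained sketch (toy chain_head_block, not the real BlockStore) of why the if let ... ? form reads better: the ? propagates the error, whereas unwrap_or(None) silently swallows it:

fn chain_head_block() -> Result<Option<u64>, String> {
    Ok(Some(17_900_000))
}

fn cleanup() -> Result<(), String> {
    // Instead of: match chain_head_block().unwrap_or(None) { Some(head) => ... }
    // propagate the error with `?` and only handle the Some case.
    if let Some(head) = chain_head_block()? {
        println!("chain head is {head}");
    }
    Ok(())
}

fn main() {
    cleanup().unwrap();
}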

@@ -456,9 +456,15 @@ impl BlockStore {
continue;
};
}
match store.chain_head_block(&&store.chain).unwrap_or(None) {
Some(head) => {
let lower_bound = head - ENV_VARS.reorg_threshold;
Collaborator review comment on this line:

Check for overflow. Also, I'd give some slack here so we don't have to worry about off-by-one errors, race conditions, or whatever; maybe use 2 * ENV_VARS.reorg_threshold.
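
A tiny self-contained sketch of that suggestion (illustrative u64 values; in graph-node the head comes from the chain store and the threshold from ENV_VARS.reorg_threshold):

fn main() {
    // A young chain whose head is still below the threshold, to show why
    // the plain subtraction would underflow.
    let head: u64 = 100;
    let reorg_threshold: u64 = 250;

    // `head - 2 * reorg_threshold` would panic in debug builds here;
    // saturating_sub clamps to 0, and the 2x factor adds slack against
    // off-by-one errors or a slightly stale head.
    let lower_bound = head.saturating_sub(2 * reorg_threshold);
    assert_eq!(lower_bound, 0);
    println!("cleanup shallow blocks with number >= {lower_bound}");
}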

@leoyvens (Collaborator) commented Aug 4, 2023

I did a review of the code to check whether it's safe to keep the final blocks ingested by Firehose in the cache when switching to RPC. The concern was that those blocks could have a data column in a format incompatible with what the RPC block stream accepts. The format really is incompatible, but the RPC block stream is robust to this because:

  • In chain_store.rs, the data json value is returned only in fn ancestor_block and fn blocks.
  • The RPC block stream uses fn ancestor_block only for non-final blocks.
  • When it uses fn blocks, it ignores parsing errors:
    .filter_map(|value| json::from_value(value).ok())

So while the situation with json formats for the data field is a bit of a mess, it seems deleting just the blocks within the reorg threshold will work.
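
To illustrate that filter_map behaviour, here is a self-contained toy (made-up LightBlock type and json shapes, not graph-node's schema) showing that rows which fail to deserialize are skipped instead of failing the whole call:

use serde::Deserialize;
use serde_json::json;

#[derive(Debug, Deserialize)]
struct LightBlock {
    hash: String,
    number: u64,
}

fn main() {
    // One row in the shape the RPC stream expects, one Firehose-style row
    // that does not match it (both shapes are invented for this example).
    let rows = vec![
        json!({ "hash": "0xabc", "number": 123 }),
        json!({ "block": { "data": null } }),
    ];

    let blocks: Vec<LightBlock> = rows
        .into_iter()
        .filter_map(|value| serde_json::from_value(value).ok())
        .collect();

    // Only the row that parses survives; the incompatible one is ignored.
    assert_eq!(blocks.len(), 1);
    println!("{:?}", blocks);
}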

If my review is wrong and something complains of a deserialization failure, we can revisit this and resort to the alternative of truncating the whole blocks table when a switch from Firehose to RPC is detected.

Resolved review threads: store/postgres/src/block_store.rs (two, now outdated) and store/postgres/src/chain_store.rs.
@leoyvens leoyvens merged commit 27cbcdd into master Aug 7, 2023
@leoyvens leoyvens deleted the stepd/delete_shallowblocks_2 branch August 7, 2023 17:56