feat(events): compare-amt option for lotus-shed indexes inspect-events #12571

rvagg · 2024-10-10T05:28:05Z

Instead of relying just on entry counts, compare the regenerated AMT root using just what we have in the db with the message receipt event root. This should tell us precisely that we have what we should or not.

Ref: #12570

--

I mainly just wanted to see if this was practical for the discussion in #12570, and it seems to be. The only problem is address lookup. Some of my tipsets don't resolve f410 addresses. I get back 1,500 tipsets, to epoch 4339494 and get a actor not found error so I'm not sure what to do about that. I've turned that into a "problem" rather than an error and left a TODO to figure out if it can be resolved. I assumed that we could get this mapping at the tipset, but I obviously don't understand something about addresses.

With addresses sorted out, this could work for ChainIndexer too. It's more expensive than comparing lengths (0.5s compared to 0.3s to do 20 epochs from head in a quick test), which is a bit of a surprise because to compare lengths you need to load the original AMT from the blockstore, but the expense comes from doing the address lookups via the API, which may be slightly cheaper when run in-daemon rather than via the RPC like lotus-shed does.

cmd/lotus-shed/indexes.go

rvagg · 2024-10-10T05:38:48Z

4358 epochs in my splitstore that it can do this comparison on, and only 12 fail with these address lookup errors:

✗ Epoch 4339494 (bafy2bzacedi67cc4hnoxtyfwcxjfqpx6vtwylejf6copjljphwrv4lazxojoo): failed to resolve address (f410frpwhuxbvcpa7sk3iz6mnjnsix7je6zs46qc7pja), could not compare amt
✗ Epoch 4339491 (bafy2bzacecpv4bqtleojyj7hi7dusnkuykn6kibkbd56c4ob46l42x66lwopi): failed to resolve address (f410fxh3uvls6o65egzapjlwkr5iv56kk7pu7uf7m4ci), could not compare amt
✗ Epoch 4339483 (bafy2bzaceb7dv7xb34kktigsjewsuzinlyuib5ava6p6zstqnbst3x2fylb5y): failed to resolve address (f410f35pbojlm5onzmzw55oocnwgbfo5md7za7bx4iaa), could not compare amt
✗ Epoch 4339481 (bafy2bzacedveo3p7dazs3yklnkikfve4aou6gdlcycjnx4ysfsrookbngk6wo): failed to resolve address (f410f7px5a2sebnbvatcaadtceafxrnckxaw4mavydbq), could not compare amt
✗ Epoch 4339475 (bafy2bzacec3ffum3odwicjpubvg7j7afg26et6gpzhfixg4iscrzrgmwunxf6): failed to resolve address (f410fwnajqo2ozwf36bunolkn4tuzd6gcjamldsnumwq), could not compare amt
✗ Epoch 4339472 (bafy2bzacebikp6g4qje5rt3ydn54spuokrtcybe7doqfwfsi2p45bv7uwh3ly): failed to resolve address (f410fpzr5ifoqmsqlwgio2gyfzfqzldazh6mv7yxsjwq), could not compare amt
✗ Epoch 4338989 (bafy2bzaceb3v7vnki2bqze6b4qqkg4y2ijv3rlrkhclvzcmnpmrvhgejnnvx2): failed to resolve address (f410fgrholovr2ab54n6nnmerklhkcigmk3mczzgkrza), could not compare amt
✗ Epoch 4338987 (bafy2bzacebvx5txuxekq2inesre7bwqchcd3qdpwjrouisp6p7ceqan2awmqk): failed to resolve address (f410fqlqx24dg6cn5lfmf5twixhvilnwfsig4lf4rx3q), could not compare amt
✗ Epoch 4338985 (bafy2bzacebfh2hdtc5vk3oaywtmrcxyowraeqcdw5v5gqzgesboc6pgd4odr4): failed to resolve address (f410f3f6lkjrxmpszyn62dg6ihtvfgpe4lgxzqqfqz3i), could not compare amt
✗ Epoch 4338984 (bafy2bzaceafev3gi7bpywpti3yc6ahkj7vqaggjnugcex4t3hcgdpb4tjo276): failed to resolve address (f410f6roex63vvdy4hwvvmsmt4msl3ts42hlzompxlhq), could not compare amt
✗ Epoch 4338983 (bafy2bzacec7ahfjagwbrm5lzeskct3mxgmdcoszwwpdumpekz7msolict3qgw): failed to resolve address (f410f7rxno56bkace75glk33iyhi2urlzdyqfspsrb6q), could not compare amt, failed to resolve address (f410fdcu75eaa5wxp6qmvp7qmobwa2i6i76lqpj2tcty), could not compare amt
✗ Epoch 4338951 (bafy2bzacebnxhcqokrfa7n5trn444ivipurvqyahrxnasbzcpvw5ppogn3gam): failed to resolve address (f410fopdzv2y2xb6lai2oi3nt3xlpbxpki2a3g5fcima), could not compare amt, failed to resolve address (f410fbcbcxvugqeblszsaghnbwcpdkiwgfvsqzolyeia), could not compare amt

jennijuju · 2024-10-10T21:06:30Z

I think this is also close to the “validation tool” in my head & I think it provides a more reliable response compare to the count checks (this should be a part of the reconciliation process or recommended for index providers to run on a regular basis)

rvagg · 2024-10-11T04:49:14Z

Cleaned this up a bit and now it's faster than just doing the counts because I managed to reduce the number of API calls. So, I've just made this the normal behaviour, not bothering with counts. I left the counting code in there so it can provide some informative output in the case of a mismatch but there's not really a good reason to keep doing it that way if we can be accurate.

cmd/lotus-shed/indexes.go

aarshkshah1992

nice

cmd/lotus-shed/indexes.go

Instead of relying just on entry counts, compare the regenerated AMT root using just what we have in the db with the message receipt event root. This should tell us precisely that we have what we should or not. Ref: #12570

…t-events

rvagg requested a review from aarshkshah1992 October 10, 2024 05:28

rvagg commented Oct 10, 2024

View reviewed changes

cmd/lotus-shed/indexes.go Outdated Show resolved Hide resolved

rvagg mentioned this pull request Oct 10, 2024

ChainIndexer: make epoch validation more comprehensive #12570

Closed

rvagg force-pushed the rvagg/inspect-amt branch from 9c46baa to e6c4c71 Compare October 10, 2024 05:41

rvagg mentioned this pull request Oct 10, 2024

feat: migration("re-indexing"), backfilling and diasgnostics tooling for the ChainIndexer #12450

Merged

1 task

BigLep assigned rvagg Oct 10, 2024

rvagg force-pushed the rvagg/inspect-amt branch from e6c4c71 to b8f5b73 Compare October 11, 2024 04:15

rvagg requested a review from Stebalien October 11, 2024 04:51

rvagg force-pushed the rvagg/inspect-amt branch from 90d4808 to 2502aad Compare October 11, 2024 10:41

rvagg mentioned this pull request Oct 14, 2024

feat: a new ChainIndexer to index tipsets, messages and events #12421

Merged

5 tasks

aarshkshah1992 mentioned this pull request Oct 14, 2024

ChainValidateIndex should validate the actual content of the events and not just match on the number of events #12594

Closed