Halt prevention #141

bryanchriswhite · 2022-10-14T10:46:48Z

Background

Currently, mapping handler functions can throw errors which will cause the subquery node process to terminate. Outside the context of mapping handlers, in the event of non-persistent errors (e.g. network interruptions), this behavior is desirable as it delegates the responsibility of resuming the process to the orchestration environment (kubernetes). However, once control flow reaches a mapping handler, I suspect that all errors are classifiable as persistent (i.e. would be deterministically reproducible given the same input and handler function).

Relevance to #91

One of the benefits of re-indexing from genesis in our preview environments is that we ensure that the indexer will not halt (on historical data), and if it does, we have the opportunity to iterate, all without affecting the availability of production data. Once we start using migrations to transform the data from one state to another, we will no longer be exercising the mapping handlers for data which was indexed prior to a given migration. In other words, it will become possible that the DB can be in a state which it could not have reached by re-indexing from scratch. This interaction with migrations increases the likelihood that a class of bug survives the preview environment to be encountered in production. This is not acceptable.

Acceptance criteria

All mapping handler functions MUST:

catch any error thrown downstream of them
persist the ~~handler function name,~~(redundant) provided arguments, and the error message/stack
- persisted data MUST be sufficient to retry later
log a warning
allow the indexer to continue

(We can design the retry mechanism in subsequent work)

bryanchriswhite · 2022-10-31T12:14:40Z

Proposal

~~Use TypeScript decorators to generalize the application of this logic to handler functions.~~
- Decorators are only applicable to Class constructors, methods, method parameters, getters/setters, and properties (i.e. not with function expressions or top-level function declarations 🙄)
try/catch in the root of the decorator body

bryanchriswhite · 2022-11-17T11:18:06Z

I'm having a difficult time conceiving of a way to test a case where a tx with either unexpected or malformed data makes it through the network to be picked up by the indexer. It seems to me that to test this properly, we would need to introduce a mock somewhere which can implement this incorrect behavior reliably and more straightforwardly (in my opinion). One example of this would be to mock a fetchd endpoint which we would point the indexer at for classes of tests which require test-specific behavior from their integrated components (fetchd, in this case).

Unfortunately, that adds complexity and overhead to getting a useful test in place for this change. I would propose to tackle that in additional PRs, the failsafe error log should suffice in the meantime. I've opened #180 to track this.

bryanchriswhite · 2022-11-21T12:50:39Z

Proposal for testing

We could construct an inert (i.e. doesn't interact with the DB) handler which only serves to throw an error in the main function body, while following the conventional error handling pattern:

export async function handleTestUnprocessed(event: CosmosEvent): Promise<void> {
  attemptHandling(() => {throw new Error("test error")}, unprocessedEventHandler);
}

bryanchriswhite · 2022-11-21T13:10:18Z

There seems to be some higher-level error handling which is aware of the sequelize client state and is "out-scoping" our in-handler error handling, at least in the case where the SQL transaction failed:

ledger-subquery-subquery-node-1  | 2022-11-21T13:05:02.308Z <fetch> ERROR failed to index block at height 11  SequelizeDatabaseError: current transaction is aborted, commands ignored until end of transaction block

Additionally, as the error message indicates, we don't seem to be able to save an UnprocessedEntity once the sequelize client has entered this state.

bryanchriswhite added C-bug Category: Something isn't working C-enhancement Category: New feature or request labels Oct 14, 2022

bryanchriswhite added the P-high Priority: High priority task label Oct 31, 2022

bryanchriswhite self-assigned this Nov 9, 2022

bryanchriswhite linked a pull request Nov 16, 2022 that will close this issue

chore: no halting 🚀 #178

Merged

bryanchriswhite changed the title ~~Indexer shouldn't halt on handler failure~~ Halt prevention Nov 16, 2022

bryanchriswhite closed this as completed in #178 Nov 17, 2022

bryanchriswhite mentioned this issue Nov 17, 2022

Availability #182

Closed

bryanchriswhite added this to the Availability milestone Nov 17, 2022

bryanchriswhite reopened this Nov 21, 2022

bryanchriswhite removed their assignment Dec 5, 2022

ejfitzgerald modified the milestones: Availability, v1.0 Mar 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Halt prevention #141

Halt prevention #141

bryanchriswhite commented Oct 14, 2022 •

edited

Loading

bryanchriswhite commented Oct 31, 2022 •

edited

Loading

bryanchriswhite commented Nov 17, 2022 •

edited

Loading

bryanchriswhite commented Nov 21, 2022

bryanchriswhite commented Nov 21, 2022 •

edited

Loading

Halt prevention #141

Halt prevention #141

Comments

bryanchriswhite commented Oct 14, 2022 • edited Loading

Background

Relevance to #91

Acceptance criteria

bryanchriswhite commented Oct 31, 2022 • edited Loading

Proposal

bryanchriswhite commented Nov 17, 2022 • edited Loading

bryanchriswhite commented Nov 21, 2022

Proposal for testing

bryanchriswhite commented Nov 21, 2022 • edited Loading

bryanchriswhite commented Oct 14, 2022 •

edited

Loading

bryanchriswhite commented Oct 31, 2022 •

edited

Loading

bryanchriswhite commented Nov 17, 2022 •

edited

Loading

bryanchriswhite commented Nov 21, 2022 •

edited

Loading