Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Halt prevention #141

Open
bryanchriswhite opened this issue Oct 14, 2022 · 4 comments · Fixed by #178
Open

Halt prevention #141

bryanchriswhite opened this issue Oct 14, 2022 · 4 comments · Fixed by #178
Labels
C-bug Category: Something isn't working C-enhancement Category: New feature or request P-high Priority: High priority task
Milestone

Comments

@bryanchriswhite
Copy link

bryanchriswhite commented Oct 14, 2022

Background

Currently, mapping handler functions can throw errors which will cause the subquery node process to terminate. Outside the context of mapping handlers, in the event of non-persistent errors (e.g. network interruptions), this behavior is desirable as it delegates the responsibility of resuming the process to the orchestration environment (kubernetes). However, once control flow reaches a mapping handler, I suspect that all errors are classifiable as persistent (i.e. would be deterministically reproducible given the same input and handler function).

Relevance to #91

One of the benefits of re-indexing from genesis in our preview environments is that we ensure that the indexer will not halt (on historical data), and if it does, we have the opportunity to iterate, all without affecting the availability of production data. Once we start using migrations to transform the data from one state to another, we will no longer be exercising the mapping handlers for data which was indexed prior to a given migration. In other words, it will become possible that the DB can be in a state which it could not have reached by re-indexing from scratch. This interaction with migrations increases the likelihood that a class of bug survives the preview environment to be encountered in production. This is not acceptable.

Acceptance criteria

All mapping handler functions MUST:

  1. catch any error thrown downstream of them
  2. persist the handler function name,(redundant) provided arguments, and the error message/stack
    • persisted data MUST be sufficient to retry later
  3. log a warning
  4. allow the indexer to continue

(We can design the retry mechanism in subsequent work)

@bryanchriswhite bryanchriswhite added C-bug Category: Something isn't working C-enhancement Category: New feature or request labels Oct 14, 2022
@bryanchriswhite bryanchriswhite added the P-high Priority: High priority task label Oct 31, 2022
@bryanchriswhite
Copy link
Author

bryanchriswhite commented Oct 31, 2022

Proposal

  • Use TypeScript decorators to generalize the application of this logic to handler functions.
    • Decorators are only applicable to Class constructors, methods, method parameters, getters/setters, and properties (i.e. not with function expressions or top-level function declarations 🙄)
  • try/catch in the root of the decorator body

@bryanchriswhite bryanchriswhite self-assigned this Nov 9, 2022
@bryanchriswhite bryanchriswhite linked a pull request Nov 16, 2022 that will close this issue
@bryanchriswhite bryanchriswhite changed the title Indexer shouldn't halt on handler failure Halt prevention Nov 16, 2022
@bryanchriswhite
Copy link
Author

bryanchriswhite commented Nov 17, 2022

I'm having a difficult time conceiving of a way to test a case where a tx with either unexpected or malformed data makes it through the network to be picked up by the indexer. It seems to me that to test this properly, we would need to introduce a mock somewhere which can implement this incorrect behavior reliably and more straightforwardly (in my opinion). One example of this would be to mock a fetchd endpoint which we would point the indexer at for classes of tests which require test-specific behavior from their integrated components (fetchd, in this case).

Unfortunately, that adds complexity and overhead to getting a useful test in place for this change. I would propose to tackle that in additional PRs, the failsafe error log should suffice in the meantime. I've opened #180 to track this.

@bryanchriswhite
Copy link
Author

Proposal for testing

We could construct an inert (i.e. doesn't interact with the DB) handler which only serves to throw an error in the main function body, while following the conventional error handling pattern:

export async function handleTestUnprocessed(event: CosmosEvent): Promise<void> {
  attemptHandling(() => {throw new Error("test error")}, unprocessedEventHandler);
}

@bryanchriswhite
Copy link
Author

bryanchriswhite commented Nov 21, 2022

There seems to be some higher-level error handling which is aware of the sequelize client state and is "out-scoping" our in-handler error handling, at least in the case where the SQL transaction failed:

ledger-subquery-subquery-node-1  | 2022-11-21T13:05:02.308Z <fetch> ERROR failed to index block at height 11  SequelizeDatabaseError: current transaction is aborted, commands ignored until end of transaction block

Additionally, as the error message indicates, we don't seem to be able to save an UnprocessedEntity once the sequelize client has entered this state.

@bryanchriswhite bryanchriswhite removed their assignment Dec 5, 2022
@ejfitzgerald ejfitzgerald modified the milestones: Availability, v1.0 Mar 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: Something isn't working C-enhancement Category: New feature or request P-high Priority: High priority task
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants