Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open iterators when cancelling queries #2881

Open
Aklakan opened this issue Dec 6, 2024 · 6 comments · May be fixed by #2882
Open

Open iterators when cancelling queries #2881

Aklakan opened this issue Dec 6, 2024 · 6 comments · May be fixed by #2882
Labels

Comments

@Aklakan
Copy link
Contributor

Aklakan commented Dec 6, 2024

Version

5.3.0-SNAPSHOT

What happened?

Sometimes when cancelling queries warnings are logged such as:

Open iterator: QueryIterPeek/123
Open iterator: QueryIterSingleton/456

So far I noticed this to happen when BGPs and UNIONs are involved, but this may not be exhaustive.

It turns out that not all the iterator construction in ARQ is performed truly lazy, which causes some parts of the code to leave iterators open if an exception happens during the iterator construction.

After investigation it seems that the warnings should (usually) be harmless, because the QueryIters are tracked in the ExecutionContext and thus eventually become closed without leaking resources.

However, seeing these warnings suggests that resources might be leaked - so this is an issue that IMO should be fixed.

See the PR for a test case and the proposed fixes.

Relevant output and stacktrace

No response

Are you interested in making a pull request?

Yes

@afs
Copy link
Member

afs commented Dec 7, 2024

I took the test case from the PR and have run it multiple times. I haven't got any test failures.

resources might be leaked

QueryIteratorCheck is 2015 (!!) and was then for development/internal consistency checking for non-cancelled queries.

One thing to consider is invert the responsibilities have specific management of external resources - for example, add to a specific manager, one per query execution, pass in the context. End of execution clears up. This would also be good for per-execution caches.

Let's enumerate the possibilities for such managed resources.

For in-memory all resources will eventually be garbage collected so there aren't any.

For TDB, the transaction mechanism clears up resources needing explicit work (ThreadLocal variables). Query execution resources are garbage collected.

@Aklakan
Copy link
Contributor Author

Aklakan commented Dec 7, 2024

The test case is not failing - but it produces warnings non-deterministically:
Screenshot from 2024-12-07 10-09-42

To make the test fail one could introduce a symbol ARQ.failOnOpenIterators which gets picked up by QueryIteratorCheck.

So its an issue that's quite hard to track down because (at least) OpUnion and StageGeneratorGeneric both start to construct QueryIters (which are tracked in the execution context), but the process may fail midway, leading to dangling QueryIters in the execution context which are disconnected from the 'input' iterator - and QueryIterCheck will warn in those cases - which looks as if something went wrong.

I think its good that QueryIterCheck warns in those cases - because dangling unclosed iterators shouldn't happen.

@Aklakan
Copy link
Contributor Author

Aklakan commented Dec 7, 2024

One thing to consider is invert the responsibilities have specific management of external resources - for example, add to a specific manager, one per query execution, pass in the context. End of execution clears up. This would also be good for per-execution caches.

Maybe I am misunderstanding, how would that differ from the current design?
Currently:

  1. QueryIter registers itself at the execution context - the resource manager.
  2. Something fails midway, resources are still tracked in the resource manager (although they could have been closed earlier).
  3. End of execution closes all tracked but unclosed resources - but warns because this shouldn't normally happen.

@afs
Copy link
Member

afs commented Dec 7, 2024

The current design does not have a way to track item that must have clear-up applied. The resource manager would a new thing - not based on the close iterator mechanism. It is only things that need clear-up that need to be handled carefully. The GC will deal with the rest.

If there are declared resources, we don't have to put in code for less-common situations on the performance path and we don't have to code in one place (general execution) so connected to code in another place (spacial resources).

@afs
Copy link
Member

afs commented Dec 7, 2024

I don't get warnings. How many times do they need to be run?

Tests that pass-don't-pass are difficult to handle in the overall build. It really is error prone to rely on spotting warnings in a maven build. I've been trying to fix the current ones which (I hope) are spurious cases where the test code is by design hitting a warning case.

@Aklakan
Copy link
Contributor Author

Aklakan commented Dec 7, 2024

I updated the code to your comments and added an OpenIteratorException to QueryIteratorCheck that should now cause the test TestQueryExecutionCancel.test_cancel_concurrent_2 to fail.

The screenshot is created by removing this PR's try-catch block from QueryIterPeek.peek().

image

How many times do they need to be run?

The system I am working on has 6+7 physical and 6+(7*2)=20 virtual cores. The longest streak without warnings was maybe 10 times. Usually I get it roughly every 3 to 5 runs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants