Adding query stack fault to MSQ to capture native query errors. #13926
There is a new error code QueryStackError that is only used in one case: when the query stack throws UnexpectedMultiValueDimensionException. Did you mean to associate this error code with more kinds of problems?
If we keep this design, also, to me QueryRuntimeError makes more sense than QueryStackError. I think end users are more likely to think of the thing that runs queries as a runtime vs a stack. ("Stack of what?")
I had earlier made changes in the BaseLeafFrameProcessor#runIncrementally method, but since that would also throw FrameSpecific exceptions, I dropped that idea.
I was thinking we would eventually add more errors. Another approach was to get the exception and check if the exception class name starts with
Sure, I can rename.
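The class-name check mentioned in the reply above could look roughly like the following sketch. It is not the code from this PR. The prefix the comment refers to is cut off, so the sketch assumes `org.apache.druid.query`, the package named in the PR description. The class `NativeQueryErrorClassifier`, its methods, and the choice to fall back to `UnknownError` are hypothetical names used only for illustration.

```java
import java.util.Optional;

// Hypothetical helper, not part of Druid: classifies a failure by the package
// of any exception found in its cause chain.
final class NativeQueryErrorClassifier {
    // Assumption: taken from the PR description; the review comment elides the prefix.
    static final String NATIVE_QUERY_PACKAGE = "org.apache.druid.query";

    // Returns the first throwable in the cause chain whose class lives in
    // the given package or one of its subpackages.
    static Optional<Throwable> findByPackagePrefix(Throwable t, String packagePrefix) {
        int depth = 0;
        // The depth bound guards against cyclic cause chains.
        for (Throwable cur = t; cur != null && depth < 64; cur = cur.getCause(), depth++) {
            if (cur.getClass().getName().startsWith(packagePrefix + ".")) {
                return Optional.of(cur);
            }
        }
        return Optional.empty();
    }

    // Maps a failure to an error code string. The fallback code is an
    // illustrative choice, not taken from the PR.
    static String toErrorCode(Throwable t) {
        return findByPackagePrefix(t, NATIVE_QUERY_PACKAGE).isPresent()
            ? "QueryRuntimeError"
            : "UnknownError";
    }
}
```

Matching on the package prefix plus a trailing dot avoids accidentally matching a sibling package whose name merely begins with the same characters. The trade-off raised in the review still applies: a prefix check is broad, so any exception from that package is reported under the same code.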
...ore/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/error/QueryStackFault.java
Thanks for the changes 🚀. Agreeing with @gianm's comments on 2 things:
- Renaming the fault class name
- This seems like a one-off case that we handle currently. Can we identify the cases where we defer to the native query engine for our processing, and wrap the errors there in this new Fault? For example, in the GroupByPreShuffleProcessor, we create the frame writer and iterate over it. I remember adding a similar fault a while back, and I ran into the same issue, so I wrapped a large portion of the code (which might encapsulate the frame errors as well). Is there a way to change the design somehow (maybe in a separate PR), so that we can easily distinguish the frame errors from the rest of the errors?
The second point above might not be possible in this PR, so I am okay with the changes, and you can add a comment in the base class or the MSQErrorReport's code to mention the possibility of a better design.
...ore/multi-stage-query/src/main/java/org/apache/druid/msq/indexing/error/QueryStackFault.java
docs/multi-stage-query/reference.md
@@ -679,6 +679,7 @@ The following table describes error codes you may encounter in the `multiStageQu | |||
| <a name="error_InsertTimeOutOfBounds">`InsertTimeOutOfBounds`</a> | A REPLACE query generated a timestamp outside the bounds of the TIMESTAMP parameter for your OVERWRITE WHERE clause.<br /> <br />To avoid this error, verify that the you specified is valid. | `interval`: time chunk interval corresponding to the out-of-bounds timestamp | | |||
| <a name="error_InvalidNullByte">`InvalidNullByte`</a> | A string column included a null byte. Null bytes in strings are not permitted. | `column`: The column that included the null byte | | |||
| <a name="error_QueryNotSupported">`QueryNotSupported`</a> | QueryKit could not translate the provided native query to a multi-stage query.<br /> <br />This can happen if the query uses features that aren't supported, like GROUPING SETS. | | | |||
| <a name="error_QueryStackError">`QueryStackError`</a> | MSQ uses the native query engine to run the leaf stages. This error tells MSQ that error is in native query engine.<br /> <br /> Since this is a generic error, the user needs to look at the error message and stack trace to figure out the course of action. If the user is stuck, consider raising a github issue for assistance. | `baseErrorMessage` error message from the native stack. | |
We should either rename the description here to mention QueryStack (which seems incomplete), or rename the fault to QueryStackErrorFault, because all of the faults in the docs are derived from their class names by removing the -Fault suffix.
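The naming convention described in this comment can be illustrated with a small sketch. It does not claim to show how Druid actually assigns error codes; `FaultNaming` and `errorCodeFor` are hypothetical names that only mirror the "class name minus the Fault suffix" rule the reviewer describes.

```java
// Hypothetical illustration of the documented convention:
// error code = fault class simple name with the trailing "Fault" removed.
final class FaultNaming {
    private static final String SUFFIX = "Fault";

    static String errorCodeFor(String simpleClassName) {
        // Leave the name unchanged if it does not end in "Fault",
        // or if it consists of the suffix alone.
        if (simpleClassName.endsWith(SUFFIX) && simpleClassName.length() > SUFFIX.length()) {
            return simpleClassName.substring(0, simpleClassName.length() - SUFFIX.length());
        }
        return simpleClassName;
    }
}
```

Under this rule a class named QueryStackFault maps to the code QueryStack rather than QueryStackError, which is the mismatch with the docs entry that the comment points out.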
Thanks for adding the tests, and the changes LGTM. We should look for a way to reduce the if..else conditions in the report generation in subsequent PRs.
LGTM
Description
- Fixed bug in MSQ fault tolerance where workers were being retried if UnexpectedMultiValueDimensionException was thrown.
- Exceptions with org.apache.druid.query as the package name are thrown as a QueryRuntimeError.
Release note
Add a new fault "QueryRuntimeError" to MSQ engine to capture native query errors.
Fixed bug in MSQ fault tolerance where workers were being retried if UnexpectedMultiValueDimensionException was thrown.
Key changed/added classes in this PR
MSQErrorReport
QueryRuntimeError