Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Join context hints #17541

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

adarshsanjeev
Copy link
Contributor

@adarshsanjeev adarshsanjeev commented Dec 6, 2024

Rebased and addressed some issues of #17406

Currently, using the query context, all joins in a query can be hinted to either broadcast or sort merge join. However, this does not allow this to be set at a join level.

Add support for hints to queries, which can dictate the join type.

Syntax

select /*+ sort_merge */ w1.cityName, w2.countryName
from
(
  select /*+ broadcast */ w3.cityName AS cityName, w4.countryName AS countryName from wikipedia w3 LEFT JOIN wikipedia-set2 w4 ON w3.regionName = w4.regionName
) w1
JOIN wikipedia-set1 w2 ON w1.cityName = w2.cityName
where w1.cityName='New York';

Note
Calcite does not expose if the hint is at a particular join level, and allows inheritance of hints from parent nodes. This causes join hints to to affect inner queries. Additional join hints can be added to inner queries as a workaround.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions bot added Area - Batch Ingestion Area - Querying Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Dec 6, 2024
@adarshsanjeev adarshsanjeev marked this pull request as ready for review December 11, 2024 09:11
private static HintStrategyTable createHintStrategies()
{
return HintStrategyTable.builder()
.hintStrategy("broadcast", HintPredicates.JOIN)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it really needed to burn in the "broadcast" literal here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I think this class should be moved in the DruidHints or something class you rename/restructure the JoinHint class to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to the enum name instead.

Comment on lines 66 to 68
if (closestHint == null || hint.inheritPath.size() < closestHint.inheritPath.size()) {
closestHint = hint;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this assumes that all hints are joinhints - which may not be true in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check here


import java.util.Arrays;

public enum JoinHint
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to not have this as an enum - and you could probably use as(JoinAlgorithm.class) or asJoinAlgorithm() to obtain the current value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an issue with that, Calcite only seems to accept certain keywords as hints, and this is sort_merge. Trying to replace the hint name with sortMerge seems to make it not read by calcite.

Comment on lines +53 to +55
@JsonCreator
public static JoinHint fromString(final String id)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be serializable by jackson; if it gets on the wire it should be a JoinAlgorithm already

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might have been to deserializer it for the planner context at some point

@@ -15585,7 +15587,8 @@ public void testOrderByAlongWithInternalScanQueryNoDistinct()
JoinType.INNER,
null,
ExprMacroTable.nil(),
CalciteTests.createJoinableFactoryWrapper()
CalciteTests.createJoinableFactoryWrapper(),
JoinAlgorithm.BROADCAST
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is meaning of this in this context as this engine doesn't support joinalgorithms ; so this change should be absent here or null ?

maybe use JsonInclude.Include.NON_NULL and make the class behave like null means BROADCAST ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Querying
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants