-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IndexMergeJoin get different result when using different charset #54064
Comments
The plan is different, the one return correct result:
The one with wrong result:
Both plan should work, so I tag this as an executor bug rather than planner bug.
The later should be the root cause (the plan itself seems correct). |
I can narrow down the issue to this code snippet: tidb/pkg/executor/join/index_lookup_merge_join.go Lines 455 to 479 in e7d67b1
For utf8 plan (the one with incorrect result), the data read from out join b is
and from inner join a is:
After that sort operation above, the data for table b change from
to
Then the join get the wrong result. |
Sort by y,z,x should be
Why that sort reorder the rows??? |
Listing some permutation of some variants of the original query. All the following queries give the correct result: -- Group A: utf8mb4:
-- Group A1: Any order of join condition conjunction, with filter on column y:
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.x=b.x and a.y=b.y and a.z=b.z where a.y='CN000';
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.x=b.x and a.z=b.z and a.y=b.y where a.y='CN000';
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.y=b.y and a.x=b.x and a.z=b.z where a.y='CN000';
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.y=b.y and a.z=b.z and a.x=b.x where a.y='CN000';
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.z=b.z and a.x=b.x and a.y=b.y where a.y='CN000';
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.z=b.z and a.y=b.y and a.x=b.x where a.y='CN000';
-- Group A2: Any order of join condition conjunction except "y, z, x", no filter:
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.x=b.x and a.y=b.y and a.z=b.z;
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.x=b.x and a.z=b.z and a.y=b.y;
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.y=b.y and a.x=b.x and a.z=b.z;
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.z=b.z and a.x=b.x and a.y=b.y;
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.z=b.z and a.y=b.y and a.x=b.x;
-- Group B: utf8:
-- Group B1: Any order of join condition conjunction except "y, z, x",, with filter on column y:
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.x=b.x and a.y=b.y and a.z=b.z where a.y='CN000';
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.x=b.x and a.z=b.z and a.y=b.y where a.y='CN000';
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.y=b.y and a.x=b.x and a.z=b.z where a.y='CN000';
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.z=b.z and a.x=b.x and a.y=b.y where a.y='CN000';
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.z=b.z and a.y=b.y and a.x=b.x where a.y='CN000';
-- Group B2: Any order of join condition conjunction except "y, z, x", no filter:
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.x=b.x and a.y=b.y and a.z=b.z;
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.x=b.x and a.z=b.z and a.y=b.y;
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.y=b.y and a.x=b.x and a.z=b.z;
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.z=b.z and a.x=b.x and a.y=b.y;
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.z=b.z and a.y=b.y and a.x=b.x; All the following queries give the wrong result: -- Group D: utf8mb4:
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.y=b.y and a.z=b.z and a.x=b.x;
-- Group E: utf8
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.y=b.y and a.z=b.z and a.x=b.x where a.y='CN000';
select /*+ inl_merge_join(a, b) */ a.* from b join a on a.y=b.y and a.z=b.z and a.x=b.x; Analysis followed. Group A1 all have the plan containing join keys only on "x" and "z", because the existence of the filter on "y" is effectively pushed down to each join sides. Group A2 all have the plan containing join keys on all "x", "y" and "z" with the order other than "y, z, x". Group B1 all have the plan containing join keys on all "x", "y" and "z" with the order other than "y, z, x". I guess it is the specified utf8 charset that is preventing eliminating the join key of "y" even a filter on "y" is given. Group B2 is the same as Group A2. Both Group D and Group E have the plan pattern: join keys on all "x", "y" and "z", and in order of "y, z, x". So I'm guessing the join key order in the plan is somehow wrong for Group D and Group E, causing the build side (table b) being incorrectly sorted, e.g., should be sort by "y" then "z" then "x", but actually is "y" then "x" then "z":
Hence eventually the wrong query result. |
tidb version: 01a4573 After debugging, we find that column sort order is wrong in the runtime. Take sql When we reach to index_loopup_merge_join.go:L460,
And the runtime column sort order becomes |
/sig planner |
/remove-sig execution |
after some insight and understanding of the execution code. we found:
|
/found customer |
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
I expect whether using utf8 or utf8mb4, the query get the same result:
3. What did you see instead (Required)
4. What is your TiDB version? (Required)
Both v6.5.3 and master(8.2+) get the wrong result, so this is a new bug.
The text was updated successfully, but these errors were encountered: