Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IndexMergeJoin get different result when using different charset #54064

Closed
tiancaiamao opened this issue Jun 17, 2024 · 9 comments · Fixed by #54681
Closed

IndexMergeJoin get different result when using different charset #54064

tiancaiamao opened this issue Jun 17, 2024 · 9 comments · Fixed by #54681
Assignees
Labels
affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.2 impact/wrong-result report/customer Customers have encountered this bug. severity/major sig/planner SIG: Planner type/bug The issue is confirmed as a bug.

Comments

@tiancaiamao
Copy link
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

create table A
(id int primary key nonclustered auto_increment,
x varchar(32) not null,
y char(5) not null,
z varchar(25) not null,
key idx_sub_tsk(z,x,y)
)

create table B
( y char(5) not null,
z varchar(25) not null,
x varchar(32) not null,
primary key(z, x, y) nonclustered
)

insert into A (y, z, x) values
('CN000', '123', 'RW '),
('CN000', '456', '123');

insert into B values
('CN000', '123', 'RW '),
('CN000', '456', '123');

select /*+ inl_merge_join(a, b) */
a.*
from a join b on a.y=b.y and a.z=b.z and a.x = b.x
where a.y='CN000';

2. What did you expect to see? (Required)

mysql> set names utf8mb4;
Query OK, 0 rows affected (0.00 sec)

mysql> select /*+ inl_merge_join(a, b) */  a.* from b  join a  on a.y=b.y and a.z=b.z and a.x = b.x where a.y = 'cn000';
+----+-----+-------+-----+
| id | x   | y     | z   |
+----+-----+-------+-----+
|  1 | rw  | cn000 | 123 |
|  3 | 123 | cn000 | 456 |
+----+-----+-------+-----+
2 rows in set (0.00 sec)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select /*+ inl_merge_join(a, b) */  a.* from b  join a  on a.y=b.y and a.z=b.z and a.x = b.x where a.y = 'cn000';
+----+----+-------+-----+
| id | x  | y     | z   |
+----+----+-------+-----+
|  1 | rw | cn000 | 123 |
+----+----+-------+-----+
1 row in set (0.00 sec)

I expect whether using utf8 or utf8mb4, the query get the same result:

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select /*+ inl_merge_join(a, b) */  a.* from b  join a  on a.y=b.y and a.z=b.z and a.x = b.x where a.y = 'cn000';
+----+----+-------+-----+
| id | x  | y     | z   |
+----+----+-------+-----+
|  1 | rw | cn000 | 123 |
|  3 | 123 | cn000 | 456 |
+----+----+-------+-----+
1 row in set (0.00 sec)

3. What did you see instead (Required)

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select /*+ inl_merge_join(a, b) */  a.* from b  join a  on a.y=b.y and a.z=b.z and a.x = b.x where a.y = 'cn000';
+----+----+-------+-----+
| id | x  | y     | z   |
+----+----+-------+-----+
|  1 | rw | cn000 | 123 |
+----+----+-------+-----+
1 row in set (0.00 sec)

4. What is your TiDB version? (Required)

Both v6.5.3 and master(8.2+) get the wrong result, so this is a new bug.

@tiancaiamao tiancaiamao added type/bug The issue is confirmed as a bug. sig/execution SIG execution labels Jun 17, 2024
@tiancaiamao
Copy link
Contributor Author

tiancaiamao commented Jun 17, 2024

The plan is different, the one return correct result:

mysql> set names utf8mb4;
Query OK, 0 rows affected (0.00 sec)

mysql> explain select /*+ inl_merge_join(a, b) */  a.* from b  join a  on a.y=b.y and a.z=b.z and a.x = b.x where a.y = 'cn000';
+------------------------------------+---------+-----------+-------------------------------------+----------------------------------------------------------------------------------------------------------------------+
| id                                 | estRows | task      | access object                       | operator info                                                                                                        |
+------------------------------------+---------+-----------+-------------------------------------+----------------------------------------------------------------------------------------------------------------------+
| IndexMergeJoin_21                  | 0.00    | root      |                                     | inner join, inner:Projection_19, outer key:test.b.z, test.b.x, inner key:test.a.z, test.a.x                          |
| ├─IndexReader_36(Build)            | 0.00    | root      |                                     | index:Selection_35                                                                                                   |
| │ └─Selection_35                   | 0.00    | cop[tikv] |                                     | eq("cn000", test.b.y)                                                                                                |
| │   └─IndexFullScan_34             | 2.00    | cop[tikv] | table:B, index:PRIMARY(z, x, y)     | keep order:false, stats:pseudo                                                                                       |
| └─Projection_19(Probe)             | 0.00    | root      |                                     | test.a.id, test.a.x, test.a.y, test.a.z                                                                              |
|   └─IndexLookUp_18                 | 0.00    | root      |                                     |                                                                                                                      |
|     ├─IndexRangeScan_16(Build)     | 0.00    | cop[tikv] | table:A, index:idx_sub_tsk(z, x, y) | range: decided by [eq(test.a.z, test.b.z) eq(test.a.x, test.b.x) eq(test.a.y, cn000)], keep order:true, stats:pseudo |
|     └─TableRowIDScan_17(Probe)     | 0.00    | cop[tikv] | table:A                             | keep order:false, stats:pseudo                                                                                       |
+------------------------------------+---------+-----------+-------------------------------------+----------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.00 sec)

The one with wrong result:

mysql> explain select /*+ inl_merge_join(a, b) */  a.* from b  join a  on a.y=b.y and a.z=b.z and a.x = b.x where a.y = 'cn000';
+------------------------------------+---------+-----------+-------------------------------------+-------------------------------------------------------------------------------------------------------------------------+
| id                                 | estRows | task      | access object                       | operator info                                                                                                           |
+------------------------------------+---------+-----------+-------------------------------------+-------------------------------------------------------------------------------------------------------------------------+
| IndexMergeJoin_23                  | 0.00    | root      |                                     | inner join, inner:Projection_21, outer key:test.b.y, test.b.z, test.b.x, inner key:test.a.y, test.a.z, test.a.x         |
| ├─IndexReader_40(Build)            | 0.00    | root      |                                     | index:Selection_39                                                                                                      |
| │ └─Selection_39                   | 0.00    | cop[tikv] |                                     | eq(test.b.y, "cn000")                                                                                                   |
| │   └─IndexFullScan_38             | 2.00    | cop[tikv] | table:B, index:PRIMARY(z, x, y)     | keep order:false, stats:pseudo                                                                                          |
| └─Projection_21(Probe)             | 0.00    | root      |                                     | test.a.id, test.a.x, test.a.y, test.a.z                                                                                 |
|   └─IndexLookUp_20                 | 0.00    | root      |                                     |                                                                                                                         |
|     ├─Selection_19(Build)          | 0.00    | cop[tikv] |                                     | eq(test.a.y, "cn000")                                                                                                   |
|     │ └─IndexRangeScan_17          | 0.00    | cop[tikv] | table:A, index:idx_sub_tsk(z, x, y) | range: decided by [eq(test.a.z, test.b.z) eq(test.a.x, test.b.x) eq(test.a.y, test.b.y)], keep order:true, stats:pseudo |
|     └─TableRowIDScan_18(Probe)     | 0.00    | cop[tikv] | table:A                             | keep order:false, stats:pseudo                                                                                          |
+------------------------------------+---------+-----------+-------------------------------------+-------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.00 sec)

Both plan should work, so I tag this as an executor bug rather than planner bug.
Two differences between the plans:

  • utf8 there is a push down cop Selection_19(Build) | 0.00 | cop[tikv] | | eq(test.a.y, "cn000")
  • the merge join condition for utf8mb4 is z,x while for utf8 is y,z,x

The later should be the root cause (the plan itself seems correct).

@tiancaiamao
Copy link
Contributor Author

tiancaiamao commented Jun 17, 2024

I can narrow down the issue to this code snippet:

exprCtx := imw.ctx.GetExprCtx()
slices.SortFunc(task.outerOrderIdx, func(idxI, idxJ chunk.RowPtr) int {
rowI, rowJ := task.outerResult.GetRow(idxI), task.outerResult.GetRow(idxJ)
var c int64
var err error
for _, keyOff := range imw.KeyOff2KeyOffOrderByIdx {
joinKey := imw.outerMergeCtx.JoinKeys[keyOff]
c, _, err = imw.outerMergeCtx.CompareFuncs[keyOff](exprCtx.GetEvalCtx(), joinKey, joinKey, rowI, rowJ)
terror.Log(err)
if c != 0 {
break
}
}
if c != 0 || imw.nextColCompareFilters == nil {
if imw.Desc {
return int(-c)
}
return int(c)
}
c = int64(imw.nextColCompareFilters.CompareRow(rowI, rowJ))
if imw.Desc {
return int(-c)
}
return int(c)
})

For utf8 plan (the one with incorrect result), the data read from out join b is

mysql> select * from b;
+-------+-----+-----+
| y     | z   | x   |
+-------+-----+-----+
| cn000 | 123 | rw  |
| cn000 | 456 | 123 |
+-------+-----+-----+
2 rows in set (0.00 sec)

and from inner join a is:

mysql> select * from a;
+----+-----+-------+-----+
| id | x   | y     | z   |
+----+-----+-------+-----+
|  1 | rw  | cn000 | 123 |
|  3 | 123 | cn000 | 456 |
+----+-----+-------+-----+
2 rows in set (0.00 sec)

After that sort operation above, the data for table b change from

| cn000 | 123 | rw  |
| cn000 | 456 | 123 |

to

| cn000 | 456 | 123 |
| cn000 | 123 | rw  |

Then the join get the wrong result.

@tiancaiamao
Copy link
Contributor Author

Sort by y,z,x should be

mysql>  select * from b order by y,z,x;
+-------+-----+-----+
| y     | z   | x   |
+-------+-----+-----+
| cn000 | 123 | rw  |
| cn000 | 456 | 123 |
+-------+-----+-----+
2 rows in set (0.00 sec)

Why that sort reorder the rows???

@zanmato1984
Copy link
Contributor

Listing some permutation of some variants of the original query.

All the following queries give the correct result:

-- Group A: utf8mb4:

-- Group A1: Any order of join condition conjunction, with filter on column y:
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.x=b.x and a.y=b.y and a.z=b.z where a.y='CN000';
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.x=b.x and a.z=b.z and a.y=b.y where a.y='CN000';
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.y=b.y and a.x=b.x and a.z=b.z where a.y='CN000';
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.y=b.y and a.z=b.z and a.x=b.x where a.y='CN000';
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.z=b.z and a.x=b.x and a.y=b.y where a.y='CN000';
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.z=b.z and a.y=b.y and a.x=b.x where a.y='CN000';

-- Group A2: Any order of join condition conjunction except "y, z, x", no filter:
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.x=b.x and a.y=b.y and a.z=b.z;
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.x=b.x and a.z=b.z and a.y=b.y;
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.y=b.y and a.x=b.x and a.z=b.z;
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.z=b.z and a.x=b.x and a.y=b.y;
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.z=b.z and a.y=b.y and a.x=b.x;

-- Group B: utf8:

-- Group B1: Any order of join condition conjunction except "y, z, x",, with filter on column y:
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.x=b.x and a.y=b.y and a.z=b.z where a.y='CN000';
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.x=b.x and a.z=b.z and a.y=b.y where a.y='CN000';
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.y=b.y and a.x=b.x and a.z=b.z where a.y='CN000';
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.z=b.z and a.x=b.x and a.y=b.y where a.y='CN000';
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.z=b.z and a.y=b.y and a.x=b.x where a.y='CN000';

-- Group B2: Any order of join condition conjunction except "y, z, x", no filter:
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.x=b.x and a.y=b.y and a.z=b.z;
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.x=b.x and a.z=b.z and a.y=b.y;
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.y=b.y and a.x=b.x and a.z=b.z;
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.z=b.z and a.x=b.x and a.y=b.y;
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.z=b.z and a.y=b.y and a.x=b.x;

All the following queries give the wrong result:

-- Group D: utf8mb4:
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.y=b.y and a.z=b.z and a.x=b.x;

-- Group E: utf8
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.y=b.y and a.z=b.z and a.x=b.x where a.y='CN000';
select /*+ inl_merge_join(a, b) */  a.* from b join a on a.y=b.y and a.z=b.z and a.x=b.x;

Analysis followed.

Group A1 all have the plan containing join keys only on "x" and "z", because the existence of the filter on "y" is effectively pushed down to each join sides.

Group A2 all have the plan containing join keys on all "x", "y" and "z" with the order other than "y, z, x".

Group B1 all have the plan containing join keys on all "x", "y" and "z" with the order other than "y, z, x". I guess it is the specified utf8 charset that is preventing eliminating the join key of "y" even a filter on "y" is given.

Group B2 is the same as Group A2.

Both Group D and Group E have the plan pattern: join keys on all "x", "y" and "z", and in order of "y, z, x".

So I'm guessing the join key order in the plan is somehow wrong for Group D and Group E, causing the build side (table b) being incorrectly sorted, e.g., should be sort by "y" then "z" then "x", but actually is "y" then "x" then "z":

After that sort operation above, the data for table b change from

| cn000 | 123 | rw  |
| cn000 | 456 | 123 |

to

| cn000 | 456 | 123 |
| cn000 | 123 | rw  |

Then the join get the wrong result.

Hence eventually the wrong query result.

@xzhangxian1008
Copy link
Contributor

tidb version: 01a4573

After debugging, we find that column sort order is wrong in the runtime. Take sql select /*+ inl_merge_join(a, b) */ a.* from b join a on a.y=b.y and a.z=b.z and a.x=b.x; for example.

When we reach to index_loopup_merge_join.go:L460, imw.KeyOff2KeyOffOrderByIdx contains following values:

[0]: 2
[1]: 0
[2]: 1

imw.outerMergeCtx.JoinKeys contains following values:

[0]: OriginName: "test.a.y"
[1]: OriginName: "test.a.z"
[2]: OriginName: "test.a.x"

And the runtime column sort order becomes x, y, z. It's not the expected behaviour.

@xzhangxian1008
Copy link
Contributor

/sig planner

@ti-chi-bot ti-chi-bot bot added the sig/planner SIG: Planner label Jun 18, 2024
@xzhangxian1008
Copy link
Contributor

/remove-sig execution

@ti-chi-bot ti-chi-bot bot removed the sig/execution SIG execution label Jun 18, 2024
@AilinKid
Copy link
Contributor

tidb> set names utf8;
Query OK, 0 rows affected (0.00 sec)

tidb> select /*+ inl_merge_join(a, b) */ a.* from a join b on a.y=b.y and a.z=b.z and a.x = b.x where a.y='CN000';
+----+-----+-------+-----+------+
| id | x   | y     | z   | k    |
+----+-----+-------+-----+------+
|  1 | RW  | CN000 | 123 | NULL |
|  2 | 123 | CN000 | 456 | NULL |
+----+-----+-------+-----+------+
2 rows in set (0.01 sec)

tidb> set names utf8mb4;
Query OK, 0 rows affected (0.00 sec)

tidb> select /*+ inl_merge_join(a, b) */ a.* from a join b on a.y=b.y and a.z=b.z and a.x = b.x where a.y='CN000';
+----+-----+-------+-----+------+
| id | x   | y     | z   | k    |
+----+-----+-------+-----+------+
|  1 | RW  | CN000 | 123 | NULL |
|  2 | 123 | CN000 | 456 | NULL |
+----+-----+-------+-----+------+
2 rows in set (0.01 sec)

after some insight and understanding of the execution code. we found:

  • indexMergeLookupJoin will pass the upper's sort requirements to its outer child.
  • indexMergeLookupJoin will respect the inner side index's order while needing an inner order sort about its outer keys to make it convenient for building consecutive ranges of the inner side, otherwise, it will be much more dispersed.
  • if the parent requires a sort on {a,b,c}, inner index with {c,b,a}, so we need a sort on outer keys to make range build effectively.
  • indexMergeLookupJoin will be seen as prop-satisfied, only when the prop is empty or the prop is a subset of the inner index order.

@AilinKid AilinKid removed affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. labels Jun 27, 2024
@seiya-annie
Copy link

/found customer

@ti-chi-bot ti-chi-bot bot added the report/customer Customers have encountered this bug. label Jul 5, 2024
@ti-chi-bot ti-chi-bot added affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. and removed may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 may-affects-7.5 may-affects-8.1 labels Jul 5, 2024
ti-chi-bot bot pushed a commit that referenced this issue Jul 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.2 impact/wrong-result report/customer Customers have encountered this bug. severity/major sig/planner SIG: Planner type/bug The issue is confirmed as a bug.
Projects
None yet
8 participants