
lightning: auto increment value became much larger than expected after physical import with parallel import enabled #56814

Closed
kabileshKabi opened this issue Oct 24, 2024 · 26 comments · Fixed by #57398
Labels: affects-7.1, affects-7.5, affects-8.1, affects-8.5, component/ddl, component/lightning, severity/moderate, type/bug

Comments

@kabileshKabi

Bug Report

We made a cut-over to TiDB

The table is a messages table with AUTO_INCREMENT on `id`.
To preserve the ID sequence, the table was created with AUTO_ID_CACHE, as shown below:

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci AUTO_INCREMENT=730234878 /*T![auto_id_cache] AUTO_ID_CACHE=1 */

In MySQL the last ID was 14609942, but in TiDB the ID jumped by a vast amount to 730170668; see below:

mysql> select id,updatedAt from messages where id >=14609942 limit 10;
+-----------+------------+
| id        | updatedAt  |
+-----------+------------+
| 14609942 | 1729291247 | ----> MySQL
| 730170668 | 1729313914 | -----> TIDB
| 730170669 | 1729313914 |
| 730170670 | 1729313914 |
| 730170671 | 1729313914 |
| 730170672 | 1729313914 |
| 730170673 | 1729313914 |
| 730170674 | 1729313914 |
| 730170675 | 1729313914 |
| 730170676 | 1729313914 |
+-----------+------------+

TiDB incremented the value on its own. This is a critical bug that makes TiDB unusable in production.
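
For context, a minimal way to quantify the gap on the TiDB side (a sketch added for illustration, not part of the original report; it assumes the standard information_schema.TABLES view and uses a placeholder schema name your_db):

SELECT MAX(id) FROM messages;                          -- highest ID actually stored
SELECT AUTO_INCREMENT                                  -- allocator position reported by the metadata
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = 'your_db' AND TABLE_NAME = 'messages';
-- an AUTO_INCREMENT far above MAX(id) means the allocator was rebased well past the data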

1. Minimal reproduce step (Required)

1. Create the table structure in TiDB, the same as in MySQL, with AUTO_ID_CACHE=1 to maintain compatibility.
2. Migrate the data to TiDB using DM and set up replication. Up to this point the IDs match.
3. Stop or pause the DM task.
4. Point the application directly at TiDB; you can see a huge jump in the auto-increment value.

2. What did you expect to see? (Required)

We expected the same sequence to continue in TiDB, keeping it compatible with MySQL.

3. What did you see instead (Required)

We see that a completely new ID sequence has been introduced, breaking the application.

4. What is your TiDB version? (Required)

Release Version: v8.1.1
Edition: Community
Git Commit Hash: a7df4f9
Git Branch: HEAD
UTC Build Time: 2024-08-22 05:49:03
GoVersion: go1.21.13
Race Enabled: false
Check Table Before Drop: false
Store: tikv

@kabileshKabi kabileshKabi added the type/bug The issue is confirmed as a bug. label Oct 24, 2024
@kabileshKabi kabileshKabi changed the title from "TiDB automatically makes the AUTO_INCREMENT value breaking the application" to "TiDB automatically makes the AUTO_INCREMENT value higher breaking mysql compatibility" Oct 24, 2024
@dveeden
Contributor

dveeden commented Oct 24, 2024

Could you add the output of SHOW CREATE TABLE ... for

  • MySQL
  • TiDB before starting DM
  • TiDB after starting DM

What does SELECT MAX(id) FROM ... return?

What was the task-mode in DM set to for this? Did you specify an import-mode?

@kabileshKabi
Author

kabileshKabi commented Oct 24, 2024

Table structure in MySQL:

CREATE TABLE `messages` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `muid` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `conversationId` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `sender` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `receiverType` enum('group','user') CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'user',
  `receiver` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `category` enum('message','action','call','event','custom','interactive') CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `type` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'text',
  `data` json NOT NULL,
  `senderOnly` tinyint(1) NOT NULL DEFAULT '0',
  `sentAt` int NOT NULL,
  `updatedAt` int DEFAULT NULL,
  `editedBy` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `editedAt` int DEFAULT NULL,
  `deletedBy` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `deletedAt` int DEFAULT NULL,
  `deliveredAt` int DEFAULT NULL,
  `readAt` int DEFAULT NULL,
  `sentAtFPid` decimal(30,20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `messages_sender_index` (`sender`),
  KEY `messages_receiver_index` (`receiver`),
  KEY `messages_receivertype_index` (`receiverType`),
  KEY `messages_sentat_index` (`sentAt`),
  KEY `messages_deliveredat_index` (`deliveredAt`),
  KEY `messages_readat_index` (`readAt`),
  KEY `messages_editedat_index` (`editedAt`),
  KEY `messages_deletedat_index` (`deletedAt`),
  KEY `messages_updatedat_index` (`updatedAt`),
  KEY `msg_category_type` (`category`,`type`),
  KEY `msg_type` (`type`),
  KEY `msg_convoid_sender_senderonly` (`conversationId`,`sender`,`senderOnly`),
  KEY `msg_convoid_receivertype_receiver` (`conversationId`,`receiverType`,`receiver`),
  KEY `msg_convoid_category_type` (`conversationId`,`category`,`type`),
  KEY `msg_convoid_type` (`conversationId`,`type`),
  KEY `msg_convoid_sentat_id` (`conversationId`,`sentAt`,`id`),
  KEY `convoid_sender_deliveredat` (`conversationId`,`sender`,`deliveredAt`),
  KEY `convoid_sender_sentat` (`conversationId`,`sender`,`sentAt`),
  KEY `msg_sentatfpid` (`sentAtFPid`),
  KEY `msg_muid` (`muid`),
  KEY `convoid_sentatfpid_index` (`conversationId`,`sentAtFPid` DESC)
) ENGINE=InnoDB  AUTO_INCREMENT=14430410 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Table structure created in TiDB with AUTO_ID_CACHE before starting DM:

CREATE TABLE `messages` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `muid` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `conversationId` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `sender` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `receiverType` enum('group','user') CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'user',
  `receiver` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `category` enum('message','action','call','event','custom','interactive') CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `type` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'text',
  `data` json NOT NULL,
  `senderOnly` tinyint(1) NOT NULL DEFAULT '0',
  `sentAt` int NOT NULL,
  `updatedAt` int DEFAULT NULL,
  `editedBy` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `editedAt` int DEFAULT NULL,
  `deletedBy` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `deletedAt` int DEFAULT NULL,
  `deliveredAt` int DEFAULT NULL,
  `readAt` int DEFAULT NULL,
  `sentAtFPid` decimal(30,20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `messages_sender_index` (`sender`),
  KEY `messages_receiver_index` (`receiver`),
  KEY `messages_receivertype_index` (`receiverType`),
  KEY `messages_sentat_index` (`sentAt`),
  KEY `messages_deliveredat_index` (`deliveredAt`),
  KEY `messages_readat_index` (`readAt`),
  KEY `messages_editedat_index` (`editedAt`),
  KEY `messages_deletedat_index` (`deletedAt`),
  KEY `messages_updatedat_index` (`updatedAt`),
  KEY `msg_category_type` (`category`,`type`),
  KEY `msg_type` (`type`),
  KEY `msg_convoid_sender_senderonly` (`conversationId`,`sender`,`senderOnly`),
  KEY `msg_convoid_receivertype_receiver` (`conversationId`,`receiverType`,`receiver`),
  KEY `msg_convoid_category_type` (`conversationId`,`category`,`type`),
  KEY `msg_convoid_type` (`conversationId`,`type`),
  KEY `msg_convoid_sentat_id` (`conversationId`,`sentAt`,`id`),
  KEY `convoid_sender_deliveredat` (`conversationId`,`sender`,`deliveredAt`),
  KEY `convoid_sender_sentat` (`conversationId`,`sender`,`sentAt`),
  KEY `msg_sentatfpid` (`sentAtFPid`),
  KEY `msg_muid` (`muid`),
  KEY `convoid_sentatfpid_index` (`conversationId`,`sentAtFPid` DESC)
) ENGINE=InnoDB  AUTO_INCREMENT=14430410 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci AUTO_ID_CACHE=1;

DM was started in full task mode with replication; we used physical import.

The auto-increment value was fine in TiDB as well, since the binlog format was FULL and replication was running from MySQL.

On the day of cut-over we paused the DM task.

Last ID synced from MySQL:

+-----------+------------+
| id        | updatedAt  |
+-----------+------------+
| 14609942 | 1729291247 | ----> MySQL

When the application was pointed directly at TiDB, it generated IDs starting with 730170668:

+-----------+------------+
| id        | updatedAt  |
+-----------+------------+
| 730170668 | 1729313914 | -----> TIDB

@dveeden
Contributor

dveeden commented Oct 24, 2024

I tried to reproduce this, but failed to do so:

MySQL 8.0.40 with:

CREATE TABLE t1 (id int AUTO_INCREMENT PRIMARY KEY) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;
INSERT INTO t1 values (100);
INSERT INTO t1() values ();

TiDB v8.1.1 with:

CREATE TABLE t1 (id int AUTO_INCREMENT PRIMARY KEY) AUTO_ID_CACHE=1 CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

DM task with task-mode: full

Result after stopping the DM task:

mysql-8.0.11-TiDB-v8.1.1> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci AUTO_INCREMENT=102 /*T![auto_id_cache] AUTO_ID_CACHE=1 */
1 row in set (0.00 sec)

mysql-8.0.11-TiDB-v8.1.1> TABLE t1;
+-----+
| id  |
+-----+
| 100 |
| 101 |
+-----+
2 rows in set (0.00 sec)

mysql-8.0.11-TiDB-v8.1.1> INSERT INTO t1() VALUES ();
Query OK, 1 row affected (0.01 sec)

mysql-8.0.11-TiDB-v8.1.1> TABLE t1;
+-----+
| id  |
+-----+
| 100 |
| 101 |
| 102 |
+-----+
3 rows in set (0.00 sec)

mysql-8.0.11-TiDB-v8.1.1> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci AUTO_INCREMENT=103 /*T![auto_id_cache] AUTO_ID_CACHE=1 */
1 row in set (0.00 sec)

@kabileshKabi
Author

I hope you also tried it with physical import. After setting up replication, please insert some rows via replication, then pause the DM task and make a direct write to TiDB.

@jebter jebter added the component/ddl This issue is related to DDL of TiDB. label Oct 25, 2024
@bb7133
Member

bb7133 commented Oct 26, 2024

Hi @kabileshKabi , thanks for reporting this issue.

According to the reproduction steps and other comments, you mentioned both 'DM' and 'physical import' (which is a concept of Lightning / the IMPORT statement). Could you please provide more details about how they are used in each step, so that we can analyze this?

Also, please notice that:

  1. There's a known bug related to physical import and auto_increment, which is not fixed in v8.1.1 yet (Incorrect AUTO_INCREMENT Value After Data Import Causing Duplicate Key Error (when AUTO_ID_CACHE=1) #56476).
  2. As a workaround for your migration, you can use ALTER TABLE ... FORCE AUTO_INCREMENT=... to rebase the auto_increment value to a specific value (14609942 for your case); a sketch follows after this list.
  3. As a distributed system, we do not guarantee AUTO_INCREMENT to be 100% compatible with MySQL in all cases, even when AUTO_ID_CACHE is set to 1 (see the MySQL compatibility mode part of the documentation).
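
As a reference for point 2, a minimal sketch of the rebase workaround (an illustration added for clarity, not part of the original comment; it uses the table from this issue and a target value just above the last MySQL ID, which must be higher than the current MAX(id)):

ALTER TABLE messages FORCE AUTO_INCREMENT = 14609943;  -- rebase the allocator just above the last MySQL ID (14609942)
SHOW CREATE TABLE messages;                            -- verify the new AUTO_INCREMENT value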

@siddontang
Member

Hello, @kabileshKabi

From your comment above, I found

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci AUTO_INCREMENT=730234878 /*T![auto_id_cache] AUTO_ID_CACHE=1 */

It seems that you don't use the same AUTO_INCREMENT between MySQL and TiDB; for TiDB it may be AUTO_INCREMENT=730234878. So if you insert data into TiDB directly, the IDs may start from 730234878, not from the value set in MySQL.

I tried to reproduce:

Step 1

For MySQL

CREATE TABLE `messages` (   `id` bigint unsigned NOT NULL AUTO_INCREMENT,   `updatedAt` int DEFAULT NULL,   PRIMARY KEY (`id`) ) ENGINE=InnoDB  AUTO_INCREMENT=14430410 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

For TiDB

CREATE TABLE `messages` (   `id` bigint unsigned NOT NULL AUTO_INCREMENT,   `updatedAt` int DEFAULT NULL,   PRIMARY KEY (`id`) ) ENGINE=InnoDB  AUTO_INCREMENT=74430410 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Step 2

For MySQL

insert into messages (updatedAt) values (unix_timestamp());

then start the DM replication

For TiDB

select * from messages;
+----------+------------+
| id       | updatedAt  |
+----------+------------+
| 14430410 | 1730085690 |
+----------+------------+

Step 3

stop the DM replication

For TiDB

insert into messages (updatedAt) values (unix_timestamp());

Then select in TiDB

select * from messages;
+----------+------------+
| id       | updatedAt  |
+----------+------------+
| 14430410 | 1730085690 |
| 74460410 | 1730086179 |
+----------+------------+

You can see a different ID here. I guess this meets the expectation? Please correct me if I am wrong @bb7133 @dveeden

@bb7133
Member

bb7133 commented Oct 28, 2024

@siddontang

CREATE TABLE `messages` (   `id` bigint unsigned NOT NULL AUTO_INCREMENT,   `updatedAt` int DEFAULT NULL,   PRIMARY KEY (`id`) ) ENGINE=InnoDB  AUTO_INCREMENT=74430410 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

If this is the original create table syntax, it's expected that the next ID allocated starts from 74430410.

But I'm not sure whether this syntax was written manually or generated by DM.

@kabileshKabi
Author

@siddontang your previous comment describes exactly the issue I face: when inserting directly into TiDB, the auto-increment ID leaps ahead.

@kabileshKabi
Author

Below are the exact steps followed:
1. Dump the schema from MySQL.
2. Import the schema file into TiDB, adding AUTO_ID_CACHE.
3. Start DM with physical import and full mode.
4. Replication started between MySQL and TiDB.

Up to this point the IDs match between MySQL and TiDB.

On the day of cut-over we paused the DM task replication and pointed writes directly at TiDB; this is when we see the jump in the auto-increment ID.

@siddontang @bb7133 @dveeden

@bb7133
Member

bb7133 commented Oct 28, 2024

3. Start DM with physical import and full mode.

@kabileshKabi About 'physical import' of DM, is it some argument or configuration of DM? I didn't find such a thing in the DM documentation.

@kabileshKabi
Author

These are the exact configs used in DM @bb7133:

task-mode: all
import-mode: "physical"

@bb7133
Member

bb7133 commented Oct 28, 2024

@kabileshKabi Thanks! I confirmed that DM with physical import actually uses the 'Lightning import' code, so probably this is caused by #56476.

We'll try to confirm and reproduce it.

BTW, the version of DM you're using is also v8.1.1, right?

@kabileshKabi
Author

Yes, it's version 8.1.1 @bb7133

@kabileshKabi
Author

@bb7133 I have read through bug #56476. There, the auto-increment is not getting updated to n+1 after migrating data with DM using physical loading, causing duplicate-key errors when the application or writes are pointed at TiDB.

In my case the ID jumps to a much higher value on its own and continues as n+1 from there. I hope the issue is clear now.

@D3Hunter
Contributor

D3Hunter commented Oct 28, 2024

DM physical mode integrates Lightning; in this case Lightning will set the auto-increment ID allocator after the physical import. But from the info above your table is using a clustered index, so the ID shouldn't jump. I will try to reproduce it locally.

@kabileshKabi also, can you paste the result of show create table messages on TiDB so I can confirm that the table is using a clustered index? The previous create-table SQL doesn't contain this info.
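
An alternative quick check (a sketch added for illustration; it assumes the TIDB_PK_TYPE column of information_schema.TABLES is available in your TiDB version and uses a placeholder schema name your_db):

SELECT TIDB_PK_TYPE
  FROM information_schema.TABLES
 WHERE TABLE_SCHEMA = 'your_db' AND TABLE_NAME = 'messages';
-- expected to return CLUSTERED for a clustered primary key, NONCLUSTERED otherwise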

@D3Hunter
Contributor

I tried importing a similar table with a clustered index and AUTO_ID_CACHE=1 using Lightning (DM only calls Lightning, so it should make no difference). The data source file is about 150 MB and has 330,000 rows; after the import there is no ID jump.

CREATE TABLE `t` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` varchar(100) DEFAULT NULL,
  `b` varchar(100) DEFAULT NULL,
  `c` bigint(20) DEFAULT NULL,
  `d` varchar(1024) DEFAULT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin AUTO_INCREMENT=330001 /*T![auto_id_cache] AUTO_ID_CACHE=1 */

// after import

mysql> select max(id) from t;
+---------+
| max(id) |
+---------+
|  330000 |
+---------+
1 row in set (0.00 sec)

mysql> insert into t(id) values(default);
Query OK, 1 row affected (0.01 sec)

mysql> select max(id) from t;
+---------+
| max(id) |
+---------+
|  330001 |
+---------+
1 row in set (0.00 sec)

@kabileshKabi
Author

This is the current structure in TiDB:

CREATE TABLE `messages` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `muid` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `conversationId` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `sender` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL, `receiverType` enum('group','user') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'user', `receiver` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL, `category` enum('message','action','call','event','custom','interactive') COLLATE utf8mb4_unicode_ci DEFAULT NULL, `type` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'text', `data` json NOT NULL, `senderOnly` tinyint(1) NOT NULL DEFAULT '0', `sentAt` int(11) NOT NULL, `updatedAt` int(11) DEFAULT NULL, `editedBy` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `editedAt` int(11) DEFAULT NULL, `deletedBy` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL, `deletedAt` int(11) DEFAULT NULL, `deliveredAt` int(11) DEFAULT NULL, `readAt` int(11) DEFAULT NULL, `sentAtFPid` decimal(30,20) DEFAULT NULL, PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */, KEY `messages_sender_index` (`sender`), KEY `messages_receiver_index` (`receiver`), KEY `messages_receivertype_index` (`receiverType`), KEY `messages_sentat_index` (`sentAt`), KEY `messages_deliveredat_index` (`deliveredAt`), KEY `messages_readat_index` (`readAt`), KEY `messages_editedat_index` (`editedAt`), KEY `messages_deletedat_index` (`deletedAt`), KEY `messages_updatedat_index` (`updatedAt`), KEY `msg_category_type` (`category`,`type`), KEY `msg_type` (`type`), KEY `msg_convoid_sender_senderonly` (`conversationId`,`sender`,`senderOnly`), KEY `msg_convoid_receivertype_receiver` (`conversationId`,`receiverType`,`receiver`), KEY `msg_convoid_category_type` (`conversationId`,`category`,`type`), KEY `msg_convoid_type` (`conversationId`,`type`), KEY `msg_convoid_sentat_id` (`conversationId`,`sentAt`,`id`), KEY `convoid_sender_deliveredat` (`conversationId`,`sender`,`deliveredAt`), KEY `convoid_sender_sentat` (`conversationId`,`sender`,`sentAt`), KEY `msg_sentatfpid` (`sentAtFPid`), KEY `msg_muid` (`muid`), KEY `convoid_sentatfpid_index` (`conversationId`,`sentAtFPid`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci AUTO_INCREMENT=730390411 /*T![auto_id_cache] AUTO_ID_CACHE=1 */

@kabileshKabi
Author

(quoting @D3Hunter's comment above) I tried importing a similar table with a clustered index and AUTO_ID_CACHE=1 using Lightning ... after the import there is no ID jump.

Have you used physical import? Can you please confirm?

@kabileshKabi
Author

kabileshKabi commented Oct 28, 2024

(quoting @siddontang's comment above) It seems that you don't use the same AUTO_INCREMENT between MySQL and TiDB ... you can see a different ID here. I guess this meets the expectation?

Yes @siddontang, you are correct.
@D3Hunter FYI

@D3Hunter
Contributor

D3Hunter commented Oct 28, 2024

Have you used physical import? Can you please confirm?

Yes, the physical mode of Lightning. I will try DM physical mode too, although they actually use the same code.

@kabileshKabi
If I understand correctly, @siddontang's comment says the table was created by you with AUTO_INCREMENT=730234878, so when the application inserts later, the IDs will start from 730234878. It's not a jump; it's what you did explicitly.

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci AUTO_INCREMENT=730234878 /*T![auto_id_cache] AUTO_ID_CACHE=1 */

@kabileshKabi
Copy link
Author

(quoting @D3Hunter's comment above about the table being created with AUTO_INCREMENT=730234878) ... it's not a jump; it's what you did explicitly.

That snippet was given to show the current auto-increment value after the cut-over to TiDB, and also to show that AUTO_ID_CACHE has been enabled.
Below are the IDs in MySQL and in TiDB after the cut-over @D3Hunter:

+-----------+------------+
| id        | updatedAt  |
+-----------+------------+
| 14609942  | 1729291247 | ----> MySQL
| 730170668 | 1729313914 | -----> TiDB

@D3Hunter
Contributor

D3Hunter commented Oct 28, 2024

Reproduced locally. The cause is that DM enables parallel-import to support the Shard Merge case, where multiple workers import into the same target table. In that case a meta_manager coordinates all parallel instances and it rebases the ID as well (this part has a bug; it shouldn't rebase at that time), and the ID is calculated from the dumped data source files, so it's quite large. I will try to fix it later.

We see that a completely new ID sequence has been introduced, breaking the application

The IDs are still in that sequence, but with a jump. How does this affect your application?
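
For reference, a way to see what the allocator was rebased to after the import (a sketch added for illustration; it assumes TiDB's SHOW TABLE ... NEXT_ROW_ID statement is available in your version):

SHOW TABLE messages NEXT_ROW_ID;
-- lists the next global ID per allocator (e.g. AUTO_INCREMENT); a value far above SELECT MAX(id) FROM messages points at the unwanted rebase described above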

@D3Hunter
Contributor

D3Hunter commented Oct 28, 2024

BTW: for performance reasons, even when using AUTO_ID_CACHE=1 we still cache IDs in the ID-allocator owner node. If the owner crashes, all cached IDs are gone, and if a DML transaction fails, the IDs it used are not reused, so there can still be a bump in the ID range, but the bump is quite small most of the time.
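
To illustrate the failed-transaction case, a minimal sketch (added for illustration with a hypothetical table t; auto-increment IDs consumed by a rolled-back insert are not returned to the allocator):

CREATE TABLE t (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  c VARCHAR(10)
) AUTO_ID_CACHE=1;

INSERT INTO t (c) VALUES ('a');   -- gets id 1
BEGIN;
INSERT INTO t (c) VALUES ('b');   -- allocates id 2
ROLLBACK;                         -- id 2 is consumed but never stored
INSERT INTO t (c) VALUES ('c');   -- gets id 3, leaving a small gap
SELECT id, c FROM t;              -- rows with id 1 and 3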

@kabileshKabi
Author

@D3Hunter the application is completely dependent on the auto-generated ID from the DB. Since we saw a completely new sequence, it caused a lot of data mismatch, as the sequence was lost.

Is there any workaround to stop this behaviour of DM? Also, is there any ETA for the fix?

@D3Hunter
Contributor

A workaround is to use DM with logical import.

Or you can dump the data using Dumpling and import it using Lightning with parallel-import = false, then start a DM incremental task.

The fix PR should be filed within this week, but we have a release schedule for bug-fix versions; if it's urgent, you can apply for a hotfix through support.

@D3Hunter D3Hunter added the component/lightning This issue is related to Lightning of TiDB. label Nov 14, 2024
@D3Hunter D3Hunter added affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-8.5 This bug affects the 8.5.x(LTS) versions. and removed affects-6.5 This bug affects the 6.5.x(LTS) versions. labels Nov 14, 2024
@D3Hunter D3Hunter changed the title from "TiDB automatically makes the AUTO_INCREMENT value higher breaking mysql compatibility" to "lightning: auto increment value became much larger than expected after import with parallel import enabled" Nov 15, 2024
@D3Hunter D3Hunter changed the title from "lightning: auto increment value became much larger than expected after import with parallel import enabled" to "lightning: auto increment value became much larger than expected after physical import with parallel import enabled" Nov 15, 2024
@ti-chi-bot ti-chi-bot bot closed this as completed in ecca340 Nov 19, 2024