
data inconsistency in table (lightning didn't ingest all kv) after tikv is down longer than 810s #55808

Closed
Lily2025 opened this issue Sep 3, 2024 · 3 comments · Fixed by #56345
Labels
affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. component/ddl This issue is related to DDL of TiDB. impact/inconsistency incorrect/inconsistency/inconsistent severity/critical type/bug The issue is confirmed as a bug.

Comments


Lily2025 commented Sep 3, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1. tidb_enable_dist_task='off'
2. run sysbench
3. add index for one table
4. tikv rolling restart
5. admin check

logs:
tidb-0.zip
tidb-1.zip

2. What did you expect to see? (Required)

admin check success

3. What did you see instead (Required)

admin check failed with error

admin check failed (Error 8223 (HY000): data inconsistency in table: sbtest1, index: index_test_1725249594568, handle: 428304, index-values:"" != record-values:"handle: 428304, values: [KindString 72115182252-02679645832-08099037427-76043912395-43440959176-93229371484-33119819645-19100075546-83614225936-74898877917]")
operatorLogs:
[2024-09-02 11:59:54] ###### start adding index
ALTER TABLE sbtest1 ADD INDEX index_test_1725249594568(c)
[2024-09-02 11:59:54] ###### wait for ddl job finish
[2024-09-02 12:18:27] ###### ddl job finished
select job_id, db_name, table_name, job_type, create_time, start_time, end_time, state, query from information_schema.ddl_jobs where query = 'ALTER TABLE sbtest1 ADD INDEX index_test_1725249594568(c)'
jobId: 563, job type: add index /* ingest */, state: synced
add index done, it takes: 18m32.781865519s
[2024-09-02 12:18:27] ###### start admin check
admin check index sbtest1 index_test_1725249594568

4. What is your TiDB version? (Required)

./tidb-server -V
Release Version: v8.4.0-alpha
Edition: Community
Git Commit Hash: 3419bde
Git Branch: heads/refs/tags/v8.4.0-alpha
UTC Build Time: 2024-08-31 11:47:42
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: unistore

@Lily2025 Lily2025 added the type/bug The issue is confirmed as a bug. label Sep 3, 2024

Lily2025 commented Sep 3, 2024

/assign tangenta

@jebter jebter added severity/critical component/ddl This issue is related to DDL of TiDB. impact/inconsistency incorrect/inconsistency/inconsistent labels Sep 3, 2024
@lance6716 lance6716 self-assigned this Sep 26, 2024
lance6716 (Contributor) commented

The problem is that lightning forgets to set `j.lastRetryableErr` before line 636:

```go
for retry := 0; retry < maxRetryTimes; retry++ {
    resp, err := local.doIngest(ctx, j)
    if err == nil && resp.GetError() == nil {
        j.convertStageTo(ingested)
        return nil
    }
    if err != nil {
        if common.IsContextCanceledError(err) {
            return err
        }
        log.FromContext(ctx).Warn("meet underlying error, will retry ingest",
            log.ShortError(err), logutil.SSTMetas(j.writeResult.sstMeta),
            logutil.Region(j.region.Region), logutil.Leader(j.region.Leader))
        continue
    }
    // ... (handling of resp.GetError() omitted)
}
```

When the retry count exceeds the limit (30), the caller returns a nil error, which is not expected:

```go
case job, ok = <-jobFromWorkerCh:
}
if !ok {
    retryer.close()
    return nil
}
switch job.stage {
case regionScanned, wrote:
    job.retryCount++
    if job.retryCount > maxWriteAndIngestRetryTimes {
        job.done(&jobWg)
        return job.lastRetryableErr // nil if it was never set
    }
    // ...
```
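The failure mode above can be reduced to a minimal self-contained sketch. The names here (`job`, `ingestOnce`, `maxRetry`) are simplified stand-ins for illustration, not the actual fix in #56345: the point is that recording the last retryable error before `continue` lets the caller surface a real error once retries are exhausted, instead of mistaking failure for success.

```go
package main

import (
	"errors"
	"fmt"
)

// job is a simplified stand-in for lightning's region job.
type job struct {
	lastRetryableErr error
}

const maxRetry = 3

var errIngest = errors.New("tikv not leader")

// ingestOnce always fails, simulating TiKV being down.
func ingestOnce() error { return errIngest }

// ingestBuggy mirrors the bug: the loop gives up without recording
// the error, so the caller sees lastRetryableErr == nil.
func ingestBuggy(j *job) {
	for i := 0; i < maxRetry; i++ {
		if err := ingestOnce(); err != nil {
			continue // bug: j.lastRetryableErr is never set
		}
		return
	}
}

// ingestFixed records the error before retrying, so after the retry
// budget is exhausted the caller can return a non-nil error.
func ingestFixed(j *job) {
	for i := 0; i < maxRetry; i++ {
		if err := ingestOnce(); err != nil {
			j.lastRetryableErr = err
			continue
		}
		return
	}
}

func main() {
	b, f := &job{}, &job{}
	ingestBuggy(b)
	ingestFixed(f)
	fmt.Println(b.lastRetryableErr) // <nil> — failure looks like success
	fmt.Println(f.lastRetryableErr) // tikv not leader
}
```

With the buggy loop, the caller's `return job.lastRetryableErr` returns nil, the job is silently dropped, and the index is missing the KV pairs that were never ingested — exactly the `admin check` inconsistency reported above.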

@lance6716 lance6716 changed the title data inconsistency in table after tikv rolling restart during add index data inconsistency in table after tikv is down larger than 810s Sep 26, 2024
@lance6716 lance6716 changed the title data inconsistency in table after tikv is down larger than 810s data inconsistency in table after tikv is down longer than 810s Sep 26, 2024
lance6716 (Contributor) commented

Introduced by #40692; affected versions start from v7.0.0.

@lance6716 lance6716 added affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. labels Sep 26, 2024
@lance6716 lance6716 removed may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 labels Sep 26, 2024
@lance6716 lance6716 changed the title data inconsistency in table after tikv is down longer than 810s data inconsistency in table (lightning didn't ingest all kv) after tikv is down longer than 810s Sep 26, 2024
@tangenta tangenta removed their assignment Sep 26, 2024