
data inconsistency in table (lightning didn't ingest all kv) after tikv is down longer than 810s #55808

Closed
Lily2025 opened this issue Sep 3, 2024 · 3 comments · Fixed by #56345
Labels
affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. component/ddl This issue is related to DDL of TiDB. impact/inconsistency incorrect/inconsistency/inconsistent severity/critical type/bug The issue is confirmed as a bug.

Comments


Lily2025 commented Sep 3, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

1. tidb_enable_dist_task='off'
2. run sysbench
3. add index for one table
4. tikv rolling restart
5. admin check

logs:
tidb-0.zip
tidb-1.zip

2. What did you expect to see? (Required)

admin check success

3. What did you see instead (Required)

admin check failed with error

admin check failed (Error 8223 (HY000): data inconsistency in table: sbtest1, index: index_test_1725249594568, handle: 428304, index-values:"" != record-values:"handle: 428304, values: [KindString 72115182252-02679645832-08099037427-76043912395-43440959176-93229371484-33119819645-19100075546-83614225936-74898877917]")
operatorLogs:
[2024-09-02 11:59:54] ###### start adding index
ALTER TABLE sbtest1 ADD INDEX index_test_1725249594568(c)
[2024-09-02 11:59:54] ###### wait for ddl job finish
[2024-09-02 12:18:27] ###### ddl job finished
select job_id, db_name, table_name, job_type, create_time, start_time, end_time, state, query from information_schema.ddl_jobs where query = 'ALTER TABLE sbtest1 ADD INDEX index_test_1725249594568(c)'
jobId: 563, job type: add index /* ingest */, state: synced
add index done, it takes: 18m32.781865519s
[2024-09-02 12:18:27] ###### start admin check
admin check index sbtest1 index_test_1725249594568

4. What is your TiDB version? (Required)

./tidb-server -V
Release Version: v8.4.0-alpha
Edition: Community
Git Commit Hash: 3419bde
Git Branch: heads/refs/tags/v8.4.0-alpha
UTC Build Time: 2024-08-31 11:47:42
GoVersion: go1.21.10
Race Enabled: false
Check Table Before Drop: false
Store: unistore

@Lily2025 Lily2025 added the type/bug The issue is confirmed as a bug. label Sep 3, 2024

Lily2025 commented Sep 3, 2024

/assign tangenta

@jebter jebter added severity/critical component/ddl This issue is related to DDL of TiDB. impact/inconsistency incorrect/inconsistency/inconsistent labels Sep 3, 2024
@lance6716 lance6716 self-assigned this Sep 26, 2024
lance6716 (Contributor) commented

The problem is that lightning forgets to set `j.lastRetryableErr` before line 636:

```go
for retry := 0; retry < maxRetryTimes; retry++ {
    resp, err := local.doIngest(ctx, j)
    if err == nil && resp.GetError() == nil {
        j.convertStageTo(ingested)
        return nil
    }
    if err != nil {
        if common.IsContextCanceledError(err) {
            return err
        }
        log.FromContext(ctx).Warn("meet underlying error, will retry ingest",
            log.ShortError(err), logutil.SSTMetas(j.writeResult.sstMeta),
            logutil.Region(j.region.Region), logutil.Leader(j.region.Leader))
        continue
    }
    // ... (handling of resp.GetError() omitted)
}
```

When the retry count exceeds the limit (30), the caller returns a nil error, which is not expected:

```go
case job, ok = <-jobFromWorkerCh:
}
if !ok {
    retryer.close()
    return nil
}
switch job.stage {
case regionScanned, wrote:
    job.retryCount++
    if job.retryCount > maxWriteAndIngestRetryTimes {
        job.done(&jobWg)
        return job.lastRetryableErr // nil if it was never set
    }
    // ...
```
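The failure mode above can be reduced to a minimal self-contained sketch. The names here (`job`, `ingestOnce`, `maxRetry`) are simplified stand-ins for illustration, not the actual fix in #56345: the point is that recording the last retryable error before `continue` lets the caller surface a real error once retries are exhausted, instead of mistaking failure for success.

```go
package main

import (
	"errors"
	"fmt"
)

// job is a simplified stand-in for lightning's region job.
type job struct {
	lastRetryableErr error
}

const maxRetry = 3

var errIngest = errors.New("tikv not leader")

// ingestOnce always fails, simulating TiKV being down.
func ingestOnce() error { return errIngest }

// ingestBuggy mirrors the bug: the loop gives up without recording
// the error, so the caller sees lastRetryableErr == nil.
func ingestBuggy(j *job) {
	for i := 0; i < maxRetry; i++ {
		if err := ingestOnce(); err != nil {
			continue // bug: j.lastRetryableErr is never set
		}
		return
	}
}

// ingestFixed records the error before retrying, so after the retry
// budget is exhausted the caller can return a non-nil error.
func ingestFixed(j *job) {
	for i := 0; i < maxRetry; i++ {
		if err := ingestOnce(); err != nil {
			j.lastRetryableErr = err
			continue
		}
		return
	}
}

func main() {
	b, f := &job{}, &job{}
	ingestBuggy(b)
	ingestFixed(f)
	fmt.Println(b.lastRetryableErr) // <nil> — failure looks like success
	fmt.Println(f.lastRetryableErr) // tikv not leader
}
```

With the buggy loop, the caller's `return job.lastRetryableErr` returns nil, the job is silently dropped, and the index is missing the KV pairs that were never ingested — exactly the `admin check` inconsistency reported above.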

@lance6716 lance6716 changed the title data inconsistency in table after tikv rolling restart during add index data inconsistency in table after tikv is down larger than 810s Sep 26, 2024
@lance6716 lance6716 changed the title data inconsistency in table after tikv is down larger than 810s data inconsistency in table after tikv is down longer than 810s Sep 26, 2024
lance6716 (Contributor) commented

Introduced by #40692; affected versions start from v7.0.0.

@lance6716 lance6716 added affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. labels Sep 26, 2024
@lance6716 lance6716 removed may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 labels Sep 26, 2024
@lance6716 lance6716 changed the title data inconsistency in table after tikv is down longer than 810s data inconsistency in table (lightning didn't ingest all kv) after tikv is down longer than 810s Sep 26, 2024
@tangenta tangenta removed their assignment Sep 26, 2024