Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

report error "EncodeRow error: data and columnID count not match" when deleting a row during add column DDL job with binlog open #53133

Closed
lcwangchao opened this issue May 9, 2024 · 3 comments · Fixed by #53617
Assignees
Labels
affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. component/ddl This issue is related to DDL of TiDB. report/customer Customers have encountered this bug. severity/major type/bug The issue is confirmed as a bug.

Comments

@lcwangchao
Copy link
Collaborator

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

I wrote a test to reproduce it:

lcwangchao@f434660

It has some conditions:

  1. Add column job
  2. Binlog enabled
  3. Before new column state to public, delete a row

2. What did you expect to see? (Required)

no error

3. What did you see instead (Required)

        	Error:      	Received unexpected error:
        	            	EncodeRow error: data and columnID count not match 3 vs 2
        	            	github.com/pingcap/tidb/pkg/tablecodec.EncodeOldRow

4. What is your TiDB version? (Required)

master

@lcwangchao lcwangchao added type/bug The issue is confirmed as a bug. component/ddl This issue is related to DDL of TiDB. severity/major labels May 9, 2024
@lcwangchao lcwangchao added affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. and removed may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 may-affects-7.5 may-affects-8.1 labels May 9, 2024
@D3Hunter
Copy link
Contributor

D3Hunter commented May 9, 2024

it also report error when drop column or lossy modify column

        	Error:      	Received unexpected error:
        	            	EncodeRow error: data and columnID count not match 3 vs 2

diff on 5.4.1 branch

diff --git a/ddl/column.go b/ddl/column.go
index 632d370aa6..a5e7b7ffb6 100644
--- a/ddl/column.go
+++ b/ddl/column.go
@@ -172,6 +172,8 @@ func checkAddColumn(t *meta.Meta, job *model.Job) (*model.TableInfo, *model.Colu
 	return tblInfo, columnInfo, col, pos, offset, nil
 }
 
+var WaitChan chan struct{}
+
 func onAddColumn(d *ddlCtx, t *meta.Meta, job *model.Job) (ver int64, err error) {
 	// Handle the rolling back job.
 	if job.IsRollingback() {
@@ -587,6 +589,10 @@ func onDropColumn(t *meta.Meta, job *model.Job) (ver int64, _ error) {
 		}
 		job.SchemaState = model.StateWriteOnly
 	case model.StateWriteOnly:
+		if colInfo.Name.L == "wait_notify" {
+			WaitChan <- struct{}{}
+			<-WaitChan
+		}
 		// write only -> delete only
 		colInfo.State = model.StateDeleteOnly
 		setIndicesState(idxInfos, model.StateDeleteOnly)
diff --git a/executor/delete_test.go b/executor/delete_test.go
index 64c61996fa..e76bc6f773 100644
--- a/executor/delete_test.go
+++ b/executor/delete_test.go
@@ -19,6 +19,8 @@ import (
 	"testing"
 	"time"
 
+	"github.com/pingcap/tidb/ddl"
+	"github.com/pingcap/tidb/sessionctx/binloginfo"
 	"github.com/pingcap/tidb/sessionctx/variable"
 	"github.com/pingcap/tidb/testkit"
 )
@@ -130,3 +132,62 @@ func TestIssue21200(t *testing.T) {
 	tk.MustExec("execute stmt using @a")
 	tk.MustQuery("select * from t").Check(testkit.Rows("2"))
 }
+
+func TestColumnNotMatchError2(t *testing.T) {
+	ddl.WaitChan = make(chan struct{})
+	store, _ := testkit.CreateMockStore(t)
+	tk := testkit.NewTestKit(t, store)
+	tk.Session().GetSessionVars().BinlogClient = binloginfo.MockPumpsClient(&testkit.MockPumpClient{})
+	//tk.MustExec("set @@global.tidb_enable_metadata_lock=0")
+	tk.MustExec("use test")
+	tk2 := testkit.NewTestKit(t, store)
+	tk2.MustExec("use test")
+	tk.MustExec("create table t(id int primary key, a int, wait_notify int)")
+	tk.MustExec("insert into t values(1, 1, 1)")
+	go func() {
+		tk2.MustExec("alter table t drop column wait_notify")
+		close(ddl.WaitChan)
+	}()
+	// wait the ddl stage to the reorg
+	<-ddl.WaitChan
+	// begin to make sure the table t in txn contains a no public column
+	tk.MustExec("begin")
+	// continue ddl and wait it finished to avoid some necessary error log
+	ddl.WaitChan <- struct{}{}
+	<-ddl.WaitChan
+	// this statement will fail because:
+	// - schema in the plan includes the non-public column -> data contains non-public column
+	// - table.Cols() does not contain the non-public column
+	tk.MustExec("delete from t where id=1")
+	tk.MustExec("commit")
+}
diff --git a/testkit/mockstore.go b/testkit/mockstore.go
index 181d1c609e..705eaee56e 100644
--- a/testkit/mockstore.go
+++ b/testkit/mockstore.go
@@ -18,15 +18,18 @@
 package testkit
 
 import (
+	"context"
 	"testing"
 
 	"github.com/pingcap/tidb/domain"
 	"github.com/pingcap/tidb/kv"
 	"github.com/pingcap/tidb/session"
 	"github.com/pingcap/tidb/store/mockstore"
+	"github.com/pingcap/tipb/go-binlog"
 	"github.com/stretchr/testify/require"
 	"github.com/tikv/client-go/v2/oracle"
 	"github.com/tikv/client-go/v2/tikv"
+	"google.golang.org/grpc"
 )
 
 // CreateMockStore return a new mock kv.Storage.
@@ -35,6 +38,18 @@ func CreateMockStore(t testing.TB, opts ...mockstore.MockTiKVStoreOption) (store
 	return
 }
 
+type MockPumpClient struct{}
+
+// WriteBinlog is a mock method.
+func (m MockPumpClient) WriteBinlog(ctx context.Context, in *binlog.WriteBinlogReq, opts ...grpc.CallOption) (*binlog.WriteBinlogResp, error) {
+	return &binlog.WriteBinlogResp{}, nil
+}
+
+// PullBinlogs is a mock method.
+func (m MockPumpClient) PullBinlogs(ctx context.Context, in *binlog.PullBinlogReq, opts ...grpc.CallOption) (binlog.Pump_PullBinlogsClient, error) {
+	return nil, nil
+}
+
 // CreateMockStoreAndDomain return a new mock kv.Storage and *domain.Domain.
 func CreateMockStoreAndDomain(t testing.TB, opts ...mockstore.MockTiKVStoreOption) (kv.Storage, *domain.Domain, func()) {
 	store, err := mockstore.NewMockStore(opts...)

@D3Hunter
Copy link
Contributor

D3Hunter commented May 9, 2024

the reason is that we always use public columns in RemoveRecord when write binlog
Add/Update Record has no such issue

cols := t.Cols()

@seiya-annie
Copy link

/found customer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. component/ddl This issue is related to DDL of TiDB. report/customer Customers have encountered this bug. severity/major type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants