add index, lightning, or import into fails with error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable'” when the PD leader is killed #8142

Closed
Lily2025 opened this issue May 7, 2024 · 4 comments · Fixed by #8216 or pingcap/tidb#53718
Assignees
Labels
affects-8.1 This bug affects the 8.1.x(LTS) versions. severity/major type/bug The issue is confirmed as a bug.

Comments

Lily2025 commented May 7, 2024

Bug Report

What did you do?

1. Run add index.
2. Kill the PD leader.

What did you expect to see?

add index succeeds.

What did you see instead?

add index failed with the error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable'”.

What version of PD are you using (pd-server -V)?

./pd-server -V
Release Version: v8.2.0-alpha
Edition: Community
Git Commit Hash: 1679dbc
Git Branch: heads/refs/tags/v8.2.0-alpha
UTC Build Time: 2024-04-30 11:39:01

@Lily2025 Lily2025 added the type/bug The issue is confirmed as a bug. label May 7, 2024
@Lily2025 Lily2025 changed the title add add index failed with error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable” when kill pd leader May 7, 2024

Lily2025 commented May 7, 2024

/type bug
/severity major
/assign rleungx

Lily2025 commented May 7, 2024

/assign JmPotato

Lily2025 commented May 7, 2024

lightning also failed, with the error “request pd http api failed with status: '500 Internal Server Error'”, when the PD leader was killed.

lightning logs:
[2024/05/06 15:42:15.072 +00:00] [ERROR] [client.go:234] ["[pd] request failed with a non-200 status"] [source=lightning] [name=GetStores] [url=http://tc-pd-2.tc-pd-peer.ha-test-lightning-tps-7575769-1-386.svc:2379/pd/api/v1/stores] [method=GET] [caller-id=pd-http-client] [status="500 Internal Server Error"] [body="[PD:apiutil:ErrRedirect]redirect failed\n"] [stack="github.com/tikv/pd/client/http.(*clientInner).doRequest\n\t/go/pkg/mod/github.com/tikv/pd/[email protected]/http/client.go:234\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry.func1\n\t/go/pkg/mod/github.com/tikv/pd/[email protected]/http/client.go:139\ngithub.com/tikv/pd/client/retry.(*Backoffer).Exec\n\t/go/pkg/mod/github.com/tikv/pd/[email protected]/retry/backoff.go:78\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry\n\t/go/pkg/mod/github.com/tikv/pd/[email protected]/http/client.go:160\ngithub.com/tikv/pd/client/http.(*client).request\n\t/go/pkg/mod/github.com/tikv/pd/[email protected]/http/client.go:379\ngithub.com/tikv/pd/client/http.(*client).GetStores\n\t/go/pkg/mod/github.com/tikv/pd/[email protected]/http/interface.go:403\ngithub.com/pingcap/tidb/pkg/lightning/tikv.ForAllStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/pkg/lightning/tikv/tikv.go:103\ngithub.com/pingcap/tidb/pkg/lightning/backend/local.(*switcher).switchTiKVMode\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/pkg/lightning/backend/local/tikv_mode.go:69\ngithub.com/pingcap/tidb/pkg/lightning/backend/local.(*switcher).ToImportMode\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/pkg/lightning/backend/local/tikv_mode.go:53\ngithub.com/pingcap/tidb/lightning/pkg/importer.(*Controller).buildRunPeriodicActionAndCancelFunc.func5\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/lightning/pkg/importer/import.go:1001"]
[2024/05/06 15:42:16.034 +00:00] [ERROR] [client.go:206] ["[pd] do http request failed"] [source=lightning] [name=SetRegionLabelRule] [url=http://tc-pd-1.tc-pd-peer.ha-test-lightning-tps-7575769-1-386.svc:2379/pd/api/v1/config/region-label/rule] [method=POST] [caller-id=pd-http-client] [error="Post \"http://tc-pd-1.tc-pd-peer.ha-test-lightning-tps-7575769-1-386.svc:2379/pd/api/v1/config/region-label/rule\": dial tcp 10.200.72.140:2379: connect: connection refused"] [stack="github.com/tikv/pd/client/http.(*clientInner).doRequest\n\t/go/pkg/mod/github.com/tikv/pd/[email protected]/http/client.go:206\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry.func1\n\t/go/pkg/mod/github.com/tikv/pd/[email protected]/http/client.go:139\ngithub.com/tikv/pd/client/retry.(*Backoffer).Exec\n\t/go/pkg/mod/github.com/tikv/pd/[email protected]/retry/backoff.go:78\ngithub.com/tikv/pd/client/http.(*clientInner).requestWithRetry\n\t/go/pkg/mod/github.com/tikv/pd/[email protected]/http/client.go:160\ngithub.com/tikv/pd/client/http.(*client).request\n\t/go/pkg/mod/github.com/tikv/pd/[email protected]/http/client.go:379\ngithub.com/tikv/pd/client/http.(*client).SetRegionLabelRule\n\t/go/pkg/mod/github.com/tikv/pd/[email protected]/http/interface.go:691\ngithub.com/pingcap/tidb/br/pkg/pdutil.pauseSchedulerByKeyRangeWithTTL\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/pdutil/pd.go:702\ngithub.com/pingcap/tidb/br/pkg/pdutil.PauseSchedulersByKeyRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/pdutil/pd.go:670\ngithub.com/pingcap/tidb/pkg/lightning/backend/local.(*Backend).ImportEngine\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/pkg/lightning/backend/local/local.go:1291\ngithub.com/pingcap/tidb/pkg/lightning/backend.(*ClosedEngine).Import\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/pkg/lightning/backend/backend.go:373\ngithub.com/pingcap/tidb/lightning/pkg/importer.(*TableImporter).importKV\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/lightning/pkg/importer/table_import.go:1346\ngithub.com/pingcap/tidb/lightning/pkg/importer.(*TableImporter).importEngine\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/lightning/pkg/importer/table_import.go:920\ngithub.com/pingcap/tidb/lightning/pkg/importer.(*TableImporter).importEngines.func3\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/lightning/pkg/importer/table_import.go:526"]
[2024/05/06 15:42:16.060 +00:00] [ERROR] [backend.go:378] ["import failed"]

@Lily2025 Lily2025 changed the title add index failed with error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable” when kill pd leader add index or lightning failed with error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable” when kill pd leader May 20, 2024
@Lily2025 Lily2025 changed the title add index or lightning failed with error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable” when kill pd leader add index or lightning or import into failed with error “Error 1105 (HY000): request pd http api failed with status: '503 Service Unavailable” when kill pd leader May 24, 2024
@JmPotato JmPotato added affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. and removed may-affects-5.4 may-affects-6.1 may-affects-6.5 may-affects-7.1 may-affects-7.5 may-affects-8.1 labels May 27, 2024
ti-chi-bot bot pushed a commit that referenced this issue May 27, 2024
…ctor (#8216)

close #8142

Add retry logic to improve PD HTTP request forwarding success rate during PD leader switch.

Signed-off-by: JmPotato <[email protected]>

JmPotato commented May 30, 2024

This bug was introduced by #7896: because the backoffer collects the error from every attempt with `multierr`, it returns an error even if the request eventually succeeds after retries.
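
A minimal sketch of this failure mode, for illustration only. It is not PD's `Backoffer`: `execAccumulating` and `flaky` are hypothetical names, and the standard library's `errors.Join` (Go 1.20+) stands in for `multierr`.

```go
package main

import (
	"errors"
	"fmt"
)

// execAccumulating is a hypothetical retry loop, not PD's actual code.
// It joins the error from every attempt into one combined error.
func execAccumulating(attempts int, fn func() error) error {
	var all error
	for i := 0; i < attempts; i++ {
		err := fn()
		if err == nil {
			// The problem being illustrated: the accumulated history is
			// returned even though this attempt succeeded.
			return all
		}
		all = errors.Join(all, err)
	}
	return all
}

// flaky returns a function that fails its first n calls, then succeeds.
func flaky(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return fmt.Errorf("attempt %d: 503 Service Unavailable", calls)
		}
		return nil
	}
}

func main() {
	// The third attempt succeeds, yet the caller still sees an error.
	err := execAccumulating(5, flaky(2))
	fmt.Println("returned error is nil:", err == nil)
}
```

Here the third attempt succeeds, but the caller still receives a non-nil error built from the first two failures, so it treats the whole request as failed.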

@JmPotato JmPotato reopened this May 30, 2024
@JmPotato JmPotato removed affects-5.4 This bug affects the 5.4.x(LTS) versions. affects-6.1 This bug affects the 6.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. labels May 30, 2024
ti-chi-bot bot added a commit that referenced this issue May 31, 2024
ref #8142

Returning historical errors causes the client's retry logic to fail, and we
currently do not need to obtain all errors during retries, so this PR
removes `multierr` from the backoffer and adds tests to ensure the correctness of the retry logic.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue May 31, 2024
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue May 31, 2024
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Jul 31, 2024
ti-chi-bot bot pushed a commit that referenced this issue Aug 1, 2024
…ctor (#8216) (#8466)

close #8142

Add retry logic to improve PD HTTP request forwarding success rate during PD leader switch.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: JmPotato <[email protected]>
ti-chi-bot bot pushed a commit that referenced this issue Aug 6, 2024
ref #8142, close #8499

Returning historical errors causes the client's retry logic to fail, and we
currently do not need to obtain all errors during retries, so this PR
removes `multierr` from the backoffer and adds tests to ensure the correctness of the retry logic.

Signed-off-by: JmPotato <[email protected]>

Co-authored-by: JmPotato <[email protected]>
3 participants