br restore hangs sometimes #54140

Closed
fubinzh opened this issue Jun 20, 2024 · 3 comments · Fixed by #54790
Labels
component/br This issue is related to BR of TiDB. severity/moderate type/bug The issue is confirmed as a bug.

Comments

fubinzh commented Jun 20, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  • Run br restore; sometimes br restore hangs.

2. What did you expect to see? (Required)

br restore does not hang

3. What did you see instead (Required)

br restore hangs

4. What is your TiDB version? (Required)

[root@br-0 /]# /br -V
Release Version: v8.2.0-alpha
Git Commit Hash: 26d1096
Git Branch: heads/refs/tags/v8.2.0-alpha
Go Version: go1.21.10
UTC Build Time: 2024-06-17 11:39:33
Race Enabled: false

@fubinzh fubinzh added the type/bug The issue is confirmed as a bug. label Jun 20, 2024
Author

fubinzh commented Jun 20, 2024

/component br

@ti-chi-bot ti-chi-bot bot added the component/br This issue is related to BR of TiDB. label Jun 20, 2024
Contributor

Leavrth commented Jun 21, 2024

Using dlv, we can find the main goroutine and two other related goroutines.

Goroutine 1 - User: /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/progress.go:71 github.com/pingcap/tidb/br/pkg/utils.(*ProgressPrinter).Close (0x46333a5) [chan receive 9303735014558886]
...
Goroutine 876 - User: /usr/local/go/src/runtime/internal/syscall/syscall_linux.go:38 syscall.RawSyscall6 (0x208bbad) (thread 397) [select 9302713797155187]
Goroutine 877 - User: /go/pkg/mod/github.com/cheggaaa/pb/[email protected]/pb.go:395 github.com/cheggaaa/pb/v3.(*ProgressBar).Finish (0x462702f) [chan send 9303735014558886]
...

Goroutine 1 is blocked waiting on Goroutine 877, and Goroutine 877 is in turn blocked waiting on Goroutine 876.
The full stack of Goroutine 876 shows that it is stuck in the write syscall, writing bytes to stderr.

(dlv) stack
 0  0x000000000208bbce in runtime/internal/syscall.Syscall6
    at /usr/local/go/src/runtime/internal/syscall/asm_linux_amd64.s:36
 1  0x000000000208bbad in syscall.RawSyscall6
    at /usr/local/go/src/runtime/internal/syscall/syscall_linux.go:38
 2  0x0000000002164346 in syscall.Syscall
    at /usr/local/go/src/syscall/syscall_linux.go:82
 3  0x00000000021623db in syscall.write
    at /usr/local/go/src/syscall/zsyscall_linux_amd64.go:949
 4  0x000000000218429f in syscall.Write
    at /usr/local/go/src/syscall/syscall_unix.go:209
 5  0x000000000218429f in internal/poll.ignoringEINTRIO
    at /usr/local/go/src/internal/poll/fd_unix.go:736
 6  0x000000000218429f in internal/poll.(*FD).Write
    at /usr/local/go/src/internal/poll/fd_unix.go:380
        fd = ("*internal/poll.FD")(0xc0001760c0)
        p = []uint8 len: 100, cap: 128, [...]
        ~r0 = (unreadable empty OP stack)
        ~r0 = (unreadable empty OP stack)
        ~r1 = (unreadable empty OP stack)
        ~r1 = (unreadable empty OP stack)
        fd = 2                               -> It is stuck writing bytes to stderr.
 7  0x000000000218e0f1 in os.(*File).write
    at /usr/local/go/src/os/file_posix.go:46
 8  0x000000000218e0f1 in os.(*File).Write
    at /usr/local/go/src/os/file.go:183
 9  0x00000000021b119b in bytes.(*Buffer).WriteTo
    at /usr/local/go/src/bytes/buffer.go:261
10  0x0000000002475f45 in github.com/mattn/go-colorable.(*NonColorable).Write
    at /go/pkg/mod/github.com/mattn/[email protected]/noncolorable.go:26
11  0x000000000462646e in github.com/cheggaaa/pb/v3.(*ProgressBar).write
    at /go/pkg/mod/github.com/cheggaaa/pb/[email protected]/pb.go:229
        pb = ("*github.com/cheggaaa/pb/v3.ProgressBar")(0xc001790620)
        finish = false                       -> It was already stuck before br closed the progress bar.
        width = 100
        result = "..."
12  0x0000000004625fea in github.com/cheggaaa/pb/v3.(*ProgressBar).writer
    at /go/pkg/mod/github.com/cheggaaa/pb/[email protected]/pb.go:186
        pb = ("*github.com/cheggaaa/pb/v3.ProgressBar")(0xc001790620)
        finish = chan struct {} 0/0
13  0x0000000004625f25 in github.com/cheggaaa/pb/v3.(*ProgressBar).Start.func2
    at /go/pkg/mod/github.com/cheggaaa/pb/[email protected]/pb.go:178
14  0x00000000020fc521 in runtime.goexit
    at /usr/local/go/src/runtime/asm_amd64.s:1650

It looks like it is simply stuck writing bytes to stderr.

type FD struct {
  fdmu type fdMutex struct {
    // (dlv) x -fmt hex -size 8 -x 0xc0001760c0
    // 0xc0001760c0:   0x000000000000000c
    // 0xc = mutexWLock | mutexRef
    state uint64
    // (dlv) x -fmt hex -size 8 -x 0xc0001760c0 + 8
    // 0xc0001760c8:   0x0000000000000000
    rsema uint32
    wsema uint32
    } // + 16
  // ...
}
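To read the state word 0xc from the memory dump, it helps to decode it by bit field. The sketch below copies the bit layout as I understand it from Go's `internal/poll/fd_mutex.go`; treat the constants as an assumption and verify them against the Go version in use (go1.21.10 here). The `fdState` type and `decode` function are hypothetical helpers, not part of the Go runtime.

```go
package main

import "fmt"

// Bit layout assumed from internal/poll/fd_mutex.go.
const (
	mutexClosed  = 1 << 0
	mutexRLock   = 1 << 1
	mutexWLock   = 1 << 2
	mutexRef     = 1 << 3
	mutexRefMask = (1<<20 - 1) << 3
	mutexRWait   = 1 << 23
	mutexRMask   = (1<<20 - 1) << 23
	mutexWWait   = 1 << 43
	mutexWMask   = (1<<20 - 1) << 43
)

// fdState is a readable view of the packed fdMutex state word.
type fdState struct {
	Closed, RLock, WLock     bool
	Refs, RWaiters, WWaiters uint64
}

// decode splits the state word into its flags and counters.
func decode(state uint64) fdState {
	return fdState{
		Closed:   state&mutexClosed != 0,
		RLock:    state&mutexRLock != 0,
		WLock:    state&mutexWLock != 0,
		Refs:     (state & mutexRefMask) / mutexRef,
		RWaiters: (state & mutexRMask) / mutexRWait,
		WWaiters: (state & mutexWMask) / mutexWWait,
	}
}

func main() {
	// 0xc is the value read at 0xc0001760c0 in the dlv session.
	fmt.Printf("%+v\n", decode(0xc))
}
```

Under these assumptions, 0xc means the write lock is held with one reference and no queued readers or writers. That matches a single goroutine sitting inside Write, not contention on the file descriptor.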

In addition, the k8s front-ends were closed in these cases, so stderr seems to be full.
As a workaround, we can read BR's stderr, after which br exits quickly.

tail -n 1 /proc/$(pgrep br)/fd/2

Author

fubinzh commented Jun 21, 2024

/severity moderate
