Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: data race between StdinPipe and Wait #9307

Closed
CSEMike opened this issue Dec 13, 2014 · 17 comments
Closed

os/exec: data race between StdinPipe and Wait #9307

CSEMike opened this issue Dec 13, 2014 · 17 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@CSEMike
Copy link

CSEMike commented Dec 13, 2014

I believe there is an unintentional race in os/exec between Command.Wait and file operations. I am using go version go1.3 linux/amd64, but the race also reproduces with 1.4

A test case follows.

package main

import (
  "fmt"
  "log"
  "os/exec"
)

func main() {
  cmd := exec.Command("/bin/echo")
  stdin, err := cmd.StdinPipe()
  if err != nil {
    log.Fatalf("StdinPipe: %v", err)
  }
  if err := cmd.Start(); err != nil {
    log.Fatalf("Start: %v", err)
  }
  wrote := make(chan bool)
  go func() {
    defer close(wrote)
    fmt.Fprint(stdin, "echo\n")
  }()
  if err := cmd.Wait(); err != nil {
    log.Fatalf("Wait: %v", err)
  }
  <-wrote
}

To observe the race, save this program to racedemo.go, and run:

go run -race racedemo.go

The result:

WARNING: DATA RACE
Read by goroutine 5:
  os.(*File).write()
      /usr/lib/google-golang/src/pkg/os/file_unix.go:211 +0xc5
  os.(*File).Write()
      /usr/lib/google-golang/src/pkg/os/file.go:139 +0xcb
  os/exec.(*closeOnce).Write()
      <autogenerated>:42 +0x81
  fmt.Fprint()
      /usr/lib/google-golang/src/pkg/fmt/print.go:223 +0xc3
  main.func·001()
      /usr/local/google/home/piatek/racedemo.go:21 +0x159

Previous write by main goroutine:
  os.(*file).close()
      /usr/lib/google-golang/src/pkg/os/file_unix.go:108 +0x19b
  os.(*File).Close()
      /usr/lib/google-golang/src/pkg/os/file_unix.go:97 +0x94
  os/exec.(*closeOnce).close()
      /usr/lib/google-golang/src/pkg/os/exec/exec.go:442 +0x45
  os/exec.*closeOnce.(os/exec.close)·fm()
      /usr/lib/google-golang/src/pkg/os/exec/exec.go:437 +0x33
  sync.(*Once).Do()
      /usr/lib/google-golang/src/pkg/sync/once.go:40 +0xb1
  os/exec.(*closeOnce).Close()
      /usr/lib/google-golang/src/pkg/os/exec/exec.go:437 +0x8e
  os/exec.(*Cmd).closeDescriptors()
      /usr/lib/google-golang/src/pkg/os/exec/exec.go:222 +0xa2
  os/exec.(*Cmd).Wait()
      /usr/lib/google-golang/src/pkg/os/exec/exec.go:367 +0x45c
  main.main()
      /usr/local/google/home/piatek/racedemo.go:23 +0x316

Goroutine 5 (running) created at:
  main.main()
      /usr/local/google/home/piatek/racedemo.go:22 +0x307

The race is on a reference to the file descriptor, which is assigned -1 upon close and read when writing. The documentation is not precise, but I suspect this race is unintentional.

The implementation of StdinPipe uses the internal closeOnce type to prevent concurrent calls to Close; see: http://golang.org/src/os/exec/exec.go#L436

Once wrapped, the file descriptor is added to the closeAfterWait slice, suggesting that concurrent calls to Close and Wait are to be expected. Yet, a concurrent Write will race as shown above. The partially concurrent API is odd.

I believe it is common for code to block on Wait in one goroutine while writing to StdinPipe from other goroutines. (Indeed, I found such a race in a library at Google, prompting my investigation.)

If this race is expected behavior, I suggest updating the package docs to mirror the warning for use of StdoutPipe, e.g., "It is incorrect to call Wait before all writes to the pipe have completed."

@minux
Copy link
Member

minux commented Dec 13, 2014

How could the race be prevented at the os/exec level?

Should we delay Wait(), possibly indefinitely, waiting for Write to
complete?
How do we even know that there won't be another Write() when we decide
to do the clean up?

@CSEMike
Copy link
Author

CSEMike commented Dec 13, 2014

Before speculating on how best to fix the problem, I would like to reach agreement on the intended semantics of the API. Specifically, are writes to StdinPipe intended to be concurrent with Wait?

@minux
Copy link
Member

minux commented Dec 13, 2014

It seems the only way to fix the data race is to override provide our own
implementation of io.ReadCloser. (Similar races exist for StdoutPipe and
StderrPipe, if we fix StdinPipe, we should also fix the other two.)

Either we lock the struct for Read/Close, or use a sync/atomic to load the
underlying fd.

I don't think we can continue to use os.(*File), because otherwise we will
have to take the lock for the entire duration of the method call to prevent
races on the fd. But that means you can't Close it while there is concurrent
Write.

so to fix this, we will need to implement the method ourselves using syscall
directly.

We need to weight the pros and cons of fixing this and the effort required
for the user to avoid such races.

Leave for others to decide.

@ianlancetaylor
Copy link
Member

Note that the race only occurs if the process exits before reading the data read on the input pipe.

@ianlancetaylor
Copy link
Member

This is related to issue #6817, in that if that issue were fixed this one would no longer be a problem, as os.Pipe would return an object that supports simultaneous Write/Close.

@minux
Copy link
Member

minux commented Dec 14, 2014

Yes, if fixing os.Pipe will fix the underlying problem. Unfortunately,
#6817 is much harder
a problem and require large scale changes. I'm still not sure how to make a
pollable
stat(2). High performance web servers seem to all workaround this problem
by having
separate worker threads for stat(2).

@ianlancetaylor
Copy link
Member

Yes, #6817 is hard. But, short of that, I'm not sure how hard to work to make a WriteCloser that supports simultaneous Write/Close only for the case where a program exits before reading from the input pipe. It's easy to construct such a case, but it doesn't seem like something that people are likely to do in real life.

Still, I just thought of one approach that should solve the problem at least for StdinPipe. In closeOnce.Close, call Fd to get the file descriptor and call syscall.Close (or syscall.CloseHandler) on the descriptor. Add closeOnce.Write that locks a mutex before writing. Have closeOnce.Close lock the mutex after calling syscall.Close. Trust that the kernel will unblock the write when the descriptor is closed.

@minux
Copy link
Member

minux commented Dec 14, 2014

On Sat, Dec 13, 2014 at 8:37 PM, Ian Lance Taylor [email protected]
wrote:

Still, I just thought of one approach that should solve the problem at
least for StdinPipe. In closeOnce.Close, call Fd to get the file descriptor
and call syscall.Close (or syscall.CloseHandler) on the descriptor. Add
closeOnce.Write that locks a mutex before writing. Have closeOnce.Close
lock the mutex after calling syscall.Close. Trust that the kernel will
unblock the write when the descriptor is closed.

Right. I think to fix the problem within the os/exec package, we must
implement the Write and Close
methods ourselves. Same for StdoutPipe and StderrPipe. It all depends on
whether the problem is
worth the trouble. Or we can just update the document.

A somewhat related issue is #9173. We don't know if the running child
process not consuming
all the write before exiting successfully is an error or not. If it is,
this data race is also an error.

@rsc
Copy link
Contributor

rsc commented Nov 24, 2015

I think I'd argue this is a dup of #7970, although it's more troubling, because you can actually demonstrate it in a simple program.

@rsc
Copy link
Contributor

rsc commented Dec 5, 2015

Too late to fix for Go 1.6.

@rsc rsc modified the milestones: Go1.7Early, Go1.6Early Dec 5, 2015
@odeke-em
Copy link
Member

For what it is worth, if this bug is blocking you, for a hot fix you can get around it by setting your own stdin before cmd.Start(). I ran into this bug two days back and it got me for about 18 hours until I realized that the race condition was caused because in cmd.Wait(), cmd.closeDescriptors() is invoked for the registered(internal) io.Pipe()s. The pipeReader that is returned for stdin is registered internally whenever you invoke cmd.StdinPipe() and will be closed on cmd.Wait(). The work around this is to assign a readCloser that you control and then close it on your own whenever you are done reading. Please see the patch below

--- orig.go 2016-02-11 03:44:56.000000000 -0800
+++ main.go 2016-02-11 03:44:57.000000000 -0800
@@ -2,23 +2,24 @@

 import (
    "fmt"
+   "io"
    "log"
    "os/exec"
 )

 func main() {
    cmd := exec.Command("/bin/echo")
-   stdin, err := cmd.StdinPipe()
-   if err != nil {
-       log.Fatalf("StdinPipe: %v", err)
-   }
+   pr, pw := io.Pipe()
+
+   cmd.Stdin = pr
    if err := cmd.Start(); err != nil {
        log.Fatalf("Start: %v", err)
    }
    wrote := make(chan bool)
    go func() {
        defer close(wrote)
-       fmt.Fprint(stdin, "echo\n")
+       fmt.Fprint(pw, "echo\n")
+       _ = pw.Close()
    }()
    if err := cmd.Wait(); err != nil {
        log.Fatalf("Wait: %v", err)

@odeke-em
Copy link
Member

@adg adg modified the milestones: Go1.8Early, Go1.7Early May 4, 2016
@adg
Copy link
Contributor

adg commented May 4, 2016

Missed again.

@rsc rsc added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Sep 26, 2016
@rsc rsc self-assigned this Sep 26, 2016
@rsc
Copy link
Contributor

rsc commented Sep 26, 2016

We need to decide what to do here.

@gopherbot
Copy link
Contributor

CL https://golang.org/cl/31148 mentions this issue.

@rsc
Copy link
Contributor

rsc commented Oct 18, 2016

os/exec already guards against "implicit Close during cmd.Wait racing against user calling w.Close". The CL I just sent expands that to guard against "implicit Close during cmd.Wait racing against user calling w.Write".

There is still a race between w.Close and w.Write, like there is on any os.File. I'm not solving #7970, just the os/exec case.

@rsc rsc added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. RaceReport labels Oct 18, 2016
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/65490 mentions this issue: os/exec: remove protection against simultaneous Wait/Write

gopherbot pushed a commit that referenced this issue Sep 22, 2017
CL 31148 added code to protect again simultaneous calls to Close and
Wait when using the standard input pipe, to fix the race condition
described in issue #9307. That issue is a special case of the race
between Close and Write described by issue #7970. Since issue #7970
was not fixed, CL 31148 fixed the problem specific to os/exec.

Since then, issue #7970 has been fixed, so the specific fix in os/exec
is no longer necessary. Remove it, effectively reverting CL 31148 and
followup CL 33298.

Updates #7970
Updates #9307
Updates #17647

Change-Id: Ic0b62569cb0aba44b32153cf5f9632bd1f1b411a
Reviewed-on: https://go-review.googlesource.com/65490
Run-TryBot: Ian Lance Taylor <[email protected]>
Reviewed-by: Daniel Martí <[email protected]>
Reviewed-by: Miguel Bernabeu <[email protected]>
Reviewed-by: Russ Cox <[email protected]>
Reviewed-by: Joe Tsai <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
@golang golang locked and limited conversation to collaborators Sep 22, 2018
@rsc rsc removed their assignment Jun 23, 2022
@golang golang unlocked this conversation Jun 21, 2023
@golang golang locked as resolved and limited conversation to collaborators Jun 21, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

9 participants