-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clean cached rlimit nofile in go runtime #4237
Conversation
8e3496c
to
63ebcf7
Compare
@ls-ggg can you do a rebase instead of a merge? I.e. do not add merge commits. Also, can we have a test case? |
b275b24
to
83eebdf
Compare
@kolyshkin I've done the rebase and added test cases. |
83eebdf
to
d0dbf90
Compare
OK, I took a deeper look and in fact this is already fixed in the main branch via #3813, which brings in this change: So, to my best understanding, this PR is not needed. For release-1.1 branch to fix this, we need something like #3813. |
Closing (not needed in main branch, opened #4239 for release-1.1 branch). |
Interestingly, runc-1.1.12 (as used in the original bug report, #4195) is compiled with Go 1.20.13: $ ./runc-1.1.12 --version
runc version 1.1.12
commit: v1.1.12-0-g51d5e946
spec: 1.0.2-dev
go: go1.20.13
libseccomp: 2.5.4 and I was able to reproduce the issue with this version. Found out that Go 1.20 has CL 476097 backported (as CL 478659), so Go 1.20 is also affected (since Go1.20.4). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should drop the test case entirely, as it takes up to 100000 execs (in my setup) to catch the issue, and we can't afford it in CI.
For some reason, my alternative approach (#4239) is not working. Testing this PR, I got this: $ RUNC=~/git/runc/runc-4237-go1.21; i=0; $RUNC --version && while [ $i -lt 100000 ]; do LIM=$(sudo $RUNC exec bionic sh -c "ulimit -n"); if [ $LIM -ne 65536 ]; then echo "WHOOPSIE (iter $i) got numfile $LIM"; break; fi; ((i%100==0)) && printf "[%s] %10d\n" "$(date)" "$i"; let i++; done
runc version 1.1.0+dev
commit: v1.1.0-994-g4641f17e-dirty
spec: 1.2.0
go: go1.21.7
libseccomp: 2.5.1
[Tue 02 Apr 2024 10:10:36 PM PDT] 0
[Tue 02 Apr 2024 10:10:38 PM PDT] 100
...
[Tue 02 Apr 2024 10:23:23 PM PDT] 42300
WHOOPSIE (iter 42333) got numfile 1048576 So there is still some race going on :( |
Note the number is now as go runtime sets it for itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With these changes, I would be okay with taking something like this as a hotfix but IMHO this is clearly a Go stdlib bug. We use prlimit(2)
to set another process's rlimits (to avoid permission issues) and it's not reasonable for Go to assume that this won't happen.
Most importantly, this fix does not work :-\ |
@kolyshkin |
d2c509f
to
bbb035c
Compare
rlimit := syscall.Rlimit{} | ||
if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rlimit); err != nil { | ||
return err | ||
} | ||
if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rlimit); err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I suggested in the earlier review, for correctness reasons, I think
rlimit := syscall.Rlimit{} | |
if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rlimit); err != nil { | |
return err | |
} | |
if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rlimit); err != nil { | |
return err | |
} | |
for _, rlimit := range l.config.Rlimits { | |
if rlimit.Type == unix.RLIMIT_NOFILE { | |
if err := unix.Setrlimit(rlimit.Type, &unix.Rlimit{ | |
Cur: rlimit.Soft, | |
Max: rlimit.Hard, | |
}); err != nil { | |
return fmt.Errorf("failed to re-apply nofile rlimit: %w", err) | |
} | |
} | |
} |
makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the proposed implementation is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But running the unit test failed and I encountered an error:
=== RUN TestInitJoinNetworkAndUser
exec_test.go:1543: unexpected error: unable to start container process: error during container init: failed to re-apply nofile rlimit: operation not permitte
My code just clears the cache in go runtime and has no meaning to the operating system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious about 2 things:
- I think the code provided by you and cyphar have the same effect, but I don't know why cyphar's code will be fail, could you please give some explainations? If yes, it will be appreciated.
- If I understand right, there is no similar bug for
runc run
, this bug is only inrunc exec
? If yes, insetns_init_linux
, I think we can useprocReady
msg to tell the parent process doSetrlimit
, like what is doing instandard_init_linux
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2. like what is doing in
standard_init_linux
.
runc/libcontainer/process_linux.go
Lines 775 to 781 in f9f8abf
case procReady: | |
seenProcReady = true | |
// set rlimits, this has to be done here because we lose permissions | |
// to raise the limits once we enter a user-namespace | |
if err := setupRlimits(p.config.Rlimits, p.pid()); err != nil { | |
return fmt.Errorf("error setting rlimits for ready process: %w", err) | |
} |
bbb035c
to
28c5edd
Compare
28c5edd
to
fa72652
Compare
fa72652
to
2b33989
Compare
As reported in issue opencontainers#4195, the new version of go runtime will cache rlimit-nofile. before executing exec, the rlimit-nofile of the process will be updated with the cache. in runc, this will cause the rlimit-nofile set by the parent process for the container to become invalid. this can be solved by clearing the cache. Signed-off-by: ls-ggg <[email protected]>
2b33989
to
f9f8abf
Compare
Can this mr be approved? |
As I said a month ago, this fix does not work. |
closing in favor of #4265, which incorporates this work. |
Fixes opencontainers#4237 Signed-off-by: Kir Kolyshkin <[email protected]>
fix:#4195
In a relatively new runc version, in other words a new go version (in my environment the go version is go1.20.13),
the runtime will cache the value of rlimit nofile, which will cause the container's rlimit nofile configuration to not take effect. This problem must occur when the customer configures the CAP_SYS_RESOURCE permission.
refer to:[commit](// golang/go@f5eef58#diff-ec665e9789f8cf5cd1828ad7fa9f0ff4ebc1f5b5dd0fc82a296da5c07da7ece6)
go runtime caches rlimit-nofile during startup phase
and before exec,runtime will reset rlimit-nofile.