[core/process] Fix/Implement redirections <>, 5>&-, 6>&5-, {fd}>file, etc. #672
This is great! It probably fixes some leaks like #503, and #223 is related. This part of the code definitely needed some attention. As mentioned I don't really use these constructs and I have trouble reading them.
As for the schema, I think a sum type like dest = Name(Token name) | Num(int fd) might be cleaner, but I think that can wait for another commit. Right now it may be tricky to share types between two different ASDL files -- I don't remember exactly.
Lately I tend to preserve the Token rather than using string, because it gives better syntax and runtime errors, but that's also a small issue.
I made a couple minor comments and then I think it's ready to merge and we can keep testing more stuff on top of this.
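To make the suggested sum type concrete, here is a minimal sketch in plain Python rather than ASDL-generated code. The class names Name and Num follow the suggestion above; the Token class, its fields, and the describe helper are hypothetical stand-ins, not Oil's actual types.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a lexer token: its text plus a span id."""
    val: str
    span_id: int


@dataclass(frozen=True)
class Name:
    """Descriptor given as a variable name, as in {fd}>file."""
    name: Token


@dataclass(frozen=True)
class Num:
    """Descriptor given as a literal number, as in 5>file."""
    fd: int


# The sum type: a destination is exactly one of the two variants.
Dest = Union[Name, Num]


def describe(dest: Dest) -> str:
    """Dispatch on the variant; keeping the Token lets errors cite a location."""
    if isinstance(dest, Name):
        return "variable %s (span %d)" % (dest.name.val, dest.name.span_id)
    return "descriptor %d" % dest.fd
```

With this shape, a separate sentinel for "no number given" is not needed to distinguish the named case, because the variant itself carries that information.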
osh/cmd_exec.py
fd = consts.RedirDefaultFd(n.op.id) if n.fd == runtime.NO_SPID else n.fd
fd = n.fd
fd_name = n.fd_name
if fd == runtime.NO_SPID and not fd_name:
NO_SPID is supposed to only be for span_id. Maybe make another constant NO_FD? Or -1 might be OK too.
Thank you! I defined process.NO_FD in 1caf56d. If you think NO_FD should be defined in a different place, please let me know.
frontend/lexer_def.py
@@ -303,6 +303,17 @@ def IsKeyword(name):
R(r'[0-9]*<>', Id.Redir_LessGreat),
R(r'[0-9]*>\|', Id.Redir_Clobber),

R(r'\{[_a-zA-Z][_a-zA-Z0-9]*\}<', Id.Redir_Less),
I think this is identical to VAR_NAME_RE, right? If so let's use it, maybe with an intermediate like:
FD_VAR_NAME = '\{' + VAR_NAME_RE + '\}'
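The composition suggested above can be sketched with Python's re module. The value of VAR_NAME_RE below is an assumption (the real definition lives in the lexer and may differ), and NAMED_REDIR and split_named_redir are hypothetical names used only for this illustration.

```python
import re

# Assumed definition; the lexer's real VAR_NAME_RE may differ.
VAR_NAME_RE = r'[a-zA-Z_][a-zA-Z0-9_]*'

# Intermediate pattern for the {name} prefix. Raw strings keep the
# backslash-escaped braces free of invalid-escape warnings.
FD_VAR_NAME = r'\{' + VAR_NAME_RE + r'\}'

# Hypothetical combined pattern covering only the '<' and '>' operators.
NAMED_REDIR = re.compile('(' + FD_VAR_NAME + ')(<|>)')


def split_named_redir(text):
    """Split '{fd}>' into ('fd', '>'); return None for anything else."""
    m = NAMED_REDIR.fullmatch(text)
    if m is None:
        return None
    # Strip the surrounding braces from the captured prefix.
    return m.group(1)[1:-1], m.group(2)
```

Reusing one name pattern keeps the redirect rules in step with ordinary variable names if that pattern ever changes.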
Thank you! I defined FD_VAR_NAME in cc491ff.
Second thought: maybe we should just have And then in the evaluator can just check But if you don't think that will simplify the code, it's not that important.
Thank you for all your review! Actually initially I thought about a similar one
Thank you for the suggestion! I created commits for this suggestion (74f9d28 for I updated the default fd for
OK there is one issue about FDs over 9 that I didn't pick up in the original review. Let me know what you think.
else:
fd = runtime.NO_SPID
index = 0
while index < len(op_tok.val) and op_tok.val[index].isdigit():
Important issue: I think we need a limit on user descriptors so they don't collide with descriptors the shell uses? Mentioned here:
https://www.aosabook.org/en/bash.html
The other complication is one bash brought on itself. Historical versions of the Bourne shell allowed the user to manipulate only file descriptors 0-9, reserving descriptors 10 and above for the shell's internal use. Bash relaxed this restriction, allowing a user to manipulate any descriptor up to the process's open file limit. This means that bash has to keep track of its own internal file descriptors, including those opened by external libraries and not directly by the shell, and be prepared to move them around on demand. This requires a lot of bookkeeping, some heuristics involving the close-on-exec flag, and yet another list of redirections to be maintained for the duration of a command and then either processed or discarded.
Are you using descriptors over 9? I haven't seen shell scripts using more, and POSIX only guarantees up to 9 I think.
Maybe we can limit it to 99 instead of 9? But if there's no usage I would keep it at 9 until we encounter it.
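One way the proposed limit could be enforced is while splitting the leading digits off the operator token. This is a sketch only: MAX_USER_FD and parse_redir_fd are hypothetical names, the limit of 9 is the value floated in this comment, and the operator text is assumed to be ASCII as produced by the lexer.

```python
MAX_USER_FD = 9  # assumed limit from the discussion; 99 was also floated


def parse_redir_fd(op_text, max_fd=MAX_USER_FD):
    """Split '2>' into (2, '>').

    Returns (None, op_text) when the operator has no leading descriptor,
    and raises ValueError when the descriptor exceeds max_fd.
    """
    index = 0
    while index < len(op_text) and op_text[index].isdigit():
        index += 1
    if index == 0:
        return None, op_text
    fd = int(op_text[:index])
    if fd > max_fd:
        raise ValueError(
            'file descriptor %d is above the user limit %d' % (fd, max_fd))
    return fd, op_text[index:]
```

Raising the limit later would then be a one-constant change, for example parse_redir_fd('20>', max_fd=99).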
core/process.py
class FdState(object):
"""This is for the current process, as opposed to child processes.

For example, you can do 'myfunc > out.txt' without forking.
"""
def __init__(self, errfmt, job_state):
# type: (ErrorFormatter, JobState) -> None
def __init__(self, errfmt, job_state, mem = None):
mem=None here
Thank you! I'm sorry, this was an oversight. Fixed in 704ba1e.
One thing that is annoying is that the use of unrestricted descriptors means you can't link arbitrary libraries into the shell that do I/O! Like simply For example, I thought it would be cool to link in
Thinking about this more: I would say if ble.sh does not use descriptors over 9, let's just keep it at a single digit, and add a TODO to revisit that code later. The other changes are more important. If it does use descriptors over 9, we can discuss what the right restriction is.
You don't have to worry about the collisions because Oil has already completely implemented the above-mentioned "a lot of bookkeeping, some heuristics involving the close-on-exec flag, and yet another list of redirections to be maintained for the duration of a command and then either processed or discarded", and it should work without problems now. The explanation above is written in a way that makes it look like a highly non-trivial matter, but that's an exaggeration. If you implement it carefully, there is no problem.
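The bookkeeping described here can be sketched roughly as follows. This is not Oil's FdState code: redirect, restore, and SHELL_FD_BASE are hypothetical names, the base of 10 is an assumption, and fcntl.F_DUPFD_CLOEXEC is assumed to be available on the platform (it is on Linux). The idea is to back up the original descriptor at a high number with close-on-exec set, then undo the redirections in reverse order.

```python
import fcntl
import os

SHELL_FD_BASE = 10  # assumption: backups live at descriptor 10 and above


def redirect(target_fd, path, saved):
    """Point target_fd at path and record how to undo it in `saved`."""
    try:
        # Lowest free descriptor >= SHELL_FD_BASE, with close-on-exec set
        # so child processes never see the backup copy.
        backup = fcntl.fcntl(target_fd, fcntl.F_DUPFD_CLOEXEC, SHELL_FD_BASE)
    except OSError:
        backup = None  # target_fd was not open, so restoring means closing it
    new_fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    if new_fd != target_fd:
        os.dup2(new_fd, target_fd)
        os.close(new_fd)
    saved.append((target_fd, backup))


def restore(saved):
    """Undo redirections in reverse order, the most recent one first."""
    while saved:
        target_fd, backup = saved.pop()
        if backup is None:
            os.close(target_fd)
        else:
            os.dup2(backup, target_fd)
            os.close(backup)
```

The reverse order matters when the same descriptor is redirected twice: restoring oldest-first would leave it pointing at the intermediate file instead of the original one.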
This is a good question. Actually I think shell libraries/frameworks inevitably use larger values of file descriptors (as far as they don't require the redirection of the form
Ah, OK... It's non-trivial. But can we control the range of the file descriptors used in those libraries? If we can't control it, the problem wouldn't be changed even if we restrict the available range to 0-9. For safe programming, I think one should always use the form
OK yeah I guess you're right, as long as the program always uses I had to learn things about colliding file descriptors the hard way as described here [1], even though I think I had already read that bit about bash. I agree that I think it would be nice to have a better syntax for this, e.g. basically expose
I don't know exactly why I need this, but I noticed that Travis' fd state is not clean, or at least not the same as the bash on my desktop. Related to PR #672.
FWIW this test passed on my machine, but on Travis bash did NOT fail, so the test failed. I guess this is the same problem where file descriptors are hidden global state that is not specified when opening a program. In Oil I would deprecate it for #673 and that's another reason to avoid direct FD manipulation in shell (again http://www.oilshell.org/blog/2017/08/12.html). (Although obviously in this case we need it for the test)
Only other problem: this JSON test case is now failing, and I'm able to reproduce it on my machine: http://travis-ci.oilshell.org/jobs/2020-03-21__19-53-14.wwz/_tmp/spec/oil-json.html
Looking into it ... Oh it has to be because of this, not sure exactly why but I'll look at it later... https://github.com/oilshell/oil/blob/master/oil_lang/builtin_oil.py#L288
I guess
OK that was a quicker fix than I expected. The issue was that the local file returned by This kind of thing shouldn't be an issue when it's translated to C++, and is another good reason to translate it. BTW I'm not sure if I think it's easier to reason about the edge conditions and
1. 328e636 Support redirections 5<>, 6>&5- and 6>&-
2. 7ea281a Update test
The arguments don't match the function declaration? I guess the test is old. If this is not the right fix, please let me know.
3. 8dd5a02 Add tests for redirections
This demonstrates bugs related to the subsequent fixes (4., 5., and 6.) and also includes tests for 20> file and a Bash bug.
4. 10136ce Fix "Bad file descriptor" on 3>&3
This is a bug fix. The file descriptor was broken with echo 3>&3. The following test case (in spec/redirect.test.sh) had been failing due to this bug.
5. 17e2f05 Fix fd-restoring order in FdState.Pop
This is another fix. The order of restoring the original fds in saved and need_close was mixed up. The ordering should be preserved. The following test case (in spec/redirect.test.sh) had failed due to this problem:
6. f5c2178 Fix fd-leak on : 5> file
See the following test case. The file descriptor 9 was still alive after the end of the command.
7. 0af7781 Fix fd-leak on exec 5>&1
The saved fds are not released on FdState.MakePermanent(). See the following example. In Oil, as many fds remained open as there were invocations of exec 5>&1.
I didn't add a test to spec/redirect.test.sh because I don't know how to make it portable (/proc/self/fd is available in Linux but not portable). I tried to use ulimit -n 100 for this, but it turned out that the limit cannot be changed in Oil (see below). Maybe I could have implemented the test case to generate as many fds as $(ulimit -n), but I'm afraid that $(ulimit -n) might be extremely large on some systems (but I don't know actually).
8. da16c45 Support redirection {fd}>, {fd}<, etc.
Also, it includes the support of more-than-single-digit redirections such as 20> file. Actually, I'm not familiar with ASDL, so maybe I'm doing weird things although it works as expected superficially. I would appreciate it if you could check this commit in detail.
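For readers unfamiliar with the {fd}>file form from item 8: in Bash, the shell picks a free descriptor numbered 10 or higher and stores that number in the named variable. The sketch below imitates that behavior with fcntl's F_DUPFD. It is not the code from da16c45; open_named_fd and the variables dictionary are hypothetical stand-ins for the shell's variable store.

```python
import fcntl
import os


def open_named_fd(path, variables, name, min_fd=10):
    """Open path for writing on a descriptor >= min_fd and record the
    chosen number under `name`, roughly as {name}>path does in Bash."""
    tmp = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # F_DUPFD returns the lowest free descriptor that is >= min_fd.
        fd = fcntl.fcntl(tmp, fcntl.F_DUPFD, min_fd)
    finally:
        os.close(tmp)  # only the high-numbered copy stays open
    variables[name] = str(fd)  # shell variables hold strings
    return fd
```

A script would then write through the stored number and close it explicitly when done, which is the part that is easy to leak.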