-
-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
unset should unshadow variables higher on the stack (at least for nonlocals) #706
Comments
@akinomyoga Let's talk about the I think if we can reconcile our test cases / results it will be easier to see the solution |
Also, if it helps, we could upgrade the version in And here are the 4 test cases that fail with the new behavior. But they aren't set in stone. I mainly want to see if we're missing any cases before figuring out what to do http://travis-ci.oilshell.org/jobs/2020-04-15__02-09-39.wwz/_tmp/spec/osh.html |
Still need to figure out what the right behavior is, but this makes spec tests pass again.
That is hysterical. I can't see from the test results... what's your test case for temp bindings and/or locals and unset? On my tests for nested function calls with temporary bindings: f() {
x=$((x+1))
unset x
}
g() {
x=$((x+1)) f
post_f=$x
}
h() {
orig=$x
x=$((x+1)) g
post_g=$x
case $x in
( $((orig+3)) ) echo "$1 UNNESTED VALUES (BUT NESTED UNSET)" ;;
( ${orig} ) echo "$1 NESTED VALUES WITH NESTED UNSET" ;;
( "" ) echo "$1 HAS GLOBAL UNSET" ;;
( * ) echo "unexpected value: '$x'";;
esac
}
x=0
h GLOBAL
x=0 h LOCAL In bash, dash, Smoosh, and zsh, I get Only a very strange implementation would treat temp bindings and locals differently, but you could, I guess? Nobody does, as far as I can tell. (I didn't test osh or bosh or other shells.) |
Yeah I feel like there should be a consistent rule, but I haven't reconciled that with the test results yet. FWIW this is the page with some idioms that rely on subtle bash behavior: https://www.fvue.nl/wiki/Bash:_Passing_variables_by_reference I think we have to come up with some better test cases ... Also in Koichi's table on #653, there is sometimes a different in behavior on whether the variable being unset is in the current scope (local), or in a frame above (dynamic scope). I would like to avoid those special cases if possible... Does Smoosh have the concept of a "cell"? That is, it's a location for a value plus its metadata. I think it must, because you can have the I don't know if POSIX mentions anything like that. But the two different behaviors we are experimenting with is if As far as I remember, that was the exact same distinction that we played with back in 2019 for the temp bindings... But I don't remember it more clearly. Anyway I think it will be possible to get to the bottom of this but I just wanted to loop you in, in case you had an opinion. Here's a more concrete question: of the two different behaviors you observed, is one of them POSIX compliant and the other isn't, or does POSIX simply not specify it? |
From Section 2.9.1 of the POSIX spec:
Unfortunately, none of this has any bearing on On the following script, bash/dash/zsh/Smoosh all yield "reverts to global", while other yash/ksh/mksh yield "overrides": ORIG=5
NEW=6
OVERRIDE=10
f() {
if [ "$x" = "$ORIG" ]
then
echo "unshadowed"
fi
x=$OVERRIDE
}
x=$ORIG
x=$NEW f
if [ "$x" = "$ORIG" ]
then
echo "reverts to global"
elif [ "$x" = "$OVERRIDE" ]
then
echo "overrides"
elif [ "$x" = "$NEW" ]
then
echo "reverts to temp binding"
else
echo "unexpected value '$x'"
fi As for Smoosh, you can see the binding code in os.lem. The logic of where metadata is stored is... unhappy. Locals and globals are treated differently. Local environments store a possible value for the string (where The top-level global environments have (a) a map defining the values of variables (unset values aren't in the map, set values map to possibly empty symbolic strings), (b) a set of names marked for export, and (c) a set of names marked readonly. So morally, yes, I have "cells". Notably, they're not mutable---nothing is! That's why smoosh is sooooooo slow (4x off other shells written in C, haven't compared to OSH). |
We don't set the cell to Undef. Changed assertions in 'spec/assign'. We no longer match dash/zsh in most cases. - Case 24 we have unique behavior, but we always did (/) - We match bash with one case - We match mksh with two cases Addresses issue #706. This is sort of an experiment. Let's see if ble.sh runs. Either way the semantics are simple, and this appears to be closer to what it wants.
@akinomyoga I changed Oil back to "delete the cell", which makes one of the test cases you gave pass. Can you test it out more and tell me if the rest of ble.sh works? Adding more test cases in If this works I think it's fine as the behavior, since POSIX doesn't seem to address it, and shells diverge a lot anyway. This is apparently the part of shell where shells disagree the most. I believe the only reason we had it this way is because zsh and dash seemed to mostly agree, not because any program needed the behavior. This makes Oil behave more like bash and mksh, and less like dash and zsh. At least for the test cases I have. I'm not that confident they're comprehensive though... Thanks @mgree for looking into it and for the background. Oil still prints "reverts to global" on that case. |
I'm sorry, it took much time to investigate the behavior of these temporary bindings ( 1. Bug in Bash-4.3..5.0 and tempenv/local/unsetI was searching inside the Bash source code (where I found that these temporary bindings are called Details of Bug#!/bin/bash
# bash 4.3..5.0 bug
f1() {
local v=local
unset -v v # this is local-scope unset
echo "$v"
}
v=global
v=tempenv f1
# Results:
# Bash 2.05b-4.2 outputs "v: (unset)"
# Bash 4.3-5.0 outputs "v: global"
# Bash devel branch outputs "v: (unset)" The fix was made just about two months ago in the commit f65f3d54 (commit bash-20200207 snapshot). This is the related ChangeLog:
This bug was actually reported one year and a half ago. https://lists.gnu.org/archive/html/bug-bash/2018-12/msg00031.html
2. Interaction of tempenv/localvar/eval/unsetHere I summarize the behavior in Bash 4.3+. The treatment of tempenv has been largely changed in Bash 4.3. And there was an additional fix in the devel branch which will be released in Bash 5.1. I haven't checked the actual implementation in detail, but the observable behavior can be explained by the following model. The function has its own variable context (let us call it Behavior of function call with
|
Oh, I'm sorry I haven't noticed your reply two hours ago. But the I tried the current master of Oil. In the following case, $ bash -c 'v=global; f1() { local v=1234; unset v; echo "${v-(unset)}"; }; f1'
(unset)
$ osh -c 'v=global; f1() { local v=1234; unset v; echo "${v-(unset)}"; }; f1'
global |
OK wow!! I was not expecting that level of investigation. But thank you for doing that -- those kinds of bugs are why I don't just want to copy bash behavior. I don't want to blindly copy bash 5.0 and then they change it in bash 5.1. I would rather design something that makes sense but is also compatible with existing programs. I guess I'm glad the rule is relatively simple, although it's also just what you originally said in the bug. But I still would want to see the comparisons with other shells. I think the question now is whether Oil:
On the face of it, it seems weird that there's a difference. Why wouldn't you want i.e. are bash scripts using documented behavior, or are they just taking advantage of implementation quirks in bash? I would be interested in if the rule is documented anywhere. And I would like to see the test cases first. Let's do it in two steps -- add test cases that reveals what bash does, and compare with other shells. (I assume no shell behaves like bash, but it will be interesting to see where exactly the differences are.) And then the second step is to figure out what Oil should do, and if there should be a Thanks again for the detailed investigation! (There's no rush to do this, I'm going to make a release tomorrow anyway, but we can settle it in a later release) |
Also to clarify, I think either of these behaviors is acceptable if they would run existing programs:
But you are saying that those simpler rules are not sufficient for some programs. I guess I should try I looked at the upvar page you mentioned: https://www.fvue.nl/wiki/Bash:_Passing_variables_by_reference Were you saying that this relies on the mixed "value-unset cell-unset" rule? I read the page but didn't see that. However I guess I can just try it. (By the way I plan for something a lot simpler in Oil -- #586 is |
Hm to make this decision process shorter, I would say that:
|
For what it's worth, I just tested It seems to work, even though it uses the upvar trick? https://github.com/oilshell/bash-completion/blob/master/bash_completion#L162 (It also worked with the previous I didn't do a comprehensive test but I just hit TAB a bunch, and everything seemed to work the same. |
I think
Yes. This is inconsistent. I guess originally Bash
It's documented. But the interaction with
It's the opposite. Some parts of
No, the Freddy Vulto's trick only uses
It's documented as mentioned above.
I believe no other shell behaves like Bash. Actually I think every shell behaves in its own way here. But probably we need more investigation on the behavior of other shells.
I think |
OK thank you for the thorough analysis! That helps a lot. I changed Oil to I see what you're saying about it being potentially confusing. And "local cell unset" is a rarer behavior -- only yash and mksh match, and they are relatively unpopular. dash/bash/zsh all use "local value unset". (I just tested it) But I'm inclined to leave it now until someone complains. We can add I don't think naive users use unset very much. In fact I don't think I have ever used in a program (just like I don't use If someone unsets a variable, presumably they don't want to use it at all anymore in that function. They could be surprised if it reveals a global, but if they never use it again, that won't happen. So let's see if anyone notices basically :) It does match mksh and yash so it's not completely bizarre. |
Also if ble.sh works with this, I wouldn't bother adding the additional spec tests. We can spend the effort on some of the other things in #653, e.g. maybe the mapfile builtin since you starred it, or spec tests for other differences, etc. |
And honestly I don't see why you would want to unset a local. Unsetting a global makes sense because maybe you don't want to pollute your namespace. But the local will disappear anyway once the function returns. And if I suspect this is the reason why shells can differ so much... because that is not something programs use a lot. But I can see advanced use cases for cell-unset of nonlocals, so we can stick with that. (Something like this could go in the Oil manual, which is why I am thinking about it) Also I think |
Actually back in #329 I mentioned that I wanted Oil to have Basically the idea is that I want dynamic scope to be obvious
Basically I think the dynamic scope rule is confusing in general, e.g. to most people coming from Python and JS. It's not just BTW I think Perl has something like this... dynamic scope is an option, not the default. |
Thank you!
OK! That makes sense.
There are several use cases for
Yeah, that is a good point. I think that's the reason why I don't know real shell scripts that use Summary of behavior of different shellsI added test cases in PR #718. Here is the summary of behavior of different shells.
|
Yes it seems clear that this area where shells disagree the most! That's what I remember from the "temp binding" issue in summer 2019. It looks like it also extends to local and unset. On the one hand, it's pretty weird because these are fundamental parts of the language. On the other hand I can see that most shell programs don't actually use this flexibility. I would say once something is using "upvars" and unshadowing, it is starting to be less like a "shell script" and more like using shell as a programming language. I hope in Oil we can come up with some nicer idioms. I think the core data structure of a "stack of cells" with 3 flags (export/readonly/nameref) will hold up, but we can find cleaner ways to manipulate it. But either way I'm glad that Oil has a pretty simple and documentable behavior. I think we're starting to run some of the biggest shell scripts in the world, so I'm optimistic it will hold up. The sentence you quoted in the bash manual is an example of what I dislike about bash... It tells you literally what bash does, almost repeating the interpreter in prose form. But it doesn't tell you WHY it did that. And a lot of times there is no "why". It's just a mistake that In Oil, the "why" for But I also think that most idiomatic Oil programs will not need to use unset at all. I would go as far as to say there could be a mode which only allows Thanks for the help with this! |
This is the issue from Michael Greenberg, who is working on POSIX shell (Smoosh, the executable formal semantics). So even though POSIX doesn't have locals, it has temp bindings, and this also affects unset!
#329
I didn't refresh my memory of this fully yet, but I think we can work it out between these two issues ... this is why we have the test cases
test/spec.sh assign -r 24-27
, noted in the commitaac8712
Looking at case 24 only, I think I went with that behavior because only dash and zsh agree. Every other shell disagrees to an extent.
Since people rely on the bash idiom, it's probably better to change it to be closer to that, but I'm not sure exactly how. So yeah having the exact test cases for ble.sh will help, e.g. in
spec/ble-idioms.tset.sh
.Originally posted by @andychu in #653 (comment)
The text was updated successfully, but these errors were encountered: