Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Setting a variable in a 'temp binding' sets a global instead #329

Closed
mgree opened this issue Jun 7, 2019 · 13 comments
Closed

Setting a variable in a 'temp binding' sets a global instead #329

mgree opened this issue Jun 7, 2019 · 13 comments

Comments

@mgree
Copy link

mgree commented Jun 7, 2019

While comparing how various shells implement the POSIX-unspecified behavior of assignments with function calls, I was confused by OSH's behavior.

f() {
    echo "f       [$x]"
    x=$((x+1))
    echo "f incr  [$x]"
}

g() {
    echo "g pre   [$x]"
    x=$((x+1)) f
    echo "g post  [$x]"
}

h() {
    echo "h pre   [$x]"
    x=$((x+1)) g
    echo "h post  [$x]"
}

echo 'x=0 globally'
x=0
h

echo 'x=5 locally'
x=5 h

Running the above code, the increment in f doesn't seem to do anything. That is, I get the following output:

x=0 globally
h pre   [0]
g pre   [1]
f       [2]
f incr  [2]
g post  [1]
h post  [0]
x=5 locally
h pre   [5]
g pre   [6]
f       [7]
f incr  [7]
g post  [6]
h post  [5]

Though I would expect f incr [3] and f incr [7], respectively. Here's my OSH version:

Oil version 0.6.pre21
Release Date: 2019-06-04 16:50:42+00:00
Arch: x86_64
OS: Darwin
Platform: Darwin Kernel Version 17.7.0: Wed Apr 24 21:17:24 PDT 2019; root:xnu-4570.71.45~1/RELEASE_X86_64
Compiler: GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)
Interpreter: OVM
Interpreter version: 2.7.13
Bytecode: bytecode-opy.zip
@andychu
Copy link
Contributor

andychu commented Jun 7, 2019

It's definitely possible there's a bug here, but can you tell me what bash does?

Lines like this frequently lead to confusing (but technically correct) behavior:

x=$((x+1)) f

x=5 h

@mgree
Copy link
Author

mgree commented Jun 7, 2019

Here's a dump from bash (which is the same as zsh, dash, and my own shell, smoosh):

x=0 globally
h pre   [0]
g pre   [1]
f       [2]
f incr  [3]
g post  [1]
h post  [0]
x=5 locally
h pre   [5]
g pre   [6]
f       [7]
f incr  [8]
g post  [6]
h post  [5]

The only other behavior I've observed is the yash/ksh/mksh one:

x=0 globally
h pre   [0]
g pre   [1]
f       [2]
f incr  [3]
g post  [3]
h post  [3]
x=5 locally
h pre   [5]
g pre   [6]
f       [7]
f incr  [8]
g post  [8]
h post  [8]

@mgree
Copy link
Author

mgree commented Jun 7, 2019

I can narrow it down for you a little: it's not about nested scopes, but it is about functions and assignments (a very confusing and tricky part of the spec!).

f() { echo "initial $x"; x=$((x+1)); echo "after  $x"; }
x=5 f

In osh yields:

initial 5
after 5

While bash yields:

intial 5
after 6

My guess is that somehow the logic in assignments is unable to update the (quasi-)local assignment from the function call.

@andychu
Copy link
Contributor

andychu commented Jun 7, 2019

Thanks, I reproduced it. That is a pretty serious bug!

I'm still looking into it, but a "git blame" led me to this commit:

670276e

(The commit description looks like it has a typo -- it should read "are now visible" rather than "are not visible".)

It looks like I was trying to fix some relatively subtle behavior, but this more obvious behavior was broken.


I will look into this, but have you seen the HTML spec test reports for Oil? There are now 1368 total cases and 1198 passing ones. They do this kind of comparison to figure out what the status quo is.

I found these tests through that commit, and they might interest you.

http://www.oilshell.org/release/0.6.pre21/test/spec.wwz/assign.html

e.g. cases #2 and #7:

It is trivial to run tests with a new shell, normally I do:

test/spec.sh assign --range 2

But if I want to add a new shell, I just add it on the end:

test/spec.sh assign --range 2 _tmp/shells/ash  # or smoosh, I'd be interested in seeing what happens

@andychu
Copy link
Contributor

andychu commented Jun 7, 2019

And I'm glad to hear from you, since I have seen Smoosh and thought our projects would intersect at some point!

I have been mainly slogging through a lot of bash features, but it would be definitely be nice to have some confirmation that OSH is POSIX compliant, whatever that means exactly :)

One issue I have been meaning to revisit is IFS splitting. As far as I can tell, OSH is the only shell that implements it with a state machine, rather than a big pile of C code:

https://github.com/oilshell/oil/blob/master/osh/split.py#L200

Although to be fair, all shells seem to highly agree on IFS behavior, which I found surprising, given how the code looked! OSH doesn't support IFS='\' correctly, which I still need to address.

http://www.oilshell.org/release/0.6.pre21/test/spec.wwz/word-split.html

I'm trying to remember where shells disagree the most, on both POSIX features and others... I have been meaning to post-process the output of the spec tests to give some stats on that. i.e. which shells disagree most and which features are disagreed on most.

@andychu
Copy link
Contributor

andychu commented Jun 7, 2019

Oh and the reason for the bug is that OSH is incorrectly setting the global variable to 6. It reads from the temporary binding and sets the global.

For some reason when doing the dynamic scope lookup, I have an explicit check to skip over that "temporary" frame for writes. I thought there was a reason for that, but it might just be a mistake ...

There is a comment: # We don't want the 'read' builtin to write to this frame! but as far as I can tell, the read builtin should behave exactly like an assignments. It uses dynamic scope.

i.e.

read x < file.txt
# should be like
x=$(head -n 1 file.txt)

at least as far as scope is concerned (ignoring IFS).

@andychu
Copy link
Contributor

andychu commented Jun 7, 2019

Oh I think I figured out what's going on. We have 3 ways of interpreting "temporary bindings" (is there another term for them?):

  1. they are local variables in the new stack frame (bash, smoosh, etc.)
  2. they are globals (I think that's what yash/ksh/mksh are doing)
  3. they are NEITHER -- this is the OSH behavior. OSH has these "temp frames", which are readable but not writable.

I started with the construct of "temp frames" for the environment of external processes (temp frames don't have "$@", but other stack frames do). I suspect that when I made that December 2017 commit, I didn't realize I could get rid of that concept and make the temp bindings local variables in the new frame.

I will test this theory by making them locals and seeing what breaks :-/ I think it will make the code a lot simpler. I was always suspicious of this mechanism because it seemed pretty complex.

andychu pushed a commit that referenced this issue Jun 8, 2019
Fixes issue #329, found by Michael Greenberg.

The problem was that temp frames were immutable, so a GLOBAL would be
written instead.  But on a read, we would still use the temp frame.

It's fine to mutate the temp frame because of dynamic scope.
@andychu
Copy link
Contributor

andychu commented Jun 8, 2019

OK I fixed this -- thanks for the report, that was very useful!

It turns out the "temp frame" concept is OK -- having a stack frame without "$@", and also without introspection info like FUNCNAME and ${BASH_SOURCE[@]}.

But they shouldn't be immutable and I don't know why I made them that way. I think as mentioned in the prior commit I was in a rush to get Aboriginal Linux working around December 2017.

I think at some point I had the idea of set +o dynamic-scope as a shell language "cleanup" -- disabling dynamic scope, because it's a surprising feature to most programmers. Mutating a variable in a scope above your own which is not your caller's seems a bit weird.

But that never happened, and this change was a significant simplification. All the tests still pass, in addition to the test case you provided.

If you find any other anomalies, please let me know! OSH should pretty easy to build from master with

build/dev.sh minimal (that should be about it, but more notes here https://github.com/oilshell/oil/wiki/Contributing)

@andychu andychu changed the title Buggy arithmetic? Setting a variable in a 'temp binding' sets a global instead Jun 8, 2019
andychu pushed a commit that referenced this issue Jun 8, 2019
Test case 23 behaves has SIX different behaviors ni dash, bash, mksh,
zsh, yash, and osh!  (yash due to not having a 'local' keyword)

Related to issue #329.

- Split out the 'assign-extended' file for stuff dash doesn't support.
- Support more shells in 'assign', and fewer in 'assign-extended'.
@andychu
Copy link
Contributor

andychu commented Jun 8, 2019

You might be interested in cases 23 and 24 here. I get 5 different behaviors on case 23, and if you count OSH, it's 6!

http://www.oilshell.org/git-branch/dev/andy-14/3cb35d5d/spec/assign.html

This might be where shells disagree the most!

case 24 is pure POSIX by omitting local, and I get 3 different behaviors (with or without OSH). But I think local is pretty important because Debian's style guide is basically POSIX + local.

(I realized that the temp frames are not purely an implementation detail, so which led me to write those cases. After some re-org, the regression for the bug is now case 22.)

@mgree
Copy link
Author

mgree commented Jun 10, 2019

I'm glad it was a useful report with an easy fix---and thanks for the pointer to your test suite.

These test cases are 23 indeed very interesting. Smoosh agrees with zsh, yielding

x=temp-binding
x=mutated-temp
x=local
x=
x=global

I'm of course biased, but that feels like the right behavior to me. In particular, I think it's important that immediately after running unset x that x actually be unset!

Bash and OSH feel wrong here because they leak scope information. That is, if locals were meant to allow for lexical scoping, then these implementations allow users to observe less-than-lexical scope. Bash is arguably worse than OSH, because it leaks a global rather than an intermediate, lexical temporary. Dash makes slightly more sense to me. Any write to a non-local variable is treated as globally visible.

Case 24 is trickier. I think I disagree with OSH/bash's behavior again (because it means that unset leaks information about your stack). I'm not sure whether I prefer zsh's behavior (which is also smoosh's) or dash/yash's. In the zsh approach, writing a variable alongside a function 'lexicalizes' that variable for the function's dynamic extent... which may be surprising if the function was written to mutate global scope!

@mgree
Copy link
Author

mgree commented Jun 10, 2019

Finally (and sorry to pollute this issue with what is just conversation), I haven't been able to build on OS X (which I think is a known bug). I can run some of the tests you have... if I have the time and you'd be interested in a PR, I can try to add annotate test cases with smoosh's expected behaviors.

I may also steal some of your tests, if you don't mind. I can put in a URL pointing to the original... would you also like me to put in a full copy of the Apache 2.0 license and copyright line?

@andychu
Copy link
Contributor

andychu commented Jun 10, 2019

Yeah I see what you're saying about "unset". I think I could make OSH match the zsh/smoosh behavior by making unset turn the variable back to Undef rather than removing the name from the scope dictionary. I made a note on issue #288, which is a related plan to change the "data model" of OSH.

I doubt any shell scripts rely on this exact behavior (and if they do they will broken across almost every shell), but I guess using the Undef value makes conceptual sense. I don't use this value in my shell programming, but it has to exist for ${x-default} vs ${x:-default}, and set -o nounset as far as I understand.


Yes I would be interested in annotations for smoosh! I split the file into assign and assign-extended, the idea being that POSIX shell tests are in the first file and all the ksh extensions are in the second. So that should reduce the work of annotating (i.e. not having to put N-I for "not implemented" everywhere).

And yes feel free to use the tests. Adding the URL and license would be appreciated.

Yes OSH is mostly Linux now. The end user tarball has run on OS X before (though I haven't tested it in awhile, I plan to for the 0.6.0 release). The build/dev.sh minimal probably doesn't run, but it's a plain Python 2.7 program with C extensions, so it's not that unportable. A lot of the shell scripts use coreutils rather than OS X utils.

Are you mainly developing smoosh on OS X or should I be able to build and run it on Linux?

We can chat on https://oilshell.zulipchat.com/ if it's easier (requires log in e.g. with Github).

andychu pushed a commit that referenced this issue Jun 10, 2019
It used to remove the name from the scope.

Added spec tests that reveal the difference.  Now there are 5 unique
behaviors for case 23 of spec/assign, rather than 6.  OSH behaves like
zsh and smoosh.

Inspired by conversation with Michael Greenberg in issue #329.
@andychu
Copy link
Contributor

andychu commented Jun 11, 2019

Fixed with 0.6.pre22: http://www.oilshell.org/release/0.6.pre22/changelog.html

@andychu andychu closed this as completed Jun 11, 2019
andychu pushed a commit that referenced this issue Jul 11, 2019
And expose a bug where the 'export' flag persists after unsetting!

Some notes in #288.  Related to the change in #329.

Also change style of spec tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants