Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive arithmetic evaluation (ble.sh) #648

Closed
akinomyoga opened this issue Mar 9, 2020 · 14 comments
Closed

Recursive arithmetic evaluation (ble.sh) #648

akinomyoga opened this issue Mar 9, 2020 · 14 comments

Comments

@akinomyoga
Copy link
Collaborator

In Bash, arithmetic evaluation is performed recursively when the variable is referenced in the arithmetic expressions. The script ble.sh uses this behavior in some part. Oil seems not to support this.

$ bash -c 'a="x=123"; ((a)); echo $x'
123
$ bin/osh -c 'a="x=123"; ((a)); echo $x'
[??? no location ???] warning: Invalid integer constant 'x=123'
$

Maybe the above use case is not popular for light users, but I think the following case is what many users can fall into, which is related to either the expansion order or the recursive evaluation:

$ bin/osh -c 'a="x=123"; (($a)); echo $x'
  a="x=123"; (($a)); echo $x
                ^~
[ -c flag ]:1: warning: Invalid integer constant 'x=123'
$
@andychu
Copy link
Contributor

andychu commented Mar 9, 2020

This is intentional, and this is the first time I've heard of someone using it on purpose.

It relates to the security problem here, which has been documented by the Fedora security team, etc.

http://www.oilshell.org/blog/2019/01/18.html#a-story-about-a-30-year-old-security-problem

https://www.oilshell.org/release/0.8.pre2/doc/known-differences.html#static-parsing (could be elaborated on maybe)

I added a note here that an explicit eval can be a small rewrite:

https://github.com/oilshell/oil/wiki/What-Is-Expected-to-Run-Under-OSH

@andychu
Copy link
Contributor

andychu commented Mar 9, 2020

Another way to think about it is that this restriction is exactly the one that allowed Oil to find the coding error with ?:

(( x ? 42 $c ))   # is this valid?

In bash, it's valid if c=':43', and it's a runtime syntax error if c='43'. In Oil it's always a syntax error because arithmetic is statically parsed.

@akinomyoga
Copy link
Collaborator Author

akinomyoga commented Mar 9, 2020

this is the first time I've heard of someone using it on purpose.

Oh, really. Hmm, ble.sh might be one of the highly non-trivial type of Bash scripts such that even small part of the codes is out of scope of Oil...

It relates to the security problem here, which has been documented by the Fedora security team, etc.

Actually the security issue of the arithmetic expansion is more specifically related to the shell expansion of the arithmetic subscripts mentioned in #645 . Consider the code var='a[$(malicious_code)]'; : $((var)), so one should not reference variables whose values are not under control in Bash scripts. If there was no shell expansion, the worst which could be done with the recursive evaluation is to create a pseudo infinite loop by i=0 var='i<=1020&&(i++,var,var,i--)'; : $((var)) (Note: Bash restricts the maximal recursion depth to be around 1023).

Another way to think about it is that ...

Thank you for the explanation! So Oil only allows the static arithmetic expressions (except for eval arguments). I now noticed that let isn't either supported by Oil. I also like let because I sometimes enjoy let in combination with brace expansions (such as let i=0 i+={0..100}) which cannot be achieved by (()).

I understand the decision that Oil made. But I think it will be helpful if some of these examples (#644 #645 #646 #648) are explicitly described in known-differences.html. By the way I have related naive questions:

Q1. Why #640 ((${prefix}a=1)) is considered but #644 ((${prefix}a)) is not considered?

It appears to be asymmetric. Maybe under the philosophy of Oil, #640 doesn't need to be supported as well. Or, I think it is also useful to allow ((${prefix}a)) only in the case that the expanded result of ${prefix}a forms a valid identifier.

Q2. How to reference associative array elements in arithmetic expressions?

For the other cases there is a simple workaround that works in both Bash and Oil, so it is fine if Oil doesn't support historical Bash behaviors. But for #645, I'm not sure how to write the code to reference an associative array element with keys being parameter expansions in arithmetic expressions so that it works in both of Bash and Oil. As explained in #645, dict[$key] and dict["$key"] doesn't work in Bash, and dict[\$key] and dict['$key'] doesn't work in Oil. There is no universal choice. The only solution that I come up with for now is to load a value of the associative array element in another variable before entering the arithmetic evaluation, which is non-trivial for users.

Q3. Checks of associative array keys on designated initializers?

If Oil requires the associative array subscripts to be quoted, is there reason that the following constructs (mentioned in #646 ) without quoting is allowed and works as expected? Maybe additional checks can be performed on execution to ensure the quoting of associative array keys.

$ bin/osh -c 'declare -A d=([a]=123); echo ${d["a"]}'
123

The following is the case where the type of the variable cannot be determined statically (which seems not be prohibited by Oil):

$ bin/osh -c 'v=A; declare -$v d=([a]=123); echo ${d["a"]}'
123

@andychu
Copy link
Contributor

andychu commented Mar 9, 2020

Q1. Good question, in (( )), the RHS always evaluates to an integer, and a LHS now is a constant but I would make it evaluate to a string. Neither side would do recursive evaluation -- there's only on step of evaluation:

e.g.

name=3
(( x$name = bar ))  # LHS is, sets x3
(( x = 2$name   ))   # RHS is 23

expr=1+2
(( x$expr = 3 )) # doesn't work, no recursive evaluation
(( x = expr ))   # also doesn't work, no recursive evaluation

So are you saying that BLE relies on the latter behavior on the LHS? I was thinking the former behavior (a string part of a variable name) would be more common, but I didn't inspect the code to see. In any case osh -n '(( ${x}a = ${x}b )) would be accepted, but it's not right now. That is the change that's being considered.

@andychu
Copy link
Contributor

andychu commented Mar 9, 2020

Q2. I'm not sure why dict['$key'] doesn't work in bash, but something that works in both shells is to surround it with ${}.

declare -A A=(['$foo']=42)                                                                                                                                                                                                                    
echo foo=${A['$foo']}                                                                                                                                                                                                                         
                                                                                                                                                                                                                                              
(( d = 1 + ${A['$foo']} ))                                                                                                                                                                                                                    
echo d=$d      # d=43

I'm not sure what bash does to be honest... I was surprised just now testing the behavior of assoc arrays in (( )).

I think Oil's behavior is "explainable" and consistent but I'm open to more
counterexamples...

@andychu
Copy link
Contributor

andychu commented Mar 9, 2020

OK the reason I'm confused is because in bash 4.4 this doesn't work:

declare -A A=(['$foo']=42)
(( c = 1 + A[\$foo] ))                                                                                                                                                                                                                        
echo c=$c                                                                                                                                                                                                                                     

I get

foo=42
_tmp/a.sh: line 9: A: bad array subscript
c=1

@andychu
Copy link
Contributor

andychu commented Mar 9, 2020

Q3. Yes I thought about this but didn't do it... It would be possible to add those checks.

It's sort of an internal parsing detail -- we parse the whole word, and then extract parts within [] as the subscript, but we don't check if they're quoted. Where as in $A{['key']} we also accept an expression.

Filed #652 for this

@andychu
Copy link
Contributor

andychu commented Mar 9, 2020

Also Oil the ([k]=v) is automatically assumed to be an assoc array, it relates to this issue:

https://www.oilshell.org/release/0.8.pre2/doc/known-differences.html#indexed-and-associative-arrays-are-distinct

@andychu
Copy link
Contributor

andychu commented Mar 9, 2020

OK I misunderstood the meaning of Q2!

(( A[\$file]+=1 ))

in bash uses the variable file instead of the string '$file' . That is very very confusing...

andychu pushed a commit that referenced this issue Mar 9, 2020
Works in both bash and osh.

Related to discussion on #648.
@andychu
Copy link
Contributor

andychu commented Mar 9, 2020

This was the example that was a syntax error. I suggested using '$file' (quoted), but that is not what bash does! It looks like you can just remove the \ and it works the same way in bash -- it treats it as a variable.

See the test case in the commit above -- it passes under osh and bash.

src/history.sh
        ((_ble_builtin_history_rskip_dict[\$file]+=$2))

@akinomyoga
Copy link
Collaborator Author

akinomyoga commented Mar 9, 2020

Q1

So are you saying that BLE relies on the latter behavior on the LHS?

Do you mean this expr=1+2; (( x$expr = 3 ))? No, ble.sh doesn't use this type of parameter expansion which changes the syntactic structure. It only uses the former behavior where x$expr is expanded into a single parameter name.

Besides that, ble.sh uses recursive arithmetic evaluations. By recursive arithmetic evaluation, I mean the case expr=1+2; ((x = expr)). It's definitely different from shell expansions in arithmetic commands such as ((x = $expr)) and (( x$name = bar )). I think my initial explanation was confusing. In the case expr=1+2; ((x = expr)), Bash does not expand the arithmetic expression to ((x = 1 + 2)) but performs two independent arithmetic evaluation separately. When Bash encounters the variable name expr, it invokes another arithmetic evaluation on 1+2 to get the result 3 and then evaluates x = 3. The interpretation of the syntactic structure will never be interfered by any subexpression stored in expr because they are evaluated as separate arithmetic expressions. With this recursive arithmetic evaluation, it is even possible to make a recursive call of arithmetic expressions. The following Bash code is some example to sum up 0..100 by recursive arithmetic calls of loop:

$ loop='(i<=100)&&(s+=i,i++,loop)'; let i=0 s=0 loop; echo "$s"

In any case osh -n '(( ${x}a = ${x}b )) would be accepted, but it's not right now. That is the change that's being considered.

OK, I think that's understandable design.

Q2

I'm sorry. I should have make it clear what we want to do in these example. I consider the situation that we have some key a stored in a shell variable key (key=a) and want to access the associative array element dict['a'] through the shell variable key.

Maybe I also need to point out the fact that Bash doesn't accept an empty string as a key, which may be one of the factor that confuses you.

but something that works in both shells is to surround it with ${}.

Ah, yes. You are right. I generally don't want to use parameter expansions of uncontrolled variables in arithmetic expressions that can cause unexpected syntactic structure of arithmetic expressions, but it should work once we assume that the associative array contains only atomic arithmetic expressions. Edit: I noticed that I cannot use this technique when the associative array element is in the LHS of assignments.

declare -A A=(['$foo']=42)
(( c = 1 + A[\$foo] ))
echo c=$c

I get

foo=42
_tmp/a.sh: line 9: A: bad array subscript
c=1

This is caused by an empty key (which Bash doesn't allow). The following example should work:

$ foo=X; declare -A A=(['$foo']=42); (( c = 1 + A[\$foo] )); echo c=$c
c=1

It looks like you can just remove the \ and it works the same way in bash

I cannot remove \ in ((_ble_builtin_history_rskip_dict[\$file]+=$2)) because here the variable file contains a user-supplied value and can be any strings. If I don't quote $file, the syntactic structures can be distorted by some values like file='1],a[$(malicious_code)', etc. My explanation in #645 might not be so clear, but now I think you can understand the explanation in #645. I recommend you to read it again.

Q3

Thank you!

abathur pushed a commit to abathur/oil that referenced this issue Mar 10, 2020
Works in both bash and osh.

Related to discussion on oils-for-unix#648.
@andychu
Copy link
Contributor

andychu commented Mar 10, 2020

OK I think #640 and #645 and #652 cover everything. Is there anything left to discuss here?

@akinomyoga
Copy link
Collaborator Author

Nothing. The recursive evaluation of arithmetic expressions are different from #640, #645 and #645, but I think you can just describe the recursive evaluation of arithmetic expressions in known-differences.html unless you consider missing recursive evaluation is trivial. (Note: this recursive nature is documented in Bash manual 6.5 "The value of a variable is evaluated as an arithmetic expression when it is referenced"). You can close the issue now. Thank you.

@andychu
Copy link
Contributor

andychu commented Apr 3, 2020

@akinomyoga I implemented shopt -s unsafe_arith_eval and the first 7 cases of spec/ble-idioms pass now.

400f670

Working on unset next which I think should be pretty similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants