Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

single quotes within double quoted brace sub treated differently for the # ## % %% / operators #291

Closed
andychu opened this issue Apr 30, 2019 · 28 comments

Comments

@andychu
Copy link
Contributor

andychu commented Apr 30, 2019

Forking from #290.

$ var=+-
$ echo "${var//['+-']}"
Line 16 of '/home/user/.config/oil/oshrc'                                                            
  echo "${var//['+-']}"
             ^
fatal: Error matching regex "(['+-'])": Invalid regex syntax (func_regex_first_group_match)

This is indeed a difference between OSH and bash/mksh. It has to do with whether the single quote and hyphen is a SHELL metcharacter or a REGEX metacharacter.

  • Is it a character class containing two characters + and - ?
  • Or does the char class contain a range of characters + through ' ? (which is a regex syntax error)

@Crestwave can you tell me why you have the single quotes? What are they supposed to do? How is it different than

  echo "${var//[+-]}"

This removes all + and - characters from the string in bash. I can't see what the extra single quotes do?

I'm not saying this is not a bug -- it's definitely a difference between bash and OSH. But I'm trying to understand the intention behind the code. Thanks.

@andychu
Copy link
Contributor Author

andychu commented Apr 30, 2019

Here's a script that shows they behave similarly in bash:

$ cat _tmp/r.sh 
var="--++''--++''"
echo "no plus  -> ${var//+}"
echo "no minus -> ${var//-}"
echo "no plus or minus -> ${var//[+-]}"
echo "no plus or minus -> ${var//['+-']}"
$ bash _tmp/r.sh 
no plus  -> --''--''
no minus -> ++''++''
no plus or minus -> ''''
no plus or minus -> ''''

@andychu
Copy link
Contributor Author

andychu commented Apr 30, 2019

OK that makes more sense. Now I see what you meant by ['[]'].

Most of your test cases come down to a disagreement between OSH and bash about what single quotes means within a double quoted var sub.

I am having a hard time reconciling most of the bash behavior you observed with this behavior of bash:

$ echo "${unset:-'x'}"    # this prints three characters, not just  a single unquoted x !
'x'

Basically I treat single quotes as LITERALS inside double quotes for that reason.

In OSH, this affects the meaning of:

"${var#'x'}"
"${var//'x'}"

Which are the cases you filed bugs about.

Do you know of any intuitive explanation for the difference between the behaviors? In my mind if ' is a NOT metachar in the first case, it should not be a metchar in the other two. Or vice versa, if it is in cases 2 and 3, it should also be in the first, and I should get x rather than 'x'.

@andychu
Copy link
Contributor Author

andychu commented Apr 30, 2019

FWIW running osh -n on a script like this may give some insight to what OSH is doing:

echo "${unset:-'x'}"
echo ${unset:-'x'}

echo "${var#'x'}"
echo ${var#'x'}

echo "${var//'x'}"
echo ${var//'x'}

You will get something like this. In the first case the arg_word has a string of 3 characters. In the second case the arg word has a SQ token of one character.

    (C {(echo)} 
      {
        (DQ 
          (word_part.BracedVarSub
            token: <VSub_Name unset>
            suffix_op: (suffix_op.StringUnary op_id:VTest_ColonHyphen arg_word:{("'x'")})
          )
        )
      }
    )
    (C {(echo)} 
      {
        (word_part.BracedVarSub
          token: <VSub_Name unset>
          suffix_op: (suffix_op.StringUnary op_id:VTest_ColonHyphen arg_word:{(SQ <x>)})
        )
      }
    )

@Crestwave
Copy link
Contributor

Whoops, I didn't delete my comment in time. I remembered that there is a difference with the quotes but forgot exactly what.

@Crestwave
Copy link
Contributor

Ah, I think that it my use case it was interpreting the - as a range expression.

@andychu
Copy link
Contributor Author

andychu commented Apr 30, 2019

OSH was giving the syntax error because it was passing +-' through literally to the regex engine, and that is a bad range expression.

However that's not what bash does AFAICT. Bash appears to evaluate the '' as SHELL metacharacters, so the regex engine never sees them.

In my model of the world, bash is being inconsistent by treating single quotes in #1 differently than #2 and #3.

"${unset:-'x'}"    # 1
"${var#'x'}"        # 2
"${var//'x'}"        # 3

Two possibilities:

  • bash and mksh have some other evaluation model that is consistent that I'm not seeing. They do a lot of runtime parsing which OSH mostly avoids.
  • Or maybe they are just consistently inconsistent and there's no real reason for it.

Certainly I could parse single quotes within double quotes differently depending on whether the operator is :- or ( #, / ). That just seems really weird though.

I would have thought that the quoting rules are the same regardless of what the operator is. I think maybe the underlying reason is that OSH processes the operator and the quotes in a single pass, "at the same time".

But bash and mksh process the operator first, and then the inner quotes as a separate step. And the second step can use knowledge of the operator to perform an additional layer of dequoting (or not). This seems dumb to me but it's my best guess as to what's going on.

(And I would expect that this is explained nowhere in the manual for any shell, or in the POSIX spec ...)

@Crestwave
Copy link
Contributor

Crestwave commented Apr 30, 2019

Oh, I meant that Bash interpreted it as a range expression for my use-case when unquoted, which is why I quoted it.

However that's not what bash does AFAICT. Bash appears to evaluate the '' as SHELL metacharacters, so the regex engine never sees them.

The latter two are evaluated by pattern matching, not regex, which is what handles the quotes.

(And I would expect that this is explained nowhere in the manual for any shell, or in the POSIX spec ...)

It is documented in both the Bash manual and the POSIX spec, and pattern matching is what is standardly used in shells; the only application of regex I know of in POSIX-compatible shells is Bash's =~ operator.

@andychu
Copy link
Contributor Author

andychu commented Apr 30, 2019

Where is it documented that single quotes are literals in "${unset:-'x'}" but are shell metachars in "${x#'x'}" and "${x//'x'}" ?

@andychu
Copy link
Contributor Author

andychu commented Apr 30, 2019

Hm I guess you removed this part from your message (because Github sent me a mail earlier), but it doesn't explain the difference between the two constructs:

An ordinary character is a pattern that shall match itself. It can be any character in the supported character set except for NUL, those special shell characters in Quoting that require quoting, and the following three special pattern characters. Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character. If any character (ordinary, shell special, or pattern special) is quoted, that pattern shall match the character itself. The shell special characters always require quoting.

This would only explain the behavior outside of double quotes anyway. Once you're inside double quotes, you can argue that single quotes ARE quoted.

@andychu
Copy link
Contributor Author

andychu commented Apr 30, 2019

Anyway, my goal is not to have a debate about POSIX... my goal is to come up with some reasonable semantics for shell and then get rid of it :)

In other words, we should be able to write a manual with a "straight face", while still running existing shell scripts.

It sounds like you are not particularly attached to "${x//['[]']}" in your script. You would just as well write "${x//[\[\]]}" ? I would argue that the latter is an improvement in clarity. It's not mixing shell metacharacters and glob metacharacters.

Ideally OSH would have a parse error, but the runtime error is reasonably informative.

I think I should maintain a list of "unresolved cases", which would include this and the []] issue. (Even though POSIX resolves that case, I also think it can be avoided with the trivial patch of [\]], which is also an improvement.)

If there are trivial patches to a script that "improve" it, then I'm inclined to keep OSH simple. But someone could come along later with a widely used script that relies on the subtle behavior, and it might be worth changing OSH. What do you think?

@Crestwave
Copy link
Contributor

This would only explain the behavior outside of double quotes anyway. Once you're inside double quotes, you can argue that single quotes ARE quoted.

I think you're thinking about this the wrong way; don't think of quoting the parameter for it as actual quoting for its contents, but rather making it behave like a keyword, making * expand to match the parameter instead of a file and such.

See, globs and backslashes are also literal when quoted, as in keywords, but it evaluates it itself, which is what also handles the quotes. Or think about it in my original use: program=${program//[^'><+-.,[]']}; there are no double quotes visible there, but it works the same because it's a keyword.

The following four varieties of parameter expansion provide for substring processing. In each case, pattern matching notation (see Pattern Matching Notation), rather than regular expression notation, shall be used to evaluate the patterns. If parameter is '#', '*', or '@', the result of the expansion is unspecified. If parameter is unset and set -u is in effect, the expansion shall fail. Enclosing the full parameter expansion string in double-quotes shall not cause the following four varieties of pattern characters to be quoted, whereas quoting characters within the braces shall have this effect. In each variety, if word is omitted, the empty pattern shall be used.

@andychu
Copy link
Contributor Author

andychu commented May 1, 2019

Hm I'm having a very hard time following what you're saying. What do you even mean by "keyword"? Some examples would help.

Another thing that would help is if you can address the question above:

Where is it documented that single quotes are literals in "${unset:-'x'}" but are shell metachars in "${x#'x'}" and "${x//'x'}" ?

@andychu
Copy link
Contributor Author

andychu commented May 1, 2019

Also, dash and busybox ash agree with OSH on the "${var#'a'}" case. They do NOT turn a into the empty string in that case. bash and mksh do.

So are dash and busybox ash not POSIX compliant? No, POSIX simply doesn't say what the correct behavior is.

I care more about what "real" shell scripts rely on than POSIX... but you claimed that this behavior was documented somewhere, which seems false.

@Crestwave
Copy link
Contributor

Hm I'm having a very hard time following what you're saying. What do you even mean by "keyword"? Some examples would help.

Shell keywords, like case, [[ in Bash, etc. They are special, which is how you can use globs in them without it matching a filename instead.

Another thing that would help is if you can address the question above:

Where is it documented that single quotes are literals in "${unset:-'x'}" but are shell metachars in "${x#'x'}" and "${x//'x'}" ?

Does my most recent quote not document that? If you're not satisfied with that, they're documented to use pattern matching, where quotes are special.

@Crestwave
Copy link
Contributor

Also, dash and busybox ash agree with OSH on the "${var#'a'}" case. They do NOT turn a into the empty string in that case. bash and mksh do.

So are dash and busybox ash not POSIX compliant? No, POSIX simply doesn't say what the correct behavior is.

I care more about what "real" shell scripts rely on than POSIX... but you claimed that this behavior was documented somewhere, which seems false.

Wait, what? I tried it on ash, Android's sh, bash, dash, ksh, yash, and even zsh and all received an empty string.

@andychu
Copy link
Contributor Author

andychu commented May 1, 2019

OK let me check in a test case so we can agree on the observations.

I don't think you're understanding my question. Explain how the quote you gave addresses the difference between :- and (# and / ) ? It talks about the difference between

"${a#x}" # "Enclosing the full parameter expansion string in double-quotes shall not cause"

and

"${a#"x"}" # "whereas quoting characters within the braces shall have this effect"

That's not the issue I'm talking about.

@Crestwave
Copy link
Contributor

OK let me check in a test case so we can agree on the observations.

I don't think you're understanding my question. Explain how the quote you gave addresses the difference between :- and (# and / ) ? It talks about the difference between

The quote only applies to #, ##, %, and %% (since / is not POSIX).

"${a#x}" # "Enclosing the full parameter expansion string in double-quotes shall not cause"

and

"${a#"x"}" # "whereas quoting characters within the braces shall have this effect"

That's not the issue I'm talking about.

Single-quotes are quotes, too...

The various quoting mechanisms are the escape character, single-quotes, and double-quotes.

@andychu
Copy link
Contributor Author

andychu commented May 1, 2019

I understand that globs are not expanded for case and [[, but I don't understand what bearing that has on this discussion.

The issue is whether the single quotes are processed by the shell, or not processed and sent through to the string matching engine.

@Crestwave
Copy link
Contributor

Crestwave commented May 1, 2019

I understand that globs are not expanded for case and [[, but I don't understand what bearing that has on this discussion.

The issue is whether the single quotes are processed by the shell, or not processed and sent through to the string matching engine.

From what I understand, they are not processed, and the string matching engine is what removes them. I brought up those two because I think the same thing happens there, and IIRC OSH does handle single-quotes properly in them.

andychu pushed a commit that referenced this issue May 1, 2019
andychu pushed a commit that referenced this issue May 1, 2019
@andychu
Copy link
Contributor Author

andychu commented May 1, 2019

I checked in repros for the issues we're discussing here so we can be on the same page:

http://www.oilshell.org/git-branch/dev/andy-11/a21a1b80/spec/bugs.html

Now I see that bash, mksh, and zsh all AGREE on that single quotes are processed by the shell in this case. So that's very strong evidence, so we should probably change it.

dash and ash disagree in both cases #4 and #5. They strip the string "'a'" -- three characters, not "a" (one character).

However I do think dash and ash are less "important". The versions I'm using are perhaps a bit old:

http://www.oilshell.org/blob/spec-bin/

If newer versions of dash and ash adopt the bash/mksh/zsh behavior, then that would be very strong evidence to change it.


Regarding POSIX, I see what you're saying -- single quotes are quotes -- but where does it document this behavior?

$ echo ${unset:-'a'}
a
$ echo "${unset:-'a'}"
'a'

In the second case, the single quotes are NOT processed by the shell. My claim is that this behavior is inconsistent with the above behavior of # and /, and it's not documented anywhere.

The best rule I can come up with is that the # and / operators take GLOBS, but :- does not take a glob. So somehow single quotes are treated differently.

Anyway, like I said, I care more about what shells do and what shell scripts rely on rather than POSIX, but that is why I'm not really satisifed by the quote. But we don't have to resolve this debate to move forward.

@andychu andychu changed the title regex engine error in $ echo "${var//['+-']}" single quotes within double quoted brace sub treated differently for the # ## % %% / operators May 1, 2019
@Crestwave
Copy link
Contributor

Crestwave commented May 1, 2019

I checked in repros for the issues we're discussing here so we can be on the same page:

http://www.oilshell.org/git-branch/dev/andy-11/a21a1b80/spec/bugs.html

Now I see that bash, mksh, and zsh all AGREE on that single quotes are processed by the shell in this case. So that's very strong evidence, so we should probably change it.

dash and ash disagree in both cases #4 and #5. They strip the string "'a'" -- three characters, not "a" (one character).

However I do think dash and ash are less "important". The versions I'm using are perhaps a bit old:

http://www.oilshell.org/blob/spec-bin/

If newer versions of dash and ash adopt the bash/mksh/zsh behavior, then that would be very strong evidence to change it.

I'll double-check ash when I get back to my machine, but I can confirm that dash 0.5.10.2 returns an empty string in the test case.

Regarding POSIX, I see what you're saying -- single quotes are quotes -- but where does it document this behavior?

$ echo ${unset:-'a'}
a
$ echo "${unset:-'a'}"
'a'

In the second case, the single quotes are NOT processed by the shell. My claim is that this behavior is inconsistent with the above behavior of # and /, and it's not documented anywhere.

The best rule I can come up with is that the # and / operators take GLOBS, but :- does not take a glob. So somehow single quotes are treated differently.

That's kind of it; the single-quotes are literal, as they should be since it's quoted, but #, %, and / are then evaluated by pattern matching, which handles quoting as I mentioned above. The quote I deleted before was from the POSIX specification of pattern matching:

An ordinary character is a pattern that shall match itself. It can be any character in the supported character set except for NUL, those special shell characters in Quoting that require quoting, and the following three special pattern characters. Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character. If any character (ordinary, shell special, or pattern special) is quoted, that pattern shall match the character itself. The shell special characters always require quoting.

And as I mentioned earlier, POSIX mentions it again specifically for #, ##, %, and %%:

The following four varieties of parameter expansion provide for substring processing. In each case, pattern matching notation (see Pattern Matching Notation), rather than regular expression notation, shall be used to evaluate the patterns. If parameter is '#', '*', or '@', the result of the expansion is unspecified. If parameter is unset and set -u is in effect, the expansion shall fail. Enclosing the full parameter expansion string in double-quotes shall not cause the following four varieties of pattern characters to be quoted, whereas quoting characters within the braces shall have this effect. In each variety, if word is omitted, the empty pattern shall be used.

andychu pushed a commit that referenced this issue May 1, 2019
The issue is that we were computing s[:-0] when the suffix is empty.
Added an explicit check to fix it.

Addresses issue #291.

I also moved the spec tests around.  This bug was reported by Crestwave
as the "nested strip" case in spec/var-op-strip.
@andychu
Copy link
Contributor Author

andychu commented May 1, 2019

I just fixed the bug you mentioned on the other thread with

$ printf '%q\n' "${var%"${var#?}"}"

But it actually just boiled down to ${var%''} ! Thanks for testing.


As for the single quote issue, I think I'll end up changing it at some point in the future. Let me see if I can run your BF interpreter first :) It sounded like that change was not essential to run it.

@Crestwave
Copy link
Contributor

Crestwave commented May 1, 2019

I just fixed the bug you mentioned on the other thread with

$ printf '%q\n' "${var%"${var#?}"}"

But it actually just boiled down to ${var%''} ! Thanks for testing.

That just tries to remove two single-quotes because of this issue. Did you mean "${var%""}"?

As for the single quote issue, I think I'll end up changing it at some point in the future. Let me see if I can run your BF interpreter first :) It sounded like that change was not essential to run it.

The one written in sh now works with double-quotes instead of single ones, thanks! Although read exits with osh I/O error: Interrupted system call when the window is resized; let me know if I should make an issue for this. Also, I can confirm that BusyBox 1.30.1's ash returns an empty string with the single quoting.

@andychu
Copy link
Contributor Author

andychu commented May 1, 2019

There was a bug whenever the suffix to be stripped by % was empty, regardless of how it was computed. It didn't rely on nesting.

Yes please file a bug about read! That looks like a SIGWINCH issue.

Thanks for the confirmation. I noticed that ash has been copying bash features, so they are probably also fixing corner cases to be more like bash.

@Crestwave
Copy link
Contributor

Yes please file a bug about read! That looks like a SIGWINCH issue.

#292

Thanks for the confirmation. I noticed that ash has been copying bash features, so they are probably also fixing corner cases to be more like bash.

Isn't POSIX compliancy also one of their goals?

@andychu
Copy link
Contributor Author

andychu commented Jun 22, 2019

@wertercatt hit this in #161 with:

  GAMEROOT=$(cd "${0%/*}" && echo $PWD)
                ^
hl.sh:1: cd '/home/andy/git/oilshell/oil/hl.sh': Not a directory

@Crestwave
Copy link
Contributor

@wertercatt hit this in #161 with:

  GAMEROOT=$(cd "${0%/*}" && echo $PWD)
                ^
hl.sh:1: cd '/home/andy/git/oilshell/oil/hl.sh': Not a directory

I'm not sure how that is related to this issue, though? Actually, that seems to be because they ran it as osh hl.sh, so there was no / and thus it expanded to hl.sh. osh ./hl.sh or running it from PATH should work.

andychu pushed a commit that referenced this issue Apr 7, 2020
- Fixes #698
- Fixes fallout from #695 (failing test in spec/var-sub-quote)

There is still a different failing test case to remind us to overhaul
this code.  And we still need to fix issue #291 too.

This area of the code needs a bunch of work.
@andychu andychu mentioned this issue Apr 15, 2020
5 tasks
andychu pushed a commit that referenced this issue Apr 15, 2020
This is a weird exception for the # ## % %% and / operators.

Fixes issue #291.
@andychu
Copy link
Contributor Author

andychu commented Apr 15, 2020

Finally fixed this! It wasn't that hard to fix, but it is a weird rule ... there is no syntactic indication of the difference between "${x-'single'}" and "${x#'single'}".

@andychu andychu closed this as completed Apr 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants