Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue #79: Incorrect parsing of complex ++ expressions #215

Merged
merged 13 commits into from
Oct 19, 2023

Conversation

fioriandrea
Copy link
Contributor

@fioriandrea fioriandrea commented Oct 11, 2023

As the issue pointed out, there was an error while parsing complex post-increment expressions involving field expressions. The fix on the parser was easy enough. However, while working on the parser, I realized that there was a bug in the compiler too. For example:

./goawk 'BEGIN { $0="3 4 5 6 7 8 9"; a=3; print ($($a++)++); print $0; print a }' prints 7\n3 4 7 6 7 8 9\n3, whereas

gawk 'BEGIN { $0="3 4 5 6 7 8 9"; a=3; print ($($a++)++); print $0; print a }' prints 7\n3 4 6 6 8 8 9\n3

./goawk 'BEGIN { n = split("12345", a, ""); i = 1; a[a[a[1]++]++]++; for (i=1;i<=n;i++) printf a[i]; print } prints 34345\n whereas

gawk 'BEGIN { n = split("12345", a, ""); i = 1; a[a[a[1]++]++]++; for (i=1;i<=n;i++) printf a[i]; print } prints 33345\n.

This pull request fixes both issues. I uncommented the TODO tests in interp/interp_test.go and added one more.

Fixes #79

post-increment/post-decrement operators

there is still a bug in the compiler and vm for this kind of expression
@benhoyt
Copy link
Owner

benhoyt commented Oct 11, 2023

This looks great at a glance, and fixes a long-standing issue. Thanks @fioriandrea! I'll take a look at the code change more closely in the next few days.

Copy link
Owner

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is good work, and as I've noted has uncovered some other (related) issues! Let me know how you'd like to proceed.

parser/parser.go Outdated Show resolved Hide resolved
parser/parser.go Show resolved Hide resolved
internal/compiler/compiler.go Outdated Show resolved Hide resolved
} else {
c.expr(e.Expr)
c.incrExpr(e)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've actually uncovered (or made me notice) a larger problem with GoAWK here! All the augmented assignment operations that assign to an lvalue -- ++ / --, += and other augmented assignment operators, and sub / gsub (because they can assign to an lvalue) -- are buggy when used as expressions too.

They evaluate the field/array index twice. For example, in the above, c.expr(e.Expr) evaluates the field/array index for FieldExpr/IndexExpr, and the c.assign(e.Expr) does it again. This is a bug.

I've added a bunch of tests in this Gist that show this.

So we actually need to step back a bit and fix it in expr for IncrExpr, AugAssignExpr, and CallExpr for the sub / gsub cases. I had a play with this today, but it's not trivial!

My current thinking is to not use c.expr() directly on the expression, but use a function with only handles lvalues and saves the index away somewhere to be used by c.assign later. But I'm not sure -- there's bound to be a more elegant way to solve this.

If you'd like to suggest a good way to fix all of these, go ahead. Otherwise we can reduce this PR to just fixing the parsing issue and I'll address the compiler bugs in a separate PR.

Copy link
Contributor Author

@fioriandrea fioriandrea Oct 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One possible solution could be to introduce a data structure on the virtual machine side (like a parallel stack just for the indices) where you can save the indices and then use them within c.assign by fetching them with specific instructions.

Another type of solution could be to create functions similar to my function incrExpr that handle all cases (one for IncrExpr with pre-increment, one for IncrExpr with post-increment, one for AugAssignExpr, and one for sub and gsub, all with switch statements similar to those in my incrExpr function). I played a bit with this approach and it seemed to work, but there was lots of repetitive code. Maybe it can be factored somehow. Also, I didn't know exactly how to handle sub/gsub without introducing a new instruction in the VM, since sub/gsub push two values on the stack.

A final approach (which is what I committed) is to save the index on the stack and retrieve it using the Roll instruction (similar to the roll instruction in Forth), which allows you to move an element from a lower position on the stack to the top. So, essentially, the solution is similar to the first one, but instead of creating a new data structure, the stack is reused. I understand that this approach might be inefficient as you need to "shift" the stack by a few elements each time you call Roll. I was tempted to just copy the value without shifting the stack, but that way there would remain a lot of garbage on the stack. Also, the code I committed for this approach could be cleaned up a bit (maybe) though.

Let me know what you think

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! I think your basic approach is great. I've pushed a couple of commits to simplify a little: we don't need the full "roll-by-n" functionality, just what I've called Rote (rotate 3, which is called ROT in Forth). So I've replaced your Roll opcode with Rote which has the stack effect a b c -- b c a.

I've also tweaked the names of a couple of functions and removed some redundancy.

In addition, I removed the new ast.IsIndexed function, as I'd rather not export a new function just for that (it's really an implementation detail of the compiler). So I just did a type switch for those, which is only one line longer.

Let me know what you think, and if you can see anything else that needs adjusting before merging the PR.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, just in case: @fioriandrea

Copy link
Contributor Author

@fioriandrea fioriandrea Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code can be merged. I just committed one last additional test for nested gsub expressions (I think this case is worth testing, since gsub/sub, as opposed to the other augmented assignment operators, pushes two values on the stack)

@benhoyt benhoyt merged commit 9241da4 into benhoyt:master Oct 19, 2023
10 checks passed
@benhoyt
Copy link
Owner

benhoyt commented Oct 19, 2023

Thanks again @fioriandrea for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect parsing of complex ++ expressions
2 participants