-
-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue #79: Incorrect parsing of complex ++ expressions #215
Conversation
post-increment/post-decrement operators there is still a bug in the compiler and vm for this kind of expression
This looks great at a glance, and fixes a long-standing issue. Thanks @fioriandrea! I'll take a look at the code change more closely in the next few days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is good work, and as I've noted has uncovered some other (related) issues! Let me know how you'd like to proceed.
internal/compiler/compiler.go
Outdated
} else { | ||
c.expr(e.Expr) | ||
c.incrExpr(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've actually uncovered (or made me notice) a larger problem with GoAWK here! All the augmented assignment operations that assign to an lvalue -- ++
/ --
, +=
and other augmented assignment operators, and sub
/ gsub
(because they can assign to an lvalue) -- are buggy when used as expressions too.
They evaluate the field/array index twice. For example, in the above, c.expr(e.Expr)
evaluates the field/array index for FieldExpr/IndexExpr, and the c.assign(e.Expr)
does it again. This is a bug.
I've added a bunch of tests in this Gist that show this.
So we actually need to step back a bit and fix it in expr
for IncrExpr
, AugAssignExpr
, and CallExpr
for the sub
/ gsub
cases. I had a play with this today, but it's not trivial!
My current thinking is to not use c.expr()
directly on the expression, but use a function with only handles lvalues and saves the index away somewhere to be used by c.assign
later. But I'm not sure -- there's bound to be a more elegant way to solve this.
If you'd like to suggest a good way to fix all of these, go ahead. Otherwise we can reduce this PR to just fixing the parsing issue and I'll address the compiler bugs in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possible solution could be to introduce a data structure on the virtual machine side (like a parallel stack just for the indices) where you can save the indices and then use them within c.assign
by fetching them with specific instructions.
Another type of solution could be to create functions similar to my function incrExpr
that handle all cases (one for IncrExpr
with pre-increment, one for IncrExpr
with post-increment, one for AugAssignExpr
, and one for sub
and gsub
, all with switch statements similar to those in my incrExpr
function). I played a bit with this approach and it seemed to work, but there was lots of repetitive code. Maybe it can be factored somehow. Also, I didn't know exactly how to handle sub/gsub
without introducing a new instruction in the VM, since sub/gsub
push two values on the stack.
A final approach (which is what I committed) is to save the index on the stack and retrieve it using the Roll instruction (similar to the roll instruction in Forth), which allows you to move an element from a lower position on the stack to the top. So, essentially, the solution is similar to the first one, but instead of creating a new data structure, the stack is reused. I understand that this approach might be inefficient as you need to "shift" the stack by a few elements each time you call Roll. I was tempted to just copy the value without shifting the stack, but that way there would remain a lot of garbage on the stack. Also, the code I committed for this approach could be cleaned up a bit (maybe) though.
Let me know what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! I think your basic approach is great. I've pushed a couple of commits to simplify a little: we don't need the full "roll-by-n" functionality, just what I've called Rote
(rotate 3, which is called ROT
in Forth). So I've replaced your Roll
opcode with Rote
which has the stack effect a b c -- b c a
.
I've also tweaked the names of a couple of functions and removed some redundancy.
In addition, I removed the new ast.IsIndexed
function, as I'd rather not export a new function just for that (it's really an implementation detail of the compiler). So I just did a type switch for those, which is only one line longer.
Let me know what you think, and if you can see anything else that needs adjusting before merging the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, just in case: @fioriandrea
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code can be merged. I just committed one last additional test for nested gsub expressions (I think this case is worth testing, since gsub/sub, as opposed to the other augmented assignment operators, pushes two values on the stack)
… post-increment is necessary
Implement Roll instruction to save the index on the stack and retrieve it at a later point through Roll
Thanks again @fioriandrea for the fix! |
As the issue pointed out, there was an error while parsing complex post-increment expressions involving field expressions. The fix on the parser was easy enough. However, while working on the parser, I realized that there was a bug in the compiler too. For example:
./goawk 'BEGIN { $0="3 4 5 6 7 8 9"; a=3; print ($($a++)++); print $0; print a }'
prints7\n3 4 7 6 7 8 9\n3
, whereasgawk 'BEGIN { $0="3 4 5 6 7 8 9"; a=3; print ($($a++)++); print $0; print a }'
prints7\n3 4 6 6 8 8 9\n3
./goawk 'BEGIN { n = split("12345", a, ""); i = 1; a[a[a[1]++]++]++; for (i=1;i<=n;i++) printf a[i]; print }
prints34345\n
whereasgawk 'BEGIN { n = split("12345", a, ""); i = 1; a[a[a[1]++]++]++; for (i=1;i<=n;i++) printf a[i]; print }
prints33345\n
.This pull request fixes both issues. I uncommented the TODO tests in
interp/interp_test.go
and added one more.Fixes #79