-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add tests and started implementation of an expression parser #42
[WIP] Add tests and started implementation of an expression parser #42
Conversation
Hi @MarcelloDuarte, Perhaps if we keep the idea of This looks like an awesome start! |
src/parsers.php
Outdated
@@ -8,7 +8,7 @@ | |||
|
|||
function variable(): Parser | |||
{ | |||
return rtoken('/[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/'); | |||
return rtoken('/\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure you already know it, this is just an obvious reminder: variable has some other branches like https://3v4l.org/10bFt and the not so popular var var $$$x
syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I seem to be missing something about rtoken
. I would think the following would work:
return rtoken('/\$[\$]+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/');
but it doesn't. $$a
is matched as $a
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A single \$+
works nicely according to https://regex101.com/r/WhuVPY/1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it should work, but phpunit
disagrees:
There was 1 failure:
1) Yay\SpecsTest::testSpecs with data set "/Users/mduarte/Developer/code/MarcelloDuarte/yay/tests/phpt/expr/variable.phpt" (Yay\Test Object (...))
Failed asserting that format description matches text.
--- Expected
+++ Actual
@@ @@
<?php
-expression {
- $$a
+$expression {
+ $a
}
?>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checking, did you really use /\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/
as the regexp? This seems to work as specified:
use \Yay\{TokenStream, function rtoken};
$ts = TokenStream::FromSourceWithoutOpenTag('$a $bb $ccc');
token(T_OPEN_TAG))->parse($ts); // jump over <?php
var_dump(rtoken('/\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/')->parse($ts));
# T_VARIABLE($a)
var_dump(rtoken('/\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/')->parse($ts));
# T_VARIABLE($bb)
var_dump(rtoken('/\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/')->parse($ts));
# T_VARIABLE($ccc)
var_dump(rtoken('/\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/')->parse($ts));
# null
If so, that would be scary, perhaps PCRE engine is behaving differently on our builds 🤔 Are you using Windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I did use it. It works fine with normal variables, but not with variable variable:
var_dump(\Yay\rtoken('/\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/')->parse(\Yay\TokenStream::fromSourceWithoutOpenTag('$$a')));
# null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so there was one issue fixed with a2329f8.
But the real culprit is that PHP tokenizes variable variables as [token('$'), token('$'), token('$a')]
instead of [token('$$a')]
, hence why rtoken
would never work.
The following is guaranteed to work:
use \Yay\{TokenStream};
use function \Yay\{chain, optional, repeat, token};
$variable = chain(optional(repeat(token('$')))->as('varvar'), token(T_VARIABLE)->as('var'));
$ts = TokenStream::FromSourceWithoutOpenTag('$a $$b $$$c');
var_dump($variable->parse($ts)); // Ast for $a
var_dump($variable->parse($ts)); // Ast for $$b
var_dump($variable->parse($ts)); // Ast for $$$c
var_dump($variable->parse($ts)); // null
And sorry, my previous test was wrong.
👍
tests/phpt/expr/instanceof.phpt
Outdated
<?php | ||
|
||
expression { | ||
$ainstanceof\Some\Large\FQCN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that whitespace should be preserved in the Ast every time we get an operator that is also a valid identifier, so expansion won't break like in this case $ainstanceof\Some\Large\FQCN
. A self contained example.:
$instanceof = chain(
chain(variable(), indentation())->as('left'),
token(T_INSTANCEOF)->as('operator'),
chain(indentation(), ns())->as('right')
)
->as('expr');
$ts = \Yay\TokenStream::fromSource('<?php $a instanceof \Some\Large\FQCN');
\Yay\token(T_OPEN_TAG)->parse($ts); // get rid of <?php
$ast = $instanceof->parse($ts);
$src = '';
$nodes = $ast->array();
array_walk_recursive($nodes, function($n) use(&$src){ $src .= $n; });
var_dump($src); #>> "$a instanceof \Some\Large\FQCN"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in f60fe4f
`Parser::optmize()` makes difference on very complex parser stacks, like `expression()`. Activating and deactivating the optimizations should yield all the tests passed, except that with optimizations off the results are generally: ``` Time: 15.49 seconds, Memory: 168.46MB // high mem usage, slow :( ``` And with optimizations turned on, results should be: ``` Time: 763 ms, Memory: 16.00MB // low mem usage, much faster :D ``` The optimization can be activated on macros declarations as well: ``` macro ·optimize { [...] } >> { [...] } ``` Relates to #42 and #43 and preprocess/pre-short-closures/issues/#4
`Parser::optmize()` makes difference on very complex parser stacks, like `expression()`. Activating and deactivating the optimizations should yield all the tests passed, except that with optimizations off the results are generally: ``` Time: 15.49 seconds, Memory: 168.46MB // high mem usage, slow :( ``` And with optimizations turned on, results should be: ``` Time: 763 ms, Memory: 16.00MB // low mem usage, much faster :D ``` The optimization can be activated on macros declarations as well: ``` macro ·optimize { [...] } >> { [...] } ``` Relates to #42 and #43 and preprocess/pre-short-closures#4
Added `expression()` parser with support for registering new operators. There is no macro support for declaring and visiting the newly added operator AST during expansions right now, so macro based operators will come later. Relates to #42 and preprocess/pre-short-closures#4 Closes #43
`Parser::optmize()` makes difference on very complex parser stacks, like `expression()`. Activating and deactivating the optimizations should yield all the tests passed, except that with optimizations off the results are generally: ``` Time: 15.49 seconds, Memory: 168.46MB // high mem usage, slow :( ``` And with optimizations turned on, results should be: ``` Time: 763 ms, Memory: 16.00MB // low mem usage, much faster :D ``` The optimization can be activated on macros declarations as well: ``` macro ·optimize { [...] } >> { [...] } ``` Relates to #42 and #43 and preprocess/pre-short-closures#4
I really appreciate the efforts here, but turns out the task was quite complex. First, it was necessary to add indirect and direct left recursion to the parser, then it became necessary to add knowledge about operators and operator precedence and, later, be able to self-optimize the resulting parser stacks - so the resulting grammar was at least efficient. Refer to ExpressionParser. The resulting grammar was converted from zend_language_parser.y and may certainly contain a few bugs right now. Contributions are still very welcome 😄 |
TODO
This is the todo of what the expression parser should cover based this grammar available on the PHP repo.
' backticks_expr '
'