Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add tests and started implementation of an expression parser #42

Conversation

MarcelloDuarte
Copy link

@MarcelloDuarte MarcelloDuarte commented Aug 27, 2017

TODO

This is the todo of what the expression parser should cover based this grammar available on the PHP repo.

  • variable
  • T_LNUMBER
  • T_DNUMBER
  • other_scalar_expr
  • T_LIST ( array_pair_list )
  • array_pair_list
  • variable phpOperator expr
  • expr phpOperator expr
  • variable = & variable
  • T_CLONE expr
  • variable instanceof ns()
  • expr instanceof ns()
  • ( expr )
  • new_expr | new anonumous class
  • ternary_expr
  • expr '?' ':' expr
  • expr T_COALESCE expr
  • casting expr
  • '@' expr
  • '' backticks_expr ''
  • T_PRINT expr
  • T_YIELD | T_YIELD expr | T_YIELD expr T_DOUBLE_ARROW expr | T_YIELD_FROM expr
  • anonymous function

@marcioAlmada
Copy link
Owner

marcioAlmada commented Aug 28, 2017

Hi @MarcelloDuarte,

Perhaps if we keep the idea of ·expr() match as "greedy" and avoid internal language knowledge like operator precedence, at least for now, the feature will cover 90% of use cases while maintaining a high KISS factor.

This looks like an awesome start!

src/parsers.php Outdated
@@ -8,7 +8,7 @@

function variable(): Parser
{
return rtoken('/[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/');
return rtoken('/\$[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/');
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure you already know it, this is just an obvious reminder: variable has some other branches like https://3v4l.org/10bFt and the not so popular var var $$$x syntax.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to be missing something about rtoken. I would think the following would work:

    return rtoken('/\$[\$]+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/');

but it doesn't. $$a is matched as $a.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single \$+ works nicely according to https://regex101.com/r/WhuVPY/1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should work, but phpunit disagrees:

There was 1 failure:

1) Yay\SpecsTest::testSpecs with data set "/Users/mduarte/Developer/code/MarcelloDuarte/yay/tests/phpt/expr/variable.phpt" (Yay\Test Object (...))
Failed asserting that format description matches text.
--- Expected
+++ Actual
@@ @@
 <?php

-expression {
-    $$a
+$expression {
+    $a
 }

 ?>

Copy link
Owner

@marcioAlmada marcioAlmada Sep 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking, did you really use /\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/ as the regexp? This seems to work as specified:

use \Yay\{TokenStream, function rtoken};

$ts = TokenStream::FromSourceWithoutOpenTag('$a $bb $ccc');

token(T_OPEN_TAG))->parse($ts); // jump over <?php

var_dump(rtoken('/\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/')->parse($ts));
# T_VARIABLE($a)

var_dump(rtoken('/\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/')->parse($ts));
# T_VARIABLE($bb)

var_dump(rtoken('/\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/')->parse($ts));
# T_VARIABLE($ccc)

var_dump(rtoken('/\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/')->parse($ts));
# null

If so, that would be scary, perhaps PCRE engine is behaving differently on our builds 🤔 Are you using Windows?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I did use it. It works fine with normal variables, but not with variable variable:

var_dump(\Yay\rtoken('/\$+[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/')->parse(\Yay\TokenStream::fromSourceWithoutOpenTag('$$a')));
# null

Copy link
Owner

@marcioAlmada marcioAlmada Oct 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so there was one issue fixed with a2329f8.

But the real culprit is that PHP tokenizes variable variables as [token('$'), token('$'), token('$a')] instead of [token('$$a')], hence why rtoken would never work.

The following is guaranteed to work:

use \Yay\{TokenStream};

use function \Yay\{chain, optional, repeat, token};

$variable = chain(optional(repeat(token('$')))->as('varvar'), token(T_VARIABLE)->as('var'));

$ts = TokenStream::FromSourceWithoutOpenTag('$a $$b $$$c');

var_dump($variable->parse($ts)); // Ast for $a
var_dump($variable->parse($ts)); // Ast for $$b
var_dump($variable->parse($ts)); // Ast for $$$c
var_dump($variable->parse($ts)); // null

And sorry, my previous test was wrong.

👍

<?php

expression {
$ainstanceof\Some\Large\FQCN
Copy link
Owner

@marcioAlmada marcioAlmada Aug 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that whitespace should be preserved in the Ast every time we get an operator that is also a valid identifier, so expansion won't break like in this case $ainstanceof\Some\Large\FQCN. A self contained example.:

$instanceof = chain(
    chain(variable(), indentation())->as('left'),
    token(T_INSTANCEOF)->as('operator'),
    chain(indentation(), ns())->as('right')
)
->as('expr');

$ts = \Yay\TokenStream::fromSource('<?php $a  instanceof     \Some\Large\FQCN');

\Yay\token(T_OPEN_TAG)->parse($ts); // get rid of <?php

$ast = $instanceof->parse($ts);

$src = '';
$nodes = $ast->array();
array_walk_recursive($nodes, function($n) use(&$src){ $src .= $n; });
var_dump($src); #>>  "$a  instanceof     \Some\Large\FQCN"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in f60fe4f

marcioAlmada added a commit that referenced this pull request Feb 12, 2018
Added `expression()` parser with support for registering new operators. There is
no macro support for declaring and visiting the newly added operator AST during expansions
right now, so macro based operators will come later.

Relates to #42 and preprocess/pre-short-closures/issues/#4
Fixes #43
marcioAlmada added a commit that referenced this pull request Feb 12, 2018
`Parser::optmize()` makes difference on very complex parser stacks, like
`expression()`. Activating and deactivating the optimizations should yield
all the tests passed, except that with optimizations off the results are generally:

```
Time: 15.49 seconds, Memory: 168.46MB // high mem usage, slow :(
```

And with optimizations turned on, results should be:

```
Time: 763 ms, Memory: 16.00MB // low mem usage, much faster :D
```

The optimization can be activated on macros declarations as well:

```
macro ·optimize {
    [...]
} >> {
    [...]
}
```

Relates to #42 and #43 and preprocess/pre-short-closures/issues/#4
marcioAlmada added a commit that referenced this pull request Feb 12, 2018
`Parser::optmize()` makes difference on very complex parser stacks, like
`expression()`. Activating and deactivating the optimizations should yield
all the tests passed, except that with optimizations off the results are generally:

```
Time: 15.49 seconds, Memory: 168.46MB // high mem usage, slow :(
```

And with optimizations turned on, results should be:

```
Time: 763 ms, Memory: 16.00MB // low mem usage, much faster :D
```

The optimization can be activated on macros declarations as well:

```
macro ·optimize {
    [...]
} >> {
    [...]
}
```

Relates to #42 and #43 and preprocess/pre-short-closures#4
marcioAlmada added a commit that referenced this pull request Feb 12, 2018
Added `expression()` parser with support for registering new operators. There is
no macro support for declaring and visiting the newly added operator AST during expansions
right now, so macro based operators will come later.

Relates to #42 and preprocess/pre-short-closures#4
Closes #43
marcioAlmada added a commit that referenced this pull request Feb 12, 2018
`Parser::optmize()` makes difference on very complex parser stacks, like
`expression()`. Activating and deactivating the optimizations should yield
all the tests passed, except that with optimizations off the results are generally:

```
Time: 15.49 seconds, Memory: 168.46MB // high mem usage, slow :(
```

And with optimizations turned on, results should be:

```
Time: 763 ms, Memory: 16.00MB // low mem usage, much faster :D
```

The optimization can be activated on macros declarations as well:

```
macro ·optimize {
    [...]
} >> {
    [...]
}
```

Relates to #42 and #43 and preprocess/pre-short-closures#4
@marcioAlmada
Copy link
Owner

marcioAlmada commented Feb 12, 2018

I really appreciate the efforts here, but turns out the task was quite complex. First, it was necessary to add indirect and direct left recursion to the parser, then it became necessary to add knowledge about operators and operator precedence and, later, be able to self-optimize the resulting parser stacks - so the resulting grammar was at least efficient.

Refer to ExpressionParser. The resulting grammar was converted from zend_language_parser.y and may certainly contain a few bugs right now.

Contributions are still very welcome 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants