Annotations 2.0 #75

Closed
wants to merge 28 commits into from

@FabioBatSilva (Member) commented Mar 20, 2016

  • Move namespace to Doctrine\Annotations (removing Common)
  • Use hoa/compiler instead of doctrine/lexer (see: grammar)
  • Drop AnnotationRegistry and all autoload magic
  • Drop Attribute/Attributes annotations
  • Drop SimpleAnnotationReader
  • Drop FileCacheReader
  • Drop IndexedReader
  • Require PHP 7

TODO:

Local reviews (checkout + run locally):

@Ocramius added this to the v2.0.0 milestone on Mar 21, 2016
@schmittjoh (Member):

How does it affect userland? Any benefits/goals?

Review comment on composer.json (outdated):
],
"require": {
-"php": ">=5.3.2",
-"doctrine/lexer": "1.*"
+"php": ">=7.0.0",
Member:

Please don't align constraints. It leads to a nightmare when adding new constraints (pointless merge conflicts caused by alignment changes).

Contributor:

@FabioBatSilva Aligning constraints tells us you're manually editing composer.json, which should not be necessary when requiring dependencies.

@stof (Member) commented Mar 21, 2016

Drop Attribute/Attributes annotations

Why is it dropped?

}

/**
* @return \Doctrine\Annotations\Metadata\MetadataFactory


Sometimes you're using the FQCN and sometimes an alias. It'd be better to use only one.

Member:

@FabioBatSilva Keep the FQCN.

@@ -82,7 +82,7 @@ public function getClassAnnotations(ReflectionClass $class) : array
/**
* {@inheritDoc}
*/
-public function getClassAnnotation(ReflectionClass $class, $annotationName)
+public function getClassAnnotation(ReflectionClass $class, string $annotationName)


I would suggest splitting decorative CS changes and BC-breaking signature changes into separate commits.
Adding a type hint changes the signature, which breaks any implementing classes.


And then, the decorative code style changes can go into a dedicated PR to the 1.x branch.

* Constructor.
*
* @param \Doctrine\Annotations\Resolver $resolver
* @param \Doctrine\Annotations\MetadataFactory $metadataFactory


The type is incorrect; it should be \Doctrine\Annotations\Metadata\MetadataFactory.


$class = new \ReflectionClass($className);
$constructor = $class->getConstructor();
$docComment = $class->getDocComment();


Unused local variable $docComment.

@donquixote:

Did anyone run a benchmark to compare the hoa compiler to the one currently in use?
Maybe it is all "fast enough" so we don't have to worry. Just asking.

@@ -107,11 +107,11 @@ public function getMetadataFor(string $className)
*/
private function isAnnotation(ReflectionClass $class, array $annotations) : bool
{
-if ($class->isSubclassOf('Doctrine\Annotations\Annotation')) {
+if ($class->isSubclassOf(Annotation::CLASS)) {


::CLASS should be lowercase ::class.
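A small sketch of what the suggestion means, assuming PHP 7. The `Annotation` base class is a local stand-in so the snippet runs on its own, and `MyAnnotation` is a hypothetical class. `::class` is resolved at compile time to the fully qualified name, so both `isSubclassOf()` checks below are equivalent. Because PHP keywords are case-insensitive, `::CLASS` also works; the review point is about convention, not behavior.

```php
<?php
namespace Doctrine\Annotations;

// Local stand-in for the library's base class, so this sketch is self-contained.
abstract class Annotation
{
}

// Hypothetical annotation class used only for this example.
final class MyAnnotation extends Annotation
{
}

$class = new \ReflectionClass(MyAnnotation::class);

// ::class expands to the fully qualified class name at compile time.
echo Annotation::class, "\n";                                      // Doctrine\Annotations\Annotation
var_dump($class->isSubclassOf(Annotation::class));                 // bool(true)
var_dump($class->isSubclassOf('Doctrine\Annotations\Annotation')); // bool(true)
```

The constant form has the advantage that IDEs and static analysis can follow it, while a string literal silently goes stale when a class is renamed.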

@donquixote:

Architecture: I propose to drop Metadata and MetadataFactory.
Instead, have one factory object per class (or give it a different name, dunno (*)).

class Builder
{
    [..]

    /**
     * @param Context   $context
     * @param Reference $reference
     *
     * @return object
     */
    public function create(Context $context, Reference $reference)
    {
        $target    = $reference->nested ? Target::TARGET_ANNOTATION : $context->getTarget();
        $fullClass = $this->resolver->resolve($context, $reference->name);
        $values    = $reference->values;

        if (null === $factory = $this->factoryProvider->classGetFactory($fullClass)) {
            throw InvalidAnnotationException::notAnnotationException($fullClass, $reference->name, $context->getDescription());
        }

        return $factory->instantiate($context, $values, $target);
    }
}

Now all the metadata stuff can be encapsulated in the $factory object.
There can even be separate factory classes depending on how the class should be constructed and how the annotation values are transformed into arguments.
There can also be a ClassNotFoundFactory, which always throws an exception in the ->instantiate() method.
The interface for those factories really only has this one method.

(*) I was initially going to call the "factory" "instantiator". But this name already exists in Doctrine\Instantiator\Instantiator. So...
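A minimal sketch of the factory side of this proposal, assuming PHP 7. `AnnotationFactory`, `ConstructorFactory`, `PropertyFactory`, `ClassNotFoundFactory`, `Hello` and `Goodbye` are all hypothetical names, and the proposal's `Context` is reduced to a plain description string so the example runs on its own. Checking `$target` against the class's allowed targets is left out.

```php
<?php
// Hypothetical sketch of "one factory object per annotation class".
// The proposal's Context is simplified to a description string here.

interface AnnotationFactory
{
    /** Build an annotation object from parsed values ($target is unused in this sketch). */
    public function instantiate(string $description, array $values, int $target);
}

/** For annotation classes whose constructor receives the raw values array. */
final class ConstructorFactory implements AnnotationFactory
{
    private $class;

    public function __construct(string $class)
    {
        $this->class = $class;
    }

    public function instantiate(string $description, array $values, int $target)
    {
        $class = $this->class;
        return new $class($values);
    }
}

/** For annotation classes with public properties that are filled one by one. */
final class PropertyFactory implements AnnotationFactory
{
    private $class;

    public function __construct(string $class)
    {
        $this->class = $class;
    }

    public function instantiate(string $description, array $values, int $target)
    {
        $class = $this->class;
        $object = new $class();
        foreach ($values as $name => $value) {
            if (!property_exists($object, $name)) {
                throw new InvalidArgumentException(sprintf('Unknown property "%s" for @%s in %s.', $name, $class, $description));
            }
            $object->$name = $value;
        }
        return $object;
    }
}

/** Used for names that do not resolve to an annotation class: always throws. */
final class ClassNotFoundFactory implements AnnotationFactory
{
    private $class;

    public function __construct(string $class)
    {
        $this->class = $class;
    }

    public function instantiate(string $description, array $values, int $target)
    {
        throw new InvalidArgumentException(sprintf('@%s in %s is not an annotation.', $this->class, $description));
    }
}

// Hypothetical annotation classes for the usage example.
final class Hello
{
    public $value;

    public function __construct(array $values)
    {
        $this->value = $values['value'] ?? null;
    }
}

final class Goodbye
{
    public $value;
}

$hello = (new ConstructorFactory(Hello::class))->instantiate('method foo()', ['value' => 'hi'], 0);
$bye = (new PropertyFactory(Goodbye::class))->instantiate('method foo()', ['value' => 'bye'], 0);
```

With this split, the `Builder` only resolves the name and asks the provider for a factory; how values become constructor arguments or property assignments stays inside each factory class, and an unknown name simply maps to a factory that throws.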

}

return $this->imports = array_merge($classImports, $traitImports);
}


I think this is confusing. If the method comes from a trait, then the annotation lives in the trait file, and the imports should come from the trait's file only. If the method is declared in the class itself, then the annotation lives in the class file, and it should use the imports from the class file. I don't see a case where it should combine imports from different files.


Here is how to find out whether a method is defined in a trait: https://stackoverflow.com/a/45912866/246724


Btw, IMO all of this reflection + inheritance adds unnecessary complexity.

Member:

Well, forbidding people from using inheritance and traits together with annotations would mean that nobody would migrate to version 2.

Member:

From a functional perspective, it should stay the exact same, maybe with some API changes, but no behavioral ones.

The point here is getting rid of our own hacky parser, using a formalised one (HOA's)


Oh, maybe I was unclear.

all of this reflection + inheritance adds unnecessary complexity.

What I mean is we don't need to inherit from \ReflectionClass to find the imports. Instead, have an ImportFinder or something like that.

Of course, people who write annotated classes should be allowed to use inheritance, and traits!

From a functional perspective, it should stay the exact same, maybe with some API changes, but no behavioral ones.

If the code in the PR is replicating existing behavior, then it needs to stay this way.
Or, if we agree that the old behavior is wrong, we could have two implementations of ImportFinder or of the class name resolver: One that operates the BC way, another that operates the "correct" way.

Why would we say that "the old behavior is wrong"?

Consider this example:

File T.php:

<?php
namespace Acme\Foo;
use Acme\Annotation\Hello;
trait T {
  /**
   * @Hello("I am an annotation on a trait method.")
   * @Goodbye("I am an annotation on a trait method, but the import is in the class file.")
   */
  function foo() {}
}

File C.php:

<?php
namespace Acme\Bar;
use Acme\Annotation\Goodbye;
class C {
  use T;
}

With the behavior proposed in the PR, which I assume is also the current behavior, the second annotation @Goodbye(..) will use the import Acme\Annotation\Goodbye from the class file.

I am saying this is wrong. It should only use the imports from the trait file. So the @Hello(..) should work, but the @Goodbye(..) should not.

This would be consistent with how the language itself works.
Imports are only available within the same file.
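To make the per-file rule concrete, here is a hypothetical `findImports()` in the spirit of the ImportFinder idea: it reads only the file-level `use` statements of a single source, so resolving an annotation against the file that holds the doc comment gives exactly the behavior described. It is a sketch with the limits listed in its header comment, not the library's import parser.

```php
<?php
// Hypothetical ImportFinder sketch (not doctrine/annotations code): collect the
// file-level `use` imports of one PHP source as [lowercased alias => FQCN].
// Assumptions: non-braced namespaces, no group use (`use A\{B, C}`), and
// `use function` / `use const` are not told apart from class imports.
function findImports(string $source): array
{
    $imports = [];
    $tokens = token_get_all($source);
    $count = count($tokens);
    $depth = 0; // brace depth: a `use` inside a class body is a trait use, not an import

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        if ($token === '{' || (is_array($token) && in_array($token[0], [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES], true))) {
            $depth++;
            continue;
        }
        if ($token === '}') {
            $depth--;
            continue;
        }
        if (!is_array($token) || $token[0] !== T_USE || $depth > 0) {
            continue;
        }

        // Read "Name\Space\Class [as Alias]" entries up to the closing ";".
        $name = '';
        $alias = null;
        for ($i++; $i < $count; $i++) {
            $token = $tokens[$i];
            if ($token === ';' || $token === ',') {
                // Without "as", the alias is the last segment of the name.
                $short = $alias ?? substr(strrchr('\\' . $name, '\\'), 1);
                $imports[strtolower($short)] = ltrim($name, '\\');
                if ($token === ';') {
                    break;
                }
                $name = '';
                $alias = null;
            } elseif ($token === '(') {
                break; // a closure's "use ($var)", not an import
            } elseif (is_array($token) && $token[0] === T_AS) {
                $alias = '';
            } elseif (is_array($token) && $token[0] !== T_WHITESPACE && $token[0] !== T_COMMENT) {
                // PHP 7 splits names into T_STRING/T_NS_SEPARATOR; PHP 8 uses one name token.
                if ($alias === null) {
                    $name .= $token[1];
                } else {
                    $alias .= $token[1];
                }
            }
        }
    }

    return $imports;
}

// The two files from the example above, as strings.
$traitFile = <<<'PHP'
<?php
namespace Acme\Foo;
use Acme\Annotation\Hello;
trait T {
  /**
   * @Hello("I am an annotation on a trait method.")
   * @Goodbye("I am an annotation on a trait method, but the import is in the class file.")
   */
  function foo() {}
}
PHP;

$classFile = <<<'PHP'
<?php
namespace Acme\Bar;
use Acme\Annotation\Goodbye;
class C {
  use T;
}
PHP;

// Per-file resolution: the trait's doc comment only sees the trait file's imports.
var_dump(findImports($traitFile)); // only 'hello' => 'Acme\Annotation\Hello'
var_dump(findImports($classFile)); // only 'goodbye' => 'Acme\Annotation\Goodbye'
```

In a reader, the source would come from `file_get_contents($method->getFileName())`; for a method imported from a trait, `ReflectionMethod::getFileName()` points at the trait's file, so `@Goodbye` would fail to resolve as argued above.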

Well, personally I think having annotations on a method in a trait is probably a bad idea anyway. But if we support it, it should at least be "correct", unless, of course, the old behavior has to be kept for BC reasons.

The point here is getting rid of our own hacky parser, using a formalised one (HOA's)

Which I assume will be more maintainable, more reliable, more understandable (people have to look at the grammar only). So yeah, seems like a good idea.

Personally I care more about the registry going away.

About the methods defined in traits: It gets even more interesting if the method is renamed.

trait T {
  /**
   * @Hello()
   */
  function foo() {}
}

class C {
  use T {
    foo as bar;
  }
}

$m = new \ReflectionMethod('C', 'bar');
$reader->getMethodAnnotations($m);

The current behavior will not understand that the method is defined in a trait under a different name.
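One possible way to detect this, sketched under an assumption worth stating: a method imported from a trait keeps the trait's file name and start line, which is what `ReflectionMethod` reports for it. Matching on those two values, instead of on the method name, also finds the trait when the method was renamed with `as`. `findDeclaringTrait()` is a hypothetical helper; it only looks at direct traits, not traits used by other traits.

```php
<?php
/**
 * Hypothetical helper: return the trait a method was copied from, or null if
 * the class declares the method itself. Matches on file name + start line,
 * so aliased ("foo as bar") methods are found too. Direct traits only.
 *
 * @return ReflectionClass|null
 */
function findDeclaringTrait(ReflectionMethod $method)
{
    foreach ($method->getDeclaringClass()->getTraits() as $trait) {
        foreach ($trait->getMethods() as $candidate) {
            if ($candidate->getFileName() === $method->getFileName()
                && $candidate->getStartLine() === $method->getStartLine()) {
                return $trait;
            }
        }
    }
    return null;
}

trait T
{
    /**
     * @Hello()
     */
    function foo() {}
}

class C
{
    use T {
        foo as bar;
    }

    function own() {}
}

$trait = findDeclaringTrait(new ReflectionMethod('C', 'bar'));
echo $trait === null ? 'declared in class' : $trait->getName(), "\n"; // T
```

Once the trait is known, `$trait->getFileName()` identifies the file whose imports apply to the annotation.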

And about properties in traits - this is more difficult. There is no \ReflectionProperty::getFileName().
https://stackoverflow.com/questions/18257158/how-to-extract-start-line-of-a-property-declaration-in-php

As a heuristic, we could say that:

  • If none of the traits of the class has a property with the same name, then the property belongs to the class, obviously.
  • If one or more of the traits define the property then we compare the doc comment. See https://3v4l.org/KY9nl.
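The heuristic above could look roughly like this. `findPropertyTrait()` is a hypothetical helper; it relies on the fact that a property imported from a trait carries the trait's doc comment, and it resolves the ambiguous case (the class redeclares the property with an identical doc comment) in favor of the trait. Only direct traits are checked.

```php
<?php
/**
 * Hypothetical helper for the heuristic: ReflectionProperty has no
 * getFileName(), so compare doc comments with same-named trait properties.
 *
 * @return ReflectionClass|null the trait the declaration seems to come from
 */
function findPropertyTrait(ReflectionProperty $property)
{
    $name = $property->getName();
    foreach ($property->getDeclaringClass()->getTraits() as $trait) {
        // Rule 1: a trait without this property cannot be the source.
        // Rule 2: if it has one, the doc comments must match.
        if ($trait->hasProperty($name)
            && $trait->getProperty($name)->getDocComment() === $property->getDocComment()) {
            return $trait;
        }
    }
    return null; // no trait matches: the property belongs to the class
}

trait HasName
{
    /** @Hello("declared in the trait") */
    public $name;
}

class Person
{
    use HasName;

    /** @Hello("declared in the class") */
    public $age;
}

$trait = findPropertyTrait(new ReflectionProperty('Person', 'name'));
echo $trait === null ? 'declared in class' : $trait->getName(), "\n"; // HasName
```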

Maybe all of this should be discussed in a separate issue. I only brought it up here because the PR affects the code where this behavior is implemented.

Member:

@donquixote we kinda fixed all of these horrors in roave/better-reflection, although it is not the primary aim here to provide very precise reflection of ugly stuff like traits.

@Hywan commented Aug 28, 2017

Just a link about hoa/compiler performance: https://blog.hoa-project.net/2016/08-Performance-boost-for-Hoa-Compiler.html (cc @donquixote)

@donquixote:

Thanks @Hywan!
This does not compare it with a hand-written parser, only with previous hoa parsers.

In general, a hand-written or a generated parser should be faster than a parser combinator, or one that needs to interpret a piece of grammar in every step. Since it adds overhead and indirection to every micro-operation, the difference could be something like factor 2 (or more, or less). I remember this, because I used to experiment with the vektah parser combinator.

Your linked article, "Exporting the parser into PHP code", claims that the parser can be exported to PHP code. If this is equivalent to a generator, then it should be similarly fast as a hand-written parser.

Of course, even if it took 2x longer, that does not mean we have to care, as long as it is still "fast enough" overall.

@stof (Member) commented Aug 28, 2017

I'm still against the namespace change if it does not have a migration path:

  • if packages are not able to easily support both v1 and v2 of the library, a project cannot migrate to v2 until all its dependencies using doctrine/annotations have migrated, and it cannot update other dependencies that have already migrated until then. Given the number of packages out there relying on doctrine/annotations to parse annotations, such a community split is bad news
  • just migrating to v2 is not an option for packages needing to keep support for PHP 5.x (meaning that if they cannot support both versions, they will stay on v1 forever).

So this means we need a continuous migration path if you want to keep the namespace change.
This could be done with a new 1.x release that adds class aliases in the new namespace. See how Twig did it, for instance.
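A sketch of the PHP mechanism behind such a shim, `class_alias()`, under stated assumptions: the 1.x class body is a stand-in, and the alias direction (new 2.x name pointing at the existing 1.x class) is one possible choice, not what Doctrine or Twig actually shipped. An alias does not copy the class; both names refer to the same class, so `instanceof` checks and type hints accept either name.

```php
<?php
namespace Doctrine\Common\Annotations;

// Stand-in for the existing 1.x class; its real body is elided in this sketch.
class AnnotationReader
{
    public function getClassAnnotations(\ReflectionClass $class)
    {
        return [];
    }
}

// What a forward-compatible 1.x release could ship (hypothetical): make the
// 2.x name resolve to the same class without triggering autoloading.
if (!class_exists('Doctrine\Annotations\AnnotationReader', false)) {
    class_alias(AnnotationReader::class, 'Doctrine\Annotations\AnnotationReader');
}

// A dependent package can type-hint the new name and still run on 1.x.
$read = function (\Doctrine\Annotations\AnnotationReader $reader, \ReflectionClass $class) {
    return $reader->getClassAnnotations($class);
};

var_dump($read(new AnnotationReader(), new \ReflectionClass('stdClass')) === []); // bool(true)
```

Packages could then write code against the new namespace while their users are still on 1.x, which is the continuous path described above.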

@Hywan commented Aug 29, 2017

This does not compare it with a hand-written parser, only with previous hoa parsers.

Yes, it doesn't, but it gives an overview of the latest big improvements :-). I should have clarified this, sorry.

In general, a hand-written or a generated parser should be faster than a parser combinator, or one that needs to interpret a piece of grammar in every step. Since it adds overhead and indirection to every micro-operation, the difference could be something like factor 2 (or more, or less). I remember this, because I used to experiment with the vektah parser combinator.

True and false (well, you did start your sentence with “In general” 😉). In PHP, a parser combinator might be slower than a generated parser because of function calls and indirections, but I guess that will not be the bottleneck. The real bottleneck is copying data. In a parser combinator, you have to copy the data being parsed into each parser. Even though PHP uses copy-on-write (COW), each split of the data (substr) creates a new copy, so it's going to be slow. This particular problem can also exist in a generated parser, but the API surface is much smaller. For instance, the lexer for hoa/compiler does not consume the data by taking substrings; it just reads it using an offset, which is highly optimized: https://github.com/hoaproject/Compiler/blob/c86ccfbce9b9cad17cf84ffdf5c505c695d83d7a/Llk/Lexer.php#L276-L282
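The offset idea can be illustrated in a few lines. This is not hoa/compiler's lexer; it is a sketch using `preg_match()` with its `$offset` parameter and the `\G` assertion, so each token is matched in place at the current position and the remaining input is never sliced with `substr()`. The token names and patterns are assumptions chosen to fit annotation syntax.

```php
<?php
/**
 * Offset-based lexer sketch: tokens are matched in place at an integer offset
 * instead of slicing the remaining input on every step.
 * Token names and patterns are illustrative, not hoa/compiler's grammar.
 *
 * @return array list of [name, value, offset]
 */
function lex(string $input): array
{
    $patterns = [
        'at'          => '@',
        'identifier'  => '[A-Za-z_\\\\][A-Za-z0-9_\\\\]*',
        'string'      => '"(?:[^"\\\\]|\\\\.)*"',
        'number'      => '\d+(?:\.\d+)?',
        'parenthesis' => '[()]',
        'comma'       => ',',
        'equals'      => '=',
        'whitespace'  => '\s+',
    ];

    $tokens = [];
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        foreach ($patterns as $name => $pattern) {
            // \G anchors the match at $offset, so a match can only start there
            // and no substring of the remaining input is created.
            if (preg_match('/\G(?:' . $pattern . ')/', $input, $match, 0, $offset) === 1) {
                if ($name !== 'whitespace') {
                    $tokens[] = [$name, $match[0], $offset];
                }
                $offset += strlen($match[0]);
                continue 2;
            }
        }
        throw new RuntimeException(sprintf('Unexpected "%s" at offset %d.', $input[$offset], $offset));
    }

    return $tokens;
}

foreach (lex('@Hello("world", 42)') as list($name, $value, $offset)) {
    echo "$offset\t$name\t$value\n";
}
```

Only the matched token text is copied into `$match`; the input string itself is passed around unchanged, which is the property the comment above is pointing at.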

In a parser combinator, however, the lexer and parser phases are “merged”, so the peak memory usage should be lower than in a generated parser. That said, since the latest improvement in hoa/compiler, the lexer works as a buffered iterator, so its behavior is similar to a parser combinator's: https://github.com/hoaproject/Compiler/blob/c86ccfbce9b9cad17cf84ffdf5c505c695d83d7a/Llk/Parser.php#L162-L165

A parser combinator is like a hand-written parser, except that it has a predefined formalism, is more testable, more reusable, etc. Compared to a generated parser, its API is larger. However, a generated parser can be seen as a parser combinator with a small API. hoa/compiler has only one method, _parse, which adapts its behavior depending on whether it meets a token, a concatenation, a choice, or a repetition; these are the rules (the constructs of the grammar description language). A single method also means better caching by the VM and the CPU.

This thread is not the place to debate this, but: a generated parser, a parser combinator, or a hand-written parser can each be fast and efficient, or slow and inefficient. It really depends on how they are implemented. They all have pros and cons. Personally, I prefer a parser combinator when working with Rust (see nom) because it is testable and brings interesting guarantees, while in PHP I prefer a generated parser.

Your linked article, "Exporting the parser into PHP code", claims that the parser can be exported to PHP code. If this is equivalent to a generator, then it should be similarly fast as a hand-written parser.

I would claim that a hand-written parser is most of the time not fast. You have to re-optimise and re-implement everything, like the lexer (a good one is not simple) and the parser with all the optimisation. And the error-management, the AST builder, the memory management, the profiling etc. It's better to have a hackable compiler toolchain I guess.

But indeed, once a Hoa\Compiler\Llk\Parser is compiled into PHP code, the result directly creates an instance of Hoa\Compiler\Llk\Parser without loading the grammar from a text file. It builds the grammar as a set of rules, which is fast to instantiate and cacheable by the VM. The lexing and parsing themselves are optimized and use a very small API, which the VM can also cache well.

The most obvious way to get faster now is to use really good data structures instead of generic arrays, but we are limited by the language (I want php-ds in the core, pleaaase).

Of course, even if it took 2x longer, that does not mean we have to care, as long as it is still "fast enough" overall.

Correct. I don't want to speak for the Doctrine team, but my understanding of the problem is the following: replace a hand-written, hard-to-maintain, hacky, and maybe buggy parser with a formal parser that is easy to maintain and fast enough. hoa/compiler plays this role. Also, hoa/compiler brings interesting algorithms to generate data from a grammar (this is called grammar-based testing). More resources about this:

These algorithms can help test Doctrine annotations and DQL.

@donquixote:

I would claim that a hand-written parser is most of the time not fast. You have to re-optimise and re-implement everything, like the lexer (a good one is not simple) and the parser with all the optimisation. And the error-management, the AST builder, the memory management, the profiling etc. It's better to have a hackable compiler toolchain I guess.

Maybe I should have said "hardcoded" rather than "hand-written".
And instead of "should be faster", I should have said we are comparing the fastest theoretically possible parsers of each category. If you make all the right choices, what remains is the overhead and indirection. E.g. for this same reason, a parser in C would be faster than one in PHP.
You don't need a lexer or memory management if all you do is string index lookups, like here: https://github.com/donquixote/annotation-parser/blob/1.0/src/Parser/AnnotationParser.php. Also, an AST doesn't have to be complicated.
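To show what "string index lookups" can mean in practice, here is a deliberately small sketch. It is not the code of donquixote/annotation-parser: `scanAnnotationNames()` is a hypothetical function that only extracts annotation names by walking the doc comment one character at a time, without a token list or an AST. It does not parse arguments and does not skip phpDoc tags such as `@param`.

```php
<?php
/**
 * Hypothetical index-walking scanner: finds annotation names in a doc comment
 * by inspecting one character at a time, with no token list and no AST.
 * It does not parse arguments and does not skip phpDoc tags such as @param.
 */
function scanAnnotationNames(string $doc): array
{
    $names = [];
    $length = strlen($doc);

    for ($i = 0; $i < $length; $i++) {
        if ($doc[$i] !== '@') {
            continue;
        }
        // Only treat "@" as an annotation after whitespace or "*",
        // so addresses like user@example.com are ignored.
        $prev = $i === 0 ? ' ' : $doc[$i - 1];
        if (!ctype_space($prev) && $prev !== '*') {
            continue;
        }
        // Consume the name: letters, digits, "_" and namespace separators.
        $start = $i + 1;
        $end = $start;
        while ($end < $length && (ctype_alnum($doc[$end]) || $doc[$end] === '_' || $doc[$end] === '\\')) {
            $end++;
        }
        if ($end > $start && !ctype_digit($doc[$start])) {
            $names[] = substr($doc, $start, $end - $start);
        }
        $i = $end - 1; // resume scanning right after the name
    }

    return $names;
}

$doc = <<<'DOC'
/**
 * @Hello("I am an annotation")
 * Contact: user@example.com
 * @Acme\Annotation\Goodbye
 */
DOC;

print_r(scanAnnotationNames($doc)); // names found: Hello and Acme\Annotation\Goodbye
```

Everything here is a direct index read on the original string; the only copies made are the extracted names themselves.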

This is not an argument against the hoa parser, just a conversation.

@Hywan commented Aug 30, 2017

We agree 😃.

#constant:
<identifier> (<colon> <colon> <identifier>)?

string:

If a rule recognizes only one token, then it's faster to use that token directly. I assume speed is important in this context.

Same for the text, number, and identifier rules.

Annotations 2.0: Read annotations from functions
@Majkl578 changed the base branch from 2.0 to master on May 7, 2018 23:07
@Majkl578 mentioned this pull request on Dec 18, 2018
@range-of-motion:

What's the status of this? I'd very much like to see it because of the multi-line support.

@alcaeus (Member) commented Apr 1, 2020

Closing this PR: there has been a second effort to create 2.0 which has been just as successful as this one. We'll be revisiting this at a later date.

@alcaeus closed this on Apr 1, 2020
@alcaeus removed this from the 2.0.0 milestone on Apr 1, 2020