Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No syntax exception raised in some cases for JDT #325

Open
codinuum opened this issue Jan 17, 2024 · 16 comments
Open

No syntax exception raised in some cases for JDT #325

codinuum opened this issue Jan 17, 2024 · 16 comments
Assignees
Labels

Comments

@codinuum
Copy link

String literals seem to be ignored in some cases.
(tested with build 6584227)
01-17-2024 22 45 17

@jrfaller jrfaller self-assigned this Jan 17, 2024
@jrfaller jrfaller added the bug label Jan 17, 2024
@jrfaller
Copy link
Member

Hi! well spotted! Can you attach the test files? Thanks!

@codinuum
Copy link
Author

Here they are.
a0.java.txt
a1.java.txt

@jrfaller
Copy link
Member

Great!

Ok with a quick gumtree parse a0.java

I get

CompilationUnit [0,114]
    TypeDeclaration [0,113]
        TYPE_DECLARATION_KIND: class [0,5]
        SimpleName: Foo [6,9]
        MethodDeclaration [14,111]
            Modifier: public [14,20]
            Modifier: static [21,27]
            Modifier: final [28,33]
            Modifier: public [49,55]
            Modifier: static [56,62]
            PrimitiveType: void [63,67]
            SimpleName: main [68,72]
            Block [75,111]
                ExpressionStatement [81,107]
                    MethodInvocation [81,106]
                        METHOD_INVOCATION_RECEIVER [81,91]
                            QualifiedName: System.out [81,91]
                        SimpleName: println [92,99]
                        METHOD_INVOCATION_ARGUMENTS [100,105]
                            StringLiteral: "bar" [100,105]

The field declaration is strangely completely ignored... This is strange. I have to look in the jdt generator code.

@jrfaller
Copy link
Member

OK by opening the file I realize that there is no return type for the attribute which is a syntax error. If you add the type String to both files it works. I guess the big question is why there is no SyntaxError raised by JDT in this case ;-)

@jrfaller jrfaller changed the title String literals ignored No syntax exception raised Jan 17, 2024
@codinuum
Copy link
Author

Confirmed. Thanks!
How about comparing another pair of files with python-treesitter?
a0.py.txt
a1.py.txt
String literals are missing in the parse trees...

@jrfaller
Copy link
Member

Working on my side, no?

module [0,113]
    expression_statement [0,9]
        assignment [0,9]
            identifier: s [0,1]
            =: = [2,3]
            string: 'bar' [4,9]
    function_definition [12,112]
        identifier: match [16,21]
        parameters [21,30]
            identifier: command [22,29]
        block [36,112]
            return_statement [36,112]
                parenthesized_expression [43,112]
                    boolean_operator [44,111]
                        comparison_operator [44,74]
                            string: 'push' [44,50]
                            attribute [54,74]
                                identifier: command [54,61]
                                identifier: script_parts [62,74]
                        and: and [75,78]
                        comparison_operator [79,111]
                            string: 'set-upstream' [79,93]
                            attribute [97,111]
                                identifier: command [97,104]
                                identifier: output [105,111]⏎

@codinuum
Copy link
Author

Parsing results in the following in my environment...

module [0,113]
    expression_statement [0,9]
        assignment [0,9]
            identifier: s [0,1]
            =: = [2,3]
            string [4,9]
                ": ' [4,5]
                ": ' [8,9]
    function_definition [12,112]
        def: def [12,15]
        identifier: match [16,21]
        parameters [21,30]
            (: ( [21,22]
            identifier: command [22,29]
            ): ) [29,30]
        :: : [30,31]
        block [36,112]
            return_statement [36,112]
                return: return [36,42]
                parenthesized_expression [43,112]
                    (: ( [43,44]
                    boolean_operator [44,111]
                        comparison_operator [44,74]
                            string [44,50]
                                ": ' [44,45]
                                ": ' [49,50]
                            in: in [51,53]
                            attribute [54,74]
                                identifier: command [54,61]
                                .: . [61,62]
                                identifier: script_parts [62,74]
                        and: and [75,78]
                        comparison_operator [79,111]
                            string [79,93]
                                ": " [79,80]
                                ": " [92,93]
                            in: in [94,96]
                            attribute [97,111]
                                identifier: command [97,104]
                                .: . [104,105]
                                identifier: output [105,111]
                    ): ) [111,112]

@tsantalis
Copy link

@jrfaller
I can confirm that JDT Parser skips parts of the AST that are syntactically invalid.
Missing type in a field declaration makes it syntactically invalid.
JDT Parser will continue parsing the rest of the file.

@jrfaller
Copy link
Member

jrfaller commented Jan 17, 2024

@tsantalis yes just what I realized. On top of that, neither the flag MALFORMED nor the flag RECOVERED seems to work in these cases. The only thing I get is that I have one "problem" in the problem list when casting the node to a CompilationUnit. I guess it could work better to detect syntax errors but I fear that it will report more than just syntax errors 😢 WDYT?

@codinuum
Copy link
Author

@jrfaller Updating tree-sitter-parser resolved the problem.
Thanks a lot!

@tsantalis
Copy link

@tsantalis yes just what I realized. On top of that, neither the flag MALFORMED nor the flag RECOVERED seems to work in these cases. The only thing I get is that I have one "problem" in the problem list when casting the node to a CompilationUnit. I guess it could work better to detect syntax errors but I fear that it will report more than just syntax errors 😢 WDYT?

For commit analysis, which cannot be guaranteed to be syntactically valid and compilable, I think what JDT Parser is doing is fine. Instead of failing the whole file, it just skips the syntactically invalid sub-trees.
There should be a way to isolate syntax errors from the problem list. I will look into it.

@jrfaller jrfaller changed the title No syntax exception raised No syntax exception raised in some cases for JDT Jan 17, 2024
@codinuum
Copy link
Author

codinuum commented Jan 17, 2024

That reminds me of something.
I have seen conflict markers >>>>>>>, =======, and <<<<<<< left even in release versions of some projects.

@tsantalis
Copy link

@jrfaller

We discovered that JDT ASTParser recommends a version to setup the parser, when it finds a parsing problem.
For example, when parsing a record declaration with 1.8 compatibility, it generates a problem suggesting the use of Java 16 compatibility to parse record declarations.

We basically process the reported problems and re-parse the file with the recommended compatibility version. Here is our implementation in RefactoringMiner.

https://github.com/tsantalis/RefactoringMiner/blob/master/src/main/java/gr/uom/java/xmi/UMLModelASTReader.java#L160-L177

@jrfaller
Copy link
Member

jrfaller commented Jun 14, 2024

Hi @tsantalis and thanks a lot for the tips. In the last commit I switched to private final static String JAVA_VERSION = JavaCore.latestSupportedJavaVersion(); for compiler settings and ASTParser parser = ASTParser.newParser(AST.getJLSLatest()); for the parser initialization. Do you think anything can go wrong like this or parsers are backward compatible? Cheers!

@tsantalis
Copy link

@jrfaller
The parser initialization ASTParser parser = ASTParser.newParser(AST.getJLSLatest());
is totally fine and creates no problems.

The problem is with the compiler settings.
For example, since Java 16 record is a keyword reserved for record type declarations.
But before Java 16, you could use record as a variable name in your code.

(The same thing happened many years ago when enum was introduced as a keyword in Java 1.5,
but now it is very rare to find a commit using enum as a variable. The code should be older than 2004.)

If you put the latest Java version in the compiler settings, and parse a code using record as a variable
the parser will not generate a complete AST and will skip the parts using record as a variable name.

The best strategy is to start with a lower version compiler setting.
We start from JavaCore.VERSION_14 because we found that since this version the parser generates recommendations about what version to use to parse some code, which was not successfully parsed.

If we get such a recommendation in the problems, we re-parse the file using the recommended version in the compiler settings.

@jrfaller
Copy link
Member

@tsantalis thanks a lot for the detailed explanation (and for pointing me the fix!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants