Exhaustive MySQL Parser #157

Merged on Nov 18, 2024 (169 commits)
ccc341b
MySQL AST Parser
adamziel Aug 17, 2024
78fdf69
Fix parser overriding parts of the parse tree as it constructs them.
adamziel Aug 17, 2024
c8652d5
Output ParseTree using a class, not an array for much simpler processing
adamziel Aug 17, 2024
137d6ca
Manually factor left recursion into right recursion in the grammar fi…
adamziel Aug 18, 2024
0a2440c
Explore support for SQL_CALC_FOUND_ROWS
adamziel Aug 20, 2024
0406d71
Support VALUES() call
adamziel Aug 20, 2024
87573f2
Extract queries from MySQL test suite and test the parser against them
JanJakes Sep 26, 2024
9629702
Implement handling for manually added lexer symbols
JanJakes Sep 26, 2024
d63bc6e
Fix passing nulls to "ctype_" functions
JanJakes Sep 26, 2024
8e7e2e8
Add support for hex format x'ab12', X'ab12', and bin format b'01' and…
JanJakes Sep 26, 2024
ebcc17e
Fix wrong MySQL version conditions (AI hallucinations)
JanJakes Sep 26, 2024
1551b0e
Implement the checkCharset() placeholder function
JanJakes Sep 26, 2024
cdd84b4
Document manual grammar factoring
JanJakes Sep 26, 2024
f50b515
Fix "alterOrderList" that has a wrong definition in the original grammar
JanJakes Sep 26, 2024
e267f67
Fix "createUser" that was incorrectly converted from ANTLR to EBNF
JanJakes Sep 26, 2024
cd543af
Fix "castType" that was incomplete in the original grammar
JanJakes Sep 26, 2024
135f29f
Fix "SELECT ... WHERE ... INTO @var" using a negative lookahead
JanJakes Sep 26, 2024
27524dd
Fix "EXPLAIN FORMAT=..." by reordering grammar rules
JanJakes Sep 27, 2024
069342f
Fix special "WINDOW" and "OVER" cases by adjusting grammar rules
JanJakes Sep 27, 2024
9bfc977
Fix "GRANT" and "REVOKE" by adjusting grammar rules to solve conflicts
JanJakes Sep 27, 2024
ca4de77
Use ebnfutils to dump grammar conflicts
JanJakes Sep 27, 2024
cd3504d
Implement the determineFunction() placeholder function, unify SQL modes
JanJakes Sep 30, 2024
81bbde0
Fix processing NOW() synonyms in lexer
JanJakes Sep 30, 2024
71292fb
Match mysqltest commands case-insensitively
JanJakes Sep 30, 2024
1ab3723
Add a script to test lexer on all the testing queries
JanJakes Oct 1, 2024
42ffc1b
Replace lexer switch/case and function calls with lookup tables
JanJakes Oct 1, 2024
01241b8
Fix unicode handling when extracting test queries
JanJakes Oct 2, 2024
f5266f1
Fix identifier matching, improve lexer performance by ~25%
JanJakes Oct 2, 2024
a2ac60b
Unify charset matching with identifier matching, remove non-existent …
JanJakes Oct 2, 2024
90e2af6
Fix quoted text/identifier matching, improve lexer performance by ~10%
JanJakes Oct 2, 2024
dfcad3e
Remove dependency on ctype, improve lexer performance by ~4%
JanJakes Oct 3, 2024
0f23dd3
Determine token names lazily, reduce LOC to < 3000, improve lexer per…
JanJakes Oct 3, 2024
6bb8ff9
Finish manual token pass, use MySQL Workbench token IDs, add comments
JanJakes Oct 3, 2024
ec55c10
Fix wrong token type
JanJakes Oct 3, 2024
21f3f16
Inline some simple single-use methods, remove unused methods
JanJakes Oct 3, 2024
b4f0e08
Move token iteration loop outside nextToken() method
JanJakes Oct 3, 2024
f658751
Fix and simplify lookahead logic, improve lexer performance by ~6%
JanJakes Oct 3, 2024
4eee991
Fix date and time literals by reordering grammar rules
JanJakes Oct 3, 2024
64e2068
Implement handling of unquoted user variable suffixes
JanJakes Oct 4, 2024
f1c56cc
Fix number vs identifier matching, add some tests
JanJakes Oct 4, 2024
883c08a
Fix handling of identifiers preceded by a “.”
JanJakes Oct 4, 2024
f56f036
Implement lexer support for NCHAR literals
JanJakes Oct 4, 2024
972f7e0
Fix administration statements to support both "TABLE" and "TABLES" ke…
JanJakes Oct 4, 2024
df65873
Use MySQL 8.0.38 test suite, the same version as the grammar
JanJakes Oct 4, 2024
f25b4fa
Use the grammar MySQL version 8.0.38 as a default also in the lexer
JanJakes Oct 5, 2024
1b53df5
Add a missing charset from MySQL 5, improve comments
JanJakes Oct 5, 2024
905a7d1
Sort version-specific keyword list alphabetically
JanJakes Oct 5, 2024
7d1cdd7
Fix conflict between “<index> USING” and “<index> TYPE” by adjusting …
JanJakes Oct 5, 2024
4e5e499
Fix “LIKE …” and “LIKE (…)” by reordering grammar rules
JanJakes Oct 15, 2024
4eb9dd5
Fix "sumExpr" (AVG, SUM, COUNT, …) by reordering grammar rules
JanJakes Oct 15, 2024
4f968cd
Fix "REVOKE ALL ON ... FROM ..." by adding a missing rule segment
JanJakes Oct 15, 2024
6188d90
Add full MySQLParser.g4 grammar
JanJakes Oct 16, 2024
b371daf
Add manual fixes to MySQLParser.g4 grammar
JanJakes Oct 16, 2024
1765e4f
Add a script to parse grammar directly from MySQLParser.g4
JanJakes Oct 16, 2024
50ef591
Handle empty branches during grammar parsing
JanJakes Oct 16, 2024
28e44f5
Place and document grammar fixes below the original rules
JanJakes Oct 16, 2024
cd3c335
Skip perl, append_file, and write_file commands in query extraction
JanJakes Oct 16, 2024
09bd0c7
Fix “histogram” rule by adding missing “USING DATA” clause
JanJakes Oct 16, 2024
d06be19
Fix conflict in "explainStatement" by reordering grammar rules
JanJakes Oct 17, 2024
cebfdd7
Handle escaped quotes when parsing test queries
JanJakes Oct 17, 2024
94b7bd2
Fix "alterCommandList" to solve conflicts between "alterCommandsModif…
JanJakes Oct 17, 2024
ba43fc7
Add missing column visibility settings to the grammar
JanJakes Oct 17, 2024
03fbb98
Fix "fieldDefinition" to solve conflict between "columnAttribute" and…
JanJakes Oct 17, 2024
ed15ffa
Fix "replicationStatement" to correctly support the "RESET PERSIST" s…
JanJakes Oct 17, 2024
0b16f44
Implement missing "EXCEPT" and "INTERSECT" operators in the grammar
JanJakes Oct 17, 2024
8df7294
Skip if and while blocks when parsing tests, skip mysqltest.test
JanJakes Oct 18, 2024
0a35989
Fix processing “--” commands and “--delimiter”
JanJakes Oct 18, 2024
442a682
Fix "alterTableActions" to solve conflicts between "alterCommandsModi…
JanJakes Oct 18, 2024
207078a
Fix REVOKE statement grammar
JanJakes Oct 18, 2024
56c7b8d
Add missing branches to “accountLockPasswordExpireOptions”
JanJakes Oct 18, 2024
9bdd088
Fix ALTER USER statement grammar
JanJakes Oct 18, 2024
8f54a3f
Fix "userIdentifierOrText" to support omitting sequence after "@"
JanJakes Oct 18, 2024
724e1c3
Fix conflict within "queryExpressionParens"
JanJakes Oct 25, 2024
6e985b2
Fix CHARACTER VARYING support by reordering “dataType” subrules
JanJakes Oct 25, 2024
e2e7501
Fix ALL/ANY support by reordering subrules
JanJakes Oct 25, 2024
61d2bcc
Implement MySQL-specific and version comments: /*!… */
JanJakes Oct 28, 2024
3118f11
Fix SHOW GRANTS … USING by reordering grammar rules
JanJakes Oct 28, 2024
cb44ea7
Fix query expressions followed by “INTO …”
JanJakes Oct 28, 2024
0812288
Fix “SELECT … FOR UPDATE … INTO …” by adding a missing rule part
JanJakes Oct 28, 2024
d5e1abb
Fix “DO … AS …” by reordering rules
JanJakes Oct 28, 2024
e3555d2
Fix PARTITION being defined as a non-reserved keyword
JanJakes Oct 28, 2024
abdbe47
Fix handling --error in MySQL test files
JanJakes Oct 28, 2024
5faef36
Fix ignoring MySQL test commands without params
JanJakes Oct 28, 2024
53cb169
Add missing MySQL test commands
JanJakes Oct 28, 2024
ea6d067
Fix handling prefixed quotes in MySQL tests
JanJakes Oct 28, 2024
22ac89b
Add support for missing "INSTALL COMPONENT" statement "SET ..." suffix
JanJakes Oct 29, 2024
ce35d1d
Fix "indexHintList" to use only whitespace as a separator (not commas)
JanJakes Oct 29, 2024
a33723d
Fix and simplify line comment handling in lexer
JanJakes Oct 29, 2024
eb1e756
Add support for "CHANGE REPLICATION SOURCE ..." statement
JanJakes Oct 29, 2024
96accd0
Mark ATTRIBUTE keyword as non-reserved
JanJakes Oct 29, 2024
f0c97bd
Fix START TRANSACTION statement with multiple characteristics
JanJakes Oct 29, 2024
d861ba7
Fix handling column attributes by reordering rules
JanJakes Oct 29, 2024
c74dbca
Fix GRANT “role with @” by reordering grammar rules
JanJakes Oct 29, 2024
143c421
Fix GRANT CREATE/DROP ROLE by reordering rules
JanJakes Oct 29, 2024
a48810c
Add support for COMMENT and ATTRIBUTE to CREATE USER
JanJakes Oct 29, 2024
a80dbf1
Fix FLUSH TABLES with qualified identifiers
JanJakes Oct 29, 2024
f6ebd7a
Fix temporal literals to support double quotes
JanJakes Oct 29, 2024
09cdac7
Add support for CAST(... AT TIME ZONE ... AS DATETIME ...)
JanJakes Oct 29, 2024
05d02ec
Fix transaction characteristics separator
JanJakes Oct 29, 2024
8022fc5
Add support for IF NOT EXISTS for function, procedure, and trigger
JanJakes Oct 29, 2024
e98caad
Fix SET PASSWORD statement conflicts
JanJakes Oct 29, 2024
47b7ec4
Fix handling nchar, national character, etc.
JanJakes Oct 29, 2024
522867d
Ignore “query_attributes” MySQL client command
JanJakes Oct 29, 2024
0d80a16
Fix "leadLagInfo" to support identifiers and variables as well
JanJakes Oct 30, 2024
d12d01e
Fix WEIGHT_STRING(… AS BINARY(…)) by reordering grammar rules
JanJakes Oct 30, 2024
ad62b8a
Add support for "ALTER INSTANCE ..." statement
JanJakes Oct 30, 2024
560064a
Fix CAST(… AS FLOAT(N)) by fixing a wrong grammar rule
JanJakes Oct 30, 2024
d004dad
Add missing CREATE TABLE options
JanJakes Oct 30, 2024
52d9ed1
Add support for ON DELETE SET DEFAULT
JanJakes Oct 30, 2024
fb1e5cf
Add support for ENGINE_ATTRIBUTE to tablespaces
JanJakes Oct 30, 2024
e5f59fe
Add support for "JSON_VALUE(..., '...' RETURNING <type>)"
JanJakes Oct 30, 2024
106a6e5
Skip disabled MySQL tests
JanJakes Oct 30, 2024
e5d3a03
Detect & convert charset when extracting tests
JanJakes Oct 30, 2024
88e3ec2
Fix lexing numbers and identifiers
JanJakes Oct 30, 2024
c934f37
Fix CREATE/ALTER TABLE … UNION = ()
JanJakes Oct 30, 2024
dfee145
Improve RESET PERSIST fix
JanJakes Oct 30, 2024
1c7c18c
Fix ALTER DATABASE to support optional schema name
JanJakes Oct 30, 2024
ae0b31a
Fix support for “:=” assignment in “updateElement”
JanJakes Oct 31, 2024
839d807
Add “phpize-grammar” logic to grammar conversion script
JanJakes Oct 31, 2024
495e227
Delete all no longer necessary grammar tools
JanJakes Oct 31, 2024
d9827ac
Use WP coding styles for all new files
JanJakes Oct 31, 2024
a19d439
Ignore coding styles for compressed grammar file
JanJakes Oct 31, 2024
8d810fe
Use WP coding styles for filenames, use file per class, improve direc…
JanJakes Oct 31, 2024
d869785
Exclude new parser tests for now
JanJakes Oct 31, 2024
f3f0c0f
Remove no longer needed rule name processing
JanJakes Oct 31, 2024
d993041
Omit grammar tools and WIP work from plugin builds
JanJakes Oct 31, 2024
8c2d227
Omit all test files from plugin builds
JanJakes Oct 31, 2024
2cf39d4
Omit new lexer & parser from plugin builds for now
JanJakes Oct 31, 2024
09dda8a
Improve class naming and directory structure
JanJakes Nov 5, 2024
e42bf52
Move tokenize_query to WP_MySQL_Lexer::tokenize()
JanJakes Nov 5, 2024
c42ee15
Move tests under mysql directory
JanJakes Nov 5, 2024
7b84cfc
Move manual lexer tests to PHPUnit
JanJakes Nov 5, 2024
be7257f
Move test tools to tests/tools and ignore them in PHPUnit
JanJakes Nov 5, 2024
9c34178
Polish and document test downloading and extraction scripts
JanJakes Nov 5, 2024
c2cfd45
Skip queries testing parser stack overflow
JanJakes Nov 5, 2024
03cc66a
Run the full MySQL server test suite in unit tests
JanJakes Nov 6, 2024
5b2f488
Repurpose testing scripts for benchmarking, add single-parse testing …
JanJakes Nov 6, 2024
1e3e254
Use strcspn to handle all types of quoted text
JanJakes Nov 6, 2024
7d5869b
Use strspn to handle unquoted user-defined variables
JanJakes Nov 6, 2024
04bf6e5
Replace preg_match with strspn and UTF-8 decoder when handling identi…
JanJakes Nov 7, 2024
43357d0
Simplify lexer state, use and advance only byte position
JanJakes Nov 8, 2024
6e3b31e
Replace all other byte scanning loops with strspn
JanJakes Nov 8, 2024
3b789fe
Inline bin and hex numbers to number method
JanJakes Nov 8, 2024
ad4cf36
Simplify version comment processing
JanJakes Nov 8, 2024
3730d80
Simplify matching EOF
JanJakes Nov 8, 2024
833ca68
Remove channel, token, and token instance properties
JanJakes Nov 8, 2024
e0dfccd
Replace type property with local variables and function returns
JanJakes Nov 8, 2024
dff4649
Improve property and method naming and documentation
JanJakes Nov 8, 2024
17641f7
Check for the U+0080-U+FFFF range manually, add test coverage
JanJakes Nov 11, 2024
ee9d59d
Improve lexer docs and naming
JanJakes Nov 12, 2024
b32a9dd
Unify identifier matching with other "read_" methods
JanJakes Nov 12, 2024
432bf1f
Reorder methods to group "read_" methods together
JanJakes Nov 12, 2024
73c6c0a
Implement integer type detection
JanJakes Nov 12, 2024
4464cea
Move empty rule (ε) constant to WP_Parser_Grammar
JanJakes Nov 12, 2024
79b2622
Cleanup, add docs and TODOs
JanJakes Nov 12, 2024
43cec5d
Rename WP_Parser_Tree to WP_Parser_Node and document it
JanJakes Nov 12, 2024
634635e
Explain the "ε" rule in a comment
JanJakes Nov 12, 2024
96dd467
Rename $rule to $branches for better clarity
JanJakes Nov 12, 2024
dec0e3f
Use `false` rather than `null` when a parser subtree doesn't match
JanJakes Nov 12, 2024
afc70bb
Declare mbstring in dev dependencies
JanJakes Nov 12, 2024
33d55c4
Use more descriptive file name for MySQL test queries
JanJakes Nov 13, 2024
dd31359
Read quote from the SQL payload instead of passing it as parameter
JanJakes Nov 13, 2024
ae42217
Improve code comments
JanJakes Nov 13, 2024
5942fd6
Simplify next_token() loop condition
JanJakes Nov 13, 2024
c567a3f
Inline is-digit and is-whitespace logic
JanJakes Nov 13, 2024
60d995d
Fix 0b and 0x identifiers
JanJakes Nov 13, 2024
ccebf37
Fix unclosed b'...' and x'...' numbers
JanJakes Nov 13, 2024
7155b7b
Rename parser testing script to dump-ast.php
JanJakes Nov 13, 2024
9426969
Implement "next_token()" & "get_next_token" API
JanJakes Nov 14, 2024
10 changes: 10 additions & 0 deletions custom-parser/parser/DynamicRecursiveDescentParser.php
@@ -422,6 +422,16 @@ private function parse_recursive($rule_id) {
$node->append_child($subnode);
}
}

// Negative lookahead for INTO after a valid SELECT statement.
// If we match a SELECT statement, but there is an INTO keyword after it,
// we're in the wrong branch and need to leave matching to a later rule.
// For now, it's hard-coded, but we could extract it to a lookahead table.
$la = $this->tokens[$this->position] ?? null;
if ($la && $rule_name === 'selectStatement' && $la->type === MySQLLexer::INTO_SYMBOL) {
$branch_matches = false;
}

if ($branch_matches === true) {
break;
}
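
The comment in this hunk mentions that the hard-coded check could be extracted to a lookahead table. A minimal, self-contained sketch of that idea follows. It is not the PR's parser: the functions `match_sequence()` and `parse_with_lookahead()`, the string token types, the `selectIntoStatement` rule name, and the `$negative_lookahead` table are all hypothetical and exist only for illustration. It shows a branch being rejected when it matches but is followed by a forbidden token, so that a later branch gets a chance to match.

```php
<?php
/**
 * Match a fixed sequence of token types starting at $pos.
 * Returns the position after the match, or false when there is no match.
 * (Hypothetical helper; the real parser walks grammar rules recursively.)
 */
function match_sequence( array $tokens, $pos, array $sequence ) {
	foreach ( $sequence as $expected ) {
		if ( ( $tokens[ $pos ] ?? null ) !== $expected ) {
			return false;
		}
		$pos++;
	}
	return $pos;
}

/**
 * Try each branch in order and return [ branch name, end position ] for the
 * first branch that matches AND is not followed by a forbidden token.
 * $negative_lookahead maps a branch name to token types that must not follow it.
 */
function parse_with_lookahead( array $tokens, array $branches, array $negative_lookahead ) {
	foreach ( $branches as $name => $sequence ) {
		$end = match_sequence( $tokens, 0, $sequence );
		if ( false === $end ) {
			continue;
		}

		// Negative lookahead: peek at the next token without consuming it.
		$la        = $tokens[ $end ] ?? null;
		$forbidden = $negative_lookahead[ $name ] ?? array();
		if ( null !== $la && in_array( $la, $forbidden, true ) ) {
			// Wrong branch: leave matching to a later rule.
			continue;
		}

		return array( $name, $end );
	}
	return false;
}

// Hypothetical grammar: the shorter branch comes first, as in the conflict above.
$branches = array(
	'selectStatement'     => array( 'SELECT', 'IDENT' ),
	'selectIntoStatement' => array( 'SELECT', 'IDENT', 'INTO', 'VAR' ),
);

// Hypothetical lookahead table: a selectStatement must not be followed by INTO.
$negative_lookahead = array(
	'selectStatement' => array( 'INTO' ),
);

$plain = parse_with_lookahead( array( 'SELECT', 'IDENT' ), $branches, $negative_lookahead );
$into  = parse_with_lookahead( array( 'SELECT', 'IDENT', 'INTO', 'VAR' ), $branches, $negative_lookahead );

// Without the table, the first branch wins and leaves "INTO VAR" unconsumed.
$greedy = parse_with_lookahead( array( 'SELECT', 'IDENT', 'INTO', 'VAR' ), $branches, array() );
```

In this sketch the table entry plays the role of the `$rule_name === 'selectStatement'` condition in the hunk: adding another rule-specific lookahead would mean adding a table entry rather than another `if`.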