This repository has been archived by the owner on Jun 3, 2022. It is now read-only.

Lua parser fails to find ending of string enclosed by level 1 long brackets if the last character of the string is a square bracket #7

Open
zeertzjq opened this issue Nov 28, 2021 · 7 comments

@zeertzjq

Describe the bug

Lua parser fails to find ending of string enclosed by level 1 Lua long brackets if the last character of the string is a square bracket.

To Reproduce

  1. Open https://raw.githubusercontent.com/neovim/neovim/725cbe7d414f609e769081276f2a034e32a4337b/test/functional/terminal/tui_spec.lua using Nvim with nvim-treesitter Lua parser for highlight.
  2. Observe that lines 35 to 874 are all highlighted as a string.

Expected behavior

Lines 35 to 874 are highlighted correctly.

Output of :checkhealth nvim-treesitter

nvim-treesitter: require("nvim-treesitter.health").check()
========================================================================
## Installation
  - OK: `tree-sitter` found  0.20.0 (parser generator, only needed for :TSInstallFromGrammar)
  - OK: `node` found v16.11.1 (only needed for :TSInstallFromGrammar)
  - OK: `git` executable found.
  - OK: `cc` executable found. Selected from { vim.NIL, "cc", "gcc", "clang", "cl", "zig" }
    Version: cc (GCC) 11.1.0
  - OK: Neovim was compiled with tree-sitter runtime ABI version 13 (required >=13). Parsers must be compatible with runtime ABI.

## Parser/Features H L F I J
  - python         ✓ ✓ ✓ ✓ ✓ 
  - lua            ✓ ✓ ✓ ✓ ✓ 
  - cpp            ✓ ✓ ✓ ✓ ✓ 
  - c              ✓ ✓ ✓ ✓ ✓ 

  Legend: H[ighlight], L[ocals], F[olds], I[ndents], In[j]ections
         +) multiple parsers found, only one will be used
         x) errors found in the query, try to run :TSUpdate {lang}

Output of nvim --version

NVIM v0.6.0-dev+604-g725cbe7d41
Build type: RelWithDebInfo
LuaJIT 2.0.5
Compilation: /usr/bin/cc -D_FORTIFY_SOURCE=2 -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1 -DNVIM_TS_HAS_SET_MATCH_LIMIT -O2 -g -Og -g -Wall -Wextra -pedantic -Wno-unused-parameter -Wstrict-prototypes -std=gnu99 -Wshadow -Wconversion -Wmissing-prototypes -Wimplicit-fallthrough -Wvla -fstack-protector-strong -fno-common -fdiagnostics-color=auto -DINCLUDE_GENERATED_DECLARATIONS -D_GNU_SOURCE -DNVIM_MSGPACK_HAS_FLOAT32 -DNVIM_UNIBI_HAS_VAR_FROM -DMIN_LOG_LEVEL=3 -I/build/neovim-git/src/build/config -I/build/neovim-git/src/neovim-git/src -I/usr/include -I/build/neovim-git/src/build/src/nvim/auto -I/build/neovim-git/src/build/include
Compiled by builduser

Features: +acl +iconv +tui
See ":help feature-compile"

   system vimrc file: "$VIM/sysinit.vim"
  fall-back for $VIM: "/usr/share/nvim"

Run :checkhealth for more info

Additional context

[Screenshot: Screenshot_20211120_145802]

@theHamsta
Member

theHamsta commented Nov 28, 2021

Could you provide a minimal source code example as plain text?

@zeertzjq
Author

zeertzjq commented Nov 28, 2021

return function()
  local a = [=[]]=]
  local b = [=[]=]
  return a, b
end

[Screenshot: Screenshot_20211128_224027]

@theHamsta
Member

theHamsta commented Nov 28, 2021

The reason for this is probably that the scanner handles [[ ]] and [=[ ]=] in the same code path, so =] can close [[:

          if (lexer->lookahead == '[' || lexer->lookahead == '=') {
// further down
            if (lexer->lookahead == ']' || lexer->lookahead == '=') {
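
For illustration, a level-aware matcher could look roughly like the sketch below. It is not the scanner from this repository or from #6: the function names long_bracket_open_level and long_string_length are hypothetical, and it works on a plain byte buffer rather than the tree-sitter lexer API. The idea is to count the = signs in the opening bracket and accept only a closing bracket with exactly the same count.

```c
#include <stddef.h>

/* Hypothetical helper: returns the level of the long bracket that opens at s
 * (0 for "[[", 1 for "[=[", ...), or -1 if s does not start with one. */
static int long_bracket_open_level(const char *s, size_t len) {
    if (len < 2 || s[0] != '[') return -1;
    size_t i = 1;
    while (i < len && s[i] == '=') i++;      /* count the '=' signs */
    if (i < len && s[i] == '[') return (int)(i - 1);
    return -1;
}

/* Hypothetical helper: returns the length in bytes of the long string that
 * starts at s, including both delimiters, or 0 if s does not start with a
 * long bracket or the string is unterminated. */
size_t long_string_length(const char *s, size_t len) {
    int level = long_bracket_open_level(s, len);
    if (level < 0) return 0;

    size_t i = (size_t)level + 2;            /* first byte after the opener */
    while (i < len) {
        if (s[i] != ']') { i++; continue; }

        /* Candidate closer: ']' followed by exactly `level` '=' and ']'. */
        size_t j = i + 1;
        int eq = 0;
        while (j < len && s[j] == '=') { eq++; j++; }
        if (eq == level && j < len && s[j] == ']') return j + 1;

        /* Wrong level: resume at j, because a ']' at j may itself
         * start the real closer (as in "[=[]]=]"). */
        i = j;
    }
    return 0;
}
```

With this approach [=[]]=] is matched as a whole: the first ] is rejected as a closer because it is followed by ] rather than =, and scanning resumes at the second ], which starts the real ]=]. Likewise =] can no longer close [[, since a level 0 string only ends at ]].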

@theHamsta
Member

It's probably good to start with #6 which already cleaned up the parser code.

@stsewd
Member

stsewd commented Dec 12, 2021

We could also switch to https://github.com/MunifTanjim/tree-sitter-lua. It has the string start/end as separate nodes, which makes it easier for injections :)

Looks like the project took some inspiration from the original repo that was forked.

@MunifTanjim what's the status of your parser? I think it's more complete than this one :D

And it doesn't have this issue

return function()
  local a = [=[]]=]
  local b = [=[]=]
  return a, b
end

is parsed as

(chunk [0, 0] - [5, 0]
  (return_statement [0, 0] - [4, 3]
    (function_definition [0, 7] - [4, 3]
      parameters: (parameters [0, 15] - [0, 17])
      body: (block [1, 2] - [3, 13]
        local_declaration: (variable_declaration [1, 2] - [1, 19]
          (assignment_statement [1, 8] - [1, 19]
            (variable_list [1, 8] - [1, 9]
              name: (identifier [1, 8] - [1, 9]))
            (expression_list [1, 12] - [1, 19]
              value: (string [1, 12] - [1, 19]))))
        local_declaration: (variable_declaration [2, 2] - [2, 18]
          (assignment_statement [2, 8] - [2, 18]
            (variable_list [2, 8] - [2, 9]
              name: (identifier [2, 8] - [2, 9]))
            (expression_list [2, 12] - [2, 18]
              value: (string [2, 12] - [2, 18]))))
        (return_statement [3, 2] - [3, 13]
          (identifier [3, 9] - [3, 10])
          (identifier [3, 12] - [3, 13]))))))

@MunifTanjim

MunifTanjim commented Dec 12, 2021

@MunifTanjim what's the status of your parser?

It's complete as far as I know. It has corpus tests for all the possible cases and can parse the whole luvit/luvit repo without any errors.

I've been using it since August 2021 via MunifTanjim/nvim-treesitter-lua#2. I haven't faced any issues in my daily usage yet.

In case you wanna try it, the queries are available in this repo: https://github.com/MunifTanjim/nvim-treesitter-lua

The only reason I haven't shared it anywhere yet is that it's not easily usable without building neovim yourself from the branch in neovim/neovim#15260 (that branch is necessary to ignore the queries for the lua parser that come with nvim-treesitter).

@stsewd
Member

stsewd commented Dec 18, 2021

That sounds great! Any thoughts about switching to that parser? @theHamsta @vigoux
