
Performance issue #36

Open
manuel-rubio opened this issue Jun 25, 2015 · 3 comments
@manuel-rubio
I was trying to use this PEG to parse INI files:

ini <- (comment / section)+ `
    lists:append(Node)
`;

section <- space? header space? (comment / config)* space? `
    [_,_Header,_,ConfigLines|_] = Node,
    [ CL || CL <- ConfigLines, CL =/= ignore ]
`;

header <- '[' (!']' .)+ ']' breakline? ~;

config <- key space? '=' space? (!(breakline / comment) .)* (breakline / comment)? `
    [Key,_,_,_,Value|_] = Node,
    {Key, iolist_to_binary(Value)}
`;

comment <- space? ';' (!breakline .)* breakline? `
    ignore
`;

key <- [a-zA-Z0-9_\.]* `
    iolist_to_binary(Node)
`;

space <- [ \t\n\s\r]+ ~;

breakline <- [\n\r]+ ~;

When I try to use it against a php.ini file with more than 1000 lines (most of them comment lines), this code takes more than 5 seconds to parse the whole file. Other solutions I found (eini and zucchini on github.com; those projects use yrl and xrl files) take less than 1 second to parse the same file. What part of my code is wrong? Thanks.

The php.ini file is here: https://raw.githubusercontent.com/php/php-src/master/php.ini-production
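One thing worth trying with a grammar like the one above: patterns of the form `(!breakline .)*` force a negative lookahead on every single character, which is expensive when most of the file is comments. If this PEG dialect accepts negated character classes (worth verifying against neotoma's grammar syntax before relying on it), the comment rule could be rewritten as a sketch like:

```
comment <- space? ';' [^\n\r]* breakline? `
    ignore
`;
```

A negated class matches a run of characters in one step instead of one lookahead-plus-consume cycle per character, which can reduce backtracking considerably on comment-heavy input.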

@bookshelfdave
Contributor

What happens if you comment out your semantic actions (iolist_to_binary/1, etc.)?

@seancribbs
Owner

@manuel-rubio There are known performance issues with large files and large grammars. Can you profile the parser to see where it is taking the most time? My gut suspicion is that the negative-lookahead+repeat is what's killing it (lots of backtracking).
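A hedged sketch of how the profiling suggested above might be done with OTP's stock fprof tool. The module name `ini` and its generated `file/1` entry point are assumptions here (they depend on how the grammar file was named when compiled with neotoma):

```erlang
%% Profiling sketch: assumes the grammar was compiled with
%% neotoma:file("ini.peg"), producing a module named ini whose
%% generated file/1 entry point parses a file from disk.
%% fprof ships with OTP's tools application.
profile() ->
    fprof:apply(fun ini:file/1, ["php.ini-production"]),
    fprof:profile(),                          % turn the raw trace into data
    fprof:analyse([{dest, "fprof.analysis"},  % write per-function timings
                   {sort, own}]).
```

The resulting `fprof.analysis` file shows own-time per function, which should make it clear whether the time goes into lookahead rules, the memo table, or the semantic actions.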

@ElectronicRU

@seancribbs The performance issue can be alleviated by getting rid of the ETS table and explicitly threading a dict/map through the parser instead.

It would be quite easy to do, except that neotoma itself uses the memo table for some auxiliary information. For grammars that don't use the memo table explicitly, though, it should be a straightforward transformation.
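The idea of threading an explicit map instead of a shared ETS memo table can be illustrated with a minimal packrat-style memoizer. This is an illustrative sketch, not neotoma's actual internals; all names here are invented for the example:

```erlang
-module(memo_sketch).
-export([apply_rule/4]).

%% Packrat memoization keyed on {RuleName, Position}. The memo map is
%% threaded through every rule application and returned alongside the
%% result, instead of living in a shared ETS table, so each cache hit
%% is a plain map lookup with no ETS read/write overhead.
apply_rule(RuleName, RuleFun, Pos, Memo) ->
    Key = {RuleName, Pos},
    case Memo of
        #{Key := Result} ->
            {Result, Memo};                     % cache hit: no re-parse
        _ ->
            {Result, Memo1} = RuleFun(Pos, Memo),
            {Result, Memo1#{Key => Result}}     % cache and thread forward
    end.
```

Every rule function would take and return the memo map, so the transformation is mechanical for generated parsers; the wrinkle mentioned above is that neotoma also stores auxiliary state in the same table.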
