report the faulty character / line / column / offset when no parse trees are generated #134

Open
valyrie97 opened this issue May 26, 2023 · 3 comments

@valyrie97

valyrie97 commented May 26, 2023

Is your feature request related to a problem? Please describe.
I'm trying to catch syntax errors in a custom plaintext file format, and when the parse fails, I simply receive no parse trees, with no information about where it failed.

Describe the solution you'd like
I'd like to see a line/column or character index, or something along those lines, that I can use to show the user where parsing failed.
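Whatever position the parser ends up reporting, turning a raw offset into the line/column form requested here is straightforward. A minimal sketch, assuming the offset is a byte offset that falls on a char boundary; `line_col` and `caret` are hypothetical helpers, not part of bnf:

```rust
/// Convert a byte offset into `input` to a 1-based (line, column) pair.
/// Columns count chars, not bytes. Assumes `offset` lies on a char boundary;
/// offsets past the end are treated as the end of the input.
fn line_col(input: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut col = 1;
    for (i, ch) in input.char_indices() {
        if i >= offset {
            break;
        }
        if ch == '\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    (line, col)
}

/// Render a simple diagnostic: position, the offending line, and a caret under the column.
fn caret(input: &str, offset: usize) -> String {
    let (line, col) = line_col(input, offset);
    let text = input.lines().nth(line - 1).unwrap_or("");
    format!("{}:{}: parse error\n{}\n{}^", line, col, text, " ".repeat(col - 1))
}
```

For the input `"ab\ncd"` and offset 4 (the `d`), this reports line 2, column 2.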

@CrockAgile
Collaborator

I would also like this! I'll start researching and brainstorming how to get it added.

@CrockAgile
Collaborator

CrockAgile commented Jun 29, 2023

I've been giving this some thought, and there are two questions to answer before solving this:

How to report a faulty-input error?

The current API has no way to report errors (which it probably should! 😅)

```rust
// this returns an iterator, but no Result
let parsed = grammar.parse_input(sentence);
```

Changing this would be a breaking change, but returning a Result would let the parser report an Err carrying the offset of the faulty input text.
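One way to picture the Result-returning shape is a thin wrapper that turns "an iterator that may be empty" into `Ok` (with the trees still lazy) or `Err` (with an offset). This is a sketch only: `ParseError` and `require_parse` are hypothetical names, and the `offset` argument stands in for whatever position the parser would record internally, since the current iterator does not expose one.

```rust
use std::fmt;
use std::iter::Peekable;

/// Hypothetical error type (not part of bnf today): where parsing could not continue.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseError {
    /// Offset into the input text at which no parse attempt could continue.
    pub offset: usize,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "no parse tree: input rejected at offset {}", self.offset)
    }
}

impl std::error::Error for ParseError {}

/// Ok with the (still lazy) trees if at least one exists, Err with the offset otherwise.
pub fn require_parse<I: Iterator>(trees: I, offset: usize) -> Result<Peekable<I>, ParseError> {
    let mut trees = trees.peekable();
    if trees.peek().is_some() {
        Ok(trees)
    } else {
        Err(ParseError { offset })
    }
}
```

Implementing `std::error::Error` lets callers propagate the failure with `?` alongside their own errors instead of checking for an empty iterator by hand.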

Where in the faulty input to report?

The current parser uses Earley parsing. For most grammars there are many different parsing attempts, each of which gets a different amount of "success" parsing the input text.

A "good first try" solution might be to report "which parsing attempt got the farthest in the input text?". There will be times when this heuristic is naive, or when another partial parse would be more useful, but it is a simple strategy to implement.
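The "farthest progress" heuristic falls out of Earley's chart almost for free: the parser keeps one item set per input position, and the last non-empty set marks the furthest point any attempt reached. Below is a minimal toy recognizer illustrating that idea. It is not bnf's parser; the `Sym`, `Rule`, and `Item` types are invented here, terminals are single chars, and grammars are assumed to have no empty productions.

```rust
/// A grammar symbol: a single-character terminal or a nonterminal id.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Sym {
    T(char),
    N(usize),
}

/// A production `lhs ::= rhs`. This toy assumes no empty (epsilon) productions.
struct Rule {
    lhs: usize,
    rhs: Vec<Sym>,
}

/// An Earley item: a rule, how much of it is matched (dot), and where it began.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Item {
    rule: usize,
    dot: usize,
    origin: usize,
}

fn add(set: &mut Vec<Item>, item: Item) {
    if !set.contains(&item) {
        set.push(item);
    }
}

/// Ok(()) if `input` derives from `start`; otherwise Err(offset), the furthest char
/// position any attempt reached. If offset equals the input length, the input ended
/// too early; otherwise the char at offset was unexpected.
fn recognize(rules: &[Rule], start: usize, input: &str) -> Result<(), usize> {
    let chars: Vec<char> = input.chars().collect();
    let n = chars.len();
    let mut chart: Vec<Vec<Item>> = vec![Vec::new(); n + 1];
    for (r, rule) in rules.iter().enumerate() {
        if rule.lhs == start {
            add(&mut chart[0], Item { rule: r, dot: 0, origin: 0 });
        }
    }
    for i in 0..=n {
        // chart[i] grows while we walk it, so index instead of iterating.
        let mut j = 0;
        while j < chart[i].len() {
            let item = chart[i][j];
            j += 1;
            match rules[item.rule].rhs.get(item.dot) {
                // Predict: start every rule for the nonterminal after the dot.
                Some(Sym::N(a)) => {
                    for (r, rule) in rules.iter().enumerate() {
                        if rule.lhs == *a {
                            add(&mut chart[i], Item { rule: r, dot: 0, origin: i });
                        }
                    }
                }
                // Scan: if the next char matches, carry the item into chart[i + 1].
                Some(Sym::T(c)) => {
                    if i < n && chars[i] == *c {
                        add(&mut chart[i + 1], Item { dot: item.dot + 1, ..item });
                    }
                }
                // Complete: advance items in the origin set that were waiting on this lhs.
                None => {
                    let lhs = rules[item.rule].lhs;
                    let waiting = chart[item.origin].clone();
                    for w in waiting {
                        if rules[w.rule].rhs.get(w.dot) == Some(&Sym::N(lhs)) {
                            add(&mut chart[i], Item { dot: w.dot + 1, ..w });
                        }
                    }
                }
            }
        }
    }
    let accepted = chart[n].iter().any(|it| {
        let rule = &rules[it.rule];
        rule.lhs == start && it.dot == rule.rhs.len() && it.origin == 0
    });
    if accepted {
        Ok(())
    } else {
        // The last non-empty set is the furthest position any attempt reached.
        Err(chart.iter().rposition(|set| !set.is_empty()).unwrap_or(0))
    }
}
```

With `expr ::= digit | digit "+" expr` and `digit ::= "1" | "2"`, the input `1+x` is rejected at offset 2 (the `x`), and `1+` is rejected at offset 2, which equals its length, meaning the input ended too early.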

Solutions

So with those questions in mind, what options does BNF have?

Optional "longest match" Parsing

A non-breaking change would be to allow a ParseTree to be incomplete when the caller opts into that behavior.

```rust
// proposed opt-in API (not implemented yet)
let mut parse_trees = grammar.parse_input(sentence);
parse_trees.report_incomplete(Grammar::parse::Longest);

// now, while parsing, the parser will also record the longest incomplete match
for tree in parse_trees {
  // and tree.is_complete() tells whether the parse tree covers the complete input, or only part of it
}
```
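The selection rule behind this opt-in mode can be sketched independently of the parser. Everything below is hypothetical: `Incomplete` stands in for the proposed `Longest` setting, and `Attempt` is a stand-in for a parse tree that only records how much input it covers.

```rust
/// Hypothetical stand-in for the proposal's opt-in setting (not a real bnf type).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Incomplete {
    /// Today's behaviour: only trees that span the whole input are yielded.
    Ignore,
    /// If nothing spans the whole input, yield the attempt(s) that got furthest.
    Longest,
}

/// Stand-in for a parse tree that only records how much input it covers.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Attempt {
    consumed: usize,
    input_len: usize,
}

impl Attempt {
    /// Mirrors the proposed `is_complete`: true when the whole input was parsed.
    fn is_complete(&self) -> bool {
        self.consumed == self.input_len
    }
}

/// Choose which attempts the caller would see under `mode`.
fn visible(attempts: Vec<Attempt>, mode: Incomplete) -> Vec<Attempt> {
    let complete: Vec<Attempt> = attempts.iter().filter(|a| a.is_complete()).cloned().collect();
    if !complete.is_empty() || mode == Incomplete::Ignore {
        return complete;
    }
    // No full parse: fall back to the longest partial match(es).
    let best = attempts.iter().map(|a| a.consumed).max();
    match best {
        Some(best) => attempts.into_iter().filter(|a| a.consumed == best).collect(),
        None => Vec::new(),
    }
}
```

Because complete trees always win, opting in changes nothing for valid input; partial trees only appear when the parse would otherwise yield nothing.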

Include In Error

This is a breaking change, but a new API could be added that reports errors more verbosely and includes the longest partial match.

```rust
// proposed API (not implemented yet)
let mut parser = grammar.parser();
// yields items of type Result<ParseTree, ParseError>
let parse_trees = parser.parse_input(sentence);

for parse_tree in parse_trees {
  match parse_tree {
    Ok(tree) => { /* full success parse tree */ }
    Err(parse_error) => { /* parse_error will have the input offset, and which BNF production was being used */ }
  }
}
```
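Callers of an iterator shaped like this would likely want to collapse it into "all full parses, or else the most informative error". A sketch of that, assuming hypothetical `ParseTree` and `ParseError` types (the `offset` and `production` fields follow the comment in the proposal; neither type is bnf's real one), keeping the error that reached furthest:

```rust
/// Hypothetical error from the proposal: where parsing stopped and which production was active.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ParseError {
    offset: usize,
    production: String,
}

/// Stand-in parse tree type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ParseTree(String);

/// Keep every full parse; if there are none, report the error that got furthest
/// (None if the iterator yielded nothing at all).
fn outcome<I>(results: I) -> Result<Vec<ParseTree>, Option<ParseError>>
where
    I: IntoIterator<Item = Result<ParseTree, ParseError>>,
{
    let mut trees = Vec::new();
    let mut furthest: Option<ParseError> = None;
    for r in results {
        match r {
            Ok(tree) => trees.push(tree),
            Err(e) => {
                // Replace the stored error only if this one got strictly further.
                if furthest.as_ref().map_or(true, |f| e.offset > f.offset) {
                    furthest = Some(e);
                }
            }
        }
    }
    if trees.is_empty() {
        Err(furthest)
    } else {
        Ok(trees)
    }
}
```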

Thank You

Thanks for bringing this up! I definitely see the use case, and want it for my own reasons. Let me know if any of the options look appealing, or if there is another option you'd like considered.

@shnewto
Owner

shnewto commented Jun 29, 2023

Great proposals! Personally, I prefer the idea of having a Result with an error, so I have to deal with it if there's a problem parsing the whole input, but I'd be glad for any more input on use cases and needs from the lib.

The question I still have is: should this error be something that a user writing the input knows what to do with, or something that a developer writing an app that uses bnf knows how to handle and report to their users? I can see some convenience in the former, but maybe more flexibility for developers using bnf in the latter 🤔.

Maybe in some cases the developer is the same as the user, but I'm thinking of someone working on a compiler or text input handler... or even this crate 😄. For example, I appreciate the errors nom gives me, but they're definitely not something I want to pass on to people using bnf when possible (though some may be coming through).
