better error handling in bibmarkup.bibtex_to_html #63
Comments
I found at least one example where bibtex and pybtex have different behavior:
This entry has |
I've experimented with several bibtex parsers using a set of 670 bibtex files that authors have uploaded.
It seems that we need to use one library for parsing and validating, and another for HTML generation. Overall it seems that the error handling from bibtexparser 2.0 is far superior, and we can use it to split the bibtex file into the different entries. We can then use pybtex to format those as HTML, and catch errors on individual entries.
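A minimal sketch of the "catch errors on individual entries" part, assuming the file has already been split into raw entry strings. The names `format_entries` and `format_one` are hypothetical, and `format_one` stands in for whatever pybtex-based formatter is actually used; no bibtexparser or pybtex calls are assumed here.

```python
from typing import Callable, Iterable, List, Tuple


def format_entries(
    raw_entries: Iterable[str],
    format_one: Callable[[str], str],
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Format each raw entry separately so one bad entry cannot abort the rest.

    `format_one` is a hypothetical callable that turns a single raw BibTeX
    entry into an HTML fragment and raises an exception on failure.
    """
    html_parts: List[str] = []
    errors: List[Tuple[int, str]] = []
    for index, raw in enumerate(raw_entries):
        try:
            html_parts.append(format_one(raw))
        except Exception as exc:
            # Deliberately broad: the formatter's exception types are not assumed.
            errors.append((index, f"{type(exc).__name__}: {exc}"))
    return html_parts, errors
```

The returned error list keeps the index of each failing entry, so the author can be told which entry to fix while the remaining entries still render.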
This seems like the proper way to proceed and will improve the user experience significantly!
It seems that the pybtex library for converting bibtex to html is riddled with bugs. Dan Bernstein has noted quite a few:
We are not the first to have encountered this problem. The pybtex formatting language is very poorly documented and as far as I can tell, nobody has built anything with it. Just trying to figure out how to change |
I found at least one problem that was orthogonal to Luckily, this can be fixed by using our own bst file. We can invoke our own bst file with the option
(added the |
Apparently some bibtex entries cause errors in bibmarkup.bibtex_to_html (@jwbos made reference to a reference "B97" but I can't find it). The problem here is that BibTeX has no formally written grammar for bibtex files, and the only expression of this is in the bibtex binary itself (which is written in WEB, and translated into even more unreadable C). It turns out that the biber and bibtex binaries have different behaviors because of this, and the situation for python parsers is even worse. We're using pybtex, and the first time it encounters an error it just gives up. Notably, this may happen even for legitimate bibtex files. The error reporting in pybtex is also weird, because it was designed to go to stdout, so the errors have to be captured while it is running.
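The thread does not say which mechanism is used to capture that output. As one standard-library option, and only as an assumption about how it could be done, `contextlib.redirect_stdout` and `contextlib.redirect_stderr` can collect whatever a call prints. The helper name `run_and_capture` is hypothetical.

```python
import contextlib
import io


def run_and_capture(func, *args, **kwargs):
    """Call func and return (result, exception, stdout_text, stderr_text).

    Output written through sys.stdout / sys.stderr during the call is
    collected in memory instead of reaching the console.
    """
    out, err = io.StringIO(), io.StringIO()
    result, caught = None, None
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            # Keep the exception so the caller can report it with the captured text.
            caught = exc
    return result, caught, out.getvalue(), err.getvalue()
```

This only catches text written through Python's `sys.stdout` and `sys.stderr` objects; output written directly to the underlying file descriptors would not be captured.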
One solution would be to do a pre-parse of the bibtex file and try to parse the entries one-by-one, but even that is fraught with peril because it's hard to recognize when you have hit the end or start of an entry.
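To make the difficulty concrete, here is a deliberately naive pre-parse sketch that splits on `@` and tracks brace depth. The function name `split_entries_naive` is hypothetical, and this is not how bibtex itself reads files. It ignores entries delimited by parentheses, braces inside quoted strings, and comments.

```python
from typing import List


def split_entries_naive(text: str) -> List[str]:
    """Split BibTeX text into chunks by counting braces after each '@'."""
    entries: List[str] = []
    i, n = 0, len(text)
    while i < n:
        start = text.find("@", i)
        if start == -1:
            break
        open_brace = text.find("{", start)
        if open_brace == -1:
            break
        depth, end = 0, None
        for j in range(open_brace, n):
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    end = j + 1
                    break
        if end is None:
            # Braces never balanced: the rest of the file becomes one chunk.
            entries.append(text[start:])
            break
        entries.append(text[start:end])
        i = end
    return entries
```

With well-formed input this returns one chunk per entry, but a single missing closing brace makes the broken entry swallow everything after it, which is exactly the "hard to recognize the end of an entry" problem.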
We should try hard to improve bibmarkup_test.py and supply error cases. Perhaps we can try a project to process a large number of bibtex files looking for errors.