FoLiA-page: add support for linebreaks #65
Comments
… like this but it seems needed (underlying libfolia issue?) #65
…ur, and an extra --nostrings parameter to omit the strings #65
Assuming this is solved
Would it be possible to treat end-of-line hyphens in the same way as FoLiA-txt does? #67
I assume this is doable. But Page documents have a rather exotic structure, so this needs some studying. Could you provide me with a SHORT Page document with a few hyphens? Maybe soft hyphens too?
Thank you for looking into it. It is by no means a priority, I was just wondering. Happy that the other tools can do it.
I updated git master now with the newpage branch. @pirolen I tested quite a bit, but feedback is still welcome.
Seems OK, for now.
PageXML text lines are currently not reflected in the original paragraph text. We can insert linebreaks to make the lines more explicit, and we can use <t-str> to mark the text lines more explicitly. These are linked to the <str> annotations that are already produced (and where in turn an explicit relation with the original PageXML TextLine is stored). The use case for this is that FLAT requires this explicit information to display the document properly, and we may have an annotation task (knaw-huc/golden-agents-htr#1).
I implemented this in the page-br branch, but it currently fails because of a text validation issue (proycon/folia#101).
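To make the idea above concrete, here is a minimal Python sketch, not the FoLiA-page implementation. It reads the line-level text of each PageXML TextLine in a region, joins the lines with explicit linebreaks, and records per-line character offsets together with the TextLine id, which is the kind of information a <t-str>/<str> link would need. The sample document and the helper names (`local`, `region_lines`, `paragraph_with_breaks`) are assumptions made up for illustration, and no FoLiA output is produced.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal PageXML-like fragment (namespace omitted for brevity).
PAGE_XML = """<PcGts>
  <Page>
    <TextRegion id="r1">
      <TextLine id="r1l1"><TextEquiv><Unicode>Dit is de eerste</Unicode></TextEquiv></TextLine>
      <TextLine id="r1l2"><TextEquiv><Unicode>regel van de tekst.</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def region_lines(region):
    """Return (line_id, text) pairs for the direct TextLine children of a region."""
    lines = []
    for line in region:
        if local(line.tag) != "TextLine":
            continue
        text = ""
        # Only the TextEquiv directly under TextLine holds the line-level text.
        for equiv in line:
            if local(equiv.tag) != "TextEquiv":
                continue
            for uni in equiv:
                if local(uni.tag) == "Unicode":
                    text = uni.text or ""
        lines.append((line.get("id"), text))
    return lines


def paragraph_with_breaks(lines):
    """Join line texts with '\\n' and return (text, [(line_id, start, end), ...])."""
    text = "\n".join(t for _, t in lines)
    spans = []
    offset = 0
    for line_id, t in lines:
        spans.append((line_id, offset, offset + len(t)))
        offset += len(t) + 1  # +1 accounts for the inserted linebreak
    return text, spans


root = ET.fromstring(PAGE_XML)
region = next(el for el in root.iter() if local(el.tag) == "TextRegion")
text, spans = paragraph_with_breaks(region_lines(region))
for line_id, start, end in spans:
    print(line_id, repr(text[start:end]))
```

Keeping the offsets next to the TextLine ids means each line of the paragraph text can be traced back to its source element, which is what a viewer such as FLAT would need in order to render the lines explicitly.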