Figures and tables in the back / annex section ignored #737

Open
de-code opened this issue Apr 14, 2021 · 3 comments · May be fixed by #738

Comments

@de-code
Collaborator

de-code commented Apr 14, 2021

This is related to #698

Some documents have both main figures and supplementary figures.
If the segmentation model labels the supplementary figures as annex in such a document,
that content is passed to the fulltext model separately from the body.
Even if the fulltext model then correctly labels it as figure, the figures from the annex are not included in the output.

@de-code
Collaborator Author

de-code commented Apr 14, 2021

This seems to be due to FullTextParser processing figures and tables from the body only.
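
To illustrate the suspected behaviour, here is a minimal Python sketch. It is not GROBID code: `LabeledBlock`, `collect_figures_and_tables`, and the sample data are hypothetical names made up for this example. It assumes each section (body, annex) has already been labelled by a fulltext-style model, and only shows how collecting from the body alone drops correctly labelled annex figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabeledBlock:
    """Hypothetical stand-in for a block of text labelled by a fulltext-style model."""
    label: str  # e.g. "figure", "table", "paragraph"
    text: str


def collect_figures_and_tables(blocks: List[LabeledBlock]) -> List[LabeledBlock]:
    """Return only the blocks labelled as figure or table, in document order."""
    return [block for block in blocks if block.label in ("figure", "table")]


# Hypothetical labelled output for the two sections of one document.
body = [
    LabeledBlock("paragraph", "Introduction ..."),
    LabeledBlock("figure", "Figure 1"),
]
annex = [
    LabeledBlock("figure", "Extended Data Figure 1"),
    LabeledBlock("table", "Extended Data Table 1"),
]

# Collecting from the body only: the annex figure and table are lost.
body_only = collect_figures_and_tables(body)

# Collecting from body and annex: all correctly labelled items are kept.
body_and_annex = collect_figures_and_tables(body + annex)

print([block.text for block in body_only])
print([block.text for block in body_and_annex])
```

If this is indeed the cause, a fix would presumably need to run the figure and table handling on the annex output as well as on the body.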

@lfoppiano
Collaborator

@de-code do you have a PDF for testing?

@de-code
Collaborator Author

de-code commented Apr 16, 2021

One example is DOI 10.1101/306803 or 306803v1 (from the bioRxiv 10k validation dataset).
It has "Extended Data Figure 1" etc.
I haven't tested whether they are extracted well with the default models.
