Add layout option to keep layout during text extraction #132

Merged: 1 commit, Mar 8, 2017


scarfacedeb
Contributor

This makes it possible to preserve the PDF's layout in the extracted text.

I noticed that pdftotext produces better results with the -layout option for some PDF files (e.g. tables of contents look a lot better and closer to the original layout).

At first, I implemented passing arbitrary options to the pdftotext command, but later realised that I only need it for the -layout option.
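For illustration, here is a minimal sketch of the idea, not the code from this commit. The helper name `pdftotext_command` and its `layout:` keyword are hypothetical; the only thing taken from pdftotext itself is its `-layout` flag.

```ruby
# Hypothetical helper (not the gem's actual API): builds the argument
# list for a pdftotext call, adding -layout only when requested.
def pdftotext_command(pdf_path, text_path, layout: false)
  cmd = ["pdftotext"]
  cmd << "-layout" if layout # ask pdftotext to keep the physical layout
  cmd + [pdf_path, text_path]
end

# Running it needs pdftotext on the PATH, for example:
#   system(*pdftotext_command("report.pdf", "report.txt", layout: true))
```

Passing the command to `system` as an array avoids shell quoting problems with file names that contain spaces.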

It passes the -layout option to pdftotext.
@scarfacedeb
Contributor Author

@knowtheory any news on this one?

@alexandremello

I need this feature too.

Is there anybody here?

@scarfacedeb
Contributor Author

Unfortunately, it seems like this project is dead.

@knowtheory knowtheory merged commit 4c5ba50 into documentcloud:master Mar 8, 2017
@knowtheory
Member

Well, that's one way to get my attention (which I probably shouldn't encourage).

Thank you very much for the commit and the extraction here :)

@scarfacedeb
Contributor Author

@knowtheory That was a pleasant surprise 😄 Thank you!

@knowtheory
Member

You're welcome! Happy to talk about other issues over the short term, and things to tackle too :)

@scarfacedeb
Contributor Author

Do you have something in mind? 🤔

Btw, could you release a new version to RubyGems? It'd be great to get rid of the ugly github: dependency in the Gemfile.
