How to get Docling to output page number or chunk by page? #444

jmagoga · 2024-11-26T17:24:34Z


I'm extracting information from bank account statement PDFs that are sometimes over 10 pages long. Passing all of the markdown content to an LLM at once is not ideal because of the context length, so my solution is chunking.

However, since I'm working with bank account statements, I cannot have transaction line items cut in half (imagine if the cut separates the transaction value from what it was for).
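Here is a minimal sketch of that constraint. It assumes each transaction lands on its own line of the markdown (for example, one row of a markdown table) and uses a character budget as a rough stand-in for a token limit. `pack_line_items` and `max_chars` are hypothetical names, not part of Docling. Lines are packed greedily into chunks and a line is never split, so a transaction can only be cut if it was already spread over several lines in the extracted markdown.

```python
from typing import List


def pack_line_items(lines: List[str], max_chars: int) -> List[str]:
    """Greedily pack whole lines into chunks of at most max_chars characters.

    A line is never split. A single line longer than max_chars becomes its own
    (oversized) chunk rather than being cut.
    """
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for line in lines:
        # Joining with "\n" adds one character per line after the first.
        added = len(line) + (1 if current else 0)
        if current and size + added > max_chars:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append("\n".join(current))
            current, size = [], 0
            added = len(line)
        current.append(line)
        size += added
    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    # Hypothetical statement lines, as they might appear in markdown table rows.
    rows = [
        "| 01/02 | Coffee | 3.50 |",
        "| 01/03 | Rent | 1200.00 |",
        "| 01/04 | Salary | 2500.00 |",
    ]
    for chunk in pack_line_items(rows, max_chars=60):
        print(chunk)
        print("---")
```

If exact budgets matter, `len` can be swapped for the tokenizer of the LLM you are actually calling.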

The model output (at least in markdown) doesn't seem to include page numbers ("page 1", "page 2"). Is there a way to force the model to include them? Or is there another technique, different from what I'm describing, that could still solve my problem?
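One possible approach is to export each page separately and add the page marker yourself. The sketch below assumes you already have a dict mapping page number to that page's markdown. The commented lines show one way this might be produced with Docling v2, assuming your docling-core version supports the `page_no` argument of `DoclingDocument.export_to_markdown` (check this before relying on it); `statement.pdf` is a placeholder filename. `page_chunks` and `add_page_markers` are hypothetical helpers. Each chunk then corresponds to exactly one page, although a transaction that continues across a page break will still be split.

```python
from typing import Dict, List


def page_chunks(pages: Dict[int, str]) -> List[str]:
    """Return one chunk per page, in page order, each prefixed with its page number."""
    return [f"<!-- page {n} -->\n{pages[n]}" for n in sorted(pages)]


def add_page_markers(pages: Dict[int, str]) -> str:
    """Join all pages into a single markdown string that keeps the page markers."""
    return "\n\n".join(page_chunks(pages))


# Assumed Docling v2 usage (not executed here; verify that your docling-core
# version accepts page_no in export_to_markdown before relying on it):
# from docling.document_converter import DocumentConverter
# doc = DocumentConverter().convert("statement.pdf").document
# pages = {n: doc.export_to_markdown(page_no=n) for n in doc.pages}

if __name__ == "__main__":
    # Hypothetical per-page markdown standing in for real Docling output.
    demo = {
        1: "| 01/02 | Coffee | 3.50 |",
        2: "| 01/03 | Rent | 1200.00 |",
    }
    for chunk in page_chunks(demo):
        print(chunk)
```

The `<!-- page N -->` marker is an HTML comment, so it stays invisible when the markdown is rendered but is still present in the raw text an LLM receives.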
