Document orientations #389
m-salewski asked this question in Q&A
Hello, I have been investigating Docling for extracting text from various delivery-related documents. One document is a scan of product labels in different orientations, placed on an A4 page in portrait orientation.
Attached is a sample PDF with the labels like this:
[Image: sample page with the labels in different orientations]
Here are the bounding boxes for the page segments (red solid lines) and page cells (blue dotted lines). They show that the mixed orientations compromise the OCR:
[Image: the page with segment and cell bounding boxes overlaid]
I tried rotating the whole document by different angles. A rotation of 270 degrees recovered parts of the rotated label, but not everything I expected (such as the company name and address in the sample). Rotations of 90 and 180 degrees found some page segments but failed to detect any words. In the actual document, there are detectable words in all 4 orientations.
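A minimal sketch of the per-segment alternative to rotating the whole page: try each of the four orientations on one segment crop and keep the reading with the highest confidence. Everything here is an assumption for illustration. The crop is a plain list of rows, `ocr` is a hypothetical caller-supplied function returning `(text, confidence)`, and none of this is Docling's API.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]                     # stand-in for an image crop: rows of pixels
OcrFn = Callable[[Grid], Tuple[str, float]]  # hypothetical OCR hook: crop -> (text, confidence)


def rotate_cw(img: Grid) -> Grid:
    """Rotate a crop by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def best_orientation(crop: Grid, ocr: OcrFn) -> Tuple[int, str, float]:
    """Run `ocr` on the crop at 0, 90, 180 and 270 degrees (clockwise)
    and return (angle, text, confidence) for the most confident reading."""
    best = None
    current = crop
    for angle in (0, 90, 180, 270):
        text, conf = ocr(current)
        if best is None or conf > best[2]:
            best = (angle, text, conf)
        current = rotate_cw(current)
    return best
```

Applied to each page segment separately, this would let every label pick its own orientation instead of sharing one rotation for the whole page.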
One of the easiest hacks was to use EasyOCR's `rotation_info` parameter in the reader. This helped in some parts but failed in others, since it only really affects the page cells, which depend on the page segments. Why doesn't this work? Is the OCR's orientation fixed for all page segments?

label_orientations.pdf