GPT-based OCR follow-up work #147

Open
4 of 6 tasks
danvk opened this issue Oct 27, 2024 · 0 comments
danvk (Owner) commented Oct 27, 2024

  • More complete removal of “Negative No. 2” lines (see test_is_negative in cleaner_test.py)
  • Split lines with lots of interior whitespace (example: 731107b)
  • Remove more low-value stamps (FS Lincoln, Lenox, Tilden, President Borough Manhattan, etc.)
  • Insert blanks before “(2)”, etc.
  • Manual review of long new (not replacement) OCR
  • Re-implement OCR feedback
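
A few of the text-cleanup items above (dropping “Negative No. 2” lines, splitting lines with wide interior whitespace, and inserting blanks before “(2)”) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual cleaner: the function names, the regular expressions, the four-space gap threshold, and the reading of "blanks" as blank lines are all hypothetical.

```python
import re

# Hypothetical pattern: a line consisting only of "Negative No. <id>".
# The real rules (see test_is_negative in cleaner_test.py) may differ.
NEGATIVE_RE = re.compile(r"^\s*negative\s+(?:no\.?|number|#)\s*\S+\s*$", re.IGNORECASE)

# Hypothetical pattern: a caption marker such as "(2)" at the start of a line.
MARKER_RE = re.compile(r"\((\d+)\)")


def is_negative_line(line: str) -> bool:
    """Return True if the whole line is just a negative-number stamp."""
    return NEGATIVE_RE.match(line) is not None


def split_on_wide_gaps(line: str, min_gap: int = 4) -> list[str]:
    """Split a line wherever min_gap or more whitespace characters occur in a row.

    The threshold of 4 is an assumption, not a value taken from the project.
    """
    parts = re.split(r"\s{%d,}" % min_gap, line.strip())
    return [part for part in parts if part]


def add_blank_before_markers(text: str) -> str:
    """Insert a blank line before "(2)", "(3)", ... unless one is already there."""
    out: list[str] = []
    for line in text.split("\n"):
        marker = MARKER_RE.match(line.lstrip())
        needs_blank = (
            marker is not None
            and int(marker.group(1)) >= 2
            and bool(out)
            and out[-1].strip() != ""
        )
        if needs_blank:
            out.append("")
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(is_negative_line("Negative No. 2"))
    print(split_on_wide_gaps("East 42nd Street      Manhattan"))
    print(add_blank_before_markers("(1) First view.\n(2) Second view."))
```

Anchoring the negative-number pattern to the whole line is a deliberate choice in this sketch: a caption that merely mentions a negative number in running text is left alone.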