Claim extraction from Images #72
Comments
Tuesday, Wednesday Spike
Check in on Wednesday 11 am. |
Tested Tesseract with English and Hindi LSTM models with multiple
Tested image pre-processing, which degraded text quality and did not improve OCR.
Otherwise, OCR was relatively good for English and Hindi on images. |
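For reference, a minimal sketch of how a Tesseract run like this could be invoked from Python through the CLI. This is an assumption about the setup, not the exact commands used in the spike: it assumes the `tesseract` binary and the `eng` and `hin` traineddata files are installed, and the helper names `build_tesseract_cmd` and `run_ocr` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_tesseract_cmd(image_path: str, langs: Sequence[str] = ("eng", "hin")) -> List[str]:
    """Build a Tesseract CLI call that uses the LSTM engine for the given languages."""
    return [
        "tesseract",
        image_path,
        "stdout",                # print recognised text instead of writing a file
        "-l", "+".join(langs),   # e.g. "eng+hin" loads both language models
        "--oem", "1",            # OCR engine mode 1 = LSTM only
    ]


def run_ocr(image_path: str, langs: Sequence[str] = ("eng", "hin")) -> str:
    """Run Tesseract on the raw image (no pre-processing) and return the text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image_path, langs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

The image is passed in unmodified, in line with the observation above that pre-processing degraded the text rather than helping.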
Tested various models on Hugging Face and looked at Large Vision Models (LVMs) for vision. Will attach link soon |
Wednesday :
|
|
It's called |
The technical term for image understanding is |
Thanks. I think an added layer of automation that would make for useful claim extraction is detecting the entities (people/landmarks) in a picture. So instead of the extracted claim being "a man is standing next to a building", it would say "politician X is standing next to Taj Mahal". We could create a dataset of persons of interest to facilitate this. |
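A minimal sketch of that last step, assuming a captioning model has already produced the generic caption and some recogniser (backed by the persons-of-interest dataset) has already mapped generic phrases to names. Both of those components are hypothetical here; only the substitution is shown, and `enrich_caption` is an illustrative name.

```python
from typing import Dict


def enrich_caption(caption: str, entities: Dict[str, str]) -> str:
    """Replace generic phrases in a caption with recognised entity names.

    `entities` maps a generic phrase from the captioning model to the name
    returned by a (hypothetical) person/landmark recogniser.
    """
    # Replace longer phrases first so a short phrase cannot clobber a longer one.
    for phrase in sorted(entities, key=len, reverse=True):
        caption = caption.replace(phrase, entities[phrase])
    return caption


caption = "a man is standing next to a building"
detected = {"a man": "politician X", "a building": "Taj Mahal"}
print(enrich_caption(caption, detected))
```

Plain string replacement is the simplest possible approach; a real version would likely need to align detections with caption spans rather than match phrases.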
Found this nice use of traditional image processing to segment portions from newspaper clippings - https://stackoverflow.com/questions/64241837/use-python-open-cv-for-segmenting-newspaper-article It should also be useful for memes/posters with multiple text portions. |
Identify the 5 most popular categories of images
Categories I could come up with -
(In the dataset, I saw some images repeat)
Extract Text from Images (Vision Encoder Decoder Models)
Detect the entities(people/landmarks) in a picture
Gibberish Text Detection
Large Vision Models (LVM)
Other
GPT4-Vision
SAM
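For the gibberish text detection item, a rough heuristic sketch for flagging noisy OCR output. This is an assumption about how one might approach it, not something tested in the spike; the function names and the thresholds (2 to 20 characters, vowel check for ASCII tokens) are illustrative choices.

```python
import unicodedata


def looks_like_word(token: str) -> bool:
    """Heuristic: a plausible word is 2-20 characters of letters/combining marks only."""
    if not 2 <= len(token) <= 20:
        return False
    # Unicode categories starting with "L" are letters, "M" are combining marks
    # (needed for Devanagari vowel signs).
    if not all(unicodedata.category(ch)[0] in ("L", "M") for ch in token):
        return False
    # For ASCII-only tokens, also require at least one vowel.
    if token.isascii():
        return any(ch in "aeiouyAEIOUY" for ch in token)
    return True


def gibberish_score(text: str) -> float:
    """Return the fraction of tokens that do NOT look like words (0.0 = clean)."""
    tokens = [t.strip(".,;:!?\"'()[]।") for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 1.0
    bad = sum(1 for t in tokens if not looks_like_word(t))
    return bad / len(tokens)
```

Single-character words such as "a" or "I" are counted as noise by this rule, so the score is only meaningful on longer passages. A language model or dictionary lookup would be more reliable but needs per-language resources.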
|
@aatmanvaidya can you try out two things that I believe will be useful pre-processing steps regardless of what model we use :
|
Summary
From my perspective, here is a rough pipeline that could be followed. Once we have the image, we could follow a process like this:
|
Swair had a long response to this. I am cherry-picking insights and typing them here:
He also said our approach of segmenting relevant portions and indexing it might be interesting/publishable. |
@aatmanvaidya @duggalsu can you write a 5-line blurb summarizing the various text extraction models and libraries you used and your conclusions? |
References shared on the call : |
Summary of the CDT report
|
We should use today to test out one of the remaining solutions
End of Spike Requirements : |
GPT4-Vision does not seem good for any kind of OCR: it will not do OCR for copyrighted articles in English and does not work well for Hindi. However, it can describe the image in detail, i.e. do "image captioning", very well, better than the previously tested Hugging Face models. |
https://github.com/VikParuchuri/surya
Accurate line-level text detection |
The various challenges involved in making sense of an image found on social media are summarized by this image
![Screenshot 2023-12-04 at 15-13-05 Tech Interventions against Online Harms](https://private-user-images.githubusercontent.com/1415361/287655262-2c31439e-97fb-4c0b-a16d-3a185c41ad8a.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzQ0NjQyNjEsIm5iZiI6MTczNDQ2Mzk2MSwicGF0aCI6Ii8xNDE1MzYxLzI4NzY1NTI2Mi0yYzMxNDM5ZS05N2ZiLTRjMGItYTE2ZC0zYTE4NWM0MWFkOGEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI0MTIxNyUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNDEyMTdUMTkzMjQxWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9M2IyNzMwMDFiNmMxMGNjYTFkOTYxMmY5MjAyY2Y1NDVmMDY5ODIzY2FiYmExZTdjYTdkNDIzMTMyZmE3Y2U5YSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.HWKEsVzqGW-IFJ93SlTT_bri3Jgkx4c_YQUNeNHbJjY)
The images could be photographs, manipulated images, screenshots, newspaper clippings or memes. We have to devise a solution to extract claims out of these images using a mix of automated and manual methods that can be deployed at population scale.
Some ideas on what type of functionality ML can enable :
This is meant to be a time-bound 5-day spike with the goal of learning as much as possible about how the state of the art in LLMs and ML can help us with claim extraction. We would like to include a working prototype in this so that we get a good sense of system requirements and prices. As such, evaluating paid proprietary solutions like ChatGPT could also be part of this.