-
-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for MS Azure Vision OCR #85
Conversation
Very cool. Ideally we'd have some docs or notes for how to get the azure credential |
also, is @anishhiranandani the same account as you @ml5ah or are you collaborating on this PR? |
Sure, @asim-shrestha, I updated the readme file with details of how to create the credentials JSON for Azure as well as a link that shows the UI screen where the credentials are. Yes, anishhiranandani and ml5ah are both me. My bad, I think my Mac was logged on to the old account (anishhiranandani). |
No worries! @ml5ah i'll make an azure account tomorrow and check it out. Thanks for doing this, has been on the back burner for quite some time |
Would you be interested in doing some additional work in this repo? |
pyproject.toml
Outdated
google-cloud-vision = "^3.4.5" | ||
azure-ai-vision-imageanalysis = "^1.0.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally we should make these optional dependancies going forward
tarsier/__main__.py
Outdated
|
||
|
||
async def main(credentials_path: str, url: str, verbose: bool) -> None: | ||
|
||
async def main(credentials_path: str, ocr_service: str, url: str, verbose: bool) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will break existing versions, lets make ocr_service and optional dependancy:
async def main(credentials_path: str, ocr_service: str, url: str, verbose: bool) -> None: | |
async def main(credentials_path: str, url: str, verbose: bool, ocr_service: str = "google") -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @awtkns, done.
Sure, I can try to spend some time to help out based on my bandwidth. Do you have any specific items in mind? |
tarsier/__main__.py
Outdated
|
||
|
||
async def main(credentials_path: str, url: str, verbose: bool) -> None: | ||
|
||
async def main(credentials_path: str, url: str, verbose: bool, ocr_service: str = "google") -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of having ocr_service be type str we should probably make a type:
OcrProvider = Literal["google", "microsoft"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, did it slightly differently using StrEnum. Defined in tarsier/ocr
tarsier/__main__.py
Outdated
match ocr_service: | ||
case "google": | ||
ocr_service = GoogleVisionOCRService(credentials) | ||
|
||
case "microsoft": | ||
ocr_service = MicrosoftAzureOCRService(credentials) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to throw an error here as the default case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Would you be able to share a couple screen shot examples of it in action?
@ml5ah I do have some items that are more involved. Here's an open list of them: https://reworkd.notion.site/Tarsier-optimizations-37ad18b8301a4ac085c64adf7533dfbb?pvs=25 How about two phase tagging in tag optimizations? |
Sure, @awtkns , shared. I'm having some issues with the poetry.lock file -- not sure why. Are you able to take a quick look? Thanks! |
Yup, i can fix the poetry lock issues~ |
Thanks for this @ml5ah! Tested, looks good. I removed the strenum dependancy in favor of the built-in Literal type. Also added an e2e test. |
Sure, @asim-shrestha. Sounds like a very interesting problem! |
Cool @ml5ah! Can you make an issue and we can collaborate / answer questions there |
This PR implements support for MS Azure Vision OCR. This is part of the roadmap for Tarsier and provides an alternative to the existing Google Cloud Vision OCR option.
It has been reported that MS Azure OCR performs better on webpages and therefore it makes this addition a worthwhile enhancement.
Please share feedback or potential improvements.
Issue: #84