Support PDF in GDrive #56
Comments
I will be happy to take this one.
Great! When do you expect to finish this?
I am trying to set up the dev environment right now, but I am running into some issues in the following environment:
the function
The following line is giving me the below error:
After some research, I found that os.getlogin() is the culprit. If you cannot help with this issue, could you describe an environment setup that is suitable for development?
Just hardcode any path that is valid in WSL.
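For anyone hitting the same error: os.getlogin() raises OSError when the process has no controlling terminal, which can happen under WSL, in containers, or in services. Below is a minimal sketch of a workaround, not the project's actual code. The helper names safe_username and storage_dir, the ".gerev" folder name, and the "dev" default are all hypothetical. The override argument corresponds to the "just hardcode a path" suggestion above.

```python
import getpass
import os
from pathlib import Path
from typing import Optional


def safe_username(default: str = "dev") -> str:
    """Return the current user name without requiring a controlling terminal."""
    try:
        # Works in an interactive shell, but raises OSError without a tty.
        return os.getlogin()
    except OSError:
        pass
    try:
        # Checks LOGNAME, USER, LNAME, USERNAME, then the password database.
        return getpass.getuser()
    except (KeyError, OSError, ImportError):
        # Hypothetical last resort so that setup never crashes here.
        return default


def storage_dir(override: Optional[str] = None) -> Path:
    """Pick a storage directory; a hardcoded override wins if given."""
    if override:
        return Path(override)
    # Hypothetical layout: a hidden folder under the user's home directory.
    return Path("/home") / safe_username() / ".gerev"
```

Calling storage_dir("/tmp/gerev") returns that path unchanged, which is the quickest way to get unblocked during development.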
Currently, it is possible to parse the entire content of PDF files as text, but as is apparent from your parsers, the program needs the content compiled in the following form:
Some title: related text
Am I right? There is already a pull request that parses the entire PDF document as text. If you have any enhancements or suggestions for that, I'll be more than willing to implement them. Meanwhile, I am also researching how to parse a PDF while keeping its hierarchical information intact.
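To make the "Some title: related text" shape concrete, here is a rough sketch that groups already-extracted text lines under headings. It is not how the project's parsers work. The function name group_by_title and the heading heuristic (a short line with no closing punctuation) are assumptions for illustration only, and real PDFs would likely need font-size or layout information instead.

```python
from typing import Dict, List


def group_by_title(lines: List[str], max_title_len: int = 60) -> Dict[str, str]:
    """Group extracted text lines into {title: related text}."""
    sections: Dict[str, List[str]] = {}
    current = "Untitled"  # bucket for text that appears before any heading
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines left over from extraction
        # Assumed heuristic: headings are short and do not end in punctuation.
        is_title = len(line) <= max_title_len and line[-1] not in ".,;:?!"
        if is_title:
            current = line
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(line)
    # Join each section's body lines into a single string.
    return {title: " ".join(body) for title, body in sections.items()}
```

The input lines would come from whatever PDF text extraction is already in place; the sketch only covers the grouping step.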
Hey!
@rishi003 let's chat on discord! I could guide you a little bit :)
Sure, shall we discuss it on the discuss thread?
See similar parsers here:
Add new parser:
e.g. docx, txt, html, pptx, etc.
https://github.com/GerevAI/gerev/tree/main/app/parsers
Add file type:
app/data_source_api/basic_document.py
Google Drive support: app/data_sources/google_drive.py
...
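As a rough illustration of the "add file type" step, the sketch below registers a PDF entry and maps MIME types to it. The actual contents of app/data_source_api/basic_document.py are not shown in this thread, so the FileType enum, its members, and file_type_from_mime are hypothetical names. Only the MIME type strings themselves are standard.

```python
from enum import Enum
from typing import Optional


class FileType(Enum):
    """Hypothetical enum of supported file types."""
    DOCX = "docx"
    PPTX = "pptx"
    TXT = "txt"
    PDF = "pdf"  # the new entry this issue asks for


# Standard MIME types mapped to the hypothetical enum members above.
MIME_TO_FILE_TYPE = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": FileType.DOCX,
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": FileType.PPTX,
    "text/plain": FileType.TXT,
    "application/pdf": FileType.PDF,
}


def file_type_from_mime(mime_type: str) -> Optional[FileType]:
    """Return the matching file type, or None if the MIME type is unsupported."""
    return MIME_TO_FILE_TYPE.get(mime_type)
```

A data source such as the Google Drive one could then skip files for which file_type_from_mime returns None and hand the rest to the matching parser.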