And here's the error -
[Document(page_content='', metadata={'_ab_source_file_last_modified': '2023-11-28T19:43:49.000000Z', '_ab_source_file_url': 'TermPaper.docx', 'document_key': 'TermPaper.docx', '_ab_source_file_parse_error': "Error parsing record. This could be due to a mismatch between the config's file type and the actual file type, or because the file or record is not parseable. Contact Support if you need assistance.\nfilename=TermPaper.docx message=\n**********************************************************************\n Resource \x1b[93mpunkt_tab\x1b[0m not found.\n Please use the NLTK Downloader to obtain the resource:\n\n \x1b[31m>>> import nltk\n >>> nltk.download('punkt_tab')\n \x1b[0m\n For more information see: https://www.nltk.org/data.html\n\n Attempted to load \x1b[93mtokenizers/punkt_tab/english/\x1b[0m\n\n Searched in:\n - '/home/soham/nltk_data'\n - '/home/soham/work/apps/tryouts/gdrive_integration/.venv-source-google-drive/nltk_data'\n - '/home/soham/work/apps/tryouts/gdrive_integration/.venv-source-google-drive/share/nltk_data'\n - '/home/soham/work/apps/tryouts/gdrive_integration/.venv-source-google-drive/lib/nltk_data'\n - '/usr/share/nltk_data'\n - '/usr/local/share/nltk_data'\n - '/usr/lib/nltk_data'\n - '/usr/local/lib/nltk_data'\n**********************************************************************\n", '_airbyte_raw_id': '01JAQ6ZEB720CS3BNHYVMKFQEC', '_airbyte_extracted_at': datetime.datetime(2024, 10, 21, 9, 36, 50, 530000), '_airbyte_meta': {}, 'last_modified': '2024-10-21T15:06:52.694685'})]
Any idea how to resolve this?
Hi @soham-aiplanet !
I see the error message you encountered, and I believe it has to do with a missing resource in the Natural Language Toolkit (NLTK) library.
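The error appears because the punkt_tab tokenizer data is needed to parse text in the document, but it's not currently available in your environment. Note that the message asks for punkt_tab, not the older punkt resource. To resolve this, please download punkt_tab by running the following in the same Python environment that runs the connector:
import nltk
nltk.download('punkt_tab')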
After installing it, try running the code again, and the error should be resolved. Please let me know if this works or if you run into any other issues.
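If you'd rather not run the download by hand each time you set up a new environment, here is a minimal sketch of a startup check. The helper name `ensure_nltk_resource` and its `find`/`download` parameters are my own (not part of NLTK or Airbyte), and I'm assuming the resource path `tokenizers/punkt_tab` based on the "Attempted to load" line in your error. It only relies on `nltk.data.find` raising `LookupError` when a resource is missing and on `nltk.download` returning a boolean.

```python
from typing import Callable, Optional


def ensure_nltk_resource(
    resource_path: str = "tokenizers/punkt_tab",  # assumed from the error message
    package: str = "punkt_tab",
    find: Optional[Callable[[str], object]] = None,
    download: Optional[Callable[[str], bool]] = None,
) -> bool:
    """Return True if the resource is available, downloading it if needed.

    `find` and `download` are hypothetical hooks so the helper can be
    exercised with stubs; by default they wrap the real NLTK calls.
    """
    if find is None or download is None:
        import nltk  # imported lazily so stubs work without NLTK installed

        find = find or nltk.data.find
        download = download or (lambda name: nltk.download(name, quiet=True))

    try:
        # nltk.data.find raises LookupError when the resource is missing.
        find(resource_path)
        return True
    except LookupError:
        # Fall back to downloading the package; report whether it succeeded.
        return bool(download(package))
```

Calling `ensure_nltk_resource()` once before you load the documents should be enough. By default NLTK downloads into `~/nltk_data`, which is the first directory listed under "Searched in" in your error.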