Expecting value: line 1 column 1 (char 0) Attachment not found at ... - ConfluenceLoader fails to parse attachment JSON and generates incorrect URLs #28672

Open
5 tasks done
ml-lubich opened this issue Dec 11, 2024 · 2 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments

ml-lubich commented Dec 11, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Description:
When attempting to download attachments from Confluence using the ConfluenceLoader, two issues occur:

  1. JSON parsing error: "Expecting value: line 1 column 1 (char 0)"
  2. Malformed URLs with double slashes and missing /wiki/ path segment

include_attachments=True seems to be broken

Steps to Reproduce:

  1. Initialize ConfluenceLoader with attachments enabled:
    from langchain_community.document_loaders import ConfluenceLoader

    loader = ConfluenceLoader(
        url="https://your-confluence-instance.atlassian.net",
        username="username",
        api_key="api_key",
        cloud=True,
        include_attachments=True,
    )
  2. Attempt to load documents with attachments
  3. Observe errors in console

Current Behavior:

  1. JSON parsing error occurs when trying to process attachments
  2. Generated URLs are malformed:
    https://instance.atlassian.net//download/attachments/...
    
    Instead of:
    https://instance.atlassian.net/wiki/download/attachments/...
    

Expected Behavior:

  1. Proper JSON parsing of attachment data
  2. Correctly formatted URLs with:
    • No double slashes
    • Proper /wiki/ path segment
    • Correct query parameters

Error Messages:

Expecting value: line 1 column 1 (char 0)
Attachment not found at https://instance.atlassian.net//download/attachments/371949598/image.png?version=1&modificationDate=1530849923653&cacheVersion=1&api=v2

Expected URL Format:
The correct URL format should be:

https://instance.atlassian.net/wiki/download/attachments/PAGE_ID/file.png?version=1&modificationDate=123&cacheVersion=1&api=v2&download=true

Key URL Differences:

  1. Missing /wiki/ in path
  2. Double slash // after domain
  3. Missing &download=true parameter
  4. Incorrect URL encoding of special characters in filenames
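
The differences listed above can be illustrated with a minimal sketch of how the expected URL might be assembled. This is not the ConfluenceLoader implementation: `build_attachment_url` is a hypothetical helper, and the sketch assumes the URL is formed from a base URL, a page ID, a filename, and query parameters.

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit


def build_attachment_url(base_url, page_id, filename, params=None):
    """Hypothetical helper: build a Confluence Cloud attachment download URL."""
    scheme, netloc, path, _, _ = urlsplit(base_url)
    # Normalize the base path: drop empty segments (the source of "//")
    # and make sure the path ends in /wiki.
    segments = [s for s in path.split("/") if s]
    if not segments or segments[-1] != "wiki":
        segments.append("wiki")
    # Percent-encode the filename so spaces and special characters survive.
    segments += ["download", "attachments", str(page_id), quote(filename, safe="")]
    # Keep the caller's query parameters and add download=true.
    query = dict(params or {})
    query["download"] = "true"
    return urlunsplit((scheme, netloc, "/" + "/".join(segments), urlencode(query), ""))


url = build_attachment_url(
    "https://instance.atlassian.net/",  # trailing slash on purpose
    371949598,
    "my image.png",
    {"version": 1, "api": "v2"},
)
print(url)
```

Splitting the base URL into segments and rejoining them sidesteps the double slash regardless of whether the configured URL ends in `/`, and the same call works whether or not `/wiki` is already present.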

Environment:

  • Langchain version: 0.3.9
  • Langchain Community version: 0.3.8
  • Python version: 3.11.10
  • Operating System: Red Hat Enterprise Linux Server 7.9 (Maipo). I know this is a bit old, but this also fails on macOS Sequoia 15.0 (24A335).
  • Kernel: Linux 3.10.0-1160.62.1.el7.x86_64

Additional Context:
This appears to be an issue with both the attachment JSON parsing and the URL construction in the ConfluenceLoader implementation. The double slash and the missing /wiki/ segment in the URL cause 404 errors, while the JSON parsing error suggests a problem with how the attachment data is handled.
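
For reference, the exact message "Expecting value: line 1 column 1 (char 0)" is what Python's `json` module raises when the input is empty or does not start with a JSON value, for example an HTML error page. The sketch below reproduces it outside of LangChain. `parse_attachment_json` is a hypothetical helper, and whether the loader actually receives such a body here is an assumption, not something confirmed in this issue.

```python
import json


def parse_attachment_json(body):
    """Hypothetical helper: parse a response body, reporting non-JSON content clearly."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Include a short preview of the body so the likely cause (an empty
        # response or an HTML error page) is visible in the message.
        preview = body[:60]
        raise ValueError(
            f"Response is not JSON ({exc}); body starts with {preview!r}"
        ) from exc


# Both an empty body and an HTML page fail at the very first character.
for body in ("", "<html>Not Found</html>"):
    try:
        parse_attachment_json(body)
    except ValueError as err:
        print(err)
```

If a 404 response body were passed to a JSON parser, the two errors in this report could share a single root cause, namely the malformed URL.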

Error Message and Stack Trace (if applicable)

(provided above)

Description

I'm trying to use the ConfluenceLoader from langchain-community to download and process Confluence documents with attachments.

Expected behavior:

  • Correctly formatted attachment URLs with proper path structure
  • Successful download of attachments from Confluence

Current behavior:

  • JSON parsing error: "Expecting value: line 1 column 1 (char 0)"
  • Malformed URLs with incorrect structure
  • 404 errors when trying to download attachments

System Info

System Information

OS: Linux
OS Version: #1 SMP Wed Mar 23 09:04:02 UTC 2022
Python Version: 3.11.10 (main, Oct 3 2024, 07:29:13) [GCC 11.2.0]

Package Information

langchain_core: 0.3.21
langchain: 0.3.9
langchain_community: 0.3.8
langsmith: 0.1.147
langchain_chroma: 0.1.4
langchain_google_genai: 2.0.6
langchain_ollama: 0.2.0
langchain_text_splitters: 0.3.2

Optional packages not installed

langserve

Other Dependencies

aiohttp: 3.11.9
async-timeout: Installed. No version info available.
chromadb: 0.5.20
dataclasses-json: 0.6.7
fastapi: 0.115.5
filetype: 1.2.0
google-generativeai: 0.8.3
httpx: 0.27.2
httpx-sse: 0.4.0
jsonpatch: 1.33
langsmith-pyo3: Installed. No version info available.
numpy: 1.26.4
ollama: 0.4.2
orjson: 3.10.12
packaging: 24.2
pydantic: 2.10.2
pydantic-settings: 2.6.1
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
SQLAlchemy: 2.0.35
tenacity: 9.0.0
typing-extensions: 4.12.2

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Dec 11, 2024
Collaborator

ccurme commented Dec 13, 2024

@Tonkonozhenko I'm wondering if you can replicate this or if #27620 is related?

@cosmorocker

@ml-lubich, could you try again using a URL that ends with /wiki?
I see in the docs it's required: https://python.langchain.com/docs/integrations/document_loaders/confluence/
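
A minimal sketch of that suggestion, reusing the arguments from the report: `with_wiki_suffix` is a hypothetical helper, not part of LangChain, and whether this change resolves the errors has not been confirmed in this thread.

```python
def with_wiki_suffix(url):
    """Hypothetical helper: return the base URL with a single trailing /wiki."""
    trimmed = url.rstrip("/")
    return trimmed if trimmed.endswith("/wiki") else trimmed + "/wiki"


# Same arguments as in the report, with only the URL changed.
loader_kwargs = {
    "url": with_wiki_suffix("https://your-confluence-instance.atlassian.net"),
    "username": "username",
    "api_key": "api_key",
    "cloud": True,
    "include_attachments": True,
}
print(loader_kwargs["url"])

# With langchain-community installed, the loader would then be created as:
#   from langchain_community.document_loaders import ConfluenceLoader
#   loader = ConfluenceLoader(**loader_kwargs)
```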
