AsciiDoc backend fails parsing admonitions #566

Open
jramcast opened this issue Dec 11, 2024 · 0 comments
Labels
bug Something isn't working


Bug

The AsciiDoc backend cannot parse files that contain admonitions written in block syntax.

Steps to reproduce

  1. Create the test.adoc file with these contents:
= Title 1

This is regular content.

== Title 2

More content

[NOTE]
====
Content displayed as an admonition
====
  2. Run this code:
from docling.document_converter import DocumentConverter

source = "test.adoc"
converter = DocumentConverter()
result = converter.convert(source)
  3. The generated error is an AttributeError:
Traceback (most recent call last):
  File "/home/student/dev/test-docling/main.py", line 14, in <module>
    main()
  File "/home/student/dev/test-docling/main.py", line 7, in main
    result = converter.convert(source)
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/pydantic/_internal/_validate_call.py", line 38, in wrapper_function
    return wrapper(*args, **kwargs)
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/pydantic/_internal/_validate_call.py", line 111, in __call__
    res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/docling/document_converter.py", line 172, in convert
    return next(all_res)
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/docling/document_converter.py", line 193, in convert_all
    for conv_res in conv_res_iter:
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/docling/document_converter.py", line 228, in _convert
    for item in map(
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/docling/document_converter.py", line 269, in _process_document
    conv_res = self._execute_pipeline(in_doc, raises_on_error=raises_on_error)
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/docling/document_converter.py", line 292, in _execute_pipeline
    conv_res = pipeline.execute(in_doc, raises_on_error=raises_on_error)
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/docling/pipeline/base_pipeline.py", line 52, in execute
    raise e
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/docling/pipeline/base_pipeline.py", line 44, in execute
    conv_res = self._build_document(conv_res)
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/docling/pipeline/simple_pipeline.py", line 41, in _build_document
    conv_res.document = conv_res.input._backend.convert()
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/docling/backend/asciidoc_backend.py", line 75, in convert
    doc = self._parse(doc)
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/docling/backend/asciidoc_backend.py", line 117, in _parse
    item = self._parse_section_header(line)
  File "/home/student/dev/test-docling/.venv/lib/python3.10/site-packages/docling/backend/asciidoc_backend.py", line 303, in _parse_section_header
    marker = match.group(1)  # The list marker (e.g., "*", "-", "1.")
AttributeError: 'NoneType' object has no attribute 'group'

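The error appears to originate in the _is_section_header method of asciidoc_backend.py.
This method treats the ==== line as a section title, when in fact the line is the delimiter of the admonition block. The line is then passed to _parse_section_header, where the regular expression does not match, so match is None and match.group(1) raises the AttributeError.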
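
One possible way to avoid this is sketched below. It is not docling's actual code: the names SECTION_HEADER, is_section_header and parse_section_header are hypothetical, and the pattern is an assumption about what a header check could look like. The idea is to require whitespace and title text after the leading = characters, so that a bare ==== delimiter is never classified as a header, and to return None instead of calling .group() on a failed match.

```python
import re

# Hypothetical pattern, not taken from docling: one or more "=" characters,
# then whitespace, then at least one non-space character of title text.
SECTION_HEADER = re.compile(r"^(=+)\s+(\S.*)$")


def is_section_header(line: str) -> bool:
    """Return True only for lines such as "= Title" or "== Title"."""
    return SECTION_HEADER.match(line.strip()) is not None


def parse_section_header(line: str):
    """Return (number of "=" markers, title), or None if not a header."""
    match = SECTION_HEADER.match(line.strip())
    if match is None:
        # A bare "====" block delimiter ends up here instead of
        # raising AttributeError on match.group(1).
        return None
    return len(match.group(1)), match.group(2).strip()


if __name__ == "__main__":
    # Lines taken from the test.adoc file in the steps to reproduce.
    for sample in ["= Title 1", "== Title 2", "[NOTE]", "===="]:
        print(repr(sample), is_section_header(sample), parse_section_header(sample))
```

With a check like this, the ==== lines from test.adoc would fall through to whatever handles block delimiters or plain text; how the backend should then represent the admonition itself is a separate question that this sketch does not address.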

Docling version

Docling version: 2.10.0
Docling Core version: 2.9.0
Docling IBM Models version: 2.0.7
Docling Parse version: 3.0.0

Python version

Python 3.10.15

@jramcast jramcast added the bug Something isn't working label Dec 11, 2024
@jramcast jramcast changed the title AsciiDoc parsing fails parsing asciidoc files that contain admonitions AsciiDoc backend fails parsing asciidoc files that contain admonitions Dec 11, 2024
@jramcast jramcast changed the title AsciiDoc backend fails parsing asciidoc files that contain admonitions AsciiDoc backend fails parsing admonitions Dec 11, 2024