Figures/images that contain text are incorrectly exploded as content, rather than just an image.
Bug
Figures/images that contain text are incorrectly exploded as content, rather than just an image.
Steps to reproduce
For example, here is some debug output obtained by executing
docling --debug-visualize-cells --debug-visualize-layout https://arxiv.org/pdf/2408.09869
Notice that the figure is an image of a document. Docling detects the text inside the image as separate text elements, whereas I would have expected just two elements on this page: the image and its description.
Is there a way to achieve this through configuration?
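To make the expected behaviour concrete, here is a minimal sketch of it as a post-processing step: any non-picture element whose bounding box lies mostly inside a detected picture's bounding box is dropped, so only the picture and its caption (which normally sits outside the picture box) remain. This is not Docling's API; `BBox`, `Element`, `drop_text_inside_pictures`, the element labels and the 0.9 overlap threshold are hypothetical names and values chosen for illustration, and the coordinate convention (top-left origin, y growing downward) is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    # Hypothetical box: (l, t) is the top-left corner, (r, b) the bottom-right,
    # assuming page coordinates where y grows downward.
    l: float
    t: float
    r: float
    b: float

    def area(self) -> float:
        return max(0.0, self.r - self.l) * max(0.0, self.b - self.t)

    def intersection_area(self, other: BBox) -> float:
        # Width and height of the overlap rectangle; negative means no overlap.
        w = min(self.r, other.r) - max(self.l, other.l)
        h = min(self.b, other.b) - max(self.t, other.t)
        return max(0.0, w) * max(0.0, h)


@dataclass
class Element:
    # Hypothetical layout element with a label such as "picture", "text", "caption".
    label: str
    bbox: BBox
    text: str = ""


def drop_text_inside_pictures(elements: list[Element], threshold: float = 0.9) -> list[Element]:
    """Drop non-picture elements whose area is at least `threshold` covered by a picture."""
    pictures = [e for e in elements if e.label == "picture"]
    kept = []
    for e in elements:
        area = e.bbox.area()
        if e.label != "picture" and area > 0:
            # Largest fraction of this element covered by any single picture.
            covered = max((e.bbox.intersection_area(p.bbox) for p in pictures), default=0.0)
            if covered / area >= threshold:
                continue  # treat it as part of the figure's content, not as page text
        kept.append(e)
    return kept
```

On a page with one picture, two text elements drawn inside it, and a caption below it, only the picture and the caption would survive this filter.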
Docling version
Python version