feat: Azure converter updates (#7409)
* Initial commit

* Remove old mock tests

* Fix current_last_page_number calculation

* Carry over unit tests from the other side

* Update pydocs, skip failing tests

* Fix pylint and mypy

* Minor adjustments

* Add release note

* Minor touch ups

* Resolve Document unique id issue by using custom id calculation

* Better hashing, add unit tests

* Small fixes
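
The "custom id calculation" and "better hashing" items above are not shown in this view, since the diff for azure.py is not rendered. As a rough illustration only, and not the code from this commit, a deterministic id can be derived by hashing a document's content together with its metadata. The function name `stable_document_id` and the choice of SHA-256 over a JSON payload are assumptions made for this sketch.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def stable_document_id(content: str, meta: Optional[Dict[str, Any]] = None) -> str:
    """Hypothetical helper: derive a deterministic id from content and metadata.

    This is an illustrative sketch, not the converter's actual id calculation.
    """
    # Sort keys so that dictionaries with the same items always serialize
    # to the same string, regardless of insertion order.
    payload = json.dumps(
        {"content": content, "meta": meta or {}},
        sort_keys=True,
        default=str,  # fall back to str() for values JSON cannot encode
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Two tables with identical text but different page numbers get distinct ids.
    id_page_1 = stable_document_id("A,B\n1,2", {"page": 1})
    id_page_2 = stable_document_id("A,B\n1,2", {"page": 2})
    print(id_page_1 != id_page_2)
```

Including metadata such as a page number in the hashed payload is one way to keep two documents with identical text from colliding on the same id.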
vblagoje authored Apr 9, 2024
1 parent 174ac79 commit 988c360
Showing 8 changed files with 26,514 additions and 65 deletions.
387 changes: 359 additions & 28 deletions haystack/components/converters/azure.py

Large diffs are not rendered by default.

@@ -0,0 +1,4 @@
---
enhancements:
  - |
    Enhanced the AzureOCRDocumentConverter to include advanced handling of tables and text. Features such as extracting preceding and following context for tables, merging multiple column headers, and enabling single column page layout for text have been introduced. This update furthers the flexibility and accuracy of document conversion within complex layouts.
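
Two of the ideas named in the release note, merging multiple column headers and keeping a few lines of text around a table, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the converter's implementation: the function names `merge_column_headers` and `table_context`, their parameters, and the list-based inputs are all hypothetical.

```python
from typing import Dict, List


def merge_column_headers(header_rows: List[List[str]], sep: str = " ") -> List[str]:
    """Hypothetical helper: collapse several header rows into one label per column."""
    if not header_rows:
        return []
    n_cols = max(len(row) for row in header_rows)
    merged = []
    for col in range(n_cols):
        # Collect the non-empty cells of this column, top row first.
        parts = [
            row[col].strip()
            for row in header_rows
            if col < len(row) and row[col].strip()
        ]
        merged.append(sep.join(parts))
    return merged


def table_context(
    lines: List[str], table_pos: int, preceding: int = 3, following: int = 3
) -> Dict[str, str]:
    """Hypothetical helper: take up to `preceding` lines before `table_pos`
    and up to `following` lines from `table_pos` onward.

    `lines` is the page text without the table; `table_pos` is the index in
    `lines` at which the table would be inserted.
    """
    start = max(0, table_pos - preceding)
    return {
        "preceding_context": "\n".join(lines[start:table_pos]),
        "following_context": "\n".join(lines[table_pos:table_pos + following]),
    }


if __name__ == "__main__":
    headers = [["Revenue", "Revenue", ""], ["2022", "2023", "Notes"]]
    print(merge_column_headers(headers))
    text = ["Intro", "Table 1 shows revenue.", "Revenue grew.", "Outlook"]
    print(table_context(text, table_pos=2, preceding=1, following=1))
```

Storing the surrounding text alongside each table gives downstream retrieval some signal about what the table describes, which the table cells alone often lack.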
269 changes: 232 additions & 37 deletions test/components/converters/test_azure_ocr_doc_converter.py

Large diffs are not rendered by default.
