If the user does not provide the document languages, use `langdetect` to detect the language of the text (at the document level, for speed). If the confidence in the result is high enough, we can assume all elements are in the detected language.
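A minimal sketch of document-level detection with a confidence threshold. It assumes a detector callable that returns candidates with `.lang` and `.prob` attributes, which is the shape `langdetect.detect_langs` returns. The function name `detect_document_language`, the `Candidate` class, and the `0.9` default threshold are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Candidate:
    """One detector guess: a language code and its probability (hypothetical type)."""
    lang: str
    prob: float


# Assumed detector shape: text in, candidates with .lang/.prob out.
Detector = Callable[[str], Sequence[Candidate]]


def detect_document_language(
    element_texts: Sequence[str],
    detector: Detector,
    min_confidence: float = 0.9,  # hypothetical threshold
) -> Optional[str]:
    """Detect a single language for the whole document, or None if unsure."""
    # Join the element texts so detection runs once per document, not per element.
    document_text = " ".join(t for t in element_texts if t.strip())
    if not document_text:
        return None
    candidates = detector(document_text)
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.prob)
    # Only trust (and later apply to every element) a sufficiently confident guess.
    return best.lang if best.prob >= min_confidence else None
```

With `langdetect` installed, `langdetect.detect_langs` could be passed as `detector` directly. Setting `langdetect.DetectorFactory.seed = 0` beforehand makes its results deterministic. Note that it raises `LangDetectException` on text with no detectable features, which a real caller would need to handle.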
This PR establishes the pattern for text partitioning. Apply the same pattern (adding the `languages` parameter for user input, detecting the document language, and adding the resulting document language to the element metadata) to all other partitioning functions for non-image-based documents (everything except PDF and image), as well as to auto partition.
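One way to apply the pattern uniformly across many partitioning functions is a decorator. The sketch below is hypothetical: `with_document_languages`, the simplified `Element`/`ElementMetadata` classes, and the injected `detect` callable are illustrative stand-ins, not the library's actual API. It adds a `languages` keyword argument, falls back to detection on the whole document when the argument is omitted, and writes the result to each element's metadata.

```python
import functools
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ElementMetadata:
    languages: Optional[List[str]] = None


@dataclass
class Element:
    text: str
    metadata: ElementMetadata = field(default_factory=ElementMetadata)


def with_document_languages(detect: Callable[[str], Optional[str]]):
    """Hypothetical decorator adding a `languages` kwarg to a partitioner."""

    def decorator(partition_fn: Callable[..., List[Element]]):
        @functools.wraps(partition_fn)
        def wrapper(*args, languages: Optional[List[str]] = None, **kwargs):
            elements = partition_fn(*args, **kwargs)
            if languages is None:
                # No user input: detect once on the whole document's text.
                detected = detect(" ".join(e.text for e in elements))
                languages = [detected] if detected else None
            for element in elements:
                # Copy the list so elements do not share one mutable object.
                element.metadata.languages = list(languages) if languages else None
            return elements

        return wrapper

    return decorator
```

For example, decorating a toy `partition_lines(text)` that returns one `Element` per non-empty line would let callers write `partition_lines(text, languages=["fra"])` to skip detection. A decorator keeps the language logic in one place instead of repeating it in every partitioning function.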
### Summary
Closes #1534 and #1535
- Detects the document language using the `langdetect` package.
- Adds new kwargs that let the user set the document language (`languages`)
  or detect the language at the element level instead of the default
  document level (`detect_language_per_element`).
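The interaction between the two kwargs can be sketched as a precedence rule: user-supplied `languages` win, `detect_language_per_element=True` opts into one detection per element, and otherwise a single document-level result is shared by all elements. This is a hypothetical helper (`assign_languages` and the injected `detect` callable are illustrative names), not the PR's actual code.

```python
from typing import Callable, List, Optional


def assign_languages(
    texts: List[str],
    detect: Callable[[str], Optional[str]],
    languages: Optional[List[str]] = None,
    detect_language_per_element: bool = False,
) -> List[Optional[List[str]]]:
    """Return the `languages` metadata value for each element's text."""
    # 1. User-supplied languages are applied to every element as-is.
    if languages is not None:
        return [list(languages) for _ in texts]
    # 2. Opt-in per-element detection: slower, but handles mixed-language docs.
    if detect_language_per_element:
        results: List[Optional[List[str]]] = []
        for text in texts:
            lang = detect(text)
            results.append([lang] if lang else None)
        return results
    # 3. Default: detect once on the whole document and share the result.
    lang = detect(" ".join(texts))
    return [[lang] if lang else None for _ in texts]
```

The default path trades accuracy on mixed-language documents for a single detector call, which matches the "document level for speed" rationale in the issue.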
---------
Co-authored-by: shreyanid <[email protected]>
Co-authored-by: ryannikolaidis <[email protected]>
Co-authored-by: Coniferish <[email protected]>
Co-authored-by: cragwolfe <[email protected]>
Co-authored-by: Austin Walker <[email protected]>