feat: change default hi_res model to yolox quantized #1607
- refactor how the default model for hi_res mode is set for image and pdf partition into a callable function that returns either the value of an env variable or a default model
  - this keeps the current pattern for setting the default hi_res model
- change the default model name from `detectron2_onnx` to `yolox_quantized`
  - the new default model has better recall for tables and richer categories for partitioned elements than detectron2
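A minimal sketch of the pattern described above, for reviewers who want to see its shape. The function name `default_hi_res_model`, the constant names, and the env variable name `UNSTRUCTURED_HI_RES_MODEL_NAME` are assumptions made for illustration; they are not taken from this description and may differ from the actual diff.

```python
import os

# Assumed names for illustration only; check the diff for the real ones.
HI_RES_MODEL_ENV_VAR = "UNSTRUCTURED_HI_RES_MODEL_NAME"
DEFAULT_HI_RES_MODEL = "yolox_quantized"


def default_hi_res_model() -> str:
    """Return the hi_res model name: the env variable if set, else the default.

    The lookup happens at call time rather than at import time, so setting
    the env variable after the module is imported still takes effect.
    """
    return os.environ.get(HI_RES_MODEL_ENV_VAR, DEFAULT_HI_RES_MODEL)
```

Resolving the name inside a function, instead of storing it in a module-level constant, is what lets callers override the model through the environment without changing any code.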
…ixtures update (#1618) This pull request includes updated ingest test fixtures. Please review and merge if appropriate. Co-authored-by: badGarnet <[email protected]>
please review comments here for the ingest diff eval: #1618
LGTM. However, I would like to have the metrics before making this change (not necessarily a blocker). I was able to manually find a lot of the previous model's text in the new ingest tests.
Actually, do we have gold labels for those ingest test documents? If not, maybe that is something we should have even before generating metrics.
LGTM other than a documentation nit.
Co-authored-by: qued <[email protected]>
This PR was initially created to close GitHub Issue #1604 (Synchronizing the default layout model), but since it was already resolved in PR [#1607](#1607), this PR now only adds the visualization script used to investigate the issue.

### Summary
- add python script to annotate elements

PDF: [references.pdf](https://github.com/Unstructured-IO/unstructured/files/12778270/references.pdf)

### Evaluation
```
PYTHONPATH=. python examples/layout-analysis/visualization.py references.pdf hi_res
```
detectron2_onnx to yolox_quantized