-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: form parsing placeholders #3034
Conversation
🔍 Existing Issues For ReviewYour pull request is modifying functions with the following pre-existing issues: 📄 File: unstructured/documents/elements.py
Did you find this useful? React with a 👍 or 👎 |
I did not touch that and I refuse to use this PR to fix that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Couple small coments.
Concur, not sure why that alert appeared. |
The error is rather sensible - many numpy/pandas/similar objects overwrite the == operator into a broadcasting one, one of the previous pull requests likely allowed such situation to happen. Now any change to elements.py will get this notification ;) |
…tadata, also small qol changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM pending a minor changelog nit!
### Summary Closes #3034 and reenables ARM64 in the docker build and publish job. This was taken out in #3039 because we've only build `libreoffice` for AMD64 and not ARM64. If Chainguard publishes an `apk` for `libreoffice`, we can support a Chainguard image for both architectures. The smoke test now differs for both architectures, to reflect differences in the directory structure. ### Testing Build and publish ran successfully for ARM64 (job [here](https://github.com/Unstructured-IO/unstructured/actions/runs/9129712470/job/25104907497)) and AMD64 (job [here](https://github.com/Unstructured-IO/unstructured/actions/runs/9129712470/job/25104907826)).
Allows introduction of form extraction in the future - sets up the FormKeysValues element & format, puts in an empty function call in the partition_pdf_or_image pipeline.