Refactor: remove image extraction related code #299

Merged: 6 commits into main from refacotr/remove_image_extraction_code on Dec 5, 2023

@christinestraub christinestraub commented Dec 1, 2023

Summary

This PR is the first part of the refactor that moves "image extraction" from the unstructured-inference repo to the unstructured repo. It removes all "image extraction" related code from unstructured-inference and is paired with the unstructured refactor PR, Unstructured-IO/unstructured#2201.

Note

The ingest test won't pass until the unstructured refactor PR, Unstructured-IO/unstructured#2201, is merged.

@cragwolfe cragwolfe left a comment

Test instructions are in the associated unstructured PR, Unstructured-IO/unstructured#2201 (review). LGTM!

@cragwolfe cragwolfe merged commit 794f38b into main Dec 5, 2023
5 of 8 checks passed
@cragwolfe cragwolfe deleted the refacotr/remove_image_extraction_code branch December 5, 2023 07:41
github-merge-queue bot pushed a commit to Unstructured-IO/unstructured that referenced this pull request Dec 5, 2023
### Summary
This PR is the second part of the refactor that moves "image extraction"
from the unstructured-inference repo to the unstructured repo. The first
part is done in Unstructured-IO/unstructured-inference#299. This PR adds
logic to support extracting images.

### Testing

`git clone -b refactor/remove_image_extraction_code --single-branch
https://github.com/Unstructured-IO/unstructured-inference.git && cd
unstructured-inference && pip install -e . && cd ../`

```
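from unstructured.partition.pdf import partition_pdf

# Partition the sample PDF with the hi_res strategy and image extraction enabled.
elements = partition_pdf(
    filename="example-docs/embedded-images.pdf",
    strategy="hi_res",
    extract_images_in_pdf=True,
)

# Print each element's text, separated by blank lines.
print("\n\n".join(str(el) for el in elements))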
```
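To check that images were actually picked up, one option is to filter the returned elements by category. The sketch below is illustrative only: `image_elements` is a hypothetical helper (not part of either repo), and it assumes image elements expose a `category` attribute equal to `"Image"`. Stand-in objects are used so the snippet runs without a PDF; in practice you would pass the `elements` list from the test snippet above.

```python
from types import SimpleNamespace


def image_elements(elements):
    """Return only the elements whose category is "Image".

    Assumption: image elements carry a `category` attribute set to "Image".
    Elements without that attribute are skipped.
    """
    return [el for el in elements if getattr(el, "category", None) == "Image"]


# Stand-in objects so the sketch runs on its own; replace `sample` with the
# list returned by partition_pdf when testing against a real document.
sample = [
    SimpleNamespace(category="Title", text="Report"),
    SimpleNamespace(category="Image", text=""),
    SimpleNamespace(category="NarrativeText", text="Body"),
]

images = image_elements(sample)
print(f"{len(images)} image element(s) found")
```

An empty result on a PDF known to contain embedded images would suggest the extraction path was not exercised.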
@christinestraub christinestraub restored the refacotr/remove_image_extraction_code branch May 20, 2024 21:02