Update Azure AI Vision from 3.2 to 4.0 #827

Closed
1 task done
jeffpaul opened this issue Nov 21, 2024 · 4 comments · Fixed by #829

@jeffpaul
Member

Is your enhancement related to a problem? Please describe.

Related to #826 and alongside adding OpenAI to alt text generation, let's look to update the Azure API version in ClassifAI.

The feature sets of 3.2 and 4.0 differ, so we'll want to validate which ClassifAI features are available in v4. Where 4.0 covers a ClassifAI image processing feature that uses Azure, let's update that feature to 4.0.

Version 4.0 features available: Read text, Captions, Dense captions, Tags, Object detection, Custom image classification / object detection, People, Smart crop

Better models; use version 4.0 if it supports your use case.

Version 3.2 features available: Tags, Objects, Descriptions, Brands, Faces, Image type, Color scheme, Landmarks, Celebrities, Adult content, Smart crop

Wider range of features; use version 3.2 if your use case is not yet supported in version 4.0

Additional context from Azure:

We recommend you use the Image Analysis 4.0 API if it supports your use case. Use version 3.2 if your use case is not yet supported by 4.0.

You'll also need to use version 3.2 if you want to do image captioning and your Vision resource is outside the supported Azure regions. The image captioning feature in Image Analysis 4.0 is only supported in certain Azure regions. Image captioning in version 3.2 is available in all Azure AI Vision regions. See Region availability.

Designs

No response

Describe alternatives you've considered

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@jeffpaul jeffpaul added help wanted Extra attention is needed type:enhancement labels Nov 21, 2024
@jeffpaul jeffpaul added this to the 3.2.0 milestone Nov 21, 2024
@jeffpaul jeffpaul moved this from Incoming to To Do in Open Source Practice Nov 21, 2024
@dkotter
Collaborator

dkotter commented Nov 21, 2024

Note that a lot of research was done into this in #553, though some things have changed since then. From what I recall, the v4 API works quite differently and will require changes to how we currently do things (it seems there were asynchronous vs. synchronous differences).

@dkotter dkotter self-assigned this Nov 22, 2024
@dkotter
Collaborator

dkotter commented Nov 22, 2024

Status update (mostly to remind myself where I left off for next week):

Started work on this and have successfully migrated the Descriptive Text Generator, Image Tags Generator and Image Text Extraction Features over to the v4.0 API with no real challenges. Image Text Extraction required the most changes as we used to make two separate API requests and now that can be done in one.
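
For reference, a minimal sketch of what a single combined v4.0 request might look like, written in Python rather than the plugin's PHP to keep it short. The route, the `api-version` value and the feature names reflect my understanding of the Image Analysis 4.0 REST API and are not confirmed anywhere in this thread; `build_analyze_request` is a hypothetical helper, and nothing is sent over the network here.

```python
from urllib.parse import urlencode


def build_analyze_request(endpoint, image_url, features, api_version="2024-02-01"):
    """Build the URL and JSON body for one Image Analysis call.

    Assumption: the route and api-version below are what the v4.0 REST API
    expects; verify both against the Azure docs before relying on them.
    """
    # Keep commas unescaped so the feature list stays readable in the URL.
    query = urlencode(
        {"api-version": api_version, "features": ",".join(features)},
        safe=",",
    )
    url = f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"
    body = {"url": image_url}
    return url, body


# One request asking for captions, tags and text extraction together.
url, body = build_analyze_request(
    "https://example.cognitiveservices.azure.com/",  # placeholder endpoint
    "https://example.com/photo.jpg",                 # placeholder image
    ["caption", "tags", "read"],
)
print(url)
print(body)
```

The real request would also need the resource key in a header (`Ocp-Apim-Subscription-Key`, as far as I know), which is left out here.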

Still left to fully look into are Image Cropping (which at a glance looks to be a fairly straightforward change) and PDF Text Extraction (which is a new API, Azure AI Document Intelligence, so will probably require more work).

@dkotter
Collaborator

dkotter commented Nov 26, 2024

In doing more research, I found that Image Cropping isn't quite as cut and dried to move over. In v3.2, we send an image URL plus the dimensions we want the final cropped image to be, and Azure sends back the cropped image, which we then store.

In v4.0, you send an image URL plus the aspect ratio you want to maintain, and Azure sends back a bounding box within the image representing what they recommend be cropped. You then have to crop the image yourself. The main concern I have is how to crop images smaller using that bounding box. As an example, if the full size image is 1024x768 and I want a cropped 300x300, it seems the bounding box returned is for the full image size, so not sure how to translate that down into the 300x300 size. Because of the extra effort here, recommending we look into that in a separate PR.
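
One possible way to handle the 1024x768 to 300x300 case, sketched in Python as plain arithmetic: crop the returned box out of the full-size image, then scale that crop down to the target size. The bounding box values are made up for illustration (the thread does not show a real response), and `Box` and `crop_plan` are hypothetical names. The sketch assumes the requested aspect ratio matches the target size, so a single scale factor applies to both axes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box in full-size image pixels (assumed x, y, w, h layout)."""
    x: int
    y: int
    w: int
    h: int


def crop_plan(box, target_w, target_h, tolerance=0.01):
    """Return the source rectangle, destination size and scale factor."""
    box_ratio = box.w / box.h
    target_ratio = target_w / target_h
    # The box only maps cleanly onto the target if both share an aspect ratio.
    if abs(box_ratio - target_ratio) > tolerance * target_ratio:
        raise ValueError("bounding box aspect ratio does not match the target")
    return {
        "src": (box.x, box.y, box.w, box.h),  # region to cut from the original
        "dst": (target_w, target_h),          # size to resize that region to
        "scale": target_w / box.w,            # same factor for both axes
    }


# Hypothetical 1:1 box for a 1024x768 image, reduced to 300x300.
plan = crop_plan(Box(x=128, y=0, w=768, h=768), 300, 300)
print(plan)
```

With a plan like this, the crop and resize could be done in one step by any image library that accepts a source rectangle plus a destination size.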

For the PDF Text Extraction, this doesn't exist in the v4.0 API, it now lives in a new API, Azure AI Document Intelligence. I don't think it will be too hard to integrate this but because this issue is focused on migrating from v3.2 to v4.0, I'd suggest we look into that in a different Issue so we don't block the other updates.

@jeffpaul
Member Author

jeffpaul commented Dec 5, 2024

> In v4.0, you send an image URL plus the aspect ratio you want to maintain, and Azure sends back a bounding box within the image representing what they recommend be cropped. You then have to crop the image yourself. The main concern I have is how to crop images smaller using that bounding box. As an example, if the full size image is 1024x768 and I want a cropped 300x300, it seems the bounding box returned is for the full image size, so not sure how to translate that down into the 300x300 size. Because of the extra effort here, recommending we look into that in a separate PR.

I concur, a separate PR is probably best for the level of effort there. Any chance we can get a single point in the image that's the focal point and then expand from that to our desired crop width/height?
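
A rough sketch of the focal point idea, in Python: center a crop of the desired size on a point, then shift it back inside the image if it runs past an edge. Whether Azure can return a single focal point is not established in this thread; one stand-in would be the center of the returned bounding box. `crop_around_focal_point` is a hypothetical helper.

```python
def crop_around_focal_point(image_w, image_h, focal_x, focal_y, crop_w, crop_h):
    """Return (left, top, width, height) of a crop centered on the focal point.

    The rectangle is clamped so it always stays inside the image.
    """
    if crop_w > image_w or crop_h > image_h:
        raise ValueError("crop size is larger than the image")
    # Start with the crop centered on the focal point.
    left = focal_x - crop_w // 2
    top = focal_y - crop_h // 2
    # Shift the rectangle back inside the image bounds if needed.
    left = max(0, min(left, image_w - crop_w))
    top = max(0, min(top, image_h - crop_h))
    return left, top, crop_w, crop_h


# Focal point in the middle of a 1024x768 image.
print(crop_around_focal_point(1024, 768, 512, 384, 300, 300))
# Focal point near the top-right corner: the crop is pushed back in bounds.
print(crop_around_focal_point(1024, 768, 1000, 10, 300, 300))
```

Note that this crops at full resolution without any downscaling, so a 300x300 crop of a large image may cover only part of the region Azure recommends.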

> For the PDF Text Extraction, this doesn't exist in the v4.0 API, it now lives in a new API, Azure AI Document Intelligence. I don't think it will be too hard to integrate this but because this issue is focused on migrating from v3.2 to v4.0, I'd suggest we look into that in a different Issue so we don't block the other updates.

I concur, different issue/PR there as well.

@github-project-automation github-project-automation bot moved this from To Do to Done in Open Source Practice Dec 12, 2024