Update from v3.2 to v4.0 of the Azure AI Vision API #829

Merged: 11 commits, Dec 12, 2024

dkotter
Collaborator

@dkotter dkotter commented Nov 26, 2024

Description of the Change

In #559, we switched all Features that rely on Azure AI Vision over to the v3.2 API. We decided not to move to v4.0 at that time because it was still in public preview and had some breaking changes.

The v4.0 API now seems more stable, so this PR switches the following Features over to it:

  • Descriptive Text Generator
  • Image Tags Generator
  • Image Text Extraction

It does not change the following Features:

  • Image Cropping: the v4.0 API is fairly different (it doesn't return a cropped image, only the coordinates of the region to crop) and will require additional work
  • PDF Text Extraction: this doesn't exist in the v4.0 API; it has moved to an entirely new API, Azure AI Document Intelligence, so we'll look to tackle that in a separate PR
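
To illustrate why Image Cropping needs additional work: with only coordinates coming back, the caller has to do the crop itself. Below is a minimal Python sketch of that step (the plugin itself is not written in Python). It assumes the crop region arrives as a box with `x`, `y`, `w` and `h` keys; that shape and the helper name `bounding_box_to_crop` are assumptions for illustration, not code from this PR.

```python
from typing import Dict, Tuple


def bounding_box_to_crop(
    box: Dict[str, int], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Convert an x/y/w/h box into (left, top, right, bottom), clamped to the image.

    The x/y/w/h key names are an assumption about the response shape.
    """
    left = max(0, box["x"])
    top = max(0, box["y"])
    right = min(image_width, box["x"] + box["w"])
    bottom = min(image_height, box["y"] + box["h"])
    if right <= left or bottom <= top:
        raise ValueError("bounding box lies outside the image")
    return left, top, right, bottom


# A 400x400 box starting at (100, 50) on a 640x480 image is clamped at the bottom edge.
print(bounding_box_to_crop({"x": 100, "y": 50, "w": 400, "h": 400}, 640, 480))
```

The resulting tuple is in the (left, top, right, bottom) order that most image libraries expect for a crop call.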

Things to note:

  • The v4.0 API supports images up to 20MB (up from 4MB) and larger dimensions, up to 16000x16000px
  • Image Text Extraction (OCR) used to require two separate API requests. The v4.0 API handles it in a single request, so the code for this has been simplified (we've removed the OCR class entirely)
  • The v4.0 API supports fewer regions, in particular for the captions feature, which the Descriptive Text Generator Feature uses. See https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/overview-image-analysis?tabs=4-0#region-availability for the list
  • The v4.0 API also seems to have fixed the confidence scores. In v3.0, we recommended a threshold of 70-75%. In v3.2, we saw scores drop to 50-55% and used that as our recommendation. In testing with multiple images, 70% again seems to be a good default, so the default has been updated
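
As a rough Python sketch of the points above (not the plugin's actual code), the snippet below builds a single analyze URL that asks for captions, tags and text in one request, and checks an image against the new limits. The `imageanalysis:analyze` path, the `api-version` value, the feature names, and treating 20MB as 20 * 1024 * 1024 bytes are all assumptions here and should be checked against the Azure documentation; `build_analyze_url` and `within_limits` are hypothetical helper names.

```python
from typing import List
from urllib.parse import urlencode

API_VERSION = "2024-02-01"          # assumption: adjust to the version you target
MAX_BYTES = 20 * 1024 * 1024        # assumption: "20MB" read as 20 MiB
MAX_DIMENSION = 16000               # max width/height in pixels, per the notes above


def build_analyze_url(endpoint: str, features: List[str]) -> str:
    """Build one analyze URL covering several features in a single request."""
    query = urlencode(
        {"api-version": API_VERSION, "features": ",".join(features)},
        safe=",",  # keep the comma-separated feature list readable
    )
    return f"{endpoint.rstrip('/')}/computervision/imageanalysis:analyze?{query}"


def within_limits(size_bytes: int, width: int, height: int) -> bool:
    """Return True if the image fits the size and dimension limits noted above."""
    return (
        size_bytes <= MAX_BYTES
        and width <= MAX_DIMENSION
        and height <= MAX_DIMENSION
    )


url = build_analyze_url(
    "https://example.cognitiveservices.azure.com/", ["caption", "tags", "read"]
)
print(url)
print(within_limits(4 * 1024 * 1024, 4000, 3000))
```

Requesting `read` alongside the other features is what replaces the previous two-request OCR flow in this sketch.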

In addition, we now output an error message if a valid caption is returned but its confidence score is below our threshold. Previously we silently discarded such captions, which could lead people to think things weren't working. We still don't save the caption, but we show an error letting the user know what happened.
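
The threshold behaviour can be sketched in Python roughly as follows. This is an illustration under assumptions, not the code in this PR: it assumes the response carries the caption under `captionResult` with `text` and a 0-1 `confidence`, and `evaluate_caption` is a hypothetical helper name. It also uppercases the first letter of an accepted caption and checks that the values exist before using them.

```python
from typing import Optional, Tuple

DEFAULT_THRESHOLD = 70.0  # percent; the new default described above


def evaluate_caption(
    response: dict, threshold: float = DEFAULT_THRESHOLD
) -> Tuple[Optional[str], Optional[str]]:
    """Return (caption, error). Exactly one of the two is None."""
    # Assumed response shape: {"captionResult": {"text": str, "confidence": float}}
    result = response.get("captionResult") or {}
    text = result.get("text")
    confidence = result.get("confidence")

    # Make sure the values we want exist before using them.
    if not text or confidence is None:
        return None, "No caption was returned."

    score = confidence * 100
    if score < threshold:
        # Report the rejection instead of silently discarding the caption.
        return None, (
            f"Caption confidence {score:.2f}% is below the {threshold:.0f}% threshold."
        )

    # Accepted: save the caption with its first letter uppercased.
    return text[0].upper() + text[1:], None


print(evaluate_caption({"captionResult": {"text": "a dog with its mouth open", "confidence": 0.834}}))
print(evaluate_caption({"captionResult": {"text": "a dog with its mouth open", "confidence": 0.5706}}))
```

Returning the error alongside the caption keeps the decision about how to display it (admin notice, inline message) with the caller.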

Partially closes #827

Here are some results from my testing:

| Image | v3.2 Caption | v3.2 Confidence Score | v4.0 Caption | v4.0 Confidence Score |
| --- | --- | --- | --- | --- |
| A scientist with a microscope | a woman wearing a white coat and white lab coat sitting at a desk | 36.85% | a woman in a lab coat and gloves holding a pen and looking at a microscope | 70.76% |
| Stop sign | a stop sign with a cloudy sky | 49.51% | a stop sign with clouds in the background | 82.06% |
| A dog | a dog with its mouth open | 57.06% | a dog with its mouth open | 83.40% |

You could argue about whether these captions are better, but they are definitely not worse, and the confidence scores are back to being more realistic. That's great, as low scores are an issue that trips up a lot of people.

How to test the Change

  1. Set up the Descriptive Text Generator Feature with Azure AI. Ensure it works as expected
  2. Set up the Image Tags Generator Feature with Azure AI. Ensure it works as expected
  3. Set up the Image Text Extraction Feature with Azure AI. Ensure it works as expected

Changelog Entry

Changed - Migrate from the Azure AI Vision v3.2 API to the v4.0 API

Credits

Props @dkotter, @jeffpaul

Checklist:

…rgest image based on filesize and dimensions. Remove the OCR class as it is no longer needed
…hreshold, output an error message instead of just discarding silently. Ensure the caption we save has the first letter uppercased. Ensure the values we want exist before using them
@dkotter dkotter added this to the 3.2.0 milestone Nov 26, 2024
@dkotter dkotter self-assigned this Nov 26, 2024
@dkotter dkotter requested review from jeffpaul and a team as code owners November 26, 2024 20:22
@github-actions github-actions bot added the needs:code-review This requires code review. label Nov 26, 2024
@dkotter dkotter changed the title Feature/827 Update from v3.2 to v4.0 of the Azure AI Vision API Nov 26, 2024
@github-actions github-actions bot added the needs:refresh This requires a refreshed PR to resolve. label Dec 10, 2024
@github-actions github-actions bot removed the needs:refresh This requires a refreshed PR to resolve. label Dec 10, 2024
Member

@iamdharmesh iamdharmesh left a comment

Thanks @dkotter. PR looks good to me and it tests well.

@dkotter dkotter merged commit 54d067b into develop Dec 12, 2024
17 checks passed
@dkotter dkotter deleted the feature/827 branch December 12, 2024 15:52
Labels
needs:code-review This requires code review.
Development

Successfully merging this pull request may close these issues.

Update Azure AI Vision from 3.2 to 4.0
2 participants