Add threshold image similarity scores #526
Conversation
```scala
val defaultMinScore: Double = similarityMetric match {
  case SimilarityMetric.Blended  => 300
  case SimilarityMetric.Features => 300
  case SimilarityMetric.Colors   => 20
}
```
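For context, here is a minimal, self-contained sketch of how these per-metric defaults could be selected and applied as a result filter. The sealed-trait definition, `ScoredImage`, and the `aboveThreshold` step are illustrative assumptions, not the actual catalogue-api code:

```scala
// Illustrative sketch only; SimilarityMetric, ScoredImage and aboveThreshold are
// assumptions for this example, not the real catalogue-api types.
sealed trait SimilarityMetric
object SimilarityMetric {
  case object Blended  extends SimilarityMetric
  case object Features extends SimilarityMetric
  case object Colors   extends SimilarityMetric
}

final case class ScoredImage(id: String, score: Double)

// Per-metric defaults, matching the values in the diff above.
def defaultMinScore(similarityMetric: SimilarityMetric): Double =
  similarityMetric match {
    case SimilarityMetric.Blended  => 300
    case SimilarityMetric.Features => 300
    case SimilarityMetric.Colors   => 20
  }

// Drop any hit that scores below the threshold for the chosen metric.
def aboveThreshold(hits: Seq[ScoredImage], metric: SimilarityMetric): Seq[ScoredImage] =
  hits.filter(_.score >= defaultMinScore(metric))
```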
Can you explain how you came up with these default scores? Why is Colors so much lower?
Sure. There's a notebook in the data science repo which produced the results in wellcomecollection/platform#5581. Based on that analysis, we decided that ~300 seemed like an appropriate threshold for the blended similarity metric.

I re-ran that analysis with the `state.inferredData.lshEncodedFeatures` and `state.inferredData.palette` fields individually, and produced corresponding graphs for each.

If my mental maths is right, the scores for colours are generally lower because the `state.inferredData.palette` field contains fewer, more commonly occurring terms.
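A rough way to see that intuition, assuming the similarity score behaves roughly like a sum of per-term contributions in which rarer terms earn more (an assumption for illustration, not the real Elasticsearch scoring): a field with fewer, more common terms both contributes fewer terms to the sum and earns less per term, so its scores cap out much lower.

```scala
// Toy illustration (assumption): score ≈ sum of per-term contributions,
// where rarer terms contribute more. Numbers are made up for illustration.
val featureTerms = Seq.fill(256)(1.2) // many, rarer LSH feature terms
val paletteTerms = Seq.fill(25)(0.8)  // fewer, more common palette terms

val featureScore = featureTerms.sum   // ≈ 307, in the ballpark of the 300 threshold
val paletteScore = paletteTerms.sum   // = 20, in the ballpark of the Colors threshold
```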
Ah, super. Maybe just include a comment pointing to that ticket, so we can find this again in future?
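Something like this would do it (a sketch of the suggestion; the placement and wording of the comment are up to the author):

```scala
val defaultMinScore: Double = similarityMetric match {
  // Thresholds chosen from the analysis in wellcomecollection/platform#5581
  case SimilarityMetric.Blended  => 300
  case SimilarityMetric.Features => 300
  case SimilarityMetric.Colors   => 20
}
```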
This PR sets a threshold similarity score for images so that we can avoid displaying questionable matches. Closes #516
This change messes with some of our existing tests - the dummy index populated while setting up the tests can't produce the same scores as the fully populated prod index, and the thresholds therefore filter out any results which might have been matched. In some tests I've been able to manually set the `minScore` to 0, but in others which call the API itself, that's not possible. I feel like these things might be more appropriately tested in `rank`, where tests can be run against a full index and scoring can be properly examined.
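As a concrete sketch of that workaround (names and the default of 300 are illustrative, not the actual API or test code): the dummy index produces tiny scores, so a test disables the threshold by passing `minScore = 0` explicitly.

```scala
// Illustrative sketch, not the actual API or test code.
final case class ScoredImage(id: String, score: Double)

def similarImages(hits: Seq[ScoredImage], minScore: Double = 300): Seq[ScoredImage] =
  hits.filter(_.score >= minScore)

// Against a sparsely populated test index the scores are tiny, so the default
// threshold filters everything out; setting minScore = 0 keeps the matches
// visible so the rest of the behaviour can still be asserted on.
val dummyHits = Seq(ScoredImage("a", 2.5), ScoredImage("b", 1.1))
assert(similarImages(dummyHits).isEmpty)
assert(similarImages(dummyHits, minScore = 0).size == 2)
```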