Investigate problematic visually similar images #5581

Closed · jtweed opened this issue Jul 20, 2022 · 6 comments

jtweed (Contributor) commented Jul 20, 2022

We've had a couple of reports of problematic visually similar images.

The images in question don't look particularly similar to a human, but they do share some limited colour similarities. That alone should not be enough for them to surface as visually similar images.

We should run an initial investigation into why these images are appearing and what we can do to improve the model. We may also need to increase the score threshold required to display an image as visually similar.

Slack thread: https://wellcome.slack.com/archives/C8X9YKM5X/p1658310838492189

jtweed moved this to Backlog in Digital platform on Jul 20, 2022
pollecuttn moved this from Backlog to Next in Digital platform on Jul 21, 2022
harrisonpim moved this from Next to In Progress in Digital platform on Jul 25, 2022
harrisonpim (Contributor) commented Jul 26, 2022

Some things to discern and visualise:

  • the distribution of highest scores for x random images in the collection, using the blended similarity metric
  • how the scores of the problematic matches compare to that distribution. Could we solve the problem by setting a reasonable similarity threshold?
  • how much of the bad matching is down to palette similarity, and how much is related to the features?
  • can we produce better matches by boosting the scores for features/palettes?
  • how do the LSH scores compare to exact cosine similarity for those bad matches? (see the sketch after this list)
  • does the approximate-nearest-neighbour matching in Elasticsearch 8 do any better?
  • does extracting clusters with e.g. DBSCAN, which leaves poorly-clustered data points unlabelled, do any better at LSH matching?
  • is the problem with the features themselves? Could we do better by extracting features with a different network backbone?
  • is any candidate actually better? Can we prove it using Elo-style comparison of results, etc.?
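
For the LSH-vs-exact-cosine bullet, a minimal sketch, assuming the raw feature vectors can be loaded by image ID (`load_features` is a hypothetical stand-in here, and the vector dimensionality is made up):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in loader: in practice this would fetch the stored (pre-LSH) feature
# vector for `image_id` from the feature extraction pipeline.
rng = np.random.default_rng(0)

def load_features(image_id: str) -> np.ndarray:
    return rng.random(4096)

bad_matches = [
    ("fdgrjrwb", "v75jmdmc"),
    ("dwhuv3ph", "cg7hzgv8"),
]

for source_id, target_id in bad_matches:
    source = load_features(source_id).reshape(1, -1)
    target = load_features(target_id).reshape(1, -1)
    exact = cosine_similarity(source, target)[0, 0]
    print(f"{source_id} -> {target_id}: exact cosine similarity = {exact:.4f}")
```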

harrisonpim (Contributor) commented:

Scores for the most similar image, for 1000 randomly chosen images:

[Figure: distribution of top-match scores across the 1000 sampled images]

| Statistic | Value |
| --- | --- |
| mean | 852.234550 |
| std | 457.023707 |
| min | 214.409360 |
| 25% | 478.300000 |
| 50% | 730.812350 |
| 75% | 1141.242475 |
| max | 2546.215000 |
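
A summary in this shape comes straight out of pandas' `describe()`. A minimal sketch, with random stand-in data in place of the real per-image top scores:

```python
import numpy as np
import pandas as pd

# Stand-in data: in practice `top_scores` would hold the highest
# blended-similarity score returned for each of the 1000 sampled images.
rng = np.random.default_rng(0)
top_scores = rng.gamma(shape=3.0, scale=280.0, size=1000)

# describe() produces the mean/std/min/quartiles/max summary shown above.
print(pd.Series(top_scores).describe())
```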

harrisonpim (Contributor) commented:

Scoring the problematic matches:

  • the match between {'source_id': 'fdgrjrwb', 'target_id': 'v75jmdmc'} gets a score of 268.3973
  • the match between {'source_id': 'dwhuv3ph', 'target_id': 'cg7hzgv8'} gets a score of 244.78856

harrisonpim (Contributor) commented:
We show 6 similar matches in the image modal. Scores for the top 6 for another 1000 randomly chosen images:

Columns give the match rank (0 is the most similar):

| Statistic | 0 | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- | --- |
| mean | 805.663409 | 680.741450 | 647.117353 | 622.758611 | 607.618408 | 595.415108 |
| std | 447.583576 | 374.179087 | 364.052818 | 355.968056 | 352.783572 | 350.241211 |
| min | 207.642960 | 203.781980 | 187.570050 | 184.856670 | 173.226070 | 169.811940 |
| 25% | 464.282765 | 401.164182 | 377.921843 | 363.498445 | 350.860665 | 344.204062 |
| 50% | 663.031220 | 563.406435 | 524.253205 | 500.598565 | 485.957535 | 472.531535 |
| 75% | 1047.624250 | 860.295825 | 806.941750 | 785.379488 | 759.093130 | 744.427825 |
| max | 2510.807400 | 2326.201700 | 2300.046000 | 2251.751700 | 2204.464600 | 2194.731200 |

[Figure: score distributions for each of the top 6 match positions]

jtweed (Contributor, Author) commented Jul 26, 2022

Without seeing examples of images at different scores, just from these two examples and the percentiles it seems like our threshold is far too low?

I would rather we didn't display similar images at all when we can't hit a high threshold, than always try to include them and end up with a poor set of suggestions.
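
One way to enforce that on the query side is Elasticsearch's `min_score` search parameter, which drops any hit scoring below a cutoff. A minimal sketch; the index name, threshold value, and query body here are placeholders rather than the real blended-similarity query:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="images",  # placeholder index name
    body={
        "min_score": 1000,           # candidate threshold, to be tuned
        "size": 6,                   # we show up to 6 matches in the modal
        "query": {"match_all": {}},  # placeholder for the similarity query
    },
)

# With min_score set, this may hold fewer than 6 hits, or none at all, in
# which case the modal can simply omit the similar-images section.
hits = response["hits"]["hits"]
```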

harrisonpim (Contributor) commented:
Summarising what I spoke about with @jtweed earlier:

Here's the explain response for {"source_id": "fdgrjrwb", "target_id": "v75jmdmc"}:

[Figure: Elasticsearch explain response for the fdgrjrwb → v75jmdmc match]

The palette contributions are a much smaller part of the total score than we expected. For the other example (dwhuv3ph / cg7hzgv8), the result is even more extreme.
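
For reference, a per-clause score breakdown like this can be pulled with Elasticsearch's Explain API. A minimal sketch; the index name and query body are placeholders for the real blended-similarity query:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Explain how the target document scored against the (placeholder) query,
# which would normally be built from the source image's features and palette.
explanation = es.explain(
    index="images",                     # placeholder index name
    id="v75jmdmc",                      # target of the problematic match
    body={"query": {"match_all": {}}},  # placeholder query body
)

# The response decomposes the total score into per-clause contributions,
# which is how the palette-vs-feature split can be read off.
print(explanation["explanation"])
```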

Because the results aren't particularly visually similar, my hypothesis is that we're seeing a lot of bad matches in the LSH features. This might be happening because we use k-means clustering, which doesn't allow weakly connected points to fall outside the clusters and remain unlabelled. The sklearn clustering docs show this effect in action.

If we imagine a super simple dataset with two major clusters in a 2d feature space:

[Figure: toy dataset with two clusters and scattered outliers]

Here's a very basic illustration of what we get with k-means

[Figure: k-means assigns every point, outliers included, to a cluster]

and what we should really be looking for

[Figure: density-based clustering that leaves the outliers unlabelled]

Switching from k-means to OPTICS or DBSCAN would let us keep all of our existing query patterns in place while significantly limiting the number of poor matches within each feature subspace, thereby limiting the number of visually dissimilar results on the site.
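
A toy sklearn sketch of the difference, with parameters picked just to make the toy data separate cleanly: k-means assigns every point, stragglers included, to a cluster, while DBSCAN labels weakly connected points as noise (-1):

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Two tight blobs plus scattered noise points, echoing the diagrams above.
X, _ = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=0)
rng = np.random.default_rng(0)
noise = rng.uniform(X.min(), X.max(), size=(20, 2))
X = np.vstack([X, noise])

# k-means forces every point, noise included, into one of the two clusters.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN leaves weakly connected points unlabelled (label -1) instead.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print("k-means labels:", np.unique(kmeans_labels))  # [0 1]: no noise concept
print("DBSCAN labels: ", np.unique(dbscan_labels))  # typically [-1  0  1]
```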
