Investigate problematic visually similar images #5581

Closed · jtweed opened this issue Jul 20, 2022 · 6 comments

jtweed (Contributor) commented Jul 20, 2022

We've had a couple of reports of problematic visually similar images.

The images in question don't look particularly similar to a human, but they do share some limited colour similarities. That alone should not be enough for them to surface as visually similar images.

We should run an initial investigation into why these images are appearing and what we can do to improve the model. We may also need to increase the score threshold required to display an image as visually similar.

Slack thread: https://wellcome.slack.com/archives/C8X9YKM5X/p1658310838492189

jtweed moved this to Backlog in Digital platform on Jul 20, 2022
pollecuttn moved this from Backlog to Next in Digital platform on Jul 21, 2022
harrisonpim moved this from Next to In Progress in Digital platform on Jul 25, 2022
harrisonpim (Contributor) commented Jul 26, 2022

Some things to discern and visualise:

  • the distribution of highest scores for x random images in the collection, using the blended similarity metric
  • how the scores of the problematic matches compare to that distribution. Could we solve the problem by setting a reasonable similarity threshold?
  • how much of the bad matching is down to palette similarity, and how much is related to the features?
  • can we produce better matches by boosting the scores for features/palettes?
  • how do the LSH scores compare to exact cosine similarity for those bad matches? (see the sketch after this list)
  • does the approximate-nearest-neighbour matching in Elasticsearch 8 do any better?
  • does extracting clusters with e.g. DBSCAN, which leaves poorly-clustered data points unlabelled, do any better at LSH matching?
  • is the problem with the features themselves? Could we do better by extracting features with a different network backbone?
  • is any candidate actually better? Can we prove it using Elo-style comparison of results, etc.?
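
For the LSH-vs-exact-cosine bullet, a minimal sketch, assuming the raw feature vectors can be loaded by image ID (`load_features` is a hypothetical stand-in here, and the vector dimensionality is made up):

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in loader: in practice this would fetch the stored (pre-LSH) feature
# vector for `image_id` from the feature extraction pipeline.
rng = np.random.default_rng(0)

def load_features(image_id: str) -> np.ndarray:
    return rng.random(4096)

bad_matches = [
    ("fdgrjrwb", "v75jmdmc"),
    ("dwhuv3ph", "cg7hzgv8"),
]

for source_id, target_id in bad_matches:
    source = load_features(source_id).reshape(1, -1)
    target = load_features(target_id).reshape(1, -1)
    exact = cosine_similarity(source, target)[0, 0]
    print(f"{source_id} -> {target_id}: exact cosine similarity = {exact:.4f}")
```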

harrisonpim (Contributor) commented:

Scores for the most similar image, for 1000 randomly chosen images:

[Figure: distribution of top-match scores across the 1000 sampled images]

| Statistic | Value |
| --- | --- |
| mean | 852.234550 |
| std | 457.023707 |
| min | 214.409360 |
| 25% | 478.300000 |
| 50% | 730.812350 |
| 75% | 1141.242475 |
| max | 2546.215000 |
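
A summary in this shape comes straight out of pandas' `describe()`. A minimal sketch, with random stand-in data in place of the real per-image top scores:

```python
import numpy as np
import pandas as pd

# Stand-in data: in practice `top_scores` would hold the highest
# blended-similarity score returned for each of the 1000 sampled images.
rng = np.random.default_rng(0)
top_scores = rng.gamma(shape=3.0, scale=280.0, size=1000)

# describe() produces the mean/std/min/quartiles/max summary shown above.
print(pd.Series(top_scores).describe())
```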

harrisonpim (Contributor) commented:

Scoring the problematic matches:

  • the match between {'source_id': 'fdgrjrwb', 'target_id': 'v75jmdmc'} gets a score of 268.3973
  • the match between {'source_id': 'dwhuv3ph', 'target_id': 'cg7hzgv8'} gets a score of 244.78856

harrisonpim (Contributor) commented:
We show 6 similar matches in the image modal. Scores for the top 6 for another 1000 randomly chosen images:

Columns give the match rank (0 is the most similar):

| Statistic | 0 | 1 | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- | --- | --- |
| mean | 805.663409 | 680.741450 | 647.117353 | 622.758611 | 607.618408 | 595.415108 |
| std | 447.583576 | 374.179087 | 364.052818 | 355.968056 | 352.783572 | 350.241211 |
| min | 207.642960 | 203.781980 | 187.570050 | 184.856670 | 173.226070 | 169.811940 |
| 25% | 464.282765 | 401.164182 | 377.921843 | 363.498445 | 350.860665 | 344.204062 |
| 50% | 663.031220 | 563.406435 | 524.253205 | 500.598565 | 485.957535 | 472.531535 |
| 75% | 1047.624250 | 860.295825 | 806.941750 | 785.379488 | 759.093130 | 744.427825 |
| max | 2510.807400 | 2326.201700 | 2300.046000 | 2251.751700 | 2204.464600 | 2194.731200 |

[Figure: score distributions for each of the top 6 match positions]

jtweed (Contributor, Author) commented Jul 26, 2022

Without seeing examples of images at different scores, just from these two examples and the percentiles it seems like our threshold is far too low?

I would rather we didn't display similar images at all when we can't hit a high threshold, than always try to include them and end up with a poor set of suggestions.
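
One way to enforce that on the query side is Elasticsearch's `min_score` search parameter, which drops any hit scoring below a cutoff. A minimal sketch; the index name, threshold value, and query body here are placeholders rather than the real blended-similarity query:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="images",  # placeholder index name
    body={
        "min_score": 1000,           # candidate threshold, to be tuned
        "size": 6,                   # we show up to 6 matches in the modal
        "query": {"match_all": {}},  # placeholder for the similarity query
    },
)

# With min_score set, this may hold fewer than 6 hits, or none at all, in
# which case the modal can simply omit the similar-images section.
hits = response["hits"]["hits"]
```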

harrisonpim (Contributor) commented:
Summarising what I spoke about with @jtweed earlier:

Here's the explain response for {"source_id": "fdgrjrwb", "target_id": "v75jmdmc"}:

[Figure: Elasticsearch explain response for the fdgrjrwb → v75jmdmc match]

The palette contributions are a much smaller part of the total score than we expected. For the other example (dwhuv3ph / cg7hzgv8), the result is even more extreme.
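
For reference, a per-clause score breakdown like this can be pulled with Elasticsearch's Explain API. A minimal sketch; the index name and query body are placeholders for the real blended-similarity query:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Explain how the target document scored against the (placeholder) query,
# which would normally be built from the source image's features and palette.
explanation = es.explain(
    index="images",                     # placeholder index name
    id="v75jmdmc",                      # target of the problematic match
    body={"query": {"match_all": {}}},  # placeholder query body
)

# The response decomposes the total score into per-clause contributions,
# which is how the palette-vs-feature split can be read off.
print(explanation["explanation"])
```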

Because the results aren't particularly visually similar, my hypothesis is that we're seeing a lot of bad matches in the LSH features. This might be happening because we use k-means clustering, which doesn't allow weakly connected points to fall outside the clusters and remain unlabelled. The sklearn clustering docs show this effect in action.

If we imagine a super simple dataset with two major clusters in a 2d feature space:

[Figure: toy dataset with two clusters and scattered outliers]

Here's a very basic illustration of what we get with k-means

[Figure: k-means assigns every point, outliers included, to a cluster]

and what we should really be looking for

[Figure: density-based clustering that leaves the outliers unlabelled]

Switching from k-means to OPTICS or DBSCAN would let us keep all of our existing query patterns in place while significantly limiting the number of poor matches within each feature subspace, thereby limiting the number of visually dissimilar results on the site.
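
A toy sklearn sketch of the difference, with parameters picked just to make the toy data separate cleanly: k-means assigns every point, stragglers included, to a cluster, while DBSCAN labels weakly connected points as noise (-1):

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Two tight blobs plus scattered noise points, echoing the diagrams above.
X, _ = make_blobs(n_samples=200, centers=2, cluster_std=0.5, random_state=0)
rng = np.random.default_rng(0)
noise = rng.uniform(X.min(), X.max(), size=(20, 2))
X = np.vstack([X, noise])

# k-means forces every point, noise included, into one of the two clusters.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN leaves weakly connected points unlabelled (label -1) instead.
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print("k-means labels:", np.unique(kmeans_labels))  # [0 1]: no noise concept
print("DBSCAN labels: ", np.unique(dbscan_labels))  # typically [-1  0  1]
```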
