Add threshold image similarity scores #526

Merged: 6 commits, Aug 22, 2022
@@ -117,19 +117,22 @@ class ImagesRequestBuilder(queryConfig: QueryConfig)
     colorQuery(field = "state.inferredData.palette", hexColors)
   }

-  def requestWithBlendedSimilarity: (Index, String, Int) => SearchRequest =
+  def requestWithBlendedSimilarity
+    : (Index, String, Int, Double) => SearchRequest =
     similarityRequest(ImageSimilarity.blended)

-  def requestWithSimilarFeatures: (Index, String, Int) => SearchRequest =
+  def requestWithSimilarFeatures
+    : (Index, String, Int, Double) => SearchRequest =
     similarityRequest(ImageSimilarity.features)

-  def requestWithSimilarColors: (Index, String, Int) => SearchRequest =
+  def requestWithSimilarColors: (Index, String, Int, Double) => SearchRequest =
     similarityRequest(ImageSimilarity.color)

   private def similarityRequest(
     query: (String, Index) => Query
-  )(index: Index, id: String, n: Int): SearchRequest =
+  )(index: Index, id: String, n: Int, minScore: Double): SearchRequest =
     search(index)
       .query(query(id, index))
       .size(n)
+      .minScore(minScore)
 }
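
The signature change above works because `similarityRequest` has two parameter lists: fixing the first (the query constructor) leaves a function of the remaining four arguments. A minimal, self-contained sketch of that pattern follows. `Index`, `Query`, `SearchRequest` and `blendedQuery` here are simplified stand-ins invented for illustration, not the types or queries used in the pull request.

```scala
object CurriedRequestSketch {
  // Hypothetical stand-ins for the real index, query and request types.
  final case class Index(name: String)
  final case class Query(description: String)
  final case class SearchRequest(
    index: Index,
    query: Query,
    size: Int,
    minScore: Double
  )

  // Stand-in for ImageSimilarity.blended: builds a query from an image id.
  def blendedQuery(id: String, index: Index): Query =
    Query(s"blended similarity to $id in ${index.name}")

  // First parameter list picks the query; second carries per-request values.
  def similarityRequest(
    query: (String, Index) => Query
  )(index: Index, id: String, n: Int, minScore: Double): SearchRequest =
    SearchRequest(index, query(id, index), n, minScore)

  // Supplying only the first list yields a four-argument function.
  val requestWithBlendedSimilarity
    : (Index, String, Int, Double) => SearchRequest =
    similarityRequest(blendedQuery)
}
```

Because every public builder shares the one private method, adding `minScore` to the second parameter list is enough to thread the threshold through all three similarity metrics.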
@@ -42,7 +42,8 @@ class ImagesService(
   def retrieveSimilarImages(
     index: Index,
     imageId: String,
-    similarityMetric: SimilarityMetric = SimilarityMetric.Blended
+    similarityMetric: SimilarityMetric = SimilarityMetric.Blended,
+    minScore: Option[Double] = None
   ): Future[List[IndexedImage]] = {
     val builder = similarityMetric match {
       case SimilarityMetric.Blended =>
@@ -53,7 +54,18 @@ class ImagesService(
         requestBuilder.requestWithSimilarColors
     }

-    val searchRequest = builder(index, imageId, nVisuallySimilarImages)
+    // default minimum scores for each similarity metric determined using this notebook
+    // https://github.com/wellcomecollection/data-science/blob/47245826c70bf2d76c63d2c4b3ace6c824673784/notebooks/similarity_problems/notebooks/01-similarity-scores.ipynb
+    val defaultMinScore: Double = similarityMetric match {
+      case SimilarityMetric.Blended => 300
+      case SimilarityMetric.Features => 300
+      case SimilarityMetric.Colors => 20
+    }
Comment on lines +59 to +63
Contributor

Can you explain how you came up with these default scores? Why is Colors so much lower?

Contributor Author

Sure. There's a notebook in the data science repo which produced the results in wellcomecollection/platform#5581. Based on that analysis, we decided that ~300 seemed like an appropriate threshold for the blended similarity metric.

I re-ran that analysis with the state.inferredData.lshEncodedFeatures and state.inferredData.palette fields individually, and produced these corresponding graphs:

lshEncodedFeatures: [graph image]

palette: [graph image]

If my mental maths is right, the scores for colours are generally lower because the state.inferredData.palette field contains fewer, more commonly occurring terms.
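
The intuition in the comment above can be checked with a small calculation. This sketch assumes the similarity queries are scored with a BM25-style inverse document frequency, as Lucene's default similarity does. The pull request does not state the scoring function, and the document counts below are invented purely for illustration.

```scala
object IdfSketch {
  // BM25-style inverse document frequency, as in Lucene's BM25Similarity:
  //   idf = ln(1 + (N - n + 0.5) / (n + 0.5))
  // where N is the number of documents and n the number containing the term.
  def idf(docCount: Long, docFreq: Long): Double =
    math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5))

  // Invented figures: a term found in few documents vs one found in many.
  val totalDocs = 100000L
  val rareTermIdf = idf(totalDocs, docFreq = 50L)
  val commonTermIdf = idf(totalDocs, docFreq = 20000L)
}
```

Under that assumption a rare term contributes more to a document's score than a common one, so a field with fewer, more commonly occurring terms would tend to produce lower totals, which fits the lower threshold chosen for colours.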

Contributor

Ah, super. Maybe just include a comment pointing to that ticket, so we can find this again in future?

+
+    val minScoreValue: Double = minScore.getOrElse(defaultMinScore)
+
+    val searchRequest =
+      builder(index, imageId, nVisuallySimilarImages, minScoreValue)

     elasticsearchService
       .findBySearch(searchRequest)(decoder)
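
The fallback logic in this hunk can be exercised in isolation. The sketch below copies the default values from the diff. The `SimilarityMetric` definition and the `resolveMinScore` helper are assumptions made for illustration and may not match the real code.

```scala
object MinScoreDefaultsSketch {
  // Assumed shape of the metric type; the real definition is not in the diff.
  sealed trait SimilarityMetric
  object SimilarityMetric {
    case object Blended extends SimilarityMetric
    case object Features extends SimilarityMetric
    case object Colors extends SimilarityMetric
  }

  // Default thresholds, with values copied from the diff above.
  def defaultMinScore(metric: SimilarityMetric): Double = metric match {
    case SimilarityMetric.Blended  => 300
    case SimilarityMetric.Features => 300
    case SimilarityMetric.Colors   => 20
  }

  // Hypothetical helper: a caller-supplied threshold wins,
  // otherwise the per-metric default applies.
  def resolveMinScore(
    metric: SimilarityMetric,
    minScore: Option[Double] = None
  ): Double =
    minScore.getOrElse(defaultMinScore(metric))
}
```

Modelling the override as `Option[Double]` keeps an explicit `Some(0.0)` distinct from "no preference", so a caller can still switch the threshold off.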
@@ -1,284 +1,46 @@
 {
-  "id" : "a5lj9duc",
-  "locations" : [
+  "id": "a5lj9duc",
+  "locations": [
     {
-      "accessConditions" : [
-      ],
-      "license" : {
-        "id" : "cc-by",
-        "label" : "Attribution 4.0 International (CC BY 4.0)",
-        "type" : "License",
-        "url" : "http://creativecommons.org/licenses/by/4.0/"
-      },
-      "linkText" : "Link text: 4N3l8IuPN0",
-      "locationType" : {
-        "id" : "iiif-image",
-        "label" : "IIIF Image API",
-        "type" : "LocationType"
-      },
-      "type" : "DigitalLocation",
-      "url" : "https://iiif.wellcomecollection.org/image/RlZ.jpg/info.json"
+      "accessConditions": [],
+      "license": {
+        "id": "cc-by",
+        "label": "Attribution 4.0 International (CC BY 4.0)",
+        "type": "License",
+        "url": "http://creativecommons.org/licenses/by/4.0/"
+      },
+      "linkText": "Link text: 4N3l8IuPN0",
+      "locationType": {
+        "id": "iiif-image",
+        "label": "IIIF Image API",
+        "type": "LocationType"
+      },
+      "type": "DigitalLocation",
+      "url": "https://iiif.wellcomecollection.org/image/RlZ.jpg/info.json"
     }
   ],
-  "source" : {
-    "id" : "anys7vbv",
-    "title" : "title-vpbsaVsvTg",
-    "type" : "Work"
+  "source": {
+    "id": "anys7vbv",
+    "title": "title-vpbsaVsvTg",
+    "type": "Work"
   },
-  "thumbnail" : {
-    "accessConditions" : [
-    ],
-    "license" : {
-      "id" : "cc-by",
-      "label" : "Attribution 4.0 International (CC BY 4.0)",
-      "type" : "License",
-      "url" : "http://creativecommons.org/licenses/by/4.0/"
+  "thumbnail": {
+    "accessConditions": [],
+    "license": {
+      "id": "cc-by",
+      "label": "Attribution 4.0 International (CC BY 4.0)",
+      "type": "License",
+      "url": "http://creativecommons.org/licenses/by/4.0/"
     },
-    "linkText" : "Link text: 4N3l8IuPN0",
-    "locationType" : {
-      "id" : "iiif-image",
-      "label" : "IIIF Image API",
-      "type" : "LocationType"
+    "linkText": "Link text: 4N3l8IuPN0",
+    "locationType": {
+      "id": "iiif-image",
+      "label": "IIIF Image API",
+      "type": "LocationType"
     },
-    "type" : "DigitalLocation",
-    "url" : "https://iiif.wellcomecollection.org/image/RlZ.jpg/info.json"
+    "type": "DigitalLocation",
+    "url": "https://iiif.wellcomecollection.org/image/RlZ.jpg/info.json"
   },
-  "type" : "Image",
-  "visuallySimilar" : [
-    {
-      "id" : "dqlalauc",
-      "locations" : [
-        {
-          "accessConditions" : [
-          ],
-          "license" : {
-            "id" : "cc-by",
-            "label" : "Attribution 4.0 International (CC BY 4.0)",
-            "type" : "License",
-            "url" : "http://creativecommons.org/licenses/by/4.0/"
-          },
-          "locationType" : {
-            "id" : "iiif-image",
-            "label" : "IIIF Image API",
-            "type" : "LocationType"
-          },
-          "type" : "DigitalLocation",
-          "url" : "https://iiif.wellcomecollection.org/image/I2D.jpg/info.json"
-        }
-      ],
-      "source" : {
-        "id" : "j4l3wsjt",
-        "title" : "title-Q8uNY7ZOEu",
-        "type" : "Work"
-      },
-      "thumbnail" : {
-        "accessConditions" : [
-        ],
-        "license" : {
-          "id" : "cc-by",
-          "label" : "Attribution 4.0 International (CC BY 4.0)",
-          "type" : "License",
-          "url" : "http://creativecommons.org/licenses/by/4.0/"
-        },
-        "locationType" : {
-          "id" : "iiif-image",
-          "label" : "IIIF Image API",
-          "type" : "LocationType"
-        },
-        "type" : "DigitalLocation",
-        "url" : "https://iiif.wellcomecollection.org/image/I2D.jpg/info.json"
-      },
-      "type" : "Image"
-    },
-    {
-      "id" : "fp9x20si",
-      "locations" : [
-        {
-          "accessConditions" : [
-          ],
-          "credit" : "Credit line: 7SWk3TTSq0",
-          "license" : {
-            "id" : "cc-by",
-            "label" : "Attribution 4.0 International (CC BY 4.0)",
-            "type" : "License",
-            "url" : "http://creativecommons.org/licenses/by/4.0/"
-          },
-          "locationType" : {
-            "id" : "iiif-image",
-            "label" : "IIIF Image API",
-            "type" : "LocationType"
-          },
-          "type" : "DigitalLocation",
-          "url" : "https://iiif.wellcomecollection.org/image/WIk.jpg/info.json"
-        }
-      ],
-      "source" : {
-        "id" : "othidyxy",
-        "title" : "title-pEAVa4Rz7i",
-        "type" : "Work"
-      },
-      "thumbnail" : {
-        "accessConditions" : [
-        ],
-        "credit" : "Credit line: 7SWk3TTSq0",
-        "license" : {
-          "id" : "cc-by",
-          "label" : "Attribution 4.0 International (CC BY 4.0)",
-          "type" : "License",
-          "url" : "http://creativecommons.org/licenses/by/4.0/"
-        },
-        "locationType" : {
-          "id" : "iiif-image",
-          "label" : "IIIF Image API",
-          "type" : "LocationType"
-        },
-        "type" : "DigitalLocation",
-        "url" : "https://iiif.wellcomecollection.org/image/WIk.jpg/info.json"
-      },
-      "type" : "Image"
-    },
-    {
-      "id" : "fsq3gsq0",
-      "locations" : [
-        {
-          "accessConditions" : [
-          ],
-          "credit" : "Credit line: u1A2IYFqOO",
-          "license" : {
-            "id" : "cc-by",
-            "label" : "Attribution 4.0 International (CC BY 4.0)",
-            "type" : "License",
-            "url" : "http://creativecommons.org/licenses/by/4.0/"
-          },
-          "locationType" : {
-            "id" : "iiif-image",
-            "label" : "IIIF Image API",
-            "type" : "LocationType"
-          },
-          "type" : "DigitalLocation",
-          "url" : "https://iiif.wellcomecollection.org/image/T69.jpg/info.json"
-        }
-      ],
-      "source" : {
-        "id" : "80trgzaf",
-        "title" : "title-AOCEgK2yRE",
-        "type" : "Work"
-      },
-      "thumbnail" : {
-        "accessConditions" : [
-        ],
-        "credit" : "Credit line: u1A2IYFqOO",
-        "license" : {
-          "id" : "cc-by",
-          "label" : "Attribution 4.0 International (CC BY 4.0)",
-          "type" : "License",
-          "url" : "http://creativecommons.org/licenses/by/4.0/"
-        },
-        "locationType" : {
-          "id" : "iiif-image",
-          "label" : "IIIF Image API",
-          "type" : "LocationType"
-        },
-        "type" : "DigitalLocation",
-        "url" : "https://iiif.wellcomecollection.org/image/T69.jpg/info.json"
-      },
-      "type" : "Image"
-    },
-    {
-      "id" : "vcflqowx",
-      "locations" : [
-        {
-          "accessConditions" : [
-          ],
-          "credit" : "Credit line: iMzNq5f99",
-          "license" : {
-            "id" : "cc-by",
-            "label" : "Attribution 4.0 International (CC BY 4.0)",
-            "type" : "License",
-            "url" : "http://creativecommons.org/licenses/by/4.0/"
-          },
-          "linkText" : "Link text: Z6HmsHmLL",
-          "locationType" : {
-            "id" : "iiif-image",
-            "label" : "IIIF Image API",
-            "type" : "LocationType"
-          },
-          "type" : "DigitalLocation",
-          "url" : "https://iiif.wellcomecollection.org/image/yw2.jpg/info.json"
-        }
-      ],
-      "source" : {
-        "id" : "ov8cs7dy",
-        "title" : "title-U90KOXEeRV",
-        "type" : "Work"
-      },
-      "thumbnail" : {
-        "accessConditions" : [
-        ],
-        "credit" : "Credit line: iMzNq5f99",
-        "license" : {
-          "id" : "cc-by",
-          "label" : "Attribution 4.0 International (CC BY 4.0)",
-          "type" : "License",
-          "url" : "http://creativecommons.org/licenses/by/4.0/"
-        },
-        "linkText" : "Link text: Z6HmsHmLL",
-        "locationType" : {
-          "id" : "iiif-image",
-          "label" : "IIIF Image API",
-          "type" : "LocationType"
-        },
-        "type" : "DigitalLocation",
-        "url" : "https://iiif.wellcomecollection.org/image/yw2.jpg/info.json"
-      },
-      "type" : "Image"
-    },
-    {
-      "id" : "wfdcghky",
-      "locations" : [
-        {
-          "accessConditions" : [
-          ],
-          "credit" : "Credit line: daywHchuw",
-          "license" : {
-            "id" : "cc-by",
-            "label" : "Attribution 4.0 International (CC BY 4.0)",
-            "type" : "License",
-            "url" : "http://creativecommons.org/licenses/by/4.0/"
-          },
-          "locationType" : {
-            "id" : "iiif-image",
-            "label" : "IIIF Image API",
-            "type" : "LocationType"
-          },
-          "type" : "DigitalLocation",
-          "url" : "https://iiif.wellcomecollection.org/image/lHZ.jpg/info.json"
-        }
-      ],
-      "source" : {
-        "id" : "hs3nzqux",
-        "title" : "title-lPUpm8UfHO",
-        "type" : "Work"
-      },
-      "thumbnail" : {
-        "accessConditions" : [
-        ],
-        "credit" : "Credit line: daywHchuw",
-        "license" : {
-          "id" : "cc-by",
-          "label" : "Attribution 4.0 International (CC BY 4.0)",
-          "type" : "License",
-          "url" : "http://creativecommons.org/licenses/by/4.0/"
-        },
-        "locationType" : {
-          "id" : "iiif-image",
-          "label" : "IIIF Image API",
-          "type" : "LocationType"
-        },
-        "type" : "DigitalLocation",
-        "url" : "https://iiif.wellcomecollection.org/image/lHZ.jpg/info.json"
-      },
-      "type" : "Image"
-    }
-  ]
+  "type": "Image",
+  "visuallySimilar": []
 }
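
The updated fixture returns an empty `visuallySimilar` list. One plausible reading, not confirmed in the pull request, is that none of the generated test images score above the new threshold. Elasticsearch's `min_score` excludes hits whose score falls below the given value, and the sketch below mimics that behaviour in memory. The `Hit` type, the `topHits` helper and the scores are invented for illustration. Only the ids are borrowed from the fixture.

```scala
object MinScoreFilterSketch {
  // Hypothetical scored hit; real hits come back from Elasticsearch.
  final case class Hit(id: String, score: Double)

  // Mimics applying min_score, ordering by descending score,
  // then keeping at most n results.
  def topHits(hits: List[Hit], n: Int, minScore: Double): List[Hit] =
    hits
      .filter(_.score >= minScore)
      .sortBy(hit => -hit.score)
      .take(n)

  // Invented scores attached to ids taken from the old fixture.
  val hits: List[Hit] = List(
    Hit("dqlalauc", 12.5),
    Hit("fp9x20si", 310.0),
    Hit("fsq3gsq0", 8.0)
  )
}
```

With these made-up scores, a threshold of 300 keeps a single hit, and any threshold above 310 leaves the list empty even though `n` would allow more results.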