Skip to content
This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

attribution is null in search results view #1036

Closed
1 task
sarayourfriend opened this issue Dec 8, 2022 · 6 comments · Fixed by #1040
Closed
1 task

attribution is null in search results view #1036

sarayourfriend opened this issue Dec 8, 2022 · 6 comments · Fixed by #1040
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟨 priority: medium Not blocking but should be addressed soon

Comments

@sarayourfriend
Copy link
Contributor

sarayourfriend commented Dec 8, 2022

Description

Example where it is not null: https://api.openverse.engineering/v1/images/0aff3595-8168-440b-83ff-7a80b65dea42/
Example where it is null: https://api.openverse.engineering/v1/images/?q=dog

See any of the results and attribution will be null for them. If you go to the detail view for the result, it will be populated.

See this comment for an explanation of why and a potential fix: #1036 (comment)

Reproduction

  1. Go to a search results list view and see attribution is null.

Additional context

Discovered by @zackkrida and @krysal in Make Slack here: https://wordpress.slack.com/archives/C02012JB00N/p1670442907182319 (Requires a free account from https://chat.wordpress.org to access).

Resolution

  • 🙋 I would be interested in resolving this bug.
@sarayourfriend sarayourfriend added 🟨 priority: medium Not blocking but should be addressed soon 🛠 goal: fix Bug fix 💻 aspect: code Concerns the software code in the repository labels Dec 8, 2022
@dhruvkb
Copy link
Member

dhruvkb commented Dec 9, 2022

@sarayourfriend I checked and both have the attribution field populated.

""A cute Dog 4" by Chen Vision is licensed under CC BY-NC 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc/2.0/."

""File:Open book 01.svg" by Unknown author is marked with CC0 1.0. To view the terms, visit https://creativecommons.org/publicdomain/zero/1.0/deed.en."

We need a different example to reproduce.

@dhruvkb dhruvkb added the 🧹 status: ticket work required Needs more details before it can be worked on label Dec 9, 2022
@sarayourfriend
Copy link
Contributor Author

Oh interesting! It looks like attribution is only null on the search results view but not the detail view:

https://api.openverse.engineering/v1/images/?q=dog

The null attribution result I just copied the detail link without actually checking it, my bad!

It looks like the reason for this is because in the detail view we're passing the media model to the serializer, whereas for the results view, we're passing the ES results. Because attribution isn't defined on the ES result, it is null and never calls into the property on the model (because of course it isn't aware of the model at that point).

To solve this we could add attribution as a serializer method field and then call get_attribution_text with the correct params whenever we have an ES Hit, otherwise call to the object's attribution property.

Do you think that would be an appropriate solution, @dhruvkb?

@sarayourfriend sarayourfriend changed the title attribution is sometimes null attribution is null in search results view Dec 11, 2022
@sarayourfriend sarayourfriend removed the 🧹 status: ticket work required Needs more details before it can be worked on label Dec 11, 2022
@dhruvkb
Copy link
Member

dhruvkb commented Dec 12, 2022

@sarayourfriend that makes total sense, we already do something like this for the peaks field in the AudioSerializer class.

peaks = serializers.SerializerMethodField(

def get_peaks(self, obj) -> list[int]:

If the obj being serialized is a Hit, we get the actual Audio instance and return the function/property value from the model object.

To be fair, I can rationalize the attribution being missing on the search results as a way to reduce our DB queries and speed up the results. If we get the media model instance for each Hit, we'll have a lot of individual SELECT queries.

@sarayourfriend
Copy link
Contributor Author

If we get the media model instance for each Hit, we'll have a lot of individual SELECT queries.

Is this true even if we do the query for the DB data before sending the results to the serializer? If we need the API DB data for each result a WHERE identifier IN (...) would get all the results for the page in a single query. This would need to happen before we call to the serializer to avoid the n queries that happen when we add queries to single-object serializers. Might as well just use the media objects directly, in that case, rather than the Hit. It's nice to avoid this though given querying the Postgres DB includes the query execution + marshalling into the list of Python objects.

Luckily, I think all the things we need to build the attribution are present in the index anyway: title, creator, license, license_version and even license_url. Hopefully we can avoid even needing to add the single "where in" query in the view. Am I missing a piece here that would preclude using the Hit to build the attribution?

@sarayourfriend
Copy link
Contributor Author

For the audio one, by the way, would we be able to do as I suggested in the immediately previous comment? That is, lift the query for the full list of media objects into the view instead of serializing from the Hits? This would reduce the query count for the view from n to 1 (unless, as I said above, there's a reason this isn't possible that I'm missing).

@obulat
Copy link
Contributor

obulat commented Dec 12, 2022

I seem to remember some work you've done on ES improvements, @dhruvkb, included adding all of the model fields to the ES Hit results. So, even though only some of the fields are used by the ES for search indexing, all of the fields (even the non-indexed ones) will be available in the search results. This way we wouldn't have to make a new request for additional information.

This is the specific place where I've got this from:

changes the ES mapping mode from dynamic to strict false (this means unmapped fields are stored but not indexed this prevents adding documents with fields other than those defined in the mapping schema)
removes unnecessary schema mappings, changing them to purely stored fields

Could that solve the issue with attribution?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟨 priority: medium Not blocking but should be addressed soon
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants