Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the Additional search views IP #3868

Merged
merged 6 commits into from
Mar 13, 2024
Merged

Update the Additional search views IP #3868

merged 6 commits into from
Mar 13, 2024

Conversation

obulat
Copy link
Contributor

@obulat obulat commented Mar 4, 2024

Fixes

Updates the Implementation Plan in line with the discussion conclusions

Description

This PR updates the IP to move from path parameters to the query parameters. Also, SEO-related changes were added, too.

Checklist

  • My pull request has a descriptive title (not a vague title likeUpdate index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@obulat obulat added 🟧 priority: high Stalls work on the project or its dependents 🌟 goal: addition Addition of new feature 💻 aspect: code Concerns the software code in the repository 🧱 stack: api Related to the Django API 🧱 stack: frontend Related to the Nuxt frontend 🧭 project: implementation plan An implementation plan for a project labels Mar 4, 2024
@obulat obulat self-assigned this Mar 4, 2024
@obulat obulat requested a review from a team as a code owner March 4, 2024 08:09
@obulat obulat requested review from krysal and sarayourfriend March 4, 2024 08:09
Copy link

github-actions bot commented Mar 4, 2024

Comment on lines +229 to +282
The generic Openverse thumbnail will be used. We could also generate a thumbnail
for the collection pages in the future, but this is not in scope for this
project.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to keeping the scope constrained.

Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some feedback for places where some small changes for clarity would be useful. Thank you for doing this! It can be a chore to document changes as they're being made but this has clarified my own understanding of the work needed here.

@obulat obulat marked this pull request as draft March 6, 2024 17:12
Copy link
Member

@krysal krysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaving a few initial comments even though it's in draft. I'll be digesting the API Search controller changes. Things have changed a lot since the last time I looked at that part of the code.

@obulat obulat marked this pull request as ready for review March 7, 2024 16:55
@obulat obulat requested review from zackkrida and krysal March 7, 2024 16:59
Copy link
Member

@zackkrida zackkrida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Collaborator

@sarayourfriend sarayourfriend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one blocking comment, which is to request clarification and deeper explanation of the implementation details for the validation of the query parameters. The current approach, as described, I am fairly confident will create similar problems as we had with path parameters. I could be misunderstanding the intention, so I am not explicitly requesting a specific change, just either clarification of the proposed approach or a change to address the potential problems I've mentioned.

Comment on lines 160 to 166
The existing `source` and `creator` parameters will be reused, but will be
parsed differently when `collection` parameter is present: they will only allow
a single value instead of being split by `,` as it is for the default search. If
the `source` contains invalid values, such as `source=flickr,europeana`, when
the `collection` parameter is present, a 400 error with `detail` of "Invalid
source parameter for source collection: `flickr,europeana`" is returned. If
possible, a list of valid sources should be returned in the error message.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify the actual implementation details of this? Will it use new validate_ methods on the serializer, and then for each one check if collection is present, and if so, what? Will it throw an error any time a comma is present?

I don't think we can realiably validate this for anything other than source. For all collection parameters, they are passed as-is to the ES query: https://github.com//WordPress/openverse/blob/f56f7310e81ac0266807192803e4a451a8983766/api/api/controllers/search_controller.py#L399-L410

They already do not support multiple values.

Which, again, we need to keep in mind that we cannot reliably split on any character for creator or tag names, neither of which are sanitised to remove commas. What would the validator do if a Flickr creator's name was Photographer, Esq."? What if the tag is "Taipei, Taiwan"? Will these throw errors? How would a user resolve them? By URL encoding the commas? I'm not sure that will work, they may need to double encode them. Normally, you can pass multiple values for a query parameter by using param=1,param=2`:

> p = new URLSearchParams()
URLSearchParams {}
> p.append("param", "1")
undefined
> p.append("param", "2")
undefined
> p.toString()
'param=1&param=2'

Some frameworks require adding [] to the front of explicit lists.

We, instead, have chosen to use a comma, which creates significant complication for things like creator. Luckily, we already do not treat creator on the search endpoint as a "listable" input. It is passed as-is in the normal search strategy:

https://github.com//WordPress/openverse/blob/f56f7310e81ac0266807192803e4a451a8983766/api/api/controllers/search_controller.py#L335-L349

I'm highly sceptical we can reliably validate these inputs, and should instead pass them directly to ES regardless of what is in them. We could validate source to check that it exists, so that flickr,wikimedia_commons would 404 if passed as source with collection because there is no flickr,wikimedia_commons source. But trying to validate and prevent multiple inputs on tag and creator looks to me like it will create similar issues that we had with the path parameters.

Copy link
Member

@zackkrida zackkrida Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked a similar question and Olga shared that the intention was to only validate the source, yes.

#3868 (comment)

Good call for more specifics here though; I overlooked that this still wasn't absolutely clear in my re-review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add more clarification in the IP itself, but will also add a short explanation here.

For collection view, only two validation steps will be used:

  1. if collection parameter is present, we check that the relevant parameters are also present:
  • if collection=source, there is the source value in the query
  • if collection=creator, there are creator and source values in the query
  • if collection=tag, there is the tag value in the query
  1. if collection is source or creator, we check that the value of the source parameter, as is, is present in the list of available creators for the media type. So, flickr or stocksnap will be valid, but flick, Flickr, flickr,stocksnap will throw an error. We will not split this values by comma or lower-case them. And since we control the creator names, it should be okay.

As for the tag and creator parameters, they will be passed to the search controller "as is".

By the way, while trying this out, I realized that our current source validation is not very clear. We take a string of the source value, split it by comma, leave all of the values that are names of sources and ignore invalid values. So, https://api.openverse.engineering/v1/images/?source=flick ignores the source parameter and returns all of the images. This might be very confusing if the user has made a spelling mistake.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds, good, Olga. Thanks for clarifying. It would resolve the issue for me if you update this paragraph to clarify that only the source parameter would ever be validated in any capacity. As it reads now, the creator value would also be validated, but I think that's an artefact of trying to clarify the difference between the normal search route's handling of these two, and the creator parameter getting sidelined a bit.

By the way, while trying this out, I realized that our current source validation is not very clear. We take a string of the source value, split it by comma, leave all of the values that are names of sources and ignore invalid values. So, https://api.openverse.engineering/v1/images/?source=flick ignores the source parameter and returns all of the images. This might be very confusing if the user has made a spelling mistake.

That sounds like a good new issue, maybe even a "help wanted" one!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sarayourfriend, I opened a discussion instead of an issue for this because I'm not sure what the best solution is. #3895

@obulat obulat marked this pull request as draft March 11, 2024 14:58
@obulat obulat marked this pull request as ready for review March 11, 2024 15:36
@obulat
Copy link
Contributor Author

obulat commented Mar 11, 2024

I updated the serializer and validator descriptions, so this PR is ready for re-review, @sarayourfriend.
There's also a draft PR implementing the API changes described here: #3887

@sarayourfriend sarayourfriend dismissed their stale review March 12, 2024 00:47

Changes addressed, not an official reviewer so deferring to @krysal

Copy link
Member

@krysal krysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that there is a certain consensus. Now that I have read the API part more carefully, I have a couple of doubts that I would like to clarify before approving.

obulat and others added 6 commits March 13, 2024 08:03
…719-implementation_plan_additional_search_views.md

Co-authored-by: zack <[email protected]>
…719-implementation_plan_additional_search_views.md

Co-authored-by: Krystle Salazar <[email protected]>
Copy link
Member

@krysal krysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The corrections make it a lot clearer. Thanks for re-writing this @obulat! LGTM.

@obulat obulat merged commit 4ab217b into main Mar 13, 2024
41 checks passed
@obulat obulat deleted the update/ip_asv branch March 13, 2024 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🌟 goal: addition Addition of new feature 🟧 priority: high Stalls work on the project or its dependents 🧭 project: implementation plan An implementation plan for a project 🧱 stack: api Related to the Django API 🧱 stack: frontend Related to the Nuxt frontend
Projects
Status: Accepted
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants