
Add warning to search response when source parameter has mixed validity #4031

Merged
merged 3 commits into from
Apr 8, 2024

Conversation

sarayourfriend
Collaborator

Fixes

Fixes #4030 by @sarayourfriend
Closes #3895 by @obulat

Description

This PR:

  • Removes the logging we added in "Log when source query parameter contains invalid values" (#3945) for investigating the potential solutions in the linked discussion
  • Adds a "warnings" on the search response when the request has mixed validity in the source parameter
  • Raises a validation error when none of the sources are valid. See the linked discussion for @obulat's reasoning for why this is a necessary and acceptable breakage. I've left a comment in the code explaining it as well.
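The rules above can be illustrated with a minimal sketch. It is plain Python so it runs without Django. The function name validate_sources, the stand-in ValidationError and the warning field names are hypothetical and are not taken from the Openverse code. In a DRF serializer the error would presumably be a serializers.ValidationError, which DRF turns into a 400 response.

```python
from __future__ import annotations


class ValidationError(Exception):
    """Hypothetical stand-in for a framework validation error that maps to HTTP 400."""


def validate_sources(requested: str, available: set[str]) -> tuple[list[str], list[dict]]:
    """Split a comma-separated source parameter into valid and invalid names.

    Returns the valid sources (sorted) and a possibly empty list of warnings.
    Raises ValidationError when sources were requested but none are valid.
    """
    requested_set = {s.strip() for s in requested.split(",") if s.strip()}
    valid = sorted(requested_set & available)
    invalid = sorted(requested_set - available)

    if requested_set and not valid:
        # Nothing usable is left, so reject the request outright.
        raise ValidationError(f"Invalid source parameter: {', '.join(invalid)}")

    warnings = []
    if invalid:
        # Mixed validity: search the valid sources, but tell the caller what was dropped.
        warnings.append(
            {
                "code": "partially invalid source parameter",
                "message": "The source parameter was partially invalid.",
                "invalid_sources": invalid,
                "valid_sources": valid,
            }
        )
    return valid, warnings
```

For example, validate_sources("flickr,foo", {"flickr", "wikimedia"}) keeps ["flickr"] and returns a single warning that lists "foo" as invalid. A parameter made up only of unknown sources raises an error instead.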

I am happy with the PR in its current state, but I went back and forth about whether the new warnings key should default to an empty list or be omitted entirely when there are no warnings. I went with the empty list because it's the simplest and most consistent, but I could easily argue that omitting it is also totally reasonable. I could also see wanting to prefix the key (as _warnings) or nest it in some meta object on the response. I'm open to any suggestions on this.

I originally started with a more complex idea for this PR: a middleware that would add warnings to the responses based on a list of warnings set on the request context. That would let any endpoint (theoretically) set a warning on any request, as long as it had some way of accessing the request. This was inspired by Django's messages framework, which is used to flash warnings in rendered HTML pages. DRF does not have an equivalent, and existing libraries for it do something totally different than what I wanted. All of that is way more complex than we need for this specific issue, though, so I chose a more direct approach. If we find other use cases for the warnings, we can evaluate whether a more generic solution is appropriate.
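For comparison, here is a rough, framework-agnostic sketch of the more generic idea that was set aside. Anything holding the request can queue warnings on it, and a wrapper merges them into the response body. Request, add_warning and with_warnings are hypothetical names. They are not Django or DRF API, and this is not what the PR ships.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Request:
    """Hypothetical minimal stand-in for a framework request object."""

    params: dict
    warnings: list = field(default_factory=list)


def add_warning(request: Request, warning: dict) -> None:
    """Queue a warning; any code with access to the request can call this."""
    request.warnings.append(warning)


def with_warnings(view: Callable[[Request], dict]) -> Callable[[Request], dict]:
    """Wrap a view so that queued warnings are merged into its response body."""

    def wrapped(request: Request) -> dict:
        body = view(request)
        if request.warnings:
            return {"warnings": list(request.warnings), **body}
        return body

    return wrapped
```

The direct approach avoids this extra layer: the search view builds its own warnings, so no shared per-request state or wrapper is needed.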

Testing Instructions

Evaluate the changes and confirm the tests sufficiently cover the new cases. I've added integration tests rather than testing at the serializer level. Three places need to work together (i.e., be "integrated") for this to work, so unit testing just the serializer would be insufficient and would duplicate any meaningful testing at the integration level.

Run the application locally using just api/up and visit the search endpoint. Evaluate the following scenarios:

  • No parameters: empty warnings list
  • Only valid parameters: empty warnings list
  • Mixed validity of the source parameter: the new warning
  • Only bad parameters: 400 response
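The four scenarios can also be checked from a script that uses only the standard library. This sketch assumes the following:

  • The API is reachable at http://localhost:50280/v1/images/ (the port that appears in a later comment).
  • flickr is a valid source and foo is not.

Adjust BASE_URL and the values for your setup.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumed local address of the image search endpoint.
BASE_URL = "http://localhost:50280/v1/images/"

# Query parameters for each scenario; "foo" is assumed not to be a real source.
SCENARIOS = {
    "no parameters": {},
    "only valid parameters": {"source": "flickr"},
    "mixed validity": {"source": "flickr,foo"},
    "only bad parameters": {"source": "foo"},
}


def scenario_url(params: dict) -> str:
    """Build the request URL, leaving off the "?" when there are no params."""
    query = urllib.parse.urlencode(params)
    return f"{BASE_URL}?{query}" if query else BASE_URL


def fetch(url: str) -> tuple:
    """Return (status, parsed body); HTTP error responses such as 400 are returned, not raised.

    Assumes error responses also have a JSON body.
    """
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, json.load(err)


def run_all() -> None:
    """Request every scenario and print its status and warnings (needs a running API)."""
    for name, params in SCENARIOS.items():
        status, body = fetch(scenario_url(params))
        warnings = body.get("warnings") if isinstance(body, dict) else None
        print(f"{name}: HTTP {status}, warnings={warnings}")
```

With the API running locally, call run_all() and compare each line against the expected outcome listed above.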

Checklist

  • My pull request has a descriptive title (not a vague title like "Update index.md").
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • [N/A] I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@sarayourfriend sarayourfriend requested a review from a team as a code owner April 4, 2024 01:00
@sarayourfriend sarayourfriend requested review from krysal and stacimc April 4, 2024 01:00
@openverse-bot openverse-bot added 🟧 priority: high Stalls work on the project or its dependents ✨ goal: improvement Improvement to an existing user-facing feature 🕹 aspect: interface Concerns end-users' experience with the software 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work labels Apr 4, 2024
@github-actions github-actions bot added the 🧱 stack: api Related to the Django API label Apr 4, 2024
Member

dhruvkb left a comment


LGTM! I have strong feelings about this one comment (the 2nd one below) and would definitely like to see that fixed but it felt wrong to block a PR that works over a change in the tests.

"code": "partially invalid source parameter",
"message": "The source parameter was partially invalid.",
"invalid_sources": invalid_sources,
"referenced_sources": valid_sources,
Member


The name "valid" feels clearer to me than "referenced".

Suggested change
- "referenced_sources": valid_sources,
+ "valid_sources": valid_sources,

Collaborator Author


I found juggling the list of valid and available sources confusing. Usually when I see an error or warning about invalid values, the "valid" values it lists are all the possible valid values, if that makes sense? I'm torn, so I'll wait to see what the other reviewer says and change it if they want it changed as well, if that's okay.

Member

Potentially "valid", "invalid", and "available"/"all". But yeah, let's wait for one more review to see what they think.

Collaborator


"referenced" did not feel clear to me either, but I see what Sara's saying as well.

Catching up on the linked discussions, I saw Dhruv mention that this seems to be the most common way of handling cases where the input is partially acceptable and can still generate a valid response. Are you referencing something in particular you can link to? I'm not familiar with this type of response.

Along those lines, is the shape of this response following some established convention, or could it be changed? For example, do we need to explicitly list which of the provided sources were valid at all, or can we just have invalid_sources and available_sources? Or could this information all be spelled out in the "message" instead of in separate named fields?

Collaborator Author


Or could this information all be spelled out in the "message" instead of in separate named fields?

I actually originally had the warnings just be a list of strings, but I found it hard to do a meaningful test without just duplicating the string almost word-for-word in the test case 😰 On top of that, because of using sets instead of lists, the order of sources in the strings was non-deterministic, making it even harder to test against a simple string.
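Structured fields make order-insensitive assertions easy. The helpers below are a hypothetical sketch, not the PR's actual tests. They compare the source lists as sets, so the server-side set ordering (which can vary between runs) cannot make a test flaky. Calling body.get("warnings", []) also tolerates the key being absent. The field names are assumed.

```python
def find_warning(body: dict, code: str) -> dict:
    """Return the first warning with the given code, or fail the test."""
    for warning in body.get("warnings", []):
        if warning.get("code") == code:
            return warning
    raise AssertionError(f"no warning with code {code!r} in response")


def assert_partial_source_warning(body: dict, invalid: list, valid: list) -> None:
    """Check the partial-source warning without depending on list order."""
    warning = find_warning(body, "partially invalid source parameter")
    # Compare as sets: the API may list sources in any order.
    assert set(warning["invalid_sources"]) == set(invalid)
    assert set(warning["valid_sources"]) == set(valid)
```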

We can use whatever format we want here. I've noted in the documentation for the response field that it is meant to be human-readable rather than machine-readable, and that the contents of each dict are not stable.

Maybe discarded_sources, kept_sources, and available_sources? 🤷 Whatever folks want here, I'm happy to change it. I'm not attached to any specific wording, even if I found something or other personally confusing. Any of these should get across the idea that something isn't right about the parameter on the request and that the developer needs to take a closer look at it.

Which also makes me wonder whether the warnings should go first in the JSON, rather than at the end? On a page of 20 results, I don't know whether it's easier to miss at the front or end of the document.

Member


Are you referencing something in particular you can link to?

I didn't keep a record of my search when looking for a good pattern but I went through my browser history and found these references.

To be clear, this is not an established convention. It's the simplest, backwards compatible way I could think of to stick with a 200 OK status code but also convey problems in their input to the user.

Collaborator


Gotcha! And thanks for the links -- I wanted to make sure I wasn't suggesting deviating from some widely accepted pattern :)

I actually originally had the warnings just be a list of strings, but I found it hard to do a meaningful test without just duplicating the string almost word-for-word in the test case 😰 On top of that, because of using sets instead of lists, the order of sources in the strings was non-deterministic, making it even harder to test against a simple string.

Dang, that makes sense. One final suggestion -- what if we moved just the link to the available sources into the message? So the warning could be something like:

        {
            "code": "partially invalid source parameter",
            "message": "The source parameter was partially invalid. For a list of available sources, see http://localhost:50280/v1/images/stats",
            "invalid_sources": [
                "foo"
            ],
            "valid_sources": [
                "flickr"
            ]
        }

I think that would avoid the testing problem while making the warning a little clearer.
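Staci's suggestion could be built roughly as follows. The sketch assumes the stats URL follows the /v1/<media type>/stats pattern in the example above. stats_url and partial_source_warning are hypothetical helpers. Sorting the sets produces lists with a stable order, so a test can compare the whole dict at once.

```python
def stats_url(base_url: str, media_type: str) -> str:
    """Build the stats endpoint URL, e.g. .../v1/images/stats (assumed pattern)."""
    return f"{base_url.rstrip('/')}/v1/{media_type}/stats"


def partial_source_warning(invalid: set, valid: set, stats: str) -> dict:
    """Build the warning dict with the link to available sources in the message."""
    return {
        "code": "partially invalid source parameter",
        "message": (
            "The source parameter was partially invalid. "
            f"For a list of available sources, see {stats}"
        ),
        # sorted() turns the sets into lists with a deterministic order.
        "invalid_sources": sorted(invalid),
        "valid_sources": sorted(valid),
    }
```

Calling partial_source_warning({"foo"}, {"flickr"}, stats_url("http://localhost:50280", "images")) reproduces the JSON example above exactly.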

Which also makes me wonder whether the warnings should go first in the JSON, rather than at the end? On a page of 20 results, I don't know whether it's easier to miss at the front or end of the document.

+1 for putting it first in the JSON, now you mention it.

Review thread on api/test/integration/test_media_integration.py (outdated, resolved)
@obulat obulat removed the 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work label Apr 5, 2024
Collaborator

stacimc left a comment


This all tests well for me, approved! I did +1 to your suggestion to move the warnings up in the JSON and had one more suggestion for the names, but not a blocker -- up to you :)

@sarayourfriend
Collaborator Author

I've moved the warning to the top, made it only appear when there is actually a warning (otherwise it's kind of ominous looking on requests that don't have issues, plus it's a few bytes over the wire that we can save on those responses). I also updated the warning dict to match Staci's suggestion.
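The final behaviour can be sketched as follows: warnings come first, and the key is omitted entirely when there is nothing to report. build_search_response and the shape of page are assumptions, not the PR's code. The ordering relies on Python dicts preserving insertion order (guaranteed since 3.7). json.dumps follows that order when serializing unless sort_keys is set.

```python
from __future__ import annotations


def build_search_response(page: dict, warnings: list[dict]) -> dict:
    """Return the response body with warnings first, or with no warnings key at all.

    `page` is a hypothetical stand-in for the paginated search payload.
    """
    if not warnings:
        # Healthy requests get no key: nothing ominous, and fewer bytes on the wire.
        return dict(page)
    # "warnings" is inserted first, so it is also serialized first.
    return {"warnings": warnings, **page}
```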

@sarayourfriend sarayourfriend force-pushed the remove/invalid-source-logger-warning branch from d6f14eb to 861e946 Compare April 8, 2024 01:45
@sarayourfriend sarayourfriend force-pushed the remove/invalid-source-logger-warning branch from 861e946 to 4eea42f Compare April 8, 2024 01:58
@sarayourfriend sarayourfriend merged commit e0e0e27 into main Apr 8, 2024
41 checks passed
@sarayourfriend sarayourfriend deleted the remove/invalid-source-logger-warning branch April 8, 2024 02:30
Labels
🕹 aspect: interface Concerns end-users' experience with the software ✨ goal: improvement Improvement to an existing user-facing feature 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: api Related to the Django API
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Improve invalid source parameter handling
5 participants