Thanks for the PR @PrajwalBorkar! Welcome to Openverse as well 🎉 If my understanding of this is correct, I think we can also get rid of the for loop: we don't need to build an array of queries anymore, just the single "terms" query. @dhruvkb can hopefully confirm this though. I'm not very familiar with this.
Yes, @sarayourfriend is correct. Changing from `term` to `terms` removes the need to build an array and we can directly specify multiple queries.
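To make the difference concrete, here is a minimal sketch of the two query shapes, written as plain Python dicts rather than the project's elasticsearch-dsl objects. The helper names `term_clauses` and `terms_clause` are hypothetical and exist only for illustration; the `provider` values come from the example used later in this thread.

```python
from typing import Dict, List


def term_clauses(field: str, values: List[str]) -> List[Dict]:
    """Build one `term` clause per value (the loop-based approach)."""
    return [{"term": {field: value}} for value in values]


def terms_clause(field: str, values: List[str]) -> Dict:
    """Build a single `terms` clause that carries every value at once."""
    return {"terms": {field: list(values)}}


providers = ["flickr", "nasa"]
many = term_clauses("provider", providers)   # a list with two clauses
single = terms_clause("provider", providers)  # one clause, two values
```

With the single `terms` clause there is nothing left to accumulate, which is why the loop and the intermediate list can go away.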
I'm not familiar with the code and functions, so it is difficult to understand. I made the changes as per my knowledge; please review.
No worries. I've broken down the code in question and I'll try to walk through the discrete parts for you and explain why they need to change. Hopefully this helps. If you have any questions about it or still don't understand, please do not hesitate to ask.

def _apply_filter(
    s: Search,
    search_params: MediaSearchRequestSerializer,
    serializer_field: str,
    es_field: Optional[str] = None,
    behaviour: Literal["filter", "exclude"] = "filter",
):
    if serializer_field in search_params.data:
        filters = []
        for arg in search_params.data[serializer_field].split(","):
            _param = es_field or serializer_field
            args = {"name_or_query": "term", _param: arg}
            filters.append(Q(**args))
        method = getattr(s, behaviour)
        return method("bool", should=filters)
    else:
        return s

Let's start with the parameters.
method = getattr(s, behaviour)
return method("bool", should=filters)
Okay, now that we have an explanation of the parameters, let's look at how we use them. The first thing we check is whether `serializer_field` is present in `search_params.data`. For example:

{
    "provider": "flickr,nasa"
}

Notice that if the requester does not pass a particular parameter, it will not be present in the serializer's `data`. The same check can also be written as an early return:

if serializer_field not in search_params.data:
    return s
# otherwise apply the filter

This is a matter of taste/preference though. I share this just to give a different perspective on the code in case it helps you to understand it better.

Once we have established that we do have data for the serializer field in question, we then build a list of filters to eventually pass to Elasticsearch. For the current behavior, we first define an empty `filters` list and then add one entry to it for each comma-separated value, as in the `provider` example above.

This is where the issue's requested changes come in. In our current approach, we create a single `term` query for each value. Instead of:

query = [
    { "type": "term", "provider": "flickr" },
    { "type": "term", "provider": "nasa" },
]

we can do the following:

query = { "type": "terms", "provider": "flickr,nasa" }

Much cleaner! When we consider that we can have dozens of values to iterate over, it becomes much clearer to use the `terms` query.

So, in the end, I think the code that needs to change goes something like this:

diff --git a/api/catalog/api/controllers/search_controller.py b/api/catalog/api/controllers/search_controller.py
index 0a9fb980..0a3fa8a5 100644
--- a/api/catalog/api/controllers/search_controller.py
+++ b/api/catalog/api/controllers/search_controller.py
@@ -168,17 +168,15 @@ def _apply_filter(
     :return: the input ``Search`` object with the filters applied
     """
-    if serializer_field in search_params.data:
-        filters = []
-        for arg in search_params.data[serializer_field].split(","):
-            _param = es_field or serializer_field
-            args = {"name_or_query": "term", _param: arg}
-            filters.append(Q(**args))
-        method = getattr(s, behaviour)
-        return method("bool", should=filters)
-    else:
+    arguments = search_params.data.get(serializer_field)
+    if arguments is None:
         return s
+    parameter = es_field or serializer_field
+    query = Q("terms", **{parameter: arguments})
+    method = getattr(s, behaviour)
+    return method("bool", should=query)
+
 def _exclude_filtered(s: Search):
     """

Note: I'm not sure if this is the exact final code. I haven't tested this and I'm not personally familiar enough with Elasticsearch to confirm at a glance if this is correct. However, it's a good starting point. If you apply these changes locally and run the application, see if you can read the logs and debug any potential issues that come up. I would start this process by sending some queries to your local API that use both single and multiple values.
Thanks a ton for the great explanation. I made the changes accordingly; please review and test them :)
To test this PR, you can use the `license` query parameter like so:
http://localhost:8000/v1/images/?q=cat&license=by-sa,cc0
It should return all the items that have CC BY-SA licenses, as well as the CC0 ones. In this PR, it returns
"detail": "RequestError(400, 'x_content_parse_exception', '[terms] query does not support [license.keyword]')"
I'm not sure what needs to be updated, but I can help you figure it out if you want.
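One possible explanation for the error above, offered as an assumption rather than a confirmed diagnosis: Elasticsearch's `terms` query expects an array of values for the field, so a bare comma-separated string such as `"by-sa,cc0"` in that position may be what gets rejected. The sketch below shows the idea with plain dicts instead of the project's elasticsearch-dsl objects. `build_terms_filter` is a hypothetical helper, not code from the repository, and mapping `license` to `license.keyword` is taken from the error message.

```python
from typing import Dict, Optional


def build_terms_filter(
    data: Dict[str, str],
    serializer_field: str,
    es_field: Optional[str] = None,
) -> Optional[Dict]:
    """Return a `terms` clause for a comma-separated parameter, or None."""
    raw = data.get(serializer_field)
    if raw is None:
        # The requester did not pass this parameter, so there is no filter.
        return None
    # Split the string into a list, because `terms` takes an array of values.
    values = [value.strip() for value in raw.split(",") if value.strip()]
    return {"terms": {es_field or serializer_field: values}}


clause = build_terms_filter(
    {"q": "cat", "license": "by-sa,cc0"}, "license", "license.keyword"
)
```

If this is indeed the cause, the equivalent change in the PR would be to pass the split list, rather than the raw string, as the value of the `terms` query.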
Co-authored-by: Olga Bulat <[email protected]>
Thanks for the suggestions. Also, I will need help to test this.
@PrajwalBorkar have you been able to get the API running locally on your computer? We have instructions for how to do it here: https://wordpress.github.io/openverse-api/guides/quickstart.html Then you should be able to visit
What nice, detailed instructions, @sarayourfriend! Is anyone interested in resuming this PR? We haven't heard from the author for a long time.
We are going to close this; @ramadanomar will attempt the approach in a new PR 🙂
Fixes
Fixes #698
Description
Updated `term` to `terms`.
Testing Instructions