Add option to sort search results by created_on
#916
API Developer Docs Preview: Ready https://wordpress.github.io/openverse-api/_preview/916 Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again. You can check the GitHub pages deployment action list to see the current status of the deployments. |
When @panchovm returns we can revisit turning this into a formal project with some frontend UI and additional testing. |
Marking as blocked until @panchovm is back and is able to work on the UI for this. |
@panchovm Just wanted to confirm whether you've seen this PR. I've marked it as "design" to hopefully surface it better for you. I suspect this would need a frontend issue for the design of the feature? @zackkrida do you intend for this PR to get merged eventually or would this be picked up as part of a larger project down the road? |
I tried to replace the
Yes. And I envision a few design iterations before starting its implementation. I am thinking of different solutions that might force the current layout to tweak other areas. Since this is marked with low priority, I was planning to work on the homepage redesign before arriving at this task. But now that it is blocked, should we revisit the priority? |
It's still low priority @panchovm, no need to rush on this. I think the API feature can be completed before the design, but I see it as part of a larger project. |
@zackkrida Should this be closed in favour of more comprehensive planning? My understanding, at least, is that this change is blocked until we have analytics, and until then the PR keeps coming up weekly for the MSR to check. |
@sarayourfriend, there was a recent sync session where folks discussed this, and there was actually consensus on having someone open and finish this PR on the sooner side. The idea being that it would be useful for two things:
The only actual "work" this PR needs is to wrap this in a conditional which checks for a query string: openverse-api/api/catalog/api/controllers/search_controller.py Lines 387 to 389 in f5b2080
Sure we'll probably want test(s) as well, and to decide whether to leave this feature documented or undocumented for now. Folks felt we could defer adding new frontend UI to allow users to sort by new. What do you think about this? |
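For illustration, the conditional described above might look roughly like this. It is a minimal sketch, not the code in search_controller.py: it builds a plain Elasticsearch request body as a dict, and the function name build_search_body and the sort_by=new query string (borrowed from the original description) are assumptions made for the example.

```python
from typing import Any, Dict, Mapping


def build_search_body(query_params: Mapping[str, str]) -> Dict[str, Any]:
    """Build an Elasticsearch request body, sorting only when asked to.

    Hypothetical helper: the real controller code is not shown in this thread.
    """
    q = query_params.get("q")
    if q:
        query: Dict[str, Any] = {"simple_query_string": {"query": q}}
    else:
        query = {"match_all": {}}
    body: Dict[str, Any] = {"query": query}

    # Only add a sort clause when the (assumed) query string is present;
    # without one, Elasticsearch orders hits by relevance score.
    if query_params.get("sort_by") == "new":
        body["sort"] = [{"created_on": {"order": "desc"}}]
    return body
```

Requests that omit the query string are left untouched, so the default relevance ordering does not change for existing clients.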
Maybe we can follow the Gutenberg convention of adding |
…o sort-by-new-proof-of-concept
I verified that in both the catalog and the API, the
SELECT meta_data FROM image WHERE identifier = '9dd74c70-6069-4658-a843-01c1b072c6fc';
{"views": "21", "pub_date": "1284481786", "date_taken": "2010-09-11 08:25:09", "license_url": "https://creativecommons.org/licenses/by-nd/2.0/", "raw_license_url": null}
This makes me hesitant to proceed with this PR because |
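To make the date formats in that record concrete, here is a small sketch that reads both date-like keys from the meta_data shown above. It assumes pub_date is a Unix timestamp in seconds and date_taken is a naive "YYYY-MM-DD HH:MM:SS" string, which is what this one record suggests. Other providers may use different keys or formats, and extract_source_dates is a hypothetical helper, not part of the API.

```python
import json
from datetime import datetime, timezone
from typing import Dict


def extract_source_dates(meta_data_json: str) -> Dict[str, datetime]:
    """Return whichever source-side dates a meta_data blob carries.

    Assumption: key names and formats follow the single record quoted
    in this thread; they are not guaranteed for every provider.
    """
    meta = json.loads(meta_data_json)
    dates: Dict[str, datetime] = {}

    # pub_date looks like epoch seconds stored as a string.
    if meta.get("pub_date"):
        dates["pub_date"] = datetime.fromtimestamp(
            int(meta["pub_date"]), tz=timezone.utc
        )

    # date_taken looks like a naive local timestamp with no timezone.
    if meta.get("date_taken"):
        dates["date_taken"] = datetime.strptime(
            meta["date_taken"], "%Y-%m-%d %H:%M:%S"
        )
    return dates
```

Neither value is the same thing as created_on, which is why the meaning of "new" has to be settled separately from the sort mechanics.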
Maybe the framing of "new in Openverse" would be most appropriate for this. |
created_on
Update: This PR is undrafted and open for review. The additional features have been implemented:
This PR also adds documentation and fixes an ordering bug that occurs on |
Just noting here: I always interpreted this to mean "new to Openverse". If a site adds hundreds of cool new photos of a collection of Hellenistic pottery from 30 BCE, I would want that to appear in this filter. "Recently added" is probably the most common name for this type of sorting. That said, as long as we mark this "unstable" as you've done, @dhruvkb, I'm totally cool with merging this and playing with the results. We should formally make a decision about what this filter means and what value it offers users before we use it anywhere in production. |
If 'newly indexed in Openverse' is what this sort was supposed to mean, then this works 100% according to your expectation. Personally I thought new meant 'newly created' or 'newly uploaded at the source' so I was surprised that the sorted results were totally different. Again, 'recently added' is not the same as 'recently indexed'. If a site added photos of "Hellenistic pottery from 30 BCE" in the early 1990s and we just added it as a new provider, we'll still get those photos in this category (which is neither "recently created", nor "recently added", just "recently indexed"). Considering they're not already present, |
Definitely, though it seems like it might not be something that's present in all providers and so I wonder if a full-on column makes sense in those cases. Perhaps a way to add it in the |
@dhruvkb Do you mind updating the description and testing instructions? |
@stacimc done! |
I tested locally while anonymous and while authenticated and all looks well. I used the query q='waterfall' to reduce the results so I could easily see the full sorting order. I also verified that the created_on field is listed in the documentation under the response schema, but the unstable sort options are not documented. Sorting behavior worked just as described 🎉
For what it's worth, I also was under the impression that the filter was for "new to Openverse", for the pages mentioned in this comment (I also think it's still an interesting filter to have for search, although probably worth having analytics in place so we can verify that). But the concerns raised in this thread are all excellent points and make it very clear that getting the messaging right will be important to avoid confusion when frontend work is done.
As far as this PR, my only concern is naming the field 'created_on' while using 'indexed_on' for the sort param. I agree that indexed_on is the clearer of the two for describing what we actually mean, and would prefer to use it in both places unless it's critical that the field name match the field in the catalog. Would that be possible?
Looks great! 🚀
Since I'm technically the author, I can't officially approve this, but it looks excellent, @dhruvkb. |
Approving as proxy for @zackkrida.
Updated description by @dhruvkb
Description
This PR adds two query parameters, unstable__sort_by and unstable__sort_dir, to sort media items on the basis of the created_on field.
unstable__sort_by can take relevance (default) or indexed_on (sort by the created_on field).
unstable__sort_dir can take desc (newest first, default) or asc (oldest first).
Testing instructions
just up.
https://localhost:50280/v1/images/ and note that the results have no specific sort order.
https://localhost:50280/v1/images/?unstable__sort_by=indexed_on and note that the results still have no specific sort order.
https://localhost:50280/v1/images/?unstable__sort_by=indexed_on and note that the results are now sorted by created_on.
unstable__sort_dir=asc as well.
Also see update below.
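The parameter handling described above could be sketched as follows. This is an illustration under stated assumptions rather than the PR's actual code: parse_sort_params and is_sorted_by_created_on are hypothetical helpers, and created_on is assumed to be an ISO 8601 string in the response. Only the parameter names, their allowed values, their defaults, and the created_on field come from the description.

```python
from datetime import datetime
from typing import Any, Dict, List, Mapping

ALLOWED_SORT_BY = {"relevance", "indexed_on"}
ALLOWED_SORT_DIR = {"desc", "asc"}


def parse_sort_params(params: Mapping[str, str]) -> List[Dict[str, Any]]:
    """Translate the unstable sort parameters into an Elasticsearch sort list.

    Hypothetical helper; returns an empty list for relevance ordering.
    """
    sort_by = params.get("unstable__sort_by", "relevance")
    sort_dir = params.get("unstable__sort_dir", "desc")
    if sort_by not in ALLOWED_SORT_BY:
        raise ValueError(f"unsupported unstable__sort_by: {sort_by!r}")
    if sort_dir not in ALLOWED_SORT_DIR:
        raise ValueError(f"unsupported unstable__sort_dir: {sort_dir!r}")

    if sort_by == "relevance":
        return []
    # indexed_on is the public name; created_on is the underlying field.
    return [{"created_on": {"order": sort_dir}}]


def is_sorted_by_created_on(
    results: List[Dict[str, Any]], sort_dir: str = "desc"
) -> bool:
    """Check that a page of results is ordered by created_on.

    Assumes created_on values parse with datetime.fromisoformat.
    """
    stamps = [datetime.fromisoformat(r["created_on"]) for r in results]
    return stamps == sorted(stamps, reverse=(sort_dir == "desc"))
```

A check like is_sorted_by_created_on can stand in for eyeballing the response when following the testing steps with a narrow query.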
Original description by @zackkrida
This draft PR should not be merged. It's a simple proof of concept to showcase:
created_on to the API responses
created_on
It's meant to show how ElasticSearch's built-in sorting functionality works. I would love to see someone expand on this (my python abilities are really limited 😅) in the following ways:
sort_by=new query param.
sort_dir=asc/desc option to configure the direction of sort
Once those items are completed we could test this against production data! We should also research the integrity of the created_on field, and make sure it is a true representation of when a creative work is added to the Openverse catalog and that we never accidentally rewrite that value in production.