As a user, I want to apply an additional query filter (q=) to members of an aggregate product (/products/{identifier}/members) #298

Closed
jordanpadams opened this issue Mar 23, 2023 · 5 comments · Fixed by #526

Comments

@jordanpadams
Member

Checked for duplicates

Yes - I've already checked

πŸ§‘β€πŸ”¬ User Persona(s)

Data User

💪 Motivation

...so that I can search for the products of a bundle/collection, and then provide additional filters to the search within the same query.

📖 Additional Details

Follow-on to #197: we do not want to support q= from a /members or /member-of endpoint, so we need some other way to provide this query functionality.

Acceptance Criteria

Given
When I perform
Then I expect

βš™οΈ Engineering Details

The initial design idea is to update the provenance script so that it adds collection_lidvid and bundle_lidvid to each product.

In order to support the example from #197:

curl --get 'http://pds.nasa.gov/api/search-en-gamma/1.1/classes/collections/urn:nasa:pds:gbo.pluto-charon.mutual-events:data::1.0/members' --data-urlencode 'q=pds:External_Reference.pds:reference_text eq "YOUNG1992"' -H Accept:application/json -L | json_pp

the API query would instead be something like:

/products?q=collection_lidvid eq "urn:nasa:pds:gbo.pluto-charon.mutual-events:data::1.0" AND pds:External_Reference.pds:reference_text eq "YOUNG1992"
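
To make the proposed query shape concrete, here is a minimal Python sketch that assembles the /products URL with the q= expression percent-encoded. The base URL and the collection_lidvid field name are copied from the examples in this issue; whether a given deployment exposes that field is an assumption, and build_member_query is a hypothetical helper, not part of any PDS client.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL taken from the curl example in this issue.
BASE_URL = "http://pds.nasa.gov/api/search-en-gamma/1.1"


def build_member_query(collection_lidvid: str, extra_filter: str) -> str:
    """Combine a collection-membership clause with an extra q= filter.

    Hypothetical helper: `collection_lidvid` is the field name proposed
    in this issue, not a confirmed registry field.
    """
    q = f'collection_lidvid eq "{collection_lidvid}" AND {extra_filter}'
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return f"{BASE_URL}/products?" + urlencode({"q": q}, quote_via=quote)


url = build_member_query(
    "urn:nasa:pds:gbo.pluto-charon.mutual-events:data::1.0",
    'pds:External_Reference.pds:reference_text eq "YOUNG1992"',
)
print(url)
```

The resulting URL can be passed to curl or any HTTP client with an Accept: application/json header, as in the original example.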
@jordanpadams jordanpadams added needs:triage requirement the current issue is a requirement labels Mar 23, 2023
@jordanpadams jordanpadams self-assigned this Mar 23, 2023
@jordanpadams jordanpadams changed the title As a user, I want to query child products for an aggregate product along with additional metadata criteria As a user, I want to provide additional filters to a query for members of an aggregate product Mar 23, 2023
@github-project-automation github-project-automation bot moved this to Release Backlog in B14.0 Mar 23, 2023
@alexdunnjpl
Contributor

@jordanpadams @tloubrieu-jpl for all that I'd much prefer to write a python script and be done with it, isn't this strictly a job for harvest?

Pros: (Python script - should be separate from provenance imho but that's whatever)

  • Faster/easier
  • Will apply retroactively (this is the big one)

Cons:

  • Will require reindexing
  • Seems like a bandaid for something that seems squarely within the jurisdiction of harvest

I suppose the correct solution is to bandaid it with a python script, implement it in harvest as well, then rip off the bandaid once the updated version of harvest is deployed everywhere it needs to be.

Note to self - LIDs are strictly-defined in the PDS Standards Reference as urn:<national_agency>:<archiving_agency>:<bundle>:<?collection>:<?product>, so it's trivial to split and extract bundle/collection by chunk index.
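
The chunk-index idea in the note above could look roughly like the sketch below. It assumes every LID follows the urn:<national_agency>:<archiving_agency>:<bundle>:<?collection>:<?product> layout exactly, and it only recovers parent LIDs (the parent version cannot be derived from the child's identifier). Membership and membership_from_lid are hypothetical names used for illustration.

```python
from typing import NamedTuple, Optional


class Membership(NamedTuple):
    bundle_lid: Optional[str]
    collection_lid: Optional[str]


def membership_from_lid(lidvid: str) -> Membership:
    """Derive parent bundle/collection LIDs by splitting on ':'.

    Assumes the strict LID layout quoted in the note above.
    """
    lid = lidvid.split("::", 1)[0]  # drop the ::version suffix if present
    chunks = lid.split(":")
    if not 4 <= len(chunks) <= 6 or chunks[0] != "urn":
        raise ValueError(f"unexpected LID shape: {lid!r}")
    # 4 chunks: bundle (no parents); 5: collection (parent bundle);
    # 6: basic product (parent bundle and parent collection).
    bundle = ":".join(chunks[:4]) if len(chunks) >= 5 else None
    collection = ":".join(chunks[:5]) if len(chunks) == 6 else None
    return Membership(bundle, collection)


print(membership_from_lid("urn:nasa:pds:gbo.pluto-charon.mutual-events:data::1.0"))
```

Any LID that departs from the standard layout would need to be handled separately, for example by raising as this sketch does.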

@alexdunnjpl alexdunnjpl changed the title As a user, I want to provide additional filters to a query for members of an aggregate product As a user, I want to query-filter products by collection- and/or bundle- membership Apr 11, 2023
@alexdunnjpl
Contributor

alexdunnjpl commented Apr 12, 2023

85ca61e implements addition of membership metadata to products whose documents lack such (to prevent having to update every product on every script run)

Metadata is currently written to the document in this format. I'm assuming the nesting isn't a problem but I can tweak it to a flat structure if need be.

All products will have that full membership metadata structure, with null indicating lack of membership (collections have no collection membership, bundles have neither membership).

Ensuring that this structure is included in the index is an outstanding question. @jordanpadams @jimmie @al-niessner @tloubrieu-jpl would it be appropriate for the script to ensure presence of these fields in the index? I wouldn't think reindexing is necessary in that case as on first run, the index would be added and then the relevant metadata would be written for all products (triggering indexing on each product).

@jordanpadams
Member Author

@alexdunnjpl just as an FYI, even though the standard says this:

LIDs are strictly-defined in the PDS Standards Reference

That is not actually always the case. There is an alternate_ids field that was added to the registry a while back to support backwards compatibility there because there are cases where a new version of a product contained a different LID.

@jordanpadams
Member Author

@alexdunnjpl per:

Metadata is currently written to the document in this format. I'm assuming the nesting isn't a problem but I can tweak it to a flat structure if need be.

how would a user then query for that information based upon its nesting? for other metadata we have added to the registry, we have been flattening it for the time being, e.g. ops:Harvest_Info/ops:archive_status. We may want to stick to that paradigm for the time being?
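
As a rough illustration of the flattening paradigm mentioned above, the sketch below collapses a nested document into slash-delimited keys in the style of ops:Harvest_Info/ops:archive_status. The nested field names (ops:Membership, ops:bundle_lid, ops:collection_lid) are hypothetical placeholders, since the actual structure written by the script is not shown in this thread.

```python
from typing import Any, Dict


def flatten(doc: Dict[str, Any], sep: str = "/", prefix: str = "") -> Dict[str, Any]:
    """Collapse nested dicts into a single level of sep-joined keys."""
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, sep, path))
        else:
            flat[path] = value  # leaf values (including None) are kept as-is
    return flat


# Hypothetical nested membership metadata; the field names are assumptions.
nested = {
    "ops:Membership": {
        "ops:bundle_lid": "urn:nasa:pds:gbo.pluto-charon.mutual-events",
        "ops:collection_lid": None,
    }
}
print(flatten(nested))
```

With flat keys, a membership filter becomes an ordinary field comparison in q=, the same way other flattened registry metadata is queried.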

@jordanpadams
Member Author

@alexdunnjpl per:

for all that I'd much prefer to write a python script and be done with it, isn't this strictly a job for harvest?

similar to the provenance script, we could do this in harvest, but there are a few reasons why we want this in a separate script (e.g. within the provenance script):

  1. there is no requirement for a user to execute harvest on a bundle. it could be a collection or directory or an individual file, so we can't assume harvest has any more information than some separate standalone script.
  2. performance - having to keep all this information, query the registry, etc., slows down the execution, which is constantly a complaint from our users. we need to do as little work as we can alongside the data, and do the rest of the processing on our end.

@github-project-automation github-project-automation bot moved this to Release Backlog in EN Portfolio Backlog Nov 20, 2023
@jordanpadams jordanpadams changed the title As a user, I want to query-filter products by collection- and/or bundle- membership As a user, I want to query products by collection membership Jul 12, 2024
@jordanpadams jordanpadams changed the title As a user, I want to query products by collection membership As a user, I want to apply an additional query filter (q=) to products that belong to a collection (/products/{identifier}/members) Jul 12, 2024
@jordanpadams jordanpadams changed the title As a user, I want to apply an additional query filter (q=) to products that belong to a collection (/products/{identifier}/members) As a user, I want to apply an additional query filter to products that belong to a collection Jul 12, 2024
@jordanpadams jordanpadams changed the title As a user, I want to apply an additional query filter to products that belong to a collection As a user, I want to apply an additional query filter (q=) to members of an aggregate products (/products/{identifier}/members) Jul 12, 2024
@jordanpadams jordanpadams changed the title As a user, I want to apply an additional query filter (q=) to members of an aggregate products (/products/{identifier}/members) As a user, I want to apply an additional query filter (q=) to members of an aggregate product (/products/{identifier}/members) Jul 12, 2024
@github-project-automation github-project-automation bot moved this from ToDo to 🏁 Done in EN Portfolio Backlog Aug 15, 2024
Projects
Status: 🏁 Done
Status: Release Backlog
3 participants