
As an API user, I want to be able to use the API for free text search #460

Open
jordanpadams opened this issue Jul 11, 2021 · 9 comments
Labels
B12.0 B15.1 Epic p.must-have requirement the current issue is a requirement

Comments

@jordanpadams
Member

jordanpadams commented Jul 11, 2021

Motivation

...so that I can perform a keyword or Google-like search on the registry and get reasonable results back.

Additional Details

This work will later be developed into a full natural language search feature. See #570.

The premise of this task is to come up with the API design in order to enable keyword search. The actual result set will be refined per #570.

Acceptance Criteria

Given: some data products for a particular mission (e.g. insight) or targeting a particular planet (e.g. mars)
When I perform: an API search for something like keyword=insight or keyword=mars
Then I expect: it returns products that provide some fuzzy match for the keyword terms searched

Engineering Details

This ticket requires updating the web API specification to accept free text search criteria instead of only {pds4 field attribute}=value criteria.

For this ticket, we at minimum want to allow free text search using Elasticsearch (ES) default weighting. We will then investigate which fields should be weighted more strongly to produce more robust search results. NASA-PDS/pds-api#49
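As a rough illustration of what "ES default weighting" could look like, here is a minimal Python sketch that turns a keyword value into an Elasticsearch multi_match request body. The helper name build_keyword_query and any field weights passed to it are hypothetical, not the registry API's actual implementation; multi_match, the "field^weight" boost syntax, and fuzziness "AUTO" are standard Elasticsearch query DSL. With no fields given, every field is searched and scored with Elasticsearch's default relevance model (BM25).

```python
from typing import Dict, Optional


def build_keyword_query(keyword: str, fields: Optional[Dict[str, float]] = None) -> dict:
    # Hypothetical helper: maps a keyword=... value to an Elasticsearch request body.
    if not fields:
        # No explicit fields: search everything with default relevance scoring.
        field_list = ["*"]
    else:
        # Standard multi_match boost syntax "field^weight"; weight 1 needs no suffix.
        field_list = [
            name if weight == 1 else f"{name}^{weight}" for name, weight in fields.items()
        ]
    return {
        "query": {
            "multi_match": {
                "query": keyword,
                "fields": field_list,
                # "AUTO" tolerates small typos, one way to approximate a "fuzzy match".
                "fuzziness": "AUTO",
            }
        }
    }


default_query = build_keyword_query("mars")
weighted_query = build_keyword_query("insight", {"description": 2.0, "title": 1})
```

Passing something like {"description": 2.0} would be one way to experiment with the stronger weighting mentioned above. On an index with as many fields as the registry, a bare "*" may also run into Elasticsearch's limit on how many fields a single query can expand to, which is another reason to consider an explicit field list.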

@jordanpadams
Member Author

@tdddblog this is next on the list. we can meet to chat about this some more if needed. the acceptance criteria are not super detailed, but hopefully they provide some basic insight into what we are looking for

@tloubrieu-jpl
Member

The free text search criteria will be available in the existing data endpoints (/bundles, /collections, /products) through a keyword query parameter.
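For illustration, a request against one of these endpoints might be built as follows. The base URL is a placeholder and the helper keyword_search_url is hypothetical; only the endpoint paths and the keyword parameter name come from the discussion.

```python
from urllib.parse import urlencode

# Placeholder base URL (assumption); substitute the actual registry API host.
BASE_URL = "https://registry.example.org/api"

# Existing data endpoints that would accept the keyword parameter.
ENDPOINTS = ("/bundles", "/collections", "/products")


def keyword_search_url(endpoint: str, keyword: str) -> str:
    """Build a free text search URL for one of the existing data endpoints."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unsupported endpoint: {endpoint}")
    # urlencode percent-encodes the value and turns spaces into "+".
    return f"{BASE_URL}{endpoint}?{urlencode({'keyword': keyword})}"


print(keyword_search_url("/products", "mars"))
print(keyword_search_url("/bundles", "insight lander"))
```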

@tdddblog
Contributor

Across the few schemas in my registry instance, there are 128 description fields. I assume we have to search all of them. We'll have to change Harvest to automatically merge them into a custom "description" field.
A few examples:

pds:Bundle/pds:description
pds:Collection/pds:description
pds:Document/pds:description
pds:Array/pds:description
geom:Geometry_Lander/geom:description
img:Brightness_Correction_File/img:description
img:Subframe/img:description
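A minimal Python sketch of the merge described above, assuming a registry document is a flat mapping from field names like those listed to a string or a list of strings. The function name merge_descriptions is hypothetical, and this is not Harvest's actual code.

```python
from typing import Any, Dict, List


def merge_descriptions(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a document with all *:description values merged into "description"."""
    merged: List[str] = []
    for name, value in doc.items():
        if name == "description":
            continue  # skip a previously merged field so reruns don't duplicate values
        # The attribute is the part after the last "/", e.g. "pds:description".
        attribute = name.rsplit("/", 1)[-1]
        # Compare only the local name, ignoring the namespace prefix (pds:, geom:, img:).
        if attribute.split(":")[-1] != "description":
            continue
        values = value if isinstance(value, list) else [value]
        merged.extend(str(v) for v in values if v)
    result = dict(doc)
    result["description"] = merged
    return result


doc = {
    "pds:Bundle/pds:description": "InSight bundle",
    "geom:Geometry_Lander/geom:description": ["Lander geometry"],
    "pds:Bundle/pds:title": "Example title",
}
merged_doc = merge_descriptions(doc)
```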

@jordanpadams
Copy link
Member Author

@tdddblog I am starting to see a lot of this metadata cleanup coming now and down the road. Rather than require this at ingestion time, do we think some sort of post-processing tool should run in the background for the registries to perform this kind of "metadata curation" and update the records? I am just thinking as our natural language search capabilities evolve, it will be difficult to get everyone to re-ingest all their data. just a thought...

also, would this have any impact on weighting the returned results?

@tdddblog
Copy link
Contributor

@jordanpadams Updating every document in Elasticsearch is a very expensive operation, because Elasticsearch would have to reindex every document.
We could also try using "copy_to" fields, but then Registry Manager would have to be updated, and it would only work for newly created fields and documents. Old documents indexed before "copy_to" was added would have to be reindexed.
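For reference, the "copy_to" approach would be expressed in an Elasticsearch mapping roughly as follows. This Python sketch only builds the mapping body; the helper name copy_to_mapping is hypothetical, and the field list would have to come from the registry's actual schemas. copy_to itself is a standard Elasticsearch mapping parameter.

```python
from typing import Dict, Iterable


def copy_to_mapping(description_fields: Iterable[str], target: str = "description") -> Dict:
    """Build mapping properties that copy each listed field into one `target` text field."""
    # The merged field that free text search would query.
    properties = {target: {"type": "text"}}
    for field in description_fields:
        # At index time, Elasticsearch copies this field's value into `target`.
        properties[field] = {"type": "text", "copy_to": target}
    return {"properties": properties}


mapping = copy_to_mapping([
    "pds:Bundle/pds:description",
    "img:Subframe/img:description",
])
```

As noted above, copy_to only takes effect when a document is indexed, so documents indexed before the mapping change would not have the merged field populated until they are reindexed.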

@tloubrieu-jpl
Member

For this ticket, @tdddblog will only consider the 'description' fields for free text search.

@jordanpadams
Member Author

done per NASA-PDS/registry-api-service#60

@jordanpadams
Member Author

Re-opening this requirement since this no longer works.

@jordanpadams jordanpadams assigned alexdunnjpl and unassigned tdddblog Sep 19, 2024
@jordanpadams
Member Author

Note: We did get several requests from the IPDA to add this feature back in.
