Search criteria not producing expected matches #287

Closed
jjacob7734 opened this issue Mar 15, 2023 · 9 comments

@jjacob7734

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When I apply this constraint to my search, no matches are found, even though the result set without the constraint contains items that should match: orex:spatial.orex:target_range lt 400.0

🕵️ Expected behavior

I expected the items with orex:spatial.orex:target_range value less than 400 to appear in the search results.

📜 To Reproduce

  1. Run curl --get 'https://pds.nasa.gov/api/search/1/products' --data-urlencode 'limit=10' --data-urlencode 'q=((ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu") and (ref_lid_instrument eq "urn:nasa:pds:context:instrument:ovirs.orex"))' | json_pp | grep -A 1 target_range
  2. Observe a number of hits with orex:spatial.orex:target_range around 177 (which is less than 400).
  3. Run curl --get 'https://pds.nasa.gov/api/search/1/products' --data-urlencode 'limit=10' --data-urlencode 'q=((ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu") and (ref_lid_instrument eq "urn:nasa:pds:context:instrument:ovirs.orex") and (orex:spatial.orex:target_range lt 400.0))' | json_pp
  4. Observe that there are no hits, so the previously observed hits with values around 177 didn't match this time.
  5. Wrapping the value in double quotes (as "400") instead of using 400.0 made no difference.
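For anyone scripting these reproduction steps, here is a minimal Python sketch that builds the same two query URLs. It assumes that mirroring `curl --get --data-urlencode` means percent-encoding each parameter (spaces as %20) and appending it to the URL; the helper names (`build_query`, `build_url`, `fetch`) are hypothetical and not part of any PDS tooling.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Production endpoint used in the reproduction steps.
PRODUCTS_URL = "https://pds.nasa.gov/api/search/1/products"

BENNU = "urn:nasa:pds:context:target:asteroid.101955_bennu"
OVIRS = "urn:nasa:pds:context:instrument:ovirs.orex"


def build_query(extra_clause: Optional[str] = None) -> str:
    """Build the q= expression from steps 1 and 3, optionally AND-ing
    one extra clause (e.g. the target_range constraint) onto it."""
    clauses = [
        f'(ref_lid_target eq "{BENNU}")',
        f'(ref_lid_instrument eq "{OVIRS}")',
    ]
    if extra_clause:
        clauses.append(f"({extra_clause})")
    return "(" + " and ".join(clauses) + ")"


def build_url(q: str, limit: int = 10, base: str = PRODUCTS_URL) -> str:
    """Mimic `curl --get --data-urlencode`: percent-encode every
    parameter (spaces become %20) and append them as a query string."""
    return base + "?" + urlencode({"limit": str(limit), "q": q}, quote_via=quote)


def fetch(url: str) -> dict:
    """GET the URL and parse the JSON body (performs network I/O)."""
    with urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    baseline = build_url(build_query())
    constrained = build_url(build_query("orex:spatial.orex:target_range lt 400.0"))
    print(baseline)
    print(constrained)
```

Passing each URL to `fetch` and comparing the two responses reproduces steps 1 to 4 without the shell pipeline.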

🖥 Environment Info

📚 Version of Software Used

  • curl --version returns: curl 7.80.0 (x86_64-apple-darwin13.4.0) libcurl/7.80.0 OpenSSL/1.1.1m zlib/1.2.11 libssh2/1.9.0 nghttp2/1.46.0. Release-Date: 2021-11-10

🩺 Test Data / Additional context

No response

🦄 Related requirements

🦄 #xyz

⚙️ Engineering Details

No response

@jjacob7734 jjacob7734 added bug Something isn't working needs:triage labels Mar 15, 2023
@jordanpadams jordanpadams transferred this issue from NASA-PDS/software-issues-repo Mar 15, 2023
@jordanpadams
Member

@jjacob7734 can we verify that all the OREX data has actually been ingested into the registry? There is no guarantee that all of it has been loaded, or, more specifically, that the data you are looking for has been loaded.

We have a snapshot of their data set from about a year ago here: https://pds.nasa.gov/data/pds4/test-data/registry/orex.ovirs/ (you can ping the SAs to request access to this server).

We could probably just do an overall product count check: compare the number of labels (XML files) in the collection with the number of products returned by a query for all OVIRS data.
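A minimal sketch of that count check, assuming a local copy of the collection (for example the snapshot linked above) and the parsed JSON response of a query for all OVIRS data. The helper names are hypothetical, and reading the total from `summary.hits` is an assumption about the API's response shape rather than something confirmed in this thread. Since a download may contain XML files that are not product labels, treat the local count as approximate.

```python
from pathlib import Path
from typing import Iterable


def count_label_paths(paths: Iterable[Path]) -> int:
    """Count paths that look like PDS4 labels (.xml, case-insensitive)."""
    return sum(1 for p in paths if p.suffix.lower() == ".xml")


def count_labels(collection_dir: str) -> int:
    """Count label files anywhere under a local copy of a collection."""
    files = (p for p in Path(collection_dir).rglob("*") if p.is_file())
    return count_label_paths(files)


def registry_hits(response: dict) -> int:
    """ASSUMPTION: the search API reports the total number of matches
    under response["summary"]["hits"]."""
    return int(response["summary"]["hits"])


def missing_products(local_labels: int, hits: int) -> int:
    """How many local labels appear not to be in the registry (0 if none)."""
    return max(local_labels - hits, 0)
```

A non-zero `missing_products(count_labels(...), registry_hits(...))` would suggest incomplete ingestion; zero only shows the totals agree, not that every individual product is present.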

@jordanpadams jordanpadams added the s.high High severity label Mar 15, 2023
@jordanpadams
Member

jordanpadams commented Mar 15, 2023

Actually, from their Kibana dashboard, I can see they have 1,146,784 OVIRS products ingested.

@jjacob7734
Author

jjacob7734 commented Mar 15, 2023

Yeah, it looks like there should be matches. In the instructions to reproduce, the first query does get matches that show a target range around 177, but when I add the requirement that target_range < 400 I get no matches.

@alexdunnjpl
Contributor

Interestingly, it looks like retrieval based on equality doesn't work.

Given (among others)

"orex:spatial.orex:target_range" : [
   "177.51266033499203"
]

The following queries return no hits:

curl --get 'https://pds.nasa.gov/api/search/1/products' --data-urlencode 'limit=10' --data-urlencode 'q=((ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu") and (ref_lid_instrument eq "urn:nasa:pds:context:instrument:ovirs.orex") and (orex:spatial.orex:target_range eq "177.51266033499203"))' | json_pp

curl --get 'https://pds.nasa.gov/api/search/1/products' --data-urlencode 'limit=10' --data-urlencode 'q=((ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu") and (ref_lid_instrument eq "urn:nasa:pds:context:instrument:ovirs.orex") and (orex:spatial.orex:target_range like "1*"))' | json_pp

ref_lid_instrument and ref_lid_target are present in the production index, while target_range is not mentioned at all.

Interestingly, like * returns the expected full set of hits, but that clause may simply be optimized out of the query.

@jordanpadams I realise that #281 is shown as closed, but was it confirmed that the tested "semi-random" fields weren't indexed (can't see how they couldn't be, given my understanding of OpenSearch)? And if those fields were present due to dynamic reindexing when products with new fields are added, was the fix with that dynamic addition ever deployed to prod?

@alexdunnjpl
Contributor

alexdunnjpl commented Mar 15, 2023

@jordanpadams @jjacob7734 running the latest tagged release of harvest against the bundle at https://pds.nasa.gov/data/pds4/test-data/registry/orex.ovirs.small/ results in

$ curl -k -u admin:admin https://localhost:9200/registry | json_pp | grep target_range -A 1
            "orex:spatial/orex:target_range" : {
               "type" : "keyword"

So it looks like this is the result of data being harvested before all-fields search support was implemented (or with an equally old release of harvest).
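To check whether a given field is searchable at all, one option is to look it up in the index mapping returned by the `curl ... https://localhost:9200/registry` call above. This is a sketch assuming that JSON has been saved to a file; the helper names are hypothetical, and translating the API's `.` separator into the index's `/` is inferred from the two spellings of target_range in this thread, not from API documentation. A missing entry is consistent with the diagnosis above: a field absent from the mapping is not indexed, so no query on it can match.

```python
import json
from typing import Optional


def index_field_name(api_field: str) -> str:
    """ASSUMPTION: the API's '.' separator corresponds to '/' in the index,
    e.g. orex:spatial.orex:target_range -> orex:spatial/orex:target_range."""
    return api_field.replace(".", "/")


def load_index_info(path: str) -> dict:
    """Load the JSON saved from `curl ... https://localhost:9200/registry`."""
    with open(path) as f:
        return json.load(f)


def field_type(index_info: dict, index: str, api_field: str) -> Optional[str]:
    """Return the mapped type of api_field, or None if the index mapping
    has no entry for it (such a field is not indexed, so it cannot match)."""
    properties = index_info.get(index, {}).get("mappings", {}).get("properties", {})
    entry = properties.get(index_field_name(api_field))
    return None if entry is None else entry.get("type")
```

Running `field_type(load_index_info("registry.json"), "registry", "orex:spatial.orex:target_range")` against a locally harvested index and against production would show the difference described here: "keyword" locally, `None` where the field was never mapped.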

The fix is to re-ingest all such data with an updated version of harvest.

Leaving the ticket open in case additional action is needed (notifying some or all users, arranging wholesale re-ingestion of large quantities of data on some or all nodes, etc.).

@alexdunnjpl alexdunnjpl removed bug Something isn't working s.high High severity labels Mar 15, 2023
@alexdunnjpl
Contributor

@jordanpadams pinging SBN to re-ingest

@gxtchen

gxtchen commented Apr 24, 2023

curl --get 'https://pds.nasa.gov/api/search/1/products' --data-urlencode 'limit=10' --data-urlencode 'q=((ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu") and (ref_lid_instrument eq "urn:nasa:pds:context:instrument:ovirs.orex") and (orex:spatial.orex:target_range lt 400.0))' | json_pp
This query still returns no hits. Has the data been re-ingested yet? Should I just harvest https://pds.nasa.gov/data/pds4/test-data/registry/orex.ovirs.small/ locally for the test?

@jordanpadams
Member

@gxtchen you need to test on gamma, not on production, since we haven't deployed there yet :-)

@tloubrieu-jpl
Member

Hi @gxtchen, the latest registry-api is not deployed in production yet; you need to test this ticket on gamma, with base URL https://pds.nasa.gov/api/search-en-gamma/1/
