Request for json+pds4 response fails in production #349
Cannot replicate using the registry test data. The error does not give enough information to isolate the bad assumption, but the message does indicate that some value returned from OpenSearch is expected to always be an array but is coming back as a string. Once we locate the item, then we can discuss whether the db is bad or the code needs to change. Just not sure how to isolate the problematic entry.
Thanks @al-niessner, that is useful. Can you narrow down which product triggers the error with the start/limit parameters (dichotomy algo)? Once we have narrowed the error to a single product (start=?, limit=1), we can find the product by requesting it with a header Accept:application/json. Do you think that can work?
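The dichotomy approach suggested above can be sketched as a binary search over the `start` offset. This is a hedged sketch, not code from the thread: it assumes exactly one bad product in the searched range, and `page_ok` stands in for an HTTP request against `/products` with the failing Accept header, returning False on a 500.

```python
# Sketch of the dichotomy (binary search) over start/limit suggested above.
# Assumes exactly one bad product in [lo, hi) and a page_ok(start, limit)
# predicate that reports whether that page of results returns cleanly
# (e.g. HTTP 200 rather than 500).
def find_bad(lo, hi, page_ok):
    """Return the start offset of the single failing product in [lo, hi)."""
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if page_ok(lo, mid - lo):   # first half is clean -> culprit in [mid, hi)
            lo = mid
        else:                       # first half fails -> culprit in [lo, mid)
            hi = mid
    return lo
```

In practice `page_ok` would issue `GET /products?start=...&limit=...` with `Accept: application/vnd.nasa.pds.pds4+json`; once the offset is found, request that single product (`start=?, limit=1`) with `Accept: application/json` to read its lidvid, as proposed above.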
I was working on the idea that I would collect all of the lidvids from /products and then run through all of them to build a table of passed/failed. We may have more than one bad entry, and I wanted to get a hint on any similarities if there is more than one.
There is a pattern emerging. Can you please give me the OpenSearch document for this lidvid: urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n6_n6137final_h_fit::1.0? I do not have access to the OpenSearch behind pds.nasa.gov/api/search/1, but I can then insert that document into my OpenSearch and test locally to see which field is the problem. Interestingly, standard JSON does just fine. It is this PDS4+JSON that is causing the problem. Hence some wacky field or bad assumption.
Okay, just working with the first 100 (10000 total hits), there is a clear pattern. Comparing a broken record to one that is not broken may be sufficient, but loading a bad record into the test database would be more informative. Here are the results from the first 100, which clearly show urn:nasa:pds:gbo-kpno is different from the rest in a bad way:
@al-niessner did they turn off the blob store? |
No, many of the fields (labels) are supposed to be arrays even if they have just one element. For instance
should be
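The actual field values in the comment above did not survive the page scrape, but the shape of the problem can be illustrated with a hypothetical document. The field name and value here are mine, purely illustrative, not taken from the thread:

```python
# Hypothetical illustration of the problem described above: a field the API
# expects to be an array of strings was ingested by an older harvest as a
# bare string. Wrapping scalars in a list restores the expected shape.
def as_list(value):
    """Wrap a scalar field value in a single-element list; pass lists through."""
    return value if isinstance(value, list) else [value]

broken = {"ref_lid_target": "urn:nasa:pds:context:target:comet.9p_tempel_1"}
fixed = {k: as_list(v) for k, v in broken.items()}
# fixed["ref_lid_target"] == ["urn:nasa:pds:context:target:comet.9p_tempel_1"]
```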
Copy. Thanks @al-niessner . I may just request that they re-ingest this data since their file_refs are invalid anyways. |
@al-niessner @tloubrieu-jpl per this ticket, SBN re-loaded their data and everything seems to be working as expected. The following both work. Query for everything:
Query for one of the wonky products in question from SBN:
Thanks @jordanpadams, I confirm it works. Maybe we could make the API more robust by skipping the corrupted products and generating a log error, while answering the user with what is not corrupted. I will create a ticket for that.
I am reopening this ticket because I saw that happening again:
I initially made a pull request so that the API supports the older version of loaded products, where properties are not arrays (#204), but I am guessing I missed testing the case where the Accept type is 'application/vnd.nasa.pds.pds4+json' or the XML equivalent. So I see 2 options:
The most difficult part in 2) is to have non-regression tests for that, since it means we would need to load products into the reference registry using different versions of harvest.
The problem is not the JSON/XML; it is the PDS4 handler reading fields that are normally skipped and that it expects to be arrays even when single valued. It needs testing, but it probably fails for all Accept types if you request the offending field. The failure happens when the registry-api tries to read/handle the field from the database, not when it writes it out. It may just be the PDS4 handler in the registry-api, however, and other types may handle it generically (not look at it but blindly pass it along as a value).
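To make the read-side failure mode concrete, here is a minimal sketch in Python (the registry-api itself is Java; the field name and guard are invented for illustration) of a reader that assumes the stored value is always an array:

```python
# Invented example of the read-side bad assumption described above: the
# handler expects a list of strings, so a bare string ingested by an old
# harvest trips the type check instead of being wrapped transparently.
def read_file_names(doc):
    """Read an assumed always-array field; raise if the DB holds a scalar."""
    values = doc["ops:Data_File_Info/ops:file_name"]
    if not isinstance(values, list):
        raise TypeError(f"expected array, got {type(values).__name__}")
    return [str(v) for v in values]
```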
@al-niessner have you been able to take a look at this? |
@jordanpadams Yes, strings in DB instead of array of strings. |
@al-niessner is there anything we can do for a workaround here to fix this? Is there a script we can run to find / update all those fields? Or update the way we are querying that metadata from the DB?
@jordanpadams ad-hoc python script should be plenty-good for this. Could save the wrapper in registry-sweepers as a scratch/utility script for future use. |
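A hedged sketch of what such a scratch/utility script might look like. The `ARRAY_FIELDS` set is an assumption for illustration; the real set would come from the registry schema or from the fields the API chokes on.

```python
# Sketch of the ad-hoc repair suggested above: for each document, wrap any
# scalar value of a known always-array field in a single-element list.
# ARRAY_FIELDS is assumed, not taken from the registry schema.
ARRAY_FIELDS = {"ref_lid_target", "ops:Data_File_Info/ops:file_name"}

def repair(doc):
    """Return (fixed_doc, changed): scalar array-fields wrapped in lists."""
    fixed = dict(doc)
    changed = False
    for field in ARRAY_FIELDS & doc.keys():
        if not isinstance(doc[field], list):
            fixed[field] = [doc[field]]
            changed = True
    return fixed, changed
```

Wrapped in a scan-and-bulk-update loop (for example with opensearch-py's `helpers.scan` and `helpers.bulk`), this would walk the index and rewrite only the documents that actually changed, which fits the "save the wrapper in registry-sweepers" idea.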
Three items:
I do not get why 1 is not already happening. I would think that PDS4 would make the documents uniform at least. Then you could do 2 where PDS4 is lacking, hopefully less rather than more lacking. |
Yeah, my 1 can be accomplished with @alexdunnjpl's suggestion.
@al-niessner per 1. above, PDS4 does that, but earlier versions of Harvest didn't take into account that some documents may have only 1 value while others may have multiple, and after that discovery, we decided to treat everything as a list. So we basically need to go back, find all those loaded incorrectly, and update those values to be lists.
Then we are doing an improved 1 today but need @alexdunnjpl's script suggestion to help with yesterday's data. Seems like a good fix. It would mean identifying common problems in the OpenSearch documents and repairing them. As we improve harvest, the script would have to be updated to account for changes. Knowing the harvest version is far less important than simply deciding what makes a valid document.
@al-niessner I do believe knowing that version helps a bit in debugging, though. But agreed, the script is more important. We have not been good about updating harvest and then updating this new script X to make sure the existing metadata is updated. Can you take this on, or is a discussion with @alexdunnjpl warranted at the breakout tomorrow?
I can take it on. It is just another provenance-like script that fixes a bunch of fields. harvest has not written out its version yet, so that would only be halfway informative; since ingesting does not require a version, nodes could be using an old version that does not insert one. Anyway, doing so would be easy enough since harvest already inserts some info (datetime and node name).
@al-niessner thanks! Per this:
I would actually like to eventually prevent writing with old versions of the software. |
You could theoretically do this if harvest is only used in our containers with client SSL certs. Change the certs at the server so that only certain copies of harvest would work.
Checked for duplicates
Yes - I've already checked
🐛 Describe the bug
When I did request:
Note that the same goes with
application/vnd.nasa.pds.pds4+xml
I am getting an error 500
The logs found in CloudWatch say:
🕵️ Expected behavior
I expected to receive the JSON translation of the PDS4 labels which are stored in the json_label field in the registry index.
📜 To Reproduce
Run curl command above against operational registry
⚙️ Engineering Details
This bug specifically shows up in the production registry due to data that was loaded with older versions of harvest. To test this, execute tests against production registry/API, not a test version.