Request for json+pds4 response fails in production #349

Closed
tloubrieu-jpl opened this issue Jun 13, 2023 · 27 comments · Fixed by NASA-PDS/registry-sweepers#54
Labels: B14.0, bug (Something isn't working), s.high (High severity)

Comments

@tloubrieu-jpl
Member

tloubrieu-jpl commented Jun 13, 2023

Checked for duplicates

Yes - I've already checked

🐛 Describe the bug

When I request:

% curl -L -H 'Accept:application/vnd.nasa.pds.pds4+json' http://pds.nasa.gov/api/search/1/products

Note that the same happens with application/vnd.nasa.pds.pds4+xml.

I get a 500 error.

The logs found in CloudWatch say:

2023-06-13T11:46:03.638-07:00 java.lang.ClassCastException: class java.lang.String cannot be cast to class java.util.ArrayList (java.lang.String and java.util.ArrayList are in module java.base of loader 'bootstrap')

🕵️ Expected behavior

I expected to receive the JSON translation of the PDS4 labels, which are stored in the json_label in the registry index.

📜 To Reproduce

Run curl command above against operational registry

⚙️ Engineering Details

This bug specifically shows up in the production registry due to data that was loaded with older versions of harvest. To test this, execute tests against production registry/API, not a test version.

@al-niessner
Contributor

@jordanpadams @tloubrieu-jpl

Cannot replicate using the registry test data. The error does not give enough information to isolate the bad assumption, but the message does indicate that some value returned from OpenSearch is expected to always be an array and is actually a string. Once we locate the item, we can discuss whether the DB is bad or the code needs to change. I am just not sure how to isolate the problematic entry.

@tloubrieu-jpl
Member Author

Thanks @al-niessner, that is useful. Can you narrow down which product triggers the error with the start/limit parameters (bisection)? Once we have narrowed the error down to a single product (start=?, limit=1), we can find the product by requesting it with the header Accept:application/json. Do you think that can work?
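
The bisection over start/limit suggested here could be sketched as follows. This is only an illustration: `find_first_failure` and `probe` are hypothetical names, and `probe(start, limit)` is assumed to return True when a request for that page with the pds4+json Accept header succeeds. It also assumes the API accepts a `limit` as large as the range being tested, which may not hold in practice.

```python
from typing import Callable, Optional


def find_first_failure(probe: Callable[[int, int], bool], total: int) -> Optional[int]:
    """Return the smallest start index whose product makes the request fail.

    probe(start, limit) must return True when the page of `limit` products
    beginning at `start` is served without error, and False otherwise.
    (Hypothetical helper, not part of registry-api.)
    """
    if total <= 0 or probe(0, total):
        return None  # nothing failed in the whole range
    lo, hi = 0, total  # invariant: [0, lo) is clean and [lo, hi) contains a failure
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if probe(lo, mid - lo):
            lo = mid  # left half is clean, so the failure is on the right
        else:
            hi = mid  # left half already fails, keep narrowing it
    return lo
```

Each step halves the range, so isolating one product among 10000 hits would take about 14 requests after the initial check. Note that this finds only the first failing product; it says nothing about how many others there are.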

@al-niessner
Contributor

@tloubrieu-jpl

I was working on the idea that I would collect all of the lidvids from /products and then run through all of them to build a table of passed/failed. We may have more than one bad entry, and if so I wanted to get a hint about any similarities between them.
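
A minimal sketch of that pass/fail table, assuming each lidvid is requested individually with the pds4+json Accept header and that anything other than HTTP 200 counts as a failure. The names `http_status` and `build_report` are made up for this example; the lookup function is injectable so the table logic can be exercised without touching the production API.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple

BASE_URL = "https://pds.nasa.gov/api/search/1/products"  # endpoint quoted in this issue
PDS4_JSON = "application/vnd.nasa.pds.pds4+json"


def http_status(lidvid: str) -> int:
    """Request one product as pds4+json and return the HTTP status code."""
    request = urllib.request.Request(f"{BASE_URL}/{lidvid}", headers={"Accept": PDS4_JSON})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 500 when the server hits the ClassCastException


def build_report(
    lidvids: Iterable[str],
    status_of: Callable[[str], int] = http_status,
) -> List[Tuple[int, str, bool]]:
    """Return (index, lidvid, success) rows, where success means HTTP 200."""
    return [(i, lidvid, status_of(lidvid) == 200) for i, lidvid in enumerate(lidvids)]
```

Printing the rows with `print(index, lidvid, success)` gives a listing with index, lidvid and success columns.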

@al-niessner
Contributor

@tloubrieu-jpl

There is a pattern emerging. Can you please give me the opensearch document for this lidvid: urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n6_n6137final_h_fit::1.0

I do not have access to the opensearch behind pds.nasa.gov/api/search/1 but I can then insert that document into my opensearch and test locally to see what field is the problem.

Interestingly, standard JSON does just fine. It is this PDS4+ that is causing the problem. Hence some wacky field or bad assumption.

@al-niessner
Contributor

Okay, just working with the first 100 (of 10000 total hits), there is a clear pattern. Comparing a broken record to one that is not broken may be sufficient, but loading a bad record into the test database would be more informative. Here are the results from the first 100, which clearly show that urn:nasa:pds:gbo-kpno is different from the rest in a bad way:

index lidvid success

0 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n6_n6137final_h_fit::1.0 False
1 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n6_n6137final_j_fit::1.0 False
2 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n6_n6137final_k_fit::1.0 False
3 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n6_n6140final_h_fit::1.0 False
4 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5130final_k_fit::1.0 False
5 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5132final_h_fit::1.0 False
6 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5135final_h_fit::1.0 False
7 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5135final_k_fit::1.0 False
8 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5136final_j_fit::1.0 False
9 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5136final_k_fit::1.0 False
10 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5137final_h_fit::1.0 False
11 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5137final_j_fit::1.0 False
12 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5138final_j_fit::1.0 False
13 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5141final_h_fit::1.0 False
14 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5141final_j_fit::1.0 False
15 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5142final_j_fit::1.0 False
16 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5143final_h_fit::1.0 False
17 urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n5_n5144final_h_fit::1.0 False
18 urn:nasa:pds:epoxi_mri:hartley2_photometry:aper_phot::1.0 True
19 urn:nasa:pds:epoxi_mri:hartley2_photometry:profile::1.0 True
20 urn:nasa:pds:epoxi_mri:hartley2_photometry:hartley2_mri_anomaly::1.0 True
21 urn:nasa:pds:compil-comet:halebopp::1.0 True
22 urn:nasa:pds:compil-comet:halebopp:figures::1.0 True
23 urn:nasa:pds:compil-comet:halebopp:schleicherdustphasehm_table::1.0 True
24 urn:nasa:pds:compil-comet:halebopp:whypointsremovedfromdata_csv::1.0 True
25 urn:nasa:pds:compil-comet:lightcurves:19p_borrelly::1.0 True
26 urn:nasa:pds:compil-comet:lightcurves:29p_schwassmann_wachmann_1::1.0 True
27 urn:nasa:pds:compil-comet:lightcurves:48p_johnson::1.0 True
28 urn:nasa:pds:compil-comet:lightcurves:49p_arend_rigaux_infrared::1.0 True
29 urn:nasa:pds:compil-comet:lightcurves:9p_tempel_1::1.0 True
30 urn:nasa:pds:compil-comet:lightcurves:p_levy_1991::1.0 True
31 urn:nasa:pds:compil-comet:nuc_properties:extinct::1.0 True
32 urn:nasa:pds:compil-comet:phys_char:magnitude_parameters::1.0 True
33 urn:nasa:pds:compil-comet:phys_char:description::1.0 True
34 urn:nasa:pds:compil-comet:unid-emis:23pbrorsen_metcalf_1_csv::1.0 True
35 urn:nasa:pds:compil-comet:unid-emis:c1980_y1_bradfield_csv::1.0 True
36 urn:nasa:pds:compil-comet:unid-emis:c1995_o1_hale_bopp_csv::1.0 True
37 urn:nasa:pds:compil-comet:unid-emis:c1996_b2_hyakutake_csv::1.0 True
38 urn:nasa:pds:gbo-kpno:mosaic-9p:raw_2005_july03_cal_kp050703_014_fit::1.0 False
39 urn:nasa:pds:gbo-kpno:mosaic-9p:raw_2005_july03_cal_kp050703_016_fit::1.0 False
40 urn:nasa:pds:gbo-kpno:mosaic-9p:raw_2005_july03_cal_kp050703_017_fit::1.0 False
41 urn:nasa:pds:gbo-kpno:mosaic-9p:raw_2005_july03_cal_kp050703_018_fit::1.0 False
42 urn:nasa:pds:gbo-kpno:mosaic-9p:raw_2005_july03_cal_kp050703_047_fit::1.0 False
43 urn:nasa:pds:gbo-kpno:mosaic-9p:raw_2005_july03_cal_kp050703_021_fit::1.0 False
44 urn:nasa:pds:gbo-kpno:mosaic-9p:raw_2005_july03_cal_kp050703_050_fit::1.0 False
45 urn:nasa:pds:gbo-kpno:mosaic-9p:raw_2005_july03_cal_kp050703_024_fit::1.0 False
46 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n2_n2170j_fit::1.0 False
47 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n2_n2170k_fit::1.0 False
48 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n2_n2171h_fit::1.0 False
49 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n2_n2171j_fit::1.0 False
50 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n2_n2172h_fit::1.0 False
51 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n2_n2173h_fit::1.0 False
52 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3005h_fit::1.0 False
53 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3026h_fit::1.0 False
54 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3026k_fit::1.0 False
55 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3006h_fit::1.0 False
56 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3027h_fit::1.0 False
57 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3006j_fit::1.0 False
58 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3027k_fit::1.0 False
59 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3009h_fit::1.0 False
60 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3009j_fit::1.0 False
61 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3009k_fit::1.0 False
62 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3030k_fit::1.0 False
63 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3010k_fit::1.0 False
64 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3011h_fit::1.0 False
65 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3033k_fit::1.0 False
66 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3023j_fit::1.0 False
67 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3034h_fit::1.0 False
68 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3034j_fit::1.0 False
69 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3024h_fit::1.0 False
70 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3024j_fit::1.0 False
71 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3035j_fit::1.0 False
72 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3036k_fit::1.0 False
73 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3037j_fit::1.0 False
74 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3037k_fit::1.0 False
75 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n3_n3038h_fit::1.0 False
76 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4065k_fit::1.0 False
77 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4066h_fit::1.0 False
78 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4066j_fit::1.0 False
79 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4066k_fit::1.0 False
80 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4067h_fit::1.0 False
81 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4067j_fit::1.0 False
82 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4067k_fit::1.0 False
83 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4070j_fit::1.0 False
84 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4071j_fit::1.0 False
85 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4072k_fit::1.0 False
86 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4073h_fit::1.0 False
87 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4074j_fit::1.0 False
88 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4075h_fit::1.0 False
89 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4075j_fit::1.0 False
90 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4076j_fit::1.0 False
91 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4076k_fit::1.0 False
92 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4077k_fit::1.0 False
93 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4078j_fit::1.0 False
94 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4078k_fit::1.0 False
95 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4080h_fit::1.0 False
96 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4080j_fit::1.0 False
97 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4080k_fit::1.0 False
98 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4092j_fit::1.0 False
99 urn:nasa:pds:gbo-kpno:nirimage-9p:raw_n4_n4092k_fit::1.0 False

@jordanpadams
Member

@al-niessner

{
   "_shards" : {
      "failed" : 0,
      "skipped" : 0,
      "successful" : 3,
      "total" : 3
   },
   "hits" : {
      "hits" : [
         {
            "_id" : "urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n6_n6137final_h_fit::1.0",
            "_index" : "registry",
            "_score" : 1,
            "_source" : {
               "_package_id" : "97051a77-b0f9-43a8-83b8-4b818929c26e",
               "description" : [
                  "J, H, and K band observations of Comet 9P/Tempel 1. Airmass: 1.84",
                  "Migration from PDS3 (B. Hirsch)"
               ],
               "disp:Display_Direction/disp:horizontal_display_axis" : "Sample",
               "disp:Display_Direction/disp:horizontal_display_direction" : "Left to Right",
               "disp:Display_Direction/disp:vertical_display_axis" : "Line",
               "disp:Display_Direction/disp:vertical_display_direction" : "Bottom to Top",
               "geom:Display_Direction/geom:horizontal_display_axis" : "Sample",
               "geom:Display_Direction/geom:horizontal_display_direction" : "Left to Right",
               "geom:Display_Direction/geom:vertical_display_axis" : "Line",
               "geom:Display_Direction/geom:vertical_display_direction" : "Bottom to Top",
               "geom:Object_Orientation_RA_Dec/geom:celestial_north_clock_angle" : "0",
               "geom:Object_Orientation_RA_Dec/geom:declination_angle" : "-10.77363888888889",
               "geom:Object_Orientation_RA_Dec/geom:right_ascension_angle" : "205.93125",
               "geom:Reference_Frame_Identification/geom:name" : "J2005.51001",
               "img:Exposure_Parameters/img:exposure_duration" : "8",
               "img:Filter/img:filter_name" : "H",
               "lid" : "urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n6_n6137final_h_fit",
               "lidvid" : "urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n6_n6137final_h_fit::1.0",
               "ops:Data_File_Info/ops:creation_date_time" : "2007-01-19T15:31:41Z",
               "ops:Data_File_Info/ops:file_name" : "n6137final_h.fit",
               "ops:Data_File_Info/ops:file_ref" : "/bx/sbnarch04/PDS4/gbo-kpno/nirimage-9p/data/reduced/n6/n6137final_h.fit",
               "ops:Data_File_Info/ops:file_size" : "1056960",
               "ops:Data_File_Info/ops:md5_checksum" : "09e395078e01b92b6ed99af6f90a08dd",
               "ops:Data_File_Info/ops:mime_type" : "application/fits",
               "ops:Harvest_Info/ops:harvest_date_time" : "2022-02-18T19:07:58.862147Z",
               "ops:Harvest_Info/ops:node_name" : "PDS_SBN",
               "ops:Label_File_Info/ops:blob" : "eJzNWVlv4zgSft4G+j8QfpoB2pJl5zTUGqQ7l6eTTjbOw2JfCEaiHW5Looakknh//Rapi5LlI7u9gwmC2FJ9rPrqYPGI/9tbEqMXKiTj6eeB54wGiKYhj1i6/DzI1WJ4Mvgt+Pjh4wdfI4cJj2iMngVdfB48K5VNXTeLpJMSSZwlf9EPB/qP++K59+fzAwx/sPdlNHJk+Dz4+OFv8EETolYZTWWjIhexw8M4dLhYupGMYrfECZ4a+++wHjGZ1ebPZ/P7P9k+S5a1+dntFfbOvD/R+pLypDZ/dXF3a7zH3tH/TAJo3Ase5aHCd0+SiheioGZIjICbrWhDOQwQmDbQqc7Q7gRaA7RTu322BkAOdmbIgr9JVsNfX1+d14mJwng08tx/3N7MTSCGLJWKpCEtBko2LQJ0w0MTiZ0RQO+bMG8yQpuHlEHaCehOg+1ai8jskndqe7vKMjk7Ab0VC6oHUHoI+bOIpootWBFrfCYoMQIQxXwJr2PMSggVQS7SqbYyBQvT5RMf/shSPk2ZYAlZ0uFpNhUUKplGOD2CX29yvGBQyvgZL5jy3R6Npa2yU4IkgF7pu9ZziVBMxTR4KNSj75SI4SxdCAIG0UxbR3yBvvKEKnR67z7SJINJ7fluMa5UwtIFF0nhqpn2uDQEVj0wrE1vxpRKsnK2hjGRMuidu77bBpUjvzJV6J01NkoZSEmunrnAMZMq+Jay5bP6hG6dW8d3bUkNz/KnuErbCsIRjEfeKRjuvq4HRFSGgmXG6O+f0PUnRNIIfUNP+oM39GVvJB10xoCylFP4fnLgu7a60j93i4P+LSx+dZ1dgytcrBpyLek5VYTFtRDEiS2OiKLG2+HocDgGKutSa+jO0loLzi1bCqMLLQRPEEygCfrli4OumYDW9Guf69r5zS50ZLbzvrtpBvpWRbXm5SNLKP7KuYDNBPgqGwrQR4UyAcAKQBAjCNDoGH4fRwfTg5Pp5PCfvttFWcN5tsfoFqhycAMpWNqgNYgVfqAyjxWe54l+tKtYZFzSYB4y2B9RXb/FiwYheEilhI0TjukLjYN7oM9IHK/QfSGikZlvbVQ9vtSML0lIlbRz/koASNOlesaCpEsa6K6Cqq7iu2tyu16YDFkWs5TilEAcdAsC477bFVhjFpqBF1wJspLQBsHZ8k1TQ71kfXdrFP1Z+kKlYsu1UgGZ4fCdp2CsRcfX+5TgTj1T7bE1HjqmltS4Waqo0D38gS6o0ORsl2IWYVELWstDyGHkm5oyWzs8ReyFRTmJndSwamuwVNcvsSEEJUew4pi1yXZQTSQ38taiDfEqp5wuovlKKpo0+roSqPUkAwdSZXM2If7GlEL3lPxA36utXDWVYdZ38lBnwka0XNmVg/2yAJXGYqZWU94YcvTqvTkD6zlgEld6NkZ+a+y1cI9IvifaY8ebDBNYr0QKePQVVigK7Zul7leeRxQBzZiG4O6GwD/CFJchz+j/Jeyq0j7VsXbCguU48SbvjLtqaP5FAv/97uwOzVkCDYmklOcS/T0nEWptynRGzukLC7vtp4r+DPb+Ik9Af7fv7Ka7DqlXSCKWVOH2ytrpibC18U6OjtGVh36p9ji/9jZJsxX66W1RGY7wCMqd0wwSrDlg7/0dsdD037XCbZHyz5ulrL2o6DPQFKRZTFZ4TpWCFLRWVn12i/HO+cPbhwHLa3OewETAUqmPDRtxPfq6USppypKmjlhL75bptIcb7VicMwG9xi62BgM7ePZvSD4oqjiRNyaDOUkyvRfYCtpLXVRbv6ELhRRHD/oUsVl1tIUubJOVOay1eNywtKLaD9hDUWP1C1cKNthA9JFnm9T2kSyxm8PeAazVqA8n7mnZof6ihcsKdjgjAnoSEDKlW+jnT/8Ch39SBVehoLg6y/YdT2voxRvsz3MB6JpYC1TCaAWL8vIwladMfR4kchCc+G4vpG3M3ce
asXXJYkVFH4uFkRQ78etCo/1q3d6aquL17vg0OLukfH0LM72iustbx55KUKitqrQHtm9J/vyi3NVPoRbNArRnKe5ZjFVgtjfUCrWrpW4F7alwY1PdPaDPwra2ugWwl6qNjXUXvJ2mPRJQELgzqcd3As6s5b3Pwxk+p2EvXaHDhokMaWruX+BEHdOyKUR0OQjGo0PndOKND0sKvQN6VUc0jPW1Q6/aoTdyjo8nR5OT8ue01L82qld3qDfe+r4Bp1yoZxxCof9YtzIqlW6B96qvJwG+1P1t0561Pcj0rt/1/Yxz6I1GXml7raXV2XyHlXLEPrktoTuaWInqvPfdvt1ltaXvXoVBVy5Q7WvWSpcWN/YWGmxiYd9AO+YGupFVNKyx/jUlkdX8fb5YwLaxTPPTSlGT5+KthSpCVdwUtcCHx0cab8utey0izHWV/udHRESkLyQvZ49zNNG3lH3SinKLpn+m2zoen2OTh0Z/t89vXwW2Ol360fGbvFEZjH3XfFpv4bTK0oi+YS40zRsiFZwK4QW6hK8wO/SQDqgefhFTfRrExqvWPtucdsyZ8eLi4vjw4Hb+Zc70vIK9Xi1riq5fkX+mLa8pN3xMWRSduHm2QLTQKINDD7yunyyEpH/kZpalefIEXsHE7L5qCPYw2cmuWth+Er/xe/mVXWGuOwJ0ubClPQFWSZ5gc7cZ4RcS5zQYH3jHx84BWOoV28NZui4fwhH9UM+HXql9WdBPDNxYnx7FpO/vJ/qutecfOsF/AK/2Qjg=",
               "ops:Label_File_Info/ops:creation_date_time" : "2019-06-04T16:14:18Z",
               "ops:Label_File_Info/ops:file_name" : "n6137final_h.xml",
               "ops:Label_File_Info/ops:file_ref" : "/bx/sbnarch04/PDS4/gbo-kpno/nirimage-9p/data/reduced/n6/n6137final_h.xml",
               "ops:Label_File_Info/ops:file_size" : "8145",
               "ops:Label_File_Info/ops:json_blob" : "eJy9V1tv2zYU/iuEnlbAli1f4kRvucdtXHtxHoYVA8FItM1VEjWScuIF+e87h5RkxUnsDsVWBGlEnvOd+4XP3kzJuIgMnT5ortbMCJmxxAufvXHMMyMWIrJn9FRxhsd5SR8lTGsv/IC/5U1kvGW+EdpItUH+V+cX3DBhpaXN45gZDtC9bnDS7g7bvQHgxVxHSuR4D1cTsVSWliyUTMnsYt4nv5z55EYoHa0+Af2aK41YIvbC4OWl5YlsIVXqBIA0ntCSBOACPwj8rt8FvkQuQY0E+Jz5XMF9obIwY5qFeazD5YNsf88zGWZCiZQtefskDxUHL/CYZkfwE/RHCwFuoCu6EGZXmZZ3LozTY7zVyfq2eEgqH2w4A8nogl3bP7fITYuwLCZfyAP+J7eu10QuyLlMuSEns849T3OekMAnpwLEaB3C38foTVaYlVQ0gbgA4pdMLFemRSb+xPfAVUaYBANw54wiX0GXNuiqGJhJxmjzu3KQ9ylNMvARlykArIzJw04HvOaj+/ylXOPHoIP3nXXglfT7SOFXSalFCMHlKbuVUemzQ2zkwH0HMmdA4RcNzrpd/0nH5GOWWOh8L2ZJ4EAvxvPZj6CKdLkX1N07zPHkmganwSHI0rsHCRzo9eV0YhWlwZGDrsISokH7nFwaXNNDiLbkj4+P/mPfl2rZ6XW7Qee3ye3chq8tMm1YFnHguxIJt73lbQO64SzG6nv25GKhucG/ikxgwj5sDDJHMjNQpV7YhcTLGdRYtqQIHTMV22Lzrsb3c9K3lS0f/uTQqBKeLc1qD9hwdNTFhnGqFNvQ3gW1CY8MiXzTGWwDoAxJUcQhRS02lN+T0FRkMX+iUlkjvVumDRnjEbmCP7nGvjF1Ks+xXWgjIm1bpchEWqRUgzLQcdYsKUC79vHRaNjyUoB+e9kbBKORP+hZ0RxQei3vMuEp6EStmYgLXZdRs8mx8seXl5ej4WAyP5uDT6EZoD9Q6ZL627On+V8FhyDSrEgf0ARobNxhgoBh0CvNzFiKiLciQ5R3+Hp7+eYszVH+Hy8uWVDTBSZNed/stj5225e6CUHl7EteV1jWy3Xm1WNunK0hBGK5e2q4QlF3fMEVWmHTQsRUbQ9eTwsb+icTiiYgfMViLeKCJX4mM0yRGqAKgQuHpK8YMQ6l4V8dX0k9NSuuyHiX9kLoSOQJ+L62AWs/vObYu91Etgc2xymQ5wkkffP61ib9B5bvFMQrN7wuDUf61spSIhhqDXZFiqpbtcoCmCoBIlws7k5hbYhqxRWOLsp0xDM7YplN120FxnzZLMBed+if9IPesJIQ8wjc46AP8LahO45G/aP+cfnvpAKpnUKvFESHvt6dal3LyH2Gdjj0hwE0xdrQCEoAIgcuyqQyK1ivZPT9kEK2TVn2KnAXQoG/mjJhzIu/gQGQK19jeW0rq0SAFcXYxaeiirdQ3pk0BtYsI8m9zCuOd5CbPLd8YZDjDgP0oZRSF9cd0By8qM2Zc2Og/+j/Lw91KRET0rHs6vTKxfbmX7nufY6mG0qaHwjcR3Qfh8EuwunSljuY+d/7VThBFKYzJD/IsJ51rNtar1TitHpO7GzGSHD5lEtdKKCpsaorXl3Fhap5yppJdbNkjisPwCwxbrvAr4X9qmbKDabiSzUYUP35BkZyitS7ZxS24BxaMYJ/+6kJsWCRSITZhNU+Dy8mHx8a700HGJAVfWMmfBHGkBln38nXcpEi0y1YY1o0DnEm/4TWBhtXJHMeoqp+JFUGzumlQf8DtWuGht49P+i3U3hQWGZyDk8VDg88kXXOZRFzAjgJpIpUWxPuGzDP9Uycnk7JHBagBJZALgtNfi0YvFiaTxcUcMHXImpMzzEspKrAFcQtGjN82KkNuEMDFJ0XKX66J5qCVLOLSSS422HhSRxxbZfPhK95gq9iprCZJxsyc5ccl+qShV6xiB
td9o9qPpcmVJXZ8hZIFcDRNVSY3efg8JGBALvBUgXTwRoNr7PaRFvh9wJm0LmE1RInG7eS4AGe24c1NcL5HIZQuzuCn/vuIBwch/3h7yAA1mdlDhOiFKaW3Lwz634mmywmfEJ9+yc5JAu+LGmwb0FyPI1sghdpANswuQ7IL9XT9NM22PbZaiv8HzlN95g=",
               "ops:Label_File_Info/ops:md5_checksum" : "e7c521a71baf0da4646987707f427f8b",
               "ops:Tracking_Meta/ops:archive_status" : "archived",
               "pds:Array_2D_Image/pds:axes" : "2",
               "pds:Array_2D_Image/pds:axis_index_order" : "Last Index Fastest",
               "pds:Array_2D_Image/pds:local_identifier" : "image_array",
               "pds:Array_2D_Image/pds:offset" : "5760",
               "pds:Axis_Array/pds:axis_name" : [
                  "Line",
                  "Sample"
               ],
               "pds:Axis_Array/pds:elements" : [
                  "512",
                  "512"
               ],
               "pds:Axis_Array/pds:sequence_number" : [
                  "1",
                  "2"
               ],
               "pds:Citation_Information/pds:author_list" : "Knight, M.M.",
               "pds:Citation_Information/pds:description" : "J, H, and K band observations of Comet 9P/Tempel 1. Airmass: 1.84",
               "pds:Citation_Information/pds:publication_year" : "2019",
               "pds:Element_Array/pds:data_type" : "IEEE754MSBSingle",
               "pds:File/pds:file_name" : "n6137final_h.fit",
               "pds:Header/pds:object_length" : "5760",
               "pds:Header/pds:offset" : "0",
               "pds:Header/pds:parsing_standard_id" : "FITS 3.0",
               "pds:Identification_Area/pds:information_model_version" : "1.11.0.0",
               "pds:Identification_Area/pds:logical_identifier" : "urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n6_n6137final_h_fit",
               "pds:Identification_Area/pds:product_class" : "Product_Observational",
               "pds:Identification_Area/pds:title" : "Reduced Near-Infrared Image of Comet 9P/Tempel 1",
               "pds:Identification_Area/pds:version_id" : "1.0",
               "pds:Internal_Reference/pds:lid_reference" : [
                  "urn:nasa:pds:context:investigation:individual.none",
                  "urn:nasa:pds:context:facility:observatory.kpno",
                  "urn:nasa:pds:context:telescope:kpno.corning2m13",
                  "urn:nasa:pds:context:target:comet.9p_tempel_1"
               ],
               "pds:Internal_Reference/pds:reference_type" : [
                  "data_to_investigation",
                  "is_facility",
                  "is_telescope",
                  "data_to_target"
               ],
               "pds:Investigation_Area/pds:name" : "None",
               "pds:Investigation_Area/pds:type" : "Other Investigation",
               "pds:Local_Internal_Reference/pds:local_identifier_reference" : [
                  "image_array",
                  "image_array",
                  "image_array"
               ],
               "pds:Local_Internal_Reference/pds:local_reference_type" : [
                  "display_settings_to_array",
                  "imaging_parameters_to_image_object",
                  "display_to_data_object"
               ],
               "pds:Modification_Detail/pds:description" : "Migration from PDS3 (B. Hirsch)",
               "pds:Modification_Detail/pds:modification_date" : "2019-05-24T00:00:00Z",
               "pds:Modification_Detail/pds:version_id" : "1.0",
               "pds:Object_Statistics/pds:maximum_scaled_value" : "24177.42",
               "pds:Object_Statistics/pds:minimum_scaled_value" : "-8675.0",
               "pds:Observing_System_Component/pds:name" : [
                  "Kitt Peak National Observatory",
                  "2.13-m Corning Cassegrain/Coude reflector",
                  "NOAO Simultaneous Quad Infrared Imaging Device"
               ],
               "pds:Observing_System_Component/pds:type" : [
                  "Observatory",
                  "Telescope",
                  "Instrument"
               ],
               "pds:Primary_Result_Summary/pds:processing_level" : "Partially Processed",
               "pds:Primary_Result_Summary/pds:purpose" : "Science",
               "pds:Science_Facets/pds:discipline_name" : "Imaging",
               "pds:Science_Facets/pds:facet1" : "Grayscale",
               "pds:Science_Facets/pds:wavelength_range" : "Near Infrared",
               "pds:Target_Identification/pds:name" : "9P/1867 G1 (Tempel 1)",
               "pds:Target_Identification/pds:type" : "Comet",
               "pds:Time_Coordinates/pds:start_date_time" : "2005-07-07T04:48:35Z",
               "pds:Time_Coordinates/pds:stop_date_time" : "2005-07-07T04:48:35Z",
               "product_class" : "Product_Observational",
               "ref_lid_facility" : "urn:nasa:pds:context:facility:observatory.kpno",
               "ref_lid_investigation" : "urn:nasa:pds:context:investigation:individual.none",
               "ref_lid_target" : "urn:nasa:pds:context:target:comet.9p_tempel_1",
               "ref_lid_telescope" : "urn:nasa:pds:context:telescope:kpno.corning2m13",
               "title" : "Reduced Near-Infrared Image of Comet 9P/Tempel 1",
               "vid" : "1.0"
            },
            "_type" : "_doc"
         }
      ],
      "max_score" : 1,
      "total" : {
         "relation" : "eq",
         "value" : 1
      }
   },
   "timed_out" : false,
   "took" : 5
}

@jordanpadams
Member

@al-niessner did they turn off the blob store?

@al-niessner
Contributor

@jordanpadams @tloubrieu-jpl

No, many of the fields (labels) are supposed to be arrays even if they hold just one element. For instance

               "ops:Data_File_Info/ops:creation_date_time" : "2007-01-19T15:31:41Z",
               "ops:Data_File_Info/ops:file_name" : "n6137final_h.fit",
               "ops:Data_File_Info/ops:file_ref" : "/bx/sbnarch04/PDS4/gbo-kpno/nirimage-9p/data/reduced/n6/n6137final_h.fit",
               "ops:Data_File_Info/ops:file_size" : "1056960",
               "ops:Data_File_Info/ops:md5_checksum" : "09e395078e01b92b6ed99af6f90a08dd",
               "ops:Data_File_Info/ops:mime_type" : "application/fits",

should be

               "ops:Data_File_Info/ops:creation_date_time" : ["2007-01-19T15:31:41Z"],
               "ops:Data_File_Info/ops:file_name" : ["n6137final_h.fit"],
               "ops:Data_File_Info/ops:file_ref" : ["/bx/sbnarch04/PDS4/gbo-kpno/nirimage-9p/data/reduced/n6/n6137final_h.fit"],
               "ops:Data_File_Info/ops:file_size" : ["1056960"],
               "ops:Data_File_Info/ops:md5_checksum" : ["09e395078e01b92b6ed99af6f90a08dd"],
               "ops:Data_File_Info/ops:mime_type" : ["application/fits"],

There are many many more.
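
One way to enumerate the rest is to compare the `_source` of a broken document with that of a working one and list every shared field that is a list in one and a scalar in the other. This is a rough sketch; `shape_mismatches` is a hypothetical helper, and it only reports fields that appear in both documents.

```python
from typing import Any, Dict, List


def shape_mismatches(broken: Dict[str, Any], healthy: Dict[str, Any]) -> List[str]:
    """Return the shared field names whose list-ness differs between two documents."""
    shared = broken.keys() & healthy.keys()
    return sorted(
        key for key in shared
        if isinstance(broken[key], list) != isinstance(healthy[key], list)
    )
```

Fields that are scalars in both documents (for example `lidvid`) are not reported, so the output concentrates on the properties where the two loads actually disagree.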

@jordanpadams
Member

Copy. Thanks @al-niessner . I may just request that they re-ingest this data since their file_refs are invalid anyways.

@jordanpadams
Member

@al-niessner @tloubrieu-jpl per this ticket, SBN re-loaded their data and everything seems to be working as expected. The following both work:

Query for everything:

curl -L -H 'Accept:application/vnd.nasa.pds.pds4+json' http://pds.nasa.gov/api/search/1/products

Query for one of the wonky products in question from SBN:

curl -L -H 'Accept:application/vnd.nasa.pds.pds4+json' http://pds.nasa.gov/api/search/1/products/urn:nasa:pds:gbo-kpno:nirimage-9p:reduced_n6_n6137final_h_fit::1.0 | json_pp

@tloubrieu-jpl
Member Author

Thanks @jordanpadams , I confirm it works.

Maybe we could make the API more robust by skipping the corrupted products and logging an error, while still answering the user with whatever is not corrupted. I will create a ticket for that.
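
The control flow of that idea might look like the sketch below. The API itself runs on the JVM (per the stack trace), so this Python is purely illustrative: `serialize_tolerantly` and the `serialize` callback are hypothetical names, and catching every exception is an assumption about what "corrupted" should cover.

```python
import logging
from typing import Any, Callable, Dict, Iterable, List

log = logging.getLogger("registry-api-sketch")  # hypothetical logger name


def serialize_tolerantly(
    documents: Iterable[Dict[str, Any]],
    serialize: Callable[[Dict[str, Any]], Any],
) -> List[Any]:
    """Serialize every document that can be serialized and log the ones that cannot."""
    results = []
    for doc in documents:
        try:
            results.append(serialize(doc))
        except Exception as error:  # deliberately broad: one bad product must not fail the page
            log.error("skipping corrupted product %s: %s", doc.get("lidvid", "<unknown>"), error)
    return results
```

The trade-off is that the client receives fewer products than it asked for without being told why, so the response would probably also need some indication that products were skipped.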

@tloubrieu-jpl
Member Author

tloubrieu-jpl commented Jul 21, 2023

I am reopening this ticket because I saw this happen again:

    curl --header 'Accept:application/vnd.nasa.pds.pds4+json' https://pds.nasa.gov/api/search/1/products/urn:nasa:pds:system_bundle:document_pds4_standards::1.0

I initially made a pull request so that the API supports the older version of loaded products, where properties are not arrays #204

But I am guessing I failed to test the case where the Accept type is 'application/vnd.nasa.pds.pds4+json' or the XML equivalent.

So I see 2 options:

  1. we migrate all the documents in the registry. Jimmie had that plan, but we gave up on it since it sounded useless (I cannot find that ticket)
  2. we fix the registry-api to support a single string in the JSON in every case.

The most difficult part in 2) is having regression tests for it, since that means we would need to load products into the reference registry using different versions of harvest.
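
Option 2 amounts to normalizing every value at read time, before any code assumes it is a list. A minimal sketch of such an accessor is below; `as_list` is a hypothetical name, and treating a missing value as an empty list is an assumption rather than documented registry-api behavior.

```python
from typing import Any, List


def as_list(value: Any) -> List[Any]:
    """Return value as a list, wrapping scalars written by older harvest versions."""
    if value is None:
        return []  # assumption: a missing property behaves like an empty list
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]  # single string (or number) stored instead of a one-element array
```

Both shapes can then go through the same code path, so a regression test only needs two hand-written fixture documents instead of products loaded with several harvest versions.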

@tloubrieu-jpl
Member Author

For later work, a note: the error in the logs is:
[image: screenshot of the error in the logs]

@al-niessner
Contributor

@tloubrieu-jpl

The problem is not the JSON/XML output. It is that the PDS4 handler reads fields that are normally skipped and expects them to be arrays even when they are single-valued. It needs to be tested, but it probably fails for all Accept types if you request the offending field. The failure happens when the registry-api tries to read/handle the field from the database, not when it writes it out. It may, however, be only the PDS4 handler in the registry-api; other types may handle the field generically (not look at it, but blindly pass it along as a value).

@jordanpadams
Member

@al-niessner have you been able to take a look at this?

@al-niessner
Contributor

@jordanpadams Yes, strings in DB instead of array of strings.

@jordanpadams
Member

@al-niessner is there anything we can do for a workaround here to fix this? is there a script we can do to find / update all those fields? or update the way we are querying that metadata from the DB?

@alexdunnjpl
Contributor

@jordanpadams ad-hoc python script should be plenty-good for this. Could save the wrapper in registry-sweepers as a scratch/utility script for future use.

@al-niessner
Contributor

@jordanpadams

Three items:

  1. declare what it means to be a valid document in OpenSearch and remove/fix everything that is not
  2. keep writing more corner-case code that does something sensible when you get an int instead of a string instead of an array
  3. rely on "As a user, I want my API request to execute successfully even when the registry contains corrupted documents" (#361), which implies that all chicken scratch is valid and that parse errors are more or less ignored until the pile is so big that more requests fail than succeed; then pick 1 or 2

I do not get why 1 is not already happening. I would think that PDS4 would make the documents uniform at least. Then you could do 2 where PDS4 is lacking, hopefully less rather than more lacking.

@al-niessner
Contributor

@alexdunnjpl @jordanpadams

Yeah, my 1 can be accomplished with @alexdunnjpl's suggestion.

@jordanpadams
Member

jordanpadams commented Jul 31, 2023

@al-niessner per 1. above, PDS4 does that, but earlier versions of Harvest didn't take into account some documents may have only 1 value, while others may have multiple, and after that discovery, we decided to treat everything as a list. So we basically need to go back and find all those loaded incorrectly, and update those values to be lists.
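
The "find and update" step could be sketched roughly as below. It assumes, purely for illustration, that the label-derived properties are exactly the keys containing a `/` (such as `ops:Data_File_Info/ops:file_name`) and that top-level fields like `lid` or `title` stay scalar; the real rule for which fields must be lists would need to come from harvest. `repair_update` and `plan_repairs` are hypothetical names, and sending the resulting partial updates to OpenSearch is left out.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def repair_update(source: Dict[str, Any]) -> Dict[str, list]:
    """Return only the fields that need rewriting, each wrapped in a one-element list."""
    return {
        key: [value]
        for key, value in source.items()
        if "/" in key and not isinstance(value, list)  # assumption: '/' marks label properties
    }


def plan_repairs(hits: Iterable[Dict[str, Any]]) -> Iterator[Tuple[str, Dict[str, list]]]:
    """Yield (document id, partial update) for every search hit that needs a repair."""
    for hit in hits:
        update = repair_update(hit["_source"])
        if update:  # documents that are already uniform are left untouched
            yield hit["_id"], update
```

Because only the changed fields are emitted, running the plan twice is harmless: a repaired document produces an empty update and is skipped.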

@al-niessner
Contributor

@alexdunnjpl @jordanpadams

Then we are doing an improved version of 1 today, but we need @alexdunnjpl's script suggestion to help with what was loaded in the past. Seems like a good fix. It would mean identifying common problems in the OpenSearch documents and repairing them. As we improve harvest, the script would have to be updated to account for the changes. Knowing the harvest version is far less important than simply deciding what makes a valid document.

@jordanpadams
Member

@al-niessner I do believe knowing that version helps a bit in debugging, though. But agreed, the script is more important. We have not been good about updating harvest and then updating this new script X to make sure the existing metadata is updated. Can you take this on, or is a discussion with @alexdunnjpl warranted at the breakout tomorrow?

@al-niessner
Contributor

@alexdunnjpl @jordanpadams

I can take it on. It is just another provenance-like script that fixes a bunch of fields.

Harvest does not write out its version yet, so the version would be only halfway informative: since ingestion does not require a version, users could still be running an old version that does not insert one. Anyway, adding it would be easy enough, since harvest already inserts some harvest info (datetime and node name).

@jordanpadams
Member

@al-niessner thanks!

Per this:

Harvest does not write out its version yet, so the version would be only halfway informative: since ingestion does not require a version, users could still be running an old version that does not insert one.

I would actually like to eventually prevent writing with old versions of the software.

@al-niessner
Contributor

Replying to @jordanpadams's comment above ("I would actually like to eventually prevent writing with old versions of the software"):

@jordanpadams

You could theoretically do this if harvest is only used in our containers with client SSL certs. Change the certs at the server so that only certain copies of harvest would work.

4 participants